New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Threads: Increase timeouts #3292
Conversation
Default values were changed quite long ago, in 084cb4d We had abort at 1000 ms back then, which was more than required. Now abort is set at 200 ms, but this value is a bit too low for low-end machines running programs that are doing many (async) tasks on the threadpool in background, like mine: ``` WAITING for 12 threads, got 11 suspended suspend_thread suspend took 202 ms, which is more than the allowed 200 ms ... Native stacktrace: mono() [0x492e2f] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10ed0) [0x7f81ec3b5ed0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f81ebe211c8] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f81ebe2264a] mono() [0x6445a9] mono() [0x644893] mono() [0x6353b4] mono() [0x5dd799] mono() [0x5eb3ea] mono() [0x5de663] mono() [0x5deb20] mono() [0x5c2799] [0x41dc2c6c] Debug info from gdb: ================================================================= Got a SIGABRT while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application. ================================================================= ``` Valid apps and setups are getting ```SIGABRT```ed due to running (only a bit) above threshold. Therefore, I'd like to slightly increase the timeout, from ```200``` to ```300``` ms, which would solve my issue without resorting to setting it through environment property. I know that this issue is specific because most likely nobody is running into such problems with more powerful machines, but for low-end servers running with low tick frequency and slower CPUs, this might be a really good change. Note: ```SLEEP_DURATION_BEFORE_WARNING``` doesn't seem to be used anymore, but I adapted it for consistency, as well as corrected outdated comments.
Hi @JustArchi, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution! TTYL, DNFBOT; |
Hello! I'm the build bot for the Mono project. I need approval from a Mono team member to build this pull request. A team member should reply with "approve" to approve a build of this pull request, "whitelist" to whitelist this and all future pull requests from this contributor, or "build" to explicitly request a build, even if one has already been done. Contributors can ignore this message. |
While I do not mind the change, I would love to know the history of the current settings. |
Change looks reasonable to me as well, but I have to say I don't like the idea of these arbitrary timeouts. It was just a matter of time until somebody ran into problems with them. Could we not disable these timeouts entirely in normal operation and only enable them with some debug environment variable/flag? |
Let's change the timeout to infinite and people can use the existing env var to troubleshoot hangs. |
Alright, that should be it 👍 |
Check mono_threads_init(). An environment variable MONO_SLEEP_ABORT_LIMIT I though I had documented it in mono.1 but apparently not. We had abort at 1000 ms back then, which was more than required. Now abort |
I know about that, and if the change was special only for me I'd use it, but I thought that I'm not the first and not the last guy running into this issue, so it's better to change defaults. Besides, it doesn't work correctly for me. Did you check if it does for you?
|
Hmm, I thought I had because I had been running into the problem on a I know about that, and if the change was special only for me I'd use it, |
Personally I can't see how this PR would put things into negative - the field is more like a debug field, and we shouldn't really guess what value is appropriate considering various different setups and configurations, so if there are no more objections I'd really appreciate a merge 👍. |
Hello! I'm the build bot for the Mono project. I need approval from a Mono team member to build this pull request. A team member should reply with "approve" to approve a build of this pull request, "whitelist" to whitelist this and all future pull requests from this contributor, or "build" to explicitly request a build, even if one has already been done. Contributors can ignore this message. |
Related bug: https://bugzilla.xamarin.com/show_bug.cgi?id=42665 |
Fix for env var parsing + timeout change: #3304 |
@JustArchi @kumpera Thanks for your work! I'm asking because I'm having massive problems with the ThreadPool with a high traffic ServiceStack webservice hosted with HyperfastCGI behind Nginx, which does a lot of http-calls server side (ca. 4 per request). Requests per seconds are dramatically low (~5rps) and the web requests tend to time out under a certain (not high!) load. I'm fighting this since almost one year now without success, and this here seems to be a ray of hope. Using the latest Mono nightly ( Yet it was not the fault of the receiving server - it didn't even get the then-timed-out request from Mono. So at least having this Exception is a progress, but still the timeouts occure, for which I have no explanation. |
Default values were changed quite long ago, in 084cb4d
We had abort at 1000 ms back then, which was more than required. Now abort is set at 200 ms, but this value is a bit too low for low-end machines running programs that are doing many (async) tasks on the threadpool in background, like mine:
Valid apps and setups are getting
SIGABRT
ed due to running (only a bit) above threshold.Therefore, I'd like to slightly increase the timeout, from
200
to300
ms, which would solve my issue without resorting to setting it through environment property.I know that this issue is specific because most likely nobody is running into such problems with more powerful machines, but for low-end servers running with low tick frequency and slower CPUs, this might be a really good change.
Note:
SLEEP_DURATION_BEFORE_WARNING
doesn't seem to be used anymore, but I adapted it for consistency, as well as corrected outdated comments.