Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threads: Increase timeouts #3292

Closed
wants to merge 2 commits into from
Closed

Threads: Increase timeouts #3292

wants to merge 2 commits into from

Conversation

JustArchi
Copy link
Contributor

Default values were changed quite long ago, in 084cb4d

We had abort at 1000 ms back then, which was more than required. Now abort is set at 200 ms, but this value is a bit too low for low-end machines running programs that are doing many (async) tasks on the threadpool in background, like mine:

WAITING for 12 threads, got 11 suspended
suspend_thread suspend took 202 ms, which is more than the allowed 200 ms
...
Native stacktrace:

        mono() [0x492e2f]
        /lib/x86_64-linux-gnu/libpthread.so.0(+0x10ed0) [0x7f81ec3b5ed0]
        /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f81ebe211c8]
        /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f81ebe2264a]
        mono() [0x6445a9]
        mono() [0x644893]
        mono() [0x6353b4]
        mono() [0x5dd799]
        mono() [0x5eb3ea]
        mono() [0x5de663]
        mono() [0x5deb20]
        mono() [0x5c2799]
        [0x41dc2c6c]

Debug info from gdb:


=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

Valid apps and setups are getting SIGABRTed due to running (only a bit) above threshold.

Therefore, I'd like to slightly increase the timeout, from 200 to 300 ms, which would solve my issue without resorting to setting it through environment property.

I know that this issue is specific because most likely nobody is running into such problems with more powerful machines, but for low-end servers running with low tick frequency and slower CPUs, this might be a really good change.

Note: SLEEP_DURATION_BEFORE_WARNING doesn't seem to be used anymore, but I adapted it for consistency, as well as corrected outdated comments.

Default values were changed quite long ago, in 084cb4d

We had abort at 1000 ms back then, which was more than required. Now abort is set at 200 ms, but this value is a bit too low for low-end machines running programs that are doing many (async) tasks on the threadpool in background, like mine:

```
WAITING for 12 threads, got 11 suspended
suspend_thread suspend took 202 ms, which is more than the allowed 200 ms
...
Native stacktrace:

        mono() [0x492e2f]
        /lib/x86_64-linux-gnu/libpthread.so.0(+0x10ed0) [0x7f81ec3b5ed0]
        /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f81ebe211c8]
        /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f81ebe2264a]
        mono() [0x6445a9]
        mono() [0x644893]
        mono() [0x6353b4]
        mono() [0x5dd799]
        mono() [0x5eb3ea]
        mono() [0x5de663]
        mono() [0x5deb20]
        mono() [0x5c2799]
        [0x41dc2c6c]

Debug info from gdb:


=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================
```

Valid apps and setups are getting ```SIGABRT```ed due to running (only a bit) above threshold.

Therefore, I'd like to slightly increase the timeout, from ```200``` to ```300``` ms, which would solve my issue without resorting to setting it through environment property.

I know that this issue is specific because most likely nobody is running into such problems with more powerful machines, but for low-end servers running with low tick frequency and slower CPUs, this might be a really good change.

Note: ```SLEEP_DURATION_BEFORE_WARNING``` doesn't seem to be used anymore, but I adapted it for consistency, as well as corrected outdated comments.
@dnfclas
Copy link

dnfclas commented Jul 17, 2016

Hi @JustArchi, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution!

This seems like a small (but important) contribution, so no Contribution License Agreement is required at this point. Real humans will now evaluate your PR.

TTYL, DNFBOT;

@monojenkins
Copy link
Contributor

Hello! I'm the build bot for the Mono project.

I need approval from a Mono team member to build this pull request. A team member should reply with "approve" to approve a build of this pull request, "whitelist" to whitelist this and all future pull requests from this contributor, or "build" to explicitly request a build, even if one has already been done.

Contributors can ignore this message.

@migueldeicaza
Copy link
Contributor

While I do not mind the change, I would love to know the history of the current settings.

@alexrp
Copy link
Contributor

alexrp commented Jul 17, 2016

Change looks reasonable to me as well, but I have to say I don't like the idea of these arbitrary timeouts. It was just a matter of time until somebody ran into problems with them. Could we not disable these timeouts entirely in normal operation and only enable them with some debug environment variable/flag?

@kumpera
Copy link
Contributor

kumpera commented Jul 17, 2016

Let's change the timeout to infinite and people can use the existing env var to troubleshoot hangs.

@JustArchi
Copy link
Contributor Author

Alright, that should be it 👍

@nealef
Copy link
Contributor

nealef commented Jul 17, 2016

Check mono_threads_init(). An environment variable MONO_SLEEP_ABORT_LIMIT
is checked which will override the default value.

I though I had documented it in mono.1 but apparently not.


We had abort at 1000 ms back then, which was more than required. Now abort
is set at 200 ms, but this value is a bit too low for low-end machines
running programs that are doing many (async) tasks on the threadpool in
background, like mine:
WAITING for 12 threads, got 11 suspended
suspend_thread suspend took 202 ms, which is more than the allowed 200 ms

@JustArchi
Copy link
Contributor Author

An environment variable MONO_SLEEP_ABORT_LIMIT
is checked which will override the default value.

I know about that, and if the change was special only for me I'd use it, but I thought that I'm not the first and not the last guy running into this issue, so it's better to change defaults.

Besides, it doesn't work correctly for me. Did you check if it does for you?

root@archi:/tmp# touch test.exe
root@archi:/tmp# export MONO_SLEEP_ABORT_LIMIT=500
root@archi:/tmp# mono test.exe
MONO_SLEEP_ABORT_LIMIT must be a number >= 40

@nealef
Copy link
Contributor

nealef commented Jul 18, 2016

Hmm, I thought I had because I had been running into the problem on a
heavily used system which is what prompted me to make the change. I will
check it out again but if everyone is happy increasing the default then
the point may be moot.


I know about that, and if the change was special only for me I'd use it,
but I thought that I'm not the first and not the last guy running into
this issue, so it's better to change defaults.
Besides, it doesn't work correctly for me. Did you check if it does for
you?

@JustArchi
Copy link
Contributor Author

Personally I can't see how this PR would put things into negative - the field is more like a debug field, and we shouldn't really guess what value is appropriate considering various different setups and configurations, so if there are no more objections I'd really appreciate a merge 👍.

@monojenkins
Copy link
Contributor

Hello! I'm the build bot for the Mono project.

I need approval from a Mono team member to build this pull request. A team member should reply with "approve" to approve a build of this pull request, "whitelist" to whitelist this and all future pull requests from this contributor, or "build" to explicitly request a build, even if one has already been done.

Contributors can ignore this message.

@kumpera
Copy link
Contributor

kumpera commented Jul 20, 2016

@kumpera
Copy link
Contributor

kumpera commented Jul 20, 2016

Fix for env var parsing + timeout change: #3304

@kumpera kumpera closed this Jul 20, 2016
@derFunk
Copy link

derFunk commented Jul 29, 2016

@JustArchi @kumpera Thanks for your work!
Can somebody explain what really gets aborted after MONO_SLEEP_ABORT_LIMITms ? Thanks!

I'm asking because I'm having massive problems with the ThreadPool with a high traffic ServiceStack webservice hosted with HyperfastCGI behind Nginx, which does a lot of http-calls server side (ca. 4 per request).

Requests per seconds are dramatically low (~5rps) and the web requests tend to time out under a certain (not high!) load.

I'm fighting this since almost one year now without success, and this here seems to be a ray of hope.

Using the latest Mono nightly ((Nightly 4.5.2.799/ce8e905 Thu Jul 28 09:13:48 UTC 2016)) is now at least showing me timeout exceptions 'System.Net.WebException: The request timed out when MONO_SLEEP_ABORT_LIMIT set to 1000 - before that, no exceptions were raised and the thread seemed to silently be killed without returning a http response to the client (499 in nginx, because mono didn't answert the request (in time/60s) which results in 504 timeouts to the client!).

Yet it was not the fault of the receiving server - it didn't even get the then-timed-out request from Mono.

So at least having this Exception is a progress, but still the timeouts occure, for which I have no explanation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants