Workaround for the high cpu usage issue in coroutines on linux #5186

merged 5 commits into from Jan 11, 2017


None yet

2 participants


Take a look at MWE of the issue:

when defined(mycoro):
  from mycoro import run, start, suspend
  from coro import run, start, suspend

proc test_routine() =
  while true:


mycoro is a coro version from this PR.
This code was compiled on GNU/Linux 4.8.13-1-ARCH by Nim compiler 0.15.0 (exact git hash: 8ec2855)

nim c -d:nimCoroutines -d:mycoro -o:mytest test.nim
nim c -d:nimCoroutines -o:test test.nim

After running both of test and mytest executables for a 10 seconds with measurement of cpu usage by time, I've got the following results:

$ time ./test                                
^CTraceback (most recent call last)
test.nim(12)             test
coro.nim(74)             run
os.nim(1482)             sleep
SIGINT: Interrupted by Ctrl-C.

real    0m10,055s
user    0m0,230s
sys     0m0,543s
$ time ./mytest
^CTraceback (most recent call last)
test.nim(12)             test
mycoro.nim(82)           run
SIGINT: Interrupted by Ctrl-C.

real    0m9,991s
user    0m0,003s
sys     0m0,000s

It can be seen, that original coro library consumes much CPU resources when sleeping.
My investigations lead me to the StackOverflow question. As long as coro uses os.sleep and its underlying implementation on linux is based on nanosleep() system call, the coroutines will be consuming such a big amount of resources. nanosleep() provides the high precision delay, while its enough for coroutines to use delay with the millisecond precision. Probably, the high precision of the nanosleep() syscall is really necessary in other os.sleep application, but in coro it may be better to use some alternative, such as select with timeout (according to this answer).

Araq commented Jan 6, 2017

That sounds more like os.sleep should be patched instead. Sleeping shouldn't stress the CPU.

xomachine commented Jan 7, 2017 edited

I agree, but linux maintainers may have a reason to implement nanosleep() in such a way.
Actually, I don't know which implementation of os.sleep should be taken instead. select() looks too "hacky", usleep() is obsolete in POSIX... And the question is how replacing of nanosleep() will affect all other code.
I will rework this PR as soon as these questions will be answered.

UPD: I've tried to patch os.sleep() but encountered another problem: nanosleep() eats resources only when used in coro module. My attempts to reproduce the issue without coro module were not succeed.

xomachine commented Jan 8, 2017 edited

I'm finally figured out the issue origin. It's not a nanosleep() issue, but a rounding issue. When minDelay value is lesser than 10^-3, it is being rounded down to zero, so the os.sleep() proc is being called many-many times with zero argument. select() was a solution only for this MWE (coroutines were being suspended in syncronized rhythm when minDelay was never lesser than 10^-3).

+ # remaining_s - remaining time in seconds
+ let remaining_s = coro.sleepTime - (epochTime() - coro.lastRun)
+ # remaining - in milliseconds
+ # comparing to the int.high is required to avoid integer overflow
Araq Jan 8, 2017 Member

Integer overflow traps in debug builds, you cannot use 'min' to work around this.

xomachine Jan 8, 2017 Contributor

Well, should I just ignore the overflow possibility? Or it is possible to put assert with the overflow check?

Araq Jan 9, 2017 Member

Yeah, just ignore the overflow for now.


It does not look like that the check failure is related to my patch. ./koch tests cat generic on my local machine passes without any problem. Should I pull recent changes from the upstream?

@Araq Araq merged commit d356c37 into nim-lang:devel Jan 11, 2017

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment