Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unix: switch uv_sleep() to nanosleep() #2552

Closed
wants to merge 5 commits into from

Conversation

bnoordhuis
Copy link
Member

@bnoordhuis bnoordhuis commented Nov 26, 2019

Before this commit, uv_sleep() made two library calls: sleep() to sleep
for the requested number of seconds, and then an usleep() call to sleep
for the remaining milliseconds. Make a single nanosleep() call instead.

Receiving a signal will wake up prematurely from sleep() but then
re-enter sleep mode again in usleep(), which seems undesirable.

CI: https://ci.nodejs.org/job/libuv-test-commit/1634/
CI: https://ci.nodejs.org/job/libuv-test-commit/1636/
CI: https://ci.nodejs.org/job/libuv-test-commit/1650/
CI: https://ci.nodejs.org/job/libuv-test-commit/1652/
CI: https://ci.nodejs.org/job/libuv-test-commit/1653/

Copy link
Member

@santigimeno santigimeno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a question

src/unix/core.c Outdated Show resolved Hide resolved
src/unix/core.c Outdated Show resolved Hide resolved
@cjihrig
Copy link
Contributor

cjihrig commented Nov 28, 2019

On zOS, the CI is failing. I reran the CI a few times, but am still seeing the same failures:

not ok 210 - spawn_and_kill
# timeout
# Output from process `spawn_and_kill`:
# close_cb
# CEE5205S The signal SIGTERM was received.
not ok 211 - spawn_and_kill_with_std
# timeout
# Output from process `spawn_and_kill_with_std`:
# close_cb
# close_cb
# CEE5205S The signal SIGTERM was received.

@cjihrig
Copy link
Contributor

cjihrig commented Dec 3, 2019

I'm a bit confused. I found this article, which states:

The cond_timed_wait service is similar to the POSIX function nanosleep(). (Refer to
the POSIX standard for a description of nanosleep().) If you need the nanosleep()
function, you can use cond_timed_wait to implement your own version.

Does that mean nanosleep() is implemented, but does not work properly on zOS? Are the CI failures not actually related to this PR?

cc: @libuv/zos

@bnoordhuis
Copy link
Member Author

@libuv/zos For context, there might be a bug in libuv-on-zos's nanosleep() emulation here:

int nanosleep(const struct timespec* req, struct timespec* rem) {
unsigned nano;
unsigned seconds;
unsigned events;
unsigned secrem;
unsigned nanorem;
int rv;
int rc;
int rsn;
nano = (int)req->tv_nsec;
seconds = req->tv_sec;
events = CW_CONDVAR;
#if defined(_LP64)
BPX4CTW(&seconds, &nano, &events, &secrem, &nanorem, &rv, &rc, &rsn);
#else
BPX1CTW(&seconds, &nano, &events, &secrem, &nanorem, &rv, &rc, &rsn);
#endif
assert(rv == -1 && errno == EAGAIN);
if(rem != NULL) {
rem->tv_nsec = nanorem;
rem->tv_sec = secrem;
}
return 0;
}

Notably, this PR starts using the rem out parameter, which was unused before.

@richardlau
Copy link
Contributor

richardlau commented Dec 3, 2019

@libuv/zos For context, there might be a bug in libuv-on-zos's nanosleep() emulation here:

int nanosleep(const struct timespec* req, struct timespec* rem) {
unsigned nano;
unsigned seconds;
unsigned events;
unsigned secrem;
unsigned nanorem;
int rv;
int rc;
int rsn;
nano = (int)req->tv_nsec;
seconds = req->tv_sec;
events = CW_CONDVAR;
#if defined(_LP64)
BPX4CTW(&seconds, &nano, &events, &secrem, &nanorem, &rv, &rc, &rsn);
#else
BPX1CTW(&seconds, &nano, &events, &secrem, &nanorem, &rv, &rc, &rsn);
#endif
assert(rv == -1 && errno == EAGAIN);
if(rem != NULL) {
rem->tv_nsec = nanorem;
rem->tv_sec = secrem;
}
return 0;
}

Notably, this PR starts using the rem out parameter, which was unused before.

According to the documentation that @cjihrig linked to BPX4CTW/BPX1CTW returns EAGAIN/EINTR/EINVAL in the return code parameter (rc in the code above) and makes no mention of errno. Although if errno is anything other than EAGAIN the assert on L366 would trigger.

cc @ccw-1 and @zsw007 since there have been changes to this function that haven't been upstreamed here.

Before this commit, uv_sleep() made two library calls: sleep() to sleep
for the requested number of seconds, and then an usleep() call to sleep
for the remaining milliseconds. Make a single nanosleep() call instead.

Receiving a signal will wake up prematurely from sleep() but then
re-enter sleep mode again in usleep(), which seems undesirable.
Reception of a signal makes nanosleep() return prematurely. Restart the
system call when that happens.
* Remove an assert() that wasn't actually being tested because the CI
  builds libuv with `-DNDEBUG` on that platform.

* Fix the emulation to update `errno` correctly.

* Fix the emulation to update the `rem` out parameter correctly.
@bnoordhuis
Copy link
Member Author

bnoordhuis commented Dec 3, 2019

Although if errno is anything other than EAGAIN the assert on L366 would trigger.

It looks like libuv is built with -DNDEBUG on z/os and hence the macro isn't compiled.

The nanosleep() emulation looks buggy to me after reading the docs on BPX1CTW/BPX4CTW. I've pushed a commit that hopefully fixes it, PTAL.

CI: https://ci.nodejs.org/job/libuv-test-commit/1650/ - edit: still failing...

@richardlau
Copy link
Contributor

Looks like it still failed... reading the failing tests (spawn_and_kill and spawn_and_kill_with_std) it looks like the tests send a signal to interrupt so perhaps

events = CW_CONDVAR;

should be

 events = CW_CONDVAR | CW_INTRPT;

based on the documentation:

If you do not include the CW_INTRPT event when you use cond_timed_wait, some services that are used by other threads or processes cannot cause the waiting thread to resume processing. In particular, the following services do not cause an event notification unless CW_INTRPT is specified in the event list:

  • kill
  • pthread_cancel
  • pthread_kill
  • pthread_quiesce

@bnoordhuis
Copy link
Member Author

bnoordhuis commented Dec 3, 2019

Worth a try, thanks! New CI: https://ci.nodejs.org/job/libuv-test-commit/1652/

edit: of course it fails to build due to a Undeclared identifier CW_INTRPT error... CW_CONDVAR is defined on line 30 in that file but I don't know what the value for CW_INTRPT should be.

@richardlau
Copy link
Contributor

richardlau commented Dec 3, 2019

Looks like we'll need to define CW_INTRPT. Based on https://github.com/ibmruntimes/node/blob/0b9900ffc6897254492cd1883b9ed987741d55bf/deps/zoslib/include/zos.h#L99 add

#define CW_INTRPT 1

just before

#define CW_CONDVAR 32
?

Edit: found a source for what the values should be: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb100/ycw.htm

@bnoordhuis
Copy link
Member Author

bnoordhuis commented Dec 3, 2019

Done, thanks! New CI: https://ci.nodejs.org/job/libuv-test-commit/1653/ - edit: green!

Copy link
Contributor

@richardlau richardlau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

z/OS changes LGTM.

cjihrig pushed a commit that referenced this pull request Dec 4, 2019
Before this commit, uv_sleep() made two library calls: sleep() to sleep
for the requested number of seconds, and then an usleep() call to sleep
for the remaining milliseconds. Make a single nanosleep() call instead.

Receiving a signal will wake up prematurely from sleep() but then
re-enter sleep mode again in usleep(), which seems undesirable.

PR-URL: #2552
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
cjihrig pushed a commit that referenced this pull request Dec 4, 2019
Reception of a signal makes nanosleep() return prematurely. Restart the
system call when that happens.

PR-URL: #2552
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
cjihrig pushed a commit that referenced this pull request Dec 4, 2019
* Remove an assert() that wasn't actually being tested because the CI
  builds libuv with `-DNDEBUG` on that platform.

* Fix the emulation to update `errno` correctly.

* Fix the emulation to update the `rem` out parameter correctly.

PR-URL: #2552
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@cjihrig
Copy link
Contributor

cjihrig commented Dec 4, 2019

Landed in 5500253...635e0ce. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants