Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failures on OSX 12 #1910

Closed
mhdawson opened this issue Jul 5, 2018 · 23 comments
Closed

Failures on OSX 12 #1910

mhdawson opened this issue Jul 5, 2018 · 23 comments
Assignees
Labels

Comments

@mhdawson
Copy link
Contributor

mhdawson commented Jul 5, 2018

  • Version: 1.x
  • Platform: osx 12

Failure when trying to add osx12 to the test matrix:
https://ci.nodejs.org/view/All/job/libuv-test-commit-osx/959/nodes=osx1012/

Not sure if they are real failures or something to do with the machine.

not ok 16 - condvar_3
# exit code 6
# Output from process `condvar_3`:
# Assertion failed in ../test/test-condvar.c on line 161: 0 == wc.wait_cond(&wc, &wc.posted_1)
ok 17 - condvar_4

not ok 49 - fs_event_error_reporting
# timeout
# Output from process `fs_event_error_reporting`: (no output)

not ok 116 - hrtime
# exit code 6
# Output from process `hrtime`:
# Assertion failed in ../test/test-hrtime.c on line 50: diff < (uint64_t) 80 * NANOSEC / MILLISEC
@cjihrig
Copy link
Contributor

cjihrig commented Aug 11, 2018

@mhdawson what is the status of adding this to the CI? Has it been put on hold?

@refack refack self-assigned this Aug 11, 2018
@refack
Copy link
Contributor

refack commented Aug 11, 2018

Similar three (https://ci.nodejs.org/view/All/job/libuv-test-commit-osx/984/nodes=osx1012/)
18 and 115 seem timing related.
48 seems FS related, which is not surprising.

not ok 18 - condvar_5
# exit code 6
# Output from process `condvar_5`:
# Assertion failed in ../test/test-condvar.c on line 262: elapsed <= 1.5 * timeout

not ok 48 - fs_event_error_reporting
# timeout
# Output from process `fs_event_error_reporting`: (no output)

not ok 115 - hrtime
# exit code 6
# Output from process `hrtime`:
# Assertion failed in ../test/test-hrtime.c on line 50: diff < (uint64_t) 80 * NANOSEC / MILLISEC

P.S. The node test suite does not hit these edge cases...

@mhdawson
Copy link
Contributor Author

@cjihrig I did not add it to the CI to avoid making it consistently red. 1010 and 1011 are in, we can enable 1012 any time we are ready.

@copumpkin
Copy link

FWIW, I'm getting condvar_5 failures on macOS 10.13 too

@davisjam
Copy link
Contributor

@copumpkin What is the assert for condvar_5?

@copumpkin
Copy link

copumpkin commented Sep 15, 2018

Are you asking which assert fails? It's this one:

  ASSERT(elapsed <= 1.5 * timeout); /* 1.1 too small for OSX. */

I haven't added any instrumentation yet to figure out what the actual elapsed was though. Happens fairly consistently for me though; not sure if it's a real issue or if my machine is just under heavy load.

@davisjam
Copy link
Contributor

@copumpkin If you could let me know what elapsed and timeout are, that would be swell.

@copumpkin
Copy link

I'll see if I can track it down, gimme a bit 😄

@copumpkin
Copy link

And now it seems to be passing consistently again... not exactly reassuring 😕

@copumpkin
Copy link

copumpkin commented Sep 15, 2018

Got it to fail:

elapsed = 173798319
timeout = 100000000

Seems like in general these timing tests are likely going to fail when the system is under heavy load or concurrent builds are happening. Perhaps just put an upper bound on them to ensure they don't run for ridiculous amounts of time, but any single-digit factor doesn't seem out of the question to me.

Edit: just tried again and got elapsed = 174015564. At the least I'd probably bump that multiplier to 2.0, but as I said above, it seems safer to just make it something like 5.0. Timing tests are never fun or predictable 😦

@davisjam
Copy link
Contributor

it seems safer to just make it something like 5.0

Indeed. I'll open a PR in a bit.

@davisjam
Copy link
Contributor

@copumpkin Have you had any issues with condvar_3? mhdawson flagged that in his initial post.

@copumpkin
Copy link

Nope, condvar_5 is the only one I've seen fail in maybe 15-20 builds of libuv in the past couple of days. This is on 1.23.0 by the way, on macOS 10.13 on a nearly maxed out MacBook Pro.

@davisjam
Copy link
Contributor

@copumpkin Take a look at this PR.

cjihrig pushed a commit that referenced this issue Sep 19, 2018
Problem:
Upper bound on thread wakeup was set to 1.5 * (requested timeout).
On MacOS wakeup delay factors of 1.75 have been reported.

Solution:
Increase the bound to 5 * (requested timeout).

Refs: #1910
PR-URL: #1990
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
@mhdawson
Copy link
Contributor Author

Tried to re-enable 12 for libuv testing today but there are 2 failures:

https://ci.nodejs.org/view/libuv/job/libuv-test-commit-osx/1039/nodes=osx1012/console

not ok 115 - hrtime
# exit code 6
# Output from process `hrtime`:
# Assertion failed in ../test/test-hrtime.c on line 50: diff < (uint64_t) 80 * NANOSEC / MILLISEC

not ok 48 - fs_event_error_reporting
# timeout
# Output from process `fs_event_error_reporting`: (no output)

@mhdawson
Copy link
Contributor Author

So seems like the condvar one is resolved, but the other 2 are still there.

@davisjam
Copy link
Contributor

davisjam commented Sep 20, 2018

Sounds like 115 - hrtime may also be a time-based issue like condvar was.

The failing code is:

  while (i > 0) {
    a = uv_hrtime();
    uv_sleep(45);
    b = uv_hrtime();

    diff = b - a;

    /*  printf("i= %d diff = %llu\n", i, (unsigned long long int) diff); */

    /* The windows Sleep() function has only a resolution of 10-20 ms. Check
     * that the difference between the two hrtime values is somewhat in the
     * range we expect it to be. */
    ASSERT(diff > (uint64_t) 25 * NANOSEC / MILLISEC);
    ASSERT(diff < (uint64_t) 80 * NANOSEC / MILLISEC);
    --i;
  }
  return 0;

@davisjam
Copy link
Contributor

This one is rather more mysterious:

not ok 48 - fs_event_error_reporting
# timeout
# Output from process `fs_event_error_reporting`: (no output)

@mhdawson Simlar to #1997, if I push a branch with some logging code, could you run CI on it and link to the results so we can see what's going on here?

@mhdawson
Copy link
Contributor Author

Yes, created job to make that easier :)

@mhdawson
Copy link
Contributor Author

If you are going to need a lot of runs we can also ask to get you temporary access to be able to kick off jobs if you can't do that now.

@davisjam
Copy link
Contributor

I opened #1998 to try to figure out what's happening in fs_event_error_reporting.

@davisjam
Copy link
Contributor

If you are going to need a lot of runs we can also ask to get you temporary access to be able to kick off jobs if you can't do that now.

I have the ability to start Node.js builds on Jenkins.
It does not look like I have the power to start libuv jobs.
I would be delighted to acquire this power.

@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 10, 2019
@stale stale bot closed this as completed Aug 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants