Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: unflake async hooks statwatcher test #30362

Closed
wants to merge 1 commit into from

Conversation

@lundibundi
Copy link
Member

lundibundi commented Nov 11, 2019

Under high load 2 types of issues arise with this test.

  • filesystem calls gets queued even when the 'sync' is used which leads
    to async_hooks being called with the events of tmpdir clean or
    initial file write after clean.
    This is solved by counting all 'change' calls while making sure there
    is no dependency of StatWatcher's on one another and the expected
    changes are waited for.

  • some events are getting lost with the current
    clean->write->write_and_watch strategy. Specifically I observed the
    file size going from 0 (empty) to 5 ('foo++') entirely skipping 3 ('foo') even though the write
    call was there (this happened reliably on -j128).
    So I've changed the strategy to avoid additional write considering
    this still tests the hooks correctly.
    This may indicate some sort of bug in async_hooks though I'm not sure.
    Not the async hooks but the fs watcher as it was missing an event.

With these changes, this test reliably passes with -j128 --repeat 4000 on my machine. Time to test this on CI.

Closes: #21425

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
@lundibundi

This comment has been minimized.

Copy link
Member Author

lundibundi commented Nov 11, 2019

Could someone help with node-stress-single-test, what would be the correct GIT_REMOTE_REF for this PR? (refs/pull/30362/head?)
I wanted to run tests on win2016 and ubuntu1604x64 for async-hooks/test-statwatcher.

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Nov 12, 2019

Thank you for this!!!!

refs/pull/30362/head should be the correct GIT_REMOTE_REF. If I did it right, it should be running at https://ci.nodejs.org/job/node-stress-single-test/15/.

@lundibundi

This comment has been minimized.

Copy link
Member Author

lundibundi commented Nov 12, 2019

Well, looks like it was successful.
Though I'm still debating whether my w1HookCount/w2HookCount is a good solution to a different amount of events we get. In a current workflow, we can still either get no_file->file_write (99% of time) or no_file->file_creation->file_write (got this 3 times under -j128 --repeat 4000).

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Nov 12, 2019

@nodejs/testing @nodejs/async_hooks This could use some reviews.

@lundibundi

This comment has been minimized.

Copy link
Member Author

lundibundi commented Nov 26, 2019

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Nov 27, 2019

@Qard
Qard approved these changes Dec 21, 2019
@nodejs-github-bot

This comment has been minimized.

@Trott
Trott approved these changes Dec 21, 2019
@Trott Trott added the author ready label Dec 21, 2019
@richardlau

This comment has been minimized.

Copy link
Member

richardlau commented Dec 21, 2019

If this unflakes the test shouldn’t it be removed from async-hooks.status?

Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425
@lundibundi lundibundi force-pushed the lundibundi:fix-stat-watcher-test branch from a4e124f to 33b8300 Dec 21, 2019
@lundibundi

This comment has been minimized.

Copy link
Member Author

lundibundi commented Dec 21, 2019

@richardlau done. I had to rebase (and force push) as this change wasn't present at the time.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot

This comment has been minimized.

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Dec 22, 2019

Landed in ea6a457

@Trott Trott closed this Dec 22, 2019
Trott added a commit that referenced this pull request Dec 22, 2019
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
BridgeAR added a commit that referenced this pull request Jan 3, 2020
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
@BridgeAR BridgeAR mentioned this pull request Jan 7, 2020
targos added a commit that referenced this pull request Jan 14, 2020
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.