@joyeecheung
Member

@joyeecheung joyeecheung commented Oct 27, 2025

After the write triggers a restart of the grandchild, the newly spawned second grandchild can post another 'script ready' message before the stdout from the first grandchild has been relayed by the watcher and processed by this parent process, which is what prompts it to kill the watcher. If we write again and trigger another restart, we can end up in an infinite write-restart loop and never receive the stdout of the grandchildren in time.
Only write once, to verify that the first grandchild process receives the expected signal. We don't care about the subsequent grandchild processes.
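
For illustration only, the "write once" guard could be sketched roughly as below. This is not the actual test code: `onFirstScriptReady` and `triggerRestart` are hypothetical names, and a plain `EventEmitter` stands in for the watcher child process so the snippet runs on its own.

```javascript
'use strict';
const { EventEmitter } = require('node:events');

// Hypothetical helper: call `triggerRestart` only for the first
// 'script ready' message and ignore the ones posted by restarted
// grandchildren, so a second write can never start another restart.
function onFirstScriptReady(child, triggerRestart) {
  let written = false;
  child.on('message', (msg) => {
    if (msg === 'script ready' && !written) {
      written = true;
      triggerRestart();
    }
  });
}

// Simulation: a plain EventEmitter stands in for the watcher process.
const fakeChild = new EventEmitter();
let restarts = 0;
onFirstScriptReady(fakeChild, () => { restarts++; });

fakeChild.emit('message', { 'watch:require': ['/tmp/script.js'] });
fakeChild.emit('message', 'script ready'); // first grandchild
fakeChild.emit('message', 'script ready'); // restarted grandchild
console.log(restarts); // 1
```

The guard is a single boolean flag, so however quickly later 'script ready' messages arrive, only the first one leads to a write.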

Drive-by: add some logs to aid debugging in case the tests fail again in the CI.

Refs: #60297
Refs: #60391

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Oct 27, 2025

@codecov

codecov bot commented Oct 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.58%. Comparing base (3c8c1ef) to head (4d88f9d).
⚠️ Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #60443      +/-   ##
==========================================
+ Coverage   88.07%   88.58%   +0.50%     
==========================================
  Files         704      704              
  Lines      207778   207826      +48     
  Branches    39949    40055     +106     
==========================================
+ Hits       182998   184099    +1101     
+ Misses      16791    15790    -1001     
+ Partials     7989     7937      -52     

see 108 files with indirect coverage changes

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 28, 2025
@joyeecheung
Member Author

  [MESSAGE] {
    'watch:require': [ '/home/iojs/tmp/.tmp.1/kill-signal-for-watch.js' ]
  }
  [MESSAGE] script ready
  [MESSAGE] {
    'watch:require': [ '/home/iojs/tmp/.tmp.1/kill-signal-for-watch.js' ]
  }
  [MESSAGE] script ready
  [STDOUT] Restarting '/home/iojs/tmp/.tmp.1/kill-signal-for-watch.js'
  __SIGINT received__

  [PARENT] Sending kill signal to child process: 510581
  [STDOUT] __SIGTERM received__

  [PARENT] Sending kill signal to child process: 510581

The stress test failed in this PR, though the logs are helpful. They confirm my hypothesis: the 'script ready' message can fire so fast that the testing process receives it from the restarted grandchild before it has processed the stdout of the first one, so an infinite write-watch-restart loop could happen. It seems that when this happens, though, the second grandchild actually gets killed with SIGTERM, because it is not killed by its parent for a restart, but as a consequence of its parent being killed directly by the testing process. I think we can differentiate the two by also printing the pid there, since we only care about the first grandchild, and the second grandchild is just a leftover.
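
A rough sketch of how printing the pid could help tell the grandchildren apart. The output format `__<SIGNAL> received__ <pid>`, the helper name `signalsByPid`, and the pids in the sample are all assumptions made up for this example, not what the test actually prints.

```javascript
'use strict';

// Hypothetical output format: each grandchild prints
// `__<SIGNAL> received__ <pid>` when it handles a signal.
// Group the reported signals by the pid that reported them.
function signalsByPid(stdout) {
  const result = new Map();
  const pattern = /__(SIG[A-Z0-9]+) received__ (\d+)/g;
  for (const [, signal, pid] of stdout.matchAll(pattern)) {
    const key = Number(pid);
    if (!result.has(key)) result.set(key, []);
    result.get(key).push(signal);
  }
  return result;
}

// Made-up sample: 1001 is the first grandchild, 1002 the leftover one.
const sample = [
  "Restarting 'kill-signal-for-watch.js'",
  '__SIGINT received__ 1001',
  '__SIGTERM received__ 1002',
].join('\n');

const seen = signalsByPid(sample);
console.log(seen.get(1001)); // [ 'SIGINT' ]
console.log(seen.get(1002)); // [ 'SIGTERM' ]
```

With something like this, the assertion could look only at the signals reported by the first grandchild's pid and ignore whatever the leftover grandchild prints.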

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 28, 2025

@joyeecheung
Member Author

Stress test came back clean: https://ci.nodejs.org/job/node-stress-single-test/622/

@joyeecheung joyeecheung force-pushed the fix-flake branch 2 times, most recently from e40cc9e to 4d88f9d Compare October 28, 2025 12:54

@joyeecheung
Member Author

CI is green. Can you take a look again? Thanks @lpinca @MoLow
