New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
removing timeout when starting an node #1237
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering about the wisdom between no timeout and a very long one (10-30 min?). DOn't we use this in tests too?
do we use the timeout for tests? I'm not really following the logic of needing a startup timeout for a node for normal operations, but could be convinced for tests |
T-systems have reported that they see a time out on node startup and their node enters a restart loop.
I would like to remove the time out all together. The unit test have an inherent timeout of 10 mins and should be good. |
@smnzhu @huitseeker @Kay-Zee bumping up this PR. |
Codecov Report
@@ Coverage Diff @@
## master #1237 +/- ##
==========================================
- Coverage 54.86% 54.80% -0.06%
==========================================
Files 504 504
Lines 31918 31918
==========================================
- Hits 17512 17494 -18
- Misses 12031 12050 +19
+ Partials 2375 2374 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepting to unblock, it seems clear the current timeout is too small, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an operational perspective, it seems like having some timeout makes sense, though definitely much longer than 1 minute. If a node hangs on initial startup or a later restart, operators will want their supervisor tasks to restart the process. if it hangs for a long time/indefinitely they'll require manual intervention.
I'm not sure about what startup/shutdown step could possibly hang that long, so I'm fine with this if long hangs are not a concern.
bors merge |
A node startup can take a very long time for certain node type e.g. Execution node. A timeout is not really needed when a node starts/stop.