Signer runtime is stuck for some SPO #1312

Closed
4 tasks done
jpraynaud opened this issue Oct 20, 2023 · 2 comments · Fixed by #1374
Labels
bug ⚠️ Something isn't working

jpraynaud commented Oct 20, 2023

Why

Some SPOs report that the state machine runtime of their signer node has been stuck for multiple days.

This could be due to KES key rotation, but there is no obvious link 🤔. We need to investigate the problem further.
Some SPOs state that this can happen after a restart of the Block Producer node.

The signer nodes seem to be healthy.

In general, a restart of the signer node fixes the problem.

What

Can we reproduce the problem with:

  • KES key rotation
  • Block Producer restart

Can we find in the code if there is a problem with the tick of the runtime?

How

  • Try to reproduce the problem locally
  • Add a goodbye message when the signer terminates without crashing
  • Analyze the code and find potential problems that could lead to this behavior
  • Add timeout on HTTP client(s)
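
To illustrate why a missing timeout is a plausible cause of a stuck runtime, here is a minimal sketch using only the Rust standard library. It is not the signer's actual HTTP client code: the function names `read_timed_out` and `silent_peer_times_out`, the raw TCP socket, and the durations are all assumptions made for the example. It shows a peer that accepts a connection but never answers, and a client read that gives up after a deadline instead of blocking indefinitely.

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Reads from `stream` with a deadline and returns `true` when the read timed out.
/// Without `set_read_timeout`, the `read` call below would block for as long as
/// the peer stays silent, which would freeze a loop that waits on it.
fn read_timed_out(stream: &mut TcpStream, timeout: Duration) -> std::io::Result<bool> {
    stream.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 64];
    match stream.read(&mut buf) {
        Ok(_) => Ok(false),
        // Unix platforms report WouldBlock, Windows reports TimedOut.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => Ok(true),
        Err(e) => Err(e),
    }
}

/// Hypothetical scenario: a peer that accepts the connection but never replies.
fn silent_peer_times_out() -> bool {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind local port");
    let addr = listener.local_addr().expect("local addr");
    // Accept the connection, then hold it open without writing anything.
    let server = thread::spawn(move || {
        let (_conn, _) = listener.accept().expect("accept");
        thread::sleep(Duration::from_millis(500));
    });
    let mut stream =
        TcpStream::connect_timeout(&addr, Duration::from_secs(1)).expect("connect");
    let timed_out = read_timed_out(&mut stream, Duration::from_millis(100)).expect("read");
    server.join().expect("server thread");
    timed_out
}
```

Calling `silent_peer_times_out()` returns `true` here because the read is abandoned after 100 ms. With the `set_read_timeout` line removed, the same read would wait on the silent peer for as long as the connection stays open, which from the outside would look like a healthy process whose runtime never ticks again.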
dlachaume (Collaborator) commented

Below are some tests performed in an attempt to reproduce the behavior described by the SPOs.

These tests were executed locally with 3 signer nodes on a devnet with mithril-end-to-end. The signers were registered and signing before each case was executed.

A test passes if the signer is able to run and sign without issue.

| Scenario | Test passed |
|---|---|
| Restart the signer node | |
| Restart the cardano node while the signer is still running | |
| Rotate the KES keys | |

jpraynaud (Member, Author) commented

We are closing this issue until the problem occurs again.

In the meantime, PR #1362 added a goodbye message that will help us identify unexpected crashes, if any.
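
As a rough illustration of why a goodbye message helps, the sketch below shows one way a final log line could be derived from the outcome of a main loop. This is not the code from PR #1362: the function name `final_log_line` and the message texts are hypothetical. The idea is that every orderly exit path produces a final line, so a log that ends without one points to a crash or a kill.

```rust
use std::fmt::Display;

/// Builds the last log line for a run, given the outcome of a hypothetical main loop.
/// A clean stop yields a goodbye message; a reported error yields an error line.
/// If the process crashes or is killed, neither line is ever written.
fn final_log_line<E: Display>(outcome: &Result<(), E>) -> String {
    match outcome {
        Ok(()) => String::from("signer terminated normally, goodbye"),
        Err(e) => format!("signer terminated with error: {}", e),
    }
}
```

When reading the logs of a stuck or restarted signer, the presence of the goodbye line would indicate an orderly shutdown, while its absence would suggest that the process ended unexpectedly.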
