Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No crash but (almost) nothing happens anymore #45

Closed
shufps opened this issue Dec 25, 2022 · 6 comments
Closed

No crash but (almost) nothing happens anymore #45

shufps opened this issue Dec 25, 2022 · 6 comments

Comments

@shufps
Copy link
Contributor

shufps commented Dec 25, 2022

Alphanet fell out of sync and is not generating new milestones since today about 1:45AM

The weird thing is that neither the inx-tendercoos nor the hornet nodes crashed.

Logs of hornets look normal - they just didn't get milestones anymore.

Logs of inx-tendercoo look weird.

Example:

2022-12-25T00:27:07Z    INFO    Coordinator     new confirmed milestone: 7296250
2022-12-25T00:27:07Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.6839246, "height": 1758286, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T00:27:07Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758286/0 (85CD737A2CE12F901523B8CFF39727F8426F0C187855078C7922440454AE9ECF:1:76D892699DE1, -1) F8E00F9AE0DB @ 2022-12-25T00:27:07.85879395Z}"}
2022-12-25T00:27:07Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758286, "hash": "85CD737A2CE12F901523B8CFF39727F8426F0C187855078C7922440454AE9ECF"}
2022-12-25T00:27:08Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758286, "hash": "85CD737A2CE12F901523B8CFF39727F8426F0C187855078C7922440454AE9ECF", "root": "E34EAF0B3A6F03E2B59EA5F6A9B3923335AFDA594D95F7A9B318302A33DA9DCB", "num_txs": 0}
2022-12-25T00:27:08Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758286, "num_valid_txs": 0, "num_invalid_txs": 0}
2022-12-25T00:27:08Z    INFO    Tendermint      committed state {"module": "state", "height": 1758286, "num_txs": 0, "app_hash": "E34EAF0B3A6F03E2B59EA5F6A9B3923335AFDA594D95F7A9B318302A33DA9DCB"}
2022-12-25T00:27:08Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758286}
2022-12-25T00:27:09Z    INFO    Coordinator     proposing parent for milestone 7296251
2022-12-25T00:27:09Z    DEBUG   Coordinator     broadcast tx    {"parent": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}}
2022-12-25T00:27:09Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.016293711, "height": 1758287, "round": 0, "step": "RoundStepNewRound"}
2022-12-25T00:27:09Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758287/0 (FD36C18FAE0D8B7E333283667ED3B81C1923F3225DBB7923DFA4615F35D764A1:1:F156BA0EBB9F, -1) 7D8FD69C4E54 @ 2022-12-25T00:27:09.032542002Z}"}
2022-12-25T00:27:09Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758287, "hash": "FD36C18FAE0D8B7E333283667ED3B81C1923F3225DBB7923DFA4615F35D764A1"}
2022-12-25T00:27:09Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758287, "hash": "FD36C18FAE0D8B7E333283667ED3B81C1923F3225DBB7923DFA4615F35D764A1", "root": "E34EAF0B3A6F03E2B59EA5F6A9B3923335AFDA594D95F7A9B318302A33DA9DCB", "num_txs": 1}
2022-12-25T00:27:09Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "PfgJZMyDkh4IwfoKT1/AWBCmNNpFRhsrMV/L/WL3yrc="}
2022-12-25T00:27:09Z    DEBUG   Coordinator     awaiting parent {"blockID": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:09Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758287, "num_valid_txs": 1, "num_invalid_txs": 0}
2022-12-25T00:27:09Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:09Z    INFO    Tendermint      committed state {"module": "state", "height": 1758287, "num_txs": 1, "app_hash": "358432A9ABC070DC8928FDD7A6A35E463E5E9E87015F3CC121BDFDA13C5F860E"}
2022-12-25T00:27:09Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758287}
2022-12-25T00:27:10Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.682942085, "height": 1758288, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T00:27:10Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758288/0 (8C7801FEC24D29F328E9592AFE9B21790E39F5A287A43380A1B049BD1F9B0EB9:1:4132F2A0EED9, -1) F1F21243BCBE @ 2022-12-25T00:27:10.299955396Z}"}
2022-12-25T00:27:10Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758288, "hash": "8C7801FEC24D29F328E9592AFE9B21790E39F5A287A43380A1B049BD1F9B0EB9"}
2022-12-25T00:27:10Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758288, "hash": "8C7801FEC24D29F328E9592AFE9B21790E39F5A287A43380A1B049BD1F9B0EB9", "root": "358432A9ABC070DC8928FDD7A6A35E463E5E9E87015F3CC121BDFDA13C5F860E", "num_txs": 8}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "Kml39uVHUY1XLXumP2tQtbuvPegCPcVCdRcj85Epe4s="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "r+rCxSoK09pTe6h9AXJyD9VfpP2zxEsP0gvIEtGKiFY="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "o74D/yOQIQNUqa9r60GKlXdmsFD1p8YmYoA8wmepRPY="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "jiIq5+Ktz7h6KYShmq1SsZee0UcsPLFyOac+8dNEw1o="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "2ZIoGaOelN3zkH9LnI35PznwJiRPy2CSBbmoeQIlmfI="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "GVltVw6VNfEcw0CR25jsRKUm44ZX+Ca11rYVLlM7jAY="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "ZLx8094lTMOipuCq7luEHQpnGyK0h3iPWU+/HvMpk1w="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "2F5bFZDYmNHgzeuy47UzfIt2JwFCZj14gRaDukfBfJg="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     awaiting parent {"blockID": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:10Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758288, "num_valid_txs": 8, "num_invalid_txs": 0}
2022-12-25T00:27:10Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:10Z    INFO    Tendermint      committed state {"module": "state", "height": 1758288, "num_txs": 8, "app_hash": "4451565D214CE75A4C65FB94BBF427F40902DA92D6C2322228F350FC3E5B2599"}
2022-12-25T00:27:10Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758288}
2022-12-25T00:27:11Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.681604159, "height": 1758289, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T00:27:14Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 3, "height": 1758289, "round": 0, "step": "RoundStepPropose"}
2022-12-25T00:27:15Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 1, "height": 1758289, "round": 0, "step": "RoundStepPrecommitWait"}
2022-12-25T00:27:16Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758289/1 (522D4C8CE0F5BA59F0E7B36C105A0666C9D5F535DE3E396D00737954BC9D243C:1:96945E28DF84, -1) A4148DCF911D @ 2022-12-25T00:27:15.931789376Z}"}
2022-12-25T00:27:16Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758289, "hash": "522D4C8CE0F5BA59F0E7B36C105A0666C9D5F535DE3E396D00737954BC9D243C"}
2022-12-25T00:27:16Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758289, "hash": "522D4C8CE0F5BA59F0E7B36C105A0666C9D5F535DE3E396D00737954BC9D243C", "root": "4451565D214CE75A4C65FB94BBF427F40902DA92D6C2322228F350FC3E5B2599", "num_txs": 0}
2022-12-25T00:27:16Z    DEBUG   Coordinator     awaiting parent {"blockID": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:16Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758289, "num_valid_txs": 0, "num_invalid_txs": 0}
2022-12-25T00:27:16Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:16Z    INFO    Tendermint      committed state {"module": "state", "height": 1758289, "num_txs": 0, "app_hash": "4451565D214CE75A4C65FB94BBF427F40902DA92D6C2322228F350FC3E5B2599"}
2022-12-25T00:27:16Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758289}
2022-12-25T00:27:17Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.681646438, "height": 1758290, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T03:01:53Z    INFO    Coordinator     trigger next milestone preemptively
2022-12-25T03:01:53Z    INFO    Coordinator     proposing parent for milestone 7296251
2022-12-25T03:01:53Z    DEBUG   Coordinator     broadcast tx    {"parent": {"Index":7296251,"BlockID":"d5f49c5365db761c153654b08816b34d625a4547722af125990030343d991eb4"}}
2022-12-25T03:01:53Z    WARN    Coordinator     failed to propose parent: invalid proposal: already applied
2022-12-25T05:43:19Z    INFO    Coordinator     trigger next milestone preemptively
2022-12-25T05:43:19Z    INFO    Coordinator     proposing parent for milestone 7296251
2022-12-25T05:43:19Z    DEBUG   Coordinator     broadcast tx    {"parent": {"Index":7296251,"BlockID":"d9cd72873d7b4e5f0b6bf480d3d6203b6775667621b0d1a0e9098c421feb219f"}}
2022-12-25T05:43:19Z    WARN    Coordinator     failed to propose parent: invalid proposal: already applied

Important to notice are the timestamps of the logging. There are 2,5h+ where nothing at all happened.

Then the message trigger next milestone preemptively appears and inx-tendercoo is doing a little bit.

There were no errors but there is one DEBUG message that seems suspicious:

2022-12-25T00:27:09Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
@shufps
Copy link
Contributor Author

shufps commented Dec 25, 2022

@shufps
Copy link
Contributor Author

shufps commented Dec 25, 2022

invalid proposal: already applied also happened here: #47

@shufps
Copy link
Contributor Author

shufps commented Dec 25, 2022

I also have pprof profiles for each COO and inx-tendercoo if needed (except the inx-tendercoo on coo-2 that hung in the node is not synced loop).

@Wollac
Copy link
Contributor

Wollac commented Dec 27, 2022

OK, the issue is the following:

  • The tip selection picked a tip with !meta.Solid || meta.ReferencedByMilestoneIndex > 0 || meta.ShouldReattach.
  • This is the definition of a block which cannot be a valid milestone parent. As such, none of tendercoos issues a Valid-Parent-Proof and the consensus gets stuck.
  • Even re-triggering the tip selection ("trigger next milestone preemptively") helps here, as a tip was already proposed before and proposing more than one tip is invalid ("invalid proposal: already applied")

One important improvement on the tendercoo side must be to also check !meta.Solid || meta.ReferencedByMilestoneIndex > 0 || meta.ShouldReattach when proposing so that no invalid parents can get proposed. In that case at least a restart or the re-triggering should fix the state. I will integrate this.

However, this only fixes the symptoms. We would need to understand why !meta.Solid || meta.ReferencedByMilestoneIndex > 0 || meta.ShouldReattach was true for a selected tip! I was strongly assumed that this can never happen. Maybe @alexsporn or @muXxer can help out here, with what exactly those flags mean and how this could have happened for the Hornets.

@muXxer
Copy link
Contributor

muXxer commented Jan 4, 2023

See:
#48 (comment)

@shufps
Copy link
Contributor Author

shufps commented Feb 2, 2023

seems to be fixed.

@shufps shufps closed this as completed Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants