Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle restarting with validation automatically #1551

Closed
mrBliss opened this issue Jan 31, 2020 · 5 comments · Fixed by #1623
Closed

Handle restarting with validation automatically #1551

mrBliss opened this issue Jan 31, 2020 · 5 comments · Fixed by #1623
Assignees
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node chain db consensus issues related to ouroboros-consensus immutable db
Milestone

Comments

@mrBliss
Copy link
Contributor

mrBliss commented Jan 31, 2020

Related to #1200 and #1201. When the node is being run by the wallet, it uses error codes to communicate information about the fatal error to the wallet (see #1541). For example, in case a file from the database is missing, the wallet should restart the node with the validation flag enabled.

In a server setting, where the node process is being run by systemd or runit, the node process will automatically be restarted when it terminated. There will be no human reading the error message or checking the error code to enable the validation flag when necessary.

We should be able to automatically validate when necessary.

Plan: add a marker file to the database on a clean shutdown (TODO define "clean"). On startup, if the marker is there, then we don't have to do any additional validation (just the last epoch, as we always do by default). If the marker is missing, we do a full validation.

@mrBliss mrBliss added consensus issues related to ouroboros-consensus chain db immutable db labels Jan 31, 2020
@mrBliss mrBliss added this to the S6 2020-02-13 milestone Jan 31, 2020
@mrBliss mrBliss self-assigned this Jan 31, 2020
@edsko
Copy link
Contributor

edsko commented Jan 31, 2020

  • Create marker on startup
  • Delete marker on normal shutdown (as the last thing the Chain DB does)
  • If marker present on startup, clean shutdown did not happen, and hence we should validate.

@mrBliss
Copy link
Contributor Author

mrBliss commented Jan 31, 2020

@mrBliss
Copy link
Contributor Author

mrBliss commented Jan 31, 2020

* Create marker on startup

* Delete marker on normal shutdown (as the last thing the Chain DB does)

* If marker present on startup, clean shutdown did not happen, and hence we should validate.

Edsko and I discussed two possible interpretations of the marker and found them to be (almost?) isomorphic. However, the approach described in the description has the following advantage: the user can't accidentally delete the marker.

@edsko
Copy link
Contributor

edsko commented Jan 31, 2020

Note that this means we only need two exit codes: restart, or "don't bother restarting" (and automated tools can just treat both as "restart").

@edsko
Copy link
Contributor

edsko commented Jan 31, 2020

("Don't bother restarting" might still be useful to distinguish some subcases, to give better feedback to Daedalus.)

iohk-bors bot added a commit that referenced this issue Feb 7, 2020
1554: Detect clock changes r=edsko a=edsko

Closes #759.

Code is tested using mock time; result of the test labelling:

```
# cabal run test-consensus -- -p delayClockShift --quickcheck-replay=680184 --quickcheck-tests 10000
Up to date
ouroboros-consensus
  WallClock
    delayClockShift: OK (18.51s)
      +++ OK, passed 10000 tests.
      
      schedule goes back (10000 in total):
      63.07% False
      36.93% True
      
      schedule length (10000 in total):
      81.03% R_Gt 20
       9.26% R_Btwn (10,20)
       4.08% R_Btwn (5,10)
       1.04% R_Btwn (4,5)
       1.03% R_Eq 1
       0.96% R_Eq 2
       0.94% R_Eq 4
       0.87% R_Eq 0
       0.79% R_Eq 3
      
      schedule skips (10000 in total):
      38.66% R_Btwn (10,20)
      24.96% R_Btwn (5,10)
       5.59% R_Eq 0
       5.43% R_Eq 2
       5.27% R_Btwn (4,5)
       5.12% R_Eq 1
       5.08% R_Eq 3
       4.99% R_Eq 4
       4.90% R_Gt 20

All 1 tests passed (18.52s)
```

I also verified that the exception bubbles up to the node. Ran the latest node with this PR, set my clock back an hour, and got

```
cardano-node: ExceptionInLinkedThread "ThreadId 20" (SystemClockMovedBack 2020-01-31 11:15:31.00184141 UTC (SlotNo {unSlotNo = 3713312}) (SlotNo {unSlotNo = 3713132}))
```

There is no need to define a custom error policy for this due to #1553 , nor a custom exit failure due to #1551 (comment) .

Co-authored-by: Edsko de Vries <edsko@well-typed.com>
@mrBliss mrBliss added the byron Required for a Byron mainnet: replace the old core nodes with cardano-node label Feb 11, 2020
iohk-bors bot added a commit that referenced this issue Feb 12, 2020
1623: Validate the ChainDB when the clean shutdown marker is missing r=mrBliss a=mrBliss

Fixes #1551.

Co-authored-by: Thomas Winant <thomas@well-typed.com>
iohk-bors bot added a commit that referenced this issue Feb 12, 2020
1623: Validate the ChainDB when the clean shutdown marker is missing r=mrBliss a=mrBliss

Fixes #1551.

Co-authored-by: Thomas Winant <thomas@well-typed.com>
iohk-bors bot added a commit that referenced this issue Feb 12, 2020
1623: Validate the ChainDB when the clean shutdown marker is missing r=mrBliss a=mrBliss

Fixes #1551.

Co-authored-by: Thomas Winant <thomas@well-typed.com>
iohk-bors bot added a commit that referenced this issue Feb 12, 2020
1623: Validate the ChainDB when the clean shutdown marker is missing r=edsko a=mrBliss

Fixes #1551.

Co-authored-by: Thomas Winant <thomas@well-typed.com>
@iohk-bors iohk-bors bot closed this as completed in d2cb974 Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node chain db consensus issues related to ouroboros-consensus immutable db
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants