Server: broken journal after hard reset #333

Closed
BitHeaven-Official opened this issue Mar 21, 2024 · 4 comments · Fixed by #336
Assignees: ohsayan
Labels: C-bug (Something isn't working), C-reliability (This issue/PR relates to reliability), C-storage (Relating to storage), D-server (Related to the server)
Milestone: 0.8.1

Comments

@BitHeaven-Official

Hard reset broke journal
When the server shuts down abruptly (power loss or hard reset), the Skytable journal is left corrupted.

Steps to reproduce
Steps to reproduce the behavior:

  1. Run skyd
  2. Disconnect the server from the power supply or just press the reset button
  3. Turn on the server
  4. The journal is corrupted and skyd refuses to start
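
For anyone who wants to see this class of failure without pulling the plug, here is a rough sketch. It assumes the journal ends up with a partially written final record after the reset, which this thread does not confirm. The record layout (length, CRC32, payload) and the helpers `append_record`, `journal_is_clean` and `simulate_hard_reset` are hypothetical and are not Skytable's journal format or API.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical record layout (NOT Skytable's real journal format):
# 4-byte big-endian payload length, 4-byte CRC32 of the payload, then the payload.
HEADER = struct.Struct(">II")


def append_record(path, payload):
    """Append one checksummed record to the toy journal."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)


def journal_is_clean(path):
    """Return True only if every byte of the file belongs to a valid record."""
    with open(path, "rb") as f:
        data = f.read()
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            return False  # torn header
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            return False  # torn or corrupted payload
        offset = start + length
    return True


def simulate_hard_reset(path, lost_bytes):
    """Drop the last `lost_bytes` bytes, as if the final write never fully reached disk."""
    size = os.path.getsize(path)
    os.truncate(path, max(0, size - lost_bytes))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.journal")
        append_record(path, b"create space demo")
        append_record(path, b"insert row 1")
        before = journal_is_clean(path)
        simulate_hard_reset(path, lost_bytes=5)
        after = journal_is_clean(path)
        return before, after
```

Calling `demo()` checks the toy journal once before and once after the simulated reset. A strict reader like `journal_is_clean` rejects the whole file as soon as the tail is incomplete, which is similar in spirit to skyd refusing to start.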

Expected behavior
I expected skyd to restart normally instead of failing with an error.

Meta

  • Release tag: v0.8.0
  • Branch: v0.8.0
  • Commit ID: 41e091cd0f6861cbaca2c6d73e023f698ec3f1a8
  • Operating system: openSUSE Tumbleweed aarch64

Additional context
Working state (before the hard reset):

Mar 21 23:44:23 secretbase skyd[5617]: ██      ██  ██   ██  ██     ██    ██   ██ ██   ██ ██      ██
Mar 21 23:44:23 secretbase skyd[5617]: ███████ █████     ████      ██    ███████ ██████  ██      █████
Mar 21 23:44:23 secretbase skyd[5617]:      ██ ██  ██     ██       ██    ██   ██ ██   ██ ██      ██
Mar 21 23:44:23 secretbase skyd[5617]: ███████ ██   ██    ██       ██    ██   ██ ██████  ███████ ███████
Mar 21 23:44:23 secretbase skyd[5617]: Skytable v0.8.0 | https://github.com/skytable/skytable
Mar 21 23:44:23 secretbase skyd[5617]: [2024-03-21T15:44:23Z WARN  skyd::engine] running in dev mode
Mar 21 23:44:23 secretbase skyd[5617]: [2024-03-21T15:44:23Z INFO  skyd::engine] starting storage engine
Mar 21 23:44:23 secretbase skyd[5617]: [2024-03-21T15:44:23Z INFO  skyd::engine::storage] initializing databases
Mar 21 23:44:23 secretbase skyd[5617]: [2024-03-21T15:44:23Z INFO  skyd::engine] storage engine ready. initializing system
Mar 21 23:44:23 secretbase skyd[5617]: [2024-03-21T15:44:23Z INFO  skyd::engine] listening on tcp@0.0.0.0:2003

After hard reset:

███████ ██   ██ ██    ██ ████████  █████  ██████  ██      ███████
██      ██  ██   ██  ██     ██    ██   ██ ██   ██ ██      ██
███████ █████     ████      ██    ███████ ██████  ██      █████
     ██ ██  ██     ██       ██    ██   ██ ██   ██ ██      ██
███████ ██   ██    ██       ██    ██   ██ ██████  ███████ ███████

Skytable v0.8.0 | https://github.com/skytable/skytable

[2024-03-21T15:55:40Z WARN  skyd::engine] running in dev mode
[2024-03-21T15:55:40Z INFO  skyd::engine] starting storage engine
[2024-03-21T15:55:40Z WARN  skyd::engine::storage] older storage format detected
[2024-03-21T15:55:40Z INFO  skyd::engine::storage] loading data
[2024-03-21T15:55:40Z ERROR skyd] storage error error: loading storage-v1 in compatibility mode; storage error: journal-corrupted
BitHeaven-Official added the C-bug and D-server labels on Mar 21, 2024
BitHeaven-Official changed the title from "Server:" to "Server: broken journal after hard reset" on Mar 21, 2024
@BitHeaven-Official
Author

The same happens in prod mode.

@ohsayan
Member

ohsayan commented Mar 21, 2024

There is no obvious solution to this error. The only thing to do is to allow explicit repair (which IMO should definitely be added) instead of the system that we currently have in place. Also, auto recovery based on severity should be provided, i.e. made configurable on the user end.
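
To make the two ideas in this comment concrete (explicit repair, plus auto recovery gated by a configurable severity), here is a minimal sketch. It is not Skytable's recovery system or the code from the fix; the record layout, the `Severity` levels, and the `scan` and `recover` helpers are all hypothetical names chosen for illustration.

```python
import enum
import os
import struct
import tempfile
import zlib

# Hypothetical record layout (NOT Skytable's): payload length, CRC32, payload.
HEADER = struct.Struct(">II")


class Severity(enum.IntEnum):
    CLEAN = 0        # every byte belongs to a valid record
    TORN_TAIL = 1    # only the final record is incomplete or fails its checksum
    MID_JOURNAL = 2  # a bad record is followed by more data


class JournalCorrupted(Exception):
    pass


def encode(payload):
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def scan(data):
    """Return (end offset of the last valid record, severity).

    Toy classification: a damaged length field can make a mid-journal
    fault look like a torn tail; a real format would need more care.
    """
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            return offset, Severity.TORN_TAIL
        length, crc = HEADER.unpack_from(data, offset)
        end = offset + HEADER.size + length
        if end > len(data):
            return offset, Severity.TORN_TAIL
        if zlib.crc32(data[offset + HEADER.size:end]) != crc:
            bad_is_last = end == len(data)
            return offset, Severity.TORN_TAIL if bad_is_last else Severity.MID_JOURNAL
        offset = end
    return offset, Severity.CLEAN


def recover(path, auto_repair_up_to=Severity.CLEAN, explicit_repair=False):
    """Truncate to the last valid record if the policy allows it, else raise."""
    with open(path, "rb") as f:
        data = f.read()
    good_end, severity = scan(data)
    if severity is Severity.CLEAN:
        return severity
    if explicit_repair or severity <= auto_repair_up_to:
        os.truncate(path, good_end)  # drop everything after the last valid record
        return severity
    raise JournalCorrupted(f"{severity.name} at byte {good_end}; explicit repair required")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.journal")
        good = encode(b"insert row 1")
        with open(path, "wb") as f:
            f.write(good + encode(b"insert row 2")[:-3])  # torn final record
        try:
            recover(path)  # default policy: never modify the file automatically
            refused = False
        except JournalCorrupted:
            refused = True
        severity = recover(path, auto_repair_up_to=Severity.TORN_TAIL)
        return refused, severity, os.path.getsize(path) == len(good)
```

In this sketch the default policy refuses to touch a damaged file, `auto_repair_up_to` is the user-configurable threshold, and `explicit_repair=True` stands in for an operator-requested repair. Repair here means discarding everything after the last valid record, so any data in the damaged tail is lost.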

ohsayan added this to the 0.8.1 milestone on Mar 21, 2024
ohsayan added the C-storage and C-reliability labels on Mar 21, 2024
ohsayan self-assigned this on Mar 24, 2024
@ohsayan
Member

ohsayan commented Mar 26, 2024

I'm done working on the recovery system. I'll add a few more tests and then we should be good to go.

@ohsayan
Member

ohsayan commented Mar 30, 2024

PR is up
