Skip to content

Conversation

@Dodecahedr0x
Copy link
Contributor

@Dodecahedr0x Dodecahedr0x commented Jul 16, 2025

Closes #418

This PR:

  • Adds a config parameter to skip ledger replay when starting a validator, discarding the old ledger but preserving accountsDB and the validator keypair.
  • Adds a test that checks that the state is still there but not the ledger

Snippets

TOML:

[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000

CLI:

cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run

Migration

Old config:

[ledger]
reset = true

Becomes:

[ledger]
resume-strategy = "reset"

Old config:

[ledger]
reset = false

Becomes:

[ledger]
resume-strategy = "replay"

@Dodecahedr0x Dodecahedr0x marked this pull request as ready for review July 18, 2025 12:53
Copy link
Contributor

@thlorenz thlorenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the requests I had add some test for toml parsing for the new config (whichever way we decide to implement it).

Let's have a chat with @GabrielePicco about introducing that enum, as that makes sure users don't try to provide an invalid config and have different results than expected, i.e some might try to replay the ledger and then reset it for whatever reason, obviously that is not valid.
With an enum it is obvious and only valid configs are allowed.

Also please always include a usage example, i.e. a toml snippet, ENV var setting + arg override. That makes it much easier to see what config was added and how.

pub type StWLock = Arc<RwLock<()>>;

pub const ACCOUNTSDB_DIR: &str = "accountsdb";
const ACCOUNTSDB_SUB_DIR: &str = "accountsdb/main";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the above const you introduced here so to concat ACCOUNTSDB_SUB_DIR we don't change accountsdb substring in one place and forget it in the tother.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f723e75 but it uses a method instead of a const to avoid creating a macro or doing more complex &str building

Slot::default()
};

if reset || skip_replay {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we skip_replay we remove the ledger here? How are we gonna be able to restart from that ledger now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only resuming just uses AccountsDB

&identity_keypair,
config.validator_config.ledger.reset,
config.validator_config.ledger.reset
|| config.validator_config.ledger.skip_replay,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we skip replay then the validator keypair can be different?
This might be intended, but it seems odd to me that we allow a different validator to continue with a ledger that was created by another validator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unintended, addressed in 1865b70. Resetting the validator keypair is part of this PR #458

ledger.ledger_path().as_path(),
config.validator_config.ledger.reset,
config.validator_config.ledger.reset
|| config.validator_config.ledger.skip_replay,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting repetetive.
Please add a method to LedgerConfig, similar to replayed such that here you could just do

!config.validator_config.ledger.replayed()

Also consider adding this as a top level method, so not the entire codebase has to know how things are configured, i.e. it might be better to just be able todo:

if !config.ledger_replayed() { .. }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 1865b70

let ledger = ledger::init(ledger_path, reset)?;
let (ledger, last_slot) = ledger::init(
ledger_path,
ledger_config.reset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting hairy, two booleans right next to each other. Easy to swap them by accident without noticing.
Do we need both or could we derive one from them?

Looking at these two bools all over the place it might be better to introduce an enum instead that is derived from the flags in the config, i.e.:

pub enum LedgerResumeStrategy {
    // implicitly replay is false here
    Reset,
    // do not reset and resume without replay
    ResumeOnly,
    // do not reset, replay and then resume
    ReplayAndResume
}

You can pass this down and properly match to cover all cases everywhere. Also you could add convenience methods to the config to just check for a specific case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 1865b70

fn read(ledger_path: &Path, keypairs: &[Keypair]) -> Child {
let (_, mut validator, ctx) =
setup_offline_validator(ledger_path, None, Some(SLOT_MS), false);
setup_offline_validator(ledger_path, None, Some(SLOT_MS), false, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto what's what here?

Some(programs),
Some(SLOT_MS),
true,
false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto what's what here?

Some(programs),
Some(SLOT_MS),
false,
false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto what's what here?

This goes for all tests that follow as well .... (not putting ditto everywhere :) )

slot: u64,
) -> Child {
let (_, mut validator, ctx) =
setup_offline_validator(ledger_path, None, None, false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no clue what setting you're using here. reset? replay?

.arg("--")
.arg(config_path)
.env("RUST_LOG_STYLE", log_suffix)
.env("RUST_LOG", "info")
Copy link
Contributor

@thlorenz thlorenz Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't override this env var here.
CI is logging on trace and devs wanna log on warn maybe.
Setting this in the code ignores the actual value of that env var.

Just run tests with that env var set i.e. RUST_LOG=warn make test.

Look at the makefile how it is defaulted, but if you set it from the command line that is the proper way to override it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a mistake, addressed in 1865b70

@Dodecahedr0x Dodecahedr0x added the P2 Medium priority label Jul 21, 2025
@Dodecahedr0x Dodecahedr0x requested a review from thlorenz July 21, 2025 16:35
Copy link
Contributor

@thlorenz thlorenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from the naming Nits.

Please improve them before merging.

pub type StWLock = Arc<RwLock<()>>;

pub const ACCOUNTSDB_DIR: &str = "accountsdb";
const ACCOUNTSDB_SUB_DIR: &str = "accountsdb/main";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using https://crates.io/crates/const_format since otherwise the method call is doing work every time.
With const_fmt you'd be able to assign to a const and use as you were before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in ad9bdc8

) -> ApiResult<(Ledger, Slot)> {
// Save the last slot from the previous ledger to restart from it
let last_slot = if skip_replay || !reset {
let last_slot = if resume_strategy.resume() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I wasn't very good providing the naming for these things, just wanted to provide quick examples, but anything returning a bool should be something with is|has|does, etc.

resume looks like an action (in this case I'd think it resumes something and returns true if successful)

is_resuming is much better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 540667b

// -----------------
fn maybe_process_ledger(&self) -> ApiResult<()> {
if self.config.ledger.reset || self.config.ledger.skip_replay {
if !self.config.ledger.resume_strategy.replay() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_replaying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 540667b


[ledger]
reset = false
resume-strategy = "replay-and-resume"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I wasn't very good with suggestions for the names. replay kinda implies resume right?

So I'd think the enum values should be: reset | replay | resume-only.

resume-only could also be just resume - up to you. The only part just makes it double clear that we won't replay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 540667b

@Dodecahedr0x Dodecahedr0x removed the P2 Medium priority label Jul 24, 2025
@Dodecahedr0x Dodecahedr0x merged commit 1e5d6a4 into dev Aug 1, 2025
4 of 6 checks passed
@Dodecahedr0x Dodecahedr0x deleted the feat/skip-replay branch August 1, 2025 16:05
Dodecahedr0x added a commit that referenced this pull request Aug 8, 2025
Solves #418

This PR:
- Adds a config parameter to skip ledger replay when starting a
validator, discarding the old ledger but preserving accountsDB and the
validator keypair.
- Adds a test that checks that the state is still there but not the
ledger

TOML:

```toml
[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000
```

CLI:

```bash
cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run
```

Old config:

```toml
[ledger]
reset = true
```

Becomes:

```toml
[ledger]
resume-strategy = "reset"
```

---

Old config:

```toml
[ledger]
reset = false
```

Becomes:

```toml
[ledger]
resume-strategy = "replay"
```
Dodecahedr0x added a commit that referenced this pull request Aug 8, 2025
Solves #418

This PR:
- Adds a config parameter to skip ledger replay when starting a
validator, discarding the old ledger but preserving accountsDB and the
validator keypair.
- Adds a test that checks that the state is still there but not the
ledger

TOML:

```toml
[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000
```

CLI:

```bash
cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run
```

Old config:

```toml
[ledger]
reset = true
```

Becomes:

```toml
[ledger]
resume-strategy = "reset"
```

---

Old config:

```toml
[ledger]
reset = false
```

Becomes:

```toml
[ledger]
resume-strategy = "replay"
```
Dodecahedr0x added a commit that referenced this pull request Aug 8, 2025
Solves #418

This PR:
- Adds a config parameter to skip ledger replay when starting a
validator, discarding the old ledger but preserving accountsDB and the
validator keypair.
- Adds a test that checks that the state is still there but not the
ledger

TOML:

```toml
[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000
```

CLI:

```bash
cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run
```

Old config:

```toml
[ledger]
reset = true
```

Becomes:

```toml
[ledger]
resume-strategy = "reset"
```

---

Old config:

```toml
[ledger]
reset = false
```

Becomes:

```toml
[ledger]
resume-strategy = "replay"
```
Dodecahedr0x added a commit that referenced this pull request Aug 8, 2025
Solves #418

This PR:
- Adds a config parameter to skip ledger replay when starting a
validator, discarding the old ledger but preserving accountsDB and the
validator keypair.
- Adds a test that checks that the state is still there but not the
ledger

TOML:

```toml
[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000
```

CLI:

```bash
cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run
```

Old config:

```toml
[ledger]
reset = true
```

Becomes:

```toml
[ledger]
resume-strategy = "reset"
```

---

Old config:

```toml
[ledger]
reset = false
```

Becomes:

```toml
[ledger]
resume-strategy = "replay"
```
Dodecahedr0x added a commit that referenced this pull request Aug 8, 2025
Solves #418

This PR:
- Adds a config parameter to skip ledger replay when starting a
validator, discarding the old ledger but preserving accountsDB and the
validator keypair.
- Adds a test that checks that the state is still there but not the
ledger

TOML:

```toml
[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000
```

CLI:

```bash
cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run
```

Old config:

```toml
[ledger]
reset = true
```

Becomes:

```toml
[ledger]
resume-strategy = "reset"
```

---

Old config:

```toml
[ledger]
reset = false
```

Becomes:

```toml
[ledger]
resume-strategy = "replay"
```
Dodecahedr0x added a commit that referenced this pull request Aug 8, 2025
Solves #418

This PR:
- Adds a config parameter to skip ledger replay when starting a
validator, discarding the old ledger but preserving accountsDB and the
validator keypair.
- Adds a test that checks that the state is still there but not the
ledger

TOML:

```toml
[ledger]
resume-strategy = "replay-and-resume"
path = "ledger.example.com"
size = 1000000000
```

CLI:

```bash
cargo run -- --ledger-resume-strategy replay-and-resume
LEDGER_RESUME_STRATEGY=replay-and-resume cargo run
```

Old config:

```toml
[ledger]
reset = true
```

Becomes:

```toml
[ledger]
resume-strategy = "reset"
```

---

Old config:

```toml
[ledger]
reset = false
```

Becomes:

```toml
[ledger]
resume-strategy = "replay"
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add option to skip ledger replay

5 participants