Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid persisting redundant tx sets #3501

Merged
merged 3 commits into from
Aug 19, 2022

Conversation

marta-lokhova
Copy link
Contributor

Resolves #3499

@marta-lokhova
Copy link
Contributor Author

note: I realized some functionality is currently missing from the PR, so please don't review it yet

@marta-lokhova marta-lokhova force-pushed the persistent_state branch 5 times, most recently from 0a06b7c to a8fc982 Compare August 11, 2022 19:38
@marta-lokhova
Copy link
Contributor Author

This is up-to-date now. Note that protocol-next doesn't build at the moment because we need to update XDR upsteam. I'll do it after a round of review though, just in case XDR needs to change.

@@ -20,18 +21,52 @@ using namespace std;
std::string PersistentState::mapping[kLastEntry] = {
"lastclosedledger", "historyarchivestate", "lastscpdata",
"databaseschema", "networkpassphrase", "ledgerupgrades",
"rebuildledger", "lastscpdataxdr"};
"rebuildledger", "lastscpdataxdr", "txset"};

std::string PersistentState::kSQLCreateStatement =
"CREATE TABLE IF NOT EXISTS storestate ("
"statename CHARACTER(32) PRIMARY KEY,"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, it looks like sqlite (which is what we use for tests) doesn't actually enforce this limit, but it's now too small because we reference tx sets by hash. I got a failure on Postgres though, so am currently fixing this.

UPD: this is fixed now (added extra tests for Postgres as well)

@@ -57,6 +58,9 @@ class TxSetFrame : public NonMovableOrCopyable
makeFromWire(Hash const& networkID,
GeneralizedTransactionSet const& xdrTxSet);

static TxSetFrameConstPtr
getTxSetFramePtr(StoredTransactionSet const& storedSet, Application& app);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a factory method, isn't it? I would rather call it makeFromStoredTxSet for consistency with other factory methods. Also, could you please add a short doc comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, done.

{
auto txSetPtr = TxSetFrame::getTxSetFramePtr(txSet, mApp);
txSets.emplace(
std::pair(txSetPtr->getContentsHash(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there is no need to use the pair constructor when you're using emplace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, just a habit of writing things down explicitly. Updated (in the other place too)

@@ -308,6 +308,9 @@ ApplicationImpl::initialize(bool createNewDB, bool forceRebuild)
upgradeToCurrentSchemaAndMaybeRebuildLedger(true, forceRebuild);
}

// Subtle: restore persistent state after a potential DB schema upgrade
mPersistentState->restoreState();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double-check: this should be safe for restarts without upgrades, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, when a node upgrades its version to include this change, it'll do a schema upgrade, which will replace all of the old-style rows with new ones. If there is no upgrade, it means the schema is already up-to-date, which is what restoreState expects (as it queries PersistentStateV1)

}
txSetsToPersist.emplace(std::pair(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to use pair constructor with emplace

continue;
}
auto msgs = herder.getSCP().getLatestMessagesSend(i);
if (expectedSCPState.has_value())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if (expectedSCPState) is sufficient, though I suppose you could use the explicit check for expressivity, so up to you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept it as-is if you don't mind: has_value is used pretty commonly in the codebase, and I do like the explicitness of it

@marta-lokhova marta-lokhova added this to In progress in v19.4.0 via automation Aug 12, 2022
@marta-lokhova marta-lokhova force-pushed the persistent_state branch 2 times, most recently from c29ea2d to 862b15c Compare August 15, 2022 18:20
// Sqlite does not enforce size limits
if (!isSqlite())
{
PersistentState::upgradeSizeLimit(mApp.getDatabase());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you make the schema upgrade on sqlite instead of skipping? That way in the future (when we squash sql upgrades) we can use the same code to create tables (and the schema is the same everywhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALTER COLUMN doesn't exist in sqlite. Instead, you have to create a new table and copy its contents from the old table. It doesn't really make sense to do this, given that this upgrade is a no-op on sqlite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see, so a sad sqlite limitation.
So maybe update the comment to something a little more specific: "sqlite does not enforce size constraints and does not support updating columns".

Also, you should just move this to upgradeSCPDataV1Format: there is no reason to split the schema upgrade into 2 parts.

src/herder/HerderImpl.cpp Outdated Show resolved Hide resolved
xdr::xdr_from_opaque(buffer, scpState);
for (auto const& e : scpState.v1().scpEnvelopes)
{
for (auto const& hash : getTxSetHashes(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not the right place for this code: it's supposed to match logic that lives right now in HerderImpl. That way we let PersistentState be a "dumb" key/value store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to Herder.

}

void
PersistentState::startTxSetGCTimer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like having this GC timer setup independently of what Herder is doing (at a minimum this should be managed by Herder).
Do we really need to GC using a timer?
It can be done on startup and every X times we store SCP state or maybe it should be based on the number of TxSets that we have in the database (we can estimate that from herder, doest not need to be exact)?

Maybe we should just reuse the code in HerderPersistence for all this (there is quite a bit of overlap, but I understand it may not be worth it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the suggested approaches, because they both make it harder to reason about what's happening:

  • we don't know how many times SCP persistence is called per ledger, so we can't really reason about how often tx sets are cleaned up (same thing with the number of tx sets).
  • It makes testing harder as now we have to into account cases where purging triggers at different times depending on consensus.
  • I went with the timer, as it seems to be the simplest approach that avoids anything complicated where we have to map Herder's state to the database and try to estimate a good time to do the cleanup.

I suggest we keep the timer approach. I do like the idea of moving it to Herder, this way we don't need extra start/shutdown logic in PersistentState, and it'll be easy to refactor the purging function based on your other suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually you're right, I was thinking that we could get arbitrary number of values from other validators, so we should try to GC more often in some situations, but this is only tracking what the current validator is doing.

}

void
HerderImpl::restoreSCPState()
{
ZoneScoped;

// First, load all known tx sets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code and GC may be able to share code (and we want to GC before loading all those txsets)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought, but ultimately decided to keep them separate: the functions are sufficiently different, so refactoring didn't actually simplify much.

src/main/PersistentState.cpp Show resolved Hide resolved
@MonsieurNicolas
Copy link
Contributor

r+ f13ddfe

@latobarita latobarita merged commit 7c6de8b into stellar:master Aug 19, 2022
v19.4.0 automation moved this from In progress to Done Aug 19, 2022
@marta-lokhova marta-lokhova deleted the persistent_state branch August 19, 2022 17:16
Copy link

@lokera666 lokera666 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

SCP state persistence is inefficient
5 participants