Idea: could we not block chunk apply pipeline on committing the chunk application results to storage? #11118
Q: is this compatible with memtrie?
@nagisa when you say "commit changes", what exactly do you refer to? There are two levels of commitment here. Conceptually, after each chunk is applied, a new state root is computed as a result of the chunk application, and the chunk producer commits to that new state root. This is needed for stateless validation, so that validators can validate the chunk against the new state root using the state witness. However, there is also a commit to the database at the implementation level, and that one indeed does not need to be blocking. If I understand correctly, you referred to
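To make the distinction between the two levels concrete, here is a minimal sketch (all names hypothetical; nearcore's real state root is a trie Merkle root, for which a plain hash over sorted key/value pairs stands in here). The point is that the conceptual commitment, computing the state root from the in-memory changes, does not require the database write to have happened:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for the state root computation: the real one hashes
// trie nodes, but either way it only needs the accumulated in-memory changes.
fn compute_state_root(changes: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (k, v) in changes {
        k.hash(&mut hasher);
        v.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let mut changes = BTreeMap::new();
    changes.insert("balance/alice".to_string(), "100".to_string());

    // Level 1: the chunk producer commits to this root immediately; stateless
    // validation needs it right away.
    let state_root = compute_state_root(&changes);

    // Level 2: the database write can happen later, off the critical path,
    // without affecting the already-announced state root.
    let root_again = compute_state_root(&changes);
    assert_eq!(state_root, root_again);
    println!("state root: {state_root:x}");
}
```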
The functions called
Yeah, I believe we compute a new state root as a prerequisite step before a transaction is committed to the store (trie node hashes need to be known to write them out...). It wouldn't be particularly terrible to keep the computation of the new state root hash in the main pipeline, but that raises another thought: can we move to an overall more asynchronous design? As in, we currently have N steps that we execute in sequence, but instead we could have conceptual checkpoints that the computation passes through.

That sort of restructuring would allow stateless validation (and other components too) to wait on a specific checkpoint of the computation without entrenching itself in the implementation details of the transaction runtime. And in enabling such separation of concerns, we'd have fewer constraints on how we implement the transaction runtime.
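A minimal sketch of what such checkpoint-based decoupling could look like, with hypothetical checkpoint names (the real pipeline stages would differ): the pipeline announces checkpoints as it reaches them, and a consumer such as stateless validation waits only for the checkpoint it needs, ignorant of how later stages are implemented.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical checkpoints of chunk application; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Checkpoint {
    TransactionsExecuted,
    StateRootComputed,
    ChangesCommitted,
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // The chunk-application pipeline announces each checkpoint as it passes it.
    let producer = thread::spawn(move || {
        for cp in [
            Checkpoint::TransactionsExecuted,
            Checkpoint::StateRootComputed,
            Checkpoint::ChangesCommitted,
        ] {
            // ...real work for this stage would happen here...
            tx.send(cp).unwrap();
        }
    });

    // A consumer (e.g. stateless validation) only waits for the checkpoint it
    // cares about; the storage commit can still be in flight at this point.
    for cp in &rx {
        if cp == Checkpoint::StateRootComputed {
            println!("state root ready, validation can proceed");
            break;
        }
    }
    producer.join().unwrap();
}
```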
cc #11808
On testnet over a 1hr30min run, I could see a single `ChainStoreUpdate::commit` span that would take more than 20ms.

My guess is that the high values we could see here previously were due to lock contention with reads, which disappeared with recent updates. However, this is AFAICT on a node with memtrie disabled:

```
"load_mem_tries_for_shards": [],
"load_mem_tries_for_tracked_shards": false,
```

This last point is a bit weird to me, as memtrie would be *the* thing I'd have expected to reduce the read load.

@nagisa Do you remember where/how you found the 200ms+ commit you mentioned in the first post?

On my end I'll likely try to reproduce the experiment on mainnet tomorrow, now that I have a simple process to get that number out, and consider the next steps then.

It was routine on mainnet, specifically on shards 2 and 3 during higher load (i.e. when chunks were stuffed full with receipts). The commit time was directly proportional to the amount of changes to storage.
The results of the 1.5hrs experiment on mainnet with memtrie disabled are roughly:
Soon to come: the same experiment with memtrie enabled, but I'll have to spin up a VM with lots of memory first; an n2d-standard-8 is not enough to load memtrie.
After ~4.5hrs of mainnet with memtrie enabled, I got the following results (it took longer to get noticeably high results; for some reason rerunning the Grafana query returned nothing new for a while, but after 4hrs older results appeared 🤷):
So it seems this project is even more important now that memtrie has landed than before. Probably because there was no lock contention between reads and writes to begin with, and it's just that writes take a long time: now that reads are faster, the write time is more noticeable.

Disclaimer: this is based on the slowest-reported traces of the two functions, so it's not necessarily going to be a 25% overall perf improvement. But the p99 should improve by ~25%, which could hopefully allow us to lower gas costs.

With this in mind, I'll now focus on actually putting the commit on a background thread:
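The background-thread approach described here can be sketched as follows. This is a hypothetical outline, not nearcore's code: `commit_to_store` stands in for the RocksDB write, and the key constraint from the discussion is preserved, namely that we must join the previous commit before relying on it for the next block.

```rust
use std::thread;
use std::time::Duration;

// Stand-in for the database write; this is where the 20ms+ spans were observed.
fn commit_to_store(changes: Vec<(String, String)>) {
    thread::sleep(Duration::from_millis(5));
    let _ = changes;
}

fn main() {
    let mut pending_commit: Option<thread::JoinHandle<()>> = None;

    for block_height in 0..3 {
        // Before depending on the previous block's persisted state, wait for
        // its commit if it has not finished yet.
        if let Some(handle) = pending_commit.take() {
            handle.join().unwrap();
        }

        // Apply the block (stand-in), producing the changes to persist.
        let changes = vec![(format!("block/{block_height}"), "applied".to_string())];

        // Kick off the commit in the background and move on immediately,
        // removing it from the critical execution path.
        pending_commit = Some(thread::spawn(move || commit_to_store(changes)));
    }

    // Drain the last commit before shutdown.
    if let Some(handle) = pending_commit.take() {
        handle.join().unwrap();
    }
    println!("all blocks committed");
}
```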
Today the transaction runtime is roughly a pipeline that executes its tasks in a certain sequence. Among other things, when applying a chunk it will first execute the transactions and receipts, and then commit all the state changes to the storage before moving on to the next chunk.

In practice, however, we don't necessarily need to block processing and handling of the next block on the state commits for the previous block. Technically we could keep a double-buffering-like scheme for `TrieUpdate`s (`TrieUpdate`s accumulate the state changes in a transaction), where the storage accesses for the current chunk go through a `TrieUpdate (current_block) => TrieUpdate (previous_block) => Store` stack rather than `TrieUpdate (current_block) => Store` as they do currently.

That gives us an opportunity to commit `TrieUpdate (previous_block)` into the `Store` in a background thread while the transactions for the `current_block` are being executed. The storage reads still go through `TrieUpdate (previous_block)`, ensuring a consistent view of the storage. Every time a block's processing begins, we rebuild the storage stack to add a new `TrieUpdate (next_block)` on top of `TrieUpdate (current_block)` and remove the now-offthread-committed `TrieUpdate (previous_block)` (we'd have to block here if the commit is not over yet).

Doing this may make individual reads and writes slightly more expensive, but it will remove a sometimes-200+ms commit from the critical execution path.
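The double-buffering scheme above can be sketched with heavily simplified stand-ins for `Store` and `TrieUpdate` (the real types in `near-store` are far richer; everything here is illustrative). Reads fall through the stack `current => previous => store`, so the current block sees the previous block's changes even while those changes are being committed on a background thread:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Minimal stand-in for the persistent store.
#[derive(Default)]
struct Store {
    data: Mutex<HashMap<String, String>>,
}

impl Store {
    fn get(&self, key: &str) -> Option<String> {
        self.data.lock().unwrap().get(key).cloned()
    }
    fn commit(&self, changes: HashMap<String, String>) {
        self.data.lock().unwrap().extend(changes);
    }
}

// A `TrieUpdate` accumulates state changes on top of a parent view,
// which is either another `TrieUpdate` or the store itself.
struct TrieUpdate {
    parent: Parent,
    changes: HashMap<String, String>,
}

enum Parent {
    Store(Arc<Store>),
    Update(Box<TrieUpdate>),
}

impl TrieUpdate {
    fn on_store(store: Arc<Store>) -> Self {
        TrieUpdate { parent: Parent::Store(store), changes: HashMap::new() }
    }
    fn on_update(parent: TrieUpdate) -> Self {
        TrieUpdate { parent: Parent::Update(Box::new(parent)), changes: HashMap::new() }
    }
    // Reads fall through the stack: current -> previous -> store.
    fn get(&self, key: &str) -> Option<String> {
        if let Some(v) = self.changes.get(key) {
            return Some(v.clone());
        }
        match &self.parent {
            Parent::Store(s) => s.get(key),
            Parent::Update(u) => u.get(key),
        }
    }
    fn set(&mut self, key: &str, value: &str) {
        self.changes.insert(key.to_string(), value.to_string());
    }
}

fn main() {
    let store = Arc::new(Store::default());

    // The previous block was applied on a TrieUpdate stacked directly on the store.
    let mut prev = TrieUpdate::on_store(store.clone());
    prev.set("balance/alice", "100");

    // The current block stacks a fresh TrieUpdate on top, so its reads see
    // the previous block's not-yet-committed changes...
    let mut current = TrieUpdate::on_update(prev);
    assert_eq!(current.get("balance/alice").as_deref(), Some("100"));
    current.set("balance/alice", "90");

    // ...while the previous block's changes are committed off-thread.
    // Rebuild the stack: detach the previous update and hand its changes off.
    let TrieUpdate { parent, changes: current_changes } = current;
    let prev_changes = match parent {
        Parent::Update(u) => u.changes,
        Parent::Store(_) => HashMap::new(),
    };
    let store2 = store.clone();
    let handle = thread::spawn(move || store2.commit(prev_changes));

    // Before the next rebuild we must block here if the commit is not over yet.
    handle.join().unwrap();
    assert_eq!(store.get("balance/alice").as_deref(), Some("100"));

    // The current block's changes get committed the same way on the next round.
    store.commit(current_changes);
    assert_eq!(store.get("balance/alice").as_deref(), Some("90"));
    println!("ok");
}
```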