Upsert optimization #325
Conversation
Codecov Report
@@ Coverage Diff @@
## main #325 +/- ##
===========================================
+ Coverage 99.98% 100.00% +0.01%
===========================================
Files 232 232
Lines 12574 12631 +57
===========================================
+ Hits 12572 12631 +59
+ Misses 1 0 -1
+ Partials 1 0 -1
Continue to review full report at Codecov.
Here are some comparison results for context. All benchmarks were run on my MBP with a CLI-generated 2-node FireFly environment, using PostgreSQL as the DB. Before #325, we were consistently seeing > 1 minute of latency between the last sent message and 0 pending transactions. With #325, batch processing time was consistently reduced to <= 1 second, down from ~4. 🚀 🚀
// This is a performance critical function, as we stream data into the database for every message, in every batch.
//
// First attempt the operation based on the optimization passed in.
// The expectation is that this will practically hit of the time, as only recovery paths
nit: this sentence feels like it's missing a word (same with the other places it was copied)
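For readers skimming the diff, the pattern that comment block describes looks roughly like the sketch below. The helpers (`attemptInsert`, `attemptUpdate`, `upsertUnoptimized`) are hypothetical stand-ins, not the actual FireFly implementation:

```go
package example

import (
	"context"
	"errors"
)

// UpsertOptimization mirrors the hint added in this PR (the three constant
// names come from the PR description; everything else here is illustrative).
type UpsertOptimization int

const (
	UpsertOptimizationSkip     UpsertOptimization = iota // no optimization - same as today
	UpsertOptimizationNew                                // caller expects the row to be new
	UpsertOptimizationExisting                           // caller expects the row to exist
)

var errMiss = errors.New("optimization miss")

// Hypothetical stand-ins for the real SQL calls.
func attemptInsert(ctx context.Context) error          { return errMiss }
func attemptUpdate(ctx context.Context) (int64, error) { return 0, nil }
func upsertUnoptimized(ctx context.Context) error      { return nil }

// upsertWithOptimization first attempts the operation the caller hinted at,
// and only falls back to the slower SELECT-based path when that misses -
// the "recovery path" the code comment refers to.
func upsertWithOptimization(ctx context.Context, opt UpsertOptimization) error {
	switch opt {
	case UpsertOptimizationNew:
		if err := attemptInsert(ctx); err == nil {
			return nil // common case: a single INSERT, no SELECT needed
		}
	case UpsertOptimizationExisting:
		if rows, err := attemptUpdate(ctx); err == nil && rows == 1 {
			return nil // common case: a single UPDATE, keyed on the hash
		}
	}
	// Recovery path: SELECT to determine existence, then INSERT or UPDATE.
	return upsertUnoptimized(ctx)
}
```

The key point is that the common case costs a single statement, and the SELECT-based path only runs on a miss.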
func (s *SQLCommon) updateMessageDataRefs(ctx context.Context, tx *txWrapper, message *fftypes.Message, recreateDatarefs bool) error {
	// ...
	if existing {
		if recreateDatarefs {
The comment a few lines down indicates we "Run through the ones in the message, finding ones that already exist". I don't think that's accurate anymore (we always insert all the refs if we get here, and might delete them all beforehand... but it's never a partial update).
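In other words, the behaviour on this path is all-or-nothing, roughly like this sketch (illustrative types and helpers, not the real `fftypes`/SQL code):

```go
package example

import "context"

// Minimal illustrative types - not FireFly's real fftypes.
type DataRef struct{ ID string }
type Message struct {
	ID   string
	Data []DataRef
}

// Hypothetical stand-ins for the real SQL statements.
func deleteAllDataRefs(ctx context.Context, msgID string) error          { return nil }
func insertDataRef(ctx context.Context, msgID string, ref DataRef) error { return nil }

// updateMessageDataRefsSketch shows the all-or-nothing behaviour discussed
// above: on this path we may first DELETE every existing ref, and then
// unconditionally INSERT every ref carried by the message - there is no
// partial diff-and-update of individual rows.
func updateMessageDataRefsSketch(ctx context.Context, msg *Message, recreateDatarefs bool) error {
	if recreateDatarefs {
		if err := deleteAllDataRefs(ctx, msg.ID); err != nil {
			return err
		}
	}
	for _, ref := range msg.Data {
		if err := insertDataRef(ctx, msg.ID, ref); err != nil {
			return err
		}
	}
	return nil
}
```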
awrichar left a comment:
Looks really good. Couldn't see any issues with the functionality. Left a few small notes on wording.
Also note that db migrations 44-45 were claimed by #317, so this will need to be updated to use 46.
One of the earliest findings in #316 from @shorsher is that the speed at which messages can be injected into FireFly using a parallel loading test runner is significantly faster than the speed at which batches can be processed.
This is not surprising, as one process is currently parallel and the other sequential. That's something we can look at parallelizing on the back-end later.
However, the most immediately actionable finding from log analysis was just how many DB operations we were doing.
On a simple laptop test, each PostgreSQL request (within a single TX) took 3-5ms, so that adds up fast!
We found with the first example analyzed, an `Upsert` meaning:

- `SELECT` message, `UPDATE` message, `DELETE` datarefs, `INSERT` datarefs
- `SELECT` data, `UPDATE` data

At 3-5ms each, those six statements alone add up to roughly 20-30ms per message. So the first improvement in the hardening phase is to reduce these calls, by optimizing for the case we know we're in.
These optimizations are implemented in SQL, but in a way that a similar optimization should be possible in NoSQL DBs once they are implemented.
The changes:

- Adds a `node` hint to batches, to know which node sent them
- Changes the `message` and `data` signatures on the DB layer to take in an `UpsertOptimization` option:
  - `UpsertOptimizationSkip` - no optimization - same as today
  - `UpsertOptimizationNew` - optimize for `INSERT`
  - `UpsertOptimizationExisting` - optimize for `UPDATE`
- Removes `allowExisting` - logic should always allow for this... given `allowHashUpdate` is now always false - the hash of `message` and `data` cannot change on any code path
- On `UPDATE`, uses the `hash` as a key when updating, and checks a row is updated via `rowsUpdated` (well adopted we believe)
- For a `message` the `datarefs` are fixed, so on the optimized path we don't touch them on update; when they do need rewriting, they are re-created in full with `INSERT`

A minimal sketch of the hash-keyed `UPDATE` is shown below.
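This sketch uses `database/sql` directly; the real code goes through the SQLCommon layer, and the `messages` table and column names here are assumptions, not FireFly's actual schema:

```go
package example

import (
	"context"
	"database/sql"
	"fmt"
)

// updateByHash sketches the "hash as a key" idea: the UPDATE's WHERE clause
// includes the expected hash, so it only matches when the stored row carries
// exactly the content we think it does, and RowsAffected (the "rowsUpdated"
// check) tells us whether the optimistic UPDATE actually hit a row.
// Table and column names are illustrative, not FireFly's real schema.
func updateByHash(ctx context.Context, db *sql.DB, id, hash string, confirmed int64) (bool, error) {
	res, err := db.ExecContext(ctx,
		`UPDATE messages SET confirmed = $1 WHERE id = $2 AND hash = $3`,
		confirmed, id, hash)
	if err != nil {
		return false, err
	}
	rows, err := res.RowsAffected() // supported by all common drivers
	if err != nil {
		return false, fmt.Errorf("driver could not report rows updated: %w", err)
	}
	// rows == 0 means the optimization missed; the caller falls back to the
	// unoptimized SELECT-then-INSERT/UPDATE recovery path.
	return rows == 1, nil
}
```

Because the hash can never change on any code path, a matched row is guaranteed to already hold the right content, which is what makes it safe to skip the up-front SELECT.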