feat: auto conflict resolution for upsert #3785
wjones127 wants to merge 14 commits into lance-format:main
rust/lance/src/io/commit.rs
Outdated
    }

    // Return true
    fn check_transaction(
Maybe instead of having a check_transaction, it now makes more sense for this to be a rebase_transaction? That way, instead of simply checking whether two transactions conflict, a transaction can be updated based on the information in other_transaction to remove the conflict.
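To make the suggestion concrete, here is a minimal sketch of the two API shapes. The `Txn` struct, its fields, and both function bodies are hypothetical stand-ins invented for illustration; they are not the signatures in rust/lance/src/io/commit.rs.

```rust
/// Hypothetical, simplified transaction: which version it read and
/// which fragments it modified.
#[derive(Debug, Clone, PartialEq)]
struct Txn {
    read_version: u64,
    modified_fragments: Vec<u32>,
}

/// Check-only shape: returns true when the two transactions are compatible.
fn check_transaction(txn: &Txn, other: &Txn) -> bool {
    !txn
        .modified_fragments
        .iter()
        .any(|f| other.modified_fragments.contains(f))
}

/// Rebase shape: returns an updated transaction that can be applied on top
/// of `other`, or an error when the conflict cannot be removed.
fn rebase_transaction(mut txn: Txn, other: &Txn, other_version: u64) -> Result<Txn, String> {
    if let Some(f) = txn
        .modified_fragments
        .iter()
        .find(|f| other.modified_fragments.contains(*f))
    {
        return Err(format!("conflict on fragment {}", f));
    }
    // No overlap: move the transaction forward so it applies after `other`.
    txn.read_version = other_version;
    Ok(txn)
}
```

The rebase shape gives the retry loop something to commit on the next attempt, whereas the boolean check can only say whether to give up.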
Force-pushed from 6f93635 to f6e2433.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3785      +/-   ##
==========================================
- Coverage   78.62%   78.60%   -0.02%
==========================================
  Files         274      275       +1
  Lines      104635   105772    +1137
  Branches   104635   105772    +1137
==========================================
+ Hits        82272    83146     +874
- Misses      19114    19360     +246
- Partials     3249     3266      +17
    // To benchmark scaling curve: measure how long to run
    //
    // And vary `concurrency` to see how it scales. Compare this against `main`.
The main benchmark I'd recommend is just running this unit test with the simulated object store latency. Run it with concurrency values of 2, 4, 8, 16, and 32 on this branch and on main. The goal is for this branch to be as fast as or faster than main. (Right now, it is slower.)
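A rough sketch of what such a scaling measurement could look like, using only the standard library. `SlowStore`, `try_commit`, and `run_once` are hypothetical names made up for this example; the real benchmark would run the existing unit test against Lance's simulated-latency object store rather than this toy.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Toy stand-in for an object store: every request costs a fixed latency.
struct SlowStore {
    latency: Duration,
    version: Mutex<u64>,
}

impl SlowStore {
    /// Commit succeeds only if `expected` is still the latest version.
    /// On failure, the current latest version is returned.
    fn try_commit(&self, expected: u64) -> Result<u64, u64> {
        thread::sleep(self.latency); // simulated request latency
        let mut v = self.version.lock().unwrap();
        if *v == expected {
            *v += 1;
            Ok(*v)
        } else {
            Err(*v)
        }
    }
}

/// Runs `concurrency` writers that each retry until one commit lands.
/// Returns the wall-clock time and the final version.
fn run_once(concurrency: usize, latency: Duration) -> (Duration, u64) {
    let store = Arc::new(SlowStore { latency, version: Mutex::new(0) });
    let start = Instant::now();
    let handles: Vec<_> = (0..concurrency)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                let mut expected = 0;
                loop {
                    match store.try_commit(expected) {
                        Ok(_) => break,
                        Err(latest) => expected = latest,
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let final_version = *store.version.lock().unwrap();
    (start.elapsed(), final_version)
}
```

Calling `run_once` for each concurrency value on both branches and comparing the elapsed times gives the scaling curve the comment asks for.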
    let mut transaction = transaction.clone();

    let num_attempts = std::cmp::max(commit_config.num_retries, 1);
    // TODO: use SlotBackoff here instead and size unit off of attempt time.
This is likely going to be more important with the rebases in here now. Look at the retries in merge_insert.rs for how we use SlotBackoff.
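For readers unfamiliar with the idea, here is a small illustration of slot-based backoff. It is not Lance's SlotBackoff: the `SlottedBackoff` type, its fields, the doubling schedule, and the xorshift generator are all assumptions chosen to keep the sketch free of dependencies. The unit would ideally be sized from how long one commit attempt takes, as the TODO suggests.

```rust
use std::time::Duration;

/// Hypothetical slot-based backoff: each retry picks a random slot out of a
/// window that doubles per attempt, and waits `slot * unit`.
struct SlottedBackoff {
    unit_ms: u64,
    attempt: u32,
    max_exponent: u32,
    rng_state: u64,
}

impl SlottedBackoff {
    fn new(unit_ms: u64, seed: u64) -> Self {
        // `seed | 1` keeps the xorshift state non-zero.
        Self { unit_ms, attempt: 0, max_exponent: 6, rng_state: seed | 1 }
    }

    /// Small xorshift64 generator, used here only to avoid a dependency.
    fn next_random(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Returns the delay before the next retry.
    fn next_delay(&mut self) -> Duration {
        let exponent = self.attempt.min(self.max_exponent);
        let num_slots = 1u64 << (exponent + 1); // 2, 4, 8, ... capped at 128
        let slot = self.next_random() % num_slots;
        self.attempt += 1;
        Duration::from_millis(slot * self.unit_ms)
    }
}
```

Because concurrent writers land in different slots, they tend to retry one after another instead of colliding again, which matters more once each retry also performs a rebase.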
    // Assert io requests
    let io_stats = io_tracker.incremental_stats();
    assert_eq!(io_stats.read_iops, 0);
To accomplish this, we'll need to move the call to checkout_latest() to the end of the retry loop. This makes the first commit attempt blind, but I think that's desirable as it makes the case of sequential writes faster, without making concurrent writes much slower.
I tried that and it worked, but I ended up reverting the change. With a blind first attempt, the commit passes conflict resolution, writes the transaction file and the manifest file, and only then discovers that the version conflicts. That makes the retry path slower by two extra writes. I feel it is not worth the effort, but I might have missed a good way to achieve both.
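The trade-off discussed in this thread can be illustrated with a toy model that counts I/O requests. Everything here is hypothetical: the `Store` struct, its methods, and `commit_blind_first` are made up for the example, and conflict resolution itself is left out.

```rust
/// In-memory stand-in for the object store that counts I/O requests.
struct Store {
    latest: u64,
    reads: u32,
    writes: u32,
}

impl Store {
    /// Costs one read; returns the latest committed version.
    fn checkout_latest(&mut self) -> u64 {
        self.reads += 1;
        self.latest
    }

    /// Costs one write; always succeeds in this toy model.
    fn write_txn_file(&mut self) {
        self.writes += 1;
    }

    /// Costs one write; succeeds only for the next version (put-if-not-exists).
    fn write_manifest(&mut self, version: u64) -> bool {
        self.writes += 1;
        if version == self.latest + 1 {
            self.latest = version;
            true
        } else {
            false
        }
    }
}

/// Retry loop where the first attempt is blind: the latest version is only
/// checked out after an attempt has failed.
fn commit_blind_first(store: &mut Store, known_version: u64, num_attempts: u32) -> Option<u64> {
    let mut base = known_version;
    for _ in 0..num_attempts {
        store.write_txn_file();
        if store.write_manifest(base + 1) {
            return Some(base + 1);
        }
        base = store.checkout_latest();
    }
    None
}
```

In this model a sequential writer commits with zero reads and two writes, while a writer with a stale version spends two wasted writes before its first read, which is the cost described in the reply above.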
    // If there is a conflict with two transactions, the retry should require these io requests:
    // * 1 list version
    // * num_other_txns read manifests (cache-able)
    // * num_other_txns read txn files (cache-able)
    // * 1 write txn file
    // * 1 write manifest
    // For a total of 3 + 2 * num_other_txns io requests. If we have caching enabled, we can skip 2 * num_other_txns
    // of those. We should be able to read in 5 hops.
This might seem like a bit of a tall order, but I think it's possible.
For the uncached case, we'd need to optimize this part:
https://github.com/lancedb/lance/blob/c39b7e7a271eb81078e0a404361a54082289b94e/rust/lance/src/io/commit.rs#L96-L108
Ideally we can re-use the manifest size found in the list versions, and then read the manifest in 1 request.
I was able to achieve the goal in the cached case. In the uncached case, it looks like we repeatedly make some listing and reading calls. I have not fixed that yet and will create an issue to track it.
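The I/O budget from the code comment in this thread can be written down as a small helper, which makes the cached and uncached targets easy to compare. The function name `retry_io_requests` is hypothetical; the arithmetic is taken directly from the comment.

```rust
/// Expected I/O requests for one retry, following the budget in the comment:
/// 1 list + 1 write txn file + 1 write manifest, plus one manifest read and
/// one txn file read per other transaction when nothing is cached.
fn retry_io_requests(num_other_txns: u64, cached: bool) -> u64 {
    let fixed = 3;
    let per_txn = if cached { 0 } else { 2 };
    fixed + per_txn * num_other_txns
}
```

For two concurrent transactions this gives 7 requests uncached and 3 with caching.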
Continuation of #3785 with fixes for performance issues.

Makes merge_insert and update transactions do row-level conflict resolution checks, to see if we can quickly resolve conflicts by rewriting deletion files.

Core changes:
- merge_insert and update transactions now produce an affected_rows output, with a map of the affected row addresses.
- Create a TransactionRebase struct to handle conflict resolution. This uses the affected_rows output to attempt to rewrite deletion files.

Auxiliary changes:
- Changed all code paths reading deletion files to go through read_deletion_file_cached().
- Closes #3772

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
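As a rough illustration of the row-level check described above, the sketch below compares the affected row addresses of two transactions and merges them when they do not overlap. The names `row_address` and `rebase_deletions`, the use of `BTreeSet<u64>`, and the encoding of a row address as fragment id in the upper 32 bits and row offset in the lower 32 bits are assumptions for this example; the actual TransactionRebase struct works on deletion files and is more involved.

```rust
use std::collections::BTreeSet;

/// Assumed encoding: fragment id in the upper 32 bits, row offset in the lower 32.
fn row_address(fragment_id: u32, row_offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | row_offset as u64
}

/// If the two transactions touched disjoint rows, returns the combined set of
/// affected rows (what a rewritten deletion file would need to cover).
/// Otherwise returns the overlapping row addresses as the conflict.
fn rebase_deletions(
    ours: &BTreeSet<u64>,
    theirs: &BTreeSet<u64>,
) -> Result<BTreeSet<u64>, Vec<u64>> {
    let overlap: Vec<u64> = ours.intersection(theirs).copied().collect();
    if !overlap.is_empty() {
        return Err(overlap);
    }
    Ok(ours.union(theirs).copied().collect())
}
```

Two upserts that modify different rows of the same fragment would pass this check and could be reconciled without rerunning the whole operation, while an overlap on any row still surfaces as a conflict.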