Skip to content

feat: auto conflict resolution for upsert#3785

Closed
wjones127 wants to merge 14 commits intolance-format:mainfrom
wjones127:feat/upsert-auto-resolution
Closed

feat: auto conflict resolution for upsert#3785
wjones127 wants to merge 14 commits intolance-format:mainfrom
wjones127:feat/upsert-auto-resolution

Conversation

@wjones127
Copy link
Copy Markdown
Contributor

@wjones127 wjones127 commented May 7, 2025

Makes merge_insert and update transactions do row-level conflict resolution checks, to see if we can quickly resolve conflicts by rewriting deletion files.

Core changes:

  • merge_insert and update transactions now produce an affected_rows output, with a map of the affected row addresses.
  • Create a TransactionRebase struct to handle conflict resolution. This uses the affected_rows output to attempt to rewrite deletion files.

Auxiliary changes:

  • Changed all code paths reading deletion files to go through read_deletion_file_cached().

Closes #3772

@github-actions github-actions bot added the enhancement New feature or request label May 7, 2025
}

// Return true
fn check_transaction(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe instead of having a check_transaction, now this makes more sense to be a rebase_transaction? So that instead of simply checking if 2 transactions have conflict, a transaction can be updated based on the information in the other_transaction to remove conflict.

@wjones127 wjones127 force-pushed the feat/upsert-auto-resolution branch from 6f93635 to f6e2433 Compare May 19, 2025 18:17
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 84.29036% with 145 lines in your changes missing coverage. Please review.

Project coverage is 78.60%. Comparing base (5cf5628) to head (d5d8f94).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/dataset/transaction.rs 23.80% 96 Missing ⚠️
rust/lance/src/io/commit/conflict_resolver.rs 93.92% 26 Missing and 6 partials ⚠️
rust/lance/src/dataset/fragment.rs 83.33% 0 Missing and 5 partials ⚠️
rust/lance/src/io/commit.rs 82.14% 0 Missing and 5 partials ⚠️
rust/lance-core/src/utils/mask.rs 66.66% 2 Missing ⚠️
rust/lance/src/dataset/optimize.rs 81.81% 0 Missing and 2 partials ⚠️
rust/lance/src/dataset/write/merge_insert.rs 94.59% 0 Missing and 2 partials ⚠️
rust/lance-core/src/utils/deletion.rs 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3785      +/-   ##
==========================================
- Coverage   78.62%   78.60%   -0.02%     
==========================================
  Files         274      275       +1     
  Lines      104635   105772    +1137     
  Branches   104635   105772    +1137     
==========================================
+ Hits        82272    83146     +874     
- Misses      19114    19360     +246     
- Partials     3249     3266      +17     
Flag Coverage Δ
unittests 78.60% <84.29%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +2265 to +2267
// To benchmark scaling curve: measure how long to run
//
// And vary `concurrency` to see how it scales. Compare this again `main`.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main benchmark I'd recommend running is just running this unit tests with the simulated object store latency. Run this for varying concurrency values in 2, 4, 8, 16, 32 on this branch and main. Goal is this should be as fast or faster than main. (Right now, it is slower.)

let mut transaction = transaction.clone();

let num_attempts = std::cmp::max(commit_config.num_retries, 1);
// TODO: use SlotBackoff here instead and size unit off of attempt time.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely going to be more important with the rebases in here now. Look at the retries in merge_insert.rs for how we use SlotBackoff.


// Assert io requetss
let io_stats = io_tracker.incremental_stats();
assert_eq!(io_stats.read_iops, 0);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To accomplish this, we'll need to move the call to checkout_latest() to the end of the retry loop. This makes the first commit attempt blind, but I think that's desirable as it makes the case of sequential writes faster, without making concurrent writes much slower.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that and it worked, but I ended up reverting this change, because by doing that we will let the commit succeed in conflict resolution and eventually write transaction file and manifest file and then realize the version is conflict. That means it makes the retry path slower with 2 more writes. I feel it is not worth the effort, but I might have missed some good way to achieve both.

Comment on lines +651 to +658
// If there is a conflict with two transaction, the retry should require io requests:
// * 1 list version
// * num_other_txns read manifests (cache-able)
// * num_other_txns read txn files (cache-able)
// * 1 write txn file
// * 1 write manifest
// For total of 3 + 2 * num_other_txns io requests. If we have caching enabled, we can skip 2 * num_other_txns
// of those. We should be able to read in 5 hops.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might seem like a bit of a tall order, but I think it's possible.

For the uncached case, we'd need to optimize this part:
https://github.com/lancedb/lance/blob/c39b7e7a271eb81078e0a404361a54082289b94e/rust/lance/src/io/commit.rs#L96-L108

Ideally we can re-use the manifest size found in the list versions, and then read the manifest in 1 request.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to achieve the goal at the cached case, but for uncached, looks like we will repeatedly make some calls for listng and reading, I have not fixed that yet, will create an issue to track it.

jackye1995 added a commit that referenced this pull request May 23, 2025
Continuation of #3785 with fixes for performance issues.

Makes merge_insert and update transactions do row-level conflict
resolution checks, to see if we can quickly resolve conflicts by
rewriting deletion files.

Core changes:

- merge_insert and update transactions now produce an affected_rows
output, with a map of the affected row addresses.
- Create a TransactionRebase struct to handle conflict resolution. This
uses the affected_rows output to attempt to rewrite deletion files.

Auxiliary changes:

- Changed all code paths reading deletion files to go through
read_deletion_file_cached().
- Closes #3772

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
@jackye1995 jackye1995 closed this May 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto conflict resolution for merge_insert

3 participants