-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safe tantivy replication #1755
safe tantivy replication #1755
Conversation
This pull request has been linked to Shortcut Story #8605: Safe replication in Tantivy. |
3cb35ac
to
670faa5
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1755 +/- ##
==========================================
- Coverage 82.15% 82.13% -0.03%
==========================================
Files 335 335
Lines 19682 19682
==========================================
- Hits 16170 16165 -5
- Misses 3512 3517 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
b0d8b75
to
2decd7e
Compare
c671535
to
bea79f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few suggestions, nothing blocking.
As a general comment, this ties us a bit more onto the specific tantivy version, but it seems pretty much inevitable.
chunk_size: u64, | ||
shard_path: &Path, | ||
generation_id: &str, | ||
_index_prefix: PathBuf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit confusing that tantivy returns paths relative to the index and raw (vectors) returns paths relative to the shard. Would it be possible to change vectors so that it's also using paths relative to the index?
Feel free to say no, I saw something regarding vectors
and vectorset
which looks a bit tricky.
In any case, it would be nice to keep this in mind for vectors2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are totally right, the only reason why I did not unify the approach was to not overload the PR. I was trying to add the tantivy thing while changing the surroundings as little as possible. If you want I can unify in the same PR!
Description
This PR uses a different approach to replication in Tantivy indexes. We were getting errors due to data race conditions, which are a result of us directly accessing Tantivy files without proper safeguards.
How was this PR tested?
Local tests.