Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Make soft deletion optional in document addition and deletion + add lots of tests #720

Merged
merged 3 commits into from
Dec 5, 2022

Conversation

loiclec
Copy link
Contributor

@loiclec loiclec commented Dec 5, 2022

Pull Request

What does this PR do?

When debugging recent issues, I created a few unit tests in the hopes reproducing the bugs I was looking for. In the end, I didn't find any, but I thought it would still be good to keep those tests.

More importantly, I added a field to the DeleteDocuments and IndexDocuments builders, called disable_soft_deletion. If set to true, the indexing/deletion will never add documents to the soft_deleted_documents_ids and instead perform a real deletion of the documents from the databases.

For the new tests, I have:

  • Improved the insta-snapshot format of the external_documents_ids structure
  • Added more tests for the facet DB indexing, deletion, and search algorithms, making sure to test them when the facet DB contains strings (instead of numbers) as well.
  • Added more tests for the incremental indexing of the prefix proximity databases. For example, to see if documents are replaced correctly and if common prefixes are deleted correctly.
  • Added tests that mix soft deletion and hard deletion, including when processing batches of document updates.

Copy link
Contributor Author

@loiclec loiclec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote comments to highlight where the soft deletion changes were made, to focus the reviewing efforts on the consequential parts of the PR.

I also don't expect the reviewer to check the content of the .snap files, of course. You can trust that I have read them and didn't see anything wrong with their content :) Don't forget you can filter the files to review by their extension, and thus exclude all .snap files.

@@ -26,7 +26,6 @@ pub struct DeleteDocuments<'t, 'u, 'i> {
index: &'i Index,
external_documents_ids: ExternalDocumentsIds<'static>,
to_delete_docids: RoaringBitmap,
#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important change that does not only affect tests.

@@ -88,6 +88,7 @@ pub struct IndexDocumentsConfig {
pub words_positions_level_group_size: Option<NonZeroU32>,
pub words_positions_min_level_size: Option<NonZeroU32>,
pub update_method: IndexDocumentsMethod,
pub disable_soft_deletion: bool,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second important change that does not only affect the tests

@@ -331,6 +332,7 @@ where
// able to simply insert all the documents even if they already exist in the database.
if !replaced_documents_ids.is_empty() {
let mut deletion_builder = update::DeleteDocuments::new(self.wtxn, self.index)?;
deletion_builder.disable_soft_deletion(self.config.disable_soft_deletion);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place where IndexDocumentsConfig.disable_soft_deletion is used.

@loiclec loiclec added no breaking The related changes are not breaking (DB nor API) maintenance Issue about maintenance (CI, tests, refacto...) labels Dec 5, 2022
Copy link
Member

@irevoire irevoire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for the tests!
bors merge

@@ -80,6 +80,8 @@ impl<'t, 'e> Iterator for AscendingFacetSort<'t, 'e> {
// that we found all the documents in the sub level iterations already,
// we can pop this level iterator.
if documents_ids.is_empty() {
// break our of the for loop into the end of the 'outer loop, which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// break our of the for loop into the end of the 'outer loop, which
// break out of the for loop into the end of the 'outer loop, which

@bors
Copy link
Contributor

bors bot commented Dec 5, 2022

@bors bors bot merged commit 46e26ab into main Dec 5, 2022
@bors bors bot deleted the indexing-and-soft-deletion-tests branch December 5, 2022 18:54
Kerollmops pushed a commit that referenced this pull request Dec 6, 2022
720: Make soft deletion optional in document addition and deletion + add lots of tests r=irevoire a=loiclec

# Pull Request

## What does this PR do?
When debugging recent issues, I created a few unit tests in the hopes reproducing the bugs I was looking for. In the end, I didn't find any, but I thought it would still be good to keep those tests. 

More importantly, I added a field to the `DeleteDocuments` and `IndexDocuments` builders, called `disable_soft_deletion`. If set to `true`, the indexing/deletion will never add documents to the `soft_deleted_documents_ids` and instead perform a real deletion of the documents from the databases.

For the new tests, I have:
- Improved the insta-snapshot format of the `external_documents_ids` structure
- Added more tests for the facet DB indexing, deletion, and search algorithms, making sure to test them when the facet DB contains strings (instead of numbers) as well.
- Added more tests for the incremental indexing of the prefix proximity databases. For example, to see if documents are replaced correctly and if common prefixes are deleted correctly.
- Added tests that mix soft deletion and hard deletion, including when processing batches of document updates. 


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
bors bot added a commit that referenced this pull request Dec 6, 2022
725: Reapply fixes for v0.37.1 r=curquiza a=Kerollmops

This PR reapplies #712, #720, #722, and #723.
We needed the #720 as it was bringing more tests and snap-related improvements.

Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
maintenance Issue about maintenance (CI, tests, refacto...) no breaking The related changes are not breaking (DB nor API)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants