Reduce Transform's disk usage when changing the settings
#4485
Labels
performance
Related to performance in terms of search/indexing speed or RAM/CPU/disk consumption
settings diff-indexing
Issues related to settings diff-indexing
Related product team resources: PRD (internal only)
Summary
This issue is a subset of the work implementing the settings diff-indexing enhancement.
The method prepare_documents_for_reindexing exports all the documents of the database to disk in two different formats, using grenad sorters:
The original OBKV format
It is used to write documents in the database and recompute the semantic search vectors.
Writing the documents to the database is unnecessary when only the settings are changed,
so the original OBKV format should only be computed if a setting related to the vector pipeline is changed.
Otherwise, the grenad sorter shouldn't be created or sent to indexing.
The flattened OBKV format
It is used to compute the searchable pipeline and the facet pipeline.
The flattened OBKV format should only be created if the searchable pipeline, the facet pipeline, or the word-pair-proximity database is impacted by the settings change.
Moreover, the fields written for each document could be filtered depending on which setting is changed.
For instance: if searchableAttributes has been changed, keep only the searchable fields and the primary key in the documents.
Related Benchmarks:
settings-add-embeddings.json
settings-add-remove-filters.json
settings-proximity-precision.json
settings-remove-add-swap-searchable.json
settings-typo.json
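The conditional export proposed in the Summary can be sketched roughly as follows. This is only an illustration of the decision logic, not the actual Transform code: SettingsImpact, ExportPlan, plan_exports, and filter_for_searchable_change are hypothetical names, and documents are modeled as plain string maps rather than OBKV buffers written through grenad sorters.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical summary of what a settings change touches.
/// These flags are assumptions for illustration, not Meilisearch's real types.
#[derive(Debug, Clone, Copy, Default)]
pub struct SettingsImpact {
    pub vector_pipeline: bool,
    pub searchable_pipeline: bool,
    pub facet_pipeline: bool,
    pub word_pair_proximity: bool,
}

/// Which on-disk exports the reindexing step would need to produce.
#[derive(Debug, PartialEq, Eq)]
pub struct ExportPlan {
    pub original_obkv: bool,
    pub flattened_obkv: bool,
}

/// Decide which sorters are worth creating for a given settings change.
pub fn plan_exports(impact: SettingsImpact) -> ExportPlan {
    ExportPlan {
        // The original format is only needed to recompute vectors.
        original_obkv: impact.vector_pipeline,
        // The flattened format feeds the searchable and facet pipelines
        // and the word-pair-proximity database.
        flattened_obkv: impact.searchable_pipeline
            || impact.facet_pipeline
            || impact.word_pair_proximity,
    }
}

/// When only the searchable attributes changed, keep the searchable
/// fields and the primary key, and drop every other field.
pub fn filter_for_searchable_change(
    document: &BTreeMap<String, String>,
    searchable: &BTreeSet<String>,
    primary_key: &str,
) -> BTreeMap<String, String> {
    document
        .iter()
        .filter(|(name, _)| {
            searchable.contains(name.as_str()) || name.as_str() == primary_key
        })
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}
```

With a plan like this, a change that touches none of the listed pipelines would create no sorter at all, and a change limited to the facet pipeline would skip the original OBKV export entirely.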
TODO