Replies: 3 comments 1 reply
-
Hey @tacman 👋
That is not why the disk size was reduced. The main reason is that previous versions did internal soft deletions: they marked documents as deleted and stopped returning them during a search. It was faster to mark the previous version of a document as deleted and reindex it from scratch, but the downside was that dead data took up space on disk. We now modify documents directly in place in Meilisearch with something called differential indexing: we take the previous and the new version of a field, index both, and only modify what has changed in the internal data structures 🚀
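To make the idea concrete, here is a minimal Python sketch of differential indexing on a toy word index. It is only an illustration of the principle, not Meilisearch's actual implementation: `TinyIndex`, `tokenize`, and `upsert` are hypothetical names, and splitting on whitespace is an assumption made to keep the example short.

```python
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return set(text.lower().split())


class TinyIndex:
    """Toy inverted index that is updated in place (hypothetical)."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[int]] = defaultdict(set)  # word -> doc IDs
        self.docs: dict[int, str] = {}  # doc ID -> current text

    def upsert(self, doc_id: int, text: str) -> tuple[set[str], set[str]]:
        """Index both versions of the text and touch only the words that differ."""
        old_words = tokenize(self.docs.get(doc_id, ""))
        new_words = tokenize(text)
        removed = old_words - new_words
        added = new_words - old_words

        for word in removed:
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]  # no dead entries are left behind
        for word in added:
            self.postings[word].add(doc_id)

        self.docs[doc_id] = text
        return added, removed


index = TinyIndex()
index.upsert(1, "red shoes")
added, removed = index.upsert(1, "blue shoes")
print(added, removed)
```

In this sketch the posting list for "shoes" is never touched by the second call, and nothing is left on disk (here, in memory) for the old version of the document.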
We use a lot of inverted indexes internally, storing lists of document IDs that correspond to those categories. However, we also store the original document internally so we can return it exactly as it was provided.
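As a rough illustration of the two structures described above, the following Python sketch builds an inverted index for one field and keeps the original documents alongside it. The sample documents and the helper `build_facet_index` are made up for the example and do not reflect Meilisearch's real storage layout.

```python
from collections import defaultdict

# Hypothetical sample data, reusing the statuses from the question.
documents = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "pending"},
    {"id": 3, "status": "new"},
    {"id": 4, "status": "approved"},
]


def build_facet_index(docs: list[dict], field: str) -> dict[str, set[int]]:
    """Map each distinct value of `field` to the set of document IDs holding it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc in docs:
        index[doc[field]].add(doc["id"])
    return dict(index)


# Inverted index: each status string appears once as a key.
status_index = build_facet_index(documents, "status")

# Original documents, stored separately so they can be returned as provided.
stored = {doc["id"]: doc for doc in documents}

print(status_index["new"])
print(stored[2])
```

In the index, each distinct string is a key that appears once no matter how many documents share it, while the stored originals still carry the full string per document.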
-
Thanks for the explanation, very interesting.
So even if a field is not marked as searchable but is marked as filterable, its string is stored and repeated? It might not be worth the effort for the application to tokenize the strings, but maybe. A HUGE issue for me is that meili can't do a search by a set of ids: you have to create a "facet" for the id and deal with all the limits of facets. If I had that, I could store that lookup within meilisearch. I've submitted that issue a few times, but it never seems to get posted for voting.
-
Thanks, I'll do that. The discussion is here, but there's been no activity. https://github.com/orgs/meilisearch/discussions/99
-
In the 1.6 release notes, it says
I'm wondering if part of this has to do with handling duplicate strings. If meili already does this, it'd be inefficient for application developers to also do it (unlike with a traditional database).
That is, if I have a list of statuses ['new','pending','approved','deleted'], and every one of my million records has one of these statuses, in a database I might use a byte to store the status and have another table which maps the one-byte number to the strings (or an enum). But perhaps meili already does this internally, especially for facets.
If this is the case, awesome.
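For reference, the database-style approach I have in mind looks roughly like this in Python. It is only a sketch of the application-side encoding, with made-up helper names (`encode_statuses`, `decode_statuses`); it says nothing about what meili does internally.

```python
# Lookup table: position in the list is the one-byte code (at most 256 values).
STATUSES = ["new", "pending", "approved", "deleted"]
CODE_OF = {status: code for code, status in enumerate(STATUSES)}


def encode_statuses(statuses: list[str]) -> bytes:
    """Store one byte per record instead of repeating the full string."""
    return bytes(CODE_OF[status] for status in statuses)


def decode_statuses(packed: bytes) -> list[str]:
    """Map each one-byte code back to its string via the lookup table."""
    return [STATUSES[code] for code in packed]


records = ["new", "deleted", "pending", "new"]
packed = encode_statuses(records)
print(len(packed), decode_statuses(packed) == records)
```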