Think about data oriented design #82
Comments
A simple first solution already gives good timings (branch data-oriented).
After many hours of reflection I did not find a solution to fix the
OK, so after 14 days of intensive reflection (lol), I found a solution to reduce the timings of the The previous vec is finally regrouped into 7 vecs, each holding values of the same data type (i.e. all distances, all exact), following the data-oriented design previously developed for the engine. Here are the before/after performance logs of the search engine by searching "s" by using the
It seems to be a success: a 2.50x improvement. Note that we use multithreading. The rayon library is nicely designed and uses a pool of threads, but it could have an impact on the number of concurrent HTTP requests. I still need to port the criterion tests from the old version to the new one.
Currently the most CPU-consuming parts of MeiliDB are the criteria: they account for almost 40% of the total time (for 9 million fields).
We spotted something interesting in the time measurements: one of our criteria takes much more time than the others, yet it only sums one of the `Match` properties. The `sum_of_typos` criterion takes 5.41ms to sort 97360 documents and the `sum_of_words_attribute` criterion takes 19.76ms for the exact same number of documents. The two algorithms are nearly the same:
https://github.com/Kerollmops/MeiliDB/blob/0a3d069fbc0108cca6953c813453b2dd24d8b68d/src/rank/criterion/sum_of_typos.rs#L14-L26
https://github.com/Kerollmops/MeiliDB/blob/0a3d069fbc0108cca6953c813453b2dd24d8b68d/src/rank/criterion/sum_of_words_attribute.rs#L13-L19
The only difference is that the `attribute` field is not at the start of `Match`. It is probably padded, making the CPU cache unhappy.
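To make the cache argument concrete, here is a minimal sketch of an array-of-structs layout. The field list and types are assumptions for illustration, not the real `Match` definition, and `#[repr(C)]` is used only to pin the field order so the padding is predictable.

```rust
use std::mem::size_of;

// Hypothetical layout, NOT the real MeiliDB `Match`: field names other than
// `attribute`, and all field types, are assumptions for illustration.
// `#[repr(C)]` pins the declaration order so the padding is predictable.
#[repr(C)]
#[derive(Clone, Copy)]
struct Match {
    query_index: u32, // bytes 0..4
    distance: u8,     // byte 4, followed by 1 padding byte
    attribute: u16,   // bytes 6..8
    is_exact: bool,   // byte 8, followed by 3 padding bytes
}

/// Number of bytes between two consecutive `Match` values in a slice.
fn match_stride() -> usize {
    size_of::<Match>()
}

/// Sums a single field over a slice of structs. Only 2 bytes of every
/// `Match` are useful here, but whole structs are pulled into the cache.
fn sum_of_attributes(matches: &[Match]) -> usize {
    matches.iter().map(|m| m.attribute as usize).sum()
}
```

With this assumed layout the stride is 12 bytes for 2 useful bytes, so most of each cache line fetched by `sum_of_attributes` is data the criterion never reads.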
So we thought about data-oriented design: putting related data next to each other in memory to make the CPU cache happy.
These fields will probably be stored in parallel vectors, each vector representing one property.
The match properties of all documents will live in the same five vectors, so everything becomes a matter of indexes and slices.
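The layout described above could be sketched as follows. This is only an illustration under assumptions: the type names `Matches` and `DocumentMatches`, the choice of five properties, and their types are hypothetical, not the engine's actual code.

```rust
use std::ops::Range;

/// Hypothetical struct-of-vectors layout: one vector per match property.
/// All five vectors always have the same length.
#[derive(Default)]
struct Matches {
    query_index: Vec<u32>,
    distance: Vec<u8>,
    attribute: Vec<u16>,
    is_exact: Vec<bool>,
    word_index: Vec<u16>,
}

/// A document only remembers which range of the shared vectors it owns.
struct DocumentMatches {
    id: u64,
    range: Range<usize>,
}

impl Matches {
    /// Appends one document's matches, given as
    /// (query_index, distance, attribute, is_exact, word_index) rows.
    fn push_document(&mut self, id: u64, rows: &[(u32, u8, u16, bool, u16)]) -> DocumentMatches {
        let start = self.attribute.len();
        for &(query_index, distance, attribute, is_exact, word_index) in rows {
            self.query_index.push(query_index);
            self.distance.push(distance);
            self.attribute.push(attribute);
            self.is_exact.push(is_exact);
            self.word_index.push(word_index);
        }
        DocumentMatches { id, range: start..self.attribute.len() }
    }

    /// Reads a contiguous slice of `attribute` values and nothing else.
    fn sum_of_words_attribute(&self, doc: &DocumentMatches) -> usize {
        self.attribute[doc.range.clone()].iter().map(|&a| a as usize).sum()
    }

    /// Same shape, but over the `distance` vector.
    fn sum_of_typos(&self, doc: &DocumentMatches) -> usize {
        self.distance[doc.range.clone()].iter().map(|&d| d as usize).sum()
    }
}
```

Each criterion now scans one densely packed vector, so both sums touch the same amount of memory per match regardless of where the property would have sat inside a struct.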
https://stackoverflow.com/questions/17074324/how-can-i-sort-two-vectors-in-the-same-way-with-criteria-that-uses-only-one-of
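Once properties live in separate vectors, sorting by one of them must reorder the others in the same way. One common approach is to compute a sort permutation from the key vector and apply it to every vector. The sketch below shows that idea. The function names `sort_permutation` and `apply_permutation` are chosen here for illustration, and it is not necessarily what the engine ends up doing.

```rust
/// Returns the indexes of `keys` in ascending key order.
/// `sort_by` is stable, so equal keys keep their original relative order.
fn sort_permutation<T: Ord>(keys: &[T]) -> Vec<usize> {
    let mut perm: Vec<usize> = (0..keys.len()).collect();
    perm.sort_by(|&a, &b| keys[a].cmp(&keys[b]));
    perm
}

/// Builds a reordered copy of `values` following `perm`.
/// `perm` must contain valid indexes into `values`.
fn apply_permutation<T: Clone>(values: &[T], perm: &[usize]) -> Vec<T> {
    perm.iter().map(|&i| values[i].clone()).collect()
}
```

The permutation is computed once from the key vector and then applied to each parallel vector, which keeps all of them aligned index by index.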