
Think about data oriented design #82

Closed
Kerollmops opened this issue Jan 13, 2019 · 3 comments
Labels
enhancement New feature or improvement

Comments

Kerollmops (Member) commented Jan 13, 2019

Currently the most CPU-consuming parts of MeiliDB are the criteria: they account for almost 40% of the total time (for 9 million fields).

We spotted something interesting in the time measurements: one of our criteria takes much more time than the others, even though it only sums one of the Match properties.

The sum_of_typos criterion takes 5.41 ms to sort 97360 documents, while sum_of_words_attribute takes 19.76 ms for the exact same number of documents.

97360 total documents to classify
626460 total matches to classify

query_all took 87.45 ms

criterion SumOfTypos,               documents group of size 97360
criterion SumOfTypos                sort took 5.41 ms

criterion NumberOfWords,            documents group of size 97360
criterion NumberOfWords             sort took 5.36 ms

criterion WordsProximity,           documents group of size 97360
criterion WordsProximity            sort took 7.24 ms

criterion SumOfWordsAttribute,      documents group of size 97360
criterion SumOfWordsAttribute       sort took 19.76 ms

criterion SumOfWordsPosition,       documents group of size 33657
criterion SumOfWordsPosition        sort took 10.48 ms

criterion Exact,                    documents group of size 16708
criterion Exact                     sort took 1.96 ms

criterion DocumentId,               documents group of size 16708
criterion DocumentId                sort took 1.98 ms

They are nearly the same algorithm:

https://github.com/Kerollmops/MeiliDB/blob/0a3d069fbc0108cca6953c813453b2dd24d8b68d/src/rank/criterion/sum_of_typos.rs#L14-L26

https://github.com/Kerollmops/MeiliDB/blob/0a3d069fbc0108cca6953c813453b2dd24d8b68d/src/rank/criterion/sum_of_words_attribute.rs#L13-L19

The only difference is that the attribute field is not at the start of Match.
It is probably preceded by padding, making the CPU cache unhappy.

struct Match {
    query_index: u32,
    distance: u8,
    attribute: u16,
    word_index: u32,
    is_exact: bool,
}
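To illustrate the padding concern, here is a minimal sketch (not MeiliDB code) comparing a C-style layout of these exact fields, where the declared order forces padding, against Rust's default representation, which may reorder fields to pack them:

```rust
use std::mem::size_of;

// Field order as written in the issue; with a C layout the compiler
// must insert padding to keep `attribute` and `word_index` aligned.
#[repr(C)]
struct MatchC {
    query_index: u32, // offset 0
    distance: u8,     // offset 4
    attribute: u16,   // offset 6 (1 padding byte before it)
    word_index: u32,  // offset 8
    is_exact: bool,   // offset 12, then 3 bytes of tail padding
}

// Same fields with Rust's default representation, which is free
// to reorder them to minimize padding.
struct MatchRust {
    query_index: u32,
    distance: u8,
    attribute: u16,
    word_index: u32,
    is_exact: bool,
}

fn main() {
    // The raw payload is 4 + 1 + 2 + 4 + 1 = 12 bytes.
    println!("repr(C) size: {} bytes", size_of::<MatchC>());   // 16
    println!("default size: {} bytes", size_of::<MatchRust>());
    assert_eq!(size_of::<MatchC>(), 16);
    assert!(size_of::<MatchRust>() >= 12);
}
```

Padding itself does not explain the whole slowdown, but a larger struct means fewer Match values per cache line when scanning only one field.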

So we thought about data-oriented design: putting related data in the same memory space to make the CPU cache happy.

Each field will probably be stored in its own vector, one vector per property.
All of the documents' match properties will live in the same five vectors; everything becomes a matter of indexes and slices.

// OLD algorithm struct
struct Match {
    query_index: u32,
    distance: u8,
    attribute: u16,
    word_index: u32,
    is_exact: bool,
}

let matches: Vec<Match>;

// NEW algorithm struct
struct Matches {
    query_index: Vec<u32>,
    distance: Vec<u8>,
    attribute: Vec<u16>,
    word_index: Vec<u32>,
    is_exact: Vec<bool>,
}

https://stackoverflow.com/questions/17074324/how-can-i-sort-two-vectors-in-the-same-way-with-criteria-that-uses-only-one-of

Kerollmops (Member Author) commented Jan 14, 2019

Working on a simple solution brought good timings (branch data-oriented).

97360 total documents to classify
626460 total matches to classify

query_all took 106.18 ms

criterion SumOfTypos,               documents group of size 97360
criterion SumOfTypos                sort took 4.36 ms

criterion NumberOfWords,            documents group of size 97360
criterion NumberOfWords             sort took 3.51 ms

criterion WordsProximity,           documents group of size 97360
criterion WordsProximity            sort took 1.76 ms

criterion SumOfWordsAttribute,      documents group of size 97360
criterion SumOfWordsAttribute       sort took 10.47 ms

criterion SumOfWordsPosition,       documents group of size 33657
criterion SumOfWordsPosition        sort took 5.97 ms

criterion Exact,                    documents group of size 16708
criterion Exact                     sort took 882.94 μs

criterion DocumentId,               documents group of size 16708
criterion DocumentId                sort took 1.39 ms

Kerollmops (Member Author) commented Jan 15, 2019

After many hours of reflection I have not found a solution to fix the significant query_all overhead.

Kerollmops added the enhancement and help wanted labels on Jan 15, 2019
Kerollmops (Member Author) commented Jan 29, 2019

Ok, so after 14 days of intensive reflection (lol), I found a solution to reduce the timings of the query_all function: removing the HashMap<DocumentId, Vec<Match>> and replacing it with a Vec<(DocumentId, Match)> that is sorted in parallel using rayon.

The previous vec is finally aggregated into 7 vecs of the same data type (i.e. all distances, all is_exact flags), following the data-oriented design previously developed for the engine.

Here are the before/after performance logs of the search engine, searching for "s" with the default-fields.toml schema.

Searching for: s

97360 total documents to classify
677358 total matches to classify

query_all took 88.97 ms

criterion SumOfTypos, documents group of size 97360
criterion SumOfTypos sort took 5.57 ms

criterion NumberOfWords, documents group of size 97360
criterion NumberOfWords sort took 6.42 ms

criterion WordsProximity, documents group of size 97360
criterion WordsProximity sort took 7.53 ms

criterion SumOfWordsAttribute, documents group of size 97360
criterion SumOfWordsAttribute sort took 25.56 ms

criterion SumOfWordsPosition, documents group of size 50898
criterion SumOfWordsPosition sort took 3.23 ms

criterion Exact, documents group of size 50898
criterion Exact sort took 4.70 ms

criterion DocumentId, documents group of size 50898
criterion DocumentId sort took 6.85 ms

Found 4 results in 152.88 ms
Searching for: s

97360 total documents to classify
677358 total matches to classify

query_all took 32.94 ms

criterion SumOfTypos, documents group of size 97360
criterion SumOfTypos sort took 3.56 ms

criterion NumberOfWords, documents group of size 97360
criterion NumberOfWords sort took 3.06 ms

criterion WordsProximity, documents group of size 97360
criterion WordsProximity sort took 4.35 ms

criterion SumOfWordsAttribute, documents group of size 97360
criterion SumOfWordsAttribute sort took 9.23 ms

criterion SumOfWordsPosition, documents group of size 50898
criterion SumOfWordsPosition sort took 1.75 ms

criterion Exact, documents group of size 50898
criterion Exact sort took 2.35 ms

criterion DocumentId, documents group of size 50898
criterion DocumentId sort took 3.64 ms

Found 4 results in 61.06 ms

It seems to be a success: a 2.50x improvement. Note that we now use multithreading; the rayon library is nicely designed and uses a thread pool, but this could have an impact on the number of concurrent HTTP requests we can serve.

I still need to port the old criterion tests to the new version.
