Add cache on the indexes stats #3541

irevoire · 2023-02-23T18:32:38Z

github-actions · 2023-02-23T18:49:51Z

Uffizzi Preview deployment-17252 was deleted.

dureuill

Thank you for this PR, I think it is a very good first step towards caching the stats of the indexes! 🎉

I have a few concerns to resolve before we can accept this PR:

1. upgrade path: my understanding from reading the code is that running this code on an instance that has indexes from before this PR will lead to IndexNotFound errors until an update is executed on the index. Similarly, failing a stats write on index creation is not treated as an error (it is only logged), yet it will cause an IndexNotFound error in the stats route.
2. support for index swapping: before this PR, since the stats are computed eagerly, swapping two indexes also swaps their stats. Because the new DB uses the name of the index as key, my understanding is that swapping the indexes won't update their stats, so the stats won't be swapped along with the indexes.
3. persistent database vs in-memory cache: this PR implements the cache as a persistent DB. @Kerollmops expressed a preference for an in-memory cache, to avoid storing this redundant information on disk.

Regarding (1), I think the DB should be made optional, such that if an index has no entry in the cache, then its stats are fetched eagerly and added to the cache. This will make upgrading seamless.
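To illustrate the fallback idea, here is a minimal sketch, assuming a plain in-memory `HashMap` in place of the real DB. `IndexStats` is reduced to a single hypothetical field and `stats_or_compute` is an illustrative name, not a function from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the stats stored per index.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexStats {
    pub number_of_documents: u64,
}

/// Returns the cached stats for `index_uid`. On a cache miss, `compute` is
/// called once and its result is stored before being returned.
pub fn stats_or_compute<F>(
    cache: &mut HashMap<String, IndexStats>,
    index_uid: &str,
    compute: F,
) -> IndexStats
where
    F: FnOnce() -> IndexStats,
{
    cache
        .entry(index_uid.to_string())
        .or_insert_with(compute)
        .clone()
}
```

With this shape, an index created before the cache existed never produces a "not found" result: the first lookup pays for the eager computation and later lookups hit the cache.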

Regarding (2), I think this can be fixed by having the new DB use UUIDs as keys rather than the index names, and by using the existing name -> UUID table as an intermediary to retrieve the current correspondence between an index name and its stats.
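A rough sketch of why UUID keys make swapping work, under simplifying assumptions: both tables are in-memory `HashMap`s, `Uuid` is a `u128` stand-in for a real UUID type, and the `IndexMapper` below is a toy model with hypothetical `insert`, `stats_of` and `swap` methods rather than the actual structure.

```rust
use std::collections::HashMap;

/// Stand-in for a real UUID type; a `u128` keeps the sketch dependency-free.
pub type Uuid = u128;

/// Hypothetical, simplified stand-in for the stats stored per index.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexStats {
    pub number_of_documents: u64,
}

/// Toy model: one table maps names to UUIDs, the other maps UUIDs to stats.
#[derive(Default)]
pub struct IndexMapper {
    index_mapping: HashMap<String, Uuid>,
    index_stats: HashMap<Uuid, IndexStats>,
}

impl IndexMapper {
    pub fn insert(&mut self, name: &str, uuid: Uuid, stats: IndexStats) {
        self.index_mapping.insert(name.to_string(), uuid);
        self.index_stats.insert(uuid, stats);
    }

    /// Resolves the name to its current UUID, then looks up the stats.
    pub fn stats_of(&self, name: &str) -> Option<&IndexStats> {
        let uuid = self.index_mapping.get(name)?;
        self.index_stats.get(uuid)
    }

    /// Swaps the UUIDs behind two names. The stats table is left untouched.
    /// Returns `false` if either name is unknown.
    pub fn swap(&mut self, lhs: &str, rhs: &str) -> bool {
        let left = self.index_mapping.get(lhs).copied();
        let right = self.index_mapping.get(rhs).copied();
        match (left, right) {
            (Some(l), Some(r)) => {
                self.index_mapping.insert(lhs.to_string(), r);
                self.index_mapping.insert(rhs.to_string(), l);
                true
            }
            _ => false,
        }
    }
}
```

Because the stats follow the UUID, swapping the name -> UUID entries is enough: no stats entry has to be rewritten.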

I don't have a strong opinion one way or the other regarding (3). The main advantage of the disk representation is that there won't be a "cold start" due to the need of populating the cache on startup (or first call to the stats route). The main drawback is that if we get the implementation wrong, any mistake appearing in edge cases will be more persistent and harder to correct. On the other hand, the lifetime of any entry being one index update, I don't think this is too much of a drawback.

For now, I intend to modify this PR to implement solutions for points (1) and (2). I don't intend to switch the DB to an in-memory representation. I feel like this can be easily changed anyway.

dureuill · 2023-02-28T11:02:39Z

> I think the DB should be made optional, such that if an index has no entry in the cache, then its stats are fetched eagerly and added to the cache.

Actually, the "and added to the cache" part is going to be harder than I surmised, as the stats route is a reader of the DB and so can't typically write to it. This is a motivation for switching to an in-memory database, which can be placed behind an RwLock so it can be shared.
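For reference, a minimal sketch of what such a shared in-memory cache could look like, using `std::sync::RwLock`. The `StatsCache` type, its methods, the `u128` stand-in for a UUID and the single-field `IndexStats` are all assumptions made for illustration; none of this is code from the PR.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Stand-in for a real UUID type; a `u128` keeps the sketch dependency-free.
pub type Uuid = u128;

/// Hypothetical, simplified stand-in for the stats stored per index.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexStats {
    pub number_of_documents: u64,
}

/// In-memory cache that readers such as a stats route can also fill.
#[derive(Default)]
pub struct StatsCache {
    inner: RwLock<HashMap<Uuid, IndexStats>>,
}

impl StatsCache {
    /// Looks up the stats under a shared read lock. On a miss, computes them
    /// without holding any lock, then stores them under the write lock.
    pub fn get_or_compute<F>(&self, uuid: Uuid, compute: F) -> IndexStats
    where
        F: FnOnce() -> IndexStats,
    {
        {
            let guard = self.inner.read().unwrap();
            if let Some(stats) = guard.get(&uuid) {
                return stats.clone();
            }
        } // read lock released here

        let stats = compute();
        let mut guard = self.inner.write().unwrap();
        // If another thread filled the entry in the meantime, keep its value.
        guard.entry(uuid).or_insert(stats).clone()
    }

    /// Drops the entry, for example after an update to the index.
    pub fn invalidate(&self, uuid: Uuid) {
        self.inner.write().unwrap().remove(&uuid);
    }
}
```

Both methods take `&self`, so the cache can be shared between the scheduler and the routes without handing out mutable access. The trade-off is the cold start mentioned earlier: the map is empty after every restart.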

dureuill · 2023-02-28T14:45:45Z

Pushed an update with the following changes:

- Rebase on main
- Switch the DB to UUID -> Stats instead of str -> Stats
- Fallback to eager computation of the stats if the stats DB is missing/incomplete: this eager computation is not cached, but since this is only for update purposes I don't think it is an issue.
- Refactor all around to avoid reopening indexes
- Fix an issue where we could reopen an index while already holding it, resulting in a deadlock if the index is evicted from the cache between its updates and the stat computation.
- Finish fixing snapshots
- Document the introduced functions and structures

Also performed some tests:

- On a DB created before this change, running the stats route with >1000 indexes no longer returns index_not_found, but takes 8s to complete.
- After sending an update to most indexes, the stats route returns almost instantly.

Performed some changes according to my review

dureuill · 2023-02-28T14:50:04Z

@irevoire doesn't look like I can request your review (which is a bit silly since I changed your PR, but...).

Also requesting @Kerollmops's review

dureuill · 2023-02-28T14:57:40Z

index-scheduler/src/snapshots/lib.rs/cancel_enqueued_task/cancel_processed.snap

@@ -1,6 +1,5 @@
 ---
 source: index-scheduler/src/lib.rs
-assertion_line: 1755


I don't really know why there was an assertion_line here, what it means, or why it is gone now.

curquiza · 2023-03-06T14:53:44Z

@irevoire can you rebase the branch to point to release-v1.1.0?

irevoire

Nice

index-scheduler/src/index_mapper/mod.rs

irevoire · 2023-03-06T15:54:00Z

Don't know either but if that makes insta happy I'm 100% for it 😂

Kerollmops · 2023-03-06T16:41:13Z

index-scheduler/src/index_mapper/mod.rs

+    pub fn store_stats_of(
+        &self,
+        wtxn: &mut RwTxn,
+        index_uid: &str,
+        stats: IndexStats,
+    ) -> Result<()> {
+        let uuid = self
+            .index_mapping
+            .get(wtxn, index_uid)?
+            .ok_or_else(|| Error::IndexNotFound(index_uid.to_string()))?;
+
+        self.index_stats.put(wtxn, &uuid, &stats)?;
+        Ok(())
+    }


Suggested change

     pub fn store_stats_of(
         &self,
         wtxn: &mut RwTxn,
         index_uid: &str,
-        stats: IndexStats,
+        stats: &IndexStats,
     ) -> Result<()> {
         let uuid = self
             .index_mapping
             .get(wtxn, index_uid)?
             .ok_or_else(|| Error::IndexNotFound(index_uid.to_string()))?;
 
-        self.index_stats.put(wtxn, &uuid, &stats)?;
+        self.index_stats.put(wtxn, &uuid, stats)?;
         Ok(())
     }

meilisearch/src/routes/mod.rs

curquiza · 2023-03-06T17:21:17Z

(In case you missed it, this is still not the right branch to merge into; it should be release-v1.1.0 😇)

dureuill · 2023-03-07T13:11:36Z

Updated:

- Restore contribution of the index sizes to the db size
  - the index size now contributes to the db size even if the index is not authorized
- Pass IndexStats by ref in store_stats_of
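A small sketch of the first point, under stated assumptions: `IndexInfo`, `collect_stats` and the `is_authorized` callback are hypothetical names used only to show the intended behaviour, not the code in meilisearch/src/routes/mod.rs.

```rust
/// Hypothetical per-index record: its uid and its size on disk in bytes.
pub struct IndexInfo {
    pub uid: String,
    pub size_bytes: u64,
}

/// Returns the total database size and the uids the caller may see.
/// Every index contributes to the size; only authorized ones are listed.
pub fn collect_stats(
    indexes: &[IndexInfo],
    is_authorized: impl Fn(&str) -> bool,
) -> (u64, Vec<String>) {
    let mut database_size = 0;
    let mut visible = Vec::new();
    for index in indexes {
        // Counted unconditionally, before the authorization check.
        database_size += index.size_bytes;
        if is_authorized(index.uid.as_str()) {
            visible.push(index.uid.clone());
        }
    }
    (database_size, visible)
}
```

The point is the ordering: the size is accumulated before the authorization filter, so the reported db size does not depend on which indexes the key can access.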

Kerollmops

Thank you for the PR!

However, could you please create an issue describing that the stats now compute the index disk size too, and make this PR close it?

Thank you 🦪

dureuill · 2023-03-09T11:16:16Z

This PR also fixes #3578

dureuill · 2023-03-09T13:32:35Z

bors merge

bors · 2023-03-09T14:35:20Z

Build succeeded:

irevoire added the enhancement New feature or improvement label Feb 23, 2023

irevoire added this to the v1.1.0 milestone Feb 23, 2023

irevoire requested a review from dureuill February 23, 2023 18:32

dureuill previously requested changes Feb 27, 2023

dureuill force-pushed the add-cache-on-the-index-stats branch from 7b99bfa to 60e26cf Compare February 28, 2023 14:36

dureuill requested a review from Kerollmops February 28, 2023 14:49

dureuill reviewed Feb 28, 2023

irevoire commented Mar 6, 2023

irevoire and others added 4 commits March 6, 2023 16:57

Add cache on the indexes stats

fd5c489

update most snapshots

3bbf760

Eagerly compute stats as fallback to the cache.

076a3d3

- Refactor all around to avoid spawning indexes more times than necessary

Fix snapshots

76288fa

irevoire force-pushed the add-cache-on-the-index-stats branch from 60e26cf to 76288fa Compare March 6, 2023 15:57

Kerollmops requested changes Mar 6, 2023

irevoire changed the base branch from main to release-v1.1.0 March 7, 2023 09:28

dureuill added 2 commits March 7, 2023 14:00

Pass IndexStat by ref in store_stats_of

7faa9a2

Restore contribution of the index sizes to the db size

2f5b9fb

- the index size now contributes to the db size even if the index is not authorized

dureuill requested a review from Kerollmops March 9, 2023 08:48

Kerollmops approved these changes Mar 9, 2023

dureuill mentioned this pull request Mar 9, 2023

Compute the database size from all indexes in stats #3578

Closed

dureuill linked an issue Mar 9, 2023 that may be closed by this pull request

Compute the database size from all indexes in stats #3578

Closed

bors bot merged commit 667bb87 into release-v1.1.0 Mar 9, 2023

bors bot deleted the add-cache-on-the-index-stats branch March 9, 2023 14:35

curquiza mentioned this pull request Mar 9, 2023

Cache the result of the indexes stats #3540

Closed

meili-bot added the v1.1.0 PRs/issues solved in v1.1.0 released on 2023-04-03 label Apr 6, 2023
