-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cache on the indexes stats #3541
Conversation
Uffizzi Preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR, I think it is a very good first step towards caching the stats of the indexes! 🎉
I have a few concerns to resolve before we can accept this PR:
- upgrade path: my understanding from reading the code is that running this code on an instance that has indexes from before this PR will lead to
IndexNotFound
errors until an update is executed on the index. Similarly, failing a stats write on index creation is not a cause for error (only a log), yet it will cause anIndexNotFound
error in the stats route. - support for index swapping: before this PR, since the stats are computed eagerly, swapping two indexes will also swap this stats. Because the new DB uses the name of the index as key, my understanding here is that the swapping the indexes won't update their stats, and so won't swap the stats at the same time than the indexes.
- persistent database vs in-memory cache: this PR implements the cache as a persistent DB. @Kerollmops expressed a preference for an in-memory cache, to avoid storing this redundant information on disk.
Regarding (1), I think the DB should be made optional such as if an index has no entry in the cache, then it is fetched eagerly and added to the cache. This will make upgrading seamless.
Regarding (2), I think this can be fixed by having the new DB use UUIDs as keys rather than the index names and use the existing name -> UUID table as an intermediary to retrieve the current correspondence between an index name's and its stats.
I don't have a strong opinion one way or the other regarding (3). The main advantage of the disk representation is that there won't be a "cold start" due to the need of populating the cache on startup (or first call to the stats route). The main drawback is that if we get the implementation wrong, any mistake appearing in edge cases will be more persistent and harder to correct. On the other hand, the lifetime of any entry being one index update, I don't think this is too much of a drawback.
For now, I intent to modify this PR to implement solutions for points (1) and (2). I don't intent to switch the DB to an in-memory representation. I feel like this can be easily changed anyway.
Actually, the and added to the cache part is going to be harder than I surmised as the stats route is a reader of the DB and so can't typically write to it. This is a motivation for switing to an in-memory database, that can be set behind a RWLock so it can be shared. |
7b99bfa
to
60e26cf
Compare
Pushed an update with the following changes:
Also performed some tests:
|
Performed some changes according to my review
@irevoire doesn't look like I can request your review (which is a bit silly since I changed your PR, but...). Also requesting @Kerollmops ' review |
@@ -1,6 +1,5 @@ | |||
--- | |||
source: index-scheduler/src/lib.rs | |||
assertion_line: 1755 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have any idea why there was an assertion_line
here and what it means and why it is not here anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know either but if that makes insta happy I'm 100% for it 😂
@irevoire can you rebase the branch to point to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
@@ -1,6 +1,5 @@ | |||
--- | |||
source: index-scheduler/src/lib.rs | |||
assertion_line: 1755 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know either but if that makes insta happy I'm 100% for it 😂
- Refactor all around to avoid spawning indexes more times than necessary
60e26cf
to
76288fa
Compare
pub fn store_stats_of( | ||
&self, | ||
wtxn: &mut RwTxn, | ||
index_uid: &str, | ||
stats: IndexStats, | ||
) -> Result<()> { | ||
let uuid = self | ||
.index_mapping | ||
.get(wtxn, index_uid)? | ||
.ok_or_else(|| Error::IndexNotFound(index_uid.to_string()))?; | ||
|
||
self.index_stats.put(wtxn, &uuid, &stats)?; | ||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn store_stats_of( | |
&self, | |
wtxn: &mut RwTxn, | |
index_uid: &str, | |
stats: IndexStats, | |
) -> Result<()> { | |
let uuid = self | |
.index_mapping | |
.get(wtxn, index_uid)? | |
.ok_or_else(|| Error::IndexNotFound(index_uid.to_string()))?; | |
self.index_stats.put(wtxn, &uuid, &stats)?; | |
Ok(()) | |
} | |
pub fn store_stats_of( | |
&self, | |
wtxn: &mut RwTxn, | |
index_uid: &str, | |
stats: &IndexStats, | |
) -> Result<()> { | |
let uuid = self | |
.index_mapping | |
.get(wtxn, index_uid)? | |
.ok_or_else(|| Error::IndexNotFound(index_uid.to_string()))?; | |
self.index_stats.put(wtxn, &uuid, stats)?; | |
Ok(()) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
(In case you missed it, still not the right branch to merge into it, should be |
- the index size now contributes to the db size even if the index is not authorized
Updated:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
However, could please you create an issue describing that the stats are now computing the index disk size too? And make this PR close it?
Thank you 🦪
This PR also fixes #3578 |
bors merge |
Fix #3540