-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(trin-storage): add filter to store #1286
Conversation
Regardless if we decide to postpone this for later, I would like to highlight an issues with this approach. Pagination doesn't really guarantee that you will see all entries. Consider following scenarios: Scenario 1: Entries changing pages
Result: Entries that were originally at index 101-110 were never queried (because at the moment of second query they were moved to 91-100). This can be avoided by either:
Scenario 2: Running as a separate task while trin normal operation are running as well If trin is running normally, we can't avoid issue from Scenario 1 because trin can delete entries (e.g. using pruning) and affect pagination in a way that we can't control. Note: This can be avoid by simply only running filtering on startup, but that should be documented (if not enforced). |
2e64e6e
to
563ea81
Compare
@morph-dev @KolbyML
Personally, I'm leaning towards using # 1, but I'm curious what thoughts you guys have on this... |
Is I am just a little confused sorry if it is a silly question. |
Sorry, I should have probably said Basically, right now I'm just curious as to what your thoughts are on the two approaches in this pr. I still don't think we should merge #1436 until we have a gameplan settled upon for removing epoch accs. But if we're ok with the "script" approach, then it'll probably make sense to have the two prs released in the same deployment. |
I am a fan of the script approach and deploying both in the same release so
The script should be able to still work, because it won't be able to decode Epoch-accumulators because it is no longer a valid type, hence it will be evicted (as we just discussed, just a recap of what I think the plan is correct me if I am wrong) I am good with this plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will distinguish between these two approaches as script vs store.
Based on the comment in the code, I assume store would run only at the startup.
TL;DR: If you want this only for purging EpochAccumulators, then I'm in favor of script but we can do it faster using sql. If we want some "purge" solution that can be reusable in the future, I'm in favor of store with few suggestions.
If always enabled, both would significantly delay startup time (especially for prod nodes). If we want at every startup to check that all data in db is still valid (which I believe we don't), we can't avoid this delay. But In that case I would prefer store approach (and in that case, there are couple improvements that I can suggest, e.g. instead of having it as a function, doing it as a part of trin_storage::versioned::create_store
and have a flag that can enable/disable it).
If this is done as a one-off thing to purge EpochAccumulators, then I'm in favor of script approach. But in that case, we can make this much faster by running some sql query like this:
DELETE FROM ii1_history WHERE hex(content_key) LIKE "03%"
(the number of deleted rows can be obrained with changes function)
If you want to have a solution that is easily reusable in the future (e.g. with simple modification of the is_content_valid
function), than I'm again in favor of store approach but with improvements that I already mentioned above.
} | ||
} | ||
} | ||
for content_id in &content_ids_to_remove { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this outside bigger loop (line 54), otherwise it doesn't work.
@morph-dev Sounds good. I'll move ahead with the |
563ea81
to
e15d1d8
Compare
500dd4b
to
1ab70f4
Compare
1ab70f4
to
0c52247
Compare
Ready for another review here. I also added a flag to the script that allows for test runs before actually evicting the content. @morph-dev It's definitely not as fast as directly implementing a sql query, but afai to do so requires access to the sql connection pool on the |
@njgheorghita I think we should do @morph-dev suggestion. As currently Or make the script properly handle invalid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't say it's super fast (we don't have index for content_key
), but it must be faster than getting every single content key/value and decoding them :).
create_store(ContentType::History, config, sql_connection_pool).unwrap(); | ||
let total_entry_count = store.usage_stats().entry_count; | ||
info!("total entry count: {total_entry_count}"); | ||
let sql_connection_pool = store.config.sql_connection_pool.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already have sql_connection_pool
, at line 42. Therefor, you don't need config
in store
to be public (changes in other file).
Also, I would argue that you don't need store at all. All you are doing is getting the number of entries, which you can do by executing:
SELECT COUNT(*) as count FROM ii1_history
Up to you if you want to create instance of store and get entry count from there or execute it manually from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow this comment wasn't attached to the previous review.
.expect("Failed to fetch history content"); | ||
info!("found {} epoch accumulators", lookup_result); | ||
if script_config.evict { | ||
let _: Vec<String> = sql_connection_pool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we used execute
here?
It would be something like:
let deleted = sql_connection_pool.get().unwrap().execute(&delete_query(), []).unwrap();
I think with this api we don't even need extra changes()
call that I said in the previous review.
7bfaed0
to
c7d48b3
Compare
c7d48b3
to
c3c19e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I assume the PR is ready for review now, tell me if I am wrong)
looks good, I haven't tested the changes locally. But I don't see any major red flags so we should be good to go
What was wrong?
Soon, we'll need the ability to remove certain content from our storage according to a specified "validation function". This pr implements the ability to filter such data from the storage.
How was it fixed?
Added filter feature to storage
To-Do