This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

add the mt-index-prune utility #1231

Merged
merged 14 commits into from
Mar 8, 2019

Conversation

Contributor

@replay replay commented Mar 6, 2019

only did some basic testing with this in a local setup, will do more testing

will also need to add docs for it

fixes #1069

@replay replay requested a review from woodsaj March 6, 2019 17:42
@woodsaj
Member

woodsaj commented Mar 7, 2019

looks good to me. the docs need to be updated so that the build checks pass.

@replay
Contributor Author

replay commented Mar 7, 2019

Added a few more minor changes, made tests pass, and did some more testing in my local env. It all seems to be working fine as far as I can tell.
Please take another look and approve if you think it's ok to merge @woodsaj

@shanson7
Collaborator

shanson7 commented Mar 7, 2019

Would it be possible to do just 1 partition, or a range? It looks like this would take quite a long time to run sequentially on our data, so there might be utility in doing a partition at a time so we can stop/continue easily.

Review thread on idx/cassandra/cassandra.go (outdated, resolved)
Review thread on cmd/mt-index-prune/main.go (outdated, resolved)
makes it possible to iterate over any single partition or an arbitrary
range of partitions
@replay
Contributor Author

replay commented Mar 7, 2019

@shanson7 definitely, added that
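
To illustrate the single-partition and range selection discussed above, here is a minimal Go sketch of how such a selection could be parsed. The spec syntax (`*`, `3`, `2-5`) and the `parsePartitions` helper are assumptions made for illustration only; they are not taken from the PR and may not match the flag format mt-index-prune actually uses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePartitions is a hypothetical helper: it turns a spec such as "*",
// "3", or "2-5" into the list of partition IDs to process. total is the
// number of partitions that exist (IDs 0 to total-1).
func parsePartitions(spec string, total int) ([]int, error) {
	from, to := 0, total-1 // "*" selects every partition
	if spec != "*" {
		lo, hi, isRange := strings.Cut(spec, "-")
		var err error
		if from, err = strconv.Atoi(lo); err != nil {
			return nil, fmt.Errorf("invalid partition %q: %w", lo, err)
		}
		to = from // a single partition is a range of length one
		if isRange {
			if to, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid partition %q: %w", hi, err)
			}
		}
	}
	if from < 0 || to >= total || from > to {
		return nil, fmt.Errorf("partition range %d-%d is outside 0-%d", from, to, total-1)
	}
	parts := make([]int, 0, to-from+1)
	for p := from; p <= to; p++ {
		parts = append(parts, p)
	}
	return parts, nil
}

func main() {
	parts, err := parsePartitions("2-5", 8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parts) // prints [2 3 4 5]
}
```

Processing one partition or a small range per invocation is what makes it easy to stop a long run and continue later from the next partition.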

@Dieterbe
Contributor

Dieterbe commented Mar 7, 2019

i didn't check this in depth (or think about it much because i'm on vacation \o/ ), but running this as a tool seems harder to operate (requires setting up extra jobs etc)
wouldn't it be simpler for MT to, upon loading the index, do this automatically and/or a background routine within MT periodically scan for stuff that can be moved? that way there's no additional orchestration overhead.

@woodsaj
Member

woodsaj commented Mar 8, 2019

> wouldn't it be simpler for MT to, upon loading the index, do this automatically and/or a background routine

A background routine would be a really bad idea: for large instances, a huge amount of RAM is needed to load the data from Cassandra.
Doing this at startup is an interesting idea, but to get a real benefit you would have to restart the MT instances regularly. Additionally it would increase the startup time of MT and the main goal of the archiving is to reduce startup time.

Doing the archiving out-of-band is definitely the most efficient approach and solves the problems we are trying to address.

Perhaps we could make archiving at startup an optional thing, but doing so is outside the scope of this PR.

- if writes to cassandra fail, just continue on to the next def
- keep count of defs successfully archived.
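
The behaviour described in this commit message (continue past failed writes, count the successes) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the PR: the `def` type, `archiveFunc`, `archiveAll`, and `failingWriter` are hypothetical names, and the real tool writes to Cassandra rather than to a fake writer.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// def is a hypothetical stand-in for a metric definition; only the ID matters here.
type def struct{ ID string }

// archiveFunc writes a single definition to the archive store.
type archiveFunc func(d def) error

// archiveAll tries to archive every def. A failed write is logged and
// skipped so that one bad def does not abort the run; the return value is
// the number of defs that were archived successfully.
func archiveAll(defs []def, archive archiveFunc) int {
	archived := 0
	for _, d := range defs {
		if err := archive(d); err != nil {
			log.Printf("failed to archive def %s: %v", d.ID, err)
			continue // move on to the next def
		}
		archived++
	}
	return archived
}

// failingWriter returns a fake archiveFunc that fails for exactly one ID.
// It exists only to demonstrate the error path.
func failingWriter(badID string) archiveFunc {
	return func(d def) error {
		if d.ID == badID {
			return errors.New("simulated write failure")
		}
		return nil
	}
}

func main() {
	defs := []def{{"a"}, {"b"}, {"c"}}
	n := archiveAll(defs, failingWriter("b"))
	fmt.Printf("archived %d of %d defs\n", n, len(defs)) // prints "archived 2 of 3 defs"
}
```

Reporting the success count at the end lets an operator see at a glance whether a run needs to be repeated for the defs that failed.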
@woodsaj woodsaj merged commit 2a90de3 into master Mar 8, 2019
@woodsaj woodsaj deleted the mt-index-prune_utility branch March 8, 2019 07:58
@shanson7
Collaborator

shanson7 commented Mar 8, 2019

Why would it require any more RAM than MT is already using? MT already knows exactly which time series are still alive from its in-memory index.

I will point out, though, that doing it within MT would be harder for me. We run 1 set of write instances which prune the index aggressively (different set of index rules) so they wouldn't be able to do this. Our read instances are replicated, meaning they would duplicate the effort if they both tried to do the same partitions.

Just my 2 cents

Development

Successfully merging this pull request may close these issues.

Prune index in Cassandra
4 participants