add the mt-index-prune utility #1231

replay · 2019-03-06T17:42:21Z

only did some basic testing with this in a local setup, will do more testing

will also need to add docs for it

woodsaj · 2019-03-07T04:31:02Z

looks good to me. the docs need to be updated so that the build checks pass.

replay · 2019-03-07T10:18:07Z

Added a few more minor changes, made the tests pass, and did some more testing in my local env. It all seems to be working fine as far as I can tell.
Please take another look and approve if you think it's ok to merge @woodsaj

idx/cassandra/cassandra.go

shanson7 · 2019-03-07T12:06:17Z

Would it be possible to do just 1 partition, or a range? It looks like this would take quite a long time to run sequentially on our data, so there might be utility in doing a partition at a time so we can stop/continue easily.

replay · 2019-03-07T13:18:15Z

@shanson7 definitely, added that
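For readers who want to see what selecting a single partition or a range could look like, here is a minimal sketch. The spec syntax (`*`, `3`, `0-7`) and the `parsePartitions` helper are assumptions made for illustration; they are not taken from the tool's actual flags or code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePartitions turns a spec such as "*", "3" or "0-7" into a list of
// partition IDs in [0, total). The spec syntax is hypothetical.
func parsePartitions(spec string, total int) ([]int, error) {
	if spec == "*" {
		all := make([]int, total)
		for i := range all {
			all[i] = i
		}
		return all, nil
	}
	parts := strings.SplitN(spec, "-", 2)
	from, err := strconv.Atoi(parts[0])
	if err != nil {
		return nil, fmt.Errorf("invalid partition %q: %v", parts[0], err)
	}
	to := from // a single partition is a range of length one
	if len(parts) == 2 {
		if to, err = strconv.Atoi(parts[1]); err != nil {
			return nil, fmt.Errorf("invalid partition %q: %v", parts[1], err)
		}
	}
	if from < 0 || to < from || to >= total {
		return nil, fmt.Errorf("partitions %d-%d are outside 0-%d", from, to, total-1)
	}
	out := make([]int, 0, to-from+1)
	for p := from; p <= to; p++ {
		out = append(out, p)
	}
	return out, nil
}

func main() {
	for _, spec := range []string{"*", "5", "2-4", "6-9"} {
		ids, err := parsePartitions(spec, 8)
		fmt.Println(spec, ids, err)
	}
}
```

Processing one partition per invocation makes a long run easy to stop and resume: the operator only needs to remember the last partition that completed.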

Dieterbe · 2019-03-07T17:34:55Z

i didn't check this in depth (or think about it much because i'm on vacation \o/ ), but running this as a tool seems harder to operate (requires setting up extra jobs etc)
wouldn't it be simpler for MT to, upon loading the index, do this automatically and/or a background routine within MT periodically scan for stuff that can be moved? that way there's no additional orchestration overhead.

woodsaj · 2019-03-08T06:38:02Z

> wouldn't it be simpler for MT to, upon loading the index, do this automatically and/or a background routine

A background routine would be a really bad idea: for large instances, a huge amount of RAM is needed to load the data from Cassandra.
Doing this at startup is an interesting idea, but to get a real benefit you would have to restart the MT instances regularly. Additionally it would increase the startup time of MT and the main goal of the archiving is to reduce startup time.

Doing the archiving out-of-band is definitely the most efficient approach and solves the problems we are trying to address.

Perhaps we could make archiving at startup an optional thing, but doing so is outside the scope of this PR.

shanson7 · 2019-03-08T12:56:03Z

Why would it require any more RAM than MT is already using? MT already knows exactly which time series are still alive in the memory index.

I will point out, though, that doing it within MT would be harder for me. We run 1 set of write instances which prune the index aggressively (different set of index rules) so they wouldn't be able to do this. Our read instances are replicated, meaning they would duplicate the effort if they both tried to do the same partitions.

Just my 2 cents

add the mt-index-prune utility

d07c1bd

replay requested a review from woodsaj March 6, 2019 17:42

replay added 3 commits March 7, 2019 06:25

update docs

dce29cf

update gitignore

ef2aa81

exit if index-rules couldn't be found

c5dc6d3

replay force-pushed the mt-index-prune_utility branch from fb1d7ab to c5dc6d3 Compare March 7, 2019 09:41

replay added 2 commits March 7, 2019 06:53

make for loop easier to understand

fb59a2c

save some calls to time.Now()

13cc134

woodsaj reviewed Mar 7, 2019

idx/cassandra/cassandra.go (resolved)

woodsaj reviewed Mar 7, 2019

idx/cassandra/cassandra.go (outdated, resolved)

woodsaj reviewed Mar 7, 2019

cmd/mt-index-prune/main.go (outdated, resolved)

replay added 3 commits March 7, 2019 10:14

different way to specify partitions

7f8bb74

makes it possible to iterate over any single partition or an arbitrary range of partitions

add error when writing to cassandra

6821d25

change fatal error to warning

d5f72f4

replay added 3 commits March 7, 2019 10:48

execute cassandra operations in a pool of routines

9d8aba4

update docs

8684bea

fix dep

434b7bc

woodsaj added 2 commits March 8, 2019 15:33

add schema_archive_table to scylladb template

9ef83eb

improve error handling, logging and reporting

d2868c8

- if writes to cassandra fail, just continue on to the next def - keep count of defs successfully archived.

woodsaj force-pushed the mt-index-prune_utility branch from 313dbd4 to d2868c8 Compare March 8, 2019 07:41

woodsaj approved these changes Mar 8, 2019

woodsaj merged commit 2a90de3 into master Mar 8, 2019

woodsaj deleted the mt-index-prune_utility branch March 8, 2019 07:58
