
set gc grace to zero #3513

Closed · wants to merge 2 commits (from the j-baker-patch-1 branch)

Conversation

@j-baker (Contributor) commented Sep 19, 2018

Do we actually need this/benefit from this being greater than zero? It'd be real nice for short lived things to never end up in SSTables. As I understand it, read/write quorum and delete consistency all should save us from issues.
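
For concreteness, a minimal sketch of what setting this to zero amounts to at the CQL level, using the DataStax Java driver 3.x; the keyspace, table, and contact point here are hypothetical:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class GcGraceZeroSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // With gc_grace_seconds = 0, tombstones are eligible for purging at the
            // next compaction instead of lingering for the default 864000s (10 days).
            session.execute("ALTER TABLE atlasdb.my_table WITH gc_grace_seconds = 0");
        }
    }
}
```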


@ashrayjain (Contributor)

Note that this will also set the hint TTL to 0 (effectively disabling hinted handoff).

@clockfort (Contributor)

My only worry is around quorum writes that didn't create 3 copies; hints were kind of nice there because we don't have any maintenance-style repair currently.

@tpetracca (Contributor)

For a nice explanation of what Ashray and Clocks are getting at: http://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html

In practice, now that we're issuing range tombstones in targeted sweep, I don't think we really have to worry about tombstone accumulation the way we used to. Given two overlapping range tombstones where one is both larger and newer than the other, the smaller/older one gets removed regardless of gc_grace, and I'd expect the targeted sweep write pattern to always produce this. So in very high-overwrite tables you'd generally expect this to happen in the memtable layer, persist just a single tombstone on every flush, and have that tombstone compacted away relatively quickly.

Would be good to actually observe what I'm describing in practice, but yea I don't think we need to sacrifice the hints here for what I understand as little to no benefit.
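
For concreteness, a hypothetical sketch of the overlapping-delete pattern described above, using the DataStax Java driver with a made-up schema; this illustrates the pattern rather than how targeted sweep actually issues its deletes, and CQL range deletes like these require Cassandra 3.0+:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class OverlappingRangeTombstoneSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Older, smaller range tombstone over one row's columns.
            session.execute("DELETE FROM atlasdb.my_table WHERE key = 0x01"
                    + " AND column1 >= 0x10 AND column1 < 0x20");
            // Newer range tombstone that strictly covers the first; per the claim
            // above, the smaller/older tombstone can then be dropped on compaction
            // regardless of gc_grace_seconds.
            session.execute("DELETE FROM atlasdb.my_table WHERE key = 0x01"
                    + " AND column1 >= 0x00 AND column1 < 0x40");
        }
    }
}
```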

@j-baker (Contributor, Author) commented Sep 21, 2018

As I understand it, if you do QUORUM reads/writes and delete at ALL, the only thing hints ever help with is perf. But they also hurt stability, so there's a bit of a tradeoff, right?

The workflow I know exists is one where e.g. you might have a table of active jobs, with maybe 100 jobs active at a time, each taking maybe 60 seconds and being swept after a few minutes. With GC grace of 1 hour, your range scan to select all of them is going to read about 6000 cells (roughly 100 jobs completing per minute, times 60 minutes of retained tombstones), of which 5900 are tombstones. With GC grace 0, you plausibly never really end up with data on disk, right? That's the workflow I was hoping to enable here.

@clockfort (Contributor)

I agree with you about the tombstones, I'm just worried about it maybe disabling hints.

A specific example of what I'm worried about is a user with a completely immutable, append-only style table access pattern, where data is written once and rarely if ever read, and there's no reason to sweep anything since it's all live.

If a write to such a table fails to achieve full RF3 replication, today at least there's a good chance it'll eventually get fixed via the very targeted repair of hinted handoff, once the bad node starts behaving better and takes delivery of hints from the other nodes. Otherwise, the only mechanism by which that repair would happen today is hitting the 10% read_repair_chance lottery while a query happens to read the under-replicated data, at which point Cassandra kicks off her own targeted repair via the read-repair mechanism.

Fixes to that situation, I think, are either of the following (see the sketch after the list for the second option):

  • us (CassOps) actually writing background maintenance table repair jobs
  • us (AtlasOps) doing CassOps' maintenance job for them, altering read repair to 100% at least while sweep cruises through reading all of a table's data during a classic-style sweep. That's probably somewhat perf-degenerate for other services' active queries on the same table, and, somewhat annoyingly, I believe range scans (at least in C* 2.2) can't kick off their own read repairs, though the read path for direct key-based lookups can.
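
A hypothetical sketch of that second option; read_repair_chance is a real table option in Cassandra 2.x/3.x, but the table name and the surrounding orchestration here are made up:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SweepReadRepairSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Force read repair on every read before the sweep's full-table scan...
            session.execute("ALTER TABLE atlasdb.my_table WITH read_repair_chance = 1.0");
            // ... run the classic-style sweep over the table here ...
            // ...then restore the usual 10% lottery once the sweep finishes.
            session.execute("ALTER TABLE atlasdb.my_table WITH read_repair_chance = 0.1");
        }
    }
}
```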

@sandorw (Contributor) commented Oct 3, 2018

What's the resolution here? Do we want to track having automatic repair infra before considering this?

@clockfort (Contributor)

We caught up offline, since James, Tom, and I were already in a meeting together.

We came to the conclusion that this commit is fine, but that we should be forcing repairs at least before some cluster-expansion operations that currently do not force a repair in our scripts.

@sandorw closed this on Mar 12, 2019
@sandorw deleted the j-baker-patch-1 branch on Mar 12, 2019