Digests of data queries may mismatch even though mutation queries return equal mutations #1165

Closed
tgrabiec opened this Issue Apr 6, 2016 · 0 comments

@tgrabiec
Contributor
tgrabiec commented Apr 6, 2016

A data query doesn't return metadata such as tombstones and timestamps. Instead, it comes with a digest that should differ if the metadata (or, of course, the data itself) differs. For read-repair to make progress, we want the digest to differ only when there is something to repair: on a mismatch, read-repair performs a mutation query and sends out diffs to fix it. That increases read latency, so we want to avoid it unless it is really necessary.
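A minimal sketch of the data-query/digest split described above. Everything here is hypothetical: replicas are plain dicts mapping a cell name to `(value, write_timestamp)`, `data_query` and `coordinator_read` are illustrative names, and MD5 is only an example hash, not necessarily what Scylla uses. The result carries values only, while the digest also covers timestamps, so two replicas with identical values but different metadata still disagree.

```python
import hashlib

# Hypothetical replica contents: cell name -> (value, write_timestamp).
replica_a = {"name": ("alice", 10)}
replica_b = {"name": ("alice", 12)}  # same value, newer write timestamp


def data_query(cells):
    """Return (result, digest): the result holds values only, but the digest
    also covers the metadata (here, the write timestamp)."""
    result = {name: value for name, (value, _ts) in sorted(cells.items())}
    h = hashlib.md5()  # example hash only
    for name, (value, ts) in sorted(cells.items()):
        h.update(repr((name, value, ts)).encode())
    return result, h.hexdigest()


def coordinator_read(responses):
    """Return (result, needs_read_repair) for a list of (result, digest) pairs."""
    results, digests = zip(*responses)
    if len(set(digests)) == 1:
        return results[0], False
    # Digest mismatch: a real coordinator would now issue a mutation query
    # and send out diffs, which costs extra latency.
    return None, True


res_a, dig_a = data_query(replica_a)
res_b, dig_b = data_query(replica_b)
print(res_a == res_b)  # True: the visible data is identical
print(dig_a != dig_b)  # True: the metadata differs
print(coordinator_read([(res_a, dig_a), (res_b, dig_b)])[1])  # True: read repair
```

Here the mismatch is legitimate: replica A really holds an older timestamp, so the repair has something to fix.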

The current digest-calculation algorithm includes data that may no longer exist after the mutation is compacted, for example cells covered by higher-level tombstones, or GC-able tombstones. This may lead to mismatched digests even though the mutations returned by mutation queries, which always compact, are equal.
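The failure mode can be reproduced with a toy model. This sketch assumes a heavily simplified partition (a partition tombstone timestamp plus a flat set of cells, no rows, no TTLs, and the partition tombstone itself is never purged), a `GC_GRACE` of 864000 seconds (Cassandra's default `gc_grace_seconds`, 10 days), and hypothetical helpers `raw_digest` (digest over everything stored, as in the current algorithm) and `compact` (drop covered cells and GC-able cell tombstones). Replica A has not compacted yet; replica B has. Their raw digests differ, yet the compacted partitions, which is what a mutation query would return, are equal, so read-repair finds nothing to fix.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

GC_GRACE = 864_000  # assumed grace period (seconds) before a tombstone is purgeable
NOW = 1_000_000     # assumed current time (seconds)


@dataclass(frozen=True)
class Cell:
    name: str
    timestamp: int                       # write timestamp
    value: Optional[str] = None          # None marks a cell tombstone
    deletion_time: Optional[int] = None  # when the tombstone was written


@dataclass(frozen=True)
class Partition:
    tombstone: Optional[int]  # timestamp of the partition tombstone, if any
    cells: Tuple[Cell, ...]


def raw_digest(p: Partition) -> str:
    """Digest over everything the replica stores, compacted or not."""
    h = hashlib.md5(repr(p.tombstone).encode())
    for c in sorted(p.cells, key=lambda c: c.name):
        h.update(repr((c.name, c.timestamp, c.value)).encode())
    return h.hexdigest()


def compact(p: Partition, now: int) -> Partition:
    """Drop cells covered by the partition tombstone and GC-able cell tombstones."""
    def purgeable(c: Cell) -> bool:
        covered = p.tombstone is not None and c.timestamp <= p.tombstone
        gc_able = c.value is None and c.deletion_time + GC_GRACE <= now
        return covered or gc_able

    kept = tuple(c for c in sorted(p.cells, key=lambda c: c.name) if not purgeable(c))
    return Partition(p.tombstone, kept)


# Replica A has not compacted yet; replica B already has.
replica_a = Partition(tombstone=5, cells=(
    Cell("a", timestamp=3, value="stale"),      # covered by the tombstone (3 <= 5)
    Cell("b", timestamp=6, deletion_time=100),  # cell tombstone, past GC_GRACE
    Cell("c", timestamp=7, value="live"),
))
replica_b = Partition(tombstone=5, cells=(Cell("c", timestamp=7, value="live"),))

print(raw_digest(replica_a) == raw_digest(replica_b))      # False: spurious mismatch
print(compact(replica_a, NOW) == compact(replica_b, NOW))  # True: nothing to repair
```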

@tgrabiec self-assigned this Apr 6, 2016
@avikivity pushed a commit that closed this issue Apr 8, 2016
@tgrabiec database: Compact mutations when executing data queries
Currently the data query digest includes cells and tombstones which may have
expired or be covered by higher-level tombstones. This causes a digest
mismatch between replicas if some elements are compacted on one of the
nodes and not on the others. The mismatch triggers a read-repair which
doesn't resolve it, because the mutations received by mutation queries do
not differ: they are already compacted.

The fix adds a compaction step before writing and digesting query results,
reusing the algorithm used by mutation queries. This is not the optimal
way to fix this: the compaction step could be folded into the query writing,
since there is redundancy between the two steps. However, such a change
carries more risk, and thus was postponed.

The perf_simple_query test (cassandra-stress-like partitions) shows a
regression from 83k to 77k ops/s (7%).

Fixes #1165.
f15c380
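The fix in this commit can be sketched in a toy model: the data query runs the same compaction step as the mutation query before producing its result and digest, so replicas that differ only in not-yet-compacted state now return matching digests. This is a simplification under stated assumptions: a partition is a hypothetical `(partition_tombstone_ts, {name: (value, ts)})` pair, only tombstone coverage is modelled (no GC of tombstones), MD5 is just an example hash, and `compact_partition`, `mutation_query` and `data_query` are illustrative names rather than Scylla's functions.

```python
import hashlib

# Hypothetical partition: (partition_tombstone_ts or None, {name: (value, write_ts)}).
replica_a = (5, {"a": ("stale", 3), "c": ("live", 7)})  # not yet compacted
replica_b = (5, {"c": ("live", 7)})                      # already compacted


def compact_partition(tombstone_ts, cells):
    """Compaction shared by both query kinds: drop cells the tombstone covers."""
    live = {name: (value, ts) for name, (value, ts) in cells.items()
            if tombstone_ts is None or ts > tombstone_ts}
    return tombstone_ts, live


def mutation_query(partition):
    """Mutation queries return compacted data, metadata included."""
    return compact_partition(*partition)


def data_query(partition):
    """Post-fix data query: compact first, then write and digest the result."""
    tombstone_ts, live = compact_partition(*partition)  # pass 1: compaction
    result = {}
    h = hashlib.md5(repr(tombstone_ts).encode())         # example hash only
    for name, (value, ts) in sorted(live.items()):       # pass 2: writing + digest
        result[name] = value
        h.update(repr((name, value, ts)).encode())
    return result, h.hexdigest()


print(mutation_query(replica_a) == mutation_query(replica_b))  # True
print(data_query(replica_a) == data_query(replica_b))          # True: digests now match
```

The separate compaction pass and writing pass over the same cells illustrate the redundancy mentioned in the commit message; folding the tombstone check into the writing loop would remove it, which is the riskier change that was postponed.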
@avikivity closed this in f15c380 Apr 8, 2016
@avikivity added a commit that referenced this issue Apr 8, 2016
@avikivity Merge "Fix query digest mismatch" from Tomasz
db03295
@penberg added a commit that referenced this issue Apr 9, 2016
@tgrabiec + @penberg database: Compact mutations when executing data queries

(cherry picked from commit f15c380)
e276e7b