appending_hash<row> ignores cells after first null #4567
Comments
@pdziepak I need a bit more context about this - is this a bug in computing digests, potentially not identifying a case in which two rows are different, so that because of the early return the hashes still match? Since when do we have this? Example schema: {PK PRIMARY KEY, A INT, B INT, C INT}, Node 1: PK=1, A=1, C=3. How did you find this bug - code inspection / unit tests? |
d773e4b is the commit that introduced the bug (for the aficionados of archaeology: in earlier versions of that patch there was a lambda and the …). The user-visible problem is exactly as you've shown in your example: if the queried columns are A, B, and C, then node 1 and node 2 will produce the same digest, since only column A will be taken into consideration. I've spotted this by inspecting the code while trying to find the source of an unrelated bug. |
Currently, appending_hash<row> returns as soon as it encounters the first missing cell (i.e. a null). This is incorrect, since any present cells after that null will not contribute to the hash. The right behaviour is to simply skip missing cells and keep hashing the remaining ones. Fixes scylladb#4567.
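A minimal sketch of the before/after behaviour, assuming a simplified row model (one std::optional per cell and a toy combining step) rather than the real Scylla types:

// Simplified illustration only; not the actual Scylla code.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using cell = std::optional<int64_t>;   // nullopt stands for a missing (null) cell

// Old behaviour: the early return means that {1, null, 3} and {1, null, 7}
// produce the same digest, because nothing after the null is hashed.
size_t hash_row_old(const std::vector<cell>& cells) {
    size_t h = 0;
    for (const auto& c : cells) {
        if (!c) {
            return h;                  // stops hashing at the first null
        }
        h = h * 31 + std::hash<int64_t>{}(*c);
    }
    return h;
}

// Fixed behaviour: a missing cell is skipped and the remaining cells still
// contribute to the digest, so the two rows above now hash differently.
size_t hash_row_fixed(const std::vector<cell>& cells) {
    size_t h = 0;
    for (const auto& c : cells) {
        if (!c) {
            continue;                  // ignore the null, keep going
        }
        h = h * 31 + std::hash<int64_t>{}(*c);
    }
    return h;
}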
We need to fix this ASAP. This is the root cause of #7116. |
@psarna this is a blocker for Jepsen. Please be advised there is a PR in place with requested changes on it (by pdziepak) |
Perhaps we can use a new hash function enum (with the same hash function) to describe the behavioral change. |
@avikivity this is a regression. All the problems you are trying to avoid with feature bits or a new hash function would have been present when the regression was introduced. It did not cause any mayhem - so much so that we did not even notice it. The same will be true for fixing the regression. |
@gleb-cloudius we don't know that it did not cause damage. Also, we have more users now, so there is more potential for problems. |
@avikivity we know, because otherwise we would have noticed the bug. The "more users" argument is hard to dispute, but it is not like 2 years ago we had a small number either. Read repair for affected rows will not kill anyone, and if repair uses the same hashing function it is likely read repair will happen anyway, because data will be different after 2 years of being neglected. If the schema digest may be different this is a more serious problem, but can it be? |
More users = more chances the problem will surface. I don't want to risk ruining someone's day during an upgrade, which is already stressful. I think it can happen during schema digest calculation. Why wouldn't it? |
Read repair will not ruin anyone's day. If it does, they will have the bad day regardless, because the data is out of sync after we neglected to repair it for two years. For the problem to happen a cell has to be missing. Is it possible for schema tables? Maybe it is and we got lucky and nobody hit the problem when upgrading to the broken version. |
Note that nodetool repair is not prone to this bug, it doesn't use this function. |
On Tue, Sep 08, 2020 at 05:43:49AM -0700, Tomasz Grabiec wrote:
> Note that nodetool repair is not prone to this bug, it doesn't use this function.
That's great. It means no data was lost due to it.
|
Schema digest is also not prone. |
I have a partially working work-in-progress series based on Paweł's commits (https://github.com/psarna/scylla/tree/fix_ignoring_cells_after_null_in_appending_hash) which adds a cluster feature for this, but it's quite tough to decide where to check for this feature bit... The code in this branch is ugly and doesn't work well with our test suite, and I don't have any clear idea how to approach it, so suggestions are welcome. To sum up - the cluster feature check needs to be placed somewhere, and I don't see a good place for it. In my patch I placed the check inside … /cc @avikivity |
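Roughly, the check in question would look something like the sketch below; the feature-check function and the mode names are placeholders, not the real ScyllaDB API:

// Hypothetical sketch: switch to the fixed row digest only once every node in
// the cluster advertises the corresponding feature bit, so that nodes running
// old and new code never compare digests computed differently.
bool cluster_supports_digest_insensitive_to_null_cells();   // placeholder for the gossip feature check

enum class row_digest_mode {
    legacy,     // stop hashing at the first null cell (old, buggy behaviour)
    null_fixed, // skip null cells and keep hashing (fixed behaviour)
};

row_digest_mode current_row_digest_mode() {
    return cluster_supports_digest_insensitive_to_null_cells()
        ? row_digest_mode::null_fixed
        : row_digest_mode::legacy;
}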
I suggested before to model it as a hash function change: add digest_algorithm::xxHash_with_null_fix, then specialize on the hasher. I see there is also a digester class that type-erases the algorithm, so maybe it's not so easy to specialize. |
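A sketch of that idea with illustrative names only - the enum values, the hasher structs, and the ignores_nulls_v trait are placeholders, not the actual ScyllaDB definitions:

// Hypothetical sketch of "model it as a hash function change": a new enum
// value and a trait specialized on the hasher type, so the legacy and fixed
// digests can coexist while the cluster is mid-upgrade.
#include <cstdint>
#include <optional>
#include <vector>

enum class digest_algorithm : uint8_t {
    xxHash,                // legacy digest, stops at the first null cell
    xxHash_with_null_fix,  // same hash function, but nulls no longer end the row
};

struct xx_hasher { void update(int64_t v) { state = state * 31 + static_cast<uint64_t>(v); } uint64_t state = 0; };
struct xx_hasher_with_null_fix : xx_hasher {};

// Trait specialized on the hasher: only the "fixed" hasher keeps hashing past a null.
template <typename Hasher> inline constexpr bool ignores_nulls_v = false;
template <> inline constexpr bool ignores_nulls_v<xx_hasher_with_null_fix> = true;

template <typename Hasher>
void feed_row(Hasher& h, const std::vector<std::optional<int64_t>>& cells) {
    for (const auto& c : cells) {
        if (!c) {
            if constexpr (ignores_nulls_v<Hasher>) {
                continue;   // fixed digest: skip the null cell
            } else {
                return;     // legacy digest: reproduce the old behaviour
            }
        }
        h.update(*c);
    }
}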
" This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes #4567 Based on #4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for #4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible
" This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes #4567 Based on #4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for #4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible
@psarna how does this interact with the cached hashes? I think it shouldn't, since the cached hashes are at the cell level. My concern is that hashes generated with one hash function will be used with another hash function. Are deleted cells and never-written cells treated the same way? |
No idea, I'll browse the code and try to verify |
From what I see here, deleted and never-written cells are not treated the same way. Hashes are cached for …
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell, const column_definition& cdef) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cdef.is_counter()) {
counter_cell_view::with_linearized(cell, [&] (counter_cell_view ccv) {
::feed_hash(h, ccv);
});
return;
}
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
}
feed_hash(h, cell.value());
} else {
feed_hash(h, cell.deletion_time());
}
}
}; |
Is it correct to use default_hasher here? Shouldn't we use …? Even more so in this variant: …
Although xxhash is compatible with legacy_xx_hasher_without_null_digest below the row level. So it's just a code wart, not a correctness problem. |
And that's correct, since we'd want deleted cells with different deletion times to trigger a read repair. This can cause an unnecessary read repair if a tombstone is hashed in one replica and nothing in another, because the tombstone was expired and garbage collected. But I see nothing we can do about it. |
Sounds right. I'll send a patch to remove the wart(s). |
@roydahan we need feedback on this fix, especially during upgrades. |
@avikivity I'll need more information about this. |
Suppose you have a row with columns pk, ck, c1, c2, c3 … The risk is that during the upgrade we'd see a spike in read repairs. So we need a test that runs a QUORUM read workload before, during, and after the upgrade, and checks for read repairs. A few read repairs may be initiated after the upgrade, but there should not be any during the upgrade, nor a large number afterwards. The test should run on a few million rows, reading, say, 1000 rows/sec. It can be manual, no need to repeat it. |
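As a sketch of that acceptance check, assuming the three fields below are snapshots of the cluster-wide read-repair counter taken at each phase (how the counter is actually scraped is left out):

// Hypothetical sketch of the pass/fail criterion for the upgrade test:
// the read-repair counter must stay flat while the cluster runs mixed
// versions and may only rise slightly once the upgrade is complete.
#include <cassert>
#include <cstdint>

struct upgrade_observation {
    uint64_t before;   // counter snapshot before the rolling upgrade starts
    uint64_t during;   // counter snapshot right after the last node is upgraded
    uint64_t after;    // counter snapshot after the post-upgrade soak period
};

void check_no_read_repair_spike(const upgrade_observation& o, uint64_t allowed_after_upgrade) {
    // No read repairs should be initiated while nodes run mixed versions...
    assert(o.during == o.before);
    // ...and only a small number may appear once the whole cluster is upgraded.
    assert(o.after - o.during <= allowed_after_upgrade);
}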
Please backport to 4.2 |
" This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes #4567 Based on #4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for #4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible (cherry picked from commit 0e03c97)
@roydahan Did you finish testing this? |
I've done the test Avi suggested, but I can't really verify that I had the case where the replicas differ from each other after the NULL. In any case, it looks like it doesn't break anything. |
Backported to 4.2. Let's wait with additional backports till it matures in the field. |
" This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes #4567 Based on #4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for #4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible (cherry picked from commit 0e03c97)
Backported to 4.1. |
* 'master' of github.com:scylladb/scylla: (28 commits)
  scylla-gdb.py: add scylla schema command
  scylla-gdb.py: add pretty-printer for bytes
  scylla-gdb.py: don't use the string display hint for UUIDs
  alternator test: fix two tests that failed in HTTPS mode
  alternator test: add reproducing tests for several issues
  test: extend mutation_test for NULL values
  tests/mutation: add reproducer for scylladb#4567
  gms: add a cluster feature for fixed hashing
  digest: add null values to row digest
  mutation_partition: fix formatting
  appending_hash<row>: make publicly visible
  tools: toolchain: update for gnutls-3.6.15
  storage_service: Fix use-after-free when calculating effective ownership
  cql3: Fix NULL reference in get_column_defs_for_filtering
  storage_service: Fix a TOKENS update race for replace operation
  Add support passing python3 dependencies from main repo to scylla-python3 script
  Update seastar submodule
  dist/common/scripts: abort scylla_prepare with better error message
  clustering_row: Do not re-implement deletable_row
  deletable_row: Do not mess with clustering_row
  ...
" This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes scylladb#4567 Based on scylladb#4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for scylladb#4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible (cherry picked from commit 0e03c97)
Scylla version (or git commit hash): d773e4b
appending_hash<row> is supposed to update the provided hasher with information from the selected columns. However, the loop that is supposed to do that ends early if a requested column does not exist. As a result, a row hash will not include information about the cells after the first missing one.