CDC: reactor stalled up to 17000 ms after change cdc log table property #6098
Comments
Those stalls are in the cache update when a memtable is flushed. The good thing is I'm finishing a PR that will make CDC Log not use the cache. That should fix the issue.
…g CDC Log' from Piotr

We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them.

This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR.

This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry.

Fixes #6098
Fixes #6146

Tests: unit(dev, release), manual

* haaawk-dont_cache_cdc:
  cdc: Don't cache CDC Log table
  table: invalidate disabled cache on memtable flush
  table: Add cache_enabled member function
  cf_prop_defs: persist caching_options in schema
  feature: add PER_TABLE_CACHING feature
  caching_options: add enabled parameter
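For illustration only, the `enabled` caching parameter described in the merge message could presumably be set on a regular table along these lines. This is a sketch, not taken from the PR: it assumes `enabled` is accepted in the same map syntax as the other `caching` keys, and it reuses the base table name from this issue purely as an example.

```cql
-- Assumption: 'enabled' is passed as a string value inside the caching map,
-- like the existing 'keys' and 'rows_per_partition' entries.
ALTER TABLE cdc_test.test_table_preimage
    WITH caching = {'enabled': 'false'};

-- Restore the default behavior (cache in use).
ALTER TABLE cdc_test.test_table_preimage
    WITH caching = {'enabled': 'true'};
```

According to the merge message, the CDC Log table gets caching disabled automatically, so a statement like this would only matter for other tables.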
It fixed the direct cause, but it can still happen on another table. I see it is a duplicate of #2577. And indeed the title says that a property was changed.
cdc=experimental, so not backporting.
@avikivity #2577 is still open, should we close it?
@tzach no. This issue was fixed by avoiding the cache. The underlying problem still exists in the cache.
Installation details
Scylla version (or git commit hash):
version 666.development-0.20200325.9fee712d62 with build-id 0908e1527b1f134a4e4a46ca44ffe85f0f877318
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-045e69a7ca1cb853f
The cluster was configured with the CDC feature. CDC preimage was enabled for the base table cdc_test.test_table_preimage. After several minutes, a property of the CDC log table was changed:
ALTER TABLE cdc_test.test_table_preimage_scylla_cdc_log WITH dclocal_read_repair_chance = 0.5
After that, the following messages were found in the log on several nodes,
and reactor stalls reached up to 17000 ms:
DB nodes log link:
https://cloudius-jenkins-test.s3.amazonaws.com/77890b52-9030-4a13-95d8-42764428086a/20200326_174910/db-cluster-77890b52.zip