Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache populating read concurrent with schema alter may use the wrong schema version to interpret sstable data #5127

Closed
tgrabiec opened this issue Oct 1, 2019 · 5 comments
Assignees
Milestone

Comments

@tgrabiec
Copy link
Contributor

@tgrabiec tgrabiec commented Oct 1, 2019

Installation details
Scylla version (or git commit hash): 2.3+

The sstable reader which populates the partition entry in the cache is using the schema of the partition entry snapshot, which will be the schema of the cache at the time the partition was entered. If there was a schema change after the cache reader entered the partition but before it created the sstable reader, the cache populating reader will interpret sstable fragments using the wrong schema version. That is more likely if partitions have many rows, and the front of the partition is populated. With single-row partitions that's unlikely to happen.

That is undefined behavior in general, which may include:

  • read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size
  • wrong read results
  • node crash

This doesn't result in a permanent corruption, restarting the node should help.

@tgrabiec tgrabiec added the bug label Oct 1, 2019
@tgrabiec tgrabiec self-assigned this Oct 1, 2019
@denesb

This comment has been minimized.

Copy link
Contributor

@denesb denesb commented Oct 1, 2019

Won't this only manifest if sstables were flushed after the schema change? In general the fragments read by the reader will originate from sstables flushed before the schema change, and hence there won't be any problem, no?

@tgrabiec

This comment has been minimized.

Copy link
Contributor Author

@tgrabiec tgrabiec commented Oct 1, 2019

@denesb We're trying to interpret a clustering_row created with schema version v1 using schema version v2. Even if nothing changed in sstables between v1 and v2, that's undefined behavior. You may interpret data for column A as belonging to column B.

@denesb

This comment has been minimized.

Copy link
Contributor

@denesb denesb commented Oct 1, 2019

@roydahan

This comment has been minimized.

Copy link

@roydahan roydahan commented Oct 3, 2019

@tgrabiec what kind of schema changes can cause the issue?

@tgrabiec

This comment has been minimized.

Copy link
Contributor Author

@tgrabiec tgrabiec commented Oct 3, 2019

@roydahan ALTER TABLE { ADD | DROP }

tgrabiec added a commit to tgrabiec/scylla that referenced this issue Oct 4, 2019
avikivity added a commit that referenced this issue Oct 7, 2019
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()
@slivne slivne added this to the 3.2 milestone Oct 7, 2019
tgrabiec added a commit that referenced this issue Oct 18, 2019
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()

(cherry picked from commit 8ed6f94)
tgrabiec added a commit that referenced this issue Oct 23, 2019
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()

(cherry picked from commit 8ed6f94)
(cherry picked from commit 3f4d9f2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.