
restart_node_with_resharding nemesis seastar::semaphore_timed_out (Semaphore timedout) #4615

Closed
ShlomiBalalis opened this issue Jun 27, 2019 · 21 comments


@ShlomiBalalis ShlomiBalalis commented Jun 27, 2019

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): 3.1.0.rc2-0.20190624.433cb93f7
Cluster size: 5
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0ac07c9630d9e4eea (Ireland)

Platform (physical/VM/cloud instance type/docker): i3.4xlarge

During the restart_node_with_resharding nemesis, all of the target node's shards returned the error "semaphore_timed_out":

Jun 26 13:36:25 ip-10-0-167-66.eu-west-1.compute.internal scylla.bin[17945]:  [shard 6] storage_proxy - Failed to apply mutation from 10.0.132.172#6: seastar::semaphore_timed_out (Semaphore timedout)

journal.log
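
A minimal sketch (illustrative only; the script name and regex are mine, assuming the journald line format quoted above) of how one could confirm from the attached journal.log that every shard reported the timeout:

import re
import sys
from collections import Counter

# Matches journald lines like the one quoted above:
# "... scylla.bin[17945]:  [shard 6] storage_proxy - Failed to apply mutation ... seastar::semaphore_timed_out ..."
PATTERN = re.compile(r"\[shard (\d+)\] storage_proxy - .*semaphore_timed_out")

def count_per_shard(path):
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                counts[int(match.group(1))] += 1
    return counts

if __name__ == "__main__":
    for shard, total in sorted(count_per_shard(sys.argv[1]).items()):
        print(f"shard {shard}: {total} semaphore_timed_out errors")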

@juliayakovlev juliayakovlev commented Jun 27, 2019

The same error occurred during longevity-mv-si-4days while running the destroy_data_then_rebuild nemesis. It was reported from the node whose data was rebuilt. The error happens both during the destroy/rebuild and after the node has come back up.

Same AMI/version as in the issue description.

Test runs cassandra-stress scenarios:

"cassandra-stress user profile=/tmp/c-s_profile_4mv_5queries.yaml ops'(insert=15,read1=1,read2=1,read3=1,read4=1,read5=1)' cl=QUORUM duration=5760m -port jmx=6868 -mode cql3 native -rate threads=10",

"cassandra-stress user profile=/tmp/c-s_profile_2mv_2queries.yaml ops'(insert=6,mv_p_read1=1,mv_p_read2=1)' cl=QUORUM duration=5760m -port jmx=6868 -mode cql3 native -rate threads=10",

"cassandra-stress user profile=/tmp/c-s_profile_3si_5queries.yaml ops'(insert=25,si_read1=1,si_read2=1,si_read3=1,si_read4=1,si_read5=1)' cl=QUORUM duration=5760m -port jmx=6868 -mode cql3 native -rate threads=10",

"cassandra-stress user profile=/tmp/c-s_profile_2si_2queries.yaml ops'(insert=10,si_p_read1=1,si_p_read2=1)' cl=QUORUM duration=5760m -port jmx=6868 -mode cql3 native -rate threads=10"

Cassandra stress failed with read timeouts:

2019-06-26 13:10:51,668 remote           L0682 INFO | RemoteCmdRunner [centos@10.0.153.255]: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 2 replica responded)
2019-06-26 13:10:51,708 remote           L0682 INFO | RemoteCmdRunner [centos@10.0.153.255]: java.io.IOException: Operation x10 on key(s) [3012032]: Error executing: (ReadTimeoutException): Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 2 replica responded)

Timeouts started during the nemesis and c-s failed.

Screenshot from 2019-06-27 13-05-53

On the attached screenshot, the blue area is the destroy_data_then_rebuild nemesis, which started at 13:08.
The load failed at 13:11:30.

Logs: https://cloudius-jenkins-test.s3.amazonaws.com/90f89887-310f-482f-9e1e-e132a493cc66/job-2019-06-26T12.18-ae93006.zip

@bentsi bentsi commented Jun 27, 2019

The same issue happened in the 2-parallel-nemesis longevity, when starting c-s threads with different compressions. The nemesis running when the issue occurred was "nodetool repair".

Events:


2019-06-26 21:38:30.991 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-1 [34.245.1.134 \| 10.0.38.18] (seed: False) stress_cmd=cassandra-stress mixed         cl=QUORUM duration=10080m -schema keyspace=keyspace1 'replication(factor=3)                               compaction(strategy=LeveledCompactionStrategy)'    -port jmx=6868 -mode cql3 native   user=cassandra password=cassandra -rate threads=20 -pop seq=1..1100200300  -log interval=5 -col 'size=FIXED(1024) n=FIXED(1)' -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:39:01.151 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-2 [34.241.106.112 \| 10.0.189.132] (seed: False) stress_cmd=cassandra-stress mixed         cl=QUORUM duration=10080m -schema keyspace=keyspace1 'replication(factor=3)                               compaction(strategy=LeveledCompactionStrategy)'    -port jmx=6868 -mode cql3 native   user=cassandra password=cassandra -rate threads=20 -pop seq=1..1100200300  -log interval=5 -col 'size=FIXED(1024) n=FIXED(1)' -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:39:11.169 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-1 [34.245.1.134 \| 10.0.38.18] (seed: False) stress_cmd=cassandra-stress write         cl=QUORUM duration=10010m -schema keyspace=keyspace1 'replication(factor=3) compression=LZ4Compressor     compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=lz4                    user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:39:41.200 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-2 [34.241.106.112 \| 10.0.189.132] (seed: False) stress_cmd=cassandra-stress write         cl=QUORUM duration=10010m -schema keyspace=keyspace1 'replication(factor=3) compression=LZ4Compressor     compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=lz4                    user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:39:51.978 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-1 [34.245.1.134 \| 10.0.38.18] (seed: False) stress_cmd=cassandra-stress write         cl=QUORUM duration=10020m -schema keyspace=keyspace1 'replication(factor=3) compression=SnappyCompressor  compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=snappy                 user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:40:21.981 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-2 [34.241.106.112 \| 10.0.189.132] (seed: False) stress_cmd=cassandra-stress write         cl=QUORUM duration=10020m -schema keyspace=keyspace1 'replication(factor=3) compression=SnappyCompressor  compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=snappy                 user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:40:32.328 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-1 [34.245.1.134 \| 10.0.38.18] (seed: False) stress_cmd=cassandra-stress write         cl=QUORUM duration=10030m -schema keyspace=keyspace1 'replication(factor=3) compression=DeflateCompressor compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none                   user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -node 10.0.184.222 | CassandraStressEvent, NORMAL, events, start
2019-06-26 21:40:45.902 | (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=45618 node=Node longevity-tls-1tb-7d-1dis-2nondis-1-db-node-7bc14f28-5 [54.171.151.159 \| 10.0.135.12] (seed: False) Jun 26 21:40:24 ip-10-0-135-12.eu-west-1.compute.internal scylla.bin[6507]:  [shard 7] storage_proxy - Exception when communicating with 10.0.135.12: seastar::semaphore_timed_out (Semaphore timedout) | DatabaseLogEvent, CRITICAL, events, DATABASE_ERROR
2019-06-26 21:41:02.390 | (CassandraStressEvent Severity.NORMAL): type=start node=Node longevity-tls-1tb-7d-1dis-2nondis-1-loader-node-7bc14f28-2 [34.241.106.112 \| 10.0.189.132] (seed: False) stress_cmd=cassandra-stress write         cl=QUORUM duration=10030m -schema keyspace=keyspace1 'replication(factor=3) compression=DeflateCompressor compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none                   user=cassandra password=cassandra -rate threads=50 -pop seq=1..50000000    -log interval=5 -node 10.0.184.222
2019-06-26 21:41:48.831 | (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=47567 node=Node longevity-tls-1tb-7d-1dis-2nondis-1-db-node-7bc14f28-5 [54.171.151.159 \| 10.0.135.12] (seed: False) Jun 26 21:41:10 ip-10-0-135-12.eu-west-1.compute.internal scylla.bin[6507]:  [shard 7] storage_proxy - Exception when communicating with 10.0.135.12: seastar::semaphore_timed_out (Semaphore timedout) | DatabaseLogEvent, CRITICAL, events, DATABASE_ERROR
2019-06-26 21:41:48.841 | (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=47651 node=Node longevity-tls-1tb-7d-1dis-2nondis-1-db-node-7bc14f28-5 [54.171.151.159 \| 10.0.135.12] (seed: False) Jun 26 21:41:15 ip-10-0-135-12.eu-west-1.compute.internal scylla.bin[6507]:  [shard 8] storage_proxy - Exception when communicating with 10.0.135.12: seastar::semaphore_timed_out (Semaphore timedout) | DatabaseLogEvent, CRITICAL, events, DATABASE_ERROR
2019-06-26 21:41:48.850 | (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=47733 node=Node longevity-tls-1tb-7d-1dis-2nondis-1-db-node-7bc14f28-5 [54.171.151.159 \| 10.0.135.12] (seed: False) Jun 26 21:41:18 ip-10-0-135-12.eu-west-1.compute.internal scylla.bin[6507]:  [shard 12] storage_proxy - Exception when communicating with 10.0.135.12: seastar::semaphore_timed_out (Semaphore timedout) | DatabaseLogEvent, CRITICAL, events, DATABASE_ERROR
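
(Illustrative aside: a small sketch, assuming the pipe-delimited SCT event dump above is saved to a file, for filtering out just the Severity.CRITICAL events to see how they cluster relative to the stress starts.)

import re
import sys

# Matches event lines like:
# "2019-06-26 21:40:45.902 | (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR ..."
EVENT = re.compile(r"^(?P<ts>[0-9:. -]+)\|\s*\((?P<kind>\w+) Severity\.(?P<severity>\w+)\)")

def critical_events(path):
    with open(path, errors="replace") as events:
        for line in events:
            match = EVENT.match(line.strip())
            if match and match.group("severity") == "CRITICAL":
                yield match.group("ts").strip(), match.group("kind"), line.strip()

if __name__ == "__main__":
    for ts, kind, line in critical_events(sys.argv[1]):
        print(f"{ts}  {kind}  {line[:160]}")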
@slivne slivne added this to the 3.1 milestone Jun 27, 2019
@slivne slivne added high bug labels Jun 27, 2019
@slivne slivne commented Jun 30, 2019

@bentsi any reproducer so we can check what's going on?

@juliayakovlev juliayakovlev commented Jun 30, 2019

It happened again during the MV-SI test.
The "semaphore_timed_out" messages started during the hard reboot nemesis.
Later we also found a lot of "std::runtime_error (sstable read queue overloaded)" errors:

[shard 10] storage_proxy - Exception when communicating with 10.0.179.136: std::runtime_error (sstable read queue overloaded)

Jenkins job: job log

@fruch fruch commented Jul 1, 2019

This is also happening in the 200gb longevity, after the major_compaction or restart_and_repair nemesis.

@slivne slivne commented Jul 8, 2019

@roydahan / @bentsi / @fruch - AFAIK you have run 3.0.8 longevities as well - did this reproduce? If not, please give Avi access and let's have him look at the two runs side by side to see if he can extract more info from a comparison.

@slivne slivne commented Jul 8, 2019

@bentsi please work with Avi on this

@roydahan roydahan commented Jul 8, 2019

We don't have the 3.0 monitoring anymore. @bentsi is running parallel longevities that should easily reproduce it, one with 3.1 and one with 3.0.

@slivne slivne commented Jul 9, 2019

@bentsi find Avi with the info

@bentsi bentsi commented Jul 9, 2019

3.0.8: the run was unable to finish; stress failed due to issue #4668.
3.1: the longevity is still running, but the issue didn't reproduce.

@slivne slivne commented Jul 15, 2019

@bentsi any update? If not, let's close - we can always reopen if you hit this again.

@bentsi bentsi commented Jul 15, 2019

I am running another longevity with parallel nemesis; I will update tomorrow if it happened.

@bentsi bentsi commented Jul 17, 2019

The issue wasn't reproduced in the last run of the multiple-nemesis longevity on 3.1.0.rc2-0.20190713.92bf92817, so I think we can close this.

@bhalevy bhalevy closed this Jul 18, 2019
@juliayakovlev juliayakovlev commented Aug 19, 2019

The issue has reproduced in the MV-SI longevity.
Scylla version: 3.1.0.rc3-0.20190816.d06bcef3b
Jenkins job:

https://jenkins.scylladb.com/view/scylla-3.1/job/scylla-3.1/view/scylla-3.1-test/job/longevity/job/longevity-mv-si-4days/8/consoleFull

Monitor IP: 34.201.51.128

@juliayakovlev juliayakovlev commented Aug 20, 2019

@bhalevy Benny, the issue has reproduced in 3.1. Please see my comment above.

@bhalevy bhalevy reopened this Aug 20, 2019
@yarongilor yarongilor commented Aug 20, 2019

Reproduced in the 3.1.0.rc3-0.20190816.d06bcef3b 50GB longevity as well.
Received the following error:

[35.175.151.33 | 172.30.0.154] (seed: True)
2019-08-19T19:54:19+00:00 ip-172-30-0-154 !WARNING | scylla.bin: [shard 1] storage_proxy - Failed to apply mutation from 172.30.0.74#1: seastar::semaphore_timed_out (Semaphore timedout)
@bentsi bentsi commented Aug 22, 2019

Happened to me during a parallel nemesis run (3.1.0.rc3-0.20190818.d06bcef3b):
Grafana dashboards:

@avikivity avikivity commented Aug 22, 2019

Fixed by 2c74354

@avikivity avikivity closed this Aug 22, 2019
avikivity added a commit that referenced this issue Aug 22, 2019
…" from Piotr

"
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.

Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}

Tests: unit(dev)
"

Fixes #4615.

* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
  db,view: wrap view update generation in stream scheduling group
  database: assign proper io priority for streaming view updates

(cherry picked from commit 2c74354)
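
For anyone re-checking that verification, a minimal sketch (assuming the default Prometheus endpoint on port 9180; the script and its parsing are illustrative, not part of the fix) that sums the per-class io-queue byte counters named in the commit message:

import sys
import urllib.request

METRIC = "scylla_io_queue_total_bytes"

def totals_per_class(node):
    # Assumption: the node exposes Prometheus metrics on the default port 9180,
    # without TLS in front of it.
    totals = {}
    with urllib.request.urlopen(f"http://{node}:9180/metrics") as response:
        for raw in response:
            line = raw.decode()
            if not line.startswith(METRIC + "{") or 'class="' not in line:
                continue
            # e.g. scylla_io_queue_total_bytes{class="streaming_write",shard="3"} 123456
            labels, value = line.rsplit(" ", 1)
            io_class = labels.split('class="', 1)[1].split('"', 1)[0]
            totals[io_class] = totals.get(io_class, 0.0) + float(value)
    return totals

if __name__ == "__main__":
    for io_class, total in sorted(totals_per_class(sys.argv[1]).items()):
        print(f"{io_class:>20}: {total / 1e9:8.2f} GB")

Comparing the "streaming_write" and "query" rows before and after the fix is one rough way to see whether view-update streaming still rides on the user-write priority class.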
@avikivity avikivity commented Aug 22, 2019

The issue also exists in 3.0. Why didn't 3.0 testing catch it?

@bentsi bentsi commented Aug 22, 2019

In 3.0 we didn't have SCT events, so it was very hard to analyze the logs (especially since the issue didn't cause Scylla to produce a coredump or a c-s crash).

psarna added a commit to psarna/scylla that referenced this issue Dec 3, 2019
Generating view updates can be a source of high read amplification,
since updates may need to perform read-before-write.
After ensuring that readers participating in this read-before-write
process use `bypass_cache`, the read amplification from a test case:

1. Creating an index
  CREATE INDEX index1  ON myks2.standard1 ("C1")
2. Running cassandra-stress in order to generate view updates
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1

... dropped to 1.5GB, down from quite astonishing 36GB (sic!)
from before the change.

Fixes scylladb#4615
psarna added a commit to psarna/scylla that referenced this issue Dec 4, 2019
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs scylladb#5409
Fixes scylladb#4615
Fixes scylladb#5418
tgrabiec added a commit that referenced this issue Dec 4, 2019
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418
@avikivity avikivity commented Dec 5, 2019

Backported to 3.0, 3.1, 3.2.
