LWT - Data population takes significantly longer than before #9331

Closed
roydahan opened this issue Sep 13, 2021 · 260 comments

@roydahan

roydahan commented Sep 13, 2021

Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.5.rc7-0.20210906.edead1caf
Cluster size: 4 nodes (i3.4xlarge)
Scylla running with shards number (live nodes):
longevity-lwt-500G-3d-4-5-db-node-2447519d-1 (63.33.190.129 | 10.0.0.59): 14 shards
longevity-lwt-500G-3d-4-5-db-node-2447519d-2 (34.243.2.11 | 10.0.3.58): 14 shards
longevity-lwt-500G-3d-4-5-db-node-2447519d-3 (54.74.79.228 | 10.0.1.88): 14 shards
longevity-lwt-500G-3d-4-5-db-node-2447519d-4 (54.216.64.60 | 10.0.3.78): 14 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0e0abb5a45374673b (aws: eu-west-1)

Test: longevity-lwt-500G-3d-test
Test name: longevity_lwt_test.LWTLongevityTest.test_lwt_longevity
Test config file(s):

Issue description

====================================
This long LWT longevity test tries to write 400,000,000 keys (about 500 GB of data) using a c-s "user profile" (see below).
The tool is run from 3 different loaders, each writing one third of the dataset.
Writing the entire dataset should take around 8 hours (based on previous runs of this test).

In this case, the data population was unable to complete during the entire test (3 days).
The write throughput decreases steadily during the first few hours until it reaches a very low op/s rate.

====================================

Live monitor: http://34.252.214.230:3000/d/dGxHpZSnz/longevity-lwt-500g-3d-test-scylla-per-server-metrics-nemesis-master?orgId=1&from=1631032607724&to=1631169036853
(Restore Monitor Stack command: $ hydra investigate show-monitor 2447519d-8e50-4bcc-9f14-c9046cb649f1)
screenshot: https://snapshot.raintank.io/dashboard/snapshot/yLJrkc9UvaQGg1fUTDB5IReSWskowZ3v

A monitor for 4.4 run: http://44.193.79.222:3000/d/KWzIAbI7k/longevity-lwt-500g-3d-test-scylla-per-server-metrics-nemesis-master?orgId=1&from=1615474723261&to=1615524652826
(Restore Monitor Stack command: $ hydra investigate show-monitor 79499b7f-2c8f-45ad-8292-3f137c85e025)
Screenshot: https://cloudius-jenkins-test.s3.amazonaws.com/79499b7f-2c8f-45ad-8292-3f137c85e025/20210312_030749/grafana-screenshot-longevity-lwt-500g-3d-test-scylla-per-server-metrics-nemesis-20210312_031129-longevity-lwt-500G-3d-4-4-monitor-node-79499b7f-1.png

Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/2447519d-8e50-4bcc-9f14-c9046cb649f1/20210910_191248/grafana-screenshot-longevity-lwt-500g-3d-test-scylla-per-server-metrics-nemesis-20210910_191711-longevity-lwt-500G-3d-4-5-monitor-node-2447519d-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/2447519d-8e50-4bcc-9f14-c9046cb649f1/20210910_192216/db-cluster-2447519d.tar.gz
events - https://cloudius-jenkins-test.s3.amazonaws.com/2447519d-8e50-4bcc-9f14-c9046cb649f1/20210910_192216/events.log.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/2447519d-8e50-4bcc-9f14-c9046cb649f1/20210910_192216/loader-set-2447519d.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/2447519d-8e50-4bcc-9f14-c9046cb649f1/20210910_192216/monitor-set-2447519d.tar.gz
events - https://cloudius-jenkins-test.s3.amazonaws.com/2447519d-8e50-4bcc-9f14-c9046cb649f1/20210910_192216/raw_events.log.tar.gz

Show all stored logs command: $ hydra investigate show-logs 2447519d-8e50-4bcc-9f14-c9046cb649f1

Jenkins job URL

The load is defined by:

prepare_write_cmd: ["cassandra-stress user profile=/tmp/c-s_lwt_big_data.yaml n=133333333 ops'(insert_query=1)' cl=QUORUM -mode native cql3 -rate threads=100 -pop seq=1..133333333",
                    "cassandra-stress user profile=/tmp/c-s_lwt_big_data.yaml n=133333333 ops'(insert_query=1)' cl=QUORUM -mode native cql3 -rate threads=100 -pop seq=133333334..266666666",
                    "cassandra-stress user profile=/tmp/c-s_lwt_big_data.yaml n=133333333 ops'(insert_query=1)' cl=QUORUM -mode native cql3 -rate threads=100 -pop seq=266666667..400000000"
                   ]
where c-s_lwt_big_data.yaml is:

# LWT test: create and update data using LWT.
### DML ###

# Keyspace Name
keyspace: cqlstress_lwt_example

# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE IF NOT EXISTS cqlstress_lwt_example WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

# Table name
table: blogposts

# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition: |
  CREATE TABLE IF NOT EXISTS blogposts (
        domain int,
        published_date int,
        lwt_indicator int,
        url blob,
        url1 blob,
        url2 blob,
        url3 blob,
        url4 blob,
        author text,
        title text,
        c ascii,
        d varchar,
        e boolean,
        f inet,
        g int,
        h bigint,
        char ascii,
        vchar varchar,
        finished boolean,
        osirisjson blob,
        ip inet,
        size int,
        imagebase bigint,
        small_num smallint,
        tiny_num tinyint,
        vint varint,
        decimal_num decimal,
        double_num double,
        float_num float,
        start_date date,
        start_time time,
        check_date timestamp,
        pass_date timestamp,
        upload_time timeuuid,
        user_uid uuid,
        name_set set<text>,
        name_list list<text>,
        PRIMARY KEY(domain, published_date)
  ) WITH compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'

extra_definitions:
  - create MATERIALIZED VIEW blogposts_update_one_column_lwt_indicator as select domain, lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator >=-2000 and lwt_indicator < 0 PRIMARY KEY(lwt_indicator, domain, published_date);
  - create MATERIALIZED VIEW blogposts_update_one_column_lwt_indicator_after_update as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator = 30000000 PRIMARY KEY(lwt_indicator, domain, published_date);
  - create MATERIALIZED VIEW blogposts_update_2_columns_lwt_indicator as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 0 and lwt_indicator <= 2000 PRIMARY KEY(lwt_indicator, domain, published_date);
  - create MATERIALIZED VIEW blogposts_update_2_columns_lwt_indicator_after_update as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator = 20000000 PRIMARY KEY(lwt_indicator, domain, published_date);
  - create MATERIALIZED VIEW blogposts_not_updated_lwt_indicator as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 1000000 and lwt_indicator < 20000000 PRIMARY KEY(lwt_indicator, domain, published_date);
  - create MATERIALIZED VIEW blogposts_deletions as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 10000 and lwt_indicator < 100000 PRIMARY KEY(lwt_indicator, domain, published_date);

### Column Distribution Specifications ###

columnspec:
  - name: domain
    population: seq(1..500000000)  #500M possible domains to pick from

  - name: published_date
    cluster: uniform(1..2000)         #under each domain we will have max 2000 posts

  - name: lwt_indicator
    population: seq(1..1001000)

  - name: url
    size: fixed(200)

  - name: url1
    size: fixed(200)

  - name: url2
    size: fixed(200)

  - name: url3
    size: fixed(200)

  - name: url4
    size: fixed(200)

  - name: title                  #titles shouldn't go beyond 20 chars
    size: gaussian(10..20)

  - name: author
    size: uniform(5..20)         #author names should be short

### Batch Ratio Distribution Specifications ###

insert:
  partitions: fixed(1)            # Our partition key is the domain so only insert one per batch

  select:    fixed(1)/2000        # We have 2000 posts per domain so 1/2000 will allow 1 post per batch

  batchtype: UNLOGGED             # Unlogged batches


#
# A list of queries you wish to run against the schema
#
queries:
   insert_query:
      cql: insert into blogposts(domain, published_date, lwt_indicator,  url, url1, url2, url3, url4, author, title, c, d, e, f, g, h, char, vchar, finished, osirisjson, ip, size, imagebase, small_num, tiny_num, vint, decimal_num, double_num, float_num, start_date, start_time, check_date, pass_date, upload_time, user_uid, name_set, name_list) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) if not exists
      fields: samerow
   select:
      cql: select * from blogposts where domain = ? LIMIT 1
      fields: samerow
   lwt_update_one_column:
      cql: update blogposts set lwt_indicator = 30000000 where domain = ? and published_date = ? if lwt_indicator >=-2000 and lwt_indicator < 0
      fields: samerow
   lwt_update_two_columns:
      cql: update blogposts set lwt_indicator = 20000000, author = 'text' where domain = ? and published_date = ? if lwt_indicator > 0 and lwt_indicator <= 2000 and author != 'text'
      fields: samerow
   lwt_deletes:
     cql: delete from blogposts where domain = ? and published_date = ? if lwt_indicator > 10000 and lwt_indicator < 100000
     fields: samerow
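
For reference, one of the three loader invocations above can be run by hand roughly as follows (the -node contact points are illustrative, taken from the node list in the installation details; SCT normally supplies them):

```
# One of the three population loaders, run manually for reference.
# The -node list is an assumption taken from the node IPs in the description above.
cassandra-stress user profile=/tmp/c-s_lwt_big_data.yaml n=133333333 ops'(insert_query=1)' \
  cl=QUORUM -mode native cql3 -rate threads=100 -pop seq=1..133333333 \
  -node 10.0.0.59,10.0.3.58,10.0.1.88,10.0.3.78
```
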
@roydahan
Author

One observation we had from the logs is a huge amount of semaphore_timeouts, as described in #7779 (comment)

@avikivity
Member

/cc @kostja @elcallio

What commit worked? Probably related to reverting commitlog improvements.

@slivne
Contributor

slivne commented Sep 13, 2021

there are no writes blocked on commitlog

Screenshot from 2021-09-13 17-52-44

@roydahan
Author

I updated the link to the 4.4 monitor in the original description.

@slivne
Contributor

slivne commented Sep 14, 2021

@avikivity ping - please look at the 4.4 / 4.5 monitors; do you see anything pointing to the possible issue?

@slivne
Contributor

slivne commented Sep 14, 2021

@roydahan - let's run on 4.4.4 (the latest 4.4 version) and see if it reproduces there.

@bhalevy
Member

bhalevy commented Sep 14, 2021

Looking at the logs, the timeouts are happening mostly on node-1.

db-cluster-2447519d$ grep -c -- '-1 .*Operation timed out' timeouts.log 
905743
db-cluster-2447519d$ grep -c -- '-[234] .*Operation timed out' timeouts.log 
13332

@bhalevy
Member

bhalevy commented Sep 14, 2021

Here's an example of a burst of errors from the log:

2021-09-07T17:44:23+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.1.88#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:23+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.1.88:53163 msg_id 605543:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:44:23+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.1.88:53163 msg_id 605542:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:44:23+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.1.88#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:23+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla: message repeated 2 times: [  [shard 5] storage_proxy - Failed to apply mutation from 10.0.1.88#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)]
2021-09-07T17:44:24+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.1.88:53163 msg_id 605979:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:44:24+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.1.88:53163 msg_id 605982:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
...
2021-09-07T17:44:57+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.1.88#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:57+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.1.88#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:57+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.58#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact system.paxos b978bf40-1002-11ec-ba6b-caa0a3b95357] Compacted 3 sstables to [/var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-117-big-Data.db:level=0, ]. 23MB to 21MB (~91% o
f original) in 254486ms = 85kB/s. ~420608 total partitions merged to 420425.
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.58#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact system.paxos 5195a270-1003-11ec-ba6b-caa0a3b95357] Compacting [/var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-355-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data
/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-327-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-285-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-243-big-Data.db:level=0:origin=memtable, /var/
lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-341-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-229-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-397-big-Data.db:level=0:origin=
memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-299-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-313-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-257-big-Data.db:
level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-271-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-411-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-2
15-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-201-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-383-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053e
f614bb7c2d/md-369-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-453-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-439-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c
2fd0a34108c053ef614bb7c2d/md-425-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-187-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-159-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/syste
m/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-173-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-145-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-131-big-Data.db:level=0:origin=memtable, /var/lib/sc
ylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-117-big-Data.db:level=0:origin=compaction, ]
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.58:64909 msg_id 615650:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.58#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50277]: Accepted publickey for scyllaadm from 10.0.3.77 port 48506 ssh2: RSA SHA256:1YmYWh7roz4Y3FSY7N8CetmQNlK/K5FeBPITC8KF0u8
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.58#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:44:58+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.58:64909 msg_id 615658:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
...
2021-09-07T17:45:02+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_deletions 50e60180-1003-11ec-ba6b-caa0a3b95357] Compacted 4 sstables to [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_deletions-b15e6d10100111ec8481ca9da3b95357/md-257-big-Data.db:level=0, ]. 501kB to 240kB (~48% of original) in 4202ms = 57kB/s. ~2816 total partitions merged to 2415.
2021-09-07T17:45:02+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_deletions 5405b810-1003-11ec-ba6b-caa0a3b95357] Compacting [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_deletions-b15e6d10100111ec8481ca9da3b95357/md-285-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_deletions-b15e6d10100111ec8481ca9da3b95357/md-271-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_deletions-b15e6d10100111ec8481ca9da3b95357/md-257-big-Data.db:level=0:origin=compaction, ]
2021-09-07T17:45:02+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.78:53149 msg_id 641064:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:45:02+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.78:53149 msg_id 641066:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:45:02+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.78:53149 msg_id 641068:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:45:02+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.78#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:45:03+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla: message repeated 2 times: [  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.78#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)]
2021-09-07T17:45:03+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.78:53149 msg_id 641965:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:45:03+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] rpc - client 10.0.3.78:53149 msg_id 641972:  exception "Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE." in no_wait handler ignored
2021-09-07T17:45:03+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !WARNING | scylla:  [shard 5] storage_proxy - Failed to apply mutation from 10.0.3.78#5: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
2021-09-07T17:45:05+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50358]: Accepted publickey for scyllaadm from 10.0.3.77 port 48530 ssh2: RSA SHA256:1YmYWh7roz4Y3FSY7N8CetmQNlK/K5FeBPITC8KF0u8
2021-09-07T17:45:05+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50358]: pam_unix(sshd:session): session opened for user scyllaadm by (uid=0)
2021-09-07T17:45:05+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd-logind[666]: New session 421 of user scyllaadm.
2021-09-07T17:45:05+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd[1]: Started Session 421 of user scyllaadm.
2021-09-07T17:45:06+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_deletions 5405b810-1003-11ec-ba6b-caa0a3b95357] Compacted 3 sstables to [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_deletions-b15e6d10100111ec8481ca9da3b95357/md-299-big-Data.db:level=0, ]. 409kB to 241kB (~59% of original) in 4170ms = 57kB/s. ~2688 total partitions merged to 2431.
2021-09-07T17:45:07+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_update_one_column_lwt_indicator 56e928f0-1003-11ec-ba6b-caa0a3b95357] Compacting [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-229-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-61-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-215-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-89-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-145-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-103-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-75-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-131-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-47-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-117-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-201-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-187-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-173-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-159-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-33-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-19-big-Data.db:level=0:origin=memtable, ]
2021-09-07T17:45:08+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50358]: pam_unix(sshd:session): session closed for user scyllaadm
2021-09-07T17:45:08+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd[1]: session-421.scope: Succeeded.
2021-09-07T17:45:08+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd-logind[666]: Session 421 logged out. Waiting for processes to exit.
2021-09-07T17:45:08+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd-logind[666]: Removed session 421.
2021-09-07T17:45:08+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 4] compaction - [Compact system.paxos 2be77c60-1003-11ec-87a5-ca9ba3b95357] Compacted 2 sstables to [/var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-214-big-Data.db:level=0, ]. 29MB to 29MB (~98% of original) in 73422ms = 401kB/s. ~565760 total partitions merged to 565550.
2021-09-07T17:45:11+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_update_one_column_lwt_indicator 56e928f0-1003-11ec-ba6b-caa0a3b95357] Compacted 16 sstables to [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_one_column_lwt_indicator-aeec5ba0100111ec923a54b3294ab14a/md-243-big-Data.db:level=0, ]. 619kB to 140kB (~22% of original) in 3008ms = 46kB/s. ~2048 total partitions merged to 107.
2021-09-07T17:45:11+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_update_2_columns_lwt_indicator 59593e90-1003-11ec-ba6b-caa0a3b95357] Compacting [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_2_columns_lwt_indicator-af5de220100111ec8e161c3028c316f0/md-61-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_2_columns_lwt_indicator-af5de220100111ec8e161c3028c316f0/md-47-big-Data.db:level=0:origin=compaction, ]
2021-09-07T17:45:13+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50438]: Accepted publickey for scyllaadm from 10.0.3.77 port 48552 ssh2: RSA SHA256:1YmYWh7roz4Y3FSY7N8CetmQNlK/K5FeBPITC8KF0u8
2021-09-07T17:45:13+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50438]: pam_unix(sshd:session): session opened for user scyllaadm by (uid=0)
2021-09-07T17:45:13+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd-logind[666]: New session 422 of user scyllaadm.
2021-09-07T17:45:13+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd[1]: Started Session 422 of user scyllaadm.
2021-09-07T17:45:14+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact cqlstress_lwt_example.blogposts_update_2_columns_lwt_indicator 59593e90-1003-11ec-ba6b-caa0a3b95357] Compacted 2 sstables to [/var/lib/scylla/data/cqlstress_lwt_example/blogposts_update_2_columns_lwt_indicator-af5de220100111ec8e161c3028c316f0/md-75-big-Data.db:level=0, ]. 99kB to 93kB (~94% of original) in 2875ms = 32kB/s. ~256 total partitions merged to 51.
2021-09-07T17:45:15+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 5] compaction - [Compact system.paxos 5ba79b60-1003-11ec-ba6b-caa0a3b95357] Compacting [/var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-495-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-467-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d/md-509-big-Data.db:level=0:origin=memtable, ]
2021-09-07T17:45:15+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | sshd[50438]: pam_unix(sshd:session): session closed for user scyllaadm
2021-09-07T17:45:15+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd[1]: session-422.scope: Succeeded.
2021-09-07T17:45:15+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd-logind[666]: Session 422 logged out. Waiting for processes to exit.
2021-09-07T17:45:15+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | systemd-logind[666]: Removed session 422.
2021-09-07T17:45:20+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 0] compaction - [Compact cqlstress_lwt_example.blogposts 8142f230-1002-11ec-80ac-ca9ca3b95357] Compacted 4 sstables to [/var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-98-big-Data.db:level=1, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-140-big-Data.db:level=1, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-182-big-Data.db:level=1, ]. 467MB to 468MB (~100% of original) in 371664ms = 1MB/s. ~318336 total partitions merged to 317990.
2021-09-07T17:45:20+00:00  longevity-lwt-500G-3d-4-5-db-node-2447519d-1 !INFO    | scylla:  [shard 0] compaction - [Compact cqlstress_lwt_example.blogposts 5ed3ae00-1003-11ec-80ac-ca9ca3b95357] Compacting [/var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-126-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-154-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-168-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-112-big-Data.db:level=0:origin=memtable, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-182-big-Data.db:level=1:origin=compaction, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-140-big-Data.db:level=1:origin=compaction, /var/lib/scylla/data/cqlstress_lwt_example/blogposts-adacafb0100111ec80a556cf069bba9f/md-98-big-Data.db:level=1:origin=compaction, ]

@kostja @elcallio @raphaelsc I wonder if LWT might be creating lots of small sstables, causing the read timeouts until they are compacted.
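
A quick way to check on a node whether many small sstables are accumulating (a rough sketch; the paths assume the default data-directory layout seen in the compaction messages above):

```
# Count sstables and total size per table directory.
for dir in /var/lib/scylla/data/system/paxos-* \
           /var/lib/scylla/data/cqlstress_lwt_example/blogposts-*; do
    echo "$dir: $(ls "$dir"/md-*-big-Data.db 2>/dev/null | wc -l) sstables, $(du -sh "$dir" | cut -f1)"
done
```
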

@roydahan
Author

The test on 4.4.4 looks good: the throughput is normal, the population phase finished within several hours, and nemesis started to run.

monitor: http://54.75.44.177:3000/d/ly9LCfSnk/longevity-lwt-500g-3d-test-on-4-4-scylla-per-server-metrics-nemesis-master?orgId=1&refresh=30s&from=now-24h&to=now

@slivne
Contributor

slivne commented Sep 15, 2021

If we are generating more sstables, it should be visible in the counters for both sstables and memtable flushes. Let's check this out.
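
One way to eyeball this per table from a node (a sketch; the field names assume the usual nodetool tablestats output and may differ between versions):

```
# Sketch: sstable counts and memtable switch (flush) counters per table.
nodetool tablestats cqlstress_lwt_example.blogposts system.paxos \
  | grep -Ei 'Table:|SSTable count|Memtable switch count'
```
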

@avikivity
Member

there are no writes blocked on commitlog

Screenshot from 2021-09-13 17-52-44

Probably the counter is lying. With LWT all writes should be blocked on commitlog.

avikivity pushed a commit that referenced this issue Sep 19, 2021
This exception is ignored by design, but if it's left unhandled,
it generates `Exceptional future ignored` warnings, like the following.

Also, ignore f2 if f1 failed since we return early in this case.

```
[shard 5] seastar - Exceptional future ignored: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem), backtrace: 0x431689e 0x4316d40 0x43170e8 0x3f35486 0x218d14a 0x3f8002f 0x3f81217 0x3f9f868 0x3f4b76a /opt/scylladb/libreloc/libpthread.so.0+0x93f8 /opt/scylladb/libreloc/libc.so.6+0x101902#012
N7seastar12continuationINS_8internal22promise_base_with_typeISt7variantIJN5utils4UUIDEN7service5paxos7promiseEEEEEZZZZNS7_11paxos_state7prepareEN7tracing15trace_state_ptrENS_13lw_shared_ptrIK6schemaEERKN5query12read_commandERK13partition_keyS5_bNSI_16digest_algorithmENSt6chrono10time_pointINS_12lowres_clockENSQ_8durationIlSt5ratioILl1ELl1000EEEEEEENK3$_0clEvENUlvE_clEvENKUlSB_E_clESB_EUlT_E_ZNS_6futureISt5tupleIJNS13_IvEENS13_IS14_IJNSE_INSI_6resultEEE17cache_temperatureEEEEEEE14then_impl_nrvoIS12_NS13_IS9_EEEET0_OS11_EUlOSA_RS12_ONS_12future_stateIS1B_EEE_S1B_EE#012
seastar::continuation<seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::finally_body<seastar::with_semaphore<seastar::semaphore_default_exception_factory, seastar::lowres_clock, service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#1}>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, seastar::lowres_clock>&, unsigned long, seastar::lowres_clock::duration, std::result_of&&)::{lambda(seastar::basic_semaphore)#1}::operator()<seastar::semaphore_units<seastar::semaphore_default_exception_factory, seastar::lowres_clock> >(seastar::basic_semaphore)::{lambda()#1}, false>, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_wrapped_nrvo<seastar::future<std::variant<utils::UUID, service::paxos::promise> >, seastar::semaphore_units<seastar::semaphore_default_exception_factory, seastar::lowres_clock> >(seastar::future<std::variant<utils::UUID, service::paxos::promise> >&&)::{lambda(seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >&&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, seastar::lowres_clock>&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#12
seastar::continuation<seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::finally_body<service::paxos::paxos_state::key_lock_map::with_locked_key<service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#1}>(dht::token const&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, std::result_of)::{lambda()#1}, false>, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_wrapped_nrvo<seastar::future<std::variant<utils::UUID, service::paxos::promise> >, {lambda()#1}>({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >&&, {lambda()#1}&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#12
seastar::continuation<seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::finally_body<service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}, false>, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_wrapped_nrvo<seastar::future<std::variant<utils::UUID, service::paxos::promise> >, service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}>(service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}&&)::{lambda(seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >&&, service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#12
seastar::continuation<seastar::internal::promise_base_with_type<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >, service::storage_proxy::init_messaging_service()::$_51::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, query::read_command, partition_key, utils::UUID, bool, query::digest_algorithm, std::optional<tracing::trace_info>) const::{lambda(seastar::lw_shared_ptr<schema const>)#1}::operator()(seastar::lw_shared_ptr<schema const>)::{lambda()#1}::operator()() const::{lambda(std::variant<utils::UUID, service::paxos::promise>)#1}, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_impl_nrvo<{lambda()#1}, {lambda()#1}<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > > >({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >&&, {lambda()#1}&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#12
seastar::continuation<seastar::internal::promise_base_with_type<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >, seastar::future<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >::finally_body<seastar::smp::submit_to<service::storage_proxy::init_messaging_service()::$_51::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, query::read_command, partition_key, utils::UUID, bool, query::digest_algorithm, std::optional<tracing::trace_info>) const::{lambda(seastar::lw_shared_ptr<schema const>)#1}::operator()(seastar::lw_shared_ptr<schema const>)::{lambda()#1}>(unsigned int, se
```

Refs #7779
Refs #9331

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210919053007.13960-1-bhalevy@scylladb.com>
@bhalevy
Member

bhalevy commented Sep 19, 2021

@kostja please look into this issue asap (it's blocking 4.5)
and assign someone to get to the bottom of it.

@slivne
Contributor

slivne commented Sep 19, 2021

@elcallio can you please also look at the metrics - can you see anything suggesting writes are blocked on the commitlog in some form? The run is with the commitlog hard limit disabled.

@slivne
Contributor

slivne commented Sep 19, 2021

@kostja / @elcallio - to make it easier to find

@elcallio
Contributor

@slivne - I can't see anything from the first monitor, and the second is dead (?). In any case, we never show any commitlog stalls in these - I am starting to wonder whether the grafana counter is wired up properly.
I can't see any significant changes between 4.4 and the current 4.5 version, other than 5fcc206 where we added multi-entry support. Also ab55a1b (partial flush/threshold) and improved (?) pre-alloc 48ca01c.

Note that the partial flush could be a reason for perhaps smaller sstables being written (initially).

@slivne
Contributor

slivne commented Sep 20, 2021 via email

@elcallio
Contributor

@slivne - I don't think fixing monitors really helps any. As I said, the CL blocked/write counters never seem to contain anything, which makes me think they are either not read properly or not presented properly (or don't lend themselves to presenting).
I have no theories, esp. since no one seems to be very certain what it is that blocks...

@slivne
Contributor

slivne commented Sep 21, 2021 via email

@bentsi
Contributor

bentsi commented Sep 21, 2021

Which monitors do you need?
this one (4.5.rc7) is working:
http://34.252.214.230:3000/d/dGxHpZSnz/longevity-lwt-500g-3d-test-scylla-per-server-metrics-nemesis-master?orgId=1&from=1631032607724&to=1631169036853
the monitor with 4.4.4 is not; I will check whether I can bring it up.

@elcallio
Contributor

If I read the metric correctly, we only had "many" memtable flushes in the beginning of the run (first 3-4h or so). Way before any of the log messages above... Not sure if that means anything though...

@kostja
Contributor

kostja commented Sep 21, 2021

I've been running this for a few hours locally with --commitlog-use-hard-size-limit=true; did not observe an unusual performance drop. Now running without the option.

@kostja
Contributor

kostja commented Sep 22, 2021

Left the test running overnight with --commitlog-use-hard-size-limit=false, did not observe a performance drop. 131GB of data in blogposts is stored in ~6000 files on disk, which seems to be adequate (~400 files were used for 6GB)

@bhalevy
Member

bhalevy commented Sep 22, 2021

@kostja just to make sure, are you testing master or 4.5?

@kostja
Contributor

kostja commented Sep 22, 2021

master

@fgelcer

fgelcer commented Sep 22, 2021

master

the issue was found and reported against 4.5

@kostja
Contributor

kostja commented Sep 22, 2021

@fgelcer I'm sorry but this is not the issue. The issue is not found and I am trying to find it. An issue contains a test case that I as an engineer can easily reproduce. A bunch of monitors left around after a 3-day run help me improve my telepathy skills but are a very inefficient way to nail down the problem.
If there is a strong reason to believe that the issue was fixed in master, please share it.

@roydahan
Author

Left the test running overnight with --commitlog-use-hard-size-limit=false, did not observe a performance drop. 131GB of data in blogposts is stored in ~6000 files on disk, which seems to be adequate (~400 files were used for 6GB)

@kostja I don't know how you ran it and for how long, but the test should write 500GB of data in 8 hours.
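
For reference, rough arithmetic for the rate implied by those numbers (not measured, just the expectation):

```
# 400,000,000 keys in ~8 hours -> ~13.9k writes/s aggregate, ~4.6k writes/s per loader
echo $(( 400000000 / (8 * 3600) ))      # ~13888 ops/s across the 3 loaders
echo $(( 400000000 / (8 * 3600) / 3 ))  # ~4629 ops/s per loader
```
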

@slivne
Contributor

slivne commented Sep 23, 2021

@kostja - QA has a specific use case that fails on a specific setup using a specific version - the issue reproduces in that environment and use case - let's use that to first understand where the issue is, and then try to minimize / change / reproduce on a smaller setup, etc.

The bug exists and it reproduces.

@eliransin
Contributor

I can't seem to get resolved lines out of that AMI... The scylla-packaged python addr2line is missing, and trying to use scripts with local (apt) packages fails due to version mismatches (?). Even trying to use native elfutils on the scylla exec just dies... Is there debug info installed into the AMI?

yes

@elcallio
Contributor

elcallio commented May 3, 2022

Unfortunately, I could still not get any info out of the image, neither from the booted-up Ubuntu nor by copying the scylla executable to another machine. No idea why the Ubuntu addr2line fails, but it does...

@eliransin
Contributor

I will try to help with this.

@eliransin
Contributor

The stack trace for the reactor stall in #9331 (comment):

seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) + 210 in section .text of /opt/scylladb/libexec/scylla
seastar::internal::cpu_stall_detector::generate_trace() + 288 in section .text of /opt/scylladb/libexec/scylla
seastar::reactor::block_notifier(int) + 80 in section .text of /opt/scylladb/libexec/scylla
No symbol matches 0x7f01168fda1f.
logalloc::region_impl::free(void*, unsigned long) + 719 in section .text of /opt/scylladb/libexec/scylla
bplus::tree<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12ul, (bplus::key_search)0, (bplus::with_debug)0>::iterator::erase(cached_file::page_idx_less_comparator) + 1692 in section .text of /opt/scylladb/libexec/scylla
cached_file::cached_page::on_evicted() + 116 in section .text of /opt/scylladb/libexec/scylla
std::_Function_handler<seastar::memory::reclaiming_result (), cache_tracker::cache_tracker(mutation_application_stats&, seastar::bool_class<register_metrics_tag>)::$_0>::_M_invoke(std::_Any_data const&) + 146 in section .text of /opt/scylladb/libexec/scylla
logalloc::region_impl::evict_some() + 77 in section .text of /opt/scylladb/libexec/scylla
logalloc::reclaim_from_evictable(logalloc::region_impl&, unsigned long, seastar::bool_class<is_preemptible_tag>) + 1144 in section .text of /opt/scylladb/libexec/scylla
logalloc::tracker::impl::compact_and_evict_locked(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) + 5523 in section .text of /opt/scylladb/libexec/scylla
logalloc::segment_pool::allocate_segment(unsigned long) + 395 in section .text of /opt/scylladb/libexec/scylla
logalloc::allocating_section::reserve() + 119 in section .text of /opt/scylladb/libexec/scylla
cached_file::get_page_ptr(unsigned long, unsigned long, seastar::io_priority_class const&, tracing::trace_state_ptr)::{lambda(seastar::temporary_buffer<char>&&)#1}::operator()(seastar::temporary_buffer<char>&&) + 188 in section .text of /opt/scylladb/libexec/scylla
seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<cached_file::cached_page, cached_file::cached_page::cached_page_del> >, cached_file::get_page_ptr(unsigned long, unsigned long, seastar::io_priority_class const&, tracing::trace_state_ptr)::{lambda(seastar::temporary_buffer<char>&&)#1}, seastar::future<seastar::temporary_buffer<char> >::then_impl_nrvo<{lambda(seastar::temporary_buffer<char>&&)#1}, cached_file::get_page_ptr(unsigned long, unsigned long, seastar::io_priority_class const&, tracing::trace_state_ptr)::{lambda(seastar::temporary_buffer<char>&&)#1}<std::unique_ptr<cached_file::cached_page, cached_file::cached_page::cached_page_del> > >({lambda(seastar::temporary_buffer<char>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<cached_file::cached_page, cached_file::cached_page::cached_page_del> >&&, {lambda(seastar::temporary_buffer<char>&&)#1}&, seastar::future_state<seastar::temporary_buffer<char> >&&)#1}, seastar::temporary_buffer<char> >::run_and_dispose() + 100 in section .text of /opt/scylladb/libexec/scylla
seastar::reactor::run_some_tasks() + 340 in section .text of /opt/scylladb/libexec/scylla
seastar::reactor::do_run() + 3303 in section .text of /opt/scylladb/libexec/scylla
std::_Function_handler<void (), seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_98>::_M_invoke(std::_Any_data const&) + 1397 in section .text of /opt/scylladb/libexec/scylla
seastar::posix_thread::start_routine(void*) + 10 in section .text of /opt/scylladb/libexec/scylla
No symbol matches 0x9298.
No symbol matches 0x100352.

@elcallio would you analyze and determine whether it is commitlog (more specifically, hard limit) related or not?
Some technicalities, in case there are more and different stack traces to analyze:
I produced this using gdb on an instance with one of the images listed in #9331 (comment); those AMIs were built on my machine and are simply 4.6.rc5 with the hard limit on by default.
The way I generated this stack trace:

  1. Start an instance with one of the AMIs listed (region dependent); I used an i3.xlarge.
  2. SSH into the machine and stop all running scylla processes (not mandatory).
  3. sudo apt-get update
  4. sudo apt-get install build-essential gdb
  5. Create a file in the home dir with one line per address which says: info symbol 0x??????? - I called it scylla_stack.
  6. gdb /opt/scylladb/libexec/scylla - this is going to take a while, at least a few minutes, to load the symbols.
  7. In the gdb prompt: source ~/scylla_stack
  8. You will get the output above.

For completeness and so you can do a one to one match for the addresses, the scylla_stack file content:

info symbol 0x3db1022
info symbol 0x3dafcd0
info symbol 0x3db0f30
info symbol 0x7f01168fda1f
info symbol 0x1409dff
info symbol 0x164a25c
info symbol 0x1649b94
info symbol 0x134b1b2
info symbol 0x13dd21d
info symbol 0x13e9278
info symbol 0x13e4e03
info symbol 0x13d46fb
info symbol 0x13f2e97
info symbol 0x17454ac
info symbol 0x1748bc4
info symbol 0x3dc3a54
info symbol 0x3dc4e47
info symbol 0x3de2475
info symbol 0x3d9c08a
info symbol 0x9298
info symbol 0x100352
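
A quick way to generate such a scylla_stack file from a list of raw backtrace addresses (a sketch; assumes the addresses are saved in addrs.txt, one per line):

```
# Turn a list of backtrace addresses (one per line) into gdb "info symbol" commands.
sed 's/^/info symbol /' addrs.txt > ~/scylla_stack
# then, inside gdb:
#   (gdb) source ~/scylla_stack
```
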

@eliransin
Contributor

@elcallio BTW I am leaving the machine on for a few hours in case you want to experiment with it.
You can log into my gdb session without the hassle of waiting for the symbols to load with:
ssh -i ~/.ssh/scylla-qa-ec2 centos@52.215.81.66
then run: tmux a
Let me know when I can put the instance down, if not asked otherwise I will kill it towards the end of the day.

@elcallio
Contributor

FWIW, the trace looks wholly unrelated to anything commit log.

@eliransin
Contributor

@roydahan the stall doesn't seem to come from the commitlog, or more specifically from the hard limit.
What are the next steps to try and push this forward?

@roydahan
Author

If you think it's ready, I think we can merge.
If you know of some related areas you would like us to test, we can do that.

benipeled pushed a commit to benipeled/scylla that referenced this issue May 31, 2022
There will be unbounded growth of pending tasks if they are submitted
faster than they are retired. That can potentially happen if memtables
are frequently flushed too early. It was observed that this unbounded
growth caused task queue violations, as the queue gets filled
with tons of tasks being re-evaluated. By avoiding duplication in
the pending task list for a given table T, growth is no longer unbounded
and consequently re-evaluation is no longer aggressive.

Refs scylladb#9331.

Scylla ent:
Refs scylladb#2111.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210930125718.41243-1-raphaelsc@scylladb.com>
(cherry picked from commit 52302c3)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220215162147.61628-1-raphaelsc@scylladb.com>
@eliransin
Contributor

eliransin commented Jun 13, 2022

I am doing a comparative test now:
I reran the AMI and got a similar number of stalls (~420 stalls).
Now I am rerunning the original version (4.6.rc5) to see whether those stalls exist there. If the same stalls exist, we can proceed with
enabling the feature; if not, we will need to find out what triggers those stalls.
Jenkins job link to a clean 4.6.rc5 run:
https://jenkins.scylladb.com/view/scylla-4.6/job/scylla-4.6/job/SCT-Reproducers/job/Issue-9331/job/longevity-cdc-400gb-3d/5

@eliransin
Contributor

After running a build without the hard limit, I found a similar number of stalls.
I used the following search pattern:
find -name "messages.log" | xargs cat | grep "Reactor stalled" | wc -l
For the latest run with the hard limit on I got:
475
And for the latest run with the hard limit off, which is official 4.6.rc5, I got:
570

All in all it is comparable.

Histograms of the stalls look similar, and the mean stall time is also similar (+/- 10%):

(screenshot: histograms of stall durations)
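
For reference, one possible way to extract the stall durations behind such a histogram/mean comparison (a sketch, assuming the usual Seastar "Reactor stalled for <N> ms" message format):

```
# Collect stall durations (ms) from all node logs and print count and mean.
find . -name "messages.log" | xargs cat \
  | grep -o 'Reactor stalled for [0-9]* ms' \
  | awk '{ sum += $4; n++ } END { printf "stalls=%d mean=%.1f ms\n", n, sum/n }'
```
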

@eliransin
Contributor

@aleksbykov the last thing we are left to do, according to @roydahan's comment #9331 (comment), is to run a write throughput test and
see how it is impacted.
What we have already done: made sure that the insert time is not too long compared to the non-hard-limit version, made
sure that coredumps are not in the commitlog area, and analyzed the observed stalls to see whether they are hard-limit related.
In order to enable the feature, I will go over the other hard limit bugs and make sure that either there are none or, if there are,
identify any extremely problematic workloads that we should additionally check in order to qualify it.

@kostja
Contributor

kostja commented Jan 13, 2023

@eliransin can this be closed?

@DoronArazii DoronArazii added this to the 5.x milestone Jan 16, 2023
@kostja kostja added area/admission control tests/longevity Issue detected during longevity labels Jul 18, 2023
@kostja kostja removed their assignment Jul 18, 2023
@DoronArazii

@eliransin / @wmitros what are we tracking here ATM?

avikivity added a commit that referenced this issue Oct 26, 2023
…fault' from Eliran Sinvani

This miniset completes the prerequisites for enabling the commitlog hard limit by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition whose root cause needs to be resolved; however,
once we do hit the limit, the performance impact inflicted as a result of this enforcement is irrelevant.

Tests: unit tests.
          LWT write test (#9331)
Whitebox testing was performed by @wmitros; the test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable, keeping it at ~85% occupancy so that the dirty memory manager will not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. The changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics; its graphs were the flattest and least jagged of the three.

Closes #10974

* github.com:scylladb/scylladb:
  commitlog: enforce commitlog size hard limit by default
  commitlog: set flush threshold to half of the  limit size
  commitlog: unfold flush threshold assignment

Fixes #9625
avikivity added a commit that referenced this issue Oct 30, 2023
@eliransin eliransin assigned eliransin and unassigned elcallio Oct 31, 2023
@mykaul
Contributor

mykaul commented Nov 19, 2023

@eliransin can this be closed?

@eliransin / @kostja ?

@nyh
Contributor

nyh commented Nov 19, 2023

I think the creator of this issue (@roydahan), or one of the people who made the 257 comments in this issue and were ever able to reproduce it (are there such people?), needs to say whether this issue can still be reproduced. If it can't be reproduced, we should close it. If it still can be reproduced, it's still a bug and definitely shouldn't be closed!

@kostja
Contributor

kostja commented Nov 19, 2023

@mykaul I agree with @nyh's comment. There has never been a clear success criterion for this issue. The description and the title are by now vastly obsolete. We used it to track the commit log hard limit work for a while, so it hangs on the commit log hard limit still not being a default. I don't think that is the proper way to use the issue database.

So I will take the liberty of closing this; whatever issues are still out there after 4 years deserve to be tracked in a new ticket.

@kostja kostja closed this as completed Nov 19, 2023
@bhalevy
Member

bhalevy commented Nov 19, 2023

after 4 years

2 + epsilon. let's not exaggerate.

avikivity added a commit that referenced this issue Nov 29, 2023