nodetool tablestats: Partition keys number (estimate) in Scylla post migration from C* differs by 20% less up to 120% more than the original amount in C* #2545

tomer-sandler · 2017-07-04T11:51:07Z

Installation details
Scylla version (or git commit hash): 1.7.1
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 16.04
C* version: 3.10 (3 node cluster)

I migrated 3 keyspaces (KS) in parallel, each with 1 table of ~10M partitions, using 3 intermediate nodes. Each intermediate node had an NFS mount point to 1 of the C* nodes for one of the keyspaces.
Each intermediate node ran sstableloader and loaded the files into a different Scylla node.

Metrics in Grafana here: http://104.196.52.52:3000/dashboard/db/scylla-per-server-metrics-1-7?from=1499084205742&to=1499086585000

After all sstable files were loaded and compactions completed, the number of partitions is much bigger than the 9.8M we had in C*. So far in my tests, the partition key estimate after migration, completed compactions and nodetool flush ranges from 20% less to 120% more than the original count in C* 3.10.

tomer@ubuntu16-scylla171-migration-1:~$ nodetool tablestats migration3 | grep keys
                Number of keys (estimate): 19510809
tomer@ubuntu16-scylla171-migration-1:~$ nodetool tablestats migration4 | grep keys
                Number of keys (estimate): 12313292
tomer@ubuntu16-scylla171-migration-1:~$ nodetool tablestats migration5 | grep keys
                Number of keys (estimate): 19668921
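To put the three outputs above in numbers, the following sketch computes each estimate's deviation from the ~9.8M partitions quoted for C*. The class and method names are illustrative, and it compares the node-local `nodetool` estimate with the C* figure exactly as the comparison above does.

```java
// Relative deviation of each post-migration estimate from the C* baseline.
public class PartitionEstimateDeviation {
    // Baseline partition count quoted for the C* 3.10 tables (about 9.8M).
    static final long CASSANDRA_BASELINE = 9_800_000L;

    // Percentage difference: positive means over-reporting, negative under-reporting.
    static double percentDiff(long scyllaEstimate, long baseline) {
        return 100.0 * (scyllaEstimate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Keyspace names and estimates copied from the nodetool output above.
        String[] keyspaces = {"migration3", "migration4", "migration5"};
        long[] estimates = {19_510_809L, 12_313_292L, 19_668_921L};
        for (int i = 0; i < keyspaces.length; i++) {
            System.out.printf("%s: %+.1f%%%n", keyspaces[i],
                    percentDiff(estimates[i], CASSANDRA_BASELINE));
        }
    }
}
```

Run as-is, it prints roughly +99.1%, +25.6% and +100.7% for migration3, migration4 and migration5.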

@glommer wrote about the estimate:
In Nodetool, it is exported as estimatedPartitionCount, which is calculated as

    Object estimatedPartitionCount = probe.getColumnFamilyMetric(keyspaceName, tableName, "EstimatedPartitionCount");

Now let's look at what that metric really is in TableMetrics.java. It is basically:

    long memtablePartitions = 0;
    for (Memtable memtable : cfs.getTracker().getView().getAllMemtables())
        memtablePartitions += memtable.partitionCount();
    return SSTableReader.getApproximateKeyCount(cfs.getSSTables(SSTableSet.CANONICAL)) + memtablePartitions;

It is also registered under the alias EstimatedRowCount. The latter is what scylla-jmx responds to, and scylla-jmx translates it to /column_family/metrics/estimated_row_count/
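A minimal model of the gauge quoted above, assuming hypothetical `Memtable` and `SSTable` records that expose only a partition count and a key estimate (the real Cassandra classes are far richer). It contrasts the Cassandra-style total with an sstables-only total, which, per the analysis below, is closer to what Scylla reports. For simplicity the sstable part is a plain sum of per-sstable estimates; the real `getApproximateKeyCount` may combine them differently.

```java
import java.util.List;

// Hypothetical stand-ins for Cassandra's Memtable and SSTableReader, modeling only
// the numbers the EstimatedPartitionCount gauge adds up.
public class EstimatedPartitionCountModel {
    record Memtable(long partitionCount) {}
    record SSTable(long estimatedKeys) {}

    // Cassandra-style: sstable key estimate plus partitions still held in memtables.
    static long cassandraStyle(List<SSTable> canonicalSSTables, List<Memtable> memtables) {
        long memtablePartitions = 0;
        for (Memtable m : memtables) {
            memtablePartitions += m.partitionCount();
        }
        return sstableKeys(canonicalSSTables) + memtablePartitions;
    }

    // An estimate that ignores memtables entirely.
    static long sstablesOnly(List<SSTable> sstables) {
        return sstableKeys(sstables);
    }

    // Simplification: a plain sum of per-sstable key estimates.
    static long sstableKeys(List<SSTable> sstables) {
        long total = 0;
        for (SSTable s : sstables) {
            total += s.estimatedKeys();
        }
        return total;
    }
}
```

With two sstables of 1000 and 500 keys and one memtable holding 200 partitions, the Cassandra-style total is 1700 and the sstables-only total is 1500: leaving out memtables can only lower the number.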

Looking at our implementation, we do not include memtables. Also, when getting the sstable set, Cassandra passes the flag "CANONICAL". The comment on top of that definition says:

    // returns the "canonical" version of any current sstable, i.e. if an sstable is being replaced and is only partially
    // visible to reads, this sstable will be returned as its original entirety, and its replacement will not be returned
    // (even if it completely replaces it)
    CANONICAL,
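The effect of `CANONICAL` on the count can be sketched under a hypothetical model in which an in-progress compaction has already made its partially written output visible alongside its input. The canonical set keeps each input whole and drops the replacement, so a key is not counted once in the input and again in the output. The record and the `canonical`/`sumKeys` helpers are illustrative, not Cassandra API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CanonicalViewSketch {
    // Hypothetical sstable: a name and the key estimate from its metadata.
    record SSTable(String name, long estimatedKeys) {}

    // live: everything currently visible to reads, possibly including both a partially
    // replaced input and its partially written replacement.
    // replacementNames: outputs of compactions that have not finished yet.
    // originalsBeingReplaced: the inputs of those compactions, returned whole.
    static List<SSTable> canonical(List<SSTable> live, Set<String> replacementNames,
                                   List<SSTable> originalsBeingReplaced) {
        Map<String, SSTable> result = new LinkedHashMap<>();
        for (SSTable s : live) {
            if (!replacementNames.contains(s.name())) {
                result.put(s.name(), s);
            }
        }
        for (SSTable s : originalsBeingReplaced) {
            result.put(s.name(), s); // the original in its entirety
        }
        return new ArrayList<>(result.values());
    }

    static long sumKeys(List<SSTable> sstables) {
        long total = 0;
        for (SSTable s : sstables) {
            total += s.estimatedKeys();
        }
        return total;
    }
}
```

For an input `A` with 1000 keys being compacted into `A-compacted`, which has 600 keys written so far, summing the live view gives 1600, while the canonical view gives 1000.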

So my conclusion here is that Scylla is misreporting this. The fact that we don't include memtables should lead us to underreport. Shared sstables and sstables being compacted will lead us to overreport. Those, I think, we should fix.
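One way shared sstables could inflate the count, sketched under an assumption we have not verified against Scylla's code: an sstable spanning several shards is opened by each owning shard, each shard reports the sstable's full key estimate, and the node-level figure is the sum over shards. The `SSTable` record and the shard numbering are hypothetical.

```java
import java.util.List;
import java.util.Set;

public class ShardedEstimateSketch {
    // Hypothetical sstable: its key estimate and the shards that own part of it.
    record SSTable(long estimatedKeys, Set<Integer> owningShards) {}

    // Over-reporting pattern: every owning shard counts the whole sstable,
    // and the per-shard totals are then summed.
    static long perShardSum(List<SSTable> sstables, int shardCount) {
        long total = 0;
        for (int shard = 0; shard < shardCount; shard++) {
            for (SSTable s : sstables) {
                if (s.owningShards().contains(shard)) {
                    total += s.estimatedKeys();
                }
            }
        }
        return total;
    }

    // Counting each sstable once, however many shards share it.
    static long countedOnce(List<SSTable> sstables) {
        long total = 0;
        for (SSTable s : sstables) {
            total += s.estimatedKeys();
        }
        return total;
    }
}
```

With one 1000-key sstable shared by shards 0 and 1 and one 500-key sstable on shard 2, the per-shard sum is 2500 while counting each sstable once gives 1500. Sstables written by sstableloader are a plausible source of such shared sstables, since they are not split per shard on the way in.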

Another potentially interesting difference is how the estimate itself is calculated. Those estimates come from the Statistics.db file if available, with a fallback to a simple index-based calculation. It is entirely possible that C* 3 has a newer version of that file.
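The metadata-versus-fallback path can be sketched as follows. To our understanding, Cassandra's `SSTableReader.getApproximateKeyCount` merges per-sstable cardinality estimators taken from the sstable metadata when every sstable has one, and otherwise sums per-sstable key estimates. The difference matters because merging counts a key present in several sstables once, while summing counts it once per sstable. Here an exact key set stands in for the probabilistic estimator, and the record and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ApproximateKeyCountSketch {
    // Hypothetical sstable: an index-based fallback estimate plus, optionally, a
    // mergeable summary of its keys (an exact set stands in for a cardinality sketch).
    record SSTable(long indexEstimate, Optional<Set<String>> keySummary) {}

    static long approximateKeyCount(List<SSTable> sstables) {
        boolean allHaveSummary = sstables.stream().allMatch(s -> s.keySummary().isPresent());
        if (allHaveSummary) {
            // Merge the summaries: a key found in several sstables is counted once.
            Set<String> union = new HashSet<>();
            for (SSTable s : sstables) {
                union.addAll(s.keySummary().get());
            }
            return union.size();
        }
        // Fallback: sum per-sstable estimates, counting overlapping keys repeatedly.
        long total = 0;
        for (SSTable s : sstables) {
            total += s.indexEstimate();
        }
        return total;
    }
}
```

For two sstables holding keys {a, b, c} and {b, c, d}, the merged path returns 4; if either sstable lacks a summary, the fallback returns 6. The same data can therefore be reported very differently depending on which path each implementation takes.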


tzach · 2018-03-20T12:55:31Z

@glommer if I understand correctly, the issue is:
Scylla does not include memtables in nodetool tablestats while Apache Cassandra does. Is that the case?

glommer · 2018-03-20T13:49:40Z

No, I never said that. I am bringing this back from my cache, but reading the above, my statements concentrate on the fact that we calculate size estimates differently (by double counting some SSTables, for instance).

slivne added this to the 2.x milestone Jul 24, 2017
