nodetool tablestats: Partition keys number (estimate) in Scylla post migration from C* differs by 20% less up to 120% more than the original amount in C*
#2545 · Open · tomer-sandler opened this issue on Jul 4, 2017 · 2 comments
I migrated 3 keyspaces (KS) in parallel, each with 1 table of ~10M partitions, using 3 intermediate nodes. Each intermediate node has an NFS mount point to one of the C* nodes, for one of the KS.
Each intermediate node ran sstableloader and loaded the files to a different Scylla node.
After all sstable files were loaded and compactions completed, the number of partitions is much bigger than the 9.8M we had in C*. So far in my tests, the partition keys estimate after migration, compaction completion, and nodetool flush is anywhere from 20% less to 120% more than the original amount in C* 3.10.
tomer@ubuntu16-scylla171-migration-1:~$ nodetool tablestats migration3 | grep keys
Number of keys (estimate): 19510809
tomer@ubuntu16-scylla171-migration-1:~$ nodetool tablestats migration4 | grep keys
Number of keys (estimate): 12313292
tomer@ubuntu16-scylla171-migration-1:~$ nodetool tablestats migration5 | grep keys
Number of keys (estimate): 19668921
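To put the three estimates above in relative terms, here is a minimal sketch that computes each one's deviation from the C* figure. It assumes the 9.8M baseline applies to each of the three tables; the class and method names are hypothetical and not part of nodetool or Scylla.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for this issue only; not part of nodetool or Scylla.
class EstimateDeviation {
    // Relative deviation of a reported estimate from a baseline, in percent.
    static double deviationPercent(long reported, long baseline) {
        return 100.0 * (reported - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Assumption: each C* table held ~9.8M partitions, as stated above.
        long baseline = 9_800_000L;

        // Estimates copied from the nodetool tablestats output above.
        Map<String, Long> reported = new LinkedHashMap<>();
        reported.put("migration3", 19_510_809L);
        reported.put("migration4", 12_313_292L);
        reported.put("migration5", 19_668_921L);

        for (Map.Entry<String, Long> e : reported.entrySet()) {
            System.out.printf("%s: %+.1f%%%n",
                    e.getKey(), deviationPercent(e.getValue(), baseline));
        }
    }
}
```

Under that assumption, the three tables come out at roughly 99%, 26%, and 101% above the baseline.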
@glommer wrote about the estimate:
In Nodetool, it is exported as estimatedPartitionCount, which is calculated as
Now let's look at what that metric really is in TableMetrics.java, it is basically:
long memtablePartitions = 0;
for (Memtable memtable : cfs.getTracker().getView().getAllMemtables())
    memtablePartitions += memtable.partitionCount();
return SSTableReader.getApproximateKeyCount(cfs.getSSTables(SSTableSet.CANONICAL)) + memtablePartitions;
And is also defined as an alias for EstimatedRowCount. The latter is what scylla-jmx responds to, and it translates to /column_family/metrics/estimated_row_count/
Looking at our implementation, we do not include memtables. Also, when
getting the sstable set, they pass that flag "CANONICAL". The comment
on top of that definition says:
// returns the "canonical" version of any current sstable, i.e. if an sstable is being replaced and is only partially
// visible to reads, this sstable will be returned as its original entirety, and its replacement will not be returned
// (even if it completely replaces it)
CANONICAL,
So my conclusion here is that Scylla is misreporting this. The fact
that we don't include memtables should lead us to underreport. Shared
sstables and sstables being compacted will lead us to overreport.
Those, I think, we should fix.
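The two effects described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the types are hypothetical and do not mirror Scylla or Cassandra internals, key counts are simply summed (the real estimators are more involved), and only the shared-sstable case is modeled, by letting the same sstable appear once per shard that references it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; hypothetical types, not Scylla or Cassandra code.
class EstimateSketch {
    // An sstable identified by its generation, with an estimated key count.
    static final class SSTable {
        final int generation;
        final long estimatedKeys;
        SSTable(int generation, long estimatedKeys) {
            this.generation = generation;
            this.estimatedKeys = estimatedKeys;
        }
    }

    // Sums every sstable reference it is given and ignores memtables.
    // A shared sstable listed twice is counted twice (overreport);
    // partitions still in memtables are missed (underreport).
    static long naiveEstimate(List<SSTable> visible) {
        long total = 0;
        for (SSTable s : visible) {
            total += s.estimatedKeys;
        }
        return total;
    }

    // Counts each sstable generation once and adds memtable partitions.
    static long dedupedEstimate(List<SSTable> visible, long memtablePartitions) {
        Set<Integer> seen = new HashSet<>();
        long total = memtablePartitions;
        for (SSTable s : visible) {
            if (seen.add(s.generation)) {
                total += s.estimatedKeys;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        SSTable a = new SSTable(1, 4_000_000L);
        SSTable shared = new SSTable(2, 3_000_000L);

        // The shared sstable is referenced by two shards, so it shows up twice.
        List<SSTable> visible = new ArrayList<>();
        visible.add(a);
        visible.add(shared);
        visible.add(shared);

        System.out.println("naive:   " + naiveEstimate(visible));
        System.out.println("deduped: " + dedupedEstimate(visible, 500_000L));
    }
}
```

In this made-up example the naive sum gives 10,000,000 while the deduplicated sum plus memtables gives 7,500,000, so the two errors pull in opposite directions and the net result can land on either side of the true count.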
Another potentially interesting difference is the calculation of the
estimate itself. Those estimates come from the Statistics.db file if
available, with a fallback to a simple index-based calculation. It is
entirely possible that C*3 has a newer version of that file.
No, I never said that. I am bringing this back from my cache, but reading the above, my statements concentrate on the fact that we calculate size estimates differently (by double counting some SSTables, for instance).
Installation details
Scylla version (or git commit hash): 1.7.1
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu16.04
C* version: 3.10 (3 node cluster)
Metrics in Grafana here: http://104.196.52.52:3000/dashboard/db/scylla-per-server-metrics-1-7?from=1499084205742&to=1499086585000