
test_cluster_restore_no_resurrection dtest fails with commitlog min_gc fix when tombstone_gc is correctly deferred without node.flush #15777

Closed
bhalevy opened this issue Oct 19, 2023 · 18 comments · Fixed by #15820
Labels: P2 High Priority

@bhalevy
Member

bhalevy commented Oct 19, 2023

Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/397/testReport/cleanup_test/TestCleanup/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split001___test_cluster_restore_no_resurrection/
With Scylla version 7d5e22b

01:36:03,160 787     cleanup_test                   INFO     cleanup_test.py     :263  | test_cluster_restore_no_resurrection: Restoring from snapshot
...
01:36:03,165 787     ccm                            DEBUG    cluster.py          :709  | test_cluster_restore_no_resurrection: node1: nodetool cmd=refresh ks cf0 wait=True timeout=None
...
01:36:03,962 787     ccm                            DEBUG    cluster.py          :709  | test_cluster_restore_no_resurrection: node2: nodetool cmd=refresh ks cf0 wait=True timeout=None
01:36:04,725 787     cleanup_test                   INFO     cleanup_test.py     :268  | test_cluster_restore_no_resurrection: Removing node3
01:36:04,725 787     ccm                            DEBUG    cluster.py          :709  | test_cluster_restore_no_resurrection: node3: nodetool cmd=info wait=True timeout=60
01:36:05,909 787     cassandra.cluster              WARNING  libevreactor.py     :370  | test_cluster_restore_no_resurrection: Host 127.0.74.3:9042 has been marked down
01:36:07,111 787     cassandra.pool                 WARNING  thread.py           :58   | test_cluster_restore_no_resurrection: Error attempting to reconnect to 127.0.74.3:9042, scheduling retry in 2.2 seconds: [Errno 111] Tried connecting to [('127.0.74.3', 9042)]. Last error: Connection refused
01:36:08,909 787     ccm                            DEBUG    cluster.py          :709  | test_cluster_restore_no_resurrection: node1: nodetool cmd=removenode 690f898f-c89d-4543-80be-45d3cfb515bb wait=True timeout=None
01:36:09,314 787     cassandra.pool                 WARNING  thread.py           :58   | test_cluster_restore_no_resurrection: Error attempting to reconnect to 127.0.74.3:9042, scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to [('127.0.74.3', 9042)]. Last error: Connection refused
01:36:13,319 787     cassandra.pool                 WARNING  thread.py           :58   | test_cluster_restore_no_resurrection: Error attempting to reconnect to 127.0.74.3:9042, scheduling retry in 3.72 seconds: [Errno 111] Tried connecting to [('127.0.74.3', 9042)]. Last error: Connection refused
01:36:13,888 787     cleanup_test                   INFO     cleanup_test.py     :273  | test_cluster_restore_no_resurrection: Verifying that no data was resurrected
01:36:13,920 787     cassandra.cluster              INFO     thread.py           :58   | test_cluster_restore_no_resurrection: Cassandra host 127.0.74.3:9042 removed
01:36:14,477 787     errors                         ERROR    conftest.py         :226  | test_cluster_restore_no_resurrection: test failed: 
self = <cleanup_test.TestCleanup object at 0x7f0aa7210390>

    @pytest.mark.require('scylladb/scylla#11933')
    def test_cluster_restore_no_resurrection(self):
        """
        Reproducer for https://github.com/scylladb/scylladb/issues/11933:
        - Write data to 2-node cluster
        - Make a snapshot
        - Add node
        - Run cleanup
        - Delete data
        - Wait for tombstones to expire
        - Run major compaction (this will get rid of both data and tombstones)
        - Restore sstables from snapshot
        - Verify there are no readable keys.
        - Any stale data that is restored from snapshots would get resurrected at this stage.
        """
        num_keys = 1000
        self.prepare(nodes=2, num_keys=num_keys, rf=1)
        cluster = self.cluster
        node1, node2 = cluster.nodelist()
        session = self.patient_cql_connection(node1)
        gc_grace_seconds = 0
        session.execute(f"ALTER TABLE ks.cf0 WITH gc_grace_seconds={gc_grace_seconds}")
    
        def existing_keys(session):
            res = []
            key_exists_query = session.prepare("SELECT key from ks.cf0 WHERE key = ?")
            key_exists_query.consistency_level = ConsistencyLevel.ONE
            for i in range(num_keys):
                try:
                    if session.execute(key_exists_query, [f"k{i}"]):
                        res.append(i)
                except Unavailable:
                    pass
            return res
    
        logger.info("Verifying initial dataset")
        assert existing_keys(session) == list(range(num_keys))
    
        snapshot_name = 'pre_bootstrap'
        snapshot_dirs = {}
        for node in cluster.nodelist():
            snapshot_dirs[node.name] = make_snapshot(node, ks="ks", name=snapshot_name)
    
        logger.info("Adding a new node")
        new_node = cluster.new_node(len(cluster.nodelist()) + 1)
        new_node.start(wait_for_binary_proto=True, wait_other_notice=True)
    
        logger.info("Running cleanup")
        cluster.nodetool('cleanup ks')
    
        logger.info(f"Stopping {new_node.name}")
        new_node.stop()
    
        logger.info(f"Get existing keys")
        existing = existing_keys(session)
    
        logger.info(f"Starting {new_node.name}")
        new_node.start()
    
        logger.info("Deleting data")
        delete_c1c2(session, n=num_keys, cf='cf0')
    
        logger.info("Verifying data")
        query = SimpleStatement("SELECT count(*) FROM ks.cf0", consistency_level=ConsistencyLevel.QUORUM)
        rows = session.execute(query)
        assert rows.one()[0] == 0
    
        logger.info(f"Sleeping until gc_grace_seconds={gc_grace_seconds} pass")
        time.sleep(gc_grace_seconds + 1)
        logger.info("Running compaction")
        cluster.compact()
        cluster.wait_for_compactions()
    
        logger.info("Restoring from snapshot")
        for node in [node1, node2]:
            restore_snapshot_with_refresh(
                snapshot_dir=snapshot_dirs[node.name], node=node, keyspace='ks', table='cf0', name=snapshot_name)
    
        logger.info(f"Removing {new_node.name}")
        new_node_hostid = new_node.hostid()
        new_node.stop()
        node1.removenode(new_node_hostid)
    
        logger.info("Verifying that no data was resurrected")
        restored = existing_keys(session)
    
>       assert restored == existing
E       assert [] == [0, 2, 4, 5, 7, 9, 11, 12, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 28, 29, 33, 34, 35, 37, 38, 41, 44, 46, 47, 48, 49, 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 63, 64, 66, 68, 72, 75, 77, 79, 81, 85, 87, 88, 89, 90, 92, 93, 95, 96, 97, 98, 99, 100, 102, 103, 104, 106, 109, 110, 111, 112, 114, 115, 116, 117, 118, 119, 121, 122, 123, 124, 125, 126, 127, 131, 132, 133, 134, 136, 137, 138, 139, 140, 141, 142, 143, 145, 150, 152, 154, 156, 160, 161, 163, 164, 168, 169, 171, 172, 173, 174, 175, 179, 180, 182, 183, 184, 185, 187, 188, 190, 192, 193, 195, 196, 197, 201, 202, 203, 205, 206, 207, 208, 209, 210, 211, 212, 213, 215, 217, 220, 223, 224, 225, 227, 229, 230, 231, 235, 237, 238, 239, 240, 241, 244, 245, 246, 247, 248, 249, 250, 254, 256, 257, 258, 259, 261, 264, 265, 266, 267, 268, 269, 271, 272, 273, 274, 275, 279, 280, 281, 282, 283, 286, 288, 289, 291, 292, 293, 296, 297, 298, 299, 300, 302, 303, 304, 305, 306, 310, 312, 313, 314, 315, 316, 318, 319, 321, 322, 323, 326, 327, 328, 329, 330, 332, 333, 335, 336, 337, 339, 340, 341, 342, 343, 344, 346, 347, 349, 350, 354, 355, 358, 359, 361, 362, 363, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 378, 381, 382, 384, 386, 387, 388, 391, 392, 393, 394, 396, 399, 400, 402, 403, 404, 405, 408, 409, 410, 412, 413, 414, 417, 418, 420, 422, 423, 424, 425, 426, 427, 428, 429, 433, 436, 437, 438, 439, 441, 444, 446, 447, 449, 450, 452, 453, 455, 456, 457, 460, 461, 462, 463, 465, 466, 467, 469, 472, 475, 481, 482, 483, 484, 485, 486, 487, 488, 489, 491, 493, 494, 495, 496, 497, 498, 500, 501, 504, 505, 507, 509, 510, 513, 514, 516, 518, 521, 522, 526, 527, 528, 530, 532, 533, 535, 536, 537, 539, 541, 543, 546, 547, 548, 551, 554, 555, 558, 560, 561, 562, 563, 565, 566, 567, 572, 574, 575, 576, 577, 578, 580, 581, 583, 584, 585, 587, 588, 589, 590, 591, 592, 593, 595, 596, 597, 598, 599, 601, 603, 604, 607, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 622, 625, 626, 627, 629, 632, 634, 
635, 636, 639, 640, 641, 642, 643, 644, 646, 647, 649, 651, 652, 654, 655, 657, 659, 661, 662, 664, 665, 666, 668, 670, 671, 672, 673, 675, 677, 678, 679, 681, 682, 683, 686, 692, 693, 694, 695, 697, 698, 699, 700, 701, 703, 704, 705, 706, 707, 709, 710, 714, 716, 717, 718, 719, 722, 723, 725, 727, 728, 729, 730, 731, 732, 735, 736, 738, 740, 742, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 758, 759, 761, 762, 764, 765, 766, 768, 772, 773, 776, 777, 778, 779, 781, 783, 785, 787, 788, 790, 791, 793, 794, 796, 797, 800, 801, 802, 803, 805, 806, 809, 811, 812, 813, 814, 815, 816, 818, 820, 821, 823, 824, 825, 827, 828, 830, 831, 832, 833, 834, 835, 838, 841, 842, 844, 845, 846, 847, 848, 850, 851, 855, 856, 857, 858, 859, 860, 861, 863, 868, 870, 871, 876, 877, 878, 880, 882, 885, 887, 888, 889, 891, 893, 894, 895, 896, 897, 898, 899, 901, 902, 903, 904, 905, 906, 908, 911, 912, 915, 916, 919, 921, 922, 924, 926, 927, 929, 930, 932, 937, 938, 940, 942, 943, 944, 945, 946, 947, 949, 950, 951, 952, 954, 955, 956, 957, 959, 961, 963, 965, 966, 967, 968, 969, 970, 972, 973, 974, 975, 976, 978, 979, 980, 981, 982, 986, 988, 990, 994, 997, 998, 999]
E         Right contains 644 more items, first extra item: 0
@bhalevy
Member Author

bhalevy commented Oct 19, 2023

The regression was introduced to master in f42eb4d
Still bisecting to pinpoint the exact patch in the series...

@bhalevy
Member Author

bhalevy commented Oct 19, 2023

@avikivity @mykaul if we don't find an immediate solution and we convince ourselves that there's no flaw in the test, we should consider reverting that merge, and adding test_cluster_restore_no_resurrection to next_gating to prevent future regressions.

@bhalevy
Member Author

bhalevy commented Oct 19, 2023

git bisect points at 3378c24

@bhalevy
Member Author

bhalevy commented Oct 22, 2023

Cc @eliransin

@mykaul mykaul added the status/release blocker label Oct 22, 2023
@mykaul mykaul added this to the 5.4 milestone Oct 22, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Oct 24, 2023
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6

However, compaction is unaware of mutations in the commitlog
and due to sharing commitlog segments by different tables,
it is possible that data or tombstones will get resurrected
by commitlog replay in case the node restarts right after
the respective data and tombstones are purged by (major) compaction
after f42eb4d.

This patch calls `database::flush_on_all_shards` in
`major_keyspace_compaction_task_impl` before tables are compacted
to ensure that any data in memtables is flushed to sstables,
AND commitlog replay will not replay it back after restart.

Note that this requires flushing all tables (and their memtables)
in each shard since they share the commitlog and an unflushed
and unrelated table may hold a commitlog segment that stores
mutations from another table that is about to get compacted.

Fixes scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
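The failure mode described in the commit message above can be illustrated with a toy model. This is plain Python for illustration only, not Scylla code; `visible_keys` and the event tuples are invented names:

```python
def visible_keys(sstables, commitlog):
    """Keys visible after a restart: sstables are read, then the
    commitlog is replayed on top of them."""
    live = set()
    for op, key in sstables + commitlog:
        if op == "write":
            live.add(key)
        else:  # "tombstone"
            live.discard(key)
    return live

# Buggy sequence: the write for k1 still sits in a shared commitlog
# segment, while the tombstone that deletes it was flushed to an sstable.
commitlog = [("write", "k1")]
sstables = [("tombstone", "k1")]

# Major compaction purges the expired tombstone without flushing first...
sstables = [e for e in sstables if e[0] != "tombstone"]

# ...so a restart replays the old write: the data is resurrected.
print(visible_keys(sstables, commitlog))   # {'k1'}

# Fixed sequence: flushing all tables first moves the write to an sstable
# and releases the commitlog segment before the tombstone is purged.
commitlog = []
sstables = [("write", "k1"), ("tombstone", "k1")]
sstables = []   # compaction drops the tombstone and the data it shadows
print(visible_keys(sstables, commitlog))   # set()
```

Note that in the toy model, as in the bug, the resurrection only requires that the tombstone be purged while the original mutation is still replayable.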
bhalevy added a commit to bhalevy/scylla that referenced this issue Nov 9, 2023
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6

However, compaction is unaware of mutations in the commitlog
and due to sharing commitlog segments by different tables,
it is possible the data or tombstones will get resurrected
by commitlog replay in case the node restarts right after
the respective data and tombstones are purged by (major) compaction
after f42eb4d.

This patch calls `database::flush_all_tables` in
`shard_major_keyspace_compaction_task_impl` before tables are compacted
to ensure that any data in memtables is flushed to sstables,
AND a new commitlog segment is forced so that major compaction
would be able to purge tombstones more efficiently, without
them being locked by live commitlog segments.

Note that this requires flushing all tables (and their memtables)
in each shard since they share the commitlog and an unflushed
and unrelated table may hold a commitlog segment that stores
mutations from another table that is about to get compacted.

Fixes scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@fruch fruch closed this as completed Nov 12, 2023
@mykaul
Contributor

mykaul commented Nov 12, 2023

@fruch - is https://github.com/scylladb/scylla-dtest/pull/3689 actually fixing THIS issue?

@fruch
Contributor

fruch commented Nov 12, 2023

@fruch - is https://github.com/scylladb/scylla-dtest/pull/3689 actually fixing THIS issue?

Hmm, that's what @elcallio was pointing to...

@bhalevy
Member Author

bhalevy commented Nov 12, 2023

@mykaul One can claim this is a test issue rather than a scylla issue, since there is no clear definition of what major compaction is responsible for.

With regards to tombstone purging, whether in regular or major compaction, there are now 2 processes external to compaction that are prerequisites for maximizing tombstone garbage collection:

  1. Depending on the tombstone_gc mode: either waiting for gc_grace_seconds, or running repair successfully (which is the recommended way)
  2. Now, flushing all tables (i.e. running nodetool flush)

And https://github.com/scylladb/scylla-dtest/pull/3689 is adding the latter.

Ideally, the issue should have been moved to scylla-dtest first, and then it would be clearer how a dtest change fixes it.

@mykaul
Contributor

mykaul commented Nov 12, 2023

The latter (nodetool flush) is not documented as such a method (per https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool-commands/flush.html ), but OK - understood - just wanted to ensure there's no error here.

bhalevy added a commit to bhalevy/scylla that referenced this issue Nov 13, 2023
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6

However, compaction is unaware of mutations in the commitlog
and due to sharing commitlog segments by different tables,
it is possible the data or tombstones will get resurrected
by commitlog replay in case the node restarts right after
the respective data and tombstones are purged by (major) compaction
after f42eb4d.

This patch calls `database::flush_all_tables`, if the
`compaction_flush_all_tables_before_major` option is enabled,
before tables are compacted to ensure that any data in memtables
is flushed to sstables, AND a new commitlog segment is forced so
that major compaction would be able to purge tombstones more
efficiently, without them being locked by live commitlog segments.

Note that this requires flushing all tables (and their memtables)
in each shard since they share the commitlog and an unflushed
and unrelated table may hold a commitlog segment that stores
mutations from another table that is about to get compacted.

Fixes scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Nov 15, 2023
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6

However, compaction is unaware of mutations in the commitlog
and due to sharing commitlog segments by different tables,
it is possible the data or tombstones will get resurrected
by commitlog replay in case the node restarts right after
the respective data and tombstones are purged by (major) compaction
after f42eb4d.

This patch calls `database::flush_all_tables`, based on the
`compaction_flush_all_tables_before_major_seconds` interval,
before tables are compacted to ensure that any data in memtables
is flushed to sstables, AND a new commitlog segment is forced so
that major compaction would be able to purge tombstones more
efficiently, without them being locked by live commitlog segments.

Note that this requires flushing all tables (and their memtables)
in each shard since they share the commitlog and an unflushed
and unrelated table may hold a commitlog segment that stores
mutations from another table that is about to get compacted.

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.

Fixes scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Nov 16, 2023
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since compacting all
keyspaces with `nodetool compact` invokes the
`force_keyspace_compaction` api for each keyspace successively,
we need a mechanism to prevent too-frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.

Fixes scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
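The throttling described in this commit message can be sketched as a simple interval gate. This is illustrative Python only; the class and names below are invented and do not reflect Scylla's actual implementation of `compaction_flush_all_tables_before_major_seconds`:

```python
import time

class FlushAllGate:
    """Allow a flush of all tables at most once per interval; when the
    gate is closed, callers fall back to flushing only the keyspace
    being major-compacted (the old behavior)."""

    def __init__(self, interval_seconds=24 * 3600, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.last_flush_all = None

    def should_flush_all(self):
        now = self.clock()
        if self.last_flush_all is None or now - self.last_flush_all >= self.interval:
            self.last_flush_all = now
            return True
        return False

# Deterministic demo with a fake clock.
t = [0.0]
gate = FlushAllGate(interval_seconds=24 * 3600, clock=lambda: t[0])
print(gate.should_flush_all())  # True  -> flush all tables, then compact
print(gate.should_flush_all())  # False -> flush only the target keyspace
t[0] += 24 * 3600 + 1
print(gate.should_flush_all())  # True  -> interval elapsed, flush all again
```

This matches the stated intent: successive `force_keyspace_compaction` calls within the interval do not each trigger a database-wide flush.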
@bhalevy
Member Author

bhalevy commented Nov 26, 2023

I'd like #15820 to fix this issue, not https://github.com/scylladb/scylla-dtest/pull/3689. The latter indeed fixes the test and the regression, but it's superficial, while the scylla change fixes it more comprehensively so it would serve common use cases beyond this particular dtest.

@bhalevy bhalevy reopened this Nov 26, 2023
@bhalevy bhalevy assigned bhalevy and unassigned elcallio Nov 27, 2023
@bhalevy bhalevy modified the milestones: 5.4, 6.0 Nov 27, 2023
@bhalevy bhalevy added P2 High Priority and removed P0 Critical status/regression labels Nov 27, 2023
@bhalevy bhalevy changed the title Regression seen in test_cluster_restore_no_resurrection dtest test_cluster_restore_no_resurrection dtest fails with commitlog min_gc fix when tombstone_gc is correctly deferred without node.flush Nov 27, 2023
denesb added a commit that referenced this issue Nov 29, 2023
…t and flushing all tables' from Benny Halevy

Fixes #15777

Closes #15820

* github.com:scylladb/scylladb:
  docs: nodetool: flush: enrich examples
  docs: nodetool: compact: fix example
  api: add /storage_service/compact
  api: add /storage_service/flush
  compaction_manager: flush_all_tables before major compaction
  database: add flush_all_tables
  api: compaction: add flush_memtables option
  test/nodetool: jmx: fix path to scripts/scylla-jmx
  scylla-nodetool, docs: improve optional params documentation
@mykaul
Contributor

mykaul commented Jan 1, 2024

@scylladb/scylla-maint - can we backport this to 5.4?

@denesb
Contributor

denesb commented Jan 4, 2024

@scylladb/scylla-maint - can we backport this to 5.4?

Why only 5.4? If it is worth backporting, we should backport to all currently supported OSS releases.

@bhalevy should we backport this?

@bhalevy
Member Author

bhalevy commented Jan 4, 2024

There's a dependency on f42eb4d which is only in 5.4

@bhalevy
Member Author

bhalevy commented Jan 4, 2024

The title of this issue mentions that the test fails "with commitlog min_gc fix", which is the fix I mentioned above,
so there's no need to backport it to branches missing this fix.

@mykaul mykaul added the backport/5.4 label Jan 10, 2024
@mykaul
Contributor

mykaul commented Jan 10, 2024

Re-ping @scylladb/scylla-maint - please backport to 5.4 (only, per above)

@denesb
Contributor

denesb commented Jan 10, 2024

Re-ping @scylladb/scylla-maint - please backport to 5.4 (only, per above)

Doesn't apply cleanly, please provide a backport PR.

@denesb
Contributor

denesb commented Jan 10, 2024

@bhalevy ^^

@bhalevy
Member Author

bhalevy commented Jan 11, 2024

@tchaikov would you like to do the backport?
It's a good opportunity to get familiarized with this code before applying a similar fix for cleanup.

tchaikov pushed a commit to tchaikov/scylladb that referenced this issue Jan 12, 2024
(cherry picked from commit 66ba983)
@tchaikov
Contributor

backport created at #16756

@mykaul mykaul removed the status/release blocker Preventing from a release to be promoted label Jan 14, 2024
denesb added a commit that referenced this issue Jan 16, 2024
… active segment and flushing all tables' from Kefu Chai

Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d).

Flushing all sstables in the database releases
all references to commitlog segments, and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since compacting all
keyspaces with `nodetool compact` invokes the
`force_keyspace_compaction` API for each keyspace
successively, we need a mechanism to prevent
major compaction from flushing too frequently.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.

Fixes #15777

Closes #15820

To address the conflict, the following change is also included in this changeset:

tools/scylla-nodetool: implement the cleanup command

The --jobs command-line argument is accepted but ignored, just like the
current nodetool does.

Refs: #15588

Closes #16756

* github.com:scylladb/scylladb:
  docs: nodetool: flush: enrich examples
  docs: nodetool: compact: fix example
  api: add /storage_service/compact
  api: add /storage_service/flush
  tools/scylla-nodetool: implement the flush command
  compaction_manager: flush_all_tables before major compaction
  database: add flush_all_tables
  api: compaction: add flush_memtables option
  test/nodetool: jmx: fix path to scripts/scylla-jmx
  scylla-nodetool, docs: improve optional params documentation
  tools/scylla-nodetool: extract keyspace/table parsing
  tools/scylla-nodetool: implement the cleanup command
  test/nodetool: rest_api_mock: add more options for multiple requests
@denesb denesb removed Backport candidate backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed labels Jan 17, 2024
7 participants