Unify database setup and teardown between main.cc and single_node_cql_env #2795

Open

tgrabiec opened this issue Sep 13, 2017 · 2 comments

@tgrabiec (Contributor)

There is a lot of duplication between the two.

@jvanz commented Oct 10, 2017

I would like to contribute, and I think this could be a good starting point. Wouldn't it be?

@bhalevy (Member) commented Oct 28, 2019

Ref #4384

avikivity pushed a commit that referenced this issue May 27, 2021
The purpose of the class in question is to start the sharded storage
service so that its global instance stays alive. I don't know exactly
when it happened, but no code that instantiates this wrapper actually
needs the global storage service anymore.

Ref: #2795
tests: unit(dev), perf_sstable(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210526170454.15795-1-xemul@scylladb.com>
avikivity added a commit that referenced this issue Sep 20, 2021
"
Currently the database start and stop code is quite dispersed and
exists in two slightly different forms -- one in main and the
other in cql_test_env. This set unifies both and makes
them look almost ideal:

    sharded<database> db;
    db.start(<dependencies>);
    auto stop = defer([&db] { db.stop().get(); });
    db.invoke_on_all(&database::start).get();

with all (well, most) other mentions of the "db" variable
being arguments for other services' dependencies.

tests: unit(dev, release), unit.cross_shard_barrier(debug)
       dtest.simple_boot_shutdown(dev)
refs: #2737
refs: #2795
refs: #5489

"

* 'br-database-teardown-unification-2' of https://github.com/xemul/scylla: (26 commits)
  main: Log when database starts
  view_update_generator: Register staging sstables in constructor
  database, messaging: Delete old connection drop notification
  database, proxy: Relocate connection-drop activity
  messaging, proxy: Notify connection drops with boost signal
  database, tests: Rework recommended format setting
  database, sstables_manager: Sow some noexcepts
  database: Eliminate unused helpers
  database: Merge the stop_database() into database::stop()
  database: Flatten stop_database()
  database: Equip with cross-shard-barrier
  database: Move starting bits into start()
  database: Add .start() method
  main: Initialize directories before database
  main, api: Detach set_server_config from database and move up
  main: Shorten commitlog creation
  database: Extract commitlog initialization from init_system_keyspace
  repair: Shutdown without database help
  main: Shift iosched verification upward
  database: Remove unused mm arg from init_non_system_keyspaces()
  ...
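The teardown in the snippet quoted in the commit message above hinges on a defer-style guard. A minimal sketch of the idea (hypothetical stand-ins, not Seastar's actual `deferred_action` or `sharded<>` machinery) shows why `stop()` is guaranteed to run after the work phase, even on early exit:

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for a Seastar-style deferred action: runs a callable on
// scope exit, giving the "auto stop = defer([&]{ db.stop(); });" guarantee.
template <typename Func>
class deferred_action {
    Func _func;
    bool _armed = true;
public:
    explicit deferred_action(Func f) : _func(std::move(f)) {}
    deferred_action(deferred_action&& o)
        : _func(std::move(o._func)), _armed(std::exchange(o._armed, false)) {}
    ~deferred_action() { if (_armed) _func(); }
};

template <typename Func>
deferred_action<Func> defer(Func f) { return deferred_action<Func>(std::move(f)); }

// Toy "service" that records lifecycle events, mimicking the start/stop
// ordering the commit message describes for sharded<database>.
struct toy_service {
    std::vector<std::string>& log;
    void start() { log.push_back("start"); }
    void stop() { log.push_back("stop"); }
};

inline std::vector<std::string> run_lifecycle() {
    std::vector<std::string> log;
    {
        toy_service db{log};
        db.start();
        auto stop = defer([&db] { db.stop(); });
        log.push_back("work");
    } // the guard fires here, so stop always follows the work phase
    return log;
}
```

Because destructors run in reverse declaration order, the guard's destructor (and therefore `stop()`) executes before the service itself is destroyed.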
avikivity added a commit that referenced this issue Sep 22, 2021
"
The main challenge here is to move messaging_service.start_listen()
call from out of gossiper into main. Other changes are pretty minor
compared to that and include

- patch gossiper API towards a standard start-shutdown-stop form
- gossiping "sharder info" in initial state
- configure cluster name and seeds via gossip_config

tests: unit(dev)
       dtest.bootstrap_test.start_stop_test_node(dev)
       manual(dev): start+stop, nodetool enable-/disablegossip

refs: #2737
refs: #2795
refs: #5489
"

* 'br-gossiper-dont-start-messaging-listen-2' of https://github.com/xemul/scylla:
  code: Expel gossiper.hh from other headers
  storage_service: Gossip "sharder" in initial states
  gossiper: Relax set_seeds()
  gossiper, main: Turn init_gossiper into get_seeds_from_config
  storage_service: Eliminate the do-bind argument from everywhere
  gossiper: Drop ms-registered manipulations
  messaging, main, gossiper: Move listening start into main
  gossiper: Do handlers reg/unreg from start/stop
  gossiper: Split (un)init_messaging_handler()
  gossiper: Relocate stop_gossiping() into .stop()
  gossiper: Introduce .shutdown() and use where appropriate
  gossiper: Set cluster_name via gossip_config
  gossiper, main: Straighten start/stop
  tests/cql_test_env: Open-code tst_init_ms_fd_gossiper
  tests/cql_test_env: De-global most of gossiper
  gossiper: Merge start_gossiping() overloads into one
  gossiper: Use is_... helpers
  gossiper: Fix do_shadow_round comment
  gossiper: Dispose dead code
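The "standard start-shutdown-stop form" the commit message above patches the gossiper towards can be sketched roughly like this (a hypothetical class, not the actual gossiper API): `shutdown()` quiesces activity and may be called on its own, while `stop()` releases resources and implies shutdown if the caller skipped it:

```cpp
#include <stdexcept>

// Hypothetical sketch of a start-shutdown-stop service lifecycle.
class lifecycle_service {
    enum class state { created, started, shut_down, stopped } _state = state::created;
public:
    void start() {
        if (_state != state::created) throw std::logic_error("start() called twice");
        _state = state::started;
    }
    void shutdown() {
        // Quiesce activity; safe to call more than once.
        if (_state == state::started) _state = state::shut_down;
    }
    void stop() {
        shutdown();            // stop() implies shutdown if the caller skipped it
        _state = state::stopped;
    }
    bool is_shut_down() const { return _state == state::shut_down || _state == state::stopped; }
    bool is_stopped() const { return _state == state::stopped; }
};
```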
denesb pushed a commit that referenced this issue Sep 30, 2021
The cql_config_updater is a sharded<> service that exists in main and
whose goal is to make sure some db::config values are propagated into
cql_config. There's a handier updateable_value<> glue for that.

tests: unit(dev)
refs: #2795

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210927090402.25980-1-xemul@scylladb.com>
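A minimal model of the updateable_value<> glue mentioned above (simplified and hypothetical, not Seastar's actual `utils::updateable_value` API): observers subscribe to a source, and every `set()` propagates automatically, which is what makes a dedicated updater service unnecessary:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Simplified, hypothetical model of an updateable value source.
template <typename T>
class updateable_value_source {
    T _value;
    std::vector<std::function<void(const T&)>> _observers;
public:
    explicit updateable_value_source(T v) : _value(std::move(v)) {}
    const T& get() const { return _value; }
    void set(T v) {
        _value = std::move(v);
        for (auto& obs : _observers) obs(_value);   // propagate to all observers
    }
    void observe(std::function<void(const T&)> f) {
        f(_value);                  // deliver the current value on subscription
        _observers.push_back(std::move(f));
    }
};
```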
avikivity added a commit that referenced this issue Oct 4, 2021
"
The storage_service is involved in the cdc_generation_service guts
more than needed:

 - the bool _for_testing bit is cdc-only
 - there's an API-only cdc_generation_service getter
 - cdc_generation_service startup code partially sits in storage_service's

This patch cleans up most of the above, leaving only the startup
_cdc_gen_id on board.

tests: unit(dev)
refs: #2795

"

* 'br-storage-service-vs-cdc-2' of https://github.com/xemul/scylla:
  api: Use local sharded<cdc::generation_service> reference
  main: Push cdc::generation_service via API
  storage_service: Ditch for_testing boolean
  cdc: Replace db::config with generation_service::config
  cdc: Drop db::config from description_generator
  cdc: Remove all arguments from maybe_rewrite_streams_descriptions
  cdc: Move maybe_rewrite_streams_descriptions into after_join
  cdc: Squash two methods into one
  cdc: Turn make_new_cdc_generation a service method
  cdc: Remove ring-delay arg from make_new_cdc_generation
  cdc: Keep database reference on generation_service
avikivity added a commit that referenced this issue May 19, 2022
"
There's an explicit barrier in main that waits for the sstable format
selector to finish selecting the format by the time the node starts to
join a cluster. (Actually -- not quite: when restarting a normal node,
it joins the cluster in prepare_to_join().)

This explicit barrier is not needed; the sync point already exists in
the way features are enabled, the format selector just needs to use it.

branch: https://github.com/xemul/scylla/tree/br-format-selector-sync
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/351/
refs: #2795
"

* 'br-format-selector-sync' of https://github.com/xemul/scylla:
  format-selector: Remove .sync() point
  format-selector: Coroutinize maybe_select_format()
  format-selector: Coroutinize simple methods
avikivity added a commit that referenced this issue May 23, 2022
"
There are several issues with it

- it's scattered between main() and storage_service methods
- yet another incarnation of it also sits in the cql-test-env
- the prepare_to_join() and join_token_ring() names mislead readers,
  as sometimes the node joins the ring in the prepare- stage
- storage service has to carry several private fields to keep the state
  between prepare- and join- parts
- some storage service dependencies are only needed to satisfy joining,
  but since they cannot start early enough, they are pushed to storage
  service uninitialized "in the hope" that it won't use them until join

This patch puts the joining steps in one place and teaches storage service
not to carry unneeded dependencies/state on board. It also eliminates one
more usage of the global proxy instance while at it.

branch: https://github.com/xemul/scylla/tree/br-merge-init-server-and-join-cluster
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/466/
refs: #2795
"

* 'br-merge-init-server-and-join-cluster' of https://github.com/xemul/scylla:
  storage_service: Remove global proxy call
  storage_service: Remove sys_dist_ks from storage_service dependencies
  storage_service: Remove cdc_gen_service from storage_service dependencies
  storage_service: Make _cdc_gen_id local variable
  storage_service: Make _bootstrap_tokens local variable
  storage_service: Merge prepare- and join- private members
  storage_service: Move some code up the file
  storage_service: Coroutinize join_token_ring
  storage_service: Fix indentation after previous patch
  storage_service: Execute its .bootstrap() into async()
  storage_service: Dont assume async context in mark_existing_views_as_built
  storage_service: Merge init-server and join-cluster
  main, storage_service: Move wait for gossip to settle
  main, storage_service: Move passive announce subscription
  main, storage_service: Move early group0 join call
denesb added a commit that referenced this issue Oct 19, 2022
"
There's an ongoing effort to move the endpoint -> {dc/rack} mappings
from the snitch onto the topology object, and this set finalizes it. After
this, the snitch service stops depending on gossiper and system keyspace
and is ready for de-globalization. As a nice side effect, the system
keyspace no longer needs to maintain the dc/rack info cache, and its
startup code gets simplified.

refs: #2737
refs: #2795
"

* 'br-snitch-dont-mess-with-topology-data-2' of https://github.com/xemul/scylla: (23 commits)
  system_keyspace: Dont maintain dc/rack cache
  system_keyspace: Indentation fix after previous patch
  system_keyspace: Coroutinize build_dc_rack_info()
  topology: Move all post-configuration to topology::config
  snitch: Start early
  gossiper: Do not export system keyspace
  snitch: Remove gossiper reference
  snitch: Mark get_datacenter/_rack methods const
  snitch: Drop some dead dependency knots
  snitch, code: Make get_datacenter() report local dc only
  snitch, code: Make get_rack() report local rack only
  storage_service: Populate pending endpoint in on_alive()
  code: Populate pending locations
  topology: Put local dc/rack on topology early
  topology: Add pending locations collection
  topology: Make get_location() errors more verbose
  token_metadata: Add config, spread everywhere
  token_metadata: Hide token_metadata_impl copy constructor
  gossiper: Remove messaging service getter
  snitch: Get local address to gossip via config
  ...
avikivity added a commit that referenced this issue Aug 12, 2023
There are some asserting checks for keyspace and table existence in cql_test_env that perform one-liner work in a complex manner; tests can do better on their own. Removing them makes cql_test_env simpler.

refs: #2795

Closes #15027

* github.com:scylladb/scylladb:
  test: Remove require_..._exists from cql_test_env
  test: Open-code ks.cf name parse into cdc_test
  test: Don't use require_table_exists() in test/lib/random_schema
  test: Use BOOST_REQUIRE(!db.has_schema())
  test: Use BOOST_REQUIRE(db.has_schema())
  test: Use BOOST_REQUIRE(db.has_keyspace())
  test: Threadify cql_query_test::test_compact_storage case
  test: Threadify some cql_query_test cases
avikivity added a commit that referenced this issue Aug 13, 2023
To add a sharded service to the cql_test_env one needs to patch it in 5 or 6 places

- add cql_test_env reference
- add cql_test_env constructor argument
- initialize the reference in initializer list
- add service variable to do_with method
- pass the variable to cql_test_env constructor
- (optionally) export it via cql_test_env public method

Steps 1 through 5 are annoying; things get much simpler if they look like

- add cql_test_env variable
- (optionally) export it via cql_test_env public method

This is what this PR does

refs: #2795

Closes #15028

* github.com:scylladb/scylladb:
  cql_test_env: Drop local *this reference
  cql_test_env: Drop local references
  cql_test_env: Move most of the stuff in run_in_thread()
  cql_test_env: Open-code env start/stop and remove both
  cql_test_env: Keep other services as class variables
  cql_test_env: Keep services as class variables
  cql_test_env: Construct env early
  cql_test_env: De-static fdpinger variable
  cql_test_env: Define all services' variables early
  cql_test_env: Keep group0_client pointer
xemul added a commit to xemul/scylla that referenced this issue Aug 17, 2023
On boot several manipulations with system.local are performed.

1. The host_id value is selected from it with key = local

   If not found, system_keyspace generates a new host_id, inserts the
   new value into the table, and returns it

2. The cluster_name is selected from it with key = local

   Then it's system_keyspace that either checks that the name matches
   the one from db::config, or inserts the db::config value into the
   table

3. The row with key = local is unconditionally updated with various info
   like versions, listen, rpc and bcast addresses, dc, rack, etc.

All three steps are scattered over main: p.1 is called directly, while p.2
and p.3 are executed via system_keyspace::setup(), which happens rather
late. There's also some touching of this table from the cql_test_env
startup code.

The proposal is to collect this setup into one place and execute it
early -- as soon as the system.local table is populated. This frees the
system_keyspace code from the logic of selecting the host id and cluster
name, leaving that to main, and keeps it with only select/insert work.

refs: scylladb#2795

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
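The three boot steps above can be sketched against a toy stand-in for system.local (a plain map; all names here are illustrative, not Scylla's actual code):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy model: a std::map stands in for the system.local row with key = local.
using local_table = std::map<std::string, std::string>;

// Step 1: select host_id, generating and inserting one if absent.
inline std::string get_or_make_host_id(local_table& t, const std::string& fresh_id) {
    auto it = t.find("host_id");
    if (it != t.end()) return it->second;   // existing id wins
    t["host_id"] = fresh_id;
    return fresh_id;
}

// Step 2: cluster_name must match the configured one, or be inserted on first boot.
inline void check_cluster_name(local_table& t, const std::string& configured) {
    auto it = t.find("cluster_name");
    if (it == t.end()) { t["cluster_name"] = configured; return; }
    if (it->second != configured) throw std::runtime_error("cluster name mismatch");
}

// Step 3: version/address info is rewritten unconditionally on every boot.
inline void update_local_info(local_table& t, const std::string& version) {
    t["release_version"] = version;
}
```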
avikivity pushed a commit that referenced this issue Aug 20, 2023

Closes #15082
avikivity added a commit that referenced this issue Sep 13, 2023
This code is currently spread over main and differs in cql_test_env. The PR unifies both places and makes the manager's start-stop look standard.

refs: #2795

Closes #15375

* github.com:scylladb/scylladb:
  batchlog_manager: Remove start() method
  batchlog_manager: Start replay loop in constructor
  main, cql_test_env: Start-stop batchlog manager in one "block"
  batchlog_manager: Move shard-0 check into batchlog_replay_loop()
  batchlog_manager: Fix drain() reentrancy
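Making drain() reentrant (the aim of the last patch above) typically means repeated or concurrent callers share a single drain pass instead of racing to run it twice. A minimal sketch under that assumption, with hypothetical names:

```cpp
#include <atomic>

// Hypothetical sketch of a reentrant drain(): only the first caller wins the
// flag and does the work; later calls are harmless no-ops.
class drainable {
    std::atomic<bool> _drained{false};
    int _drain_runs = 0;
public:
    void drain() {
        bool expected = false;
        // compare_exchange_strong lets exactly one caller flip false -> true.
        if (!_drained.compare_exchange_strong(expected, true)) return;
        ++_drain_runs;   // stands in for replaying/flushing the batchlog once
    }
    int drain_runs() const { return _drain_runs; }
};
```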
denesb added a commit that referenced this issue Nov 21, 2023
…lyanov

Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (the minimum required before joining the cluster) happen in one place.

refs: #2795
refs: #2737

Closes #16103

* github.com:scylladb/scylladb:
  storage_service: Drop (un)init_messaging_service_part() pair
  storage_service: Init/Deinit RPC handlers in constructor/stop
  storage_service: Dont capture container() on RPC handler
  storage_service: Use storage_service::_sys_dist_ks in some places
  storage_service: Add explicit dependency on system dist. keyspace
  storage_service: Turn query processor pointer into reference
  storage_service: Add explicit query_processor dependency
  main: Start storage service later
@dani-tweig removed the n00b label Feb 4, 2024
denesb added a commit that referenced this issue Apr 16, 2024
For view builder draining there's a dedicated deferred action in main, while all other services that need to be drained do it via storage_service. The latter exists to unify shutdown across services and to make `nodetool drain` drain everything, not just a part of it. This PR makes view builder draining look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself.

refs: #2737
refs: #2795

Closes #16558

* github.com:scylladb/scylladb:
  storage_service: Drain view builder on drain too
  view_builder: Generalize mark_as_built(view_ptr) method
  view_builder: Move mark_existing_views_as_built from storage service
  storage_service: Add view_builder& reference
  main,cql_test_env: Move view_builder start up (and make unconditional)