Unify database setup and teardown between main.cc and single_node_cql_env #2795

Open

tgrabiec opened this issue Sep 13, 2017 · 2 comments

@tgrabiec (Contributor)

There is a lot of duplication between the two.

@jvanz commented Oct 10, 2017

I would like to contribute, and I think this could be a good starting point. Wouldn't it be?

@bhalevy (Member) commented Oct 28, 2019

Ref #4384

avikivity pushed a commit that referenced this issue May 27, 2021
The purpose of the class in question is to start the sharded storage
service so that its global instance stays alive. I don't know exactly
when it happened, but no code that instantiates this wrapper actually
needs the global storage service anymore.

Ref: #2795
tests: unit(dev), perf_sstable(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210526170454.15795-1-xemul@scylladb.com>
avikivity added a commit that referenced this issue Sep 20, 2021
"
Currently the database start and stop code is quite dispersed and
exists in two slightly different forms -- one in main and the
other in cql_test_env. This set unifies both and makes
them look almost ideal:

    sharded<database> db;
    db.start(<dependencies>);
    auto stop = defer([&db] { db.stop().get(); });
    db.invoke_on_all(&database::start).get();

with all (well, most) other mentions of the "db" variable
being arguments for other services' dependencies.

tests: unit(dev, release), unit.cross_shard_barrier(debug)
       dtest.simple_boot_shutdown(dev)
refs: #2737
refs: #2795
refs: #5489

"

* 'br-database-teardown-unification-2' of https://github.com/xemul/scylla: (26 commits)
  main: Log when database starts
  view_update_generator: Register staging sstables in constructor
  database, messaging: Delete old connection drop notification
  database, proxy: Relocate connection-drop activity
  messaging, proxy: Notify connection drops with boost signal
  database, tests: Rework recommended format setting
  database, sstables_manager: Sow some noexcepts
  database: Eliminate unused helpers
  database: Merge the stop_database() into database::stop()
  database: Flatten stop_database()
  database: Equip with cross-shard-barrier
  database: Move starting bits into start()
  database: Add .start() method
  main: Initialize directories before database
  main, api: Detach set_server_config from database and move up
  main: Shorten commitlog creation
  database: Extract commitlog initialization from init_system_keyspace
  repair: Shutdown without database help
  main: Shift iosched verification upward
  database: Remove unused mm arg from init_non_system_keyspaces()
  ...
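The teardown in the snippet quoted in the commit message above hinges on a defer-style guard. A minimal sketch of the idea (hypothetical stand-ins, not Seastar's actual `deferred_action` or `sharded<>` machinery) shows why `stop()` is guaranteed to run after the work phase, even on early exit:

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for a Seastar-style deferred action: runs a callable on
// scope exit, giving the "auto stop = defer([&]{ db.stop(); });" guarantee.
template <typename Func>
class deferred_action {
    Func _func;
    bool _armed = true;
public:
    explicit deferred_action(Func f) : _func(std::move(f)) {}
    deferred_action(deferred_action&& o)
        : _func(std::move(o._func)), _armed(std::exchange(o._armed, false)) {}
    ~deferred_action() { if (_armed) _func(); }
};

template <typename Func>
deferred_action<Func> defer(Func f) { return deferred_action<Func>(std::move(f)); }

// Toy "service" that records lifecycle events, mimicking the start/stop
// ordering the commit message describes for sharded<database>.
struct toy_service {
    std::vector<std::string>& log;
    void start() { log.push_back("start"); }
    void stop() { log.push_back("stop"); }
};

inline std::vector<std::string> run_lifecycle() {
    std::vector<std::string> log;
    {
        toy_service db{log};
        db.start();
        auto stop = defer([&db] { db.stop(); });
        log.push_back("work");
    } // the guard fires here, so stop always follows the work phase
    return log;
}
```

Because destructors run in reverse declaration order, the guard's destructor (and therefore `stop()`) executes before the service itself is destroyed.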
avikivity added a commit that referenced this issue Sep 22, 2021
"
The main challenge here is to move messaging_service.start_listen()
call from out of gossiper into main. Other changes are pretty minor
compared to that and include

- patch gossiper API towards a standard start-shutdown-stop form
- gossiping "sharder info" in initial state
- configure cluster name and seeds via gossip_config

tests: unit(dev)
       dtest.bootstrap_test.start_stop_test_node(dev)
       manual(dev): start+stop, nodetool enable-/disablegossip

refs: #2737
refs: #2795
refs: #5489
"

* 'br-gossiper-dont-start-messaging-listen-2' of https://github.com/xemul/scylla:
  code: Expel gossiper.hh from other headers
  storage_service: Gossip "sharder" in initial states
  gossiper: Relax set_seeds()
  gossiper, main: Turn init_gossiper into get_seeds_from_config
  storage_service: Eliminate the do-bind argument from everywhere
  gossiper: Drop ms-registered manipulations
  messaging, main, gossiper: Move listening start into main
  gossiper: Do handlers reg/unreg from start/stop
  gossiper: Split (un)init_messaging_handler()
  gossiper: Relocate stop_gossiping() into .stop()
  gossiper: Introduce .shutdown() and use where appropriate
  gossiper: Set cluster_name via gossip_config
  gossiper, main: Straighten start/stop
  tests/cql_test_env: Open-code tst_init_ms_fd_gossiper
  tests/cql_test_env: De-global most of gossiper
  gossiper: Merge start_gossiping() overloads into one
  gossiper: Use is_... helpers
  gossiper: Fix do_shadow_round comment
  gossiper: Dispose dead code
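The "standard start-shutdown-stop form" the commit message above patches the gossiper towards can be sketched roughly like this (a hypothetical class, not the actual gossiper API): `shutdown()` quiesces activity and may be called on its own, while `stop()` releases resources and implies shutdown if the caller skipped it:

```cpp
#include <stdexcept>

// Hypothetical sketch of a start-shutdown-stop service lifecycle.
class lifecycle_service {
    enum class state { created, started, shut_down, stopped } _state = state::created;
public:
    void start() {
        if (_state != state::created) throw std::logic_error("start() called twice");
        _state = state::started;
    }
    void shutdown() {
        // Quiesce activity; safe to call more than once.
        if (_state == state::started) _state = state::shut_down;
    }
    void stop() {
        shutdown();            // stop() implies shutdown if the caller skipped it
        _state = state::stopped;
    }
    bool is_shut_down() const { return _state == state::shut_down || _state == state::stopped; }
    bool is_stopped() const { return _state == state::stopped; }
};
```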
denesb pushed a commit that referenced this issue Sep 30, 2021
The cql_config_updater is a sharded<> service that exists in main and
whose goal is to make sure some db::config values are propagated into
cql_config. There's a handier updateable_value<> glue for that.

tests: unit(dev)
refs: #2795

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210927090402.25980-1-xemul@scylladb.com>
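A minimal model of the updateable_value<> glue mentioned above (simplified and hypothetical, not Seastar's actual `utils::updateable_value` API): observers subscribe to a source, and every `set()` propagates automatically, which is what makes a dedicated updater service unnecessary:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Simplified, hypothetical model of an updateable value source.
template <typename T>
class updateable_value_source {
    T _value;
    std::vector<std::function<void(const T&)>> _observers;
public:
    explicit updateable_value_source(T v) : _value(std::move(v)) {}
    const T& get() const { return _value; }
    void set(T v) {
        _value = std::move(v);
        for (auto& obs : _observers) obs(_value);   // propagate to all observers
    }
    void observe(std::function<void(const T&)> f) {
        f(_value);                  // deliver the current value on subscription
        _observers.push_back(std::move(f));
    }
};
```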
avikivity added a commit that referenced this issue Oct 4, 2021
"
The storage_service is involved in the cdc_generation_service guts
more than needed:

 - the bool _for_testing bit is cdc-only
 - there's an API-only cdc_generation_service getter
 - cdc_generation_service startup code partially sits in storage_service's

This patch cleans up most of the above, leaving only the startup
_cdc_gen_id on board.

tests: unit(dev)
refs: #2795

"

* 'br-storage-service-vs-cdc-2' of https://github.com/xemul/scylla:
  api: Use local sharded<cdc::generation_service> reference
  main: Push cdc::generation_service via API
  storage_service: Ditch for_testing boolean
  cdc: Replace db::config with generation_service::config
  cdc: Drop db::config from description_generator
  cdc: Remove all arguments from maybe_rewrite_streams_descriptions
  cdc: Move maybe_rewrite_streams_descriptions into after_join
  cdc: Squash two methods into one
  cdc: Turn make_new_cdc_generation a service method
  cdc: Remove ring-delay arg from make_new_cdc_generation
  cdc: Keep database reference on generation_service
avikivity added a commit that referenced this issue May 19, 2022
"
There's an explicit barrier in main that waits for the sstable format
selector to finish selecting the format by the time the node starts to
join a cluster. (Actually -- not quite: when restarting a normal node,
it joins the cluster in prepare_to_join().)

This explicit barrier is not needed; the sync point already exists in
the way features are enabled, the format selector just needs to use it.

branch: https://github.com/xemul/scylla/tree/br-format-selector-sync
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/351/
refs: #2795
"

* 'br-format-selector-sync' of https://github.com/xemul/scylla:
  format-selector: Remove .sync() point
  format-selector: Coroutinize maybe_select_format()
  format-selector: Coroutinize simple methods
avikivity added a commit that referenced this issue May 23, 2022
"
There are several issues with it

- it's scattered between main() and storage_service methods
- yet another incarnation of it also sits in the cql-test-env
- the prepare_to_join() and join_token_ring() names mislead readers,
  as sometimes the node joins the ring in the prepare- stage
- storage service has to carry several private fields to keep the state
  between prepare- and join- parts
- some storage service dependencies are only needed to satisfy joining,
  but since they cannot start early enough, they are pushed to storage
  service uninitialized "in the hope" that it won't use them until join

This patch puts the joining steps in one place and teaches storage service
not to carry unneeded dependencies/state on board. It also eliminates one
more usage of the global proxy instance while at it.

branch: https://github.com/xemul/scylla/tree/br-merge-init-server-and-join-cluster
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/466/
refs: #2795
"

* 'br-merge-init-server-and-join-cluster' of https://github.com/xemul/scylla:
  storage_service: Remove global proxy call
  storage_service: Remove sys_dist_ks from storage_service dependencies
  storage_service: Remove cdc_gen_service from storage_service dependencies
  storage_service: Make _cdc_gen_id local variable
  storage_service: Make _bootstrap_tokens local variable
  storage_service: Merge prepare- and join- private members
  storage_service: Move some code up the file
  storage_service: Coroutinize join_token_ring
  storage_service: Fix indentation after previous patch
  storage_service: Execute its .bootstrap() into async()
  storage_service: Dont assume async context in mark_existing_views_as_built
  storage_service: Merge init-server and join-cluster
  main, storage_service: Move wait for gossip to settle
  main, storage_service: Move passive announce subscription
  main, storage_service: Move early group0 join call
denesb added a commit that referenced this issue Oct 19, 2022
"
There's an ongoing effort to move the endpoint -> {dc/rack} mappings
from the snitch onto the topology object, and this set finalizes it. After
this, the snitch service stops depending on gossiper and system keyspace
and is ready for de-globalization. As a nice side effect, the system
keyspace no longer needs to maintain the dc/rack info cache, and its
startup code gets simplified.

refs: #2737
refs: #2795
"

* 'br-snitch-dont-mess-with-topology-data-2' of https://github.com/xemul/scylla: (23 commits)
  system_keyspace: Dont maintain dc/rack cache
  system_keyspace: Indentation fix after previous patch
  system_keyspace: Coroutinize build_dc_rack_info()
  topology: Move all post-configuration to topology::config
  snitch: Start early
  gossiper: Do not export system keyspace
  snitch: Remove gossiper reference
  snitch: Mark get_datacenter/_rack methods const
  snitch: Drop some dead dependency knots
  snitch, code: Make get_datacenter() report local dc only
  snitch, code: Make get_rack() report local rack only
  storage_service: Populate pending endpoint in on_alive()
  code: Populate pending locations
  topology: Put local dc/rack on topology early
  topology: Add pending locations collection
  topology: Make get_location() errors more verbose
  token_metadata: Add config, spread everywhere
  token_metadata: Hide token_metadata_impl copy constructor
  gossiper: Remove messaging service getter
  snitch: Get local address to gossip via config
  ...
avikivity added a commit that referenced this issue Aug 12, 2023
There are some asserting checks for keyspace and table existence in cql_test_env that perform one-liner work in a complex manner; tests can do better on their own. Removing them makes cql_test_env simpler.

refs: #2795

Closes #15027

* github.com:scylladb/scylladb:
  test: Remove require_..._exists from cql_test_env
  test: Open-code ks.cf name parse into cdc_test
  test: Don't use require_table_exists() in test/lib/random_schema
  test: Use BOOST_REQUIRE(!db.has_schema())
  test: Use BOOST_REQUIRE(db.has_schema())
  test: Use BOOST_REQUIRE(db.has_keyspace())
  test: Threadify cql_query_test::test_compact_storage case
  test: Threadify some cql_query_test cases
avikivity added a commit that referenced this issue Aug 13, 2023
To add a sharded service to the cql_test_env one needs to patch it in 5 or 6 places

- add cql_test_env reference
- add cql_test_env constructor argument
- initialize the reference in initializer list
- add service variable to do_with method
- pass the variable to cql_test_env constructor
- (optionally) export it via cql_test_env public method

Steps 1 through 5 are annoying; things get much simpler if they look like

- add cql_test_env variable
- (optionally) export it via cql_test_env public method

This is what this PR does

refs: #2795

Closes #15028

* github.com:scylladb/scylladb:
  cql_test_env: Drop local *this reference
  cql_test_env: Drop local references
  cql_test_env: Move most of the stuff in run_in_thread()
  cql_test_env: Open-code env start/stop and remove both
  cql_test_env: Keep other services as class variables
  cql_test_env: Keep services as class variables
  cql_test_env: Construct env early
  cql_test_env: De-static fdpinger variable
  cql_test_env: Define all services' variables early
  cql_test_env: Keep group0_client pointer
xemul added a commit to xemul/scylla that referenced this issue Aug 17, 2023
On boot several manipulations with system.local are performed.

1. The host_id value is selected from it with key = local

   If not found, system_keyspace generates a new host_id, inserts the
   new value into the table, and returns it

2. The cluster_name is selected from it with key = local

   Then it's system_keyspace that either checks that the name matches
   the one from db::config, or inserts the db::config value into the
   table

3. The row with key = local is unconditionally updated with various info
   like versions, listen, rpc and bcast addresses, dc, rack, etc.

All three steps are scattered over main: p.1 is called directly, while p.2
and p.3 are executed via system_keyspace::setup(), which happens rather
late. There's also some touching of this table from the cql_test_env
startup code.

The proposal is to collect this setup into one place and execute it
early -- as soon as the system.local table is populated. This frees the
system_keyspace code from the logic of selecting the host id and cluster
name, leaving that to main, and keeps it with only select/insert work.

refs: scylladb#2795

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
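The three boot steps above can be sketched against a toy stand-in for system.local (a plain map; all names here are illustrative, not Scylla's actual code):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy model: a std::map stands in for the system.local row with key = local.
using local_table = std::map<std::string, std::string>;

// Step 1: select host_id, generating and inserting one if absent.
inline std::string get_or_make_host_id(local_table& t, const std::string& fresh_id) {
    auto it = t.find("host_id");
    if (it != t.end()) return it->second;   // existing id wins
    t["host_id"] = fresh_id;
    return fresh_id;
}

// Step 2: cluster_name must match the configured one, or be inserted on first boot.
inline void check_cluster_name(local_table& t, const std::string& configured) {
    auto it = t.find("cluster_name");
    if (it == t.end()) { t["cluster_name"] = configured; return; }
    if (it->second != configured) throw std::runtime_error("cluster name mismatch");
}

// Step 3: version/address info is rewritten unconditionally on every boot.
inline void update_local_info(local_table& t, const std::string& version) {
    t["release_version"] = version;
}
```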
avikivity pushed a commit that referenced this issue Aug 20, 2023

Closes #15082
avikivity added a commit that referenced this issue Sep 13, 2023
This code is currently spread over main and differs in cql_test_env. The PR unifies both places and makes the manager's start-stop look standard.

refs: #2795

Closes #15375

* github.com:scylladb/scylladb:
  batchlog_manager: Remove start() method
  batchlog_manager: Start replay loop in constructor
  main, cql_test_env: Start-stop batchlog manager in one "block"
  batchlog_manager: Move shard-0 check into batchlog_replay_loop()
  batchlog_manager: Fix drain() reentrancy
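Making drain() reentrant (the aim of the last patch above) typically means repeated or concurrent callers share a single drain pass instead of racing to run it twice. A minimal sketch under that assumption, with hypothetical names:

```cpp
#include <atomic>

// Hypothetical sketch of a reentrant drain(): only the first caller wins the
// flag and does the work; later calls are harmless no-ops.
class drainable {
    std::atomic<bool> _drained{false};
    int _drain_runs = 0;
public:
    void drain() {
        bool expected = false;
        // compare_exchange_strong lets exactly one caller flip false -> true.
        if (!_drained.compare_exchange_strong(expected, true)) return;
        ++_drain_runs;   // stands in for replaying/flushing the batchlog once
    }
    int drain_runs() const { return _drain_runs; }
};
```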
denesb added a commit that referenced this issue Nov 21, 2023
…lyanov

Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (the minimum required before joining the cluster) happen in one place.

refs: #2795
refs: #2737

Closes #16103

* github.com:scylladb/scylladb:
  storage_service: Drop (un)init_messaging_service_part() pair
  storage_service: Init/Deinit RPC handlers in constructor/stop
  storage_service: Dont capture container() on RPC handler
  storage_service: Use storage_service::_sys_dist_ks in some places
  storage_service: Add explicit dependency on system dist. keyspace
  storage_service: Turn query processor pointer into reference
  storage_service: Add explicit query_processor dependency
  main: Start storage service later
@dani-tweig removed the n00b label Feb 4, 2024
denesb added a commit that referenced this issue Apr 16, 2024
For view builder draining there's a dedicated deferred action in main, while all other services that need to be drained do it via storage_service. The latter exists to unify shutdown across services and to make `nodetool drain` drain everything, not just a part of it. This PR makes view builder draining look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself.

refs: #2737
refs: #2795

Closes #16558

* github.com:scylladb/scylladb:
  storage_service: Drain view builder on drain too
  view_builder: Generalize mark_as_built(view_ptr) method
  view_builder: Move mark_existing_views_as_built from storage service
  storage_service: Add view_builder& reference
  main,cql_test_env: Move view_builder start up (and make unconditional)