tools/scylla-sstable: always read scylla.yaml
Currently, scylla.yaml is read only conditionally: either when the user
provides the `--scylla-yaml-file` command line parameter, or when deducing
the data dir location from the sstable path fails.

We want the scylla.yaml file to always be read, so that when working
with encrypted files (enterprise), scylla-sstable can pick up the
configuration for the encryption.

This patch makes scylla-sstable always attempt to read the scylla.yaml
file, whether the user provided a location for it or not. When not, the
default location is used (also considering the `SCYLLA_CONF` and
`SCYLLA_HOME` environment variables).

Failing to find the scylla.yaml file is not considered an error. The
rationale is that the user will discover this anyway if they attempt an
operation that requires it.

There is a debug-level log about whether it was successfully read or
not.

Fixes: #16132

Closes #16174
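
The lookup described in the commit message can be sketched roughly as follows. This is an illustration in Python, not the tool's C++ code: the function names `resolve_scylla_yaml` and `try_read_config` are hypothetical, and the precedence of `SCYLLA_CONF` over `SCYLLA_HOME` (with `SCYLLA_CONF` naming the conf directory itself and `SCYLLA_HOME` naming its parent) is an assumption, since the commit only says both variables are considered.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_scylla_yaml(user_path: Optional[str],
                        env: Mapping[str, str]) -> Tuple[Path, str]:
    """Return (path, source), where source is "user provided" or "default"."""
    if user_path is not None:
        return Path(user_path), "user provided"
    # Assumption: SCYLLA_CONF is the conf directory, SCYLLA_HOME its parent;
    # with neither set, fall back to ./conf relative to the working directory.
    if "SCYLLA_CONF" in env:
        conf_dir = Path(env["SCYLLA_CONF"])
    elif "SCYLLA_HOME" in env:
        conf_dir = Path(env["SCYLLA_HOME"]) / "conf"
    else:
        conf_dir = Path("conf")
    return conf_dir / "scylla.yaml", "default"


def try_read_config(user_path: Optional[str],
                    env: Mapping[str, str] = os.environ):
    """Always attempt the read; a missing file is reported, not raised."""
    path, source = resolve_scylla_yaml(user_path, env)
    if path.is_file():
        return path.read_text(), f"read scylla.yaml from {source} location {path}"
    return None, f"could not read scylla.yaml from {source} location {path}"
```

The second function mirrors the "not considered an error" rule: the caller gets `None` plus a message suitable for a debug log, and only fails later if an operation actually needs the configuration.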
denesb authored and avikivity committed Dec 5, 2023
1 parent 2ebdc40 commit 5fb0d66
Showing 3 changed files with 102 additions and 48 deletions.
6 changes: 4 additions & 2 deletions docs/operating-scylla/admin-tools/scylla-sstable.rst
@@ -52,8 +52,10 @@ By default (no schema-related options are provided), the tool will try the follo

 * Try to load schema from ``schema.cql``.
 * Try to deduce the ScyllaDB data directory path and table names from the SStable path.
-* Try to load the schema from the ScyllaDB directory located at the standard location (``/var/lib/scylla``). For this to succeed, the table name has to be provided via ``--keyspace`` and ``--table``.
-* Try to load the schema from the ScyllaDB directory path obtained from config at the standard location (``./conf/scylla.yaml``). ``SCYLLA_CONF`` and ``SCYLLA_HOME`` environment variables are also checked. For this to succeed, the table name has to be provided via ``--keyspace`` and ``--table``.
+* Try to load the schema from the ScyllaDB data directory path, obtained from the configuration file, at the standard location (``./conf/scylla.yaml``).
+  ``SCYLLA_CONF`` and ``SCYLLA_HOME`` environment variables are also checked.
+  If the configuration file cannot be located, the default ScyllaDB data directory path (``/var/lib/scylla``) is used.
+  For this to succeed, the table name has to be provided via ``--keyspace`` and ``--table``.

 The tool stops after the first successful attempt. If none of the above succeed, an error message will be printed.
 A user provided schema in ``schema.cql`` (if present) always takes precedence over other methods. This is deliberate, to allow to manually override the schema to be used.
22 changes: 22 additions & 0 deletions test/cql-pytest/test_tools.py
@@ -600,6 +600,13 @@ def test_table_dir_system_schema(self, scylla_path, system_scylla_local_sstable_
                 system_scylla_local_sstable_prepared,
                 system_scylla_local_reference_dump)

+    def test_table_dir_system_schema_deduced_keyspace_table(self, scylla_path, system_scylla_local_sstable_prepared, system_scylla_local_reference_dump):
+        self.check(
+                scylla_path,
+                ["--system-schema"],
+                system_scylla_local_sstable_prepared,
+                system_scylla_local_reference_dump)
+
     def test_table_dir_schema_file(self, scylla_path, system_scylla_local_sstable_prepared, system_scylla_local_reference_dump, system_scylla_local_schema_file):
         self.check(
                 scylla_path,
@@ -614,6 +621,13 @@ def test_table_dir_data_dir(self, scylla_path, system_scylla_local_sstable_prepa
                 system_scylla_local_sstable_prepared,
                 system_scylla_local_reference_dump)

+    def test_table_dir_data_dir_deduced_keyspace_table(self, scylla_path, system_scylla_local_sstable_prepared, system_scylla_local_reference_dump, scylla_data_dir):
+        self.check(
+                scylla_path,
+                ["--scylla-data-dir", scylla_data_dir],
+                system_scylla_local_sstable_prepared,
+                system_scylla_local_reference_dump)
+
     def test_table_dir_scylla_yaml(self, scylla_path, system_scylla_local_sstable_prepared, system_scylla_local_reference_dump, scylla_home_dir):
         scylla_yaml_file = os.path.join(scylla_home_dir, "conf", "scylla.yaml")
         self.check(
@@ -622,6 +636,14 @@ def test_table_dir_scylla_yaml(self, scylla_path, system_scylla_local_sstable_pr
                 system_scylla_local_sstable_prepared,
                 system_scylla_local_reference_dump)

+    def test_table_dir_scylla_yaml_deduced_keyspace_table(self, scylla_path, system_scylla_local_sstable_prepared, system_scylla_local_reference_dump, scylla_home_dir):
+        scylla_yaml_file = os.path.join(scylla_home_dir, "conf", "scylla.yaml")
+        self.check(
+                scylla_path,
+                ["--scylla-yaml-file", scylla_yaml_file],
+                system_scylla_local_sstable_prepared,
+                system_scylla_local_reference_dump)
+
     def test_external_dir_system_schema(self, scylla_path, system_scylla_local_sstable_prepared, system_scylla_local_reference_dump, temp_workdir):
         ext_sstable = self.copy_sstable_to_external_dir(system_scylla_local_sstable_prepared, temp_workdir)
         self.check(
122 changes: 76 additions & 46 deletions tools/scylla-sstable.cc
@@ -136,14 +136,55 @@ partition_set get_partitions(schema_ptr schema, const bpo::variables_map& app_co
     return partitions;
 }

+struct sstable_path_info {
+    std::filesystem::path sstable_path;
+    std::filesystem::path data_dir_path;
+    sstring keyspace;
+    sstring table;
+};
+
+sstable_path_info extract_from_sstable_path(const bpo::variables_map& app_config) {
+    if (!app_config.count("sstables")) {
+        throw std::invalid_argument("cannot extract information from sstable path, no sstable arguments");
+    }
+
+    auto sst_path = std::filesystem::path(app_config["sstables"].as<std::vector<sstring>>().front());
+    sstring keyspace, table;
+    try {
+        auto [_, ks, tbl] = sstables::parse_path(sst_path);
+        keyspace = std::move(ks);
+        table = std::move(tbl);
+    } catch (const sstables::malformed_sstable_exception&) {
+        throw std::invalid_argument(fmt::format("cannot extract information from sstable path, sstable has invalid path: {}", sst_path));
+    }
+    const auto sst_dir_path = std::filesystem::path(sst_path).remove_filename();
+    std::filesystem::path data_dir_path;
+    // Detect whether sstable is in root table directory, or in a sub-directory
+    // The last component is "" due to the trailing "/" left by "remove_filename()" above.
+    // So we need to go back 2 more, to find the supposed keyspace component.
+    if (keyspace == std::prev(sst_dir_path.end(), 3)->native()) {
+        data_dir_path = sst_dir_path / ".." / "..";
+    } else {
+        data_dir_path = sst_dir_path / ".." / ".." / "..";
+    }
+
+    return sstable_path_info{std::move(sst_path), std::move(data_dir_path), std::move(keyspace), std::move(table)};
+}
+
 std::pair<sstring, sstring> get_keyspace_and_table_options(const bpo::variables_map& app_config) {
     sstring keyspace_name, table_name;
     auto k_it = app_config.find("keyspace");
     auto t_it = app_config.find("table");
-    if (k_it == app_config.end() || t_it == app_config.end()) {
-        throw std::invalid_argument("don't know which schema to load: --keyspace and/or --table are not provided");
+    if (k_it != app_config.end() || t_it != app_config.end()) {
+        return std::pair(k_it->second.as<sstring>(), t_it->second.as<sstring>());
+    }
+
+    try {
+        auto info = extract_from_sstable_path(app_config);
+        return std::pair(info.keyspace, info.table);
+    } catch (...) {
+        throw std::invalid_argument("don't know which schema to load: no --keyspace and --table provided, failed to extract keyspace/table from sstable paths");
     }
-    return std::pair(k_it->second.as<sstring>(), t_it->second.as<sstring>());
 }

 struct schema_with_source {
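
To make the path deduction in the hunk above easier to follow, here is a rough Python model of the same idea: derive the keyspace, table, and data directory from where the sstable sits, and fall back to that when `--keyspace` and `--table` are not given. It is a sketch under assumptions, not a port: the regex stands in for `sstables::parse_path`, the `<table>-<32 hex digits>` directory naming is assumed, and this sketch requires both options to be present before using them.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Tuple

# Assumption: table directories are named "<table>-<32 hex digits>".
TABLE_DIR_RE = re.compile(r"^(?P<table>.+)-[0-9a-f]{32}$")


class SSTablePathInfo(NamedTuple):
    sstable_path: PurePosixPath
    data_dir_path: PurePosixPath
    keyspace: str
    table: str


def extract_from_sstable_path(path: str) -> SSTablePathInfo:
    sst_path = PurePosixPath(path)
    # The sstable may sit in the table directory itself or one sub-directory
    # below it, so check both candidate table directories.
    for table_dir in (sst_path.parent, sst_path.parent.parent):
        match = TABLE_DIR_RE.match(table_dir.name)
        if match and table_dir.parent.name:
            keyspace_dir = table_dir.parent
            return SSTablePathInfo(sst_path, keyspace_dir.parent,
                                   keyspace_dir.name, match.group("table"))
    raise ValueError(f"sstable has invalid path: {path}")


def get_keyspace_and_table(options: dict, sstable_path: str) -> Tuple[str, str]:
    # Explicit options win; otherwise deduce both names from the path.
    if "keyspace" in options and "table" in options:
        return options["keyspace"], options["table"]
    info = extract_from_sstable_path(sstable_path)
    return info.keyspace, info.table
```

For a path such as `/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/me-1-big-Data.db` (a made-up file name), the sketch yields keyspace `system`, table `local`, and data directory `/var/lib/scylla/data`.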
@@ -182,9 +223,6 @@ std::optional<schema_with_source> try_load_schema_from_user_provided_source(cons
     }
     if (app_config.contains("scylla-yaml-file")) {
         schema_source_opt = "schema-tables";
-        const auto scylla_yaml_path = app_config["scylla-yaml-file"].as<sstring>();
-        cfg.read_from_file(scylla_yaml_path).get();
-        cfg.setup_directories();
         const auto data_dir_path = std::filesystem::path(cfg.data_file_directories()[0]);
         return schema_with_source{.schema = tools::load_schema_from_schema_tables(data_dir_path, keyspace_name, table_name).get(),
                 .source = schema_source_opt,
@@ -213,22 +251,11 @@ std::optional<schema_with_source> try_load_schema_autodetect(const bpo::variable

     if (app_config.count("sstables")) {
         try {
-            auto sst_path = std::filesystem::path(app_config["sstables"].as<std::vector<sstring>>().front());
-            auto [ed, ks, cf] = sstables::parse_path(sst_path);
-            const auto sst_dir_path = std::filesystem::path(sst_path).remove_filename();
-            std::filesystem::path data_dir_path;
-            // Detect whether sstable is in root table directory, or in a sub-directory
-            // The last component is "" due to the trailing "/" left by "remove_filename()" above.
-            // So we need to go back 2 more, to find the supposed keyspace component.
-            if (ks == std::prev(sst_dir_path.end(), 3)->native()) {
-                data_dir_path = sst_dir_path / ".." / "..";
-            } else {
-                data_dir_path = sst_dir_path / ".." / ".." / "..";
-            }
-            return schema_with_source{.schema = tools::load_schema_from_schema_tables(data_dir_path, ks, cf).get(),
+            auto info = extract_from_sstable_path(app_config);
+            return schema_with_source{.schema = tools::load_schema_from_schema_tables(info.data_dir_path, info.keyspace, info.table).get(),
                     .source = "schema-tables",
-                    .path = data_dir_path,
-                    .obtained_from = format("sstable path ({})", sst_path)};
+                    .path = info.data_dir_path,
+                    .obtained_from = format("sstable path ({})", info.sstable_path)};
         } catch (...) {
             sst_log.debug("Trying to find scylla data dir based on the sstable path failed: {}", std::current_exception());
         }
@@ -237,32 +264,14 @@ std::optional<schema_with_source> try_load_schema_autodetect(const bpo::variable
     }

     try {
-        auto scylla_yaml_file = db::config::get_conf_sub("scylla.yaml").string();
-        cfg.read_from_file(scylla_yaml_file).get();
-        cfg.setup_directories();
-        auto [keyspace_name, table_name] = get_keyspace_and_table_options(app_config);
-        const auto data_dir_path = std::filesystem::path(cfg.data_file_directories()[0]);
-        return schema_with_source{.schema = tools::load_schema_from_schema_tables(data_dir_path, keyspace_name, table_name).get(),
-                .source = "schema-tables",
-                .path = data_dir_path,
-                .obtained_from = format("scylla.yaml file - default location ({})", scylla_yaml_file)};
-    } catch (...) {
-        sst_log.debug("Trying to find and read scylla.yaml failed: {}", std::current_exception());
-    }
-
-    try {
-        // Place on heap to avoid wasting stack space
-        auto pcfg = std::make_unique<db::config>();
-        auto& cfg = *pcfg;
-        cfg.setup_directories();
-        auto [keyspace_name, table_name] = get_keyspace_and_table_options(app_config);
+        const auto [keyspace_name, table_name] = get_keyspace_and_table_options(app_config);
         const auto data_dir_path = std::filesystem::path(cfg.data_file_directories()[0]);
         return schema_with_source{.schema = tools::load_schema_from_schema_tables(data_dir_path, keyspace_name, table_name).get(),
                 .source = "schema-tables",
                 .path = data_dir_path,
-                .obtained_from = "default location for data dir"};
+                .obtained_from = "data dir"};
     } catch (...) {
-        sst_log.debug("Trying to find scylla data dir at default location failed: {}", std::current_exception());
+        sst_log.debug("Trying to locate data dir failed: {}", std::current_exception());
     }

     fmt::print(std::cerr, "Failed to autodetect and load schema, try again with --logger-log-level scylla-sstable=debug to learn more or provide the schema source manually\n");
@@ -2852,10 +2861,12 @@ are multiple ways to obtain the schema:
 ## System schema
 If the examined sstables belong to a system table, whose schema is
-hardcoded in scylla (and thus known), it is enough to provide just
-the name of said table in the `keyspace.table` notation, via the
-`--system-schema` command line option. The table has to be from one of
-the following system keyspaces:
+hardcoded in ScyllaDB (and thus known), it is enough to provide just
+the name of said table via the --keyspace and --table command line
+parameters. Alternatively, the keyspace and tablename can be deduced from
+the path of the sstable, if the sstable is in its natural directory, in
+ScyllaDB's data dir.
+The table has to be from one of the following system keyspaces:
 * system
 * system_schema
 * system_distributed
@@ -2923,6 +2934,25 @@ Validate the specified sstables:

         auto& dbcfg = *app.cfg().db_cfg_ext->db_cfg;

+        sstring scylla_yaml_path;
+        sstring scylla_yaml_path_source;
+
+        if (app_config.count("scylla-yaml-file")) {
+            scylla_yaml_path = app_config["scylla-yaml-file"].as<sstring>();
+            scylla_yaml_path_source = "user provided";
+        } else {
+            scylla_yaml_path = db::config::get_conf_sub("scylla.yaml").string();
+            scylla_yaml_path_source = "default";
+        }
+
+        if (file_exists(scylla_yaml_path).get()) {
+            dbcfg.read_from_file(scylla_yaml_path).get();
+            dbcfg.setup_directories();
+            sst_log.debug("Successfully read scylla.yaml from {} location of {}", scylla_yaml_path_source, scylla_yaml_path);
+        } else {
+            sst_log.debug("Failed to read scylla.yaml from {} location of {}, some functionality may be unavailable", scylla_yaml_path_source, scylla_yaml_path);
+        }
+
         {
             unsigned schema_sources = 0;
             schema_sources += !app_config["schema-file"].defaulted();