Skip to content

Commit

Permalink
Merge ' message: match unknown tenants to the default tenant' from Bo…
Browse files Browse the repository at this point in the history
…tond Dénes

On connection setup, the isolation cookie of the connection is matched to the appropriate scheduling group. This is achieved by iterating over the known statement tenant connection types as well as the system connections and choosing the one with a matching name.

If a match is not found, it is assumed that the cluster is upgraded and the remote node has a scheduling group the local one doesn't have. To avoid demoting a scheduling group of unknown importance, in this case the default scheduling group is chosen.

This is problematic when upgrading an OSS cluster to an enterprise version, as the scheduling groups of the enterprise service-levels will match none of the statement tenants and will hence fall-back to the default scheduling group. As a consequence, while the cluster is mixed, user workload on old (OSS) nodes, will be executed under the system scheduling group and concurrency semaphore. Not only does this mean that user workloads are directly competing for resources with system ones, but the two workloads are now sharing the semaphore too, reducing the available throughput. This usually manifests in queries timing out on the old (OSS) nodes in the cluster.

This PR proposes to fix this, by recognizing that the unknown scheduling group is in fact a tenant this node doesn't know yet, and matching it with the default statement tenant. With this, order should be restored, with service-level connections being recognized as user connections and being executed in the statement scheduling group and the statement (user) concurrency semaphore.

I tested this manually, by creating a cluster of 2 OSS nodes, then upgrading one of the nodes to enterprise and verifying (with extra logging) that service level connections are matched to the default statement tenant after the PR and they indeed match to the default scheduling group before.

Fixes: #13841
Fixes: #12552

Closes #13843

* github.com:scylladb/scylladb:
  message: match unknown tenants to the default tenant
  message: generalize per-tenant connection types

(cherry picked from commit a7c2c9f)
  • Loading branch information
avikivity authored and denesb committed Jul 12, 2023
1 parent 1a6f438 commit c9a5c4c
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 3 deletions.
14 changes: 11 additions & 3 deletions message/messaging_service.cc
Original file line number Diff line number Diff line change
Expand Up @@ -628,9 +628,9 @@ messaging_service::initial_scheduling_info() const {
sched_infos.reserve(sched_infos.size() +
_scheduling_config.statement_tenants.size() * PER_TENANT_CONNECTION_COUNT);
for (const auto& tenant : _scheduling_config.statement_tenants) {
sched_infos.push_back({ tenant.sched_group, "statement:" + tenant.name });
sched_infos.push_back({ tenant.sched_group, "statement-ack:" + tenant.name });
sched_infos.push_back({ tenant.sched_group, "forward:" + tenant.name });
for (auto&& connection_prefix : _connection_types_prefix) {
sched_infos.push_back({ tenant.sched_group, sstring(connection_prefix) + tenant.name });
}
}

assert(sched_infos.size() == PER_SHARD_CONNECTION_COUNT +
Expand All @@ -656,6 +656,14 @@ messaging_service::scheduling_group_for_isolation_cookie(const sstring& isolatio
return info.sched_group;
}
}
// Check for the case of the client using a connection class we don't
// recognize, but we know its a tenant, not a system connection.
// Fall-back to the default tenant in this case.
for (auto&& connection_prefix : _connection_types_prefix) {
if (isolation_cookie.find(connection_prefix.data()) == 0) {
return _scheduling_config.statement_tenants.front().sched_group;
}
}
// Client is using a new connection class that the server doesn't recognize yet.
// Assume it's important, after server upgrade we'll recognize it.
return default_scheduling_group();
Expand Down
2 changes: 2 additions & 0 deletions message/messaging_service.hh
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include <list>
#include <vector>
#include <optional>
#include <array>
#include <absl/container/btree_set.h>
#include <seastar/net/tls.hh>

Expand Down Expand Up @@ -526,6 +527,7 @@ public:
scheduling_group scheduling_group_for_isolation_cookie(const sstring& isolation_cookie) const;
std::vector<messaging_service::scheduling_info_for_connection_index> initial_scheduling_info() const;
unsigned get_rpc_client_idx(messaging_verb verb) const;
static constexpr std::array<std::string_view, 3> _connection_types_prefix = {"statement:", "statement-ack:", "forward:"};
};

} // namespace netw

0 comments on commit c9a5c4c

Please sign in to comment.