Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintenance socket: set filesystem permissions to 660 #17113

Merged

Conversation

margdoc
Copy link
Contributor

@margdoc margdoc commented Feb 1, 2024

Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow a scyllaadm's group to connect.
Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket.

Fixes #16487.

@margdoc margdoc force-pushed the maintenance_socket_permissions branch from a1fc469 to 7e540ee Compare February 1, 2024 14:58
@avikivity
Copy link
Member

What were the permissions before the patch? I'm surprised if "others" had write permission, which is needed to connect.

@margdoc
Copy link
Contributor Author

margdoc commented Feb 2, 2024

What were the permissions before the patch? I'm surprised if "others" had write permission, which is needed to connect.

It was 755, so only the owner could connect to the socket.

@kbr-scylla
Copy link
Contributor

@ptrsmrn please assign someone from your team to review this (I think this can be considered a "frontend" feature)

Copy link
Contributor

@dawmd dawmd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -65,6 +66,124 @@ future<> controller::start_server() {
return do_start_server().finally([this] { _ops_sem.signal(); });
}

// Wrapper for cql_server::listen to invoke it on all shards and log address and listen options.
static future<> listen(sharded<cql_server>& cserver, socket_address addr, std::shared_ptr<seastar::tls::credentials_builder> creds, bool is_shard_aware, bool keepalive, std::optional<file_permissions> unix_domain_socket_permissions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to assign unix_domain_socket_permissions a default value instead of making it optional, that way you won't have to pass an extra std::nullopt if you don't intend to specify permissions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this makes sense because afacs seastar part of this patch is either doing some calls or not (use defaults) so later it would require extra check if unix_domain_socket_permissions != default

Documennthing this in some way in the comment in generic_server.hh could be nice though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a difference between setting permissions to a default value, and not changing permissions.

transport/controller.cc Outdated Show resolved Hide resolved
Comment on lines +75 to +76
logger.info("Starting listening for CQL clients on {} ({}, {})"
, addr, creds ? "encrypted" : "unencrypted", is_shard_aware ? "shard-aware" : "non-shard-aware"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing to something shorter like:

Suggested change
logger.info("Starting listening for CQL clients on {} ({}, {})"
, addr, creds ? "encrypted" : "unencrypted", is_shard_aware ? "shard-aware" : "non-shard-aware"
logger.info("Starting listening for CQL clients on {} (encrypted: {}, shard-aware: {})", addr, creds, is_shard_aware)

(or better print "true" or "false")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's code already existing on master, I'd not change it in this PR

auto& cfg = _config;
auto preferred = cfg.rpc_interface_prefer_ipv6() ? std::make_optional(net::inet_address::family::INET6) : std::nullopt;
auto family = cfg.enable_ipv6_dns_lookup() || preferred ? std::nullopt : std::make_optional(net::inet_address::family::INET);
auto ceo = cfg.client_encryption_options();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last I checked @dorlaor was still our CEO :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you can reassign this position so easily!

transport/controller.cc Outdated Show resolved Hide resolved
@avikivity
Copy link
Member

What were the permissions before the patch? I'm surprised if "others" had write permission, which is needed to connect.

It was 755, so only the owner could connect to the socket.

Aha. But is changing permissions enough? We need to chgrp it too.

TCP sockets and unix domain sockets don't share common listen options
excluding `socket_address`. For unix domain sockets, available options will be
expanded to cover also filesystem permissions and owner for the socket.
Storing listen options for both types of sockets in one structure would become messy.
For now, both use `listen_cfg`.

In a singular cql controller, only sockets of one type are created, thus it
can be easily split into two cases.
Isolate maintenance socket from `listen_cfg`.
@margdoc margdoc force-pushed the maintenance_socket_permissions branch 2 times, most recently from 2e0f271 to 92e08c3 Compare February 5, 2024 13:32
@margdoc
Copy link
Contributor Author

margdoc commented Feb 5, 2024

v2:

  • rebase on master
  • listen -> listen_on_all_shards
  • get rid of the magic number for the socket path's maximal length

@margdoc
Copy link
Contributor Author

margdoc commented Feb 5, 2024

v3:

  • added option to change owning group

@nuivall
Copy link
Member

nuivall commented Feb 5, 2024

@margdoc it's not immediately visible, is 1st commit pure code move?

}

struct stat statbuf;
auto stat_result = ::stat(socket.c_str(), &statbuf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bypass Seastar APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it's a preexisting problem. Please fix it later in a follow-up.


// Remove the socket if it already exists, otherwise when the server
// tries to listen on it, it will hang on bind().
auto unlink_result = ::unlink(socket.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

@nuivall
Copy link
Member

nuivall commented Feb 5, 2024

It would be good to add to cover letter why the change to 660.

transport/controller.cc Outdated Show resolved Hide resolved
@@ -764,6 +764,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"\tworkdir the node will open the maintenance socket on the path <scylla's workdir>/cql.m,\n"
"\t where <scylla's workdir> is a path defined by the workdir configuration option\n"
"\t<socket path> the node will open the maintenance socket on the path <socket path>")
, maintenance_socket_group(this, "maintenance_socket_group", value_status::Used, "",
"The group that the maintenance socket will be owned by. If not set, the group will be the same as the user running the scylla node.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to default it to scyllaadm in installations.

This means to set it scyllaadm in conf/scylla.yaml, but that's used in un-installed runs (./build/dev/scylla ...) (is it really? please check)

I think this is the most flexible wrt. packaging.

  1. add --admin-group option to install.sh, and if set, use its value to adjust the installed scylla.aml.
  2. use the new option in dist/redhat/scylla.spec and dist/debian/debian/rules, with scyllaadm as the group
  3. also adjust those files to create scyllaadm and make it the scylla user's group (these files create the scylla user via useradd/adduser), also install.sh plays with users too

/cc @syuu1228

I guess this is quite some work (esp. testing) so can be done in a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to use the scylla group. scyllaadm only exists in scylla-machine-image. Later, we'll need to adjust it to make scyllaadm a member of the scylla group.

struct group *grp;
grp = ::getgrnam(_config.maintenance_socket_group().c_str());
if (grp) {
auto chown_result = ::chown(socket.c_str(), ::geteuid(), grp->gr_gid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if current user is not member of the group I guess this fails? Perhaps worth to document that or display more friendly message. Looks like there is a func for that: explain_message_errno_chown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is a func for that: explain_message_errno_chown

It would need library libexplain to be installed. I don't know if it will always be true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then perhaps some documentation and mention maintenance socket in the error message? If user sets this option he needs to run scylla as a part of this group, maybe it's obvious when you read the code, but as an user you get random chown it may be less obvious.

I think the message currently would look similar to:
"Failed to chown /path/cql.m: EPERM: Not owner"

Don't know exactly how strerror output look like in this case. But as you can see there is no link to maintenance socket feature and user may be a bit lost.

transport/controller.cc Outdated Show resolved Hide resolved
@margdoc
Copy link
Contributor Author

margdoc commented Feb 8, 2024

@margdoc it's not immediately visible, is 1st commit pure code move?

Yes.

It would be good to add to cover letter why the change to 660.

It is explained in the linked issue #16487.
Added also to the cover letter why it is needed.

…nce_socket

Set filesystem permissions for the maintenance socket to 660.

Fixes scylladb#16487
…al length

Calculate `max_socket_length` from the size of the structure
representing the Unix domain socket address.
Option `maintenance-socket-group` sets the owning group of the maintenance socket.
If not set, the group will be the same as the user running the scylla node.
@margdoc margdoc force-pushed the maintenance_socket_permissions branch from edff0df to 182cfeb Compare February 19, 2024 09:25
@margdoc
Copy link
Contributor Author

margdoc commented Feb 19, 2024

v4:

  • more descriptive error messages
  • added a note in maintenance socket's docs
  • move the unix_domain_socket_permissions definition closer to its usage
  • more readable if statement

@margdoc margdoc marked this pull request as ready for review February 19, 2024 09:31
@scylladb-promoter
Copy link
Contributor

Docs Preview 📖

Docs Preview for this pull request is available here

Changed Files:

Note: This preview will be available for 30 days and will be automatically deleted after that period. You can manually trigger a new build by committing changes.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
✅ - dtest
❌ - Unit Tests

Failed Tests (4/25863):

Build Details:

  • Duration: 5 hr 3 min
  • Builder: i-01ebd67647fc95bec (m5ad.8xlarge)

@margdoc
Copy link
Contributor Author

margdoc commented Feb 20, 2024

Failed tests:
#17422
#17423

Rerunning CI.

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 1 hr 59 min
  • Builder: spider2.cloudius-systems.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Maintenance socket follow-up: better permissions
7 participants