-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Give possibility for labeling the clusters added to SM #3219
Comments
@zimnx what's exactly needed here ? Operator must know all the detials about the cluster including |
The issue is there's no way to tell whether cluster you PUT into API is in the desired state, because GET doesn't return all the info - which is fine. Proposed solution is to add a |
This would be useful for us in tasks as well, as it would allow for a more robust assessment of the tasks being up to date, i.e. we wouldn't have to compare the properties of the desired task specification and the actual state in the manager. Ref: scylladb/scylla-operator#1827. The approach proposed by @zimnx, i.e. a map of labels/annotations, would work in both cases. @karol-kokoszka can we have it triaged? |
@rzetelskik @zimnx
What is missing exactly ? |
@karol-kokoszka let's forget about the initial problem described here and focus on the feature request. Let me give you a different use case: Say we create a cluster and try to save its ID on our side. The ID can be lost or we can simply get an older version of the object to process, which doesn't yet have the ID. Without the ID we can't be sure that the cluster we've found through listing is the one we created earlier, or if we just hit a name collision - so we have to delete it and create it again. How do the annotations help here? We could save a UUID there, so that when we get the cluster from the manager state, we know for sure it's the one we created earlier, regardless of if we lost the cluster ID in the process or not. Hitting this is actually quite common. Another one is the updates I mentioned earlier - even if the calls you posted return all the fields we need, on every iteration we have to translate between the manager state and our internal one, and compare the specifications to figure out if we need to update it or not. Even if we do it correctly at some point and get all the fields right, it's prone to break with any changes in defaulting or with new attributes. If we have the possibility to save the hash of our representation of the cluster/task in manager state, its enough for us to compare the hashes to know if we need to update, instead of comparing the entire specification every time. I understand you might be hesitant to extend the API with something serving a specific use case, but I think the solution to this is broad enough to prove helpful for other use cases as well. |
@karol-kokoszka coming back to your comment - the issue described by @zimnx still stands. The responses posted by you only say whether the username/password are set, not what their values are - and rightly so. What follows from that though is that we can't verify whether they are synchronised, i.e. if we were to change the username/password, we can't verify if we should send an update request or if they were already updated. The annotations would allow us to verify if the hashes changed, and so if we should send an update request. ATM there's no way for us to check this for some of the values, namely password in this case. $ sctool cluster add --host=scylla-client.scylla.svc --name=scylla/scylla --auth-token=v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf --force-non-ssl-session-port --force-tls-disabled --username=cassandra --password=cassandra
c9e1cc68-09b6-4048-8bc6-bf730b66570c
WARNING! TLS is explicitly disabled on cluster sessions (--force-tls-disabled).
WARNING! Cluster session is going to always use non SSL port (--force-non-ssl-session-port).
__
/ \ Cluster added! You can set it as default, by exporting its name or ID as env variable:
@ @ $ export SCYLLA_MANAGER_CLUSTER=c9e1cc68-09b6-4048-8bc6-bf730b66570c
| | $ export SCYLLA_MANAGER_CLUSTER=scylla/scylla
|| |/
|| || Now run:
|\_/| $ sctool status -c scylla/scylla
\___/ $ sctool tasks -c scylla/scylla
$ curl -i 127.0.0.1:5080/api/v1/clusters
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 29 Mar 2024 17:11:22 GMT
Content-Length: 343
[{"id":"c9e1cc68-09b6-4048-8bc6-bf730b66570c","name":"scylla/scylla","host":"scylla-client.scylla.svc","auth_token":"v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf","force_tls_disabled":true,"force_non_ssl_session_port":true,"username":"set","password":"set"}]
$ curl -i 127.0.0.1:5080/api/v1/cluster/c9e1cc68-09b6-4048-8bc6-bf730b66570c
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 29 Mar 2024 17:12:02 GMT
Content-Length: 307
{"id":"c9e1cc68-09b6-4048-8bc6-bf730b66570c","name":"scylla/scylla","host":"scylla-client.scylla.svc","auth_token":"v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf","force_tls_disabled":true,"force_non_ssl_session_port":true}
$ sctool cluster update --cluster=c9e1cc68-09b6-4048-8bc6-bf730b66570c --username=cassandra --password=changed
$ curl -i 127.0.0.1:5080/api/v1/clusters
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 29 Mar 2024 17:12:43 GMT
Content-Length: 343
[{"id":"c9e1cc68-09b6-4048-8bc6-bf730b66570c","name":"scylla/scylla","host":"scylla-client.scylla.svc","auth_token":"v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf","force_tls_disabled":true,"force_non_ssl_session_port":true,"username":"set","password":"set"}] |
@rzetelskik but why do you need to know more ? This is not the best approach to let sever responding you with the answer if the password stored matches the given one. |
this, but not just for the password - the proposed solution would let us hash the entire specification on our side and only compare the hashes, instead of comparing the manager state and our desired specification field by field
That's essentialy what we end up doing now, and we flood the manager with unnecessary updates. See e.g. scylladb/scylla-operator#1827 - this issue is mentioning tasks, not clusters, and is caused by some inconsistencies in conversions between our desired specification and the manager state, and back, but you can see in the example logs how frequent these updates are. |
API of scylla-manager is not protected with any key, we support HTTP and HTTPS though. What if scylla-manager will expose the endpoint to compare the object hashes only on HTTPS server and only if server requires and verifies client certificate ? So, that part would need to be enabled:
We must have fully secured communication channel if you need to make this kind of comparison. Object hash will contain the sensitive information still. |
The only thing we're requesting here is enriching the cluster/task structs with user-defined key-value pairs, allowing the users to annotate the objects - akin to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/. Whatever the API clients do with it would be up to the clients, so I don't really see why that would be required? |
Because, the password (even hashed) is the sensitive information that we don't want to share in non-encrypted mode. |
No one is asking you to - the implementation of checksums is just a use case of the metadata annotations, so it's on the client to not leak anything there. As a user, I might as well leak it with the current implementation - I could just put the password as the cluster name. Again, this feature request is not asking the manager to provide a mechanism hashing the manager objects - it's just to provide a tool for attaching and retrieving additional metadata from the objects (clusters, tasks). I suppose the original issue here could be misleading. |
From the issue description:
Got it... It sound like setting version of an object.
Does it mean that it will actually be the operator that counts the hash (or checksum) of an object and saves it among with the cluster ? What if Scylla Manager versions the cluster object ? And returns current version for every PUT / GET operation on cluster ? |
Yes.
I suppose you're worried about dictionary attacks, which is a valid concern. The metadata annotations could always be used to store a salt together with the hash though.
That could also work, although a bit worse for us - hitting conflicts or reconciling older object versions on our side would again cause unnecessary updates. Hard to say which solution would be more versatile. @zimnx wdyt? |
We just want to store metadata of object, we gave legitimate use case but overall what we do with it shouldn't be SM concern. API to get version of object is not what we want, as we build expected state from current state. We construct every task and cluster field from various APIs and then combine them into expected state. Version of an object is just a side effect of an update, it doesn't carry any meaningful semantic, hence cannot be compared with expected state, as no one defines "i want backup task in version 2137 which is cron xxx location yyy", but "i want backup task with cron xxx, location yyy". |
OK, I'll set the milestone for next patch/minor release. The goal for manager is to just extend the cluster object in DB with label text column. |
And tasks imo. |
grooming notes How to add label to cluster / task.
If the same label will be listed in both Labels should be listed in |
The requirements listed above focus on the user<->sctool interaction:
So it's also worth to define what happens in SM API.
SM does not really support partial cluster/task update - it relies on receiving the full definition of But this makes it impossible to delete So it looks like SM API is kind of confusing :/
From my POV, the easiest and most consistent (with the current state of SM API) approach would be the first one, but does it fit the operator use case? |
Option number 1 is fine, we can always provide entire object we want to store.
I don't think we should implement concurrency support as part of this task. I think better approach than compare and swap would be optimistic concurrency, but it's a bigger feature.
It could be simplified by assuming that deletion is an update, but with special format. I like what
IMO this should be rejected as it's an obvious mistake. |
Scylla-manager omits fields such as Username, Password, AuthToken in response to /clusters request.
For security reasons this is desired, but when it comes to Scylla-operator, clusters cannot be compared and updated when the only field that changes is e.g. password.
That's how the cluster model looks like right now:
https://github.com/scylladb/scylla-manager/blob/master/swagger/gen/scylla-manager/models/cluster.go
I would suggest adding a hash field for client to set, so that it will be possible to distinguish different clusters without any security risks.
The text was updated successfully, but these errors were encountered: