Manager restore error (kms_error AccessDeniedException) for multiDC cluster with EAR enabled #3871
Comments
@fruch
So, as a first step check the
Hey @vponomaryov, thanks for your reply.
No. Scylla gets configured with the KMS keys by their aliases. It doesn't check key equality.
Again, we need to understand the steps the test performs and why it behaves this way. I don't see proof that the mgmt test approach is correct.
I got your point. Okay, I need to dig deeper into the test itself, because all I did was change the Scylla version (from 2022 to 2024) for the
Agree.
@vponomaryov I'm wondering, does the current implementation allow disabling Scylla encryption? From what I see in the code, encryption will be enabled by default for:
And I don't see any way to correctly disable it except providing any non-equal to
Just update the config the following way:
Got it, thanks.
Just to emphasize, the Scylla causing the issue is the one installed on the monitor node for Manager. We never tested it with KMS enabled; since the SCT code is written to enable KMS by default for the supported versions, it got enabled. There is no reason it shouldn't be working. I'm guessing it's just a configuration error picking the wrong region in the scylla.yaml configuration, and it should be fixed.
Hm, I thought the problem was in the cluster nodes (in one of the regions), because while going through the logs I've seen this error for
@fruch How can I tell that the problem here is related to the monitor node?
I take it back, I was confused because the node's name had "manager" in it. Yes, it's the DB nodes, and yes, the expectation is that the manager puts the SSTables back on the same nodes, or at least in the same region. I don't know what the cloud did by default, but if these are not multi-region keys, restore for a multi-region setup would be broken. We can't disable KMS before understanding the situation.
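The point about multi-region keys can be made concrete: AWS KMS keys are region-scoped unless they are multi-Region keys, whose key IDs are prefixed with `mrk-`. A minimal sketch that predicts, from a key ARN, whether a node in a given region can decrypt with it (the ARNs used below are made up):

```python
def key_region(key_arn: str) -> str:
    """Extract the region from a KMS key ARN.

    ARN format: arn:aws:kms:<region>:<account>:key/<key-id>
    """
    return key_arn.split(":")[3]


def is_multi_region(key_arn: str) -> bool:
    # AWS multi-Region KMS key IDs start with "mrk-";
    # single-Region key IDs are plain UUIDs.
    return key_arn.rsplit("/", 1)[-1].startswith("mrk-")


def node_can_decrypt(key_arn: str, node_region: str) -> bool:
    # A single-Region key is only usable from its own region.
    # A multi-Region key can also be used from other regions,
    # provided a replica was actually created there.
    return is_multi_region(key_arn) or key_region(key_arn) == node_region
```

If the cluster's keys fail this check for some DC's region, a cross-region `kms:Decrypt` like the one in this issue is expected to be denied.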
@karol-kokoszka Could you please elaborate on this? |
To make it work so that SSTables are sent to the same region (DC), you must specify the DC when adding a location to the restore task: https://manager.docs.scylladb.com/stable/sctool/restore.html#l-location

"The format is [&lt;dc&gt;:]&lt;provider&gt;:&lt;bucket_name&gt;. The &lt;dc&gt; parameter is optional. It allows you to specify the datacenter whose nodes will be used to restore the data from this location in a multi-dc setting; it must match the Scylla nodes' datacenter. By default, all live nodes are used to restore data from specified locations."

If the DC is not specified in the location, then it may be sent to any node. I guess you must restore a multiDC cluster DC by DC when encryption at rest is enabled.
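The quoted location format can be captured in a small helper (the provider and bucket names below are illustrative):

```python
def restore_location(provider: str, bucket: str, dc: str = "") -> str:
    """Build an sctool location value: [<dc>:]<provider>:<bucket>."""
    base = f"{provider}:{bucket}"
    # The optional <dc>: prefix pins the restore of this location
    # to nodes in that datacenter.
    return f"{dc}:{base}" if dc else base
```

For example, `restore_location("s3", "my-backups", dc="eu-west")` yields `eu-west:s3:my-backups`, which restricts restoring from that bucket to eu-west nodes.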
It means that the backup should also be done with the --location option specified for every cluster DC, right?
I don't think it's necessary. |
In that case, I suppose I need to know which DC to use during restore and, specifically, which key was used to encrypt the SSTables during backup, right? Does sctool provide such an option?
sctool does not concern itself with encryption at rest and is not aware of the keys used to encrypt SSTables. Therefore, it is unnecessary for you to know which keys were used during the backup process. It is the responsibility of the Scylla server to manage decryption of the data.

When SM (Scylla Manager) employs the load &amp; stream feature for restoration, it calls the Scylla server and passes the SSTable. Subsequently, Scylla is tasked with identifying the appropriate node to which the SSTable should be streamed. I presume that Scylla must first decrypt the SSTable in order to determine the correct destination for streaming.

In the scenario you described in this issue, there is a possibility that an SSTable encrypted with a key stored in a different region was sent to a node lacking access to the Key Management Service (KMS) in that region. To mitigate this issue, it is advisable to restore data center (DC) by data center, ensuring that SSTables encrypted with a specific key (e.g., key A) are decrypted with the corresponding key A.
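The DC-by-DC approach amounts to one restore task per datacenter, each with the location pinned to that DC via the optional prefix. A sketch that builds the sctool invocations (the cluster name, bucket, snapshot tag, and DC names below are made up):

```python
def per_dc_restore_cmds(cluster: str, bucket: str, snapshot_tag: str, dcs):
    # One restore task per DC: the <dc>: prefix on --location means only
    # that DC's nodes download and load-and-stream the SSTables, so data
    # encrypted with a region's key stays in that region.
    return [
        f"sctool restore -c {cluster}"
        f" --location '{dc}:s3:{bucket}'"
        f" --snapshot-tag {snapshot_tag}"
        for dc in dcs
    ]
```

Each command in the returned list would be run as a separate restore task, waiting for one DC to finish before starting the next.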
Thanks a lot for the detailed explanation, I'll experiment.
@karol-kokoszka could you please take a look? I made an attempt to restore specifying two locations, one for each DC.
@mikliapko this is somewhat of an SM limitation/bug: you can't specify a given location with many DCs and another location with a different DC. How many DCs do you have in the restore destination cluster? If only the 2 mentioned, then you can run restore with a single location without a DC specified (it will use all nodes with access to the location for restoring the data).
@Michal-Leszczynski the goal is to restore DC by DC, and to send node data from DC A to the nodes of DC A. @mikliapko Please just use separate restore tasks, one per DC.
But something like this is not supported by SM right now. When a location is specified, nodes with access to it (or nodes from the specified DC with access) restore the whole backup data from this location. So one would need truly separate backup locations for this purpose.
If so, then this is a bug that we must address in one of the upcoming releases. The backup location structure explicitly defines the tree path to the exact DC. Restore must take advantage of it. We may have problems with multiDC EaR without it.
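Since the backup tree already encodes the source DC, restore could in principle derive the node-to-data mapping from the object keys alone. A sketch, assuming a layout where keys contain a `.../dc/<dc>/node/<node>/...` segment (the layout and the paths in the test are assumptions, not confirmed here; verify against your bucket):

```python
def dc_from_backup_key(key: str):
    """Return the DC segment from a backup object key, or None.

    Assumed key shape (hypothetical): backup/sst/cluster/<id>/dc/<dc>/node/<id>/...
    """
    parts = key.strip("/").split("/")
    # Look for a "dc" path component and return the segment after it.
    for i, part in enumerate(parts[:-1]):
        if part == "dc":
            return parts[i + 1]
    return None
```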
Issue description
Manager restore operation returns `kms_error (AccessDeniedException)` for a multiDC cluster with EAR enabled.

Full error message:

```
Found exception: kms_error (AccessDeniedException: User: arn:aws:iam::797456418907:role/qa-scylla-manager-backup-role is not authorized to perform: kms:Decrypt on the resource associated with this ciphertext because the resource does not exist in this Region, no resource-based policies allow access, or a resource-based policy explicitly denies access)
```
The issue has been observed just recently, after we tried to switch Manager SCT tests to run against Scylla 2024.1 instead of 2022.
Impact
The restore operation returns this error multiple times during one run, but the whole process finishes successfully. I suppose the reason is that KMS is available to only some of the cluster's nodes.
How frequently does it reproduce?
Every restore operation performed in such configuration.
Installation details
SCT Version: 31ff1e87d830ce7fe2587e0c609d113d2f66f8a4
Scylla version (or git commit hash): 2024.1.3-20240401.64115ae91a55
Logs