Ceph multisite objectstore createSystemUser doesn't always run #10450

Open
mgugino-uipath opened this issue Jun 15, 2022 · 16 comments
@mgugino-uipath

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
When creating a multisite objectstore in accordance with the docs, the rgw user <realm-name>-system-user is not always created. Interestingly, the k8s secret <realm-name>-keys seems to always be created, regardless of whether the rgw user is created. I do see the access key and secret key set on the zone itself, but no corresponding user in the output of radosgw-admin user list.

As a result, the secondary site cannot pull the realm, even when the <realm-name>-keys secret is copied over properly to a second k8s cluster.

This does not always happen; I suspect it's a problem with reconcile re-entrance. Unfortunately, it's not trivial for me to enable debug logging and reproduce.

Expected behavior:
This user should always get created when using multisite settings.

How to reproduce it (minimal and precise):
Follow the docs to create a multisite objectstore (create a realm, zonegroup, zone, and an objectstore that references the zone).
Connect to the toolbox, run radosgw-admin user list, and check whether the system user was created.
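
The check in the last step can be scripted. This is only a minimal sketch: has_system_user is a hypothetical helper name, realm-a is a placeholder realm, and it assumes the user is named <realm-name>-system-user as described above.

```shell
# has_system_user REALM
# Reads the JSON array printed by `radosgw-admin user list` on stdin and
# succeeds only if "<REALM>-system-user" is one of the entries.
# (Hypothetical helper; the user name pattern is taken from this report.)
has_system_user() {
  grep -qF "\"$1-system-user\""
}

# Usage from the toolbox pod (realm-a is a placeholder realm name):
#   radosgw-admin user list --rgw-realm=realm-a | has_system_user realm-a \
#     && echo "system user present" || echo "system user missing"
```

Only stdout is piped into the helper, so warnings that radosgw-admin prints on stderr do not affect the result.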

File(s) to submit:
Code path in question:

if zoneIsMaster && zoneGroupIsMaster {

Environment:

  • OS (e.g. from /etc/os-release): RHEL 8.2 EUS
  • Kernel (e.g. uname -a): Linux server0 4.18.0-193.75.1.el8_2.x86_64
  • Cloud provider or hardware configuration: Azure
  • Rook version (use rook version inside of a Rook Pod): v1.8.9
  • Storage backend version (e.g. for ceph do ceph -v): 16.2.7
  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4+rke2r2"
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): RKE2
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
@thotz
Contributor

thotz commented Jun 16, 2022

When listing, did you pass --rgw-realm=<realm-name> to the radosgw-admin user list command?

The system user for a realm is created after the first object store of the master zone/zonegroup is created, and Rook always sets the initially created zone and zonegroup as master.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@bumarcell

I'm on Rook 1.13.5.
Just like @mgugino-uipath, I created all objects according to the docs, including the object store.
The system user still wasn't created.

@degorenko
Contributor

Hi, I'm on Rook 1.13.5 as well, but I hit a slightly different situation after switching from single-site to multisite (https://rook.io/docs/rook/v1.13/Storage-Configuration/Object-Storage-RGW/ceph-object-multisite/#configure-an-existing-object-store-for-multisite): the system user is created, but its keys are not added to the master zone's "system_key". As a result, the master zone can't get sync status info. There are no issues on the secondary zone side.
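
For the missing "system_key" case, the Ceph multisite docs set the system user's keys on a zone with zone modify followed by a period commit. The following is a sketch only, not something confirmed as a fix in this thread: set_zone_system_key is a hypothetical wrapper, and realm-a, zone-a and the key values are placeholders.

```shell
# Command to invoke; overridable so the wrapper can be dry-run with `echo`.
RGW_ADMIN="${RGW_ADMIN:-radosgw-admin}"

# set_zone_system_key REALM ZONE ACCESS_KEY SECRET_KEY
# Stores the system user's keys in the zone's system_key, then commits
# the period so the change takes effect. (Hypothetical wrapper.)
set_zone_system_key() {
  realm="$1"; zone="$2"; access_key="$3"; secret_key="$4"
  "$RGW_ADMIN" zone modify --rgw-realm="$realm" --rgw-zone="$zone" \
    --access-key="$access_key" --secret="$secret_key" &&
  "$RGW_ADMIN" period update --commit --rgw-realm="$realm"
}

# Usage in the toolbox pod (all values are placeholders):
#   set_zone_system_key realm-a zone-a "$ACCESS_KEY" "$SECRET_KEY"
#   radosgw-admin zone get --rgw-realm=realm-a --rgw-zone=zone-a   # inspect system_key
```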

@bumarcell

bumarcell commented Mar 19, 2024

@degorenko
Did creating the realm lead to automatic creation of a secret <realm-name>-keys containing secret- and access-keys?

@degorenko
Contributor

Yes, I have the realm created as well as the secret.

@bumarcell

bumarcell commented Mar 20, 2024

I can confirm the system user isn't automatically being created on Rook 1.13.5 😞

Not sure if this plays a role, but this is our second multisite realm 💡

@subhamkrai
Contributor

@thotz could you look at this?

@subhamkrai subhamkrai reopened this Mar 21, 2024
@github-actions github-actions bot removed the wontfix label Mar 21, 2024
@thotz
Contributor

thotz commented Mar 22, 2024

> I can confirm the system user isn't automatically being created on Rook 1.13.5 😞
>
> Not sure if this plays a role, but this is our second multisite realm 💡

That's possible. If so, it might be a bug in the current code. Can you please check the rook-operator logs for errors specifically related to the system user?
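
One rough way to narrow the operator logs down is a case-insensitive filter. This is a sketch under assumptions: filter_system_user_logs is a hypothetical helper, the exact wording of Rook's log messages is not known from this thread, and rook-ceph / rook-ceph-operator are the default namespace and deployment names, which may differ in your cluster.

```shell
# filter_system_user_logs
# Keeps only the log lines on stdin that mention "system user",
# "system-user" or "system_user" (case-insensitive). Returns success
# even when nothing matches, so an empty result does not abort a script.
filter_system_user_logs() {
  grep -iE 'system[-_ ]?user' || true
}

# Usage (namespace and deployment name are assumed defaults):
#   kubectl -n rook-ceph logs deploy/rook-ceph-operator | filter_system_user_logs
```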

@bumarcell

As far as I saw, there were no logs regarding a (system) user.

@lgyurci

lgyurci commented Mar 25, 2024

I'm facing the same issue. Not only with the second multisite realm, but also with deleting and re-creating the same one. The system user creation simply does not happen, even after cleaning up the whole thing in ceph's brain (so manually deleting the zone, zonegroup, realm, user, etc). Moreover, deleting and re-creating the realm (without manually removing the created system user) regenerates the REALM_NAME-keys secret, but the system user is not updated with these credentials in any way.

Also, could someone please enlighten me on how to create this system user manually? I'm really not familiar with the necessary Ceph commands.

@bumarcell

Command for creating the system user you mean?
radosgw-admin user create --rgw-realm=STH --uid=STH --display-name="STH" --system --access-key="STH" --secret-key="STH" --rgw-zone=STH
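
To fill in the STH placeholders with the keys Rook already generated, something like the sketch below may help. It rests on several assumptions: create_system_user is a hypothetical wrapper, the uid follows the <realm-name>-system-user pattern from the original report, the display name is arbitrary, and the <realm-name>-keys secret is assumed to store the keys under access-key and secret-key in the rook-ceph namespace.

```shell
# Command to invoke; overridable so the wrapper can be dry-run with `echo`.
RGW_ADMIN="${RGW_ADMIN:-radosgw-admin}"

# create_system_user REALM ZONE ACCESS_KEY SECRET_KEY
# Creates the realm's system user with the given keys.
# (Hypothetical wrapper; uid pattern taken from this issue, display
# name chosen arbitrarily.)
create_system_user() {
  realm="$1"; zone="$2"; access_key="$3"; secret_key="$4"
  "$RGW_ADMIN" user create \
    --rgw-realm="$realm" --rgw-zone="$zone" \
    --uid="$realm-system-user" --display-name="$realm-system-user" \
    --system --access-key="$access_key" --secret-key="$secret_key"
}

# Reading the keys from the realm secret (run where kubectl is available;
# namespace, secret name and data keys are assumptions):
#   ACCESS_KEY=$(kubectl -n rook-ceph get secret realm-a-keys \
#     -o jsonpath='{.data.access-key}' | base64 -d)
#   SECRET_KEY=$(kubectl -n rook-ceph get secret realm-a-keys \
#     -o jsonpath='{.data.secret-key}' | base64 -d)
# Then, in the toolbox pod:
#   create_system_user realm-a zone-a "$ACCESS_KEY" "$SECRET_KEY"
```

Setting RGW_ADMIN=echo before calling the function prints the arguments instead of running radosgw-admin, which is a cheap way to review them first.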

@enrico2828

I am on Rook 1.14 experiencing the same problem.
On the first site, there is a secret, but no user:

radosgw-admin user list --rgw-realm=realm-a
2024-04-16T20:13:04.900+0000 7fbf9c578a80  0 period (f695fadc-6192-40ad-9d76-3032db4a31a4 does not have zone 69dc8839-4d58-449c-bec8-cc0ff0ea9e3d configured
[
    "dashboard-admin",
    "cosi",
    "rgw-admin-ops-user"
]

On the second site that I'd like to join, an error is thrown:
ceph-object-realm-controller: failed to reconcile CephObjectRealm "rook-ceph/realm-a". realm pull failed for reason: request failed: (13) Permission denied If the realm has been changed on the master zone, the master zone's gateway may need to be restarted to recognize this user.. : exit status 13
I am testing in a local environment and have not seen it work automatically a single time; I always need to intervene manually.
I can easily reproduce this, so I can supply any additional info needed to solve it. Let me know.

@enrico2828

enrico2828 commented Apr 16, 2024

I enabled debug logging and followed the code from

func configureObjectStore(objContext *Context, store *cephv1.CephObjectStore, zone *cephv1.CephObjectZone) error {

In my log I only see these lines:

2024-04-16 20:46:01.226971 I | ceph-object-controller: configuring object store "objectstore"
2024-04-16 20:46:01.226975 D | ceph-object-controller: setting multisite configuration for object-store objectstore
2024-04-16 20:46:01.226978 I | ceph-object-controller: configuration for object-store objectstore is complete

There is no error, and execution doesn't seem to reach the functions "JoinMultisite" and "CreateSystemUser", as I don't see any log or debug statements from them. I also find it strange that CreateSystemUser is called from JoinMultisite: shouldn't the system user be created for the first (master) multisite zone? I didn't have time to dig deep into the code and I'm not a programmer, but I think the error is somewhere in that area.

@travisn travisn added this to To do in v1.14 via automation Apr 16, 2024
@enrico2828

By the way, adding the system user manually on the first site, as @bumarcell suggested, fixes the problem and the multisite is established.
