
ceph: create new keyring for osd #8155

Merged
merged 1 commit into from
Jul 19, 2021

Conversation

@subhamkrai (Contributor) commented Jun 21, 2021

when OSD pods are removed, they are not able
to be added back due to a missing or
different `ceph auth` entry.

Signed-off-by: subhamkrai <srai@redhat.com>

Description of your changes:
This commit mounts the secret `rook-ceph-admin-keyring`,
which contains the admin keyring, inside the OSD `activate`
init container.
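
As a rough sketch (the volume name and mount path below are illustrative assumptions, not taken from the PR diff), the change amounts to projecting the admin-keyring secret into the `activate` init container along these lines:

```yaml
# Hypothetical sketch of the pod-spec change; names and paths are illustrative.
volumes:
  - name: rook-ceph-admin-keyring
    secret:
      secretName: rook-ceph-admin-keyring
initContainers:
  - name: activate
    volumeMounts:
      - name: rook-ceph-admin-keyring
        mountPath: /etc/ceph/admin-keyring   # assumed mount path
        readOnly: true
```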

Which issue is resolved by this Pull Request:
Resolves #4238

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@mergify mergify bot added the ceph main ceph tag label Jun 21, 2021
@subhamkrai subhamkrai mentioned this pull request Jun 21, 2021
@subhamkrai (Contributor, Author)

I think there is a bit more work to do at the end:

Option 1: put the bootstrap-osd key in a secret and mount it as a VolumeSource
Option 2: use the admin key, which is already available in a Secret

I prefer option 1 since option 2 is not secure. Option 1 is just a bit more work for you, but a good exercise to discover more of Rook's codebase too!

Going with option 1.

Discussed here

@subhamkrai (Contributor, Author)

+ ceph -n client.bootstrap-osd auth get-or-create osd.0 mon 'profile osd' mgr 'profile osd' osd 'allow *' -k /var/lib/ceph/bootstrap-osd/keyring
2021-06-21T11:26:35.874+0000 7f9a83339700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-21T11:26:35.874+0000 7f9a83339700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-21T11:26:35.874+0000 7f9a820d7700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-21T11:26:35.874+0000 7f9a820d7700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-21T11:26:35.875+0000 7f9a820d7700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-21T11:26:35.875+0000 7f9a820d7700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-21T11:26:35.875+0000 7f9a820d7700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-21T11:26:35.875+0000 7f9a820d7700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-21T11:26:35.875+0000 7f9a820d7700 -1 monclient: keyring not found
[errno 5] RADOS I/O error (error connecting to the cluster)

Getting this in the activate init container... looking into it.

pkg/operator/ceph/cluster/osd/volumes.go (two outdated review threads, resolved)
@leseb (Member) commented Jun 22, 2021

> [quotes the activate init container log from the previous comment]

Please paste the content of /var/lib/ceph/bootstrap-osd/keyring

@subhamkrai (Contributor, Author)

Please paste the content of /var/lib/ceph/bootstrap-osd/keyring
@leseb

+ cat /var/lib/ceph/bootstrap-osd/keyring
+ ceph -n client.bootstrap-osd auth get-or-create osd.0 mon 'profile osd' mgr 'profile osd' osd 'allow *' -k /var/lib/ceph/bootstrap-osd/keyring
2021-06-23T10:11:19.837+0000 7f793359e700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-23T10:11:19.837+0000 7f793359e700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 auth: error parsing file /var/lib/ceph/bootstrap-osd/keyring: cannot parse buffer: Malformed input
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 auth: failed to load /var/lib/ceph/bootstrap-osd/keyring: (5) Input/output error
2021-06-23T10:11:19.837+0000 7f7938ff0700 -1 monclient: keyring not found
[errno 5] RADOS I/O error (error connecting to the cluster)
AQCsA9NgWSRLOBAAYTzk65KZ31t+3SoIBe5FqQ==[

Looks like it has the key only, and it has to be in this format:

[client.bootstrap-osd]
	key = %s
	caps mon = "allow profile bootstrap-osd"

Right?

@leseb (Member) commented Jun 23, 2021

AQCsA9NgWSRLOBAAYTzk65KZ31t+3SoIBe5FqQ==[

Yes you need this format:

[client.bootstrap-osd]
key = xxxxxxxxx
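
A minimal sketch of producing that file from the raw key. The key value is the example pasted earlier in this thread; the output path is illustrative, not the path the PR uses:

```shell
# Sketch: wrap a raw base64 key into the section format Ceph expects.
KEY='AQCsA9NgWSRLOBAAYTzk65KZ31t+3SoIBe5FqQ=='
KEYRING=/tmp/bootstrap-osd.keyring
cat > "$KEYRING" <<EOF
[client.bootstrap-osd]
	key = $KEY
	caps mon = "allow profile bootstrap-osd"
EOF
grep -c '^\[client.bootstrap-osd\]' "$KEYRING"   # prints 1
```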

@subhamkrai (Contributor, Author) commented Jun 24, 2021

AQCsA9NgWSRLOBAAYTzk65KZ31t+3SoIBe5FqQ==[

Yes you need this format:

[client.bootstrap-osd]
key = xxxxxxxxx

getting Error EACCES: access denied

[srai@fedora minikube]$ kc logs -f rook-ceph-osd-0-77c4ffc586-bcjzk -c activate 
+ OSD_ID=0
+ OSD_UUID=08104aa7-4759-4c8d-b076-0f4554c81f64
+ OSD_STORE_FLAG=--bluestore
+ OSD_DATA_DIR=/var/lib/ceph/osd/ceph-0
+ CV_MODE=raw
+ DEVICE=/dev/sdb
+ cat /var/lib/ceph/bootstrap-osd/keyring
+ ceph -n client.bootstrap-osd auth get-or-create osd.0 mon 'allow profile bootstrap-osd' mgr 'profile profile bootstrap-osd' osd 'allow *' -k /var/lib/ceph/bootstrap-osd/keyring

[client.bootstrap-osd]
key = AQB8KdRgNZDjEBAApgkjHtdFEAAHuh6rX05Ppg==
caps mon = "allow profile bootstrap-osd"
Error EACCES: access denied
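
Worth noting: the failing command above passes `mon 'allow profile bootstrap-osd'` and a doubled "profile profile" in the mgr cap, while the earlier attempt in this thread used the caps below. The sketch only assembles and prints the argument vector locally so each cap string stays one argument; it does not contact a cluster:

```shell
# Build the get-or-create invocation as discrete arguments (caps copied from
# the earlier attempt in this thread). COUNT just sanity-checks the argv.
set -- ceph -n client.bootstrap-osd auth get-or-create osd.0 \
    mon 'profile osd' mgr 'profile osd' osd 'allow *' \
    -k /var/lib/ceph/bootstrap-osd/keyring
COUNT=$#
printf '%s\n' "$@"
```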

@subhamkrai subhamkrai force-pushed the osd-keyring branch 2 times, most recently from 4520f41 to 4edbdfa, June 24, 2021 06:48
@leseb (Member) commented Jun 24, 2021

> [quotes the EACCES log from the previous comment]

Alright, I was convinced that this key would have enough permissions to create an OSD key but it looks like I was wrong and it can only generate an OSD UUID. It looks like we will have to use the admin key. So please use it instead. Thanks!

@subhamkrai (Contributor, Author)

OSD pods are not coming up, but in the operator logs I'm seeing this error:

 E | op-osd: failed to get or create auth key for client.admin. failed get-or-create-key client.admin: Error EINVAL: mon capability parse failed, stopped at '*' of '*'
2021-06-25 14:12:48.762454 E | op-osd: failed to create OSD 0 on node "minikube": failed to create deployment for OSD 0 on node "minikube": Deployment.apps "rook-ceph-osd-0" is invalid: [spec.template.spec.volumes[7].name: Required value, spec.template.spec.initContainers[0].volumeMounts[3].name: Required value, spec.template.spec.initContainers[0].volumeMounts[3].name: Not found: "", spec.template.spec.initContainers[0].volumeMounts[3].mountPath: Required value]
2021-06-25 14:12:48.772362 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "my-cluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 1 failures encountered while running osds on nodes in namespace "rook-ceph". 
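
The EINVAL above is the mon rejecting a capability string, typically because `allow *` was word-split somewhere before reaching it, so the mon sees a bare `*` token. A small local demonstration of the difference (no cluster involved; `set -f` just disables globbing so the demo is deterministic):

```shell
set -f                                # keep '*' literal instead of globbing
CAPS='allow *'
quoted=$(printf '%s|' mon "$CAPS")    # passed through as one argument
split=$(printf '%s|' mon $CAPS)       # word-split into separate arguments
echo "$quoted"   # mon|allow *|
echo "$split"    # mon|allow|*|
```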

pkg/operator/ceph/cluster/osd/volumes.go (outdated review thread, resolved)
@subhamkrai (Contributor, Author)

Something to consider is that if the user has followed the instructions to purge an OSD, the auth would have been intentionally removed in that case. Perhaps the solution in that case is to only wipe the disk so it isn't discovered and the OSD isn't started again, but please do test what happens if those instructions are followed but the disk is not wiped. Does the OSD start up successfully again in that case? Or what is the error state?

Looking into the above comment from here

@subhamkrai subhamkrai marked this pull request as ready for review July 14, 2021 10:23
@subhamkrai subhamkrai requested a review from travisn July 14, 2021 10:24
@subhamkrai (Contributor, Author)

> [re-quoting the comment above about purging an OSD]

@travisn here are the steps I did while testing, and the results:

  1. scale down the operator to 0
  2. kubectl -n rook-ceph scale deployment rook-ceph-osd-0 --replicas=0
  3. ceph osd out osd.0
  4. ceph osd down osd.0
  5. ceph osd purge 0 --yes-i-really-mean-it
  6. confirm the osd is down:

 ceph osd tree
ID  CLASS  WEIGHT  TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1              0  root default
-3              0      host minikube

  7. scaled the operator back to 1
  8. scaled up osd-0; it is in CrashLoopBackOff state (initially it was in Error state)
csi-cephfsplugin-cjsfk                          3/3     Running            0          31m
csi-cephfsplugin-provisioner-78d66674d8-bcvcq   6/6     Running            0          31m
csi-rbdplugin-m66xx                             3/3     Running            0          31m
csi-rbdplugin-provisioner-687cf777ff-lvgkq	6/6     Running            0          31m
rook-ceph-mgr-a-674d686f7-vgnnq                 1/1     Running            0          32m
rook-ceph-mon-a-79dc66555d-sttxg                1/1     Running            0          32m
rook-ceph-operator-556589bbcc-2sv5f             1/1     Running            0          8m44s
rook-ceph-osd-0-76bc6f96c7-mtxll                0/1     CrashLoopBackOff   6          7m33s
rook-ceph-osd-prepare-minikube-ctbr7            0/1     Completed          0          8m
rook-ceph-tools-78cdfd976c-hdlgm                1/1     Running            0          17m
  9. checked osd-0 logs:
kc logs -f rook-ceph-osd-0-76bc6f96c7-mtxll
debug 2021-07-15T04:10:22.395+0000 7fcc874ee700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
failed to fetch mon config (--no-mon-config to skip)

@travisn (Member) left a comment
Ok since the OSD was purged, it must not be able to create the keyring again. LGTM

@leseb (Member) left a comment
nit

pkg/operator/ceph/cluster/osd/spec.go (outdated review thread, resolved)
@leseb (Member) left a comment
Last nits.

pkg/operator/ceph/cluster/osd/spec.go (two outdated review threads, resolved)
when OSD pods are removed, they are not able
to be added back due to a missing or
different `ceph auth` entry.

this commit mounts the secret `rook-ceph-admin-keyring`
inside the osd activate initcontainer which has
admin keyring.

Signed-off-by: subhamkrai <srai@redhat.com>
Labels
ceph (main ceph tag), ceph-osd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When OSD pods removed from hosts they are not able add them back to ceph cluster
3 participants