daemon_id used for RGW pods is not unique in case of multiple instances #13674
We used to manage the RGWs like we do other daemons, where each has a different deployment. I think the code for handling this might still be in Rook, but I'm unsure. However, this is not the most desirable design. We would like to make the fullest use of k8s primitives, and in this case, continuing to be able to set a number of replicas for a single RGW Deployment would be ideal. Following on from this idea, it would be best if RGWs continued to not need individual IDs. If it does become necessary, perhaps we could see whether each RGW can take its ID from the Pod downward API. Or it could be better to use a StatefulSet for RGWs to achieve a similar thing.
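The Downward API idea above could look roughly like this from inside the RGW container. This is a sketch, not Rook's actual code: it assumes the pod spec injects `metadata.name` into an env var named `POD_NAME` (a hypothetical name), from which the daemon derives its ID:

```python
import os

def rgw_id_from_pod_name(default: str = "a") -> str:
    """Derive an RGW instance ID from the pod name injected via the
    Downward API (the env var name POD_NAME is an assumption)."""
    pod_name = os.environ.get("POD_NAME")
    if pod_name is None:
        return default
    # Deployment pods end in a random suffix, e.g. ...-7d9fc-x2x4q;
    # use the last dash-separated token as the unique part.
    return pod_name.rsplit("-", 1)[-1]

os.environ["POD_NAME"] = "rook-ceph-rgw-my-store-a-7d9fc-x2x4q"
print(rgw_id_from_pod_name())  # x2x4q
```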
RGWs don't have unique IDs for a few reasons. In our experience, RGWs generally don't need them.
Technically speaking, in Kubernetes, the reason comes down to using a Deployment with replicas > 1 to run RGWs. This is a much simpler deployment model than tracking individual deployments for each RGW. If Ceph weren't so picky about the IDs of things like mons and OSDs, Rook would prefer to use the same model for deploying those as well.

I want to understand more about why you are flagging this as an issue. Is this blocking some feature? Or is it just confusing that things are different for RGWs?

Another deployment model that offers more ID management is StatefulSets. These work like Deployments, but each replica gets a unique ID. This could be used for RGWs if we do find that we need unique IDs for each. However, I would still suggest that we try to look into making changes to Ceph such that it doesn't care about RGW individual identity as much, if possible. It was a win for the Rook project when we could use a single Deployment for all RGWs to simplify their management.
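For reference, the stable identity a StatefulSet would give each replica can be sketched like this (the base name is hypothetical, not something Rook generates today):

```python
def statefulset_pod_names(base: str, replicas: int) -> list[str]:
    # StatefulSet replicas get stable ordinal suffixes: base-0, base-1, ...
    # which survive pod restarts, unlike Deployment pods' random suffixes.
    return [f"{base}-{i}" for i in range(replicas)]

print(statefulset_pod_names("rook-ceph-rgw-my-store", 3))
# ['rook-ceph-rgw-my-store-0', 'rook-ceph-rgw-my-store-1', 'rook-ceph-rgw-my-store-2']
```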
We discussed in huddle using the random pod ID as each RGW's individual ID. Redo says cephadm uses a random ID for each RGW (though I'm not clear exactly when the ID changes with cephadm). I think this suggests that Rook can safely use the randomized pod ID to achieve a similar thing without much risk. I think the solution will be pretty easy and involve two small changes:
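The cephadm-style random ID mentioned above could be sketched like this. The six-character lowercase format is an assumption based on the `hrgsea` example that appears later in this thread:

```python
import random
import string

def random_daemon_suffix(length: int = 6) -> str:
    # cephadm-style random suffix, e.g. "hrgsea" (length assumed)
    return "".join(random.choices(string.ascii_lowercase, k=length))

# Hypothetical per-RGW daemon id for object store "my-store":
print(f"my-store.a.{random_daemon_suffix()}")
```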
@BlaineEXE thanks for the detailed suggestion. I can give it a try and see whether it works 👍
I gave it a try, but unfortunately using the POD_ID doesn't seem to be easy. The problem is that the keyring name must be the same as the one generated at https://github.com/rook/rook/blob/master/pkg/operator/ceph/object/rgw.go#L137-L142. At this point we don't have the pod_id yet, so we can't use it to generate the keyring.

As an alternative, I tried adding a random suffix to the pod name (and consequently to the keyring). This seems to work, but the operator starts creating more and more rgw instances (because of the random part, I guess). You can find the changes on the PR. I think this happens because the random suffix changes the secret (which the controller watches for changes), but I'm not sure, as I don't understand how the rgw reconcile should work in this case.
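To illustrate the timing problem described above: the keyring is named when the operator creates the Deployment, before any pod (and its random suffix) exists. The naming scheme below is a made-up sketch, not Rook's actual one:

```python
def keyring_name(object_store: str, instance: str) -> str:
    # The operator must compute this at Deployment-creation time,
    # so `instance` cannot depend on a pod ID that doesn't exist yet.
    return f"rook-ceph-rgw-{object_store}-{instance}-keyring"

# Static instance letter: a stable name, one shared keyring per deployment.
print(keyring_name("my-store", "a"))  # rook-ceph-rgw-my-store-a-keyring

# A random per-reconcile suffix would change the name on every reconcile,
# which matches the symptom of the operator creating ever more instances.
```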
The only way we can continue to use a single RGW deployment is if we can use a single keyring even with different pod names at runtime. If the operator creates separate keyrings for each RGW instance, we have to create separate deployments, which means reverting to the old behavior and missing out on benefits like the pod autoscaler. To avoid that regression, we really need a way for the metrics to have some unique ID without a separate keyring.
We discussed in huddle, but I wanted to add a couple of notes here. It may be possible to get RGWs to share a keyring while still issuing each one a unique ID using the Pod ID. Looking through the cephx doc (link below), I notice that the mons (which share a keyring) seem to use a common principal: https://docs.ceph.com/en/latest/rados/configuration/auth-config-ref/

I've been looking around for Ceph examples, and they are hard to find. This doc (below) on overriding values may help: https://docs.ceph.com/en/pacific/rados/configuration/ceph-conf/#override-values

This doc may also have helpful info, but I can't say for sure: https://docs.ceph.com/en/latest/rados/operations/user-management/

After looking deeply, I am becoming concerned that the
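As a point of comparison, the monitors' shared identity mentioned above looks roughly like this in a keyring file (a sketch with the key redacted; the exact caps are an assumption):

```ini
# All mons authenticate as the single shared "mon." entity,
# rather than each mon having its own cephx user.
[mon.]
    key = AQD...redacted...==
    caps mon = "allow *"
```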
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
With newer versions of Prometheus and/or kube-prometheus-stack, this causes issues. When two RGW pods are scheduled on the same node, we now see the following in our Prometheus logs:

This also leads to
Is this a bug report or feature request?
Deviation from expected behavior:
Just to give some background related to this issue: with the introduction of `ceph-exporter`, some of the performance metrics are generated by this new component (instead of the prometheus mgr module). The problem is that this component doesn't have some metadata related to the metrics, such as `hostname`, `daemon`, etc., which is available only in the mgr prometheus module. To fix this problem, `ceph-exporter` uses a unique label, `instance_id`, as the key to join the metrics generated by `ceph-exporter` with the metadata coming from the prometheus module's `rgw_metadata`.

In practice, `ceph-exporter` only has access to the rgw daemon socket filename, which in this case is used to extract the `instance_id`. In cephadm deployments, for example, this value is set to `hrgsea` from `daemon_name=rgw.foo.ceph-node-00.hrgsea.2.94739968030880`, which is a unique value across all the rgw daemons. However, in a Rook deployment this is not the case. Following are the socket names generated for an object store with the name `my-store` and two instances:

In this case `ceph-exporter` will end up setting `instance_id` to `a`, which is not unique.

On the Rook side, the `daemon_id` label generated for each rgw instance is not unique either. For example, using the following spec gives place to two instances with `daemon_id` set to `my-store`.
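The extraction described above can be illustrated with a small sketch. The "third-from-last dot-separated token" rule and the Rook-style daemon names below are assumptions chosen to reproduce the behavior described in this issue, not `ceph-exporter`'s actual parsing code:

```python
def extract_instance_id(daemon_name: str) -> str:
    # Assumed rule: the instance id is the third-from-last token
    return daemon_name.split(".")[-3]

# cephadm daemon name from this issue: unique per daemon
print(extract_instance_id("rgw.foo.ceph-node-00.hrgsea.2.94739968030880"))  # hrgsea

# Hypothetical Rook-style names for two replicas of "my-store":
# both collapse to the same non-unique value
print(extract_instance_id("rgw.my-store.a.5.94739968030880"))  # a
print(extract_instance_id("rgw.my-store.a.6.94739968030881"))  # a
```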
This daemon id is later used to generate the daemon socket name, so we need it to be unique; otherwise `ceph-exporter` cannot extract this information from the filename.

Expected behavior:
Rook should append the corresponding letter (a, b, c, ...) to the `daemon_id` to make it unique.

How to reproduce it (minimal and precise):

Use the provided spec to create RGW instances and observe the `daemon_id` for each one of them.

File(s) to submit:
- `cluster.yaml`, if necessary

Logs to submit:

- Operator's logs, if necessary
- Crashing pod(s) logs, if necessary

To get logs, use `kubectl -n <namespace> logs <pod name>`. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read GitHub documentation if you need help.
Cluster Status to submit:

- Output of kubectl commands, if necessary

To get the health of the cluster, use `kubectl rook-ceph health`. To get the status of the cluster, use `kubectl rook-ceph ceph status`. For more details, see the Rook kubectl Plugin.
Environment:

- Kernel (e.g. `uname -a`):
- Rook version (use `rook version` inside of a Rook Pod):
- Storage backend version (e.g. `ceph -v`):
- Kubernetes version (use `kubectl version`):
- Storage backend status (e.g. `ceph health` in the Rook Ceph toolbox):