
All services with a selector for the mgr daemon should be updated if there are multiple mgr daemons and the active mgr changes #7988

Closed
psavva opened this issue May 25, 2021 · 16 comments · Fixed by #9467

@psavva

psavva commented May 25, 2021

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
I have set up my rook-ceph cluster and have enabled the dashboard in the CephCluster CRD.

[root@DGCVM01 ceph]# kubectl describe cephcluster -n rook-ceph
Name:         rook-ceph
Namespace:    rook-ceph
Labels:       <none>
Annotations:  <none>
API Version:  ceph.rook.io/v1
Kind:         CephCluster
Metadata:
  Creation Timestamp:  2021-05-24T08:59:07Z
  Finalizers:
    cephcluster.ceph.rook.io
  Generation:        3
  Resource Version:  628081
  UID:               e0d6dbe4-9e30-4d4f-8083-030a6b69cc5c
Spec:
  Ceph Version:
    Allow Unsupported:  false
    Image:              ceph/ceph:v15.2.11
  Cleanup Policy:
    Allow Uninstall With Volumes:  false
    Confirmation:
    Sanitize Disks:
      Data Source:                                    zero
      Iteration:                                      1
      Method:                                         quick
  Continue Upgrade After Checks Even If Not Healthy:  false
  Crash Collector:
    Disable:  false
  Dashboard:
    Enabled:           true
    Ssl:               true
  Data Dir Host Path:  /var/lib/rook
  Disruption Management:
    Machine Disruption Budget Namespace:  openshift-machine-api
    Manage Machine Disruption Budgets:    false
    Manage Pod Budgets:                   true
    Osd Maintenance Timeout:              30
    Pg Health Check Timeout:              0

I have also installed the External Dashboard nodeport service

[root@DGCVM01 ceph]# cat dashboard-external-https.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph # namespace:cluster
spec:
  ports:
    - name: dashboard
      port: 8443
      protocol: TCP
      targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort

When visiting the Ceph dashboard, I'm redirected to a wrong URL.
You will notice that I'm accessing my internal IP and node port, but I'm redirected to the URL rook-ceph-mgr-a-84c875bd95-svhnd. This is the bug.
See here:

[Screenshot: the request to <NodeIP>:<NodePort> is redirected to rook-ceph-mgr-a-84c875bd95-svhnd]

Expected behavior:
The Ceph Dashboard should appear OK.

Environment:

  • OS (e.g. from /etc/os-release):
[root@DGCVM01 ceph]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
    Linux DGCVM01 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
[root@rook-ceph-operator-65965c66b5-mznfn /]# rook version
rook: v1.6.3
go: go1.16.3

  • Storage backend version (e.g. for ceph do ceph -v):
[root@rook-ceph-operator-65965c66b5-mznfn /]# ceph -v
ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable)
  • Kubernetes version (use kubectl version):
[root@DGCVM01 ceph]# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): BareMetal
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
[root@rook-ceph-tools-fc5f9586c-r68cl /]# ceph status
  cluster:
    id:     9ce72e85-8040-43a1-b19e-529ce34b32fb
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 27h)
    mgr: a(active, since 27h), standbys: b
    osd: 3 osds: 3 up (since 27h), 3 in (since 27h)

  data:
    pools:   2 pools, 33 pgs
    objects: 333 objects, 948 MiB
    usage:   5.8 GiB used, 6.0 TiB / 6 TiB avail
    pgs:     33 active+clean

@psavva psavva added the bug label May 25, 2021
@parth-gr
Member

@psavva which node IP address are you using to call the dashboard service? You should use the IP address of the node on which the MGR pod is running.
PS: You'll be able to contact the NodePort service from outside the cluster by requesting <NodeIP>:<NodePort>.
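
For example (sketch only, using the service and label names from the manifest above), you can find the assigned node port and the node each mgr pod runs on with:

kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https   # shows the assigned NodePort
kubectl -n rook-ceph get pods -l app=rook-ceph-mgr -o wide            # shows the node each mgr pod runs on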

@psavva
Author

psavva commented May 25, 2021

I'm certainly using the correct IP address; you can see I've hit the service and the response is a redirect.

I've made sure by accessing it both via port forwarding and via the node IP; both methods fail.

I'm able to access my other dashboards fine. This is a new bug.

I have reproduced it on 2 different clusters.

@parth-gr
Member

Are the other clusters using the same Rook and Ceph versions? Or is it that you can access the dashboard of the first Ceph object store but not of the second one? If that is the problem, then IMPORTANT: please note the dashboard will only be enabled for the first Ceph object store created by Rook.

And if their versions mismatch, you can try upgrading Ceph to v15.2.12, as there are recent fixes in it; it will also become the default in Rook v1.6.4.

@travisn
Member

travisn commented May 26, 2021

@psavva If you're getting a redirect, the response is coming from the standby mgr. The active mgr would respond properly, but the standby mgr only responds with redirects. When two mgrs are deployed, Rook periodically updates the dashboard service to direct traffic to the active mgr. If you're defining your own dashboard service based on a node port, you would also need to update it to only direct traffic to the active mgr.
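
For illustration only (a sketch, not something Rook does for custom services today): assuming mgr "a" is currently the active daemon, the custom service's selector would have to be narrowed down to it, for example via the ceph_daemon_id pod label, and updated again whenever failover happens:

  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
    ceph_daemon_id: a   # assumed active daemon; must be changed when the active mgr changes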

@psavva
Author

psavva commented May 26, 2021

@psavva If you're getting a redirect, the response is coming from the standby mgr. The active mgr would respond properly, but the standby mgr only responds with redirects. When two mgrs are deployed, Rook periodically updates the dashboard service to direct traffic to the active mgr. If you're defining your own dashboard service based on a node port, you would also need to update it to only direct traffic to the active mgr.

Thank you for this info. I'll update my configuration in the morning and report back. However, it seems this should be automated somehow... Maybe a new label to indicate the active manager would be a good solution; it would also require an update to the current deployment manifests for Kubernetes.

@travisn
Member

travisn commented May 26, 2021

Rook does automatically update the rook-ceph-mgr-dashboard service when the active mgr changes. Can you pick up on those selectors? Rook doesn't update the labels on the mgr pods to indicate which one is active, for several reasons: the node where the previously active mgr was running may be down, and the implementation is a sidecar on the mgr pods that doesn't have the ability to update its own labels.
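
As a rough interim workaround (a sketch only; it assumes jq is available and uses the external service name from the manifest earlier in this thread), the selector that Rook maintains on rook-ceph-mgr-dashboard could be copied onto the custom service whenever the active mgr changes:

# Copy the selector Rook keeps up to date on the managed dashboard service
# onto the custom external service (name assumed from the manifest above).
SELECTOR=$(kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard -o json | jq -c '.spec.selector')
kubectl -n rook-ceph patch svc rook-ceph-mgr-dashboard-external-https \
  --type merge -p "{\"spec\":{\"selector\":$SELECTOR}}"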

@psavva
Author

psavva commented Jun 10, 2021

@travisn I'm trying to figure out which is the active manager.
I cannot find any labels that indicate this.

[root@DGCVM01 ~]# kubectl -n rook-ceph describe pod rook-ceph-mgr-b-697f7f548b-bbqjx
Name:         rook-ceph-mgr-b-697f7f548b-bbqjx
Namespace:    rook-ceph
Priority:     0
Node:         dgcvm05/172.20.50.21
Start Time:   Fri, 28 May 2021 11:03:40 +0300
Labels:       app=rook-ceph-mgr
              ceph_daemon_id=b
              ceph_daemon_type=mgr
              instance=b
              mgr=b
              pod-template-hash=697f7f548b
              rook_cluster=rook-ceph
Annotations:  prometheus.io/port: 9283
              prometheus.io/scrape: true

and

[root@DGCVM01 ~]# kubectl -n rook-ceph describe pod rook-ceph-mgr-a-84c875bd95-t6tb6
Name:         rook-ceph-mgr-a-84c875bd95-t6tb6
Namespace:    rook-ceph
Priority:     0
Node:         dgcvm02/172.20.50.18
Start Time:   Fri, 28 May 2021 11:10:31 +0300
Labels:       app=rook-ceph-mgr
              ceph_daemon_id=a
              ceph_daemon_type=mgr
              instance=a
              mgr=a
              pod-template-hash=84c875bd95
              rook_cluster=rook-ceph
Annotations:  prometheus.io/port: 9283
              prometheus.io/scrape: true

@travisn
Member

travisn commented Jun 10, 2021

@psavva The labels on the mgr pods are not updated when the active mgr changes, but the labels on the rook-ceph-mgr and rook-ceph-mgr-dashboard services are updated when the active mgr changes. What about this: kubectl -n rook-ceph describe svc rook-ceph-mgr
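
For example, to see just the selectors on both managed services (they should show which daemon the services currently point at):

kubectl -n rook-ceph describe svc rook-ceph-mgr rook-ceph-mgr-dashboard | grep -E '^(Name|Selector):'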

@github-actions

github-actions bot commented Sep 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@dredwilliams

FYI -- I was having issues with the dashboard:

  • lots of http failures in pieces of the dashboard
  • timeouts -- redirecting to an internal cluster IP
  • generally slow responsiveness when it did show something

I am using a loadbalancer service to access the dashboard from a host separate from the k8s cluster, and the comments above about MGRs switching triggered an 'aha!' moment: I had just increased the MGR count from 1 to 2 when it started breaking. Not sure why, but it seems that I was actually talking to BOTH MGRs -- but only one of them was actually serving the dashboard?

I reduced back to 1 MGR and the dashboard started working again.

Rook v1.7.4
Ceph v16.2.6

@travisn
Member

travisn commented Sep 29, 2021

@dredwilliams Were you referencing the rook-ceph-mgr-dashboard service? Or did you have another service defined? When there are two mgrs, only one of them is active and Rook will update the service to automatically point to the active mgr. But if you have defined another service, Rook wouldn't know to update it.

@dredwilliams

I had created the loadbalancer service using "dashboard-loadbalancer.yaml" ... which (looking now) created a new service 'rook-ceph-mgr-dashboard-loadbalancer' ... so that was probably my problem. I guess I expected that if I used a provided capability, it would respond appropriately ...

Thanks!

@travisn
Member

travisn commented Sep 30, 2021

Agreed, Rook should be able to update any service that has a selector for the app: rook-ceph-mgr daemon. We'll take a look at this.
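
For reference, the affected services could be enumerated by filtering on that selector (a sketch using jq):

kubectl -n rook-ceph get svc -o json | \
  jq -r '.items[] | select(.spec.selector.app == "rook-ceph-mgr") | .metadata.name'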

edit: Issue title is updated to reflect the proposal

@travisn travisn changed the title Ceph-Dashboard does not work All services with a selector for the mgr daemon should be updated if there are multiple mgr daemons and the active mgr changes Sep 30, 2021
@travisn travisn removed the wontfix label Sep 30, 2021
@travisn travisn added this to To do in v1.8 via automation Sep 30, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@prazumovsky

@travisn any updates on this?

@travisn travisn self-assigned this Dec 9, 2021
@travisn
Member

travisn commented Dec 9, 2021

@travisn any updates on this?

I'm hoping to look at it next week now.

@travisn travisn moved this from To do to In progress in v1.8 Jan 11, 2022
v1.8 automation moved this from In progress to Done Jan 13, 2022