
Monitor reboots after creating StorageClass #11081

Closed
lincate opened this issue Sep 29, 2022 · 4 comments

Comments

@lincate

lincate commented Sep 29, 2022

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
After creating the StorageClass rook-ceph-block (provisioner: rook-ceph.rbd.csi.ceph.com), the monitors keep rebooting again and again.
The PVC that references rook-ceph-block then stays in Pending status (a minimal example of such a PVC is sketched below).
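
For reference only (not part of the original report), a PVC of the kind described above would look roughly like this; the name and size are hypothetical, and only storageClassName: rook-ceph-block is taken from the report:

```yaml
# Hypothetical PVC referencing the rook-ceph-block StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi         # hypothetical size
  storageClassName: rook-ceph-block
```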

Expected behavior:
The StorageClass is created and the PVC becomes Bound.

How to reproduce it (minimal and precise):
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
After that, the Ceph cluster status is HEALTH_OK.
Then kubectl create -f storageclass.yaml (a sketch of a typical storageclass.yaml is shown below).
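
For context, the storageclass.yaml shipped in the Rook examples is roughly as follows (abbreviated sketch; the reporter's actual file may differ, the pool name and replication settings are illustrative, and the CSI secret parameters are omitted for brevity):

```yaml
# Abbreviated sketch of a typical Rook RBD StorageClass manifest; values are illustrative
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph        # namespace of the Rook operator/cluster
  pool: replicapool
  imageFormat: "2"
  csi.storage.k8s.io/fstype: ext4
  # ...provisioner/node secret parameters omitted for brevity
reclaimPolicy: Delete
```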

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Logs to submit:
Info: Checking mon quorum and ceph health details
HEALTH_OK

After creating the StorageClass, one of the monitors logs:
debug 2022-09-29T09:10:21.318+0000 7f937291a700 0 log_channel(cluster) log [DBG] : fsmap
debug 2022-09-29T09:10:21.318+0000 7f937291a700 0 log_channel(cluster) log [DBG] : osdmap e25: 3 total, 3 up, 3 in
debug 2022-09-29T09:10:21.319+0000 7f937291a700 0 log_channel(cluster) log [DBG] : mgrmap e30: no daemons active (since 18s)
debug 2022-09-29T09:10:21.320+0000 7f937291a700 0 log_channel(cluster) log [WRN] : Health check update: 1/3 mons down, quorum a,c (MON_DOWN)
debug 2022-09-29T09:10:59.737+0000 7f9371117700 1 mon.a@0(leader) e3 handle_auth_request failed to assign global_id
debug 2022-09-29T09:11:15.942+0000 7f9370916700 1 mon.a@0(leader) e3 handle_auth_request failed to assign global_id
debug 2022-09-29T09:11:25.176+0000 7f9378125700 -1 received signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
debug 2022-09-29T09:11:25.176+0000 7f9378125700 -1 mon.a@0(leader) e3 *** Got Signal Terminated ***
debug 2022-09-29T09:11:25.176+0000 7f9378125700 1 mon.a@0(leader) e3 shutdown

  • Operator's logs, if necessary

  • Crashing pod(s) logs, if necessary

    To get logs, use kubectl -n <namespace> logs <pod name>
    When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI.
    Read GitHub documentation if you need help.

Cluster Status to submit:

  • Output of krew commands, if necessary

    To get the health of the cluster, use kubectl rook-ceph health
    To get the status of the cluster, use kubectl rook-ceph ceph status
    For more details, see the Rook Krew Plugin

Environment:

  • OS (e.g. from /etc/os-release): Centos 7
  • Kernel (e.g. uname -a): Linux 4.18.0-147.5.1.6.h579.x86_64
  • Cloud provider or hardware configuration: on-prem
  • Rook version (use rook version inside of a Rook Pod): 1.10.1 & 1.10.0
  • Storage backend version (e.g. for ceph do ceph -v): 17.2.3
  • Kubernetes version (use kubectl version): 1.25.0 & 1.24.2
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): bare metal
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
@lincate lincate added the bug label Sep 29, 2022
@travisn
Member

travisn commented Sep 29, 2022

Some questions:

  • Is this a default install, or did you modify cluster.yaml? If you added resource limits on the mons, they may need to be increased.
  • The liveness probes may be failing for the mons. You could try disabling the liveness probes to see if that helps (see the sketch after this list for where both settings live in the CephCluster CR).
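
For illustration only (not part of the original comment), the two settings mentioned above sit in the CephCluster CR (cluster.yaml) roughly as follows; the resource numbers are hypothetical:

```yaml
# Hypothetical excerpt of a CephCluster CR; numbers are illustrative
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Raise (or remove) mon resource limits if the mons are being throttled or OOM-killed
  resources:
    mon:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        memory: "2Gi"
  # Disable the mon liveness probe to rule out probe-driven restarts
  healthCheck:
    livenessProbe:
      mon:
        disabled: true
```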

@shangjin92

It may be similar to #10110.

@github-actions

github-actions bot commented Dec 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned (Won't fix, can't repro, duplicate, stale) on Dec 17, 2022