NDM looping constantly causing high cpu usage with Error: unreachable state #674

Open
magnetised opened this issue Jul 6, 2022 · 2 comments

magnetised commented Jul 6, 2022

What steps did you take and what happened:
I've just installed OpenEBS as part of k0s on an AWS EC2 instance with two disks: the host disk and a separate EBS data partition. Everything seems to be working fine, but one of the NDM pods sits at a constant 20% CPU usage. Looking at the logs, it appears to be stuck in a loop querying the host/node disks.

Looking at another server with the same NDM version but a simpler, single-disk setup, the exact same thing is happening.
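
For reference, the per-pod CPU usage can be confirmed with something like the following (assuming the metrics-server addon is available in the cluster):

  • kubectl top pods -n openebs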

What did you expect to happen:
I expected the NDM process not to use CPU constantly in a tight loop.

The output of the following commands will help us better understand what's going on:
[Pasting long output into a GitHub gist or other pastebin is fine.]

  • kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS      AGE
openebs-localpv-provisioner-6ccc9d6fc9-kcnhs   1/1     Running   9 (19h ago)   20h
openebs-ndm-jpvpw                              1/1     Running   0             26m
openebs-ndm-operator-7bd6898d96-vz54r          1/1     Running   9 (19h ago)   20h

  • kubectl get blockdevices -n openebs -o yaml
apiVersion: v1
items:
- apiVersion: openebs.io/v1alpha1
  kind: BlockDevice
  metadata:
    annotations:
      internal.openebs.io/uuid-scheme: gpt
    creationTimestamp: "2022-07-05T13:22:38Z"
    generation: 20
    labels:
      kubernetes.io/hostname: ip-172-31-18-163.eu-west-1.compute.internal
      ndm.io/blockdevice-type: blockdevice
      ndm.io/managed: "true"
    name: blockdevice-01fd0d0d966998648102985c5f12e22a
    namespace: openebs
    resourceVersion: "64236"
    uid: 9d3e2ec3-57b5-4303-829c-e0cfa51f2f07
  spec:
    capacity:
      logicalSectorSize: 512
      physicalSectorSize: 512
      storage: 137437888000
    details:
      compliance: ""
      deviceType: partition
      driveType: SSD
      firmwareRevision: ""
      hardwareSectorSize: 512
      logicalBlockSize: 512
      model: Amazon Elastic Block Store
      physicalBlockSize: 512
      serial: vol033aa51d4508ed1b0
      vendor: ""
    devlinks:
    - kind: by-id
      links:
      - /dev/disk/by-id/nvme-nvme.1d0f-766f6c3033336161353164343530386564316230-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1
      - /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol033aa51d4508ed1b0-part1
      - /dev/disk/by-id/wwn-nvme.1d0f-766f6c3033336161353164343530386564316230-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1
    - kind: by-path
      links:
      - /dev/disk/by-path/pci-0000:00:1f.0-nvme-1-part1
    filesystem:
      fsType: xfs
      mountPoint: /var/openebs
    nodeAttributes:
      nodeName: ip-172-31-18-163.eu-west-1.compute.internal
    partitioned: "No"
    path: /dev/nvme1n1p1
  status:
    claimState: Unclaimed
    state: Inactive
kind: List
metadata:
  resourceVersion: ""

  • kubectl get blockdeviceclaims -n openebs -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""

  • kubectl logs <ndm daemon pod name> -n openebs

The gist below includes just two iterations of the loop; it continues like this indefinitely.

https://gist.github.com/magnetised/c1f2bef4242b663721d87898f8416d65

  • lsblk from nodes where ndm daemonset is running
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme1n1     259:0    0  128G  0 disk
└─nvme1n1p1 259:4    0  128G  0 part /var/openebs
nvme0n1     259:1    0  128G  0 disk
├─nvme0n1p1 259:2    0    1M  0 part
└─nvme0n1p2 259:3    0  128G  0 part /

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • OpenEBS version

openebs.io/version=3.0.0
node-disk-manager:1.7.0

  • Kubernetes version (use kubectl version):
Client Version: v1.24.2
Kustomize Version: v4.5.4
Server Version: v1.23.6+k0s
  • Kubernetes installer & version:

K0s version v1.23.6+k0s.0

  • Cloud provider or hardware configuration:

AWS EC2 instance

  • Type of disks connected to the nodes (eg: Virtual Disks, GCE/EBS Volumes, Physical drives etc)

host root disk nvme0n1
OpenEBS volume nvme1n1 with a single partition nvme1n1p1 mounted at /var/openebs

  • OS (e.g. from /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
@artem-zinnatullin

Exact same issue with a vanilla k0s v1.27.2+k0s.0 installation with the OpenEBS extension enabled (openebs/node-disk-manager:1.9.0): NDM consumes over 60% CPU while idle, with no PVs at all. This is really bad.


gervaso commented Nov 21, 2023

Hi, we had the same issue on-premises, and it was caused by the presence of "/dev/sr1" on the VM, so I think the filter should be updated to exclude unusable devices like that (see the config sketch below).
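
Until the default filter changes, the path filter in the NDM ConfigMap can be extended to skip such devices. A minimal sketch, assuming the ConfigMap is named openebs-ndm-config and uses the node-disk-manager.config key as in the upstream OpenEBS chart (the name and the default exclude list may differ in a given installation; the only change here is adding sr1):

apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-ndm-config   # name assumed from the default chart; check your install
  namespace: openebs
data:
  node-disk-manager.config: |
    filterconfigs:
      - key: path-filter
        name: path filter
        state: true
        include: ""
        # default exclude list with sr1 added so the CD-ROM-style device is ignored
        exclude: "loop,fd0,sr0,sr1,/dev/ram,/dev/dm-,/dev/md,/dev/rbd,/dev/zd"

NDM reads this config at startup, so the NDM daemonset pods would need a restart after editing the ConfigMap, e.g. kubectl rollout restart daemonset/openebs-ndm -n openebs (the daemonset name here matches the pod names shown in the original report).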
