OSD and MON memory consumption #5811
If it's a new Ceph cluster then you did something wrong. Where is the cluster created? Cloud, on-prem?
@OpsPita The cluster is bare metal and was created more than a year ago.
That is something enormous.
@Antiarchitect It's recommended to add resource limits to the OSDs. See the Cluster CR doc on resource limits. There is also an example in cluster.yaml.
@travisn Thank you for the tip - any recommended values?
@Antiarchitect In the Cluster CR doc it mentions the minimum memory limit is 2G, but a better default is 4G.
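For reference, a minimal sketch of what those limits could look like in the CephCluster CR. The OSD values follow the 4G suggestion above; the mon values are purely illustrative:

```yaml
# Sketch only - resources section of a CephCluster CR; adjust values to your hardware.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  resources:
    osd:
      requests:
        memory: "4Gi"
      limits:
        memory: "4Gi"
    mon:
      requests:
        memory: "1Gi"   # illustrative; see the Cluster CR doc for guidance
      limits:
        memory: "2Gi"
```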
@travisn The situation has normalized, but 2 of 15 OSDs are encountering this:
Original issue: #5814 - will reopen.
@leseb Thoughts on what would cause the admin_socket to be invalid in the liveness probe?
Also, I've set the resources for mon, mgr, osd, and mds but cannot see them in my pod specs. Are they managed differently somehow? By the operator, maybe?
@travisn @leseb Meanwhile, seeing this in the operator logs:
Is it normal that the osd_memory_target value is always exactly the same as my memory resource limit? What about pod / OSD process overhead, etc.? Still getting OOMKilled. I tried 4GB, 8GB, and 16GB limits; only a 32GB limit is now giving a stable result. That is sad, as I have 3 OSDs per node and only 64GB of memory on each node.
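As a side note, a couple of standard Ceph commands can confirm what the daemon actually ended up with (osd.0 below is just a placeholder):

```sh
# From the toolbox: the value the cluster reports for this daemon
ceph config show osd.0 osd_memory_target

# From inside the osd.0 container, via the admin socket
ceph daemon osd.0 config get osd_memory_target
```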
@Antiarchitect It's very unexpected that you would need over 4GB to stabilize.
@travisn It seems like when the cluster is stable the OSDs consume less, but when part of the OSDs is down the others start to eat memory (I haven't faced these amounts before - the record was 45Gi on one of the OSDs), and it is a self-destabilizing process. P.S. Ceph is one of the most beautiful pieces of software I've ever met in practice - it's tenacious like a cockroach.
The cluster has stabilized - no data loss; updated Rook to 1.3.8. The picture:
It's scary that this calm picture can turn into a two-night nightmare momentarily.
@Antiarchitect I ran into something like this with Rook 1.3 (not OCS) a couple of days ago, when I did a really intense fio read workload with 4 NVMe drives/host and a 25-GbE link - the OSDs started caching data furiously until the node ran out of memory, at which point I started seeing OOM kills on the OSDs, which caused the remaining OSDs to work hard to recover during this heavy load. Eventually it recovers, but the point is that to prevent this you need to control caching. I did this with bluestore_default_buffered_read: false, but Josh Durgin suggested that instead of that, bluefs_buffered_io can be set to false so that reads are done with O_DIRECT and do not involve the kernel buffer cache. This will slow down BlueStore RocksDB compaction somewhat (per Mark Nelson) but will result in much more stable memory utilization. You can also ensure that osd_memory_target is set so that the OSD itself is limiting its in-process memory consumption - I'd suggest > 4 GiB - and set the memory cgroup limit to at least 50% higher than osd_memory_target to give OSDs a chance to avoid OOM. Make sure transparent hugepages are disabled for Ceph daemons; this should happen automatically in modern versions of Ceph such as the one you are using. If you are using SSD storage, the cost of a cache miss is much lower. Let us know if this makes sense and if it helps. @travisn FYI
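If it helps, a hedged sketch of applying that bluefs_buffered_io change through Rook's Ceph config override ConfigMap; the name and namespace below assume a default rook-ceph install, and OSD pods need a restart to pick the setting up:

```yaml
# Sketch only - low-level Ceph options injected via Rook's override ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [osd]
    bluefs_buffered_io = false
```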
@bengland2 Thank you so much for the explanation! It does make sense and I will try it. Actually, this could be part of BlueStore fine-tuning in the Rook storage config in future releases of Rook. P.S. We've decided to move the Ceph cluster out to dedicated nodes not managed by K8s and use Rook's Ceph external cluster feature to provide RBD and CephFS storage types.
@bengland2 Thanks for all the input! Rook only sets these two vars depending on the resource requests/limits in the CephCluster CR: Ceph picks up on those env vars in the OSD daemon and will set the cgroup limit accordingly, looks like here. I thought Ceph was setting it to 0.8 of the memory limit, but I don't see that calculation there. Are we missing some other setting that will do that calculation?
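One hedged way to check whether a ratio is being applied at runtime is to query the cluster from the toolbox; the osd_memory_target_cgroup_limit_ratio option below is an assumption about the Ceph release in use:

```sh
# Effective target on a running OSD
ceph config show osd.0 osd_memory_target

# Ratio applied when a cgroup/pod memory limit is detected (if the option exists in this release)
ceph config get osd osd_memory_target_cgroup_limit_ratio
```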
@bengland2 @travisn Just ran |
Found it. Seems false by default. |
@Antiarchitect So is the RSS in your OSDs growing, or is the kernel buffer cache (inactive pages) growing? If the kernel buffer cache, then BlueStore is not doing O_DIRECT. Otherwise, the BlueStore OSD-internal cache must be growing; I think there are counters available for monitoring this via ceph daemon osd.N perf dump (hard to get to with containers). Let's isolate the problem. To set low-level options (e.g. ceph.conf), you can use the ceph_config_overrides configmap; this is documented by rook.io. To change them on the fly, use "ceph tell" with injectargs, but this isn't guaranteed to work for all parameters. Try dropping the cache manually with echo 1 > /proc/sys/vm/drop_caches after you make changes so the cache is clean to begin with.
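Roughly, the checks above translate to the following commands (osd.0 is a placeholder; run the perf dump inside the OSD container and drop_caches on the host):

```sh
# Dump BlueStore cache/memory counters from the daemon's admin socket
ceph daemon osd.0 perf dump

# Change a low-level option on the fly (not guaranteed to apply to every parameter)
ceph tell osd.* injectargs '--bluefs_buffered_io=false'

# Drop the kernel page cache so measurements start from a clean state
echo 1 > /proc/sys/vm/drop_caches
```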
From the rook toolbox container running
From the osd.0 container I get |
Because the toolbox does not run any daemon. The
Ok, I got it.
and I see "osd_memory_target": "1073741824", what's your resources:memory:limit on your OSD pod? Because osd_memory_target seems to be limiting you to 1 GiB, but typically I never want to see that below 4 GiB, not sure what lower limit is. |
https://gist.github.com/Antiarchitect/b0b68a463e021e3dabca1e60dff6f924 and new dump |
I haven't done any tests where osd_memory_target was set to 1 GiB. If your memory limit is 32 Gi, that means you're willing to give the OSD up to 32 GB of RAM before killing it, so why set osd_memory_target to 1 GiB? Since you have 10 OSDs and 64 GB RAM, you have enough memory to provide 4 GB RAM for each OSD, which is usually what I see it set at. Based on the above discussion, since you avoid using the kernel buffer cache, that should free up memory for OSDs. I'm not sure what the minimum value of osd_memory_target is, but Ceph has to be able to cache OSD metadata to run efficiently; if you starve it for RAM then it will constantly have to go to RocksDB to get this metadata and will slow down, at best.
I didn't set osd_memory_target manually; any options I've changed so far were changed via the Rook OSD limits.
Try raising the memory request to 4 GiB and see what happens? request=1GiB is still too low. Basically that means "don't schedule the pod on this node unless there is 1 GiB of free mem". You don't want the OSD running there anyway unless there is more memory available than that. In fact, I'd suggest setting the request to 4 GiB and the limit to 6 GiB (i.e. don't OOM-kill it until it gets to the limit), and watch what happens to osd_memory_target. Hopefully you get an OSD memory target that is some percentage under the limit, so the Ceph OSD trims its memory usage before it gets OOM-killed by the Linux kernel. Make sense?
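As a sketch, that request/limit split would look something like this in the CephCluster CR (values taken from the suggestion above):

```yaml
spec:
  resources:
    osd:
      requests:
        memory: "4Gi"   # scheduling guarantee
      limits:
        memory: "6Gi"   # OOM-kill threshold, ~50% above the request
```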
@bengland2 Thank you for the tip. Here is the result: https://gist.github.com/Antiarchitect/fc1cfd989a58528a33a3c04455a45b3b
I will close this issue as the cluster has been stable for about a week already. If the problem shows itself again, I will reopen. Thank you all for your patience and good advice!
I have this picture of memory consumption by Rook and Ceph:
I have several OSDs per node, Total Raw Capacity 5.2 TiB. One node was offline for a week or so, so the last rebalancing was very slow and lasted for several hours. Is such high memory consumption by OSDs normal? I have only 64GB of RAM on each node, and it seems like more than half is consumed by Ceph OSDs.
Environment:
- Kernel (`uname -a`): Linux worker-5.prod.lwams1.enapter.ninja 5.7.1-1.el7.elrepo.x86_64
- Rook version (`rook version` inside of a Rook Pod):
- Ceph version (`ceph -v`):
- Kubernetes version (`kubectl version`):
- Storage backend status (`ceph health` in the Rook Ceph toolbox): One node is out for maintenance