SLUB: Unable to allocate memory on node -1 #22794

Closed
ThomasCassimon opened this issue Sep 11, 2019 · 1 comment

What kind of request is this (question/bug/enhancement/feature request):
Bug

Steps to reproduce (least amount of steps as possible):
Leave Rancher running for a couple of days.

Result:
While Docker containers are running on the server, dmesg shows the following errors:

[319003.331580] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[319003.331587]   cache: mnt_cache(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 384, buffer size: 384, default order: 2, min order: 0
[319003.331591]   node 0: slabs: 20, objs: 776, free: 0
[319003.331594]   node 1: slabs: 14, objs: 556, free: 0
[319940.222707] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[319940.222714]   cache: blkdev_ioc(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 104, buffer size: 104, default order: 0, min order: 0
[319940.222718]   node 0: slabs: 2, objs: 78, free: 0
[319940.222721]   node 1: slabs: 4, objs: 156, free: 0
[320001.028578] SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
[320001.028582]   cache: kmalloc-128(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 128, buffer size: 128, default order: 1, min order: 0
[320001.028585]   node 0: slabs: 19, objs: 1216, free: 0
[320001.028587]   node 1: slabs: 18, objs: 1152, free: 0
[320004.629230] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[320004.629236]   cache: mnt_cache(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 384, buffer size: 384, default order: 2, min order: 0
[320004.629239]   node 0: slabs: 24, objs: 912, free: 0
[320004.629241]   node 1: slabs: 18, objs: 692, free: 0

Eventually the server crashes (about 3-4 days after the containers are first started), and the last entries in kern.log are these SLUB errors.
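To see which caches fail most often over a long kern.log, the SLUB messages above can be grouped per cache. The following is a minimal sketch, assuming the three-part message layout shown in the excerpt (header line, `cache:` line, one line per NUMA node). The function names and the `owner` field are hypothetical, and treating the parenthesised suffix after the cache name as an identifier plus a container hash is an assumption based on how it looks in this log.

```python
import re

# Patterns follow the layout of the dmesg excerpt in this report.
HEADER_RE = re.compile(
    r"SLUB: Unable to allocate memory on node (-?\d+) \(gfp=(0x[0-9a-fA-F]+)\)"
)
CACHE_RE = re.compile(
    r"cache: ([^\s(]+)\((\d+):([0-9a-f]+)\), object size: (\d+)"
)
NODE_RE = re.compile(r"node (\d+): slabs: (\d+), objs: (\d+), free: (\d+)")


def parse_slub_failures(text):
    """Return one dict per 'Unable to allocate memory' event found in text."""
    events = []
    for line in text.splitlines():
        m = HEADER_RE.search(line)
        if m:
            # A header line starts a new event; details follow on later lines.
            events.append({
                "gfp": int(m.group(2), 16),
                "cache": None,
                "owner": None,        # assumed: hash in the cache name suffix
                "object_size": None,
                "nodes": {},
            })
            continue
        if not events:
            continue  # ignore detail lines that precede any header
        m = CACHE_RE.search(line)
        if m:
            events[-1]["cache"] = m.group(1)
            events[-1]["owner"] = m.group(3)
            events[-1]["object_size"] = int(m.group(4))
            continue
        m = NODE_RE.search(line)
        if m:
            node, slabs, objs, free = map(int, m.groups())
            events[-1]["nodes"][node] = {
                "slabs": slabs, "objs": objs, "free": free,
            }
    return events


def failures_per_cache(events):
    """Count how many failure events were reported for each cache name."""
    counts = {}
    for event in events:
        counts[event["cache"]] = counts.get(event["cache"], 0) + 1
    return counts
```

Passing the text of kern.log (or captured `dmesg` output) to `parse_slub_failures` and then to `failures_per_cache` gives a quick view of whether the failures concentrate in one cache or are spread across several, as they are in the excerpt here.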

Other details that may be helpful:
Related problems I found:
https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/
opencontainers/runc#1725
kubernetes/kubernetes#61937 (comment)

However, those reports concern CentOS rather than Ubuntu.
They also describe blocked tasks, which do not appear in our dmesg output.

This problem was also reported to Docker and Kubernetes:
kubernetes/kubernetes#82534
docker/for-linux#774

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.2.7
  • Installation option (single install/HA):

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Hosted

  • Machine type (cloud/VM/metal) and specifications (CPU/memory): Dell Poweredge 430, Intel(R) Xeon(R) CPU E5-2650 v4, 128GB RAM

  • Kubernetes version: 1.14.6

  • Docker version (use docker version):

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89e8a
 Built:             Thu Jul 25 21:21:35 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89e8a
  Built:            Thu Jul 25 21:20:09 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

stale bot commented Jul 12, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 12, 2021
@stale stale bot closed this as completed Jul 26, 2021