
docker stats shows MEM USAGE of 16EiB -- that's exactly uint64.Max #42140

Open
fierlion opened this issue Mar 11, 2021 · 5 comments
Labels: kind/bug, version/19.03

Comments

@fierlion commented Mar 11, 2021

Description
docker stats is intermittently returning 16EiB (16 exbibytes) of MEM USAGE:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT    MEM %               NET I/O             BLOCK I/O           PIDS
<container-id>      <name>              0.00%               16EiB / 256MiB       6871947673599.99%   8.32kB / 0B         5.41MB / 0B         1

This directly corresponds to Go's maximum uint64 value: 18446744073709551615.

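As a quick sanity check (a minimal sketch in Go, not from the report): uint64's maximum value is 2^64 - 1 bytes, which binary unit formatting rounds to exactly 16 EiB (1 EiB = 2^60 bytes):

package main

import "fmt"

func main() {
	const max = ^uint64(0) // 18446744073709551615, i.e. 2^64 - 1
	// 1 EiB = 2^60 bytes, so 2^64 bytes is exactly 16 EiB.
	fmt.Printf("%d bytes = %.6f EiB\n", max, float64(max)/float64(uint64(1)<<60))
}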

Steps to reproduce the issue:
Not clear; this is happening intermittently.
The containers are managed by AWS ECS.
We first noticed the memory usage spikes in CloudWatch, with each jump representing a change of approximately 10^6 in magnitude.

We configured our instance logging to separately capture the direct output of the docker stats command whenever the anomaly appears.
A snippet of the docker stats output is pasted above (with redacted container-id and name).

Describe the results you received:
When running docker stats, we intermittently see the MEM USAGE spike to 16EiB.

Describe the results you expected:
Our memory limit is 256MiB; we expect usage to stay below that.

Additional information you deem important (e.g. issue happens only occasionally):
The issue happens occasionally.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:36 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:07 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 23
  Running: 19
  Paused: 0
  Stopped: 4
 Images: 21
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: syslog
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-1028-aws
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 36
 Total Memory: 68.59GiB
 Docker Root Dir: /mnt/containers
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Live Restore Enabled: true

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
AWS ECS managed containers

@thaJeztah added the kind/bug and version/19.03 labels Mar 11, 2021
@thaJeztah (Member) commented:

We get this info from the kernel, but an average is calculated, and I wonder if it would (e.g.) return -1 if no information was present yet (a -1 stored in a uint64 would be shown as uint64.Max).

There's also some calculation happening in the CLI to subtract cache from usage.

For Docker 19.03.12: https://github.com/docker/cli/blob/v19.03.12/cli/command/container/stats_helpers.go#L227-L231

// calculateMemUsageUnixNoCache calculate memory usage of the container.
// Page cache is intentionally excluded to avoid misinterpretation of the output.
func calculateMemUsageUnixNoCache(mem types.MemoryStats) float64 {
	return float64(mem.Usage - mem.Stats["cache"])
}
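
Worth noting: Usage and the Stats values are unsigned 64-bit integers, so if the sampled cache value ever exceeds usage, that subtraction wraps around to just under 2^64 instead of going negative. A minimal standalone sketch of the wraparound (the counter values here are hypothetical):

package main

import "fmt"

func main() {
	// Hypothetical counters: cache sampled slightly higher than usage.
	var usage uint64 = 6_000_000
	var cache uint64 = 6_100_000
	// Unsigned subtraction cannot go negative; it wraps modulo 2^64.
	fmt.Println(usage - cache) // 18446744073709451616, rendered as ~16EiB
}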

And Docker 20.10 (which added cgroup v2 support):

https://github.com/docker/cli/blob/v20.10.5/cli/command/container/stats_helpers.go#L227-L249

// calculateMemUsageUnixNoCache calculate memory usage of the container.
// Cache is intentionally excluded to avoid misinterpretation of the output.
//
// On cgroup v1 host, the result is `mem.Usage - mem.Stats["total_inactive_file"]` .
// On cgroup v2 host, the result is `mem.Usage - mem.Stats["inactive_file"] `.
//
// This definition is consistent with cadvisor and containerd/CRI.
// * https://github.com/google/cadvisor/commit/307d1b1cb320fef66fab02db749f07a459245451
// * https://github.com/containerd/cri/commit/6b8846cdf8b8c98c1d965313d66bc8489166059a
//
// On Docker 19.03 and older, the result was `mem.Usage - mem.Stats["cache"]`.
// See https://github.com/moby/moby/issues/40727 for the background.
func calculateMemUsageUnixNoCache(mem types.MemoryStats) float64 {
	// cgroup v1
	if v, isCgroup1 := mem.Stats["total_inactive_file"]; isCgroup1 && v < mem.Usage {
		return float64(mem.Usage - v)
	}
	// cgroup v2
	if v := mem.Stats["inactive_file"]; v < mem.Usage {
		return float64(mem.Usage - v)
	}
	return float64(mem.Usage)
}
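
Note the `v < mem.Usage` guards in the 20.10 version: besides falling back when the cgroup v1 key is absent, they prevent exactly this kind of unsigned wraparound when the inactive-file counter momentarily exceeds the usage counter.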

@thaJeztah (Member) commented:

I guess it would be useful (if possible) to get the data that's returned by the API to see what value is causing the issue 🤔

@fierlion (Author) commented Mar 15, 2021

it would be useful (if possible) to get the data that's returned by the API

Just to clarify, you mean data from the /containers/(id)/stats API endpoint, correct?

(Note: the issue is intermittent so we'll need to catch it.)
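
For reference, a minimal Go sketch of such a capture, assuming the default daemon socket at /var/run/docker.sock (the container ID is a placeholder):

package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	// Dial the Docker daemon over its unix socket instead of TCP.
	tr := &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
		},
	}
	client := &http.Client{Transport: tr}

	containerID := "<container-id>" // placeholder; substitute a real ID
	// stream=false asks the API for a single snapshot rather than a stream.
	resp, err := client.Get(fmt.Sprintf("http://localhost/containers/%s/stats?stream=false", containerID))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // raw JSON, including memory_stats
}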

@thaJeztah (Member) commented:

Just to clarify, you mean data from the /containers/(id)/stats API endpoint, correct?

Yes, correct; it would give a datapoint showing where the wrong value is coming from (a bug in the CLI doing the calculation, or incorrect/incomplete data returned by the daemon, containerd, or the kernel).

@MrZXR commented Aug 11, 2021

Just to clarify, you mean data from the /containers/(id)/stats API endpoint, correct?

Yes, correct; it would give a datapoint showing where the wrong value is coming from (a bug in the CLI doing the calculation, or incorrect/incomplete data returned by the daemon, containerd, or the kernel).

Whoops... I ran into the same problem.

Output of docker stats:

CONTAINER ID        NAME                     CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
<container-id>       <container_name>       0.17%               16EiB / 62.86GiB      27331069099.50%     4.35GB / 1GB      5GB / 7GB     2

Output of docker version:

Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:19:08 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 10
 Running: 9
 Paused: 0
 Stopped: 1
Images: 15
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 5.10.3-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 62.86GiB
Name: 0210000c7c0b07af2792f6fbe
ID: FYQL:Z564:TNLJ:65UT:O4WY:ALMR:KEMO:ZAC4:IAML:XEAW:FHJ2:JAW3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Memory data from the /containers/(id)/stats API endpoint:

"memory_stats": {
    "usage": 6258688,
    "max_usage": 13495779328,
    "stats": {
      "active_anon": 0,
      "active_file": 4411392,
      "cache": 6352896,
      "dirty": 135168,
      "hierarchical_memory_limit": 9223372036854771712,
      "hierarchical_memsw_limit": 9223372036854771712,
      "inactive_anon": 851968,
      "inactive_file": 3944448,
      "mapped_file": 540672,
      "pgfault": 143393646,
      "pgmajfault": 1089,
      "pgpgin": 72811365,
      "pgpgout": 72810414,
      "rss": 1257472,
      "rss_huge": 0,
      "total_active_anon": 0,
      "total_active_file": 4411392,
      "total_cache": 6352896,
      "total_dirty": 135168,
      "total_inactive_anon": 851968,
      "total_inactive_file": 3944448,
      "total_mapped_file": 540672,
      "total_pgfault": 143393646,
      "total_pgmajfault": 1089,
      "total_pgpgin": 72811365,
      "total_pgpgout": 72810414,
      "total_rss": 1257472,
      "total_rss_huge": 0,
      "total_unevictable": 0,
      "total_writeback": 0,
      "unevictable": 0,
      "writeback": 0
    },
    "limit": 67493679104
  }
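
Checking this sample against the helper quoted earlier: cache (6352896) is larger than usage (6258688) here, so the pre-20.10 formula `mem.Usage - mem.Stats["cache"]` would underflow. In uint64 arithmetic, 6258688 - 6352896 wraps to 18446744073709457408, which Docker's unit formatting renders as 16EiB -- matching the stats output above.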
