node level memory state from root cgroup is different from /proc/meminfo #2042

Open

ddongchen opened this issue Sep 13, 2018 · 13 comments

@ddongchen
In Kubernetes 1.9, the node-level memory stats come from the root cgroup:
rootStats, networkStats, err := sp.provider.GetCgroupStats("/", updateStats)

But I found that the node-level memory reported by /proc/meminfo is different:

cat /proc/meminfo
MemTotal: 263774064 kB
MemFree: 59887272 kB
MemAvailable: 221831940 kB
Buffers: 1097668 kB
Cached: 154110176 kB
SwapCached: 0 kB
Active: 79424372 kB
Inactive: 98926152 kB
Active(anon): 23219964 kB
Inactive(anon): 171352 kB
Active(file): 56204408 kB
Inactive(file): 98754800 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 219692 kB
Writeback: 0 kB
AnonPages: 23143444 kB
Mapped: 2724656 kB
Shmem: 248384 kB
Slab: 11127012 kB
SReclaimable: 7229012 kB
SUnreclaim: 3898000 kB
KernelStack: 64432 kB
PageTables: 101444 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 131887032 kB
Committed_AS: 63167748 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1520832 kB
VmallocChunk: 34223804980 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 15480156 kB
DirectMap2M: 202510336 kB
DirectMap1G: 52428800 kB

The node-level memory used:
used = MemTotal - MemFree = 263774064 kB - 59887272 kB = 203886792 kB
The real used:
real used = MemTotal - MemFree - (Buffers + Cached - Shmem)
= 263774064 kB - 59887272 kB - (1097668 kB + 154110176 kB - 248384 kB)
= 48927332 kB
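
A minimal Go sketch of the /proc/meminfo arithmetic above (not the kubelet's actual code; parseMeminfo is a hypothetical helper written just for illustration, and values are in kB as in /proc/meminfo):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMeminfo reads /proc/meminfo into a map of field name -> value in kB.
func parseMeminfo() (map[string]uint64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	fields := map[string]uint64{}
	s := bufio.NewScanner(f)
	for s.Scan() {
		// Lines look like "MemTotal:       263774064 kB".
		parts := strings.Fields(s.Text())
		if len(parts) < 2 {
			continue
		}
		v, err := strconv.ParseUint(parts[1], 10, 64)
		if err != nil {
			continue
		}
		fields[strings.TrimSuffix(parts[0], ":")] = v
	}
	return fields, s.Err()
}

func main() {
	m, err := parseMeminfo()
	if err != nil {
		panic(err)
	}
	used := m["MemTotal"] - m["MemFree"]
	realUsed := used - (m["Buffers"] + m["Cached"] - m["Shmem"])
	fmt.Printf("used: %d kB, real used: %d kB\n", used, realUsed)
}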

while from root cgroup:

cat /sys/fs/cgroup/memory/memory.usage_in_bytes
183140081664
cat /sys/fs/cgroup/memory/memory.stat
cache 39511924736
rss 267431936
rss_huge 0
mapped_file 424710144
swap 0
pgpgin 18514489
pgpgout 8802732
pgfault 12272911
pgmajfault 188
inactive_anon 651264
active_anon 268210176
inactive_file 33488736256
active_file 6021758976
unevictable 0
hierarchical_memory_limit 9223372036854775807
hierarchical_memsw_limit 9223372036854775807
total_cache 158941466624
total_rss 24187408384
total_rss_huge 0
total_mapped_file 1903247360
total_swap 0
total_pgpgin 5077787329
total_pgpgout 5033078131
total_pgfault 12592699800
total_pgmajfault 85677
total_inactive_anon 175456256
total_active_anon 24265732096
total_inactive_file 101087215616
total_active_file 57599930368
total_unevictable 0

The node-level memory used:
used = memory.usage_in_bytes = 183140081664 B = 178847736 kB
The real used:
real used = memory.usage_in_bytes - total_inactive_file
= 183140081664 - 101087215616
= 82052866048 B
= 80129752 kB
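
A minimal Go sketch of this cgroup-side arithmetic (again not the kubelet's actual code; it assumes cgroup v1 and the file paths shown above, and readUint is a hypothetical helper):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readUint reads a file that contains a single unsigned integer.
func readUint(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	usage, err := readUint("/sys/fs/cgroup/memory/memory.usage_in_bytes")
	if err != nil {
		panic(err)
	}

	// Pull total_inactive_file out of memory.stat.
	f, err := os.Open("/sys/fs/cgroup/memory/memory.stat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var inactiveFile uint64
	s := bufio.NewScanner(f)
	for s.Scan() {
		parts := strings.Fields(s.Text())
		if len(parts) == 2 && parts[0] == "total_inactive_file" {
			inactiveFile, _ = strconv.ParseUint(parts[1], 10, 64)
		}
	}

	// Roughly the working-set style number reported for the node.
	workingSet := usage - inactiveFile
	fmt.Printf("real used (working set): %d bytes (%d kB)\n", workingSet, workingSet/1024)
}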

The two "real used" values are very different. Also, since we use the same total, the "available" memory is also very different.
This also has an effect on eviction in the kubelet. Should we use /proc/meminfo instead, since it may be more correct?
@dashpole dashpole self-assigned this Sep 18, 2018
@andrewghobrial

I also see the same thing and would like to understand the relation between the root cgroup memory metrics and those from /proc/meminfo.

@prabhu43

Any update on this?

I have also noticed a huge difference between the memory usage reported by the cgroup and the memory usage reported by /proc/meminfo.

Memory usage from the cgroup shows 83%;
memory usage from /proc/meminfo shows 25%.

@alexbrand

Also super curious about this.

@jbouzekri

Still seeing this kind of huge difference on Azure AKS 1.16.13. There is a difference of around 33% between the metric instance:node_memory_utilisation:ratio{job="node-exporter", instance="$instance"} collected by the Prometheus node exporter and the result of kubectl top nodes.

@phongvq

phongvq commented May 3, 2021

I think the difference exists because the "memory usage" from the cgroup and the one from /proc/meminfo that you mentioned are two different metrics.

The one taken from the cgroup (real used = memory.usage_in_bytes - total_inactive_file) is the memory working set; it is essentially Active: 79424372 kB in the /proc/meminfo output.
And the reason the cgroup real usage is much higher than the /proc/meminfo one is that it includes "active" buffers/cache, which is subtracted in the /proc/meminfo real usage (real used = MemTotal - MemFree - (Buffers + Cached - Shmem); the Buffers and Cached here contain both active and inactive memory, IMO).
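
As a rough cross-check using the numbers from the original report (the two snapshots were not taken at the same moment, so this is only approximate):

cgroup working set = memory.usage_in_bytes - total_inactive_file = 80129752 kB
Active(anon) + Active(file) = 23219964 kB + 56204408 kB = 79424372 kB

These are close, which fits the idea that the working set counts anonymous memory plus the active file cache, while the /proc/meminfo "real used" formula subtracts all of Buffers + Cached, active and inactive alike.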

Please correct me if I'm wrong, since I'm facing a similar thing and am not entirely sure about the cause.

Ref: https://www.heroix.com/blog/linux-memory-use/

@Korijn

Korijn commented Nov 17, 2021

Did anyone figure this out? Specifically, how to monitor the root cgroup memory usage in Prometheus/Grafana...

@michelefa1988

michelefa1988 commented Jun 6, 2022

I also have the same problem. Reading memory via kubectl top nodes, I get 100% memory usage, whilst via node_memory_MemTotal_bytes{node=~"$nodename"} - node_memory_MemFree_bytes{node=~"$nodename"} - node_memory_Buffers_bytes{node=~"$nodename"} - node_memory_Cached_bytes{node=~"$nodename"} memory usage is at 54%. Obviously this discrepancy kinda sucks.

Moreover, which is the "correct" one?

@anupamsr

AFAIK elastic/apm-agent-java#1197 (comment) says the memory reported via /proc/meminfo (which is what Prometheus reads) is wrong.

@tomsucho

tomsucho commented May 5, 2023

So to get something closer to kubectl top nodes, one needs to use this in Prometheus?
100 * (avg_over_time(node_memory_Active_bytes[5m]) / avg_over_time(node_memory_MemTotal_bytes[5m]))
Comparing values, they seem to match closely, but I would like to make sure it's not some strange coincidence.

@Pavangj959

I also have the same problem. Reading memory via kubectl top nodes, I get 100% memory usage, whilst via node_memory_MemTotal_bytes{node=~"$nodename"} - node_memory_MemFree_bytes{node=~"$nodename"} - node_memory_Buffers_bytes{node=~"$nodename"} - node_memory_Cached_bytes{node=~"$nodename"} memory usage is at 54%. Obviously this discrepancy kinda sucks.

Moreover, which is the "correct" one?

I guess you can try this: node_memory_MemTotal_bytes - node_memory_MemFree_bytes - (node_memory_Buffers_bytes + node_memory_Cached_bytes - node_memory_Shmem_bytes). It is giving me results close to kubectl top nodes.

@rizalmf

rizalmf commented Nov 16, 2023

Maybe you can try this:
sum(container_memory_working_set_bytes{id="/"}) by (instance)

@dashpole dashpole removed their assignment Jan 31, 2024
@sameeraksc

We are having the same problem. The Prometheus metrics are accurate for us.

@AzySir

AzySir commented Apr 26, 2024

Also the same for me. I can't make progress on a ticket around this because I literally can't give accurate numbers...

I also notice these values are available from the Kubernetes API, so it is potentially a case of Prometheus not forwarding the correct data. However, there is no clear query that would fetch this data, and there should be one.
