Docker stats memory usage is misleading #10824

Closed
SamSaffron opened this Issue Feb 16, 2015 · 37 comments

# free -h
             total       used       free     shared    buffers     cached
Mem:          2.0G       1.9G        95M       273M        97M       879M
-/+ buffers/cache:       930M       1.0G
Swap:         1.0G        15M       1.0G
# docker stats app
CONTAINER           CPU %               MEM USAGE/LIMIT       MEM %               NET I/O
app                 51.20%              1.311 GiB/1.955 GiB   67.06%              58.48 MiB/729.5 MiB

So I have 1 GB of RAM to play with, yet docker stats reports 1.3 GB used on a 2 GB system. It is counting pages used for disk caching as "used", which is technically true, but that memory will be freed by the OS once we run low on RAM.

Instead, docker stats should report (used - buffers/cache), with the buffers/cache figure in brackets.

Otherwise people using this information may be misled as to how bad things really are ( http://www.linuxatemyram.com/ )
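
For illustration, a minimal sketch of the arithmetic being proposed, run against the host's free output (it assumes the older procps layout shown above, with total/used/free/shared/buffers/cached columns):

# print "used minus buffers/cache", with the buffers/cache total in brackets
free -m | awk '/^Mem:/ {printf "%d MiB used (%d MiB buffers/cache)\n", $3 - $6 - $7, $6 + $7}'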

Also, I'd recommend using standard Unix suffixes here, so it reads 1.311G rather than 1.311 GiB.

cc @crosbymichael

crosbymichael (Contributor) commented Feb 23, 2015

+1 I think it would make sense to make this change.

blackfader commented Feb 26, 2015

Yes, the memory figure is not right.

mattva01 commented Mar 22, 2015

@spf13
This is a little harder than it seems, as it requires modifying the libcontainer code here: https://github.com/docker/libcontainer/blob/master/cgroups/fs/memory.go
Do we use memory.stat.rss, memory.stat.usage_in_bytes - memory.stat.cache, or some other combination of stats...?
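
(For reference, a rough sketch of how those candidates compare when read straight from a container's memory cgroup; cgroup v1 paths are assumed, and <container_id> stands for the full container id:)

CG=/sys/fs/cgroup/memory/docker/<container_id>
usage=$(cat $CG/memory.usage_in_bytes)
cache=$(awk '/^cache /{print $2}' $CG/memory.stat)
rss=$(awk '/^rss /{print $2}' $CG/memory.stat)
echo "rss: $rss    usage - cache: $((usage - cache))"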

jessfraz (Contributor) commented Mar 22, 2015

I switched to the proficient label

imleon commented Mar 23, 2015

Try setting the --memory parameter of docker run, then check your
/sys/fs/cgroup/memory/docker/<container_id>/memory.usage_in_bytes
It should be right.
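
(For example, a sketch assuming cgroup v1 is mounted in the usual place; the image and container name are arbitrary:)

docker run -d --memory 256m --name mem_test nginx
CID=$(docker inspect --format '{{.Id}}' mem_test)
cat /sys/fs/cgroup/memory/docker/$CID/memory.usage_in_bytes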

resouer (Contributor) commented Mar 27, 2015

#dibs ping @wonderflow, I think this is what you are trying to fix. Just send a PR.

nhsiehgit (Contributor) commented Mar 27, 2015

Hi @resouer @wonderflow,
Please remember to include the keywords closes or fixes once you've submitted your PR to help make tracking of fixes easier. Thanks!

wonderflow (Contributor) commented Apr 3, 2015

"Also, I'd recommend using standard Unix suffixes here, so it reads 1.311G rather than 1.311 GiB."
This relates to whether we use the function HumanSize or BytesSize in size.go.

I'm not sure if I should change it. @crosbymichael
cc @SamSaffron

SamSaffron commented Apr 14, 2015

@wonderflow thanks for the fixes!

pbelmann referenced this issue in metagenomics/2015-biogas-cebitec, Jun 4, 2015 (closed): Example: Docker on AWS #21

wonderflow (Contributor) commented Jul 21, 2015

I can't find your comment on GitHub.

Can you tell me more details about it? Thanks.

On Jul 21, 2015, at 12:14 AM, Etienne Bruines notifications@github.com wrote:

Since this PR (docker#12172) was merged April 14th, and I'm running 1.6.2 (released May 13th), I believe I'm running a version that should have this fix.

However, docker stats still returns the memory usage including the buffers.

containername 0.00% 218.4 MiB/512 MiB 42.66% 17.22 KiB/130.1 KiB

$ free -m
               total       used       free     shared    buffers
Mem:             971        889         82         26         46
-/+ buffers:                842        129
Swap:           2047          0       2047

I like the fact that it's not saying the usage is 889M, but 889 - 842 != 218.

Any way this can be fixed?

(Posting in this issue because it feels like the same issue that's not 100% fixed, rather than a new issue.)

pdericson (Contributor) commented Aug 13, 2015

Just want to point out that this has had me chasing phantom memory leaks for more hours than I care to admit ;)

top says "18.4" MB, docker stats says "192.9" MB.

[screenshot: top output vs docker stats output]

Update:

Here's the part that worries me: if I set a memory limit of 32 MB on the container, it looks like my container will be killed even though resident memory doesn't go above 18.4 MB.

petervo referenced this issue in cockpit-project/cockpit, Sep 19, 2016 (open): Container memory usage #5065

akailash commented Jan 13, 2017

Is this resolved? I have the same problem as @pdericson with my apps running in Docker containers (running on my local machine). top shows me 15 MB in RES, while docker stats shows a slowly, linearly increasing memory usage, which climbs to around 250 MB over 12 hours.
I am using Docker version 1.12.5.

crosbymichael (Contributor) commented Jan 13, 2017

How many processes are running inside that container?

@akailash

14 processes. I posted some more details in this link: http://stackoverflow.com/questions/41604669/golang-pprof-vs-docker-stats

coolljt0725 (Contributor) commented Jan 16, 2017

@akailash I think it's because the memory statistics in docker stats come from the memory cgroup, which also takes the page cache into account, while RES does not. Can you try dropping the cache and see whether docker stats also decreases?
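
(For anyone wanting to try that: dropping the kernel page cache is a standard procfs knob, run on the host as root. A sketch, not a Docker command:)

sync
echo 3 > /proc/sys/vm/drop_caches
docker stats --no-stream <container>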

akailash commented Jan 16, 2017

@coolljt0725 I didn't exactly drop the cache, but another application that does no file I/O but makes similar network I/O does not show an increase via docker stats. So I do think the increase shown in docker stats is related to cache. Still, does that mean this bug is unresolved? And more importantly, if the docker stats memory usage number hits the ceiling, the cache should be dropped rather than the container being killed.

thaJeztah (Member) commented Jan 16, 2017

Docker does not kill containers; it's the kernel that kills processes if the whole system is out of memory.

akailash commented Jan 16, 2017

I did a quick test to check whether the container gets killed, using this in my docker-compose.yml:
mem_limit: 15m
Checking docker stats, I see that the container isn't killed. It frees up some unused memory and continues running. I expect that if it hits an OOM, the kernel will kill the processes in the container. Thanks @thaJeztah, @coolljt0725!

crosbymichael (Contributor) commented Jan 17, 2017

Also, your top is only showing one process; docker stats shows the memory usage plus cache for all processes in the container combined.

akailash commented Jan 18, 2017

Well, each container actually runs just one process with many threads; docker stats shows the number of threads as "PIDS". So the RES value was checked for that process.
The components of the memory usage metric should be clearly described in the documentation for this command; it is confusing for many users.

MarkRx referenced this issue in openshift/origin-web-console, Mar 1, 2017 (closed): Pod metrics should display memory cache information #1315

Sergeant007 commented Apr 9, 2017

Since we also encountered this issue, here is the bottom line of our investigation:

  • This issue has been closed, but the misleading memory usage printed to the console by "docker stats" has not been fixed.
  • docker/libcontainer#506 (Change memory usage by minus cache memory) was never merged.
  • Instead there was docker/libcontainer#518 (Add cache to MemoryStats), which added the cached-memory stat to the statistics API.
  • But it did nothing with the output of "docker stats". It still has only a generic "MEM USAGE" that includes caches.

So to get memory usage without caches I would recommend invoking the docker stats API and then doing some math (see the example below).

Experiment (Docker version 17.04.0-ce, build 4845c56, Ubuntu 14.04.2):

Step 1. Start a container with bash

docker run --name mem_stats_test --interactive --tty ubuntu:14.04 /bin/bash

Step 2. Open a new terminal window and ensure that memory consumption in stats is low

$ docker stats mem_stats_test --no-stream
CONTAINER           CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
mem_stats_test      0.00%               568KiB / 15.37GiB   0.00%               4.12kB / 0B         73.7kB / 0B         0
$
$ # Use curl version 7.50 (not part of vanilla Ubuntu 14.04)
$ # Note: the output below was trimmed during copy-paste; only the memory statistics are included
$ curl --unix-socket /var/run/docker.sock http://localhost/containers/mem_stats_test/stats?stream=false | python -m json.tool
...
    "memory_stats": {
        "limit": 16504213504,
        "max_usage": 974848,
        "stats": {
            "active_anon": 245760,
            "active_file": 73728,
            "cache": 90112,                          <-- THIS IS PAGE CACHE    !!!!!
            "hierarchical_memory_limit": 9223372036854771712,
            "inactive_anon": 262144,
            "inactive_file": 0,
            "mapped_file": 0,
            "pgfault": 1174,
            "pgmajfault": 4,
            "pgpgin": 569,
            "pgpgout": 427,
            "rss": 491520,
            "rss_huge": 0,
            "total_active_anon": 245760,
            "total_active_file": 73728,
            "total_cache": 90112,
            "total_inactive_anon": 262144,
            "total_inactive_file": 0,
            "total_mapped_file": 0,
            "total_pgfault": 1174,
            "total_pgmajfault": 4,
            "total_pgpgin": 569,
            "total_pgpgout": 427,
            "total_rss": 491520,
            "total_rss_huge": 0,
            "total_unevictable": 0,
            "total_writeback": 0,
            "unevictable": 0,
            "writeback": 0
        },
        "usage": 581632                        <-- THIS IS OVERALL STATS !!!!!
    }
...

Step 3. Let it run for several hours to ensure that memory doesn't grow significantly (or take my word for it and skip this step)

Step 4. Do some heavy file operations that grow the page cache (in the docker container console from Step 1)

# # Once the following operation finishes, the page cache will have grown, but no additional "real" memory is consumed
# dd if=/dev/urandom of=/tmp/1GB.bin bs=64M count=16 iflag=fullblock
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB) copied, 74.4138 s, 14.4 MB/s

Step 5. Check docker stats again

$ # reported memory usage grew!
$ docker stats mem_stats_test --no-stream
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
mem_stats_test      0.00%               855.2MiB / 15.37GiB   5.43%               6.32kB / 0B         111kB / 0B          0
$ curl --unix-socket /var/run/docker.sock http://localhost/containers/mem_stats_test/stats?stream=false | python -m json.tool
...
"memory_stats": {
        "limit": 16504213504,
        "max_usage": 964055040,
        "stats": {
            "active_anon": 81920,
            "active_file": 126976,
            "cache": 896299008,                         <-- THIS IS OUR CACHE !!!
            "hierarchical_memory_limit": 9223372036854771712,
            "inactive_anon": 430080,
            "inactive_file": 896155648,
            "mapped_file": 0,
            "pgfault": 2031,
            "pgmajfault": 5,
            "pgpgin": 263395,
            "pgpgout": 44451,
            "rss": 495616,
            "rss_huge": 0,
            "total_active_anon": 81920,
            "total_active_file": 126976,
            "total_cache": 896299008,
            "total_inactive_anon": 430080,
            "total_inactive_file": 896155648,
            "total_mapped_file": 0,
            "total_pgfault": 2031,
            "total_pgmajfault": 5,
            "total_pgpgin": 263395,
            "total_pgpgout": 44451,
            "total_rss": 495616,
            "total_rss_huge": 0,
            "total_unevictable": 0,
            "total_writeback": 0,
            "unevictable": 0,
            "writeback": 0
        },
        "usage": 896794624                                  <-- THIS INCLUDES PAGE CACHE !!!!
...

Step 6. Conclusion

If you need more reliable statistics for your monitoring, just subtract "cache" from "usage", which in our case gives 896794624 - 896299008 = 495616 (roughly 480 KB instead of the reported ~900 MB).
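
(If jq is available, the same subtraction can be done in one line against the stats endpoint; a sketch that assumes curl 7.50+ with --unix-socket support, as above:)

curl -s --unix-socket /var/run/docker.sock http://localhost/containers/mem_stats_test/stats?stream=false \
  | jq '.memory_stats | {usage, cache: .stats.cache, usage_minus_cache: (.usage - .stats.cache)}'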

crosbymichael (Contributor) commented Apr 10, 2017

@Sergeant007

Do you want to open a PR with your changes to modify the output?

Sergeant007 commented Apr 11, 2017

@crosbymichael
I could try to do the same thing as was done initially in docker/libcontainer#506, but in the new codebase. I do think the solution suggested by @wonderflow in that pull request was correct. A documentation update will be needed as well.

A signal for me to start the work will be if you reopen this issue.

P.S. I'm not worried that changing existing columns in the "docker stats" output will break backward compatibility, because currently that metric (which includes caches) is practically useless, and the updated behavior will have exactly the same format, just with more accurate numbers.

crosbymichael (Contributor) commented Apr 12, 2017

@Sergeant007 after looking at that libcontainer PR, isn't the fix in the docker CLI to just subtract the cache from the usage value before displaying it to the user?

Sergeant007 commented Apr 14, 2017

@crosbymichael I took a closer look at the previous PR. Yes, you're right: the best solution would be to subtract the cache right before displaying it, without changing the MemoryStats behavior.

crosbymichael (Contributor) commented Apr 14, 2017

@Sergeant007 sounds good to me.

wonderflow (Contributor) commented Apr 17, 2017

So we come back around after a long way... 😿

Sergeant007 commented Apr 17, 2017

@crosbymichael, @wonderflow, let me prepare a PR for this. It will be sometime around the end of this week.

Sergeant007 commented Apr 22, 2017

@crosbymichael, @wonderflow, as agreed, created moby/moby#32777 with the fix - please take a look.

farcaller (Contributor) commented Apr 30, 2017

Does that mean that the page cache will count towards the memory limit, and the process might get killed because it ran out of disk cache (which, AFAIK, cannot be limited in the kernel)?

MadMub commented May 11, 2017

Ping on @farcaller's question; we have been chasing real and phantom memory leaks over the past month. Does the file I/O cache come into play in conjunction with the --memory limit? Or will the container always release the cache if the PID needs more memory?

Sergeant007 commented May 13, 2017

Hi @farcaller, @MadMub

My comments are below (however, I'm not a Docker representative and not an expert in system internals).

Does that mean that the page cache will count towards the memory limit...

I think not. Memory limits (including the killing of oversized apps) are the responsibility of cgroups, not Docker; Docker kills nothing. And most probably the cgroups developers did it correctly and do not count the disk cache.

we have been chasing real and phantom memory leaks over the past month... Or will the container always release the cache if the PID needs more memory?

Please share more information if you still have the problem. The container does not release the I/O cache; cached memory is released by the operating system. cgroups just attribute cached memory to the individual cgroups and keep some statistics about it, and Docker merely picks those statistics up and reports them (and, in the case of this particular issue, cached memory should not be counted in the "docker stats" report).

P.S. The fix is still hanging in the PR.

yadutaf added a commit to yadutaf/ctop that referenced this issue Jul 27, 2017

exclude cache from memory report as suggested in #47

Arguably, cache space is also reclaimable, meaning that, under pressure, it can be reclaimed instead of triggering an OOM. Whether to subtract it or not would be a debate in itself. I chose to do it here to follow the lead of Docker. Even if this is not the best decision (and I have no opinion on that), doing the opposite would introduce confusion, and confusion rarely does good.

See: moby/moby#10824

frol referenced this issue in bcicen/ctop, Jul 28, 2017 (closed): Memory usage is misleading #84

yadutaf added a commit to yadutaf/ctop that referenced this issue Jul 28, 2017: exclude cache from memory report as suggested in #47

frol referenced this issue in moncho/dry, Aug 1, 2017 (closed): Memory usage is misleading #51

maleadt referenced this issue in scanterog/munin-plugin-docker, Aug 10, 2017 (open): Substract cache from memory usage. #2

coolljt0725 (Contributor) commented Nov 29, 2017

@theonlydoo If I understand correctly, the memory usage in docker stats is read directly from the container's memory cgroup: you can see the value is the same as the 490270720 you read from cat /sys/fs/cgroup/memory/docker/665e99f8b760c0300f10d3d9b35b1a5e5fdcf1b7e4a0e27c1b6ff100981d9a69/memory.usage_in_bytes, and the limit is the memory cgroup limit set with -m when you create the container. The statistics for RES and for the memory cgroup are different: RES does not take caches into account, but the memory cgroup does. That's why MEM USAGE in docker stats is much higher than RES in top.

Hope this helps :)
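
(To illustrate where those numbers live, a sketch assuming cgroup v1 under /sys/fs/cgroup; <container> is whatever name or id you used:)

CID=$(docker inspect --format '{{.Id}}' <container>)
cat /sys/fs/cgroup/memory/docker/$CID/memory.usage_in_bytes   # the MEM USAGE figure
cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes   # the LIMIT figure (set with -m)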

theonlydoo commented Nov 30, 2017

@coolljt0725 so the caches increase as the app writes logs to the host's filesystem, and that triggers OOM reaping even though the app isn't using that memory and it's only caches.

thaJeztah (Member) commented Dec 3, 2017

OOM reaping is handled by the kernel though

axelabs commented Apr 6, 2018

@Sergeant007, your suggested container memory usage formula does not match the sum of process memory usage for me. When I compare the memory used from cgroups (minus cache) to the sum of process RSS, I get a not-so-small difference:

# RSS_USED=$(ps -eo rss --sort -rss --no-headers|awk '{RSS+=$1}END{print RSS*1024}')
# MEM_USED=$((`cat /sys/fs/cgroup/memory/memory.usage_in_bytes` - `awk '/^cache/{print $2}' /sys/fs/cgroup/memory/memory.stat`))
# echo $RSS_USED - $MEM_USED = $(($RSS_USED-$MEM_USED))
27811590144 - 27780145152 = 31444992

Does ps not reflect the actual memory usage of processes in the container the way some other tools do?

This is in a CentOS 7.3 container running on Amazon Linux 2016.09.

Sergeant007 commented Apr 7, 2018

@axelabs, I'm not sure ps is 100% accurate, but even the method I described above doesn't pretend to be fully correct. It just fixes the original behavior, which was really, really misleading.

So in your particular measurement the difference was about 31 MB out of 27 GB of consumed memory, which is roughly 0.1%. I'm not sure there is an easy way to achieve something better, and I'm not even sure which one (RSS_USED or MEM_USED) was correct.
