tmpfs not properly freeing memory after rsync #133

Closed
iamacarpet opened this Issue Feb 20, 2019 · 5 comments

iamacarpet commented Feb 20, 2019

Hello,

I found an issue while rsyncing into a gVisor container on App Engine Standard. Testing locally against gVisor shows the same behavior.

With /tmp mounted as tmpfs inside the container, rsyncing the same files over and over makes memory usage climb without ever being freed (until the container is killed for using too much memory, as happens on App Engine).

To test this out, first create a 100MB file locally:
dd if=/dev/zero of=/tmp/test count=100 bs=1M

Check the free memory status of the container:

$ ssh -p 32769 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:           2048           5        1953           0          89        1953
Swap:             0           0           0

The container was created from the guide at [1] and started with the command at [2].

Then rsync it over to a gVisor container running an SSH server:
rsync -e 'ssh -p 32769' /tmp/test root@container-host:/tmp/test

$ ssh -p 32769 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:           2048           6        1852         100         189        1852
Swap:             0           0           0

And again...

rsync -e 'ssh -p 32769' /tmp/test root@container-host:/tmp/test

$ ssh -p 32769 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:           2048           5        1753         200         289        1753
Swap:             0           0           0

And again...

rsync -e 'ssh -p 32769' /tmp/test root@container-host:/tmp/test

$ ssh -p 32769 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:           2048           4        1654         300         389        1654
Swap:             0           0           0

If we do an ls -lh in /tmp:

-rw-r--r-- 1 root root 100M Feb 20 12:38 test

And if we rm /tmp/test, the memory usage still looks like this:

$ ssh -p 32769 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:           2048           6        1752         200         289        1752
Swap:             0           0           0

As shown, 200 MB is still listed under "shared" even though the file is gone; only the most recent 100 MB copy was released.
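
To compare runs without reading the whole table, the "shared" column can be pulled out of the free -m output. This is a minimal sketch, assuming the procps-ng column layout shown in this report (shared is the fifth field of the Mem: row); shared_mb is a hypothetical helper name, not part of any tool:

```bash
#!/bin/bash
# shared_mb: read `free -m` output on stdin and print the "shared" value (MB).
# Assumes the column order seen in this report:
#   total used free shared buff/cache available
# so "shared" is field 5 of the line starting with "Mem:".
shared_mb() {
    awk '$1 == "Mem:" { print $5 }'
}

# Possible usage against the container from this report:
#   ssh -p 32769 root@container-host free -m | shared_mb
```

Recording this value after each rsync and again after the rm makes the leak easy to spot: on a healthy system it should return to its starting value once the file is removed.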

The memory doesn't look like it is being held by any processes:

$ ssh -p 32769 root@container-host ps vax
  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    1 ?        Ss     0:00      0     0 73564 10488  0.5 /usr/sbin/sshd -D
    1 ?        Ss     0:00      0     0 73564 10488  0.5 /usr/sbin/sshd -D
  246 ?        Ss     0:00      0     0 98532 12616  0.6 sshd: root@notty
  255 ?        Rs     0:00      0     0 34028  6940  0.3 ps vax

Running the same test against a standard Docker container (not runsc/gVisor) behaves as expected and does not show this growth:

$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2441        3753         343       25942       28507
Swap:         51199           2       51197
$ rsync -e 'ssh -p 32770' /tmp/test root@container-host:/tmp/test
$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2443        3651         443       26042       28406
Swap:         51199           2       51197

$ rsync -e 'ssh -p 32770' /tmp/test root@container-host:/tmp/test
$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2443        3650         444       26043       28405
Swap:         51199           2       51197
$ rsync -e 'ssh -p 32770' /tmp/test root@container-host:/tmp/test
$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2443        3650         443       26043       28405
Swap:         51199           2       51197
$ rsync -e 'ssh -p 32770' /tmp/test root@container-host:/tmp/test
$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2444        3649         443       26044       28404
Swap:         51199           2       51197
$ rsync -e 'ssh -p 32770' /tmp/test root@container-host:/tmp/test
$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2444        3649         443       26044       28404
Swap:         51199           2       51197

$ ssh -p 32770 root@container-host ls -lh /tmp
total 100M
-rw-r--r-- 1 root root 100M Feb 20 13:07 test
$ ssh -p 32770 root@container-host rm /tmp/test
$ ssh -p 32770 root@container-host free -m
              total        used        free      shared  buff/cache   available
Mem:          32137        2442        3750         343       25944       28505
Swap:         51199           2       51197

Regards,
iamacarpet

[1] : https://docs.docker.com/engine/examples/running_ssh_service/
[2] : docker run -d -P --runtime=runsc -it --name tmptest --mount type=tmpfs,destination=/tmp ssh-test

iamacarpet commented Feb 20, 2019

Debug data is too big to upload here, so please find it at [1] (GCS, requester pays).

gVisor was installed with the latest nightly installer, following the instructions at [2].

[1] : https://storage.cloud.google.com/a1comms-debug-public/runsc/2019-02-20/runsc-final.tar.gz
[2] : https://github.com/google/gvisor#installation

@prattmic prattmic added the kind:bug label Feb 20, 2019

prattmic commented Feb 20, 2019

Thanks for the detailed report! I suspect we have a reference counting issue that is causing us to keep the file in memory.

I can reproduce the issue with this script:

#!/bin/bash

echo "before"
free -m

echo
echo "test1"
dd if=/dev/zero of=/tmp/test1 bs=1M count=100
free -m

echo
echo "test2"
dd if=/dev/zero of=/tmp/test2 bs=1M count=100
free -m

echo
echo "move test2 to test1"
mv /tmp/test2 /tmp/test1
free -m

echo
echo "remove test1"
rm /tmp/test1
free -m

echo
echo "remove test2"
rm /tmp/test2
free -m

Output before the fix:

before
              total        used        free      shared  buff/cache   available
Mem:           2048          14        2002           0          31        2002
Swap:             0           0           0

test1
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0819665 s, 1.3 GB/s
              total        used        free      shared  buff/cache   available
Mem:           2048           3        1913         100         131        1913
Swap:             0           0           0

test2
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0582143 s, 1.8 GB/s
              total        used        free      shared  buff/cache   available
Mem:           2048           3        1813         200         231        1813
Swap:             0           0           0

move test2 to test1
              total        used        free      shared  buff/cache   available
Mem:           2048           3        1813         200         231        1813
Swap:             0           0           0

remove test1
              total        used        free      shared  buff/cache   available
Mem:           2048           3        1913         100         131        1913
Swap:             0           0           0

remove test2
rm: cannot remove '/tmp/test2': No such file or directory
              total        used        free      shared  buff/cache   available
Mem:           2048           3        1913         100         131        1913
Swap:             0           0           0
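
In the output above, the "move test2 to test1" step is the one that fails to release memory. rsync, unless run with --inplace, writes incoming data to a temporary file and then renames it over the destination, so each rsync in the original report likely goes through the same rename-over-an-existing-file path. A rough sketch of that pattern as a loop follows; replace_by_rename and its parameters are illustrative names, not taken from the report:

```bash
#!/bin/bash
# replace_by_rename DIR ROUNDS SIZE_MB
# Overwrite DIR/test ROUNDS times by writing a temporary file and renaming it
# over the target, mirroring the "move test2 to test1" step of the script.
replace_by_rename() {
    local dir=$1 rounds=$2 size_mb=$3
    local i
    for ((i = 1; i <= rounds; i++)); do
        # Write the new contents next to the target first.
        dd if=/dev/zero of="$dir/test.tmp" bs=1M count="$size_mb" 2>/dev/null || return 1
        # Rename over the existing file; from the second round on this
        # replaces an existing inode, which is the case that leaked.
        mv -f "$dir/test.tmp" "$dir/test" || return 1
    done
}

# Possible usage inside the container:
#   replace_by_rename /tmp 3 100 && free -m
```

On an affected runsc build, one would expect "shared" to grow by roughly 100 MB for every round after the first, matching the rsync runs in the original report.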

prattmic commented Feb 20, 2019

https://gvisor-review.googlesource.com/c/gvisor/+/14901 should fix this issue.

After:

before
              total        used        free      shared  buff/cache   available
Mem:           2048          15        2001           0          31        2001
Swap:             0           0           0

test1
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0555625 s, 1.9 GB/s
              total        used        free      shared  buff/cache   available
Mem:           2048           2        1914         100         131        1914
Swap:             0           0           0

test2
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0570842 s, 1.8 GB/s
              total        used        free      shared  buff/cache   available
Mem:           2048           2        1814         200         231        1814
Swap:             0           0           0

move test2 to test1
              total        used        free      shared  buff/cache   available
Mem:           2048           2        1914         100         131        1914
Swap:             0           0           0

remove test1
              total        used        free      shared  buff/cache   available
Mem:           2048           2        2014           0          31        2014
Swap:             0           0           0

remove test2
rm: cannot remove '/tmp/test2': No such file or directory
              total        used        free      shared  buff/cache   available
Mem:           2048           2        2014           0          31        2014
Swap:             0           0           0

@prattmic prattmic self-assigned this Feb 20, 2019

iamacarpet commented Feb 21, 2019

@prattmic wow, an amazingly quick fix, thanks so much!

Any idea how long it'll take for that change to be deployed internally for App Engine?

@shentubot shentubot closed this in 8a499ae Mar 19, 2019

prattmic commented Mar 19, 2019

This issue ended up being more involved than I expected, but this commit should fix it.

I don't work directly on App Engine, so I can't comment on when it will be available there.

tonistiigi pushed a commit to tonistiigi/gvisor that referenced this issue Mar 19, 2019

Remove references to replaced child in Rename in ramfs/agentfs
In the case of a rename replacing an existing destination inode, ramfs
Rename failed to first remove the replaced inode. This caused:

1. A leak of a reference to the inode (making it live indefinitely).
2. For directories, a leak of the replaced directory's .. link to the
   parent. This would cause the parent's link count to incorrectly
   increase.

(2) is much simpler to test than (1), so that's what I've done.

agentfs has a similar bug with link count only, so the Dirent layer
informs the Inode if this is a replacing rename.

Fixes google#133

PiperOrigin-RevId: 239105698
Change-Id: I4450af2462d8ae3339def812287213d2cbeebde0
Upstream-commit: 8a499ae
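
Point (2) of the commit message can also be observed from a shell. The sketch below is not the test added in the commit. It is a user-level analogue that assumes GNU coreutils (stat -c, mv -T) and a filesystem that counts one link per subdirectory, as tmpfs and ext4 do. check_parent_links is a hypothetical name:

```bash
#!/bin/bash
# check_parent_links: print a directory's link count before and after a rename
# that replaces an existing empty subdirectory.
check_parent_links() {
    local parent
    parent=$(mktemp -d) || return 1
    mkdir "$parent/a" "$parent/b" || return 1
    stat -c %h "$parent"                        # link count with two subdirectories
    # -T makes mv rename b onto a instead of moving b inside a.
    mv -T "$parent/b" "$parent/a" || return 1
    stat -c %h "$parent"                        # link count with one subdirectory left
    rm -rf "$parent"
}

check_parent_links
```

With correct rename handling, the second number should be one lower than the first, because the replaced directory's ".." link to the parent is gone. A parent whose count stays unchanged after the rename would indicate the kind of link leak the commit describes.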