Out of memory error when running s3fs for a couple of days. #748

Closed
H6 opened this Issue Apr 14, 2018 · 11 comments


H6 commented Apr 14, 2018

Hi, I am experiencing an out-of-memory error after running s3fs for a couple of days. It seems to depend on the amount of data uploaded, but the disk is not full.
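One quick way to confirm this kind of growth (an illustrative sketch, not part of the original report) is to sample the resident set size of the s3fs process over time:

# Print the s3fs RSS in kB once a minute; a steadily climbing number
# while data is uploaded points to a process leak rather than a full disk.
while true; do
    ps -o rss= -p "$(pgrep -x s3fs)"
    sleep 60
done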

Version of s3fs being used (s3fs --version)

1.83

Version of fuse being used (pkg-config --modversion fuse)

2.9.4

System information (uname -r)

4.9.20-11.31.amzn1.x86_64

Distro (cat /etc/issue)

Amazon Linux AMI release 2017.03
Kernel \r on an \m

s3fs command line used (if applicable)

s3fs -o iam_role="${S3AccessRole}" -o url="https://s3-eu-central-1.amazonaws.com" -o endpoint=eu-central-1 -o dbglevel=info -o curldbg -o allow_other -o use_cache=/tmp ${DestinationBucket} /var/ftp/data-requests

Logs

login: [3262904.705096] yum invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
[3262904.715532] yum cpuset=/ mems_allowed=0
[3262904.718646] CPU: 0 PID: 13361 Comm: yum Tainted: G            E   4.9.20-11.31.amzn1.x86_64 #1
[3262904.721441] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[3262904.721441]  ffffc90002de7a48 ffffffff812fa12f ffffc90002de7be8 ffff88001f55d940
[3262904.721441]  ffffc90002de7ad8 ffffffff811f50bb 0000000000000000 0000000000000000
[3262904.721441]  ffffc90002de7b00 ffffc90002de7a80 ffffffff8112414a ffffc90002de7ae8
[3262904.721441] Call Trace:
[3262904.721441]  [<ffffffff812fa12f>] dump_stack+0x63/0x84
[3262904.721441]  [<ffffffff811f50bb>] dump_header+0x82/0x212
[3262904.721441]  [<ffffffff8112414a>] ? __delayacct_freepages_end+0x2a/0x30
[3262904.721441]  [<ffffffff8118ed9a>] ? do_try_to_free_pages+0x2da/0x340
[3262904.721441]  [<ffffffff8117af1c>] oom_kill_process+0x21c/0x3f0
[3262904.721441]  [<ffffffff8117b3b8>] out_of_memory+0x108/0x4b0
[3262904.721441]  [<ffffffff8117fd30>] __alloc_pages_slowpath+0x9a0/0xb90
[3262904.721441]  [<ffffffff81180103>] __alloc_pages_nodemask+0x1e3/0x250
[3262904.721441]  [<ffffffff811cdf48>] alloc_pages_current+0x88/0x120
[3262904.721441]  [<ffffffff81176284>] __page_cache_alloc+0xb4/0xc0
[3262904.721441]  [<ffffffff811794ef>] filemap_fault+0x27f/0x4f0
[3262904.721441]  [<ffffffffa0130a06>] ext4_filemap_fault+0x36/0x50 [ext4]
[3262904.721441]  [<ffffffff811a70aa>] __do_fault+0x6a/0xc0
[3262904.721441]  [<ffffffff811acafb>] handle_mm_fault+0xf6b/0x13a0
[3262904.721441]  [<ffffffff8102a649>] ? __switch_to+0x1f9/0x5d0
[3262904.721441]  [<ffffffff81060ff5>] __do_page_fault+0x225/0x4a0
[3262904.721441]  [<ffffffff81061292>] do_page_fault+0x22/0x30
[3262904.721441]  [<ffffffff81531c78>] page_fault+0x28/0x30
[3262904.817561] Mem-Info:
[3262904.819637] active_anon:115548 inactive_anon:12 isolated_anon:0
[3262904.819637]  active_file:21 inactive_file:24 isolated_file:0
[3262904.819637]  unevictable:0 dirty:0 writeback:0 unstable:0
[3262904.819637]  slab_reclaimable:2368 slab_unreclaimable:2817
[3262904.819637]  mapped:23 shmem:14 pagetables:941 bounce:0
[3262904.819637]  free:1140 free_pcp:114 free_cma:0
[3262904.842826] Node 0 active_anon:462192kB inactive_anon:48kB active_file:84kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:92kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 56kB writeback_tmp:0kB unstable:0kB pages_scanned:70 all_unreclaimable? no
[3262904.878220] Node 0 DMA free:1908kB min:88kB low:108kB high:128kB active_anon:12596kB inactive_anon:0kB active_file:8kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB slab_reclaimable:348kB slab_unreclaimable:260kB kernel_stack:0kB pagetables:56kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[3262904.899661] lowmem_reserve[]: 0 456 456 456
[3262904.903498] Node 0 DMA32 free:2652kB min:2688kB low:3360kB high:4032kB active_anon:449604kB inactive_anon:48kB active_file:80kB inactive_file:92kB unevictable:0kB writepending:0kB present:507904kB managed:485284kB mlocked:0kB slab_reclaimable:9124kB slab_unreclaimable:11008kB kernel_stack:1632kB pagetables:3708kB bounce:0kB free_pcp:456kB local_pcp:456kB free_cma:0kB
[3262904.926588] lowmem_reserve[]: 0 0 0 0
[3262904.930060] Node 0 DMA: 129*4kB (UME) 44*8kB (UE) 19*16kB (UME) 17*32kB (UME) 3*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1908kB
[3262904.941530] Node 0 DMA32: 157*4kB (UH) 1*8kB (H) 0*16kB 1*32kB (H) 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 2652kB
[3262904.952870] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[3262904.958920] 58 total pagecache pages
[3262904.961580] 0 pages in swap cache
[3262904.964495] Swap cache stats: add 0, delete 0, find 0/0
[3262904.968404] Free swap  = 0kB
[3262904.970837] Total swap = 0kB
[3262904.973286] 130973 pages RAM
[3262904.975687] 0 pages HighMem/MovableOnly
[3262904.978640] 5676 pages reserved
[3262904.981173] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[3262904.987484] [ 1526]     0  1526     2865      228      11       4        0         -1000 udevd
[3262904.993882] [ 1650]     0  1650     2864      212      10       3        0         -1000 udevd
[3262905.000317] [ 1849]     0  1849    27271       57      21       3        0             0 lvmetad
[3262905.006767] [ 1858]     0  1858     6785       48      17       3        0             0 lvmpolld
[3262905.013298] [ 2054]     0  2054     2340      125       9       4        0             0 dhclient
[3262905.019818] [ 2141]     0  2141     2340      120       8       3        0             0 dhclient
[3262905.026307] [ 2188]     0  2188    13238      107      26       3        0         -1000 auditd
[3262905.032466] [ 2209]     0  2209    61962      858      24       3        0             0 rsyslogd
[3262905.038569] [ 2231]     0  2231     1094       21       8       3        0             0 rngd
[3262905.044487] [ 2270]    29  2270     9968      202      25       3        0             0 rpc.statd
[3262905.051099] [ 2301]    81  2301     5448       60      16       3        0             0 dbus-daemon
[3262905.058013] [ 2336]     0  2336     1086       35       8       3        0             0 acpid
[3262905.064400] [ 2601]    32  2601     8831       92      22       3        0             0 rpcbind
[3262905.071039] [ 2657]     0  2657    20001      209      42       3        0         -1000 sshd
[3262905.077380] [ 2667]    38  2667     7326      143      20       3        0             0 ntpd
[3262905.083813] [ 2687]     0  2687    22262      429      44       3        0             0 sendmail
[3262905.090326] [ 2696]    51  2696    20127      371      41       3        0             0 sendmail
[3262905.096831] [ 2708]     0  2708    30402      150      16       3        0             0 crond
[3262905.103323] [ 2722]     0  2722     4787       41      13       3        0             0 atd
[3262905.109613] [ 4226]     0  4226    12676      136      26       3        0             0 vsftpd
[3262905.116056] [ 4232]     0  4232     2718       94      10       3        0         -1000 udevd
[3262905.122429] [ 4234]     0  4234   195931    93783     263       4        0             0 s3fs
[3262905.128189] [ 4282]     0  4282     1617       30       8       3        0             0 agetty
[3262905.134170] [ 4285]     0  4285     1080       25       8       3        0             0 mingetty
[3262905.140344] [ 4288]     0  4288     1080       25       8       3        0             0 mingetty
[3262905.147180] [ 4290]     0  4290     1080       24       7       3        0             0 mingetty
[3262905.153676] [ 4292]     0  4292     1080       25       8       3        0             0 mingetty
[3262905.160282] [ 4294]     0  4294     1080       24       8       3        0             0 mingetty
[3262905.166893] [ 4296]     0  4296     1080       25       8       3        0             0 mingetty
[3262905.173386] [13350]     0 13350     2902       59      10       3        0             0 update-motd
[3262905.180268] [13360]     0 13360     2902       47      11       3        0             0 70-available-up
[3262905.187749] [13361]     0 13361   103455    17602     161       4        0             0 yum
[3262905.194283] [13362]     0 13362     1858       19       9       3        0             0 grep
[3262905.200579] Out of memory: Kill process 4234 (s3fs) score 728 or sacrifice child
[3262905.206300] Killed process 4234 (s3fs) total-vm:783724kB, anon-rss:375132kB, file-rss:0kB, shmem-rss:0kB
/dev/fd/11: line 1: /sbin/plymouthd: No such file or directory
ggtakec (Member) commented May 2, 2018

@H6
Please accept my apologies for the late response.
I have confirmed that there is probably a memory problem in s3fs when it is built with NSS (libnss).
However, I am still checking all versions.

To investigate and solve this problem, please tell me the exact version of s3fs you are using (the output of running "s3fs --version").

Thanks in advance for your assistance.

H6 commented May 3, 2018

Hi @ggtakec, thanks for the reply. I already gave the version above: it's 1.83. Thanks.

ggtakec (Member) commented May 6, 2018

@H6 I'm looking for the memory leak using valgrind and similar tools, but I have not found a definite one yet.
As a precautionary measure, I merged # XXX, but that will not affect this problem.
It will still take time to solve this, and I will continue to investigate.
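For reference, a leak check along these lines might look roughly like the following (a sketch with placeholder bucket and mount point, not the maintainer's exact invocation):

# Run s3fs in the foreground (-f) under valgrind; leaks are reported
# when the process exits. mybucket and /mnt/s3 are placeholders.
valgrind --leak-check=full s3fs mybucket /mnt/s3 -f -o dbglevel=info
# In another shell, exercise the mount (copy and list files), then:
fusermount -u /mnt/s3
# valgrind prints its leak summary once s3fs terminates.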

ggtakec (Member) commented May 7, 2018

@H6 I have probably found the cause of the memory leak.
(I hope #340 has the same cause.)
I will fix and test the code and merge it, so please wait a few days.
Thanks in advance for your assistance.

ggtakec (Member) commented May 13, 2018

@H6 I'm sorry that I have not yet solved this issue.
I tried to correct the code that seems to cause the leak, but that correction alone is not enough, and it will take more time to investigate.
Please wait a while.

pankajbat commented May 23, 2018

We are having the same issue. Should we switch to an older version?

ggtakec (Member) commented May 27, 2018

@H6 I'm sorry for my late reply.

I checked, reproduced, and corrected the memory leak in v1.83, and merged PR #768 into the master branch.
If you can, it would be helpful if you built the latest code from the master branch and tried it.
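For those who want to try it, building s3fs-fuse from the master branch usually follows the standard autotools flow (a sketch; the required development packages such as fuse, libcurl, libxml2, and openssl or nss headers vary by distro):

# Clone and build the current master branch of s3fs-fuse.
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure
make
sudo make install
s3fs --version   # verify the freshly built binary is picked up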

If both s3fs and curl are linked against the NSS library, there remains a possibility that the memory used inside curl keeps increasing.
I have not confirmed this myself, but there are known cases where memory usage grows with curl+NSS.
To avoid this, set the environment variable NSS_SDB_USE_CACHE to YES, i.e. try "export NSS_SDB_USE_CACHE=YES".
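Note that NSS_SDB_USE_CACHE has to be visible in the environment of the s3fs process itself, so export it in the same shell or init script that performs the mount; for example (bucket and mount point are placeholders):

# Work around growing curl+NSS memory usage, then mount in the same shell
# so s3fs inherits the variable. mybucket and /mnt/s3 are placeholders.
export NSS_SDB_USE_CACHE=YES
s3fs mybucket /mnt/s3 -o use_cache=/tmp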

pankajbat commented Jun 17, 2018

silviomoreto commented Jun 20, 2018

Hi @ggtakec! Thanks for fixing that! When are you planning to create a release containing the fix?

ggtakec (Member) commented Jul 8, 2018

I'm sorry for my late reply.
I have just tagged and released v1.84, and new packages will be built from it soon.
Once the new version is packaged, please try it.
Thanks in advance for your assistance.

ggtakec closed this Jul 8, 2018

mattzuba commented Oct 8, 2018

@ggtakec - where should one specify the NSS_SDB_USE_CACHE environment variable? I'm encountering this memory leak issue on 1.84 without having this variable set.
