Composes sometimes hitting `error: fstatat(<checksum>.filez): Cannot allocate memory` or `error: openat(<checksum>.filez): Invalid argument` #594
```
error: fstatat(92/4682a3540a7d5f834f5159c024f412ff528600f8a683116ce99a1832251b67.filez): Cannot allocate memory
```
We're getting a weird OOM issue in RHCOS builds: openshift/os#594 which looks like: `fstatat(92/4682a3540a7d5f834f5159c024f412ff528600f8a683116ce99a1832251b67.filez): Cannot allocate memory` Reading some code here I noticed that the "open directory" call could follow symlinks when it shouldn't. Also explicitly only check for the size of regular files. I'm pretty sure this is a no-op, but let's harden the code.
We're getting a weird OOM issue in RHCOS builds: openshift/os#594 which looks like: `fstatat(92/4682a3540a7d5f834f5159c024f412ff528600f8a683116ce99a1832251b67.filez): Cannot allocate memory` Reading some code here I noticed that the "open directory" call could follow symlinks when it shouldn't. Let's just RIIR (rewrite it in Rust) and make this code more explicit - e.g. we explicitly ignore symlinks. I'm pretty sure this is a no-op, but let's harden the code.
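For illustration, here is a minimal C sketch of the hardening those commits describe (the actual rewrite was in Rust, and the helper names here are hypothetical, not the ostree code): open the objects directory with `O_NOFOLLOW | O_DIRECTORY` so a symlink is never traversed, and only read `st_size` for regular files.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: open an object subdirectory without ever
 * traversing a symlink sitting in its place. */
static int
open_objdir_nofollow (int repo_dfd, const char *subdir)
{
  /* O_NOFOLLOW | O_DIRECTORY: fail with ELOOP/ENOTDIR rather than
   * silently following a symlink to somewhere else. */
  return openat (repo_dfd, subdir,
                 O_RDONLY | O_CLOEXEC | O_NOFOLLOW | O_DIRECTORY);
}

/* Hypothetical helper: report an object's size only if it is a
 * regular file; symlinks and anything else are ignored. */
static bool
regular_file_size (int dfd, const char *name, off_t *out_size)
{
  struct stat st;

  /* AT_SYMLINK_NOFOLLOW: stat the entry itself, not its target. */
  if (fstatat (dfd, name, &st, AT_SYMLINK_NOFOLLOW) < 0)
    return false;

  *out_size = S_ISREG (st.st_mode) ? st.st_size : 0;
  return true;
}
```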
This ends up doing some nontrivial work in e.g. `db diff` and I want to see if an error trace is happening from there. xref openshift/os#594
Also add a missing `:` in the editor code. xref openshift/os#594
Edit: not related, see #594 (comment)
OK, to debug this I pushed openshift/release#20634
@miabbott I think you have an old cosa missing coreos/coreos-assembler#2293?
Probably... doing […]
Yup, got further this time. Sorry for the noise!
Slight progress: […] That helps. It's probably the […]. Will add more debugging.
When doing a build on Power with 4.9 cosa and latest redhat-coreos, I see: […]
Is it related?
Yes, definitely. Can you reproduce this reliably? Can you reproduce outside of supermin?
I can reproduce it reliably, but as of now only through running `cosa build`, so inside supermin. I will try to.
Can you try e.g. wedging […]?
After running […]: […]
I'll try to attach gdb.
Ooh. I bet this is a new bug in the Fedora kernel - something like a memory leak in 9p. That would match the symptoms. Does downgrading the kernel fix it?
I was so sure this was a kernel memory leak, but downgrading the kernel didn't fix it. That's odd. We bumped the supermin VM memory (on Power from 4096 to 8192) and that did work. So... looking over ostree changes, I wonder if somehow the repo locking changes are triggering a 9p bug. If downgrading ostree to v2021.2 fixes it, that would be a strong indication.
Another thing I find odd here: it certainly looks like […]
So as Colin suspected, this only happens for RHCOS and not FCOS. FCOS builds fine, and this might just be because RHCOS has more content than FCOS. In which case, maybe increasing memory for RHCOS is the only option?
This is a forward cherry-pick from #2699. The RHCOS 4.10 Power pipeline is hitting the second variant in openshift/os#594 (comment). Let's try to bump the memory to see if it helps.
We saw this at least twice in some rpm-ostree upstream CI runs, so it looks like this is also affecting x86_64 FCOS now. Edit for more info: the failure in this case is variant 1 of #594 (comment): […]
Yeah, I think there's some sort of further regression in 9p going on here.
Something has changed recently which causes us to hit the ENOMEM issue more easily now: openshift/os#594 (comment) Mid-term, we could rework the compose so that only the OCI archive is pulled through 9p rather than a full `pull-local`. Long-term, the fix is to stop using 9p. But for now to unblock CI, let's just bump the VM memory to 3G which should help.
We have reports of this happening in more places now. I can reproduce it locally with:

```diff
diff --git a/src/cmdlib.sh b/src/cmdlib.sh
index 2fbd40eaa..f9cd2911c 100755
--- a/src/cmdlib.sh
+++ b/src/cmdlib.sh
@@ -651,7 +651,7 @@ EOF
     # There seems to be some false positives in shellcheck
     # https://github.com/koalaman/shellcheck/issues/2217
-    memory_default=2048
+    memory_default=1024
     # shellcheck disable=2031
     case $arch in
     # Power 8 page faults with 2G of memory in rpm-ostree
```

I think short-term let's just bump the memory again: coreos/coreos-assembler#2940.
Remember BTW, for anyone hitting this: you can also bump the memory using the […].
Something has changed recently which causes us to hit the ENOMEM issue more easily now: openshift/os#594 (comment) Mid-term, we could rework the compose so that only the OCI archive is pulled through 9p rather than a full `pull-local`. Long-term, the fix is to stop using 9p. But for now to unblock CI, let's just bump the VM memory to 4G which should help.
Instead of having rpm-ostree effectively doing a `pull-local` under the hood over 9p, change things so we compose in a local repo, export the commit to an archive and only copy *that* over 9p. This should greatly help with pipelines hitting ENOMEM due to transferring many small files over 9p: openshift/os#594 An initial approach exported to OCI archive instead, but encapsulating and unencapsulating are more expensive operations. Unfortunately, we still need to unpack into `tmp/repo` given that many follow-up commands expect the commit to be available in `tmp/repo`. If we can drop that assumption (and get rid of `tmp/repo` entirely), we could refactor things further so that `compose.sh` creates the final chunked OCI artifact upfront. In local testing, this adds about 30s to `cosa build`. We still compress just once, and we get hardlinks pulling out of the tarball.
I believe coreos/coreos-assembler#2946 will fix this in the rpm-ostree compose path.
(cherry picked from commit 269aa38)
This should fix openshift/os#594 Basically I think 9p has a bug where under memory pressure, having it *free* an inode requires allocation, which can fail. This works around that bug by pulling the ostree-container archive instead, which is a single big file. Note that this code path does *not* change the semantics at all for the generated disk. The information about the pulled container is discarded and lost. Actually making use of the container bits natively is the `deploy-via-container` image option which is still experimental, but will be used when we progress the ostree native containers work. (cherry picked from commit 0bba897)
jlebon: Minor conflict resolution. When that commit was made, cosa was on f35, which matches the Fedora version this branch is on.
CI is failing right now on this very reliably.
Some sort of regression? Somehow related to the size of the content? I think we're missing some error prefixing in rpm-ostree here too.
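The prefix in the messages above (`fstatat(<path>): ...`) is exactly what that error prefixing buys. A minimal sketch in the libglnx style ostree uses, assuming libglnx's `glnx_throw_errno_prefix()` (the surrounding helper is hypothetical): the raw ENOMEM from the kernel gets the failing syscall and object path attached, which is what lets these traces be told apart at all.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <libglnx.h>

/* Hypothetical helper: without the prefix, the user would only see
 * "Cannot allocate memory"; with it, they see e.g.
 * "fstatat(92/4682...filez): Cannot allocate memory". */
static gboolean
query_object (int dfd, const char *path, struct stat *out_st, GError **error)
{
  if (fstatat (dfd, path, out_st, AT_SYMLINK_NOFOLLOW) < 0)
    /* Attaches "fstatat(<path>): " plus strerror(errno) to the GError
     * and returns FALSE. */
    return glnx_throw_errno_prefix (error, "fstatat(%s)", path);
  return TRUE;
}
```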