executor/linux: report large files when exiting with ENOSPC error #2127

Open

KumanekoSakura

Some reports terminated with an ENOSPC error. It is possible that something is failing to remove files created during testing.
To understand what is happening, start by reporting filesystem statistics and files that are larger than 1 MB and were modified within the last 24 hours.
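
For readers who want to try the idea outside the executor, here is a minimal sketch of such a scan. It is not the code from this pull request: the names is_recent_large_file and scan_large_files, the thresholds as macros, and the choice to stay on one filesystem are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

// Thresholds from the description: more than 1 MB on disk,
// modified within the last 24 hours.
#define LARGE_BYTES (1024LL * 1024)
#define RECENT_SECONDS (24 * 60 * 60)

// Returns 1 if st describes a regular file that occupies more than
// LARGE_BYTES on disk and was modified within RECENT_SECONDS of now.
int is_recent_large_file(const struct stat *st, time_t now)
{
	assert(st != NULL);
	if (!S_ISREG(st->st_mode))
		return 0;
	// st_blocks is always counted in 512-byte units.
	if ((long long)st->st_blocks * 512 <= LARGE_BYTES)
		return 0;
	return now - st->st_mtime <= RECENT_SECONDS;
}

// Recursively walks dir and returns the number of files reported.
static int scan_dir(const char *dir, dev_t dev, time_t now)
{
	int hits = 0;
	struct dirent *ent;
	DIR *d = opendir(dir);

	if (d == NULL)
		return 0; // unreadable directories are skipped
	while ((ent = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;

		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		if (snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name) >= (int)sizeof(path))
			continue; // path too long for this sketch
		// lstat() so that symlinks are never followed;
		// entries on other filesystems are ignored.
		if (lstat(path, &st) != 0 || st.st_dev != dev)
			continue;
		if (S_ISDIR(st.st_mode)) {
			hits += scan_dir(path, dev, now);
		} else if (is_recent_large_file(&st, now)) {
			fprintf(stderr, "%10lld %s\n", (long long)st.st_size, path);
			hits++;
		}
	}
	closedir(d);
	return hits;
}

// Scans the filesystem containing root.
// Returns the number of files reported, or -1 if root cannot be examined.
int scan_large_files(const char *root)
{
	struct stat st;

	assert(root != NULL);
	if (lstat(root, &st) != 0)
		return -1;
	fprintf(stderr, "Scanning recently written large files under %s\n", root);
	return scan_dir(root, st.st_dev, time(NULL));
}
```

Calling scan_large_files("/") prints one line per match to stderr. The size test uses st_blocks rather than st_size, so preallocated files whose apparent size is small are still caught.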

@codecov

codecov bot commented Sep 21, 2020

Codecov Report

Merging #2127 into master will not change coverage.
The diff coverage is n/a.

@dvyukov
Collaborator

dvyukov commented Sep 27, 2020

I've got one sample running this locally:

2020/09/27 10:46:51 executor 2 failed 11 times:
executor 2: exit status 67
failed to mkdir (errno 28)
Current directory is /syzkaller-testdir806427050/syzkaller.0T1z1w
  Free blocks: 4096/499668
  Block size: 4096
  Filesystem type: ef53
  Free inodes: 115710/131072
Scanning recently written large files under /
   1726896 /syz-executor.1
   1726896 /syz-executor.5
   1726896 /syz-executor.2
   1726896 /syz-executor.4
   1726896 /syz-executor.0
   1726896 /syz-executor
   1726896 /syz-executor.3
  22552576 /syz-fuzzer
   1069056 /var/log/kern.log
   1069056 /var/log/syslog
loop exited with status 67

The syz-executor and syz-fuzzer binaries are there intentionally. There seem to be no large files, yet the ext4 filesystem is somehow out of free blocks (I suspect that 4096 is some kind of magic number and that f_bavail is potentially 0).
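
To check the f_bavail suspicion, both counters can be printed side by side. The following is a rough sketch using statfs(2), not the reporting code from this pull request; print_fs_stats is a made-up name and the extra "Available" line is an addition to the output format shown above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/vfs.h>

// Prints the fields shown in the report above, plus f_bavail.
// Returns 0 on success and -1 if statfs() fails.
int print_fs_stats(const char *path)
{
	struct statfs sf;

	assert(path != NULL);
	if (statfs(path, &sf) != 0) {
		perror("statfs");
		return -1;
	}
	// f_bfree counts all free blocks in the filesystem.
	fprintf(stderr, "  Free blocks: %llu/%llu\n",
		(unsigned long long)sf.f_bfree, (unsigned long long)sf.f_blocks);
	// f_bavail counts only the free blocks available to unprivileged users.
	fprintf(stderr, "  Available blocks: %llu\n", (unsigned long long)sf.f_bavail);
	fprintf(stderr, "  Block size: %ld\n", (long)sf.f_bsize);
	// 0xef53 is the magic number shared by ext2, ext3 and ext4.
	fprintf(stderr, "  Filesystem type: %lx\n", (unsigned long)sf.f_type);
	fprintf(stderr, "  Free inodes: %llu/%llu\n",
		(unsigned long long)sf.f_ffree, (unsigned long long)sf.f_files);
	return 0;
}
```

If f_bfree stays at 4096 while f_bavail is 0, the remaining blocks are reserved and ordinary writes would fail with ENOSPC even though the filesystem does not look completely full.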

@dvyukov
Collaborator

dvyukov commented Sep 27, 2020

Looking at the "lost connection" crashes on the dashboard, it seems that ENOSPC happens with both sandbox=none and sandbox=namespace. So somehow it escapes the namespace sandbox...

What may give a definitive answer is a snapshot of the disk image. Then we could examine it with fsck and just look around.
This should be doable using the QMP protocol support from #2094:
6de752f

We could add a snapshot bool argument to the vm Diagnose method. Then, if the output contains 'errno 28', pass snapshot=true. The qemu implementation would then save a disk snapshot into some temp file and print the name of the file.

I hope the snapshot_blkdev command will do what we need, but I am not sure.
https://wiki.qemu.org/Features/Snapshots
https://github.com/qemu/qemu/blob/master/hmp-commands.hx#L1181
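
As a side note on the 'errno 28' trigger: 28 is the value of ENOSPC on Linux, so the check can be written against the symbolic name. The sketch below is only an illustration in C; needs_disk_snapshot is a hypothetical helper, and the real decision would live in the Go code around Diagnose.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Returns 1 if the captured executor output reports ENOSPC, else 0.
// needs_disk_snapshot is a hypothetical name, not a syzkaller function.
int needs_disk_snapshot(const char *output)
{
	char needle[32];

	assert(output != NULL);
	// The executor prints the raw errno value, e.g. "(errno 28)".
	snprintf(needle, sizeof(needle), "(errno %d)", ENOSPC);
	return strstr(output, needle) != NULL;
}
```

Matching the closing parenthesis as well avoids false positives on longer numbers that merely start with 28.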

There are reports which terminated due to ENOSPC error, but it seems that all
temporary files are deleted after the test. Let's check from /proc/*/fd/* side
in case somebody is using large temporary files created by open(O_TMPFILE).
@KumanekoSakura
Author

Thank you for testing. Hmm, at least it seems that the filesystem is not shrinking.

Since Linux has O_TMPFILE, there might be large files that have no name.
If that is the cause, removing temporary files alone is not sufficient.
We also need to wait for the processes that hold such nameless large files open to terminate.
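
A rough sketch of what looking from the /proc side could mean, assuming the check is "regular file, link count 0, many blocks allocated". This is not the code in the commit; count_large_unlinked_fds and make_nameless_file are names invented here, and the demo helper uses mkstemp() plus unlink() instead of O_TMPFILE only to stay portable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

// Reports regular files in fd_dir (e.g. "/proc/1234/fd") that have no
// name any more (st_nlink == 0) but occupy at least min_bytes on disk.
// Returns the number of such files, or -1 if fd_dir cannot be opened.
int count_large_unlinked_fds(const char *fd_dir, long long min_bytes)
{
	struct dirent *ent;
	int hits = 0;
	DIR *d;

	assert(fd_dir != NULL);
	d = opendir(fd_dir);
	if (d == NULL)
		return -1;
	while ((ent = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;

		if (ent->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s", fd_dir, ent->d_name);
		// stat() follows the /proc link to the open file itself,
		// so it works even when the file has no name.
		if (stat(path, &st) != 0)
			continue;
		if (!S_ISREG(st.st_mode) || st.st_nlink != 0)
			continue;
		if ((long long)st.st_blocks * 512 < min_bytes)
			continue;
		fprintf(stderr, "%10lld %s\n", (long long)st.st_size, path);
		hits++;
	}
	closedir(d);
	return hits;
}

// Demo helper: creates a nameless file holding the given number of bytes
// and returns its descriptor, or -1 on failure.
int make_nameless_file(size_t bytes)
{
	char tmpl[] = "/tmp/nameless-XXXXXX";
	static const char chunk[4096];
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return -1;
	unlink(tmpl); // the data stays allocated while fd is open
	for (size_t done = 0; done < bytes; done += sizeof(chunk)) {
		if (write(fd, chunk, sizeof(chunk)) != (ssize_t)sizeof(chunk)) {
			close(fd);
			return -1;
		}
	}
	return fd;
}
```

A full scan would call count_large_unlinked_fds() once per PID directory under /proc. The space is only released when the last descriptor is closed, which is why waiting for the owning processes matters.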

Before waiting for the snapshot implementation, could you retry with 9c2e8a7?

By the way,

fprintf(stderr, "Scanning large files from /proc/*/fd/*\n");

triggered

Don't use /* */ block comments. Use // line comments instead

failure, which I worked around using

fprintf(stderr, "Scanning large files from /proc/\052/fd/\052\n");

Since I'm running "make presubmit" on a VM with 4 CPUs / 4 GB RAM (which of course
can't complete due to OOM), I would appreciate it if source-level checks were performed
as early as possible. Maybe a make target could be added that runs only minimal checks,
for low-resource environments?
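
For anyone wondering why the workaround above is safe: \052 is the octal escape for '*' (ASCII 42), so the compiled string is unchanged while the source no longer contains the character pair that the comment check looks for. A small sketch, with scan_banner as a made-up name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Compile-time check that octal 052 really is the asterisk.
static_assert('\052' == '*', "octal 052 is the asterisk");

// Returns the banner written with octal escapes, as in the workaround.
const char *scan_banner(void)
{
	// An octal escape takes at most three digits, so the '/' that
	// follows each \052 is an ordinary character.
	return "Scanning large files from /proc/\052/fd/\052\n";
}
```

Printing scan_banner() with fputs() gives the same bytes as the original format string.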

@dvyukov
Collaborator

dvyukov commented Sep 29, 2020

For the record, here are 3 more samples (they don't seem to happen all that often, so I'd better save them here).
In all cases we have:

  Free blocks: 4096/499668

There is no significant number of large files. However, this is interesting:

   1572864 /selinux/file0

This means that the fuzzer escapes the sandbox and creates files in /selinux.

2020/09/29 01:58:20 executor 1 failed 11 times:
executor 1: exit status 67
failed to mkdir (errno 28)
Current directory is /syzkaller-testdir163703323/syzkaller.J4Bkc5
  Free blocks: 4096/499668
  Block size: 4096
  Filesystem type: ef53
  Free inodes: 115686/131072
Scanning recently written large files under /
   1726896 /syz-executor.1
   1726896 /syz-executor.5
   1726896 /syz-executor.2
   1726896 /syz-executor.4
   1726896 /syz-executor.0
   1726896 /syz-executor
   1726896 /syz-executor.3
  22552576 /syz-fuzzer
   1114160 /selinux/cpu.stat
   1572864 /selinux/file0
         0 /selinux/bus
loop exited with status 67

2020/09/29 01:59:08 executor 4 failed 11 times:
executor 4: exit status 67
failed to mkdir (errno 28)
Current directory is /syzkaller-testdir409885286/syzkaller.JQB1rD
  Free blocks: 4096/499668
  Block size: 4096
  Filesystem type: ef53
  Free inodes: 115680/131072
Scanning recently written large files under /
   1726896 /syz-executor.1
   1726896 /syz-executor.5
   1726896 /syz-executor.2
   1726896 /syz-executor.4
   1726896 /syz-executor.0
   1726896 /syz-executor
   1726896 /syz-executor.3
  22552576 /syz-fuzzer
         0 /selinux/bus
loop exited with status 67

2020/09/29 01:59:35 executor 0 failed 11 times:
executor 0: exit status 67
failed to mkdir (errno 28)
Current directory is /syzkaller-testdir322056815/syzkaller.ALulZk
  Free blocks: 4096/499668
  Block size: 4096
  Filesystem type: ef53
  Free inodes: 115697/131072
Scanning recently written large files under /
   1726896 /syz-executor.1
   1726896 /syz-executor.5
   1726896 /syz-executor.2
   1726896 /syz-executor.4
   1726896 /syz-executor.0
   1726896 /syz-executor
   1726896 /syz-executor.3
  22552576 /syz-fuzzer
   1703936 /selinux/cgroup.controllers
         0 /selinux/bus
loop exited with status 67

@KumanekoSakura
Author

Indeed, it suggests that umount("/selinux/") and chdir("/selinux/") are issued.
Also,

0 /selinux/bus

is interesting as well. Since only files occupying more than "2048 blocks" * "512 bytes/block" (1 MB) on disk are reported,
a regular file with apparent size == 0 means that fallocate(FALLOC_FL_KEEP_SIZE) allocated the space for that file.
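
The effect is easy to reproduce. Below is a minimal sketch, assuming a filesystem that supports fallocate(); preallocate_keep_size is a hypothetical helper and the path passed to it is up to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates path, preallocates len bytes without changing the file size,
// and stores the resulting apparent size in *apparent.
// Returns the number of bytes allocated on disk, or -1 if any step fails
// (for example, if the filesystem does not support fallocate()).
long long preallocate_keep_size(const char *path, off_t len, off_t *apparent)
{
	struct stat st;
	long long allocated = -1;
	int fd;

	assert(path != NULL && apparent != NULL);
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	// FALLOC_FL_KEEP_SIZE reserves blocks but leaves st_size untouched.
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0 && fstat(fd, &st) == 0) {
		*apparent = st.st_size;
		// st_blocks is counted in 512-byte units.
		allocated = (long long)st.st_blocks * 512;
	}
	close(fd);
	unlink(path); // the sketch cleans up after itself
	return allocated;
}
```

When the call succeeds, the apparent size stays 0 while the allocated byte count reflects the reservation, which matches the "0 /selinux/bus" line: ls -l would show an empty file, but du would not.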

@KumanekoSakura
Author

How is /proc/ set up for the fuzzer processes?
Unless /proc/ is unmounted and then mounted again, it can still expose PID directories from outside the PID namespace.
If /proc/1/ refers to the global init process, /proc/1/root/ can be used to access files outside the mount namespace.

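#define _GNU_SOURCE
#include <unistd.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <sched.h>

int main(int argc, char *argv[])
{
	// Enter new mount and PID namespaces and make all mounts private.
	if (unshare(CLONE_NEWNS | CLONE_NEWPID) ||
	    mount(NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL))
		return 1;
	// The first child created after unshare(CLONE_NEWPID) is PID 1 in the new namespace.
	if (fork() == 0) {
		// When run with any argument, replace the inherited /proc with a fresh mount.
		if (argc != 1 &&
		    (umount2("/proc/", MNT_DETACH) || mount("/proc/", "/proc/", "proc", 0, NULL)))
			_exit(1);
		// Without the remount, /proc/1 still refers to the global init process.
		execl("/bin/ls", "/bin/ls", "-al", "/proc/1/exe", NULL);
		_exit(1);
	}
	wait(NULL);
	return 0;
}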
