Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBADF during static delta apply #421

Closed
cgwalters opened this issue Aug 1, 2016 · 11 comments
Closed

EBADF during static delta apply #421

cgwalters opened this issue Aug 1, 2016 · 11 comments
Labels

Comments

@cgwalters
Copy link
Member

https://mail.gnome.org/archives/ostree-list/2016-July/msg00020.html

openat(3, "refs/remotes", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 15
newfstatat(15, "qt-os", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(15, "qt-os", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 19
close(15)                               = 0
newfstatat(19, "linux", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(19, "linux", O_WRONLY|O_DIRECTORY|O_CLOEXEC|0x400000) = 15
fallocate(15, 0, 0, 65)                 = -1 EOPNOTSUPP (Operation not supported)
fstat(15, {st_mode=S_IFREG, st_size=0, ...}) = 0
fstatfs(15, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=237798, f_bfree=44086, f_bavail=31798, f_files=61440, f_ffree=37789, f_fsid={297229479, -412021672}, f_namelen=255, f_frsize=4096}) = 0
pwrite(15, "\0", 1, 64)                 = 1
write(15, "6962c73eeebeedccb7fe05cdc57d5309"..., 65) = 65
newfstatat(19, "linux/qt", 0x7ffe6ea4b040, AT_SYMLINK_NOFOLLOW) = -1 EBADF (Bad file descriptor)
close(15)                               = 0
close(19)                               = 0
openat(8, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 15
fstat(15, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl(15, F_GETFL)                      = 0x18800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY)
fcntl(15, F_SETFD, FD_CLOEXEC)          = 0
getdents(15, /* 8 entries */, 32768)    = 400
newfstatat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-YE7LLY-lock", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-YE7LLY-lock-lock", O_RDWR|O_CREAT|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0600) = 19
fcntl(19, 0x25 /* F_??? */, 0x7ffe6ea4b140) = 0
fstat(19, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlinkat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-YE7LLY-lock-lock", 0) = 0
close(19)                               = 0
newfstatat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-9ZLGLY-lock", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-9ZLGLY-lock-lock", O_RDWR|O_CREAT|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0600) = 19
fcntl(19, 0x25 /* F_??? */, 0x7ffe6ea4b140) = 0
fstat(19, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlinkat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-9ZLGLY-lock-lock", 0) = 0
close(19)                               = 0
newfstatat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-AGWYLY-lock", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-AGWYLY-lock-lock", O_RDWR|O_CREAT|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0600) = 19
fcntl(19, 0x25 /* F_??? */, 0x7ffe6ea4b140) = 0
fstat(19, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlinkat(15, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-AGWYLY-lock-lock", 0) = 0
close(19)                               = 0
newfstatat(15, "fetcher-341DLY-lock", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(15, "fetcher-341DLY-lock-lock", O_RDWR|O_CREAT|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0600) = 19
fcntl(19, 0x25 /* F_??? */, 0x7ffe6ea4b140) = 0
fstat(19, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlinkat(15, "fetcher-341DLY-lock-lock", 0) = 0
close(19)                               = 0
newfstatat(15, "cache", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(15, "fetcher-341DLY", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(15, "fetcher-341DLY-lock", O_RDWR|O_CREAT|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0600) = 19
fcntl(19, 0x25 /* F_??? */, 0x7ffe6ea4b140) = -1 EAGAIN (Resource temporarily unavailable)
close(19)                               = 0
getdents(15, /* 0 entries */, 32768)    = 0
close(15)                               = 0
close(14)                               = 0
unlinkat(8, "staging-c1e8de4e-d754-45db-9216-18b36ecf1451-YE7LLY-lock", 0) = 0
close(16)                               = 0
write(13, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x65e460, FUTEX_WAKE, 2147483647) = 0
futex(0x65e450, FUTEX_WAKE, 1)          = 1
close(13)                               = 0
close(17)                               = 0
unlinkat(15, "fetcher-341DLY-lock", 0)  = -1 EBADF (Bad file descriptor)
close(18)                               = 0
write(1, "\n", 1
)                       = 1
close(3)                                = 0
close(8)                                = 0
close(9)                                = 0
close(7)                                = 0
unlinkat(5, "ostree/lock", 0)           = 0
close(6)                                = 0
close(5)                                = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33[31m\33[1merror: \33[22m\33[0mBad fil"..., 45error: Bad file descriptor
) = 45
exit_group(1)                           = ?
+++ exited with 1 +++
@cgwalters cgwalters added the bug label Aug 1, 2016
@gatispaeglis
Copy link
Contributor

gatispaeglis commented Aug 2, 2016

I should have mentioned that this is also reproducible when not using static deltas. Still investigating, I will provide more info if I will find something. Compiler is GCC 4.9.2, Kernel version is 3.19.5, file system: ext3. And the stack trace in this case is:

1 write_checksum_file_at ostree-repo-refs.c 96 0x7ffff7b9a757
2 _ostree_repo_write_ref ostree-repo-refs.c 848 0x7ffff7b9acde
3 _ostree_repo_update_refs ostree-repo-refs.c 899 0x7ffff7b9af7d
4 ostree_repo_commit_transaction ostree-repo-commit.c 1453 0x7ffff7b8f71a
5 ostree_repo_pull_with_options ostree-repo-pull.c 2794 0x7ffff7b97a4f
6 ostree_repo_pull_one_dir ostree-repo.c 3782 0x7ffff7b8947c
7 glnx_local_obj_unref glnx-local-alloc.h 63 0x7ffff7ba7a95
8 ostree_sysroot_upgrader_pull_one_dir ostree-sysroot-upgrader.c 530 0x7ffff7ba7a95
9 ??

Which results in calling glnx_file_replace_contents_with_perms_at::posix_fallocate which fails and sets errno as we can see in the strace:

fallocate(15, 0, 0, 65) = -1 EOPNOTSUPP (Operation not supported)

@cgwalters
Copy link
Member Author

The fallocate call looks like it did an expected fallback:

From glibc/sysdeps/unix/sysv/linux/posix_fallocate64.c:

  if (INTERNAL_SYSCALL_ERRNO (res, err) != EOPNOTSUPP)
    return INTERNAL_SYSCALL_ERRNO (res, err);
  return internal_fallocate64 (fd, offset, len);

So the failing syscall is this:

newfstatat(19, "linux/qt", 0x7ffe6ea4b040, AT_SYMLINK_NOFOLLOW) = -1 EBADF (Bad file descriptor)

Yet just a few lines before we successfully did:

openat(19, "linux", O_WRONLY|O_DIRECTORY|O_CLOEXEC|0x400000) = 15

I.e. we used O_TMPFILE sucessfully on fd 19. And further it's only after that that our:

close(19)                               = 0

succeeds. AFAICS what we're doing in userspace here is fine - something seems wrong on the kernel side. Does disabling the O_TMPFILE code in glnx-dirfd.c:glnx_open_tmpfile_linkable_at help?

@gatispaeglis
Copy link
Contributor

Does disabling the O_TMPFILE code in glnx-dirfd.c:glnx_open_tmpfile_linkable_at help?

Amazingly that helped! Unfortunately I have no idea what is a proper fix for this. If there is something else that you want me to test/verify, I can help with that.

@cgwalters
Copy link
Member Author

It's not the only fallout from O_TMPFILE; see also https://bugzilla.gnome.org/show_bug.cgi?id=769453
As far as I'm aware not many userspace programs have started to pick it up yet; it's still a relatively obscure kernel feature, so it's not surprising there are bugs around.

It'd be useful to run this by your kernel maintainers or upstream project. In the meantime, we can provide an option to disable O_TMPFILE.

@gatispaeglis
Copy link
Contributor

Out of curiosity I tried this on reiserfs (on the same kernel version). This file system produces a more meaningful error message:
error: Creating temp file: Bad file descriptor

In the meantime, we can provide an option to disable O_TMPFILE.

Sounds good to me, otherwise now OSTree is not usable on older kernels.

@cgwalters
Copy link
Member Author

I assume your base filesystem is ext4? FWIW I do get correct fallbacks in current CentOS 7 kernel:

# uname -a
Linux localhost.localdomain 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 23 17:05:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
 rpm -q kernel
kernel-3.10.0-327.22.2.el7.x86_64
# strace -e openat,open ostree --repo=/mnt/repo commit -b test --tree=dir=/var/roothome
...
openat(7, ".", O_WRONLY|O_DIRECTORY|O_CLOEXEC|0x400000) = -1 EISDIR (Is a directory)
openat(7, "./tmp.4V47LY", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0600) = 10

@gatispaeglis
Copy link
Contributor

I assume your base filesystem is ext4?

ext3 actually, if that makes a difference.

cgwalters added a commit to cgwalters/libglnx that referenced this issue Aug 3, 2016
Some systems have bugs with it, so let's allow downstreams to easily
disable it.

https://bugzilla.gnome.org/show_bug.cgi?id=769453
ostreedev/ostree#421
@cgwalters
Copy link
Member Author

Do you want to review GNOME/libglnx#22 ? Just a :+1: in the PR is good, it's unfortunately not Homu enabled.

@cgwalters
Copy link
Member Author

Also if you can test this: https://bugzilla.gnome.org/show_bug.cgi?id=769453#c11 that'd be useful.

@gatispaeglis
Copy link
Contributor

Testing with more recent version of kernel with ext3, issue is not present as expected.

sh-4.3# uname -a
Linux b2qt-intel-corei7-64 4.1.8-yocto-standard #1 SMP PREEMPT Thu Mar 17 08:16:06 EET 2016 x86_64 GNU/Linux

cgwalters added a commit to cgwalters/ostree that referenced this issue Aug 3, 2016
rh-atomic-bot pushed a commit that referenced this issue Aug 4, 2016
See: #421

Closes: #426
Approved by: gatispaeglis
@cgwalters
Copy link
Member Author

Closed by #426

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants