Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Readdir support #11 #12

Merged
merged 2 commits into from
Aug 7, 2022
Merged

Readdir support #11 #12

merged 2 commits into from
Aug 7, 2022

Conversation

rbtcollins
Copy link
Owner

No description provided.

@rbtcollins
Copy link
Owner Author

I won't be able to follow up on this for a bit, but it should be green, its not clear why its failing in github actions. @sourcefrog perhaps you'd like to play with the branch and see if its roughly what you need?

@rbtcollins
Copy link
Owner Author

( I mean, clearly its creating the files somewhere that the readdir cannot see them. But a parent dir? sibling? different layer of a filesystem stack?)

@sourcefrog
Copy link

Thanks!

@rbtcollins
Copy link
Owner Author

ok, so reproduced in a docker container, sigh. Will have to dig to see if I'm doing something strange, or its just that docker is fundamentally broken

[pid  4822] mkdir("/tmp/.tmpnKA1x0", 0777) = 0
[pid  4822] mkdir("/tmp/.tmpnKA1x0/parent", 0777) = 0
[pid  4822] openat(AT_FDCWD, "/tmp/.tmpnKA1x0/parent", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
[pid  4822] rename("/tmp/.tmpnKA1x0/parent", "/tmp/.tmpnKA1x0/renamed-parent") = 0
[pid  4822] fcntl(3, F_DUPFD_CLOEXEC, 0) = 4
[pid  4822] newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid  4822] fcntl(4, F_GETFL)           = 0x28000 (flags O_RDONLY|O_LARGEFILE|O_NOFOLLOW)
[pid  4822] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid  4822] getdents64(4, 0x7f2af0000dd0 /* 2 entries */, 32768) = 48
[pid  4822] getdents64(4, 0x7f2af0000dd0 /* 0 entries */, 32768) = 0
[pid  4822] close(4)                    = 0
[pid  4822] openat(3, "1", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0777) = 4
[pid  4822] close(4)                    = 0
[pid  4822] openat(3, "2", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0777) = 4
[pid  4822] close(4)                    = 0
[pid  4822] mkdirat(3, "child", 0777)   = 0
[pid  4822] openat(3, "child", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_DIRECTORY) = 4
[pid  4822] openat(4, "3", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0777) = 5
[pid  4822] close(5)                    = 0
[pid  4822] close(4)                    = 0
[pid  4822] fcntl(3, F_DUPFD_CLOEXEC, 0) = 4
[pid  4822] newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid  4822] fcntl(4, F_GETFL)           = 0x28000 (flags O_RDONLY|O_LARGEFILE|O_NOFOLLOW)
[pid  4822] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid  4822] getdents64(4, 0x7f2af0000dd0 /* 0 entries */, 32768) = 0
[pid  4822] close(4)                    = 0
[pid  4822] write(2, "thread '", 8thread ')     = 8
[pid  4822] write(2, "tests::readdir", 14tests::readdir) = 14
[pid  4822] write(2, "' panicked at '", 15' panicked at ') = 15

@rbtcollins
Copy link
Owner Author

Not materially affected by a different tmp dir.

[pid  4838] mkdir("/fs_at/t/.tmpiffEnk", 0777) = 0
[pid  4838] mkdir("/fs_at/t/.tmpiffEnk/parent", 0777) = 0
[pid  4838] openat(AT_FDCWD, "/fs_at/t/.tmpiffEnk/parent", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
[pid  4838] rename("/fs_at/t/.tmpiffEnk/parent", "/fs_at/t/.tmpiffEnk/renamed-parent") = 0
[pid  4838] fcntl(3, F_DUPFD_CLOEXEC, 0) = 4
[pid  4838] newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid  4838] fcntl(4, F_GETFL)           = 0x28000 (flags O_RDONLY|O_LARGEFILE|O_NOFOLLOW)
[pid  4838] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid  4838] getdents64(4, 0x7f2770000de0 /* 2 entries */, 32768) = 48
[pid  4838] getdents64(4, 0x7f2770000de0 /* 0 entries */, 32768) = 0
[pid  4838] close(4)                    = 0
[pid  4838] openat(3, "1", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0777) = 4
[pid  4838] close(4)                    = 0
[pid  4838] openat(3, "2", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0777) = 4
[pid  4838] close(4)                    = 0
[pid  4838] mkdirat(3, "child", 0777)   = 0
[pid  4838] openat(3, "child", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_DIRECTORY) = 4
[pid  4838] openat(4, "3", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0777) = 5
[pid  4838] close(5)                    = 0
[pid  4838] close(4)                    = 0
[pid  4838] fcntl(3, F_DUPFD_CLOEXEC, 0) = 4
[pid  4838] newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid  4838] fcntl(4, F_GETFL)           = 0x28000 (flags O_RDONLY|O_LARGEFILE|O_NOFOLLOW)
[pid  4838] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid  4838] getdents64(4, 0x7f2770000de0 /* 0 entries */, 32768) = 0
[pid  4838] close(4)                    = 0
[pid  4838] write(2, "thread '", 8thread ')     = 8
[pid  4838] write(2, "tests::readdir", 14tests::readdir) = 14
[pid  4838] write(2, "' panicked at '", 15' panicked at ') = 15

@sourcefrog
Copy link

Could it be that we need to seek back to the start before reading the directory?

@sourcefrog
Copy link

Yep, that seems to be it. Will push an updated branch.

@sourcefrog
Copy link

Something like sourcefrog@88faff1 works on Linux at least.

/// provide rich metadata may in future expose this through methods or extension
/// traits on DirEntry.
///
/// For now however, only the [`name()`] is exposed. This does not imply any

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about it being racy, but, does open_at give you a way to avoid the race consistently.

If there's an entry that's flipping between being a symlink, directory, file, and something else, and I want to readlink, readlink, read, or error out... I'd need a nofollow option to start with, and then I guess the application code needs to retry various options until something succeeds? Seems a bit messy?

(Probably not a reason to block addition of this with just the name to start with, just a thought.)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like sourcefrog@88faff1 works on Linux at least.

Thats ... fascinating. What bakes my brain here is that this is a new fd - where is the state coming from? (The previous iteration over the DIR* contents should be wiped out when fdopendir is called : https://docs.rs/nix/0.24.2/src/nix/dir.rs.html#57

And we never actually read from the original fd: the pattern is:

  • open dir (fd 3)
  • fcntl(3, F_DUPFD_CLOEXEC, 0) = 4
  • fdopendir(4) = DIR*
  • drop for Dir calls closedir correctly., which closes fd 4.

It works on Linux here outside docker. Doesn't work within docker. That means we don't require a kernel version change to trigger the situation.

Linux tr2vm 5.13.0-52-generic #59-Ubuntu SMP Wed Jun 15 20:17:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
libc6:amd64 2.34-0ubuntu3.2

The failing docker container contents:
ii libc6:amd64 2.35-0ubuntu3.1 amd64 GNU C Library: Shared libraries

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about it being racy, but, does open_at give you a way to avoid the race consistently.

If there's an entry that's flipping between being a symlink, directory, file, and something else, and I want to readlink, readlink, read, or error out... I'd need a nofollow option to start with, and then I guess the application code needs to retry various options until something succeeds? Seems a bit messy?

(Probably not a reason to block addition of this with just the name to start with, just a thought.)

NoFollow is absolutely needed yes. We also need an inode abstraction (to permit safeish open_at(child, '..')) to walk back up the tree. But then you don't need to retry, you just open, interrogate, and if a symlink follow the pointer from userspace, otherwise process the object you obtained from open_at.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like sourcefrog@88faff1 works on Linux at least.

Thats ... fascinating. What bakes my brain here is that this is a new fd - where is the state coming from? (The previous iteration over the DIR* contents should be wiped out when fdopendir is called : https://docs.rs/nix/0.24.2/src/nix/dir.rs.html#57

And we never actually read from the original fd: the pattern is:

  • open dir (fd 3)
  • fcntl(3, F_DUPFD_CLOEXEC, 0) = 4
  • fdopendir(4) = DIR*
  • drop for Dir calls closedir correctly., which closes fd 4.

It works on Linux here outside docker. Doesn't work within docker. That means we don't require a kernel version change to trigger the situation.

Linux tr2vm 5.13.0-52-generic #59-Ubuntu SMP Wed Jun 15 20:17:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux libc6:amd64 2.34-0ubuntu3.2

The failing docker container contents: ii libc6:amd64 2.35-0ubuntu3.1 amd64 GNU C Library: Shared libraries

Installed an impish container (same as host os that 'works') and it still fails - so this is looking very specific to docker.

Going to try a couple things :). Also I've realised nix has poor readdir hygiene, filing a bug there.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves a strong hint (from the NetBSD man page in particular) nix-rust/nix#1784

So I think seek might be right, though I'm still suspicious of underlying things here.

Copy link

@sourcefrog sourcefrog Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It failed for me first time, outside of Docker, on Pop_OS 22.04 Linux lift

5.18.10-76051810-generic #202207071639~1659108431~22.04~c9172fb SMP PREEMPT_DYNAMIC Fri J x86_64 x86_64 x86_64 GNU/Linux

I can't immediately cite chapter and verse but I'm not surprised that dup'd file descriptors share a seek position.

https://man7.org/linux/man-pages/man2/dup.2.html

   The dup() system call allocates a new file descriptor that refers to the same open file description as the descriptor oldfd. 

https://man7.org/linux/man-pages/man2/open.2.html

    A call to open() creates a new open file description, an entry in
   the system-wide table of open files.  The open file description
   records the file offset and the file status flags (see below).  A
   file descriptor is a reference to an open file description; this
   reference is unaffected if pathname is subsequently removed or
   modified to refer to a different file.  For further details on
   open file descriptions, see NOTES.

So my model was that you have a single open file object in the kernel with two fd numbers pointing to it. If you read from it by one, you have to seek it back in the other.

That does not explain why it would succeed in some cases though!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoFollow is absolutely needed yes. We also need an inode abstraction (to permit safeish open_at(child, '..')) to walk back up the tree. But then you don't need to retry, you just open, interrogate, and if a symlink follow the pointer from userspace, otherwise process the object you obtained from open_at.

But with O_NOFOLLOW you'll fail to open the file. You need to then retry with readlinkat.

I split out #13 on this too.

The other case I though of where statat will help beyond fstat is a file that you don't have permission to read.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the success in some cases is very confusing. We may need to add a note that the same file object is in use and thus the file location offset can be updated by readdir.

O_PATH permits fstat after opening, but the portability concerns are there :/.

The advice for programs that want to be secure and run on Darwin, from that stackexchange, is to tell folk that they need to grant read access to permit such software to work, or document its vulnerability to races on Darwin.

/// Read the children of the directory d.
///
/// See [`ReadDir`] and [`DirEntry`] for details.
pub fn read_dir(d: &mut File) -> Result<ReadDir> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat would have expected this would be a method on a specific DirHandle object, not a generic fs::File. Or, maybe it would be an extension trait of File... (I realize this is the style it's using now.)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so I can understand that.

There is a consistency question. The stdlib's OpenOptions struct is a programmable factory basically. I've chosen to put mkdir_at and open_at (and in future symlink_at and delete_at) on a similar struct, at least for now. That permits a foreign File like that implements AsDeref to be used I guess, which an extension trait wouldn't so much.

readdir doesn't seem to need OpenOptions which is why I made it a free function for now.

The type it works on is File because one doesn't know when a path is opened whether it is a directory or not. We could do a type conversion thing where we check the metadata, but std lib File's don't have free metadata last I checked, so we'd be adding a syscall hidden behind a type conversion, which feels strange to me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have a struct DirHandle(File) that allows construction from a File without checking the on-disk type, just letting later calls fail if it's not really a dir.

It might make client code feel more natural: dir_handle.readlink_at(name). Maybe this is imposing too much of an opinion on the client.

Of course on Linux (at least) it doesn't have to be a dir; you can have an O_PATH.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps. We can try a few things and see whats nice. e.g.

struct FileAndError { f: File, e: io::Error}
DirHandle::from(File) -> Result<DirHandle, FileAndError>

would certainly work. I'm just not sure its better yet :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants