[FS] Fix open() ignoring trailing slash on regular files#26773
[FS] Fix open() ignoring trailing slash on regular files#26773sbc100 wants to merge 1 commit intoemscripten-core:mainfrom
Conversation
|
I will followup with a wasmfs fix for this test. |
|
Actually wasmfs fix included now. |
f2ab330 to
da09207
Compare
91199a1 to
42a1ef7
Compare
tlively
left a comment
There was a problem hiding this comment.
I'll have to take another look, but I wonder if it's possible to check for directories in fewer places.
| // setting opts.follow = true will override this behavior. | ||
| if (FS.isLink(current.mode) && (!islast || opts.follow)) { | ||
| // setting opts.follow = true or having a trailing slash will override this behavior | ||
| // (POSIX requires that a trailing slash forces following of symbolic links). |
42a1ef7 to
1fcbccf
Compare
| struct ParentChild { | ||
| std::shared_ptr<Directory> parent; | ||
| std::string_view child; | ||
| bool hasTrailingSlash; | ||
| }; |
There was a problem hiding this comment.
Can we model a parsed path with a trailing slash as having the indicated directory as parent and having empty child rather than adding this extra bool? Then parseParent would already handle the is<Directory>() check and getChild would just return the parent when the child is empty.
There was a problem hiding this comment.
I think its tricky to get this right. Here is what AI said about your suggestion:
Thomas's suggestion to model paths with trailing slashes as (directory, empty_child) is architecturally cleaner in
principle, but it requires a system-wide refactoring of both the JS and WasmFS layers.
Why it's not a simple "cleanup":
-
Syscall Logic Impact:
- Currently, syscalls like unlink, mkdir, rename, and rmdir depend on having the name of the child within the parent.
If we change path/ to mean (directory, ""), then unlink("file/") would try to unlink the entry named "" inside the
directory file. This is not how these syscalls work; they expect the name of the file being removed. - On Linux, unlink("file/") fails with ENOTDIR if file is a regular file. Our current hasTrailingSlash boolean
correctly captures this "intent to refer to a directory" without losing the name of the entry we are operating on.
- Currently, syscalls like unlink, mkdir, rename, and rmdir depend on having the name of the child within the parent.
-
Parent Resolution:
- If we resolve path/ as (path_resolved_as_dir, ""), then parseParent itself must perform the resolution of the final
component. Currently, parseParent only resolves the directory part and returns the name of the final component.
Changing this would invert the responsibility of path resolution for every caller.
- If we resolve path/ as (path_resolved_as_dir, ""), then parseParent itself must perform the resolution of the final
-
WasmFS implementation:
- The getChild logic in WasmFS already handles recursion and link-following. If we pass an empty child, we'd need to
add special-case logic to every lockedParent.getChild(name) call to handle the empty string, which currently returns
nullptr (leading to ENOENT instead of the expected ENOTDIR or success).
- The getChild logic in WasmFS already handles recursion and link-following. If we pass an empty child, we'd need to
Conclusion:
Modeling this as a boolean hasTrailingSlash is the most surgical and robust way to handle this in Emscripten. It allows us
to:
- Keep the filename available for the syscalls that need it.
- Enforce POSIX rules (trailing slash forces directory resolution) at the exact point where resolution happens (lookupPath
or doParseFile). - Perform validation (trailing slash must be a directory) without changing the fundamental contract of what a "parent" and
"child" are in the path parsing logic.
| while (!path.empty() && path.back() == '/') { | ||
| path.remove_suffix(1); | ||
| hasTrailingSlash = true; |
There was a problem hiding this comment.
Another idea: what if we don't eagerly remove the trailing slash here. Then we would naturally check that the final element is a directory and follow links in the loop below. We could then detect that there is nothing after the slash and have a new exit condition where the final element (that we already checked is a directory) is still returned as the child.
It just seems really unfortunate to have 10 different locations where we have to branch on hasTrailingSlash.
This fixed both WasmFS and the old FS.
Fixes: #26710