Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure parent node is always updated in rename(), regardless of fs implementation #21964

Conversation

SiobhyB
Copy link
Contributor

@SiobhyB SiobhyB commented May 20, 2024

Description:

This PR addresses an issue encountered in the WordPress Playground project where the parent node (old_node.parent) needs to be explicitly updated in the finally block to ensure consistent handling of directory renames across file systems.

Problem:

In our use case, we observed that the rename() fails when using NODEFS.

Solution:

old_node.parent = new_dir was missing from the NODEFS rename function, which was leading to failures in our project. As such, this has now been added to the finally block in library_fs.js to ensure the parent node is always correctly updated, regardless of fs implementation.

The old_node.parent = new_dir line has also been removed from the MEMFS and PROXYFS files to remove redundancy.

Add a fallback update to the parent directory reference in the finally block of the rename function to ensure consistency. This change addresses cases where the node_ops.rename update doesn't propagate across different file systems or mount points, ensuring that the parent directory reference is correctly updated.
@kripken
Copy link
Member

kripken commented May 20, 2024

Thanks for the PR!

Looking at related code, I see that the two backends that fully support rename(), MEMFS and PROXYFS, do this operation themselves:

old_node.parent = new_dir;

and

oldNode.parent = newDir;

Which backend are you using in your project? If it's one of those two then we may need to debug to find out why the existing line did not work. Or, if it's a custom one, then this PR may be the right way to go - better to have that logic in a shared place rather than in each backend - but then we'd probably want to document that and remove those lines from MEMFS and PROXYFS.

@SiobhyB
Copy link
Contributor Author

SiobhyB commented May 20, 2024

@kripken, thank you so much for taking a look at this!

Which backend are you using in your project? If it's one of those two then we may need to debug to find out why the existing line did not work.

We're using MEMFS and I can confirm an existing call to old_node.parent = new_dir; is here in our project. It works as expected in cases where only a file from a virtual file system is being renamed. However, there are problems when the rename needs to sync up with a local file system (the source file isn't deleted from the virtual file system).

Or, if it's a custom one, then this PR may be the right way to go - better to have that logic in a shared place rather than in each backend - but then we'd probably want to document that and remove those lines from MEMFS and PROXYFS.

Adding the call to the finally block has been the only way I've found to bypass the issue we've had, which makes me wonder whether having the logic there may be more reliable for others working across file systems or with custom solutions, too. Would removing the line from MEMFS and PROXYFS be okay as long as the call is there in the finally block? I'd be happy to update this PR if you think that seems like a sound idea. I'd also be keen to hear any other thoughts you might have on this problem. 🙇‍♀️

@kripken
Copy link
Member

kripken commented May 20, 2024

Since you are using MEMFS then I think the only difference is that doing it in the finally is more robust, that is, an exception was hit in the other case, and that line wasn't reached. Does that sound possible? (If so then I would need to think more on this, as I'm not sure offhand what the right behavior is here when an error happens in the middle.)

What do you mean by "virtual file system" as opposed to local? (All files in MEMFS are in a virtual file system in some sense, and we may call it that in the docs in some places, I'm not sure. But I think you might mean something else?)

@SiobhyB
Copy link
Contributor Author

SiobhyB commented May 21, 2024

@kripken, apologies, I realised when reading my last comment back that I got mixed up between virtual/local file systems. A colleague, @fluiddot, also pinged me to further clarify some things. We think we were able to get to the bottom of what's happening. Here's an updated explanation:

  • The MEMFS rename function is called in our code when clients only use virtual file systems (that is, the file is only in memory). Everything works as expected in these cases, as old_node.parent = new_dir; is called.
  • In some clients, those that require local files that exist on the user's computer, we also call the NODEFS rename function. (This results in a mix of calls to the MEMFS and NODEFS rename functions, which threw me off somewhat here and helps explain some of the oddities in the bug we're seeing.)

We were able to get things working on our side by adding old_node.parent = new_dir to the NODEFS function here:

rename: (oldNode, newDir, newName) => {
var oldPath = NODEFS.realPath(oldNode);
var newPath = PATH.join2(NODEFS.realPath(newDir), newName);
try {
fs.renameSync(oldPath, newPath);
} catch (e) {
if (!e.code) throw e;
throw new FS.ErrnoError(NODEFS.convertNodeCode(e));
}
oldNode.name = newName;
},

However, we're not sure if there's a specific reason old_node.parent = new_dir isn't already used there. Do you perhaps have any insight?

I see two ways we could move forward:

  1. Add old_node.parent = new_dir to the NODEFS function.
  2. Remove old_node.parent = new_dir from the MEMFS and PROXYFS files, then add it to the finally block, so that it's always covered.

Do either of those seem preferable to the other to you, or are there alternatives?

@kripken
Copy link
Member

kripken commented May 21, 2024

Thanks! Now I think I understand.

I think option 2 is the right one. We can do it in one place in the finally block because we have access to all the things we need there, and it is safer as it avoids the risk of forgetting it elsewhere.

In fact we can probably also do the same with .name, but let's leave that out of this PR and maybe consider it later.

Siobhan added 2 commits May 21, 2024 22:06
As we're now calling `old_node.parent = new_dir` from the `finally` block in library_fs.js, it's not necessary to duplicate that line in MEMFS or PROXYFS.
@SiobhyB SiobhyB changed the title Ensure parent directory is updated in rename(), even when working across file systems Ensure parent node is updated in rename(), regardless of fs implementation May 21, 2024
@SiobhyB SiobhyB changed the title Ensure parent node is updated in rename(), regardless of fs implementation Ensure parent node is always updated in rename(), regardless of fs implementation May 21, 2024
@SiobhyB
Copy link
Contributor Author

SiobhyB commented May 21, 2024

@kripken, thank you for sticking with me here. I've gone ahead to make those changes now.

@@ -815,6 +815,9 @@ FS.staticInit();` +
} catch (e) {
throw e;
} finally {
// update old node (called here to ensure consistency
// across fs implementations)
old_node.parent = new_dir;
Copy link
Member

@kripken kripken May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this again, I don't think we want this to run if an exception happens. Not running it in that case would match the previous behavior - the old places that set the parent were only reached if nothing was thrown. So let's move this to right after the finally. edit: unless that doesn't work in your case for some reason?

For the comment, how about update old node (we do this here to avoid each backend needing to) - ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this again, I don't think we want this to run if an exception happens. Not running it in that case would match the previous behavior - the old places that set the parent were only reached if nothing was thrown. So let's move this to right after the finally. edit: unless that doesn't work in your case for some reason?

Nice catch :D I agree that makes sense and everything still worked as expected for our use case, I updated in 631a8bf.

For the comment, how about update old node (we do this here to avoid each backend needing to) - ?

I think that update has a lot more clarity! I applied that change in b555a45.

Siobhan added 2 commits May 22, 2024 00:24
As per the feedback here, we don't want to reassign the value if an error is thrown: emscripten-core#21964 (comment)
@SiobhyB SiobhyB requested a review from kripken May 21, 2024 23:27
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks! As soon as tests pass we'll merge this in.

@kripken kripken enabled auto-merge (squash) May 21, 2024 23:33
@kripken
Copy link
Member

kripken commented May 22, 2024

Hmm, it looks like a test is failing here, so we must have missed something.

To run that test locally you can use ./test/runner test_rename (there are general test docs that might help).

Let me know if you get stuck on something (I know running tests in a project for the first time can be annoying).

auto-merge was automatically disabled May 22, 2024 10:41

Head branch was pushed to by a user without write access

With this change, we ensure the assignment doesn't happen if an error is thrown, while also ensuring it runs before the code in the finally block.
@SiobhyB
Copy link
Contributor Author

SiobhyB commented May 22, 2024

@kripken, my assumption from looking into this is that there are cases where the code in the finally block here needs to run after all the renaming functionality:

} finally {
// add the node back to the hash (in case node_ops.rename
// changed its name)
FS.hashAddNode(old_node);
}

In b3b8ffe, I've moved the old_node.parent = new_dir call inside the main try block. My thinking is that, with this change, old_node.parent = new_dir won't be called if an error is thrown but will continue to always be called if old_dir.node_ops.rename runs successfully. Does this seem okay? I can see the tests are passing following this change.

@SiobhyB SiobhyB requested a review from kripken May 22, 2024 12:15
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Yes, the current PR has the right order.

I missed that the FS.hashAddNode call can actually interact with that line.

@kripken kripken merged commit b01e901 into emscripten-core:main May 22, 2024
29 checks passed
@SiobhyB SiobhyB deleted the fix/ensure-rename-updates-across-file-systems branch May 23, 2024 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants