[meta] realpath issues in v6 #7726

Closed
saghul opened this Issue Jul 14, 2016 · 54 comments

Projects

None yet
@saghul
Member
saghul commented Jul 14, 2016 edited

Overview

There are a number of problems related to the realpath change in #3594. This issue aims to serve a single entry point to those, with the hope that we have a clear picture and are able to address them one way or another.

Please: Do not report new issues here, this is a collection of the currently existing ones.

  • [unix] symlink depth of > 32 gives ELOOP on some platforms (#7175)
  • [win] realpath doesn't work on ramdisks (#6861)
  • [win] realpath results are different when using substed drives (#7559)
  • [win] permissions problem on mapped paths (#7192)
  • [win] problem in Windows Docker container (#7044)
  • [win] Node can be inconsistent in what case builtin functions use for drive letters [#6624]
  • [win] Repro of Node loading modules twice on Windows [#6978]

Current state

Bad. Some of these issues have PRs addressing a specific issue, but none fix all of them.

Original reasoning

Following the issue trail I can only see performance as the argument for creating uv_fs_realpath and using it in node:

Way forward(?)

After scratching my head for a while I have no other suggestion than to revert back to the JS implementation. Since we changed the API to remove the cache argument, which was needed for performance, I suggest we keep the cache private, thus avoiding yet another API change.

I'll update the libuv documentation so nobody tries to bring this up again in Node v103.x, or at least not without validating it solves all the aforementioned problems.

Suggestions / alternatives are more than welcome, but at this point it's all or nothing, we either have a plan for fixing all the problems or we revert, IMHO.

@ronkorving
Contributor

Are the proposed fixes that are currently open workarounds or proper solutions? Also, could you link to them just like you did the issues so we can have a look into them?

@mscdex
Contributor
mscdex commented Jul 14, 2016

Why was this tagged ctc-agenda again?

@saghul
Member
saghul commented Jul 14, 2016

@mscdex I haven't gone through the last discussions the CTC had about this, but IIRC the plan was to keep the current implementation and fix it. I argue it cannot. Feel free to remove it if you feel otherwise.

@saghul
Member
saghul commented Jul 14, 2016

@ronkorving they are linked on the issues themselves AFAIK.

@trevnorris
Contributor

I have a fix for the linux ELOOP issue. It makes sense to me that we keep that implementation around. Performance is noticeably better (been taking advantage of this recently in an app that's getting the realpath of > 50,000 files as quickly as possible).

@saghul
Member
saghul commented Jul 14, 2016

@trevnorris Unfortunately, that's not enough. It's unacceptable IMHO that Windows people can't even run a hello world file if it's on one of those weird devices or network shares. I'm not opposed to some _realpath method somewhere, for those who want it though.

@saghul saghul referenced this issue Jul 14, 2016
Closed

win, fs: fix realpath behavior on substed drives #7559

3 of 3 tasks complete
@addaleax
Member

@saghul What would you think of implementing #7559 in more general way that would cover all of the Windows problems listed above? It is currently based on bailing out to the original JS implementation when encountering an error in the libuv-based realpath (currently, only upon certain kinds of errors, but it’s definitely written in a way that would make it easy to add more conditions).

That way forward has come up there and in #7175; it should work consistently and retain the performance gains in the most common (by far) cases.

@rvagg
Member
rvagg commented Jul 14, 2016

Thanks for this @saghul, it's helpful to have a roll-up like this and I'm very much inclined to view it in the same way as you at this stage. Without satisfactory resolution for Windows users we're stuck in a corner, regardless of progress papering over ELOOP.

I see two alternatives here:

  1. Revert entirely (I'd actually +1 the re-addition of cache, it shouldn't have been removed without a deprecation cycle in the first place, especially when it was replaced with an argument of the same type!)
  2. Revert for Windows and apply @trevnorris' fixes for everything else. Or something like that, perhaps as per #7559 as @addaleax says above.

What is the libuv stance on its new realpath implementation? Given that this issue is an admission that it's essentially broken (for now), is it going to stay around? Does it make sense for libuv to continue shipping something that is known broken for Windows?

@addaleax
Member

Given that this issue is an admission that it's essentially broken (for now), is it going to stay around?

Or maybe the other way around: Would you accept a port of whatever solution ultimately ends up in Node to libuv? It’s something I’d definitely be interested in if you think it makes sense.

@saghul
Member
saghul commented Jul 14, 2016

What is the libuv stance on its new realpath implementation? Given that this issue is an admission that it's essentially broken (for now), is it going to stay around? Does it make sense for libuv to continue shipping something that is known broken for Windows?

Given our versioning scheme, we can't remove it. Now, I also wouldn't consider it broken (on Unix): macOS and some BSDs have this systemwide 32 symlinks limit and nobody complains about it except the Node ecosystem, so one could argue Node was abusing this. And I would agree with that. This can be fixed, however, by taking the realpath(3) implementation in OpenSSH-portable (for example) and removing the limit. On Linux we could even copy the musl implementation, which is really simple and elegant.

But! Windows.

Or maybe the other way around: Would you accept a port of whatever solution ultimately ends up in Node to libuv? It’s something I’d definitely be interested in if you think it makes sense.

Certainly.

On Windows there is no realpath(3), so we are trying to emulate it, albeit without much success. It's possible that a way to solve all this is found later on, in which case we could go back to libuv's implementation, but know what to test for. That's partially the reason why I suggest we keep the current API, so swapping the implementation is possible. Also, those who know they are not affected by any of these shortcomings could use some "node-uvrealpath" addon and monkeypatch fs.realpath for their purposes.

@saghul
Member
saghul commented Jul 14, 2016

Revert entirely (I'd actually +1 the re-addition of cache, it shouldn't have been removed without a deprecation cycle in the first place, especially when it was replaced with an argument of the same type!)

The cache allowed for some very bizarre and unexpected path substitutions, IIRC that was the reason for its removal. If nobody misses the cache now, I argue we are ok without it. Since the JS version needs a cache, however, we will have to have one, but not as part of the API, IMHO.

@trevnorris
Contributor

@saghul I'm curious, the code to transverse the filesystem in the JS implementation is fairly straight forward. What if that same method was ported to C and used for the windows implementation in libuv?

@saghul
Member
saghul commented Jul 16, 2016

That's a possibility indeed. I guess the cache part would be a bit problematic. As in: have it or not? Store it where? etc.

@trevnorris
Contributor

@saghul I'd say for initial implementation we not have the cache (meaning accepting a cache from the user). From there we can do benchmarks and see how it affects performance. As for libuv generating a cache, isn't that technically a race condition? Cache coming from the user means they are aware how the paths will be resolved, but if a path changes in between two calls then internal cache could screw with us.

@saghul
Member
saghul commented Jul 19, 2016

@trevnorris Sure, no cache is fine. realpath(3) as implemented by glibc doesn't have one either, for example.

Now, if we are going to have a full-custom realpath implementation for Windows, we might as well do that on Unix, fixing the ELOOP problems and making the Linux version faster as byproducts.

At any rate, my intent with this issue is to reach a stable state ASAP, and I think a revert is still the best idea for now. We now know where we failed short, so back to the drawing board. If we do it as I suggested (no API changes, internal cache) swapping it out when we have a solid alternative would not imply any API changes, it would just be a performance boost.

One of the Windows issues is particularly hairy: #7559. Some people have used substed drives to make paths shorter and have some tools not choke on that. We'd have to agree on what we do there.

@trevnorris
Contributor

I wouldn't mind doing a full revert for the moment if the libuv fix and update in node isn't going take until v7. That delay is my only worry for the revert.

@saghul
Member
saghul commented Jul 20, 2016

I don't know when a libuv fix will be available, or if at all.

@rvagg
Member
rvagg commented Jul 20, 2016

Yes please, I'm still in favour of a full revert, cache arg included, before we head in to v6 LTS then we can revisit in master for v7.

@evanlucas
Member

I'm +1 on a full revert as well at this point.

@saghul
Member
saghul commented Jul 20, 2016

@rvagg Bringing back a public cache would mean we might need to remove it again. I have not seen any issue about that cache being removed. At the risk of repeating myself: if we keep the cache hidden we can change the guts without making API changes. Hopefully.

@bzoz
Contributor
bzoz commented Jul 20, 2016

+1 for full revert

BTW, do we really need/want to remove the cache? I would guess it is useful in some situations, and passing it as an argument is a nice way to give user control over it.

@saghul
Member
saghul commented Jul 20, 2016

@bzoz if we bring back the cache argument we'd change the API once again. I'd be interested in knowing on which non-contrived situations it's actually useful. It's removed now, and nobody complained about it AFAIK.

@bzoz
Contributor
bzoz commented Jul 20, 2016 edited

@saghul It seems it was added to increase module loader performance: 9bed5dc. @isaacs or @ry should probably know if we really need this.

@addaleax
Member

It seems it was added to increase module loader performance

I would say “yes”, the module loader is the (possibly only) place where a cache is necessary, and it would be needed in order to do a full revert; maybe we could reintroduce that as options.cache.

Although I think a full revert sounds okay to me, I’d still like to hear if there’s anything that would remain broken when taking the following path:

  • Use @bzoz#7559 for Windows (i.e. reintroduce the JS implementation) and just use that implementation if the uv realpath fails (for any reason).
  • Use @trevnorris#7548 to address the Unix ELOOP problem.
@addaleax
Member

Also, fwiw, I’ve taken up some work on porting the JS implementation to C with possible future inclusion in libuv in mind, but that’s quite a bit of work and will take a while.

@saghul
Member
saghul commented Jul 20, 2016

It seems it was added to increase module loader performance: 9bed5dc. @isaacs or @ry should probably know if we really need this.

@bzoz As I said earlier, there would still be a cache, but not in the public API. If would be hidden away.

I would say “yes”, the module loader is the (possibly only) place where a cache is necessary, and it would be needed in order to do a full revert; maybe we could reintroduce that as options.cache.

@addaleax Then how does it do it now without it? See above, there would still be a cache, somewhere.

Although I think a full revert sounds okay to me, I’d still like to hear if there’s anything that would remain broken when taking the following path:

IMHO that would be a mess to maintain. It more than doubles the bug surface area.

@saghul
Member
saghul commented Jul 20, 2016

Also, fwiw, I’ve taken up some work on porting the JS implementation to C with possible future inclusion in libuv in mind, but that’s quite a bit of work and will take a while.

Nice! Keep us posted!

@bzoz
Contributor
bzoz commented Jul 20, 2016

@saghul current uv implementation is much faster, so there is no need for caching. Anyhow we could move the caching from realpath to module loader. It could do path.resolve on module filenames and cache realpath of those.

@addaleax: one more case that I remember is #7192, but we could easily fix this (and others if they ever emerge) with little change to #7559 or probably with your C implementation

@addaleax
Member

Anyhow we could move the caching from realpath to module loader. It could do path.resolve on module filenames and cache realpath of those.

That won’t make a huge difference, I fear. The common case is that there are few or no symlinks involved, so caching the realpath of module filenames won’t affect performance much. The real work without a cache (I assume, haven’t measured it myself) would come from walking up the entire filesystem tree to each require()d file and calling lstat() on each part of the file path.

IMHO that would be a mess to maintain. It more than doubles the bug surface area.

That is a good point.

@MylesBorins
Member

I know for a fact that the cache was relied on in node-glob and potentially other modules as well.

@trevnorris
Contributor

@TheAlphaNerd The others are probably also written by Isaac as well. :P

We're talking a lot about how cache is necessary for performance, but is there a benchmark to show this? I expect that even if there's a revert the benchmarks in #7548 will be merged, but those don't cover the module case specifically. Let's generate some numbers then discuss whether it needs to be removed.

@saghul
Member
saghul commented Jul 20, 2016

Since it was mentioned on the CTC call, I followed the issue trail and updated this issue with the original reasoning for the change.

@addaleax
Member

@saghul If you want, you can take a look at addaleax@d14986c. It’s a somewhat direct translation of the (synchronous) JS realpath code without the cache to C, and the Node.js test suite passes locally for me when using it. It’s certainly far from perfect, and frankly I’m not sure how this would ideally be integrated into the rest of libuv.

@saghul
Member
saghul commented Jul 26, 2016

@addaleax Thanks! I took a quick peek and wow, that's a lot of work! Thanks for taking the time to do it. I'll take a look as soon as I can and see how we can tackle this.

@orangemocha
Member

Also caused by #3594:

  • [windows] On Windows, Node can be inconsistent in what case builtin functions use for drive letters [https://github.com/nodejs/node/issues/6624]
  • [windows] Repro of Node loading modules twice on Windows [https://github.com/nodejs/node/issues/6978]

😞

... because the uv_fs_realpath always normalizes the drive letter to uppercase, as detailed at #6978 (comment):

I tracked down why the keys currently always have an uppercase drive letter. It is because the full path is resolved based on the requiring module __filename, and that is calculated using fs.realPath. That in turn uses GetFinalPathNameByHandleW in libuv, which performs the normalization even if I pass in FILE_NAME_OPENED.

@jasnell
Member
jasnell commented Jul 26, 2016

I'm sorry for coming into this thread a bit late, just getting back from vacation and catching up... At this point I am inclined to revert the changes to realpath entirely and move the new impl to a new method... fs.lrealpath() for instance (short for libuv real path). We can then bring fs.realpath() through a proper deprecation cycle.

@saghul
Member
saghul commented Jul 26, 2016

@jasnell lrealpath is too similar to the stat -> lstat, chown -> lchown convention. At this point I'd not expose it. If someone tryly wants it they can write a simple addon exposing it.

@saghul
Member
saghul commented Jul 26, 2016

@orangemocha thanks, I updated the list at the top. What a mess :-S

@jasnell
Member
jasnell commented Jul 26, 2016

The key challenge with that is that there may be developers who have since
come to depend on the new impl. I'd much rather not drop it entirely.
Exposing both is not problematic if we have a clear deprecation path.

On Tuesday, July 26, 2016, Saúl Ibarra Corretgé notifications@github.com
wrote:

@jasnell https://github.com/jasnell lrealpath is too similar to the
stat -> lstat, chown -> lchown convention. At this point I'd not expose it.
If someone tryly wants it they can write a simple addon exposing it.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7726 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAa2eY4kdbzjxhiejF4r_jeOvD05635xks5qZj69gaJpZM4JMKMc
.

@MylesBorins
Member

@jasnell is there a clear way that we can signal to individuals all the problems associated with this implementation? Should we implement the fallback to the original implementation on failure?

@jasnell
Member
jasnell commented Jul 26, 2016

We document the differences and leave it at that. Fallbacks get messy.

On Tuesday, July 26, 2016, Myles Borins notifications@github.com wrote:

@jasnell https://github.com/jasnell is there a clear way that we can
signal to individuals all the problems associated with this implementation?
Should we implement the fallback to the original implementation on failure?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7726 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAa2eTNGjKpKyB8kOmP82FPWPg_q2cJtks5qZkFSgaJpZM4JMKMc
.

@saghul
Member
saghul commented Jul 26, 2016

The key challenge with that is that there may be developers who have since come to depend on the new impl. I'd much rather not drop it entirely.

There is nothing on the new implementation that can be depended upon. Unless being broken is such thing. I say this as someone who has followed this since the start (I reviewed the libuv PR) and came to the conslusion that we currently have no other reasonable choice.

Exposing both is not problematic if we have a clear deprecation path.

There is no clear path. Exposing something with the intention to deprecate it sounds like a waste of time IMHO.

@saghul
Member
saghul commented Jul 26, 2016

@TheAlphaNerd please let's not have 2 implementations each broken in a different way, it's just a recipe for dissaster.

I was hoping the last CTC call would have suggested some clear course of action, but it seems we are all as confused as we originally were.

There is no alternative to reverting which fixes all the problems without adding back the original implementation, it's the only viable way forward at this point. We can revisit later. Git archeology suggests the only reason for replacing the implementation was performance. If users could live with it until Node v6 they can certainly live with it a bit longer.

@orangemocha
Member

I agree that reverting is necessary. The new implementation simply has caused too many breakages, some of which are not just acceptable behavior changes but rather nasty bugs.

Fallbacks get messy, and do double the surface area for bugs and other behavioral differences. It doesn't sound like a desirable approach IMO.

On Windows, the list of bugs is so bad that we have no choice but reverting to the JS implementation. If you think that the Unix libuv implementation is in a better state, we could consider a Windows-specific revert to JS, and keep the libuv implementation for Unix. Note that this doesn't increase the bug surface area with respect to the current implementation, because the libuv implementation is different on Unix vs Windows. It does however leave room for inconsistencies between platforms, so my preference would be to do a full revert to JS on all platforms, and consider ways to improve performance later on.

@saghul
Member
saghul commented Jul 27, 2016

@orangemocha Agreed, I reached the same conclusion. When I mentioned the bug surface I meant from Node's perspective, considering what libuv internally does an implementation detail. At any rate, it's good to see we are on the same page.

@bzoz bzoz referenced this issue Jul 27, 2016
Closed

fs: restore javascript realpath implementation #7899

2 of 2 tasks complete
@orangemocha
Member

Here's a PR to revert: #7899

@jasnell
Member
jasnell commented Jul 27, 2016 edited

There is one additional consideration for reverting the realpath implementation in that the new implementation allows an encoding option to be passed to get the path back as a Buffer rather than a string. While it's unlikely that has extensive use by this point in time, it's still technically a major change to pull that back out.

@addaleax
Member

Btw, something that would probably be incredibly helpful to have after the revert are regression tests for all the issues we’ve been encountering… /cc @nodejs/platform-windows

@saghul
Member
saghul commented Jul 28, 2016

@addaleax Yes indeed! Alas, most of the Windows issues depend on "weird" system configurations, so the tests will need to make strong assumptions about the environment.

The ELOOP part can be easily tested, but that's a minor thing overall.

@saghul saghul removed the ctc-agenda label Aug 3, 2016
@piscisaureus
Member

Given that the ultimate purpose of realpath()ing module paths is to avoid loading the same module twice, I wonder if there is a completely different solution to this; what about instead of using the path as the cache key, use the [std_dev, st_ino] pair?

@addaleax
Member
addaleax commented Aug 3, 2016

Another purpose of realpath()ing is to allow npm link and similar features of package managers/directory structures; #5950 was an attempt to call realpath() only for the main module, and it turned out to be a pretty bad idea because of that.

@Jeff-Lewis

Thanks @saghul and everyone for addressing this.

@saghul
Member
saghul commented Aug 15, 2016

Time to close this one up. The revert made by @bzoz will be part of #8070, so Node 6.4.0. Thanks everyone who helped get this mess untangled.

@saghul saghul closed this Aug 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment