Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url.pathToFileURL should not dedup '/' in path #35507

Open
bmeck opened this issue Oct 5, 2020 · 19 comments
Open

url.pathToFileURL should not dedup '/' in path #35507

bmeck opened this issue Oct 5, 2020 · 19 comments
Labels
url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@bmeck
Copy link
Member

bmeck commented Oct 5, 2020

Found while checking: #35429

Currently it is not possible to have a path represent the file URL new URL('file:///a//b') when going through url.pathToFileURL. url.fileURLToPath does property preserve the // in the path already. This causes round trips to be lossy.

  • Version: 14
  • Platform: OSX
  • Subsystem: url

What steps will reproduce the bug?

console.log(url.pathToFileURL('/a//b').href);

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior?

file:///a//b

What do you see instead?

file:///a/b

Additional information

@bmeck bmeck added url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Oct 5, 2020
@jasnell
Copy link
Member

jasnell commented Oct 5, 2020

@bmeck ... is this a spec issue or an issue with our implementation?

@bmeck
Copy link
Member Author

bmeck commented Oct 5, 2020

@jasnell our helper function converting between the 2 (URL and path) seems broken

@guybedford
Copy link
Contributor

I'd argue this is by design. A path has deduplicated // separators. If anything I would argue that the file: URL spec is incorrect in not supporting // deduplication since there exists no pathing schemes on file systems that actually distinguish this.

@bmeck
Copy link
Member Author

bmeck commented Oct 5, 2020

@guybedford I think it could be argued in either direction for different reasons, but to me the lossy behavior being only on 1 end is both odd and a bit hard to explain. The real oddity to me here is that if a filesystem were to exist that did support empty filenames (S3 for example supports this) it would be odd to normalize in the runtime rather than leaving it to the OS. Is there a reason to impose a normalization step that might not be universally respected? I agree that this can cause distinct differences since that would cause issues like the one mentioned above with the new WHATWG change, but I think consistency is paramount personally. If we do want to normalize we should probably change url.fileURLToPath so that it remains consistent. I would heavily prefer to not normalize these as it is reaching into file system path interpretation which isn't something Node typically does except for windows UNC path limitation workarounds.

@guybedford
Copy link
Contributor

To be honest I think we've hit a real gap here between the spec and the reality, and siding with the spec only continues to ignore reality.

@guybedford
Copy link
Contributor

Specifically, I think we should add a new URL normalization function (fileURLNormalize or something) to any operation which deals with file URL resolution, that will dedupe repeated / characters, and possibly do percent encoding deduping in future as well.

I don't see any issue though with this function being lossy. Lossiness is a feature not a bug.

@guybedford
Copy link
Contributor

One could even define fileURLNormalize = pathToFileURL(fileURLToPath(url)) actually.

@bmeck
Copy link
Member Author

bmeck commented Oct 12, 2020

@guybedford given we are now moving to relying on realpath for our default loader, and talking of manual deduping for --preserve-symlinks would it be safe to move forward here? My main goal is that if an OS doesn't remove the // when realpathing that we can preserve that.

@guybedford
Copy link
Contributor

@bmeck there is more than just this deduping at play here - pathToFileURL can also take relative paths and will normalize them to the cwd. I actually don't know of any path operations in Node.js that don't do some path normalization or resolution, so it would seem this would be the rule in general. It might be a hard time changing the established behaviour, just like it would be for any other fs function.

@bmeck
Copy link
Member Author

bmeck commented Oct 13, 2020

@guybedford that seems like the internal usage of path.resolve won't support empty dir names (I agree changing that API is likely dangerous) then; using that internally is not technically required for this fn though. I don't think we should make the URL fileURLToPath normalize out the // so I'm more interested in changing pathToFileURL even if it means some extra work like having path.resolve delegate to an internal function that has a flag on if it should drop empty entries.

@guybedford
Copy link
Contributor

@bmeck the pathToFileURL won't normalize out Windows UNC strings though. But it is doing a path.normalize yes. Just about all path functions in Node.js do normalization, so I still don't quite follow the logic around why this one should be different here.

@bmeck
Copy link
Member Author

bmeck commented Oct 13, 2020

@guybedford my goal is to have these 2 complimentary functions do the same kind of normalization. I'm not against some normalization, but altering fileURLToPath to remove // is a lossy behavior so I don't want that.

@guybedford
Copy link
Contributor

@bmeck if the goal is consistency, would you consider the alternative of normalization on the fileURLToPath output?

@bmeck
Copy link
Member Author

bmeck commented Oct 13, 2020

@guybedford possibly; but, since // in pathnames is significant on flat file systems; we could add a note that these functions don't work for some file systems in the docs and that might be fine as well.

@guybedford
Copy link
Contributor

That sounds sensible to me.

@TimothyGu
Copy link
Member

This issue seems to characterize the bug as related to the spec, but this is not true. new URL('file:///a//b').pathname gives /a//b as expected. The real trouble is the fact that we use path.resolve to sanitize inputs to url.pathToFileURL, and path.resolve deletes empty path parts.

We can either stop using path.resolve (the URL constructor resolves .. and . segments anyway), or we can document the limitations of this API.

@guybedford
Copy link
Contributor

@TimothyGu how are you hitting these limitations currently? It would help to relate the issue to a use case.

@TimothyGu
Copy link
Member

I'm not hitting these limitations myself. I just noticed this open issue which seems to say that the WHATWG URL spec causes '/' to be collapsed, while it's really just path.resolve.

@guybedford
Copy link
Contributor

Agreed it is separate, I guess I would effectively still argue that the file URL protocol should already handle this (in contrast to non file URL protocols).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

No branches or pull requests

4 participants