Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize paths #101

Merged
merged 4 commits into from
Sep 6, 2016
Merged

Normalize paths #101

merged 4 commits into from
Sep 6, 2016

Conversation

darsain
Copy link
Contributor

@darsain darsain commented Aug 22, 2016

Changes:

Normalization: passing a path through path.normalize() and stripping any trailing separators.

  • file.cwd and file.base are now a get/set objects that normalize the path on set.
  • Constructor normalizes the whole file.history when passed.
  • file.path normalizes on set.
  • file.base now proxies to file.cwd when null, undefined, or equal to file.cwd. Read more in file.base changes below.

Test changes:

  • Added tests for everything above.
  • Some existing tests touched up to pass on windows.
  • Removed these tests as no longer relevant (read file.base changes below for more):
    • File inspect() should return correct format when Buffer and only path and no base
    • File relative get/set should error on get when no base

file.base changes

It was already being set to file.cwd in constructor so it was never falsy or undefined unless manually made so. This made it tedious when you wanted to change file.cwd as you had to keep the two in sync.

This PR makes file.base a proxy to file.cwd unless set to something else. It can be reset back to "proxy to file.cwd" behavior by passing null, undefined, or value same as file.cwd.

This change made the use of both file.cwd and file.base props easier, simplified file.inspect() and file.relative(), and removed the file.relative() error case when there was no file.base set.

return this._cwd;
},
set: function(cwd) {
if (typeof cwd !== 'string') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently allows an empty string for cwd; do we want to allow that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phated dont think so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will add a check for it.

@phated
Copy link
Member

phated commented Aug 23, 2016

Related: do you think we should expand the custom normalize method to use a regexp against Windows and Linux separators and do .replace(regexp, path.sep)? /cc @contra

@darsain
Copy link
Contributor Author

darsain commented Aug 23, 2016

@phated the path.normalize function already switches separators to current platform path.sep when appropriate.

For example:

// on windows
path.normalize('/test/foo'); // => \\test\\foo

The only kinda (since none codes like this) issue is the other way around:

// on linux
path.normalize('\\test\\foo'); // => \\test\\foo

Unfortunately nothing can be done about it, since \ is a valid filename character on POSIX compliant systems, so the path.posix.normalize() function correctly sees this path as a one long filename.

In it's current form, Vinyl can be used either as a POSIX compliant module if user is only targeting UNIX systems and needs to use '' in filenames, or a cross platform module by choosing not to use \ in filenames. If we force it to always / on Linux (which is what the gerexp would do) we would be removing this option.

But, on the other hand, if you are saving normalized paths to some database on windows, and than you move said database to linux, you are screwed :)

If you ask me, I'd love to just hard convert everything to / even on windows. It's not like it will break anything, windows is pretty much indifferent to separator type, and I'm not familiar with a reason against it that would make me abandon this idea. But that's not what is considered a "proper" way I guess :)

And just to note: the custom normalize function is there because path.normalize('') returns . which would cause issues.

@@ -135,7 +141,7 @@ File.prototype.inspect = function() {
var inspect = [];

// Use relative path if possible
var filePath = (this.base && this.path) ? this.relative : this.path;
var filePath = this.path ? this.relative : this.path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this.path at the end of the ternary? Wouldn't it be something falsey always? Could probably get away with hardcoding an empty string or using undefined/null instead. 1 less getter call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. I've just removed the now unnecessary this.base check without re-evaluating the ternary. That's how you arrive at ternaries like true ? foo : bar :) Gonna change it to null.

@phated
Copy link
Member

phated commented Aug 26, 2016

@darsain I'm fine with leaving the normalization up to node's path.normalize. I didn't actually realize it switched the / on Windows to \\. I don't want to take any options from *nix platforms.

I had a few follow-ups inline, but otherwise this is looking pretty good.

@darsain
Copy link
Contributor Author

darsain commented Aug 27, 2016

@phated yup, gonna make the changes.

What about your question whether to disallow empty paths? You fine with this reasoning against it?

- `file.stat` is an object
- `file.stat.isDirectory()` returns `true`

When marking file as a directory, do it by mocking the `fs.Stats` object, and passing it via `options.stat` property into the constructor. Vinyl needs to know the file is a directory from the get go, or it might result in an inconsistent path history (some entries ending in separator, some not).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like "When marking file as a directory". Maybe something like "When constructing a Vinyl object, pass in a valid fs.Stats object. Vinyl needs to know the file is a directory from the get go, or it might result in an inconsistent path history (some entries ending in separator, some not). If you are mocking the fs.Stats object, ensure it has the isDirectory method.", or something similar.

@phated
Copy link
Member

phated commented Aug 30, 2016

@darsain after running the vinyl-fs tests with your changes linked in, I'm encountering some issues:

  1. The isDirectory() method needs to be rewritten like https://github.com/gulpjs/vinyl/pull/98/files#diff-168726dbe96b3ce427e7fedce31bb0bcR60 in order to construct without a stats object.
  2. The directory symlink tests are failing because you can't create a symlink with a trailing slash. This leads me to believe we need to check isDirectory as well as isSymbolic (from https://github.com/gulpjs/vinyl/pull/98/files#diff-168726dbe96b3ce427e7fedce31bb0bcR72) to do proper normalization.

Should I get the symbolic changes merged and then you can rebase and tweak your implementation?

@darsain
Copy link
Contributor Author

darsain commented Aug 31, 2016

Yeah that would be best.

And can you clarify the simlink issue and solution? Will there have to be a getter that drops path.sep when isSimbolic() is true, or?

@phated
Copy link
Member

phated commented Aug 31, 2016

@darsain I believe the setters/getters that need to take isSymbolic into account are path, history, basename & stem. Basically anything with a path ending can't end in a path.sep if it is a directory and a symlink.

@phated
Copy link
Member

phated commented Aug 31, 2016

@darsain you can link vinyl into vinyl-fs master and run the tests. The failing tests related to this are in symlink stream and result in ENOENT failures.

@phated
Copy link
Member

phated commented Aug 31, 2016

I've merged my isSymbolic changes for you to rebase and utilize.

@darsain
Copy link
Contributor Author

darsain commented Sep 1, 2016

the setters/getters that need to take isSymbolic into account are path, history, basename & stem

Hm... currently, the stem doesn't have a separator appended when directory. Should it?

@darsain
Copy link
Contributor Author

darsain commented Sep 1, 2016

Rebased, and these props now don't end in separator when both isDirectory() and isSymbolic() are true:

  • path and by extension all items in history
  • basename
  • relative

Also added tests for this.

There is still the question of what to do with stem, which wasn't part of the "end with sep when directory" normalization before. Should it be?

@phated
Copy link
Member

phated commented Sep 1, 2016

My quick thought is that stem should end in a trailing separator if it's a directory. Any reason it shouldn't?

@darsain
Copy link
Contributor Author

darsain commented Sep 2, 2016

Well now that I think about it, what if you want to do something that was previously super simple, like append to the basename/stem? If all will end in separator, you have to first run it through something that strips slashes.

We made concatenation easier, but renaming harder. It's especially evident with basename, where if you want to append to it and it's a directory, you have to run basename through path.basename to get what you need:

file.basename = path.basename(file.basename) + '-appendinx'; // path.basename strips slashes

It feels silly, redundant, and will also break all gulp renaming plugins, and all code currently dependent on simple appending to basename/stem. What is currently simple will need a uniform slash stripping, or a special case for isDirectory() === true, and another one for isSymlink() === true.

In all coding environments, renaming/prepending/appending is straight forward, while you use something special to concatenate path levels (in node's case path.join), and we just made it the opposite. I think that all we'll accomplish by automatic trailing separator for directories is just switch path.join() everywhere for stripSlashes() everywhere, while breaking a lot of current code and diverging from standard way of doing this. Actually, path.join will still be common, read below.

Lets compare pros/cons of trailing separator for directories.

Pros:

  1. Makes these few concatenations easy:

    file.path = file.dirname + file.basename;
    file.path = file.base + file.relative;
    file2.path = file.path + 'filename'; // but will file.path end in a separator?
                                         // does it have a proper directory stats object?
                                         // let's just be sure and use path.join here anyway

Cons:

  1. Breaks this concatenation when dir has an extension:

    file.basename === file.stem + file.extname;
  2. Breaks appending to cwd, base, basename, path, and (soon?) stem.

  3. Makes getting slashless cwd, base, path, basename, and stem more complex since you need to strip.

  4. Breaks a lot of renaming plugins and current code.

All renaming and retrieving slashless path and parts of path will be especially cumbersome, since strip trailing slashes function is not part of any core lib, so you have to substitute with something like path.basename, or import/code your own every time you need it.

And it's not like we got rid of an unnecessary complexity here. path.join definitely adds a complexity, but it seems like it's the necessary kind. That's why its equivalents are part of every stdlib out there with no commonly used alternatives that would do it better.

Also, we didn't even got rid of it completely. You don't need it only in those few concatenation examples above. You'll still need it in tons of other common operations, such as adding a directory level:

file.path = path.join(file.dirname, 'new-level', file.basename);

So you'll still need to path.join often, but now on top of that you have to strip slashes when:

  • appending to basename/stem
  • you need a slashless basename or stem to
    • save it somewhere
    • use it in a template when generating static site
    • ...

The trailing separator for directories didn't feel quite right for me from the beginning, but while implementing it I just keep on stumbling into reasons why not to do it.

Besides, with automatic normalization, you can just concatenate with / and normalization will take care of it properly on all platforms:

file.path = file.dirname + '/' + file.basename;

There is also a value in having everything not end in a slash, since some tools/operations will complain when it does. We've already run into it even before merging this PR when vinyl-fs simlinking tests started failing.

So, are those few concatenation examples in the Pros list worth all the hassle and breakage that trailing separator is gonna introduce?

Overall, I now think that consistently never ending in a separator is way more practical and robust than doing the opposite.

@phated
Copy link
Member

phated commented Sep 2, 2016

@darsain I think your observations are correct and forcing the separator isn't the correct way to normalize paths. This problem I was trying to solve with this suggestion is at https://github.com/gulpjs/vinyl-fs/blob/master/lib/prepare-write.js#L47-L49

Is there a way we can solve that problem with normalization (such as always removing the trailing slash, etc)? I am open to altering the vinyl-fs tests but everything should be consistent.

An aside: do you think we should have .isDirectory() checks in the stem and ext getters/setters? I believe a dot in a directory is valid but path.basename(basename, ext) would split it, so maybe stem for a directory would return the same as basename and ext would return an empty string. Thoughts?

@phated
Copy link
Member

phated commented Sep 3, 2016

@darsain I thought node's path.normalize always stripped the trailing separator; is that not the case?

@darsain
Copy link
Contributor Author

darsain commented Sep 3, 2016

@phated nope. It just normalizes the path :) resolves dots and switches separators to the separator of the current platform. Stripping trailing separators needs to be done separately.

@@ -124,8 +135,14 @@ if (file.isBuffer()) {
}
```

### cwd
Ges and sets current working directory. Defaults to `process.cwd`. Will always be normalized and stripped off a trailing separator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Gets" typo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"stripped off" -> "remove"

@phated
Copy link
Member

phated commented Sep 6, 2016

Looks like my only problems are docs and test nits. I do want to further discuss the history normalization when no stats are passed but that can be tweaked separately.

@phated
Copy link
Member

phated commented Sep 6, 2016

Looks like vinyl-fs tests are passing except for tests that had trailing separators. I'll merge and do a cleanup pass on my nits.

@phated phated merged commit ece4abf into gulpjs:master Sep 6, 2016
phated pushed a commit that referenced this pull request Sep 6, 2016
@phated
Copy link
Member

phated commented Sep 6, 2016

@darsain Thanks for all the work on this! Really glad you found the issues with my trailing separator thinking. This will be released in the next major, after I land a few more changes.

@phated
Copy link
Member

phated commented Sep 7, 2016

Windows tests passing for the first time ever?!?! https://ci.appveyor.com/project/gulpjs/vinyl

phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
phated pushed a commit that referenced this pull request Sep 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants