Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] futimes only works for owned files with write access #128

Closed
wants to merge 17 commits into from

Conversation

piranna
Copy link
Contributor

@piranna piranna commented Nov 6, 2015

This fixes issue #127

@phated
Copy link
Member

phated commented Nov 9, 2015

@piranna I think we need to lstat the output file before opening and skip over that whole part if we aren't the owner. If we don't do the lstat, we modify the atime on the file (due to it being accessed with open). If we do that first, we can avoid swallowing the error upon EPERM.

@piranna
Copy link
Contributor Author

piranna commented Nov 9, 2015

Are you talking about doing the ownership check first instead of capturing
the exception?
El 09/11/2015 23:39, "Blaine Bublitz" notifications@github.com escribió:

@piranna https://github.com/piranna I think we need to lstat the output
file before opening and skip over that whole part if we aren't the owner.
If we don't do the lstat, we modify the atime on the file (due to it
being accessed with open). If we do that first, we can avoid swallowing
the error upon EPERM.


Reply to this email directly or view it on GitHub
#128 (comment).

@yocontra
Copy link
Member

yocontra commented Nov 9, 2015

Just a note: keep in mind that more fs calls = way slower. dest was only 1 fs call - now it seems to be ballooning to add new features. Performant and minimal should be the default, features should be enabled as needed.

@phated
Copy link
Member

phated commented Nov 9, 2015

src has a lot of calls to fs methods, so I don't think dest will be our bottleneck. However, maybe we need to consider making things like these extra methods on the vinyl-fs adapter. e.g. - vfs.utimes() that generates a stream of utime updates on the vinyl files passed through. /cc @contra

@piranna
Copy link
Contributor Author

piranna commented Nov 10, 2015

Long-term idea: what about make dest to know when we are trying to create a
new file and when to update it? Not also would be useful to diferenciate it
on this situations, but also would make it more secure by using 'wx' and
'r+' open modes by default instead of just 'w'...
El 10/11/2015 00:58, "Blaine Bublitz" notifications@github.com escribió:

src has a lot of calls to fs methods, so I don't think dest will be our
bottleneck. However, maybe we need to consider making things like these
extra methods on the vinyl-fs adapter. e.g. - vfs.utimes() that generates
a stream of utime updates on the vinyl files passed through. /cc @contra
https://github.com/contra


Reply to this email directly or view it on GitHub
#128 (comment).

@yocontra
Copy link
Member

@phated yeah, i'm more in favor of alternate functions (vs. cramming everything into dest)

@piranna
Copy link
Contributor Author

piranna commented Nov 10, 2015

I've updated the pull-request to check if we are the owners of the file instead of capturing the EPERM exception. The code maybe it's cleaner and could be used as basis to solve the problems with chown or chmod, but it's obvious it less performant...

@erikkemperman
Copy link
Member

@phated What would be the advantage of doing up to two filesystem calls (fs.lstat and fs.futimes) instead of just one with a catch?

@contra I understand the argument to keep things like futimes (and possibly chown and chmod) out of dest to keep the basic use case quick... But couldn't an argument also be made to keep the file on disk and the vinyl object as similar as possible, including metadata like times, mode and owner? I mean by default, not requiring extra vinyl-fs methods? I tried to make this case here.

@piranna
Copy link
Contributor Author

piranna commented Nov 12, 2015

Maybe it could be used a flag to set if instead throw an error as now happens, swallow it and update the vinyl object with the content from the filesystem, this way we don't break current functionality but also allow to give an oportunity to that methods that require higher priviledges.

@phated
Copy link
Member

phated commented Nov 16, 2015

@piranna I think this feature needs to be behind an option flag or in a separate method.

@piranna
Copy link
Contributor Author

piranna commented Nov 16, 2015

@piranna I think this feature needs to be behind an option flag or in a separate method.

Are you talking about the swallow of errors of priviledged methods like chmod or futimes and update of the vinyl object, isn't it? I agree on the swallow of the errors as a flag, but maybe the update of the vinyl object should have its own flag, isn't it? Do you want I implement it?

@phated
Copy link
Member

phated commented Nov 16, 2015

@piranna We shouldn't swallow any errors but as @contra said, we shouldn't be making these fs calls for every file. Either we make a method (vfs.touch() maybe?) or hide the fs.futimes calls behind an option.

@piranna
Copy link
Contributor Author

piranna commented Nov 16, 2015

I suppose that `vfs.touch()``would apply all this priviledged functionality at once (uid, gid, utimes, mode...), isn't it?

@phated
Copy link
Member

phated commented Nov 16, 2015

@piranna no, touch would be equivalent to https://en.wikipedia.org/wiki/Touch_(Unix) (doesn't modify uid, guid, modes)

@piranna
Copy link
Contributor Author

piranna commented Nov 16, 2015

Do you propose then to have one method for each of the entries? I would prefer a vinylfs.stat() method that apply all the stat values at once (uid, guid, mode, futimes, whatever) from the vinyl object to the dest file... If you are interested in one of them, probably you are in fact interested in all of them.

@phated
Copy link
Member

phated commented Nov 16, 2015

I think that could be beneficial but I don't like the idea of calling it vfs.stat() because the node API has a stat method that just reads them (no modification).

Also, having a singular method for those would make the implementation very complex.

I've always viewed the vinyl adapter spec as having 2 methods (src & dest), especially now that we removed watch. However, we also have added symlink to vinyl-fs but that method doesn't make sense for other transports (such as vinyl-http). Maybe symlink, touch, chmod, etc are outgrowing the base spec and could become an addon library. Thoughts? /cc @contra

@phated
Copy link
Member

phated commented Nov 16, 2015

^ the addon library could still be included with gulp 4

@piranna
Copy link
Contributor Author

piranna commented Nov 16, 2015

I don't like the idea of calling it vfs.stat()

I though first about applyStat(), but wanted to make it simpler, like dest()- The name is not (so) important, just what it does.

Also, having a singular method for those would make the implementation very complex.

Not really, they could be done easily and in a generic way with try-catch, for example.

@erikkemperman
Copy link
Member

@phated @piranna
Well, if we want to stick with Node fs nomenclature then we'd wind up with vfs.utimes, vfs.chmod and vfs.chown. Or maybe better, stick with what people know from the commandline and make the first one vfs.touch.

But because of the sub-second quirk we ran into earlier, these would actually be working with file descriptors and wind up calling fs.futimes, fs.fchmod and fs.fchown internally. I don't see the problem with extending the adapter beyond src and dest anyway, these extensions being fs specific after all, but note that if this were a separate module we'd have to open the file a second time.

I'd also like to see a vfs.stat for getting these attributes from disk into the vinyl objects, but I seem to be alone in that :-)

@piranna
Copy link
Contributor Author

piranna commented Nov 17, 2015

Well, if we want to stick with Node fs nomenclature then we'd wind up with vfs.utimes, vfs.chmod and vfs.chown

Fair for me.

but note that if this were a separate module we'd have to open the file a second time

Interesting... I think this is a good point about having that functionality integrated on dest(), but would need to be check first.

I'd also like to see a vfs.stat for getting these attributes from disk into the vinyl objects, but I seem to be alone in that :-)

Yeah, because it's already done by src() :-D

@erikkemperman
Copy link
Member

Yeah, because it's already done by src() :-D

Fair enough, I guess I had that coming -- of course I am again talking about the case where dest is not the last stage in a pipeline :-)

But maybe the consensus is actually that if these attributes like times etc matter especially then you should just make a distinct pipeline starting with another src.

@piranna
Copy link
Contributor Author

piranna commented Nov 17, 2015

But maybe the consensus is actually that if these attributes like times etc matter especially then you should just make a distinct pipeline starting with another src.

It makes sense, and it would be simpler...

@yocontra
Copy link
Member

Here are the viable options as I see them:

  • dest always syncs all attributes to the new file (huge performance loss)
  • dest syncs only when an option is true
  • dest becomes intelligent and knows which attributes need syncing, performing the minimum number of fs calls required

I don't think exposing new functions for different attributes is good API design on this one.

@erikkemperman
Copy link
Member

Maybe this is a completely inaccurate projection of long-ago C intuition, but once you have a file descriptor to an open file, which you can safely re-use within dest, subsequent fxxx calls should not be all that bad in terms of performance?

The hit would have to be pretty horrible for users to accept an awkward default behaviour of dest, it seems to me.

@yocontra
Copy link
Member

@erikkemperman Only way to know for sure is to make some benchmarks

@piranna
Copy link
Contributor Author

piranna commented Nov 17, 2015

dest syncs only when an option is true

+1

dest becomes intelligent and knows which attributes need syncing, performing the minimum number of fs calls required

+1000

@phated
Copy link
Member

phated commented Nov 19, 2015

I'm fine with all of @contra's proposals. However, changing this now that there was a release (due to being an added feature) is a breaking change, so we need it so we need it solved ASAP. This seems to be the last blocking thing on gulp 4

@erikkemperman
Copy link
Member

Looking at this yesterday I noticed that there is already a chmod in there. But that is done in writeContents/index.js, while the recent addition of (f)utimes is one level down in writeBuffer.js and writeStream.js.

So I think it would be cleaner to handle these attributes in one place, but passing file descriptors around is a bit messy because that doesn't work for writeDir.js.

I also noticed that the logic around chmod makes a point of doing stat first and checking if the mode is already correct. So up to two fs calls in stead of just writing it unconditionally.

I don't see it yet but assume there's a good reason for that. There's even a test to check that chmod doesn't get called if not necessary.

@phated
Copy link
Member

phated commented Jan 12, 2016

@piranna will you have a chance to rebase this soon?

@yocontra
Copy link
Member

Going to put a time limit of a few more days on these stat changes - if we aren't able to get the fixes merged soon I'm going to revert the broken changes that were merged earlier since it is holding up a release.

@phated
Copy link
Member

phated commented Jan 12, 2016

@contra while I agree that it is annoyingly holding up a release, I don't know if removing the functionality to push a release out quicker is the best idea. It would require bumping vinyl-fs a major version since we are removing functionality that was added in a minor and then I would only feel comfortable adding it back in a major since it caused all these troubles, which means it will never land in gulp 4. This feature surfaced a big problem with vinyl and vinyl-fs - the fact that they only deal with a tiny subset of file properties (path and contents) but it is advertised as an in-memory, streaming file system abstraction. If we aren't handling other metadata correctly and aren't able to update that metadata on the actual filesystem, people are going to reach for the fs module in their plugins, which goes against one of the core tenants.

@yocontra
Copy link
Member

@phated Was this ever released on npm in a way that people could use? If not I think we can revert this without causing any harm.

@phated
Copy link
Member

phated commented Jan 12, 2016

@contra yeah, unfortunately the utimes stuff was released in 2.2.0. 2.2.1 was a fix for trying to write invalid dates, 2.3.0 now has a fix for sub-second precision and these changes are to fix a bug where the utimes call fails if the process user isn't the file owner. We also need to finish gulpjs/vinyl#72 (which @erikkemperman started on) because we actually read the stats of every file and attach them while looking for symlinks (we didn't do that before 2.x)

We found all these issues because people are using it with the gulp 4.0 branch and have been reporting the problems.

@piranna
Copy link
Contributor Author

piranna commented Jan 12, 2016

@piranna will you have a chance to rebase this soon?

Sorry for the delay, I have been busy lately and the rebase is giving me a lot of problems to make the tests to pass.

This feature surfaced a big problem with vinyl and vinyl-fs - the fact that they only deal with a tiny subset of file properties (path and contents) but it is advertised as an in-memory, streaming file system abstraction. If we aren't handling other metadata correctly and aren't able to update that metadata on the actual filesystem, people are going to reach for the fs module in their plugins, which goes against one of the core tenants.

It seems the module fstream from the npm developers is getting some traction (I didn't knew of it until some weeks ago and now I'm seeing it's a dependency of almost all the projects I'm using lately...) and has support for streaming directories and symlinks, maybe we could get some inspiration from it...

@piranna
Copy link
Contributor Author

piranna commented Jan 13, 2016

I can't be able to fully fix the problems that has surfaced after the rebase, seems it's getting some problems due to the usage of the file descriptors... Don't know if the library to manage them would help, but the more I think about it, the more I believe we've gone too far optimizing by using them instead of using plain old path based operations just to have sub-second precission... Does it matter so much?

Also, other problems are related with symlinks and the fact they are not used as first class citizens here... :-/

@Klowner
Copy link
Contributor

Klowner commented Jan 13, 2016

Sorry I'm a little late to the show, but what issue are the symlinks posing? I might be of some assistance.

@erikkemperman
Copy link
Member

I attempted the rebase of this stuff as well, and couldn't figure it out either. Didn't spend a lot of time on it though, but my feeling from its description was this library from rvagg might help.

Personally I feel that sub-second precision is rather a big deal. That is, if we want things like gulp-newer and gulp-changed to work as expected.

@piranna
Copy link
Contributor Author

piranna commented Jan 13, 2016

Sorry I'm a little late to the show, but what issue are the symlinks
posing? I might be of some assistance.

As I remember, since some filesystems don't support symlinks (mostly FAT)
to be as compatible as possible vinyl-fs dereferenced the symlinks writting
instead its content, so it was being used fs.stat() instead of fs.lstat()
and didn't take on consideration the stat of the symlinks. It was
succesfully changed, but when we started using file descriptors we used
fs.fstat(), and that one show the info of the final file, not the symlink.

@erikkemperman
Copy link
Member

@contra @phated @piranna @Klowner

So I spent a little more time with the rebased work and got all tests to pass. Yay!

See the code here and the test results here.

I've taken the liberty of squashing the very many commits, I hope you don't mind...

I guess it should still be tidied up a bit with some flow-control library, like we talked about earlier.

More importantly, it currently skips writing metadata for symlinks, just like it did before... I propose discussing that in a different issue if anyone has ideas.

@Klowner
Copy link
Contributor

Klowner commented Jan 13, 2016

nice job, @erikkeperman! That seems good to me from what I can tell.

@piranna
Copy link
Contributor Author

piranna commented Jan 13, 2016

I've taken the liberty of squashing the very many commits, I hope you don't mind...

All this commits will be lost as tears in the rain... :-P It hurts my ego, but I admit they needed some clean-up.

More importantly, it currently skips writing metadata for symlinks, just like it did before... I propose discussing that in a different issue if anyone has ideas.

It's ok to discuss this on a different issue, but I think this should be fixed too, that's important to set them correctly and not merge the stats of the symlink and the destination file..

@erikkemperman
Copy link
Member

All this commits will be lost as tears in the rain... :-P

How appropriate, it was his birthday last Friday 👍

Seriously though, we can obviously package this however you like -- to use a football phrase, I just walked into a header from an excellent pass.

To reflect this you might consider adding my fork as a remote, and force-pushing it into your branch for this PR? Feel free to squash as you see fit :-)

Like I said it probably still needs some work, for one thing I really don't like that descriptors and streams are opened and closed/ended in different modules now. Maybe the written callback in writeContents/index should be given a back-callback (is there a name for such a thing?) argument, invoke after it is done writing metadata. The various writeContents/writeXxx modules can then close the descriptor or end the stream, as appropriate.

Come to think of it, that might even make things sufficiently separated to leave out further flow-control.

@erikkemperman
Copy link
Member

I think I made it slightly more legible... But that's subjective of course.

Will leave it for now, let me know how you want to proceed!

@phated
Copy link
Member

phated commented Jan 14, 2016

@erikkemperman you are my hero today. Can you squash your commits so there's just 2 commits (1 squashed @piranna commit and 1 squashed commit for you) and then submit a PR?

@piranna
Copy link
Contributor Author

piranna commented Jan 14, 2016

All this commits will be lost as tears in the rain... :-P

How appropriate, it was his birthday last Friday 👍

I was not thinking about that fact, just I love that movie :-D

one thing I really don't like that descriptors and streams are opened and closed/ended in different modules now

That's because close of file descriptors is common, probably the same could be done with open.

@erikkemperman erikkemperman mentioned this pull request Jan 14, 2016
@phated
Copy link
Member

phated commented Feb 26, 2016

Completed by #151 - released in 2.3.2

Thanks for all the work you did on this @piranna!

@phated phated closed this Feb 26, 2016
@piranna
Copy link
Contributor Author

piranna commented Feb 26, 2016

You are welcome :-)

2016-02-26 2:19 GMT+01:00 Blaine Bublitz notifications@github.com:

Completed by #151 #151 -
released in 2.3.2

Thanks for all the work you did on this @piranna
https://github.com/piranna!


Reply to this email directly or view it on GitHub
#128 (comment).

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton
de sitios diferentes, simplemente escribe un sistema operativo Unix."
– Linus Tordvals, creador del sistema operativo Linux

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants