Skip to content

Provide conditional support for file inclusion. #694

Open
dancingplatypus opened this Issue Feb 28, 2013 · 25 comments
@dancingplatypus

A specific issue that I had run in to a while back was that, after running a large build job, there is a lot of work going on building targets that had not changed. While I did not solve this problem for all file based tasks, I did specifically implement a modification for grunt-contrib-copy in my own fork:

https://github.com/dancingplatypus/grunt-contrib-copy

This modification allows files to be ignored unless the source is newer than the destination. I did this by specifically updating the modification timestamp of the destination file with that of the source file. I then allow the configuration to specify if the process should run every time, or only when the source file is newer, or when the destination file had been modified, never, or finally, through a user defined predicate.

After submitting a pull request, I was advised that my modification might belong further up the chain, so that all tasks, grunt-contrib-copy included, might benefit. I also recognize that this is the same sort of generic behavior we might find in other make systems.

I see that the changes might be localized to file.js in the grunt project, and although the copy method seems the obvious point of injection, the recurse method might allow for one to expand this functionality so that any task which specifies a source and destination might skip processing of files that have not changed.

Thoughts?

@pvoisin
pvoisin commented Feb 28, 2013

Interesting idea!

@dominicbarnes

+1 I'd really like to see something like this implemented, or at least discussed. The ability for Makefiles to only run their tasks when files have changed is a great feature that is certainly useful for Grunt too.

@mehcode
mehcode commented Feb 28, 2013

This has been discussed several times with varying approaches; one of them will likely get implemented for 0.5.

One approach that I'd like to see fully implemented is a watcher that sits in grunt and uses MD5 hashes to determine when files change. It would also need to be able to determine if file A would change the compilation result of file B; sort of, a universal file dependency tracer.

The central idea is to have successive commands ...

grunt coffee
grunt coffee

... compile / copy / etc. only what they need to.

The task could either be told what files to compile or be given a way to ask if a file has been changed. Most here were leaning towards the former, if I recall correctly.

@dominicbarnes

To my knowledge, GNU Make simply compares modification times, which would be even simpler to implement (and faster) than MD5 hashes. Granted, it wouldn't catch a case where a file is saved where the content isn't actually changed, but probably a worthwhile trade-off considering what we have now.

And this could probably be implemented at the task-level as well, maybe the file option would include changed: true in addition to src/dest, since it's likely Grunt is having to stat each of those files at some point anyways.

@dancingplatypus

@mehcode -- I may have just had bad experiences, but I remember having several shells running various watchers, and though they aren't terrible, they didn't play well with my specific IDE, which is WebStorm. Webstorm only checks for file modifications in its editor when it loses and then regains focus. I realize, however, that Grunt cares not for my IDE biases. :-) However, changing the build process as little as possible must count for something.

One of the things I like about make systems in general is their simplicity by using timestamps. And to Dominic's point, touching a file in order to force a recompile of that resource isn't unheard of, and really, in large projects, the idea that you would touch enough files to impact your build time might be a very rare edge case, and worth the rebuild in that case.

Plus, what struck me about the feedback I had received is that standardizing the file handling for tasks provides a great central avenue to provide this functionality without requiring any change on the part of the developer (except perhaps to change the default treatment).

My own grunt-contrib-copy has been wonderful for our organization. When we made the upgrade to 0.4, it broke my old version, and the pain of going back to copying everything was really felt. If I have some time, and no one else wants to do it, I'd like to code it up for grunt so I can feel it work in some actual empirical cases. I tend to jump at the bit, though, and I don't want to step on toes, so if I'm derailing actual coding efforts to provide this functionality, I'd rather help out than be divergent.

@mehcode
mehcode commented Feb 28, 2013

To my knowledge, GNU Make simply compares modification times, which would be even simpler to implement (and faster) than MD5 hashes.

There are pros and cons to both methods. File modification times would be fine as well. The hard part in this is the dependency tracing / resolving.

[...] I'd like to code it up for grunt so I can feel it work in some actual empirical cases. I tend to jump at the bit, though, and I don't want to step on toes, so if I'm derailing actual coding efforts to provide this functionality, I'd rather help out than be divergent.

Your welcome to start the effort on making this. A few people here have been meaning to (including myself) but I at least haven't had time.

There are a few big parts to it. There is need of a project that can take a file and return its dependencies or dependents. This coupled with something that can check if files have changed could provide a way for tasks to filter out their file set or have their file set be filtered out for them automatically by a watcher.

I'm leaning towards tasks filtering themselves by being able to ask grunt if a file has changed and for the dependencies and dependents of a file.

@tkellen
grunt member
tkellen commented Feb 28, 2013

Just jumping in to re-iterate that this should be implemented here:
http://gruntjs.com/configuring-tasks#building-the-files-object-dynamically

...not in grunt.file.copy.

If implemented at the glob expansion level, the files will never even make it to copy.

@dominicbarnes

@tkellen +1

I think that's a good idea, as it leaves the decision up to the user/builder and not to the task maintainer.

@dancingplatypus

Definitely understood, tkellen -- I've taken your advice to heart. In my defense, I wrote it during 0.3, and it seemed like it was a file free-for-all.

I was thinking about mehcode's comment regarding MD5 hashes and if there was a case that timestamps would be unreliable. The only special case I'm seeing right now is when there are multiple source files coming together to produce a single output. I'll have to take only the latest time stamp, and I'll be unable to detect a change where an older file in the group has been modified, but it's modification timestamp does not make it newer than the latest. This might happen during a branch checkout in Git, or a viewspec update in ClearCase, for example. I honestly don't know how other make tools handle this.

I honestly think that the even with this caveat, the value is still there, as there's nothing to stop one from doing a clean/recompile in these instances, and then see the benefit on subsequent builds.

@mehcode
mehcode commented Feb 28, 2013

I was thinking about mehcode's comment regarding MD5 hashes and if there was a case that timestamps would be unreliable. [...]

Looking at waf (what I've worked on and with for a long time for my projects before web development) it is noted that their strategy of MD5 hashing files starts to be significantly slower as the number of files reach upwards of 10,000 though their of the opinion that it is worth it as most projects would be significantly less than that and have the benefit of consistency and reliability that timestamps don't.

We could always provide a fall back or option to use timestamps instead (as waf does).

@dancingplatypus

It is my opinion that any implementation would rely on a predicate which is passed the source(s) and destination, and the implementation of this predicate would be configurable. Whether it is MD5 or timestamp or some other method, this decision should be commoditized. A post-copy action would also be invoked that might update timestamps, or make a mark in an MD5 table.

At any rate -- I like the discussion. It's pulled me out of the weeds. I have an image processing task that could really benefit from this (as well as cough getting a 0.4 facelift), so I'll dig around.

@doberkofler

+1

@dancingplatypus dancingplatypus added a commit to dancingplatypus/grunt that referenced this issue Mar 1, 2013
@dancingplatypus dancingplatypus Issue #694 -- Provide conditional support for file inclusion (Initial…
… submission)
a46613e
@dancingplatypus

I've completed an intial implementation on this fork:
https://github.com/dancingplatypus/grunt

It adds two additional options to the files collection


onlyIf

Type: String | Function Default 'always'

Controls whether or not the file will be copied.

If onlyIf is a function, is should have the signature: fn(srcPaths, destPath, onlyIf)
and it should return true if the file is to be copied, false otherwise. this is set to the task at hand.

If onlyIf is a string, then it must be one of:

  • always -- the source will always be copied to the destination
  • newer -- the source will only be copied if it is newer than the destination
  • modified -- the source will be copied only if its timestamp is different than that of the destination
  • missing -- the source will only be copied if there is no destination file

postProcess

Type: Function Default null

If present, this is run after the task is completed to allow to appropriately mark files to prevent redundant processing on the next go-around. this is set to the task at hand. Currently the default behavior is to update the modification time stamp to match the latest time stamp in the collection of source files, or to do nothing if there were no source files specified.


I would like to set up default options that can be modified at the top level. Specifically, the identity of a specific onlyIf predicate and a postProcess predicate.

The bad news...

Most things seems to work nicely. However, there's a hole. My coffee files are only building when they've changed, and that's great. However, LESS compilation are not triggering when I modify a file listed in the top level include, and the reason is plainly obvious. Without specifying a dependency as well as a source, there is no way for the general Grunt system to know that the LESS has that dependency. I suspect that Make and Ant, et al, have a method defined within the task that can mine for dependencies.

Still... All of my shallow source->dest relationships are super quick now.

@mehcode
mehcode commented Mar 1, 2013

@dancingplatypus Good stuff.

Now we need a node depedency tracer / resolver.

@cowboy
grunt member
cowboy commented Mar 1, 2013

FWIW, there's already a filter option in src-dest mappings that can be a function that returns false to exclude a file.

I'm not sure about postProcess though, as it would require code to be added to tasks to enable them to be "conditional" friendly.

@dancingplatypus

Filter seemed reasonable, although the timestamp comparison logic would still need to be supplied. More to the point is the importance of postProcess, which actually updates the time stamp to match the latest source. Without it, filter has nothing substantial with which to make its determination. As it stands, it doesn't actually need to be overwritten, as the default behavior supports timestamps, but given @mehcode's point, maybe a different strategy like MD5 would be better than using the timestamp. Still @cowboy, I'd still like your buy-in, and if there is a better built-in way to update timestamps after a task has been run, I'm interested.

@dominicbarnes

Why do timestamps need to be modified by Grunt explicitly? Wouldn't a grunt.file.write() call update the mtime implicitly by the OS? Also, the idea behind comparing mtimes is that we wouldn't need to store any additional data on disk. (such as a hash/md5 table) Post processing seems only applicable if md5 hashing is to be used as the comparator.

In addition, we shouldn't be touching the mtime of source files, as that would create a situation where the source file will always have an mtime that is greater/newer than the destination file, rendering the filter useless.

@dancingplatypus

Why do I set the timestamp? I use WebStorm and a couple of filewatchers, and whenever a file is modified, it rescans. it doesn't matter that the file is the same, you still have to wait while the program locks up and chugs for a while. It drove me nuts. So I wrote a little thing into the grunt-file-copy that would copy the timestamp. Worked great. It was only then that i realized that, even if WebStorm wasn't rescanning, I was copying a lot of assets that didn't need to be copied. A fresh embedded database, for instance. So I wrote the time stamp comparison functionality and added that to my personal fork of grunt-file-copy. It seemed like a nice thing to contribute, and I was directed here. I did all of this while grunt was still 0.3 and while the file stuff seemed a little more cavalier.

And no, I don't touch the source files. Only the destination. The postProcess was added because I said I would commoditize the choice of comparison technique in response to mehcode's observation, and for me, it was a logical place to update the time stamp to thwart my IDE's busy bodiness.

Frankly, cowboy's feedback does solve the problem. Yes, that timestamp comparison logic still has to be written, and sure it would be nice if it was out-of-the-box, but this is definitely workable. It wasn't much time to try out this experiment, and it actually is pretty fun having that in there as my builds times are silly short now. (You may be asking why my build times are so long to begin with? It's because I'm very industrious regarding some things, and notoriously lazy regarding other things). :-)

@dominicbarnes

@dancingplatypus Completely understandable, thanks for the explanation. A post-process hook could definitely have utility for this feature, I hadn't considered external influences like IDEs.

@xixixao
xixixao commented Mar 17, 2013

+1

@shama shama referenced this issue in gruntjs/grunt-contrib-jshint Apr 18, 2013
Closed

An alternate implementation #45

@bitwiseman

This is still open, but I don't understand the status of it.

The facilities provided by @dancingplatypus 's last commit seem fantastic to me. What's the next step here?

Dependency tracer / resolve would be good too, but it shouldn't block pushing this in.

@shama
grunt member
shama commented Jun 1, 2013

Knowing which files have been modified is not the problem. Knowing which files should be compiled is the problem. Just because a file has been modified does not always indicate that it should be compiled. No matter what method you use to poll the file. So without a generic dependency resolver that works everywhere... implementing this feature will only work for some and become complaints for others.

Likely if the magical generic dependency resolver ever does get created, it will and should be a standalone module and not part of Grunt. At that time I'm sure Grunt would highly consider implementing that module. But even then the filter API might be sufficient:

// pseudocode example
filter: function(filepath) {
  return wasEdited && require('magicdepresolve')(filepath);
}
@dancingplatypus

The default position -- that we should compile everything -- seems really heavy handed to me. Specifically in cases like grunt-file-copy, where there are no dependencies, a find my contribution to be a life saver -- or at least a hair saver.

Now I may be misreading shama, so forgive me if I'm restating. Ultimately, the problem I had with my solution very specifically reared its head while I was compiling LESS files that in turn included additional LESS files. I was forced to always compile these files because it would not compile the master LESS file even though a child LESS file had been modified. A pity.

The solution seems simple to me, and it's borrowed directly from the philosophy behind make and its variants. Any listing of files should be accompanied by a list of the files on which that list depends. The con is that It makes the grunt file much more verbose. The pro is that this will work generically across everything. It also still works without modification for things like grunt-file-copy. And finally, it would allow my change to be included into the grunt system with this additional consideration added on later.

@shama
grunt member
shama commented Jun 2, 2013

Not just compiling less too. This would be a problem for javascript, compass/sass, coffeescript, handlebars, ejs, eco, jade, stylus, ruby, php, typescript, etc... really any task in the ecosystem that can include another file. Which is to say most tasks do this.

A user can already verbosely filter their files by their own list of dependencies using the filter property. Also using grunt-contrib-watch you can get a list of modified files. See https://github.com/gruntjs/grunt-contrib-watch#compiling-files-as-needed So those needs can already be satisfied with the current Grunt API.

The feature that keeps being requested is automatically determining which files should be compiled across all tasks in the ecosystem. No commit to date satisfies that request.

@vladikoff vladikoff modified the milestone: 0.4.5, 0.4.4 Mar 12, 2014
@vladikoff vladikoff modified the milestone: 0.4.6, 0.4.5 Apr 8, 2014
@vladikoff vladikoff modified the milestone: 0.5.0, 0.4.6 Jun 19, 2014
@vladikoff
grunt member

Moving to milestone 0.5.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.