Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poll: Remove support for file.contents being a stream #79

Closed
yocontra opened this issue Dec 31, 2013 · 22 comments
Closed

Poll: Remove support for file.contents being a stream #79

yocontra opened this issue Dec 31, 2013 · 22 comments

Comments

@yocontra
Copy link
Member

This was originally added to support extremely large files (like a 4gb video file) but this seems like an extreme edge case.

Pros:

  • Useful for dealing with transforming large files

Cons:

  • Makes plugin development complicated
  • Makes core code complicated
  • Restricting new features that would not support streaming contents

👎 to get rid of it

👍 to keep it

cc @robrich @nfroidure @sindresorhus @phated @funkytek @lazd @travm @dashed

@sindresorhus
Copy link
Contributor

👎 complicates everything for minimal gain.

@phated
Copy link
Member

phated commented Dec 31, 2013

What about just splitting it apart from the gulp.src API? I think dealing with large files should still be supported,

@yocontra
Copy link
Member Author

@phated But then plugins need to support it right?

@robrich
Copy link
Contributor

robrich commented Dec 31, 2013

gulp.src('big/files/*.mp4', {read:false})
  .pipe(es.map(function (file, cb) {
    fs.createReadStream(file.path);
      .pipe(doSomethingWithBigFile())
      .pipe(gutil.replaceExtension(file.path, '.min.mp4'));
   cb(null, file);
  });

It kinda defeats the point of the streaming thing, but it supports the edge-case without inflicting the paradigm on plugin authors, who arguably have all-but ignored it.

@phated
Copy link
Member

phated commented Dec 31, 2013

I think

var contentsAsStream = es.map(function (file, cb) {
  file.contents = fs.createReadStream(file.path);
  cb(null, file);
});

would be a fine abstraction that end users could drop in, but it goes against gulp guidelines of passing along contents the same way you received them.

@nfroidure
Copy link

IMO, we should promote the stream usage for plugin developers. Streams are improving memory/CPU usage, can deal with large files etc...

Plugins are often wrappers for existing libraries. Existing libraries often have concurrent libraries dealing with streams. I think the reason why streams aren't used so much is that they are hidden behind an option

Another pros is that tools creates usage. If we try to create a Grunt clone so, ok, let's drop streams. But if we want to have a new tool to create new usages, dealing with streams is a must.

I'm thinking of the @contra idea of creating things like:

webdav.src('xxx@xxx.com:/folder/*.mp4')
  .pipe(ftp.dest('xxx@xxx.com:/downloads/'))

In a network context, streams are essentials.

👍

@yocontra
Copy link
Member Author

@nfroidure Good points but I think the differentiator between grunt and gulp is not file contents as a stream but files as a stream. I disagree that existing libraries often have ways of dealing with streams. The majority of existing plugins would require doing the conversion dance (stream->buffer->string->process file contents->string->buffer->stream) if they wanted to add stream support.

@doowb
Copy link
Member

doowb commented Dec 31, 2013

👍

I started doing node programming by writing grunt plugins, then extracting code into standalone modules. Gulp has shown me more about streams than anything else and I've started thinking of how I can use streams in my libraries to support both grunt and gulp plugins. This is doable because grunt doesn't read any files in for you, the plugin has to, so you can still use streams from within the plugin.

Of course, after writing that last piece, I realize now why I was struggling when trying to write my first gulp plugin. I thought that the actual files were being streamed, so I created a library to read/write a stream and was just wrapped it in a gulp plugin. I was surprised when I got the File object with content instead. I think that was just my lack of reading all the docs first.

With that... I like the option of having the streams, but I can see how it makes it more difficult when building a task with plugins that use the contents in different ways.

@nfroidure
Copy link

@contra in the web project task context, i think you are right since many tasks need to have a global comprehension of the file content to be able to start output.

But i think Gulp can reach many other domains, especially if it can deal with big files: statistics, financial treatments, log analysis, dba etc...

For the Grunt comparison, i admit that glob streams are also an important part of the Gulp input. It enforce the fact that streams are leading to innovation.

About plugin development, we can get rid of complexity by abstracting buffers/streams at the vinyl level.

@sindresorhus
Copy link
Contributor

Keep in mind that while streaming is nice, it only requires one non-streaming plugin to make the whole chain non-streaming. Streaming is also mostly about memory consumption as the performance most likely isn't that much better since the files are run concurrently through the chain.

Existing libraries often have concurrent libraries dealing with streams. I think the reason why streams aren't used so much is that they are hidden behind an option

From my experience very few do. This is because streams are hard and not everything can be streaming.

But i think Gulp can reach many other domains, especially if it can deal with big files: statistics, financial treatments, log analysis, dba etc...

Maybe gulp should be rewritten in C so it could handle heavy CPU tasks? I kid of course, but gulp shouldn't be designed for potential future usage. Rather make it awesome for what it's used intended for today. Building things fast. It can always evolve into that if there is a need.

@yocontra
Copy link
Member Author

@nfroidure gulp is basically just a conglomerate of other modules (vinyl, vinyl-fs, glob-watcher, etc.) that fill the web dev use case. I think any other use cases should have their own toolkit based around vinyl or we are doing all of them an injustice.

@nfroidure
Copy link

@contra is supporting streams with vinyl / vinyl-fs concerned by this issue result ?

@yocontra
Copy link
Member Author

@nfroidure no this is only for gulp. Basically in the gulp wrapper around vinyl we would remove the ability to buffer: false in .src() and remove all mention of file.contents as a stream from the plugin guidelines

@yocontra
Copy link
Member Author

I think we should try to fix this at the vinyl level and if it is still too cumbersome for plugin devs we can remove it

@nfroidure
Copy link

You may find useful to know that by abstracting buffers and streams on a single API, i found that relying on streams was the simplest and safest way to proceed. Here are the details of the implementation: gulpjs/vinyl#3 (comment)

Any feedback is welcome, especially on the fact that it do not break plugins that are already using streams. If you made one or if you made tasks using stream compatible plugins, you can test it like that:

git clone git@github.com:nfroidure/vinyl.git
cd vinyl
su npm link

// go to your gulp tasks project directory
rm -r node_modules/gulp/node_modules/vinyl
rm -r node_modules/gulp/node_modules/gulp-utils/node_modules/vinyl
rm -r node_modules/gulp-utils/node_modules/vinyl
npm link vinyl

Thanks in advance!

@floatdrop
Copy link
Contributor

This should be a method, that can be called to read file and get Buffer/Stream on demand (IMHO) 👍

@nfroidure
Copy link

Something like that in the Vinyl setters of https://github.com/nfroidure/vinyl/blob/master/index.js#L119 :

(function(cache) {
Object.defineProperty(File.prototype, 'contents', {
  get: function() {
      return cache || (cache = Fs.createReadStream(this.path));
  },
  set: function(value) {
      cache = value;
  }
});
})(null)

So it would stat files only when required. Another way of moving null contents out of Vinyl.

@floatdrop
Copy link
Contributor

@nfroidure it would be great to have (also for stats too).

@nfroidure
Copy link

It would break https://github.com/peter-vilja/gulp-clean since the file descriptors will not be created when deleting the files. A single access to file contents before deleting them could fix it.

@yocontra
Copy link
Member Author

yocontra commented Jan 4, 2014

After looking at @nfroidure BufferStream module I think it's now easy enough to support streams and that we should keep them. As for file.contents/stats on demand go ahead and move that discussion to Vinyl issues.

@yocontra yocontra closed this as completed Jan 4, 2014
@floatdrop
Copy link
Contributor

Streams - is already data on demand, so only stats is a little problem (I thinks this is a cheap operation).

@nfroidure
Copy link

If we decide to finally keep both buffer and streams support, it can be a big win to not read sync til it's necessary in buffer mode. Let open an issue on vinyl to start the discuss on it. gulpjs/vinyl#6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants