
Streaming zip/tar of directories #2

Closed
svenstaro opened this issue May 17, 2018 · 11 comments · Fixed by #138
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@svenstaro
Owner

It would be cool and convenient to be able to stream directories on the host as archives on the fly if you want to offer many individual files for download.

@ghost

ghost commented Feb 4, 2019

I might take a look at this, it's a cool feature :)

@svenstaro
Owner Author

Yeah, so the way I imagine it working is to have a link on every page that says "Download as tar | zip", which would redirect to <host>/some_dir/.zip or <host>/some_dir/.tar. Then we just have to hope that there is no file named .zip (or similar) in that directory, which I suppose is a fair assumption to make. We should probably also allow for compression when users choose <host>/some_dir/.tar.(gz,bz,xz,etc).

@svenstaro
Owner Author

We'd have to do some streaming. See here at the very bottom.

Also, I think we'd be better off using a query param like ?download=targz, ?download=zip, etc. All in all, I'd like to stay Rust-native, so we'd have to use a pure-Rust compression library. There are libflate and tar, as well as zip-rs.
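A minimal sketch of parsing such a `?download=` parameter, using only the standard library. The enum name `ArchiveMethod`, the accepted values (`tar`, `targz`, `zip`) and the `download_param` helper are hypothetical; a real server would more likely use its web framework's query extractor, which also handles percent-decoding.

```rust
use std::str::FromStr;

/// Hypothetical set of archive formats selectable via `?download=...`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchiveMethod {
    Tar,
    TarGz,
    Zip,
}

impl FromStr for ArchiveMethod {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "tar" => Ok(ArchiveMethod::Tar),
            "targz" => Ok(ArchiveMethod::TarGz),
            "zip" => Ok(ArchiveMethod::Zip),
            other => Err(format!("unknown archive method: {}", other)),
        }
    }
}

impl ArchiveMethod {
    /// File extension for the downloaded archive's name.
    fn extension(self) -> &'static str {
        match self {
            ArchiveMethod::Tar => "tar",
            ArchiveMethod::TarGz => "tar.gz",
            ArchiveMethod::Zip => "zip",
        }
    }
}

/// Find the first `download=<value>` pair in a raw query string such as
/// `"sort=name&download=targz"`. Returns `None` if the parameter is absent.
/// No percent-decoding is performed (sketch only).
fn download_param(query: &str) -> Option<Result<ArchiveMethod, String>> {
    query
        .split('&')
        .filter_map(|pair| {
            let mut kv = pair.splitn(2, '=');
            match (kv.next(), kv.next()) {
                (Some("download"), Some(value)) => Some(value.parse::<ArchiveMethod>()),
                _ => None,
            }
        })
        .next()
}
```

Keeping the format in an enum means an unknown value such as `?download=rar` surfaces as a parse error that can be turned into a 400 response, rather than silently falling back to some default.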

@svenstaro svenstaro added help wanted Extra attention is needed good first issue Good for newcomers labels Mar 4, 2019
@ghost

ghost commented Mar 4, 2019

Yes, using query parameters is probably better. Once #44 is merged, we'll have the struct to parse this.

@svenstaro
Owner Author

We now have downloading of archives but not streaming as originally intended. I'm going to leave this open for now.

@ghost

ghost commented Apr 7, 2019

We really gotta find a crate that allows streaming. That's the last killer feature we lack.

@svenstaro
Owner Author

Yes, definitely. Perhaps even add that to tar-rs.

@ghost

ghost commented Apr 11, 2019

So, it seems it's doable with tar-rs, according to a comment on Reddit (thanks a lot to him). I haven't been able to try it yet, but here is a link to a tool that is quite similar to miniserve and has streaming support for folder downloads using tar-rs: https://github.com/lnicola/rusty-share

I must admit I'm not very experienced with async stuff, so I'm not sure I understand how he did it :p (relevant comment: https://www.reddit.com/r/rust/comments/bapows/whats_everyone_working_on_this_week_152019/ekmrzhm/)

cc @vojta7 (you, on the other hand, seem to understand async very well ;))

@vojta7
Contributor

vojta7 commented Apr 13, 2019

I can try to look into it, but I wouldn't call my understanding of async good.

@gyscos
Contributor

gyscos commented Jun 11, 2019

tar-rs provides streaming to a Write. We need to connect this Write to the actix response, but the actix response expects a futures::Stream, which can roughly be approximated by a Read.

So what we need is a shared byte buffer (a pipe) with both a Read and a Write half. Unsurprisingly, the Write half writes to the pipe (and blocks/yields when it's full), and the Read half reads from it (and blocks/yields when it's empty). One "thread" runs the tar creation using the Write, while the actix response is built from the Read. Using threads (and blocking on an empty/full buffer) is the traditional solution, but if we want to go fully async, these could instead be tasks that yield when appropriate.
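For the thread-based variant, the shared buffer can be sketched with nothing but `std::sync::Mutex` and `Condvar`. This is not rusty-share's `Pipe`; the names `pipe`, `PipeWriter` and `PipeReader` are hypothetical. The writer blocks while the buffer is full, the reader blocks while it is empty, dropping the writer signals EOF, and dropping the reader makes further writes fail with `BrokenPipe` (e.g. when the client disconnects mid-download).

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};
use std::sync::{Arc, Condvar, Mutex};

/// State shared by both halves of the pipe.
struct Shared {
    buf: VecDeque<u8>,
    capacity: usize,
    writer_closed: bool,
    reader_closed: bool,
}

struct Inner {
    state: Mutex<Shared>,
    cond: Condvar, // signalled whenever data, space, or a close event appears
}

/// Write half (hypothetical): blocks while the buffer is full.
pub struct PipeWriter(Arc<Inner>);
/// Read half (hypothetical): blocks while the buffer is empty.
pub struct PipeReader(Arc<Inner>);

/// Create a bounded in-memory pipe holding at most `capacity` bytes.
pub fn pipe(capacity: usize) -> (PipeWriter, PipeReader) {
    assert!(capacity > 0);
    let shared = Shared {
        buf: VecDeque::with_capacity(capacity),
        capacity,
        writer_closed: false,
        reader_closed: false,
    };
    let inner = Arc::new(Inner { state: Mutex::new(shared), cond: Condvar::new() });
    (PipeWriter(inner.clone()), PipeReader(inner))
}

impl Write for PipeWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }
        let mut st = self.0.state.lock().unwrap();
        // Wait for free space, unless the reader has gone away.
        while st.buf.len() == st.capacity && !st.reader_closed {
            st = self.0.cond.wait(st).unwrap();
        }
        if st.reader_closed {
            return Err(io::Error::new(io::ErrorKind::BrokenPipe, "reader dropped"));
        }
        // Copy as much as fits; `write_all` will loop for the rest.
        let n = data.len().min(st.capacity - st.buf.len());
        st.buf.extend(&data[..n]);
        self.0.cond.notify_all();
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Read for PipeReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        let mut st = self.0.state.lock().unwrap();
        // Wait for data, unless the writer has finished.
        while st.buf.is_empty() && !st.writer_closed {
            st = self.0.cond.wait(st).unwrap();
        }
        // If the buffer is still empty here, the writer is done: n == 0 means EOF.
        let n = out.len().min(st.buf.len());
        for (slot, byte) in out.iter_mut().zip(st.buf.drain(..n)) {
            *slot = byte;
        }
        self.0.cond.notify_all();
        Ok(n)
    }
}

impl Drop for PipeWriter {
    fn drop(&mut self) {
        self.0.state.lock().unwrap().writer_closed = true;
        self.0.cond.notify_all();
    }
}

impl Drop for PipeReader {
    fn drop(&mut self) {
        self.0.state.lock().unwrap().reader_closed = true;
        self.0.cond.notify_all();
    }
}
```

In this arrangement the archive builder would own the `PipeWriter` on a background thread, while the response side pulls from the `PipeReader`; the bounded capacity is what keeps memory use flat regardless of directory size.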

So the work to be done is:

  • Create a shared buffer with Write and Read halves. This is what Pipe does in rusty-share: https://github.com/lnicola/rusty-share/blob/master/src/pipe.rs
    • The Read part might just be a futures::sync::mpsc::Receiver<Bytes>, so it'll already have Stream<Bytes> implemented.
  • Create an adapter to connect the Read to actix-web's response, using BodyStream. Should be straightforward.
  • Connect tar-rs to the Write (possibly wrapped in a flate2::write::GzEncoder) instead of using a Vec, and start it in a separate thread.
    • Ideally we wouldn't need a thread, but it would need tar-rs to be async-aware and correctly yield (without data corruption) whenever the pipe is full. No idea if/when this will happen.
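The channel-based variant from the list above (the read side is just a channel receiver, and the archive is written on a separate thread) can be sketched synchronously with `std::sync::mpsc` standing in for `futures::sync::mpsc`. `ChunkWriter` and `stream_from_writer` are hypothetical names. In miniserve the `produce` closure is where tar-rs would write the directory (possibly through a gzip encoder), and the receiver would still need adapting to actix-web's streaming body; neither part is shown here.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Write adapter that forwards every buffer as an owned chunk over a bounded
/// channel. `send` blocks while the channel is full, which back-pressures the
/// archive producer instead of buffering the whole archive in a Vec.
struct ChunkWriter {
    tx: SyncSender<io::Result<Vec<u8>>>,
}

impl Write for ChunkWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }
        self.tx
            .send(Ok(data.to_vec()))
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "receiver dropped"))?;
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Run `produce` on its own thread and hand back the receiving end.
/// Iterating the receiver yields chunks until the producer finishes;
/// a producer error is forwarded as a final `Err` item.
fn stream_from_writer<F>(bound: usize, produce: F) -> Receiver<io::Result<Vec<u8>>>
where
    F: FnOnce(&mut dyn Write) -> io::Result<()> + Send + 'static,
{
    let (tx, rx) = sync_channel(bound);
    thread::spawn(move || {
        let mut writer = ChunkWriter { tx: tx.clone() };
        if let Err(e) = produce(&mut writer) {
            // Ignore send failure: it only means the consumer already hung up.
            let _ = tx.send(Err(e));
        }
        // Both senders are dropped here, which ends the receiver's iteration.
    });
    rx
}
```

Forwarding the producer's error as a last item lets the response side abort the transfer instead of sending a silently truncated archive.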

@gyscos
Contributor

gyscos commented Jun 12, 2019

Note that zip streaming may have to wait on zip-rs/zip-old#16; until then, zip-rs cannot stream its output.

3 participants