Stream files from the network #45

Closed
mholt opened this issue May 8, 2014 · 4 comments

@mholt
Owner

mholt commented May 8, 2014

We can stream from file input elements, but how about the network? Would the server hosting the file have to support the Range HTTP header?

This could be awesome...

@mholt mholt added this to the 3.0 milestone May 8, 2014
@mholt
Owner Author

mholt commented May 10, 2014

Some preliminary testing shows that this is not too difficult. Fortunately both the FileReader API and AJAX requests are asynchronous, so the groundwork is already laid to parse chunks of a CSV file asynchronously. We just need to build in support for using something like this:

$.ajax("/plu_codes.csv", {
    type: "GET",
    headers: {
        // Byte ranges are inclusive, so this asks for the first 1025 bytes
        "Range": "bytes=0-1024"
    }
}).done(function(data)
{
    // ... treat data just as if it were a chunk read from the FileReader
});
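
On the question of server support: a server that honors Range replies with status 206 (Partial Content) and a Content-Range header such as "bytes 0-1023/5000", while a server that ignores it replies with 200 and the whole body. Here is a minimal sketch of building the request header and interpreting the reply. The helper names (rangeHeader, parseContentRange, hasMore) are made up for illustration and are not part of Papa Parse or jQuery.

```javascript
// Hypothetical helpers, not part of Papa Parse or jQuery.

// Build the Range header value for a chunk of `size` bytes starting at `start`.
// Byte ranges are inclusive, so the last byte is start + size - 1.
function rangeHeader(start, size) {
    return "bytes=" + start + "-" + (start + size - 1);
}

// Parse a Content-Range response header like "bytes 0-1023/5000".
// Returns null if the header is missing or not in that exact form
// (for example, the server ignored Range, or the total is "*").
function parseContentRange(value) {
    var m = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(value || "");
    if (!m) return null;
    return { start: Number(m[1]), end: Number(m[2]), total: Number(m[3]) };
}

// Decide whether another chunk should be requested after this response.
function hasMore(status, contentRange) {
    if (status !== 206) return false; // 200 means the whole file arrived at once
    var r = parseContentRange(contentRange);
    return r !== null && r.end + 1 < r.total;
}
```

With jQuery, the two inputs to hasMore would presumably come from jqxhr.status and jqxhr.getResponseHeader("Content-Range") inside the done callback.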

@mholt
Owner Author

mholt commented May 10, 2014

Thinking out loud here...

In the current version of Papa Parse, downloading a file and parsing it already works, provided the file fits comfortably in memory (say, under 100 MB). It is as easy as this:

$.get("some_file.csv", function(data) {
    var results = $.parse(data);
});

But that doesn't "stream" the file: if the file is too big for the browser tab to handle (say, 1 GB), this would just cause the browser to lock up. To download and parse huge files while keeping Papa easy to use, what about invoking Papa so that it uses the Range header as described above, like this:

$.parse("some_file.csv", {
    ajax: true,
    step: function(data, jqxhr) {
        console.log(data.results);
    }
});

So you specify ajax: true in the config object to tell Papa that the string you gave it is a path to a CSV file, which it then downloads with a GET request. If you also specify the step function, as we have here, it uses the Range header to stream the file one chunk at a time.

Two things to work out still:

  1. When doing AJAX parsing, the call to $.parse is asynchronous, meaning it needs a callback function. This is similar to how files are already parsed (you supply a complete callback). How should this work?
  2. The underlying AJAX requests aren't customizable. I'm worried that letting users pass in a config object for $.ajax would make it easy for them to break Papa unintentionally. In other words, the target file had better be accessible with a simple GET request. It's a tradeoff I think I'm willing to make, but I'll accept feedback if anyone has it.
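
The chunked download proposed above can be sketched as a loop that requests one byte range at a time and hands only complete lines to the parser, carrying any partial trailing line over to the next chunk. This is a rough illustration, not Papa's actual implementation. The names streamLines, fetchRange, onLines, and fakeServer are hypothetical, and fetchRange stands in for the $.ajax Range request so the loop can be tried without a server.

```javascript
// Hypothetical sketch; not Papa Parse's implementation.
// Assumptions: single-byte (ASCII) content, so characters == bytes, and no
// quoted CSV fields containing newlines. A real parser must handle both.
//
// fetchRange(start, end, cb) must call cb(text, totalSize) with the content
// in [start, end] inclusive. In a browser it would wrap $.ajax with a Range
// header; here it is injected so the loop can run against a string.
function streamLines(fetchRange, chunkSize, onLines, onComplete) {
    var start = 0;
    var leftover = ""; // partial last line carried over from the previous chunk

    function next() {
        fetchRange(start, start + chunkSize - 1, function(text, total) {
            var buffer = leftover + text;
            start += text.length;
            // An empty chunk also ends the loop, to avoid spinning forever.
            var finished = start >= total || text.length === 0;

            if (finished) {
                // Drop one trailing newline so no empty last line is emitted.
                if (buffer.charAt(buffer.length - 1) === "\n") {
                    buffer = buffer.slice(0, -1);
                }
                if (buffer.length > 0) onLines(buffer.split("\n"));
                onComplete();
                return;
            }

            var cut = buffer.lastIndexOf("\n");
            if (cut === -1) {
                leftover = buffer; // no complete line yet, keep accumulating
            } else {
                leftover = buffer.slice(cut + 1);
                onLines(buffer.slice(0, cut).split("\n"));
            }
            next();
        });
    }

    next();
}

// In-memory stand-in for a server that honors Range requests.
function fakeServer(content) {
    return function(start, end, cb) {
        cb(content.slice(start, end + 1), content.length);
    };
}
```

Because the caller only ever sees fetchRange, the "simple GET request" tradeoff from point 2 stays hidden behind that one function. With a synchronous stand-in like fakeServer the loop recurses once per chunk, which is fine for small tests but not for huge inputs.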

Since this is for 3.0, I'm willing to make big breaking changes to keep Papa easy to use.

@mholt
Owner Author

mholt commented May 12, 2014

Okay, I think I've resolved both those things.

$.get.parse("files/asdf.csv", {
    config: {
        step: function(data, handle) {
            console.log(data, handle);
            // handle gives access to pause(), resume(), jqxhr, etc.
        }
    },
    complete: function(data) {
        console.log("Done!");
    }
});

Calling $.get.parse tells Papa that the string given to it is a URL to download via a GET request and then parse. The second argument has basically the same object structure as when you parse a file, which resolves number (1) from above.

Number (2) above is resolved because I've decided that the AJAX request made by Papa will be a simple GET request. However, the internal functions that perform the network requests, read files, and do the parsing will be exposed so the user can call them at a lower level if desired.
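
To illustrate how a handle with pause() and resume() might gate the step callbacks, here is a minimal sketch. It is only one possible design, not what Papa does internally. The name makeStepper and the rows array are hypothetical, and a real handle would also expose things like the jqxhr.

```javascript
// Hypothetical sketch of a pausable step loop; names are illustrative only.
function makeStepper(rows, step, complete) {
    var index = 0;
    var paused = false;
    var running = false;
    var finished = false;

    var handle = {
        pause: function() { paused = true; },
        resume: function() {
            if (!paused) return;
            paused = false;
            run();
        }
    };

    function run() {
        // Guard against re-entrancy when resume() is called from inside step().
        if (running || finished) return;
        running = true;
        while (!paused && index < rows.length) {
            step(rows[index++], handle);
        }
        running = false;
        if (!paused && index >= rows.length) {
            finished = true;
            complete();
        }
    }

    return { start: run, handle: handle };
}
```

A step function can call handle.pause() while it waits on something slow, then call handle.resume() later to pick up at the next row. complete fires only once every row has been delivered and the loop is not paused.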

@mholt mholt removed the deferred label May 13, 2014
@mholt mholt changed the title Is it feasible to stream from the network? Stream files from the network Jul 7, 2014
@mholt
Owner Author

mholt commented Jul 11, 2014

Still have some tweaking and optimizing to do, but this is now done.

@mholt mholt closed this as completed Jul 11, 2014