Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Option to pass file metadata on file stream itself #356

Open
joedski opened this issue Jun 10, 2017 · 9 comments
Open

Feature request: Option to pass file metadata on file stream itself #356

joedski opened this issue Jun 10, 2017 · 9 comments
Assignees

Comments

@joedski
Copy link

joedski commented Jun 10, 2017

Request doesn't currently have any logic to do transfer-encoding: chunked. It currently tries to always send a content-length header, but form-data cannot just pull that number out of thin air without processing the whole stream.

Ordinarily, this would be solved by passing the knownLength file option, either to form-data directly or to request, however this is not possible when using libraries that themselves use request, such as box-node-sdk.

Currently, I am working around this by taking advantage of form-data's automatic detection of content-length headers on IncomingMessage instances and its use of the name prop (presumably from Browser File objects) to derive the mime-type, though this also means I'm having the client-side code send the file meta along side the file content itself.

myStream.httpVersion = '1.0'; // value technically doesn't matter, but might as well use a valid value.
myStream.headers = { 'content-length': myStreamKnownLength };
myStream.name = myStreamFileName;

However, it would be nice if there were a more sanctioned way to do this, something like:

myStream._formDataOptions = {
  filename: myStreamFileName,
  knownLength: myStreamKnownLength,
};

Aside: While box-node-sdk lets you pass additional base options to request, i'm not sure every library that uses request does this, so even if request did support transfer-encoding: chunked I'm not certain the end user would always be able to specify that. Being able to pass the file meta straight on the file stream without resorting to tricking form-data would ensure some sort of escape hatch for those end users.

I'm open to doing up a PR if the idea presented here is acceptable.

@alexindigo
Copy link
Member

Hey, thanks for the thought. Just to better understand the idea – you need to pass transfer-encoding: chunked as part of your http envelop, not on each multipart item. So I'm not sure how adding more stuff to a file stream would solve the issue. Also, I remember doing some experiments to make it send chunked encoding and it was much easier than you're describing here. Although I don't remember all the nitty-gritty details on top of my head.

And I'm unfamiliar with box-node-sdk, does it just pass options to FormData?
Can you pass object with headers property to the submit method?

@joedski
Copy link
Author

joedski commented Jun 11, 2017

I probably should have provided a concrete example of my issue!

My primary concern is not with passing data to form-data itself, since I could just pass knownLength directly and be done with it, nor with passing data to request since that lets you also specify the options to pass to form-data including knownLength. Rather, it is with libraries that do not expose the object handed to request at all.

For instance, when using the box-node-sdk, you can pass some additional options to it to configure all requests it makes using request, but you can't configure individual requests, and you can't pass different options for requests which upload versus those which don't, and you don't directly create anything for the request body itself.

Basically, when uploading a file, all you have access to is this:

const BoxSDK = require('box-node-sdk');
const sdk = new BoxSDK({ clientID: CLIENT_ID, clientSecret: CLIENT_SECRET });
const boxClient = sdk.getBasicClient(USER_ACCESS_TOKEN);
// no way to pass in file meta here.
boxClient.files.uploadFile(parentFolder.id, fileName, fileStream, callback);

That goes through a couple layers until it finally reaches the internal wrapper around request, but the user of box-node-sdk never never gets to actually see any of that, let alone when request builds the body using form-data.

This means while something like the following work:

boxClient.files.uploadFile(parentFolder.id, fs.createReadStream(somePath), callback);
boxClient.files.uploadFile(parentFolder.id, request(otherResource), callback);

This does not:

// readstream metadata is stuck back on the readstream.  There is nothing on the passthrough stream.
boxClient.files.uploadFile(parentFolder.id, fs.createReadStream(somePath).pipe(new PassThrough()), callback);

Or, more specific to my case, using busboy to handle multipart/form-data requests from the client:

// koa 2 router handler.
async function handleUpload(ctx) {
  const busboy = new Busboy(ctx.req.headers);
  const uploadPromises = [];

  busboy.on('file', (fieldname, file, filename) => {
    uploadPromises.push(new Promise((resolve, reject) => {
      // `file` is a stream.
      // `boxClient` here is attached to the current request context.
      ctx.boxClient.files.uploadFile('0', file, (error, uploadList) => {
        if (error) reject(error);
        else resolve(uploadList);
      });
    }));
  });

  const busboyDone = new Promise((resolve, reject) => {
    busboy.once('end', resolve);
    busboy.once('error', reject);
  });

  ctx.req.pipe(busboy);

  try {
    await busboyDone;
    await Promise.all(uploadPromises);
    ctx.status = 200;
    ctx.body = ':)';
  }
  catch (error) {
    console.error(error);

    ctx.status = 500;
    ctx.body = ':(';
  }
}

In this case, the file will fail to upload because the file stream created by Busboy does not have any information on it that form-data can use to determine a knownSize. The boxClient from box-node-sdk builds a request object to pass to request, including the formData option which causes it to create a body using form-data. request then asks form-data what size the body is going to be because it wants to create a content-length header, but as the file stream created by busboy does not have any metadata on it, neither in the style of a ReadStream nor an IncomingMessage, form-data tells request a smaller size, and request then sets the content-length header with that smaller size, and the request body is truncated and the request closed prematurely.

In my specific case, I got around this by making the client send file metadata before the file itself and then using the above trickery to make form-data pick up on that metadata:

// koa 2 router handler.
async function handleUpload(ctx) {
  const busboy = new Busboy(ctx.req.headers);
  const uploadPromises = [];
  const fileMetas = {};

  // Check fields for any named `(fileFieldname).meta`
  busboy.on('field', (fieldname, val) => {
    if (/\.meta$/.test(fieldname)) {
      const fileFieldname = fieldname.replace(/\.meta$/, '');
      // size, name, type from the browser File object.
      fileMetas[fileFieldname] = JSON.parse(val);
    }
  });

  busboy.on('file', (fieldname, file, filename) => {
    uploadPromises.push(new Promise((resolve, reject) => {
      // Correlate data from `(fileFieldname).meta` with `(fileFieldname)`

      if (!fileMetas[fieldname]) {
        // no meta?  dump file.
        file.resume();
        return reject(new Error(`No file meta for ${fieldname}`));
      }

      // fake it til ya make it.
      file.httpVersion = '1.0';
      file.headers = { 'content-length': fileMetas[fieldname].size };
      file.name = filename;

      // `file` is a stream.
      // `boxClient` here is attached to the current request context.
      ctx.boxClient.files.uploadFile('0', file, (error, uploadList) => {
        if (error) reject(error);
        else resolve(uploadList);
      });
    }));
  });

  const busboyDone = new Promise((resolve, reject) => {
    busboy.once('end', resolve);
    busboy.once('error', reject);
  });

  ctx.req.pipe(busboy);

  try {
    await busboyDone;
    await Promise.all(uploadPromises);
    ctx.status = 200;
    ctx.body = ':)';
  }
  catch (error) {
    console.error(error);

    ctx.status = 500;
    ctx.body = ':(';
  }
}

In the interest of full disclosure, the Box folks have recently added a chunking file upload method intended for very large files or unreliable networks, although they do this by directly breaking things into chunks within their client so that separate parts may be retried. So far as I know, this doesn't have anything to do with transfer-encoding: chunked. Also not sure about the memory use, though it's probably not bad. I may try converting the server to that if I have time next sprint.

It also does not absolve my concerns with libraries other than box-node-sdk which themselves wrap around request but may not allow any way to specify the size of a file. Obviously the Correct Way would be to get them to let the library user specify the form data options for each file, but getting that done in a timely or consistent manner isn't always possible.

Hope that explains the sort of situations I'm concerned with!

@alexindigo
Copy link
Member

Wow. :) Thanks for the detailed explanation. I'm still dizzy from the shear amount of new information :)

Let me know to boil it down.

Passing stream works when they're passed directly (i.e. fs-stream, request, http-stream), but doesn't work when they're being transform by some intermediary, like fs.createReadStream(somePath).pipe(new PassThrough()). Correct?

And your solution is to stick extra properties on the passed stream, like MyStream._formDataStuff.
Wouldn't it be stripped by that same intermediary that strips all other meta info.

And if you have Transfer-Encoding chunked per multipart block, request would still try to stick Content-Length on the envelope which will mess with receiving end.

Or did I miss something?

Thank you.

@joedski
Copy link
Author

joedski commented Jun 11, 2017

Thanks for bearing with me here!

Passing a stream directly from an fs-stream, request, or an http-stream does work because they have extra data attached to them which FormData picks up on automatically, but indeed doesn't work when you don't have that kind of stream.

Examples of those streams which do not work include:

  • The fs.createReadStream(somePath).pipe(new PassThrough()) example as you noted, where PassThrough here may be any Transform or Duplex.
  • The file stream created by Busboy when it's parsing a an http IncomingMessage received by the node app, which is a small slice of the whole IncomingMessage.
    • Busboy does not attach any size related metadata directly to this file stream it creates, largely because I think it cannot without processing the entire stream, which would require holding that entire stream either in memory or on disk.
  • Any other Readable (or Duplex or Transform) stream created from some arbitrary resource

My solution was to attach that missing data by making it look kiiiinda like an IncomingMessage with the myStream.httpVersion = '1.0' and content-length header. I would have to do this same "attach props to make it look kiiiinda like an IncomingMessage" thing if I passed my stream through a transform or duplex, which of course results in a new stream that would get passed on to whatever upload thing I'm using. The entire reason I attached the IncomingMessage-style props was because I cannot pass the extra file-related options into the library that I'm using, because it doesn't expose request or form-data directly, and also doesn't give a way to pass said file options. It only accepts the file stream by itself.

I can't even use transfer-encoding: chunked anywhere because request currently does not have any logic to support it, so if one does add the transfer-encoding: chunked header, request will indeed still try to add a content-length header, which as you note there will screw with the receiving end. But the other issue is still that unless I attach extra properties to the file stream itself, I cannot even tell require and form-data what the size of the file should be, since as noted I don't have direct access to either.

The solution that I would like to have is some obviously named property that I could attach to any stream that form-data will use, rather than making the stream look like an IncomingMessage or something else.

Which is to say, currently I have to do this to pass the data down through whatever library I'm using:

// Make it look kinda like an http.IncomingMessage... except the path, which looks like an fs.ReadStream.
myStream.httpVersion = '1.0'; // value technically doesn't matter, but might as well use a valid value.
myStream.headers = { 'content-length': myStreamKnownLength };
myStream.name = myStreamFileName;

And I would like to do this:

// Obvious name makes what is happening here plain to see.
myStream._formDataOptions = {
  filename: myStreamFileName,
  knownLength: myStreamKnownLength,
};

So that things like handling data from Busboy or retaining metadata from an fs ReadStream after Transforms or Duplexes is doable in a known and documented way rather than in a way that takes advantage of the internals of form-data that may change.

I hope that's a bit clearer!

@alexindigo
Copy link
Member

Oh, I think I got it. So you just need more simple way to augment already pre-messed up streams.
I think it's should be pretty simple to add. But do you want it to be something this package specific, like _formDataOptions, or maybe better something more generic that could be used by other (similar libs), like _streamUploadMeta or something in that fashion?

@joedski
Copy link
Author

joedski commented Jun 12, 2017

Exactly!

And, that's actually a very good question. I suppose something like _streamUploadMeta or even _fileMeta might be better than _formDataOptions. I suppose in that case, maybe it should look kinda like a File in the browser? That is, { size, name, type }?

I suppose another option along this vein is to just attach size, name, and type directly to the stream itself for even more File-like look. I kinda dislike that, I like independent props to namespace things, but having an official way to do it is best regardless of where that metadata goes.

So, from that, I think _streamUploadMeta is what I'd prefer.

@DylanPiercey
Copy link
Member

DylanPiercey commented Jul 15, 2017

@joedski let me start by saying that this issue has been a TLDR for me, but I did skim it.

I think the idea is a good one. I personally think attaching the properties directly to the stream is the best for a couple reasons.

  1. It doesn't look disgusting.
  2. It allows us to pass in objects that emulate the file object.

If we were to go this route though the only allowed fields should be name, size and type like you mentioned earlier in order to match the browser side api. My dream is to eventually use the browsers file api isomorphically, similar to this and I think this is a good step toward that.

I think it's something we could add to feature parity issue, what's your thoughts @alexindigo?

@DylanPiercey DylanPiercey self-assigned this Jul 15, 2017
@joedski
Copy link
Author

joedski commented Jul 15, 2017

I had concerns about possible name collisions, but looking at a fresh instance of a PassThrough stream didn't show any preexisting props named name, size, or type so, at least from the standpoint of core streams/readable-stream, that shouldn't be a problem.

Isomorphism is certainly a good argument. I myself prefer things be neatly namespaced (in as much as underscores are 'neat') but ultimately any documented, fixed, supported API will do for me.

@fomojola
Copy link

As an extra vote for this: just spent 2 hours trying to figure out request/request#2499. Currently form-data is broken for anything other than the 3 supported stream types (fs readstreams, request streams and httpIncoming) when really all it needs to know is the file size. I ended up doing something similar to whats described here: adding fake httpVersion and content-length headers, and that worked. A standard way to specify the size for arbitrary stream-compatible objects would really be the way to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants