
Provide file streams instead of metadata to resolvers #13

Closed
jaydenseric opened this issue Aug 25, 2017 · 4 comments · Fixed by #22
@jaydenseric
Owner

jaydenseric commented Aug 25, 2017

I will flesh out this description in time. There are lots of little conversations about this scattered around.

Ideal scenario

A new user fills in a registration form. Fields submitted include username (String), password (String), avatar (File), and banner (File). The files are 4 MB each.

The non-file fields can be validated in the resolver as soon as the request comes in to the server, while the uploads asynchronously stream in. It turns out the username is already taken, so a validation error is thrown and the request is aborted. The user gets immediate feedback without having to wait for the whole upload to complete, and the server is not unnecessarily burdened.

Because the files are streams in the resolver, they can be individually forwarded into cloud storage, skipping temporary storage on the API server's memory or filesystem. This allows the first avatar file to begin uploading to cloud storage while the second banner file is still being received by the API. Because the files are not stored on the API server, data security compliance is simpler.

Thoughts

With this setup, a developer could even use different methods to store each file from a single request.

I think it is best to just substitute files for file streams in the resolvers; no more metadata. The API is less opinionated, easier to implement and less complicated to document. Users can extract whatever metadata is useful to them from the streams.

To prevent file uploads from blocking the resolvers (as they do currently with formidable) we will probably need to provide a new files field in the multipart form sent from the client. This will contain a JSON array of all the file object paths, i.e. ["0.variables.avatar", "0.variables.banner"]. It, along with the existing operations field, should be among the very first fields in the multipart form, with all the files following. This will allow us to build an operations object with placeholder file streams to then pass into the GraphQL server and the resolvers. The form can then continue streaming up in parallel with the resolvers running; each file that gets parsed can be streamed into the relevant placeholder stream constructed earlier.

I suspect that to engineer the new system we might have to write a custom multipart form parser, because we need to create streams before the file fields are parsed and options like formidable only create streams as they are encountered. We might also be better off writing the new functionality directly into the GraphQL server middlewares.

Also, I think when we move to streams we should start using a new word like "blob" instead of "file", because people will be able to use any sort of binary data that can't JSON encode in their variables. Relevant: jaydenseric/extract-files#2.

@jaydenseric
Owner Author

This issue would go away.

@du5rte

du5rte commented Aug 26, 2017

After looking deeper into multipart forms and checking the top parsers, I came to the same conclusion you did: formidable is the way to go.

After digging into the library I found a way to skip the parsing. Formidable exposes an onPart method for users to override.

You may overwrite this method if you are interested in directly accessing the multipart stream. Doing so will disable any 'field' / 'file' events processing which would occur otherwise, making you fully responsible for handling the processing.

An example from their docs:

If you want to use formidable to only handle certain parts for you, you can do so:

form.onPart = function(part) {
  if (!part.filename) {
    // let formidable handle all non-file parts
    form.handlePart(part);
  }
}

Let formidable parse the field parts and skip the files:

form.onPart = function(part) {
  if (!part.filename) {
    // let formidable handle all non-file parts
    form.handlePart(part);
  } else {
    // skip handlePart on files, just pass the part along as a stream
    this.emit('file', part.name, part);
  }
}

The parser will then return the parsed fields and the file streams:

form.parse(req, (error, fields, files) => {
  // files["variables.avatar"] instanceof stream.Stream === true
})

I first introduced a new metadata field to carry the file size:

{
  "variables.avatar": {
    "lastModified": 1503606654000,
    "name": "avatar.gif",
    "size": 317866,
    "type": "image/gif"
  }
}

But maybe embedding the metadata on the variables isn't such a bad idea?

{
  "query": "mutation uploadAvatarMutation($id: ID! $avatar: Upload!) {uploadAvatar(id: $id avatar: $avatar) {resulturl} }",
  "variables": {
    "id": "598645558d905ae18524bc55",

    "avatar": {
      "lastModified": 1503606654000,
      "name": "avatar.gif",
      "size": 317866,
      "type": "image/gif"
    }

  },
  "operationName": "uploadAvatarMutation"
}

The metadata can be plucked from the variables, and then the value overwritten with the file stream:

{ query: 'mutation uploadAvatarMutation($id: ID! $avatar: Upload!) {uploadAvatar(id: $id avatar: $avatar) {resulturl} }',
  variables: 
   { id: '598645558d905ae18524bc55',

     avatar: 
      Stream {
        domain: null,
        _events: {},
        _eventsCount: 0,
        _maxListeners: undefined,
        readable: true,
        headers: [Object],
        name: 'avatar.gif',
        filename: 'avatar.gif',
        mime: 'image/gif',
        transferEncoding: 'binary',
        transferBuffer: '',
        size: 317866,
        type: 'image/gif' } },

  operationName: 'uploadAvatarMutation' }

I also created a scalar type to double-check the Upload type is a stream and definitely not the metadata:

import stream from 'stream';
import { GraphQLScalarType } from 'graphql';

function coerceStream(value) {
  if (!(value instanceof stream.Stream)) {
    throw new TypeError('Field error: value is not an instance of Stream');
  }

  // do more checks on the properties and values

  return value;
}

export default new GraphQLScalarType({
  name: 'Upload',
  serialize: coerceStream,
  parseValue: coerceStream,
  // Uploads only arrive via variables; parseLiteral receives an AST node,
  // never a stream, so inline literals are rejected.
  parseLiteral() {
    throw new TypeError('Upload scalar cannot be parsed from an AST literal');
  }
});

I'm working on putting the code I tested on a fork; let me know your thoughts on this approach.

@jaydenseric
Owner Author

jaydenseric commented Aug 27, 2017

Thanks for taking the time to dig in, this is going to be tricky! I updated the description with an ideal scenario, and my starter thoughts.

But maybe embedding the metadata on the variables isn't such a bad idea?

If we provide metadata from the client, developers will be tempted to trust it, which they should not. The current metadata comes from parsing the actual file that is received by the server. Also, metadata will increase the size of the request.

@jaydenseric
Owner Author

jaydenseric commented Nov 18, 2017

We are ready to start working on this.

I created spec-v2 branches for both apollo-upload-client and apollo-upload-server.

We have a GraphQL multipart request spec v2.0.0 draft that makes it possible to implement file deduplication, file upload streams in resolvers and aborting file uploads in resolvers.
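As a rough illustration of what a request under the v2 spec draft might look like, built with the standard FormData API (available globally in modern Node): the operation, the map, and the made-up mutation below are illustrative only, and the exact field names follow the draft and may change.

```javascript
// Sketch of a spec v2 multipart request body (draft): an `operations` field
// first, then a `map` relating file fields to variable paths, then the files,
// so the server can build placeholder streams before the files arrive.
const operations = {
  query: 'mutation ($avatar: Upload!) { uploadAvatar(avatar: $avatar) { url } }',
  variables: { avatar: null },
};
const map = { 1: ['variables.avatar'] };

const body = new FormData();
body.append('operations', JSON.stringify(operations));
body.append('map', JSON.stringify(map));
body.append('1', new Blob(['GIF89a'], { type: 'image/gif' }), 'avatar.gif');
```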

First we update apollo-upload-client to implement the new spec, then apollo-upload-server. We can tweak the spec draft if there are any implementation issues.

At first we don't need to implement all the possible new features such as file deduplication; the main feature is streams in the resolvers.

Once everything is working, I will publish new versions of all 3 projects simultaneously.

@jaydenseric changed the title from "Investigate providing file streams instead of metadata to resolvers" to "Provide file streams instead of metadata to resolvers" Nov 18, 2017
jaydenseric added a commit that referenced this issue Nov 19, 2017
* New API to support the [GraphQL multipart request spec v2.0.0-alpha.2](https://github.com/jaydenseric/graphql-multipart-request-spec/releases/tag/v2.0.0-alpha.2). Files no longer upload to the filesystem; [readable streams](https://nodejs.org/api/stream.html#stream_readable_streams) are used in resolvers instead. Fixes [#13](#13).
* Export a new `Upload` scalar type to use in place of the old `Upload` input type. It represents a file upload promise that resolves an object containing `stream`, `filename`, `mimetype` and `encoding`.
* Deprecated the `uploadDir` middleware option.
* Added new `maxFieldSize`, `maxFileSize` and `maxFiles` middleware options.
* `graphql` is now a peer dependency.
* Middleware are now arrow functions.
jaydenseric added a commit that referenced this issue Nov 19, 2017
New API to support the GraphQL multipart request spec v2. Fixes [#13](#13).