
Provide file streams instead of metadata to resolvers #13

Closed
jaydenseric opened this issue Aug 25, 2017 · 4 comments · Fixed by #22
@jaydenseric
Owner

jaydenseric commented Aug 25, 2017

I will flesh out this description in time. There are lots of little conversations about this scattered around.

Ideal scenario

A new user fills in a registration form. Fields submitted include username (String), password (String), avatar (File), and banner (File). The files are 4 MB each.

The non-file fields can be validated in the resolver as soon as the request comes in to the server, while the uploads asynchronously stream in. It turns out the username is already taken, so a validation error is thrown and the request is aborted. The user gets immediate feedback without having to wait for the whole upload to complete, and the server is not unnecessarily burdened.

Because the files are streams in the resolver, they can be individually forwarded into cloud storage, skipping temporary storage on the API server's memory or filesystem. This allows the first avatar file to begin uploading to cloud storage while the second banner file is still being received by the API. Because the files are not stored on the API server, data security compliance is simpler.

Thoughts

With this setup, a developer could even use different methods to store each file from a single request.

I think it is best to just substitute files for file streams in the resolvers; no more metadata. The API is less opinionated, easier to implement and less complicated to document. Users can extract whatever metadata is useful to them from the streams.

To prevent file uploads from blocking the resolvers (as they do currently with formidable) we will probably need to provide a new files field in the multipart form sent from the client. This will contain a JSON array of all the file object paths, i.e. ["0.variables.avatar", "0.variables.banner"]. It, along with the existing operations field, should be among the very first fields in the multipart form, with all the files following. This will allow us to build an operations object with placeholder file streams to then pass into the GraphQL server and the resolvers. The form can then continue streaming up in parallel with the resolvers running; each file that gets parsed can be streamed into the relevant placeholder stream constructed earlier.

I suspect that to engineer the new system we might have to write a custom multipart form parser, because we need to create streams before the file fields are parsed and options like formidable only create streams as they are encountered. We might also be better off writing the new functionality directly into the GraphQL server middlewares.

Also, I think when we move to streams we should start using a new word like "blob" instead of "file", because people will be able to use any sort of binary data that can't JSON encode in their variables. Relevant: jaydenseric/extract-files#2.

@jaydenseric
Owner Author

This issue would go away.

@du5rte

du5rte commented Aug 26, 2017

After looking deeper into multipart forms and checking the top parsers, I came to the same conclusion you did: formidable is the way to go.

After digging into the library I found a way to skip the parsing. Formidable exposes an onPart method for users to override.

You may overwrite this method if you are interested in directly accessing the multipart stream. Doing so will disable any 'field' / 'file' events processing which would occur otherwise, making you fully responsible for handling the processing.

An example from their docs:

If you want to use formidable to only handle certain parts for you, you can do so:

form.onPart = function(part) {
  if (!part.filename) {
    // let formidable handle all non-file parts
    form.handlePart(part);
  }
}

Let formidable parse the field parts and skip the files:

form.onPart = function(part) {
  if (!part.filename) {
    // let formidable handle all non-file parts
    form.handlePart(part);
  } else {
    // skip handlePart on files, just pass the part along as a stream
    this.emit('file', part.name, part);
  }
}

The parser will then return the parsed fields and the file streams:

form.parse(req, (error, fields, files) => {
  // files["variables.avatar"] instanceof stream.Stream === true
})

I first introduced a new metadata field to carry the file size:

{
  "variables.avatar": {
    "lastModified": 1503606654000,
    "name": "avatar.gif",
    "size": 317866,
    "type": "image/gif"
  }
}

But maybe embedding the metadata on the variables isn't such a bad idea?

{
  "query": "mutation uploadAvatarMutation($id: ID! $avatar: Upload!) {uploadAvatar(id: $id avatar: $avatar) {resulturl} }",
  "variables": {
    "id": "598645558d905ae18524bc55",

    "avatar": {
      "lastModified": 1503606654000,
      "name": "avatar.gif",
      "size": 317866,
      "type": "image/gif"
    }

  },
  "operationName": "uploadAvatarMutation"
}

The metadata can be plucked from the variables, and then the value overwritten with the file stream:

{ query: 'mutation uploadAvatarMutation($id: ID! $avatar: Upload!) {uploadAvatar(id: $id avatar: $avatar) {resulturl} }',
  variables: 
   { id: '598645558d905ae18524bc55',

     avatar: 
      Stream {
        domain: null,
        _events: {},
        _eventsCount: 0,
        _maxListeners: undefined,
        readable: true,
        headers: [Object],
        name: 'avatar.gif',
        filename: 'avatar.gif',
        mime: 'image/gif',
        transferEncoding: 'binary',
        transferBuffer: '',
        size: 317866,
        type: 'image/gif' } },

  operationName: 'uploadAvatarMutation' }

I also created a scalar type to double-check the Upload type is a stream and definitely not the metadata:

import stream from 'stream';
import { GraphQLScalarType } from 'graphql';

function coerceStream(value) {
  if (!(value instanceof stream.Stream)) {
    throw new TypeError('Field error: value is not an instance of Stream');
  }

  // do more checks on the properties and values

  return value;
}

export default new GraphQLScalarType({
  name: 'Upload',
  serialize: coerceStream,
  parseValue: coerceStream,
  // Uploads only arrive via variables; parseLiteral receives an AST node,
  // never a stream, so inline literals are rejected.
  parseLiteral() {
    throw new TypeError('Upload scalar cannot be parsed from an AST literal');
  }
});

I'm working on putting the code I tested on a fork; let me know your thoughts on this approach.

@jaydenseric
Owner Author

jaydenseric commented Aug 27, 2017

Thanks for taking the time to dig in, this is going to be tricky! I updated the description with an ideal scenario, and my starter thoughts.

But maybe embedding the metadata on the variables isn't such a bad idea?

If we provide metadata from the client, developers will be tempted to trust it, which they should not. The current metadata comes from parsing the actual file that is received by the server. Also, metadata will increase the size of the request.

@jaydenseric
Owner Author

jaydenseric commented Nov 18, 2017

We are ready to start working on this.

I created spec-v2 branches for both apollo-upload-client and apollo-upload-server.

We have a GraphQL multipart request spec v2.0.0 draft that makes it possible to implement file deduplication, file upload streams in resolvers and aborting file uploads in resolvers.
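As a rough illustration of what a request under the v2 spec draft might look like, built with the standard FormData API (available globally in modern Node): the operation, the map, and the made-up mutation below are illustrative only, and the exact field names follow the draft and may change.

```javascript
// Sketch of a spec v2 multipart request body (draft): an `operations` field
// first, then a `map` relating file fields to variable paths, then the files,
// so the server can build placeholder streams before the files arrive.
const operations = {
  query: 'mutation ($avatar: Upload!) { uploadAvatar(avatar: $avatar) { url } }',
  variables: { avatar: null },
};
const map = { 1: ['variables.avatar'] };

const body = new FormData();
body.append('operations', JSON.stringify(operations));
body.append('map', JSON.stringify(map));
body.append('1', new Blob(['GIF89a'], { type: 'image/gif' }), 'avatar.gif');
```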

First we update apollo-upload-client to implement the new spec, then apollo-upload-server. We can tweak the spec draft if there are any implementation issues.

At first we don't need to implement all the possible new features such as file deduplication; the main feature is streams in the resolvers.

Once everything is working, I will publish new versions of all 3 projects simultaneously.

@jaydenseric changed the title from "Investigate providing file streams instead of metadata to resolvers" to "Provide file streams instead of metadata to resolvers" Nov 18, 2017
jaydenseric added a commit that referenced this issue Nov 19, 2017
* New API to support the [GraphQL multipart request spec v2.0.0-alpha.2](https://github.com/jaydenseric/graphql-multipart-request-spec/releases/tag/v2.0.0-alpha.2). Files no longer upload to the filesystem; [readable streams](https://nodejs.org/api/stream.html#stream_readable_streams) are used in resolvers instead. Fixes [#13](#13).
* Export a new `Upload` scalar type to use in place of the old `Upload` input type. It represents a file upload promise that resolves an object containing `stream`, `filename`, `mimetype` and `encoding`.
* Deprecated the `uploadDir` middleware option.
* Added new `maxFieldSize`, `maxFileSize` and `maxFiles` middleware options.
* `graphql` is now a peer dependency.
* Middleware are now arrow functions.
jaydenseric added a commit that referenced this issue Nov 19, 2017
New API to support the GraphQL multipart request spec v2. Fixes [#13](#13).