Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent server hangs on unconsumed streams. #81

Merged
merged 68 commits into from
Jul 12, 2018
Merged

Prevent server hangs on unconsumed streams. #81

merged 68 commits into from
Jul 12, 2018

Conversation

mike-marcacci
Copy link
Collaborator

@mike-marcacci mike-marcacci commented Jun 7, 2018

While the HTTP spec does support sending a response before the request is finished and this behavior is followed by node http and curl, every browser and at least one popular load balancer (GCP) don't support this.

This means that if the server responds before the request stream has finished, the response body will be dropped:

image

If, on the other hand, the server waits before for the request to finish (by dumping the incoming data), the client is able to receive the response:

image

The reason this effects apollo-upload-client differently than other upload solutions is twofold:

  1. While I prefer the approach of apollo-upload-server, most other drop-in file upload middleware waits for all uploads to finish before handing off the request to downstream middleware. This makes it simple, as downstream middleware doesn't have to worry about consuming the entire stream.
  2. The downstream middleware is always going to be a graphql server, and resolvers are unaware of Uploads from other graphql branches (you could hack around this, but it would undo all the ergonomic benefits of apollo-upload-server).

Here's my (very typical) use case:

  1. Try to upload a large file (or use a slow/throttled connection on a smaller file).
  2. Before processing the upload, the graphql resolver checks to see that the user is logged in and throws an error:
    if (!session) {
      throw new Error("You must be logged in to save an image.");
    }

Unfortunately, this is a bit challenging to test, as this is a race: if the upload is fully buffered by node http before the response is sent (small file, local network) then everything works correctly; if it isn't (most real-world cases) then it will fail.

So far I've implemented this in the koa middleware (since that's what I'm using), and will add this to express when I get a chance. I'll also add some tests, but I wanted to drop this off here before somebody else descends the rabbit hole trying to diagnose this.

@mike-marcacci
Copy link
Collaborator Author

OK, I've added tests for the koa middleware. The problem with doing this in express is that we're going to need to export 2 different middlewares:

  1. before graphql (add the uploads)
  2. after graphql (wait to send response)

@mike-marcacci
Copy link
Collaborator Author

mike-marcacci commented Jun 7, 2018

OK, so the failing tests appear to happen on master using the node v10 and v8. This is good to know, as I was planning to upgrade from v9 shortly! I'm going to look into this as well.

@mike-marcacci
Copy link
Collaborator Author

OK, so it looks like #80 fixes master on node v10, so I'm going to go ahead and pull that one into this branch as well.

Copy link
Owner

@jaydenseric jaydenseric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding it correctly, this delays resolvers until all of the uploads have been received by the server? I don't think this is the right approach for a few reasons.

It removes the primary benefit of the latest streaming API of apollo-upload-server, shorter response times due to a reduced waterfall uploading to cloud:

If a user is uploading a lot, with the async process a disconnect partway though would result in a partial success rather than a total failure.

Waiting for the uploads to complete is a DoS risk.

If your service takes large uploads (MB, even GB) then would the RAM get smashed, since it is buffering all of the files?

One of the user stories we would like to ultimately achieve, is to allow validation of arguments in the resolvers for an early error response before all the files have uploaded. Imagine a user fills in a form for an invoice on their mobile, and attaches a 3mb photo of the invoice as one of the form fields. It turns out they had a validation typo in an email address field. Ideally they would get a near instant error payload back, so they don't have to wait 3 minutes for the upload to needlessly complete so they can try again.

The fact that some browsers don't respect the HTTP spec and have a client-side fetch error when there is an early response is not really a major concern for us. Our main concern is that the server has rock solid stability and doesn't crash. Reading through the browser bugs, it appears to be resolved with HTTP/2.

A lot of people have the JWT/auth middleware at the start of the middleware chain, so I don't think we can reasonably prevent other middleware from causing early responses. Best we can do is not crash. Besides, we need to handle the client aborting half way through an upload.

return new Promise(resolve => {
upload.file.stream.on('end', resolve)
upload.file.stream.on('error', resolve)
if (!upload.file.stream.readableFlowing) upload.file.stream.resume()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks tricky, what is it for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to interfere with any streams that are being consumed, but we do want to run through the rest. Using readableFlowing let's us know what state the stream is in.

src/test.mjs Outdated
@@ -101,6 +102,66 @@ t.test('Single file.', async t => {
})
})

t.test('Early response.', async t => {
t.jobs = 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.jobs is only necessary when there are multiple subtests. Parallelism is default for top level tests, but child tests a sync unless you set t.jobs > 1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to know! (I'm new to tap.)

src/test.mjs Outdated

const data = fs.readFileSync(TEST_FILE_PATH)

var requestHasFinished = false
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use let. You probably don't even need the = false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; not to get side-tracked, but is there a benefit to let over var for top-level function-scoped variables, other than stylistic consistency?

src/test.mjs Outdated
var requestHasFinished = false
const stream = new Readable()
stream.path = TEST_FILE_PATH
stream._read = () => {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this about? The _ appears to indicate a private method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the old, but still-supported way of implementing streams in node (for Streams 2, which is still the current implementation).

src/test.mjs Outdated
stream.push(null)
}, 10)

return promise
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer await to return here, as it makes it clear the value isn't used back up the chain somehow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

src/test.mjs Outdated
)
},
err => {
throw err
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this promise block would be more elegant as an await with no surrounding try/catch, since we throw on error anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally!

package.json Outdated
@@ -65,9 +65,9 @@
"build:clean": "rm -r lib; mkdir lib",
"build:mjs": "BABEL_ESM=1 babel src -d lib --keep-file-extension",
"build:js": "babel src -d lib",
"build:prettier": "prettier 'lib/**/*.{mjs,js}' --write",
"build:prettier": "prettier 'src/**/*.{mjs,js}' --write && prettier '*.{json,md}' --write",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo all these script changes; for an explanation see 8928ce0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies! Will fix.

@@ -94,7 +94,7 @@ export const processRequest = (
operationsPath.set(path, map.get(fieldName).promise)
}

resolve(operations)
resolve({ operations, map })
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change to the API. A fair few people such as Apollo use processRequest directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point; I will change how this works.

package.json Outdated
"lint:eslint": "eslint . --ext mjs,js",
"lint:prettier": "prettier '**/*.{json,md}' -l",
"lint:prettier": "prettier '*.{json,md}' -l",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, undo.

@jaydenseric
Copy link
Owner

I've just fixed a bug with the tests in master branch (528216f), might want to rebase.

Copy link
Collaborator Author

@mike-marcacci mike-marcacci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks SO much for the super thorough (and incredibly prompt) response! I've gone through your review and left comments where appropriate. I'm going to go ahead and make all the suggested changes shortly.

I did want to correct one thing that I wasn't entirely clear about: this suggested change does not delay resolvers. Rather, it makes sure that the incoming request is complete before sending the response. It doesn't interfere with streams that are currently being consumed, but does resume those that aren't. Nothing is buffered in memory differently than it was, so this doesn't add that kind of DOS vector, although it would potentially allow an attacker to occupy the network for longer.

I completely agree with you that this is something that should be fixed in browsers... but I tested the exact same behavior in Chrome, Firefox, and Safari on mac, and Safari on iOS. At the end of the day, I just needed to be able to send responses back to the client :-/

Your comment about HTTP2 is very encouraging, and it's the next thing I'll try. If that behaves correctly in modern browsers, then I'll gladly replace this PR with a doc change suggesting users use HTTP2 if possible :)

return new Promise(resolve => {
upload.file.stream.on('end', resolve)
upload.file.stream.on('error', resolve)
if (!upload.file.stream.readableFlowing) upload.file.stream.resume()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to interfere with any streams that are being consumed, but we do want to run through the rest. Using readableFlowing let's us know what state the stream is in.

@@ -94,7 +94,7 @@ export const processRequest = (
operationsPath.set(path, map.get(fieldName).promise)
}

resolve(operations)
resolve({ operations, map })
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point; I will change how this works.

package.json Outdated
@@ -65,9 +65,9 @@
"build:clean": "rm -r lib; mkdir lib",
"build:mjs": "BABEL_ESM=1 babel src -d lib --keep-file-extension",
"build:js": "babel src -d lib",
"build:prettier": "prettier 'lib/**/*.{mjs,js}' --write",
"build:prettier": "prettier 'src/**/*.{mjs,js}' --write && prettier '*.{json,md}' --write",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies! Will fix.

src/test.mjs Outdated
@@ -101,6 +102,66 @@ t.test('Single file.', async t => {
})
})

t.test('Early response.', async t => {
t.jobs = 1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to know! (I'm new to tap.)

src/test.mjs Outdated

const data = fs.readFileSync(TEST_FILE_PATH)

var requestHasFinished = false
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; not to get side-tracked, but is there a benefit to let over var for top-level function-scoped variables, other than stylistic consistency?

src/test.mjs Outdated
var requestHasFinished = false
const stream = new Readable()
stream.path = TEST_FILE_PATH
stream._read = () => {}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the old, but still-supported way of implementing streams in node (for Streams 2, which is still the current implementation).

src/test.mjs Outdated
)
},
err => {
throw err
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally!

src/test.mjs Outdated
stream.push(null)
}, 10)

return promise
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@jaydenseric
Copy link
Owner

@mike-marcacci really appreciate your help 👏

I'm on the Apollo slack (and I can see you're on there too), so feel free to chat about any of this stuff, either publicly in the #uploads channel or via DM. My timezone is Melbourne, Australia.

@mike-marcacci
Copy link
Collaborator Author

OK, well, as an update, I wasn't able to get any success with http2, but my real world use case is complex (reverse proxies, etc) so I thought I should go back to first principles. I created a test to try against the browsers on my laptop:

https://gist.github.com/mike-marcacci/5c7f435d51240feb8a871ab348a00dfd

And I can't seem to reproduce the issue on http, https or http2, so I've got to keep digging. I'll post back when I find something.

@@ -22,25 +25,30 @@ class Upload {
this.done = true
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is not used anywhere anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's correct! Removed.

@ForsakenHarmony
Copy link

I need this 👀

Copy link
Owner

@jaydenseric jaydenseric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@jaydenseric jaydenseric merged commit c7567ea into jaydenseric:master Jul 12, 2018
@@ -38,6 +38,7 @@
},
"dependencies": {
Copy link

@koresar koresar Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more @babel/runtime! Yeah! This will remove 3MB from my production docker containers. Thanks!

@terion-name
Copy link

HURRAY! @jaydenseric when to expect this to be tagged on npm?

@jaydenseric
Copy link
Owner

@terion-name I have a few minor things to sneak in, then publish hopefully within the next day. The new version will likely be a semver major, due to the raised min Node.js version. If you use a recent Node.js version it should be a painless upgrade.


const finished = new Promise(resolve => request.on('end', resolve))
const { send } = response
response.send = (...args) => {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this monkey patching the send method for all following express middleware? This seems pretty cheeky.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, this a really ugly thing to do, but it’s the only way I know to “wrap” child middleware in express/connect without requiring them to be aware of the parent middleware... I’ve reluctantly used this pattern in the past and haven’t really run into issues, but if you know of any other way to do this, I’d love to learn!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the top of my head it might break res.write(): https://stackoverflow.com/questions/44692048/what-is-the-difference-between-res-send-and-res-write-in-express

But I don't know any existing code bases (out of dozens I overview or maintain) which use res.write() with express.js.

Copy link
Collaborator Author

@mike-marcacci mike-marcacci Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koresar you're definitely correct that calling res.write() could break some assumptions, and also that doing so is a kind of escape hatch from express. In fact, calling ctx.req.write() would have unintended results in koa as well.

@jaydenseric
Copy link
Owner

@mike-marcacci do you know what might be causing these errors? https://travis-ci.org/jaydenseric/apollo-upload-server/jobs/402915414#L2165

@mike-marcacci
Copy link
Collaborator Author

mike-marcacci commented Jul 13, 2018

I’m going to assume you’re referring to the “Parse Error” in the logs (Travis line links are apparently messed up on mobile). I don’t know off the top of my head, but I suspect those are a case where we’re testing an aborted request, and we propagate our custom events while busboy emits a parse error. I’m not sure how those would get in the logs, but I assume there’s something special going on with Tap. I’ll look at these tomorrow.

@ForsakenHarmony
Copy link

when are you gonna release this?

@jaydenseric
Copy link
Owner

@ForsakenHarmony A few unanticipated issues have popped up. Once we know what is causing the errors mentioned above, and once #89 (comment) is fixed and merged we can publish v6.0.0-alpha.1.

Help resolving these blockers is the fastest path to a release, otherwise it is mostly up to @mike-marcacci's availability. If there is no movement this week I will try to take a day or more off from project work to see what more I can do.

Earlier versions had a simpler (although less efficient) approach using formidable and temp files. I would love to have more Node.js stream experts come on to the project as collaborators.

@mike-marcacci
Copy link
Collaborator Author

@ForsakenHarmony @jaydenseric you're in luck, I've got some time here today :) we didn't have as great cell coverage in our last locations as we had expected. Stay tuned!

@jaydenseric
Copy link
Owner

Published in v6.0.0-alpha.1 🎉

@ForsakenHarmony
Copy link

seems like you published it as a normal release on npm not alpha/beta, just as a heads up

@mike-marcacci
Copy link
Collaborator Author

Thanks for the heads-up @ForsakenHarmony. You're right: it looks like @jaydenseric will need to update npm's "latest" tag to point back at 5.0.0.

@jaydenseric
Copy link
Owner

Thats deliberate actually; v6.0.0-alpha.1 is more stable than v5.0.0 and everyone who can (given the few breaking changes) is advised to update.

@mike-marcacci
Copy link
Collaborator Author

mike-marcacci commented Jul 24, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants