Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RST packet instead of a FIN packet to close the response on Heroku #3480

Closed
jmorrell opened this issue Apr 19, 2017 · 7 comments
Closed

RST packet instead of a FIN packet to close the response on Heroku #3480

jmorrell opened this issue Apr 19, 2017 · 7 comments
Assignees
Labels
Milestone

Comments

@jmorrell
Copy link

@jmorrell jmorrell commented Apr 19, 2017

Are you sure this is an issue with hapi or are you just looking for some help?

Everything points to this being an error in hapi (or perhaps Node streams), but I've not yet been able to reproduce it outside of Heroku's environment.

What are you trying to achieve or the steps to reproduce?

One of our clients came to us experiencing many H18 errors with their Hapi app in production. They kindly created a minimal example of the issue, which is a Hapi hello-world server: https://github.com/kimmobrunfeldt/heroku-hapi-503-bug You can deploy this to Heroku to try it yourself with one click.

To trigger the error you can curl the service with a small data payload:

curl -d'a' https://heroku-hapi-503-bug.herokuapp.com

What was the result you received?

About 30% of the time this will work as expected, and about 70% of the time this will result in a 503 error. Weird.

Concerned that this was a bug in our router I escalated this to our Platform team. One thing that immediately jumped out was that attaching strace to the Node instance immediately resolved the issue producing only 404s. Tricky.

tcpdump

So we ran tcpdump on the response generated by the application.

Here's what's happening during a 503:

68747470733a2f2f646c2e64726f70626f7875736572636f6e74656e742e636f6d2f732f6167323271326c70386b62703479712f323031372d30342d30362532306174253230322e3436253230504d2e706e67

Compared to what happens during a 404:

68747470733a2f2f646c2e64726f70626f7875736572636f6e74656e742e636f6d2f732f666e6e77363635636262667a3279772f323031372d30342d30362532306174253230322e3436253230504d253230253238312532392e706e67

The server sends a RST packet instead of a FIN packet to close the response

Full packet capture is available here: 462013.pcap.zip

streams

Playing around with the internals, I ended up making the problem go away by modifying lib/transmit.js https://github.com/hapijs/hapi/blob/master/lib/transmit.js#L298 by adding an additional stream that did nothing more than pipe the data along:

const through = require('through');

const donothing = through(
  function (data) {
    this.emit('data', data);
  },
  function () {
    this.emit('end');
  }
);
    const tap = response._tap();
    const preview = (tap ? source.pipe(tap) : source);
    const compressed = (compressor ? preview.pipe(compressor) : preview);
    const ranged = (ranger ? compressed.pipe(ranger) : compressed);
    ranged.pipe(donothing).pipe(request.raw.res);

Though I'm unsure of what the implications of that are.

git bisect

Finally, I realized that even if we don't understand the underlying cause, we have a reproducible test case. I tested an older version of Hapi that did not suffer from the same issue and created a git bisect script to narrow down when this might have been introduced: https://github.com/jmorrell/heroku-hapi-H18-bisect

This points to this commit as the one that introduced this odd behavior.

98d34046392006540b64a799b6ca58150d08df2c is the first bad commit
commit 98d34046392006540b64a799b6ca58150d08df2c
Author: Eran Hammer <eran@hammer.io>
Date:   Mon Oct 26 00:49:23 2015 -0700

    Skip most lifecycle on not found and bad path. Closes #2867

:100755 100755 1cd1680739ce6bf95b76abe496e26534d094beaa 2decb33f2a69bb7d22862c6c5b54fef83c437e21 M      API.md
:040000 040000 87075a6a7a8ab1d85b4b4d259ce14a2b16870dab 481e5da93ba66490cc62ca1fc242d6f153daa701 M      lib
:040000 040000 84db0ba536bb4f4e8290b5c29ce88d7f6ce1c1c7 00ed8d03c69ce0759262669c81c882dc1c20d890 M      test
bisect run success

Context

  • node version: 7.9.0
  • hapi version: Many affected
  • os:
~ $ uname -a
Linux 3d243743-fcf2-4103-8c24-72657c71ef36 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
@jmorrell
Copy link
Author

@jmorrell jmorrell commented Apr 19, 2017

cc @kimmobrunfeldt and @eetu who opened the original support ticket

@kanongil
Copy link
Contributor

@kanongil kanongil commented Apr 19, 2017

I suspect what happens here, is that hapi receives the POST request with the headers in 1 packet, which it immediately starts to process. Now, since there is no request handler, it doesn't wait to receive the payload in the next packet before sending the response, which causes node to close the connection.

This means that when the next packet arrives with the 1 byte payload (a), the OS rejects it and sends the RST tcp response to signal this.

@kimmobrunfeldt
Copy link

@kimmobrunfeldt kimmobrunfeldt commented Apr 19, 2017

Thank you @jmorrell for taking the time to debug this issue 👍

@kanongil This bug can also be reproduced (in Heroku) when there are routes configured. I made a new branch to the original example repository which contains a GET /test route: https://github.com/kimmobrunfeldt/heroku-hapi-503-bug/blob/v2/index.js. This v2 branch has been deployed to Heroku and can be tested with curl -XGET -d'a' https://heroku-hapi-503-bug-v2.herokuapp.com/test.

I didn't verify this on TCP level as @jmorrell did but I'm quite confident this would be the same bug as we are experiencing without any routes configured.

@kanongil
Copy link
Contributor

@kanongil kanongil commented Apr 19, 2017

@kimmobrunfeldt In reality this is probably a node issue, though one that hapi / heroku might have to work around.

I am able to replicate this error 100% on my local machine, without hapi & heroku, using this server:

var Http = require('http');

var server = Http.createServer();
server.listen(8000);

server.on('request', function (req, res) {
    res.end('hi');
});

and this client:

const Net = require('net');

const socket = new Net.Socket({
    allowHalfOpen: true
});

socket.connect({
    port: 8000
}, () => {

    socket.write('GET / HTTP 1.1\r\nHost: test\r\nContent-Length: 1\r\nConnection: close\r\n\r\n', 'ascii', () => {

        setTimeout(() => {

            socket.write('a');
            socket.end();
        }, 100);
    });
});

Note that I don't actually try to receive anything on the client, but I can observe the same RST TCP packet as above.

@kimmobrunfeldt
Copy link

@kimmobrunfeldt kimmobrunfeldt commented Apr 27, 2017

Ok @kanongil thanks for the test case. I don't really know what would be the best place to fix this without waiting for quite long nodejs release cycle. Could we fix this temporarily to hapi, and file an issue to Node?

I couldn't reproduce this issue with similar express server, which sounded a bit odd if the bug is in Node. Maybe they have implemented a workaround for the issue?

@hueniverse
Copy link
Contributor

@hueniverse hueniverse commented Jul 19, 2017

@kimmobrunfeldt your modified version still suffers from the same issue because you are making a GET request which should not have a payload, therefore it is ignored by hapi, similar to a 404.

@hueniverse
Copy link
Contributor

@hueniverse hueniverse commented Jul 19, 2017

There are a few paths in which the server will skip reading the request payload:

  • GET or HEAD request
  • 404
  • Invalid path that fails parsing

In case of a request with GET/HEAD, I am not going to handle that case because it would be super wasteful to read payload when they should never be sent. In that case, I don't care if the client is getting a bad response. They are sending garbage as far as I'm concerned.

In the other cases, we should drain the request stream as we do with other partial payload reads.

@hueniverse hueniverse self-assigned this Jul 19, 2017
@hueniverse hueniverse added the bug label Jul 19, 2017
@hueniverse hueniverse added this to the 16.5.0 milestone Jul 19, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants