Hapijs directory route not being scraped by Facebook behind CloudFront #60

Closed
ScottDowne opened this issue May 2, 2016 · 5 comments
Labels: bug

@ScottDowne ScottDowne commented May 2, 2016

I initially filed this at hapijs/hapi#3132 and was directed here.

Has anyone else hit this?

I have this site, https://donate.mozilla.org/en-US/, which is served by a hapi server. In this case it's serving a static HTML file: https://github.com/mozilla/donate.mozilla.org/blob/master/server.js#L352-L358

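 // Path is Node's built-in 'path' module; the directory handler is provided by the inert plugin.
 const Path = require('path');

 server.route([{
   method: 'GET',
   path: '/{params*}',
   handler: {
     directory: {
       path: Path.join(__dirname, 'public')
     }
   }
 }]);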

It works fine as a file server. However, when it interacts with CloudFront and Facebook's scraper, something breaks. I don't fully understand what's happening, but here is what I can piece together:

The hapi server sends the file contents with Transfer-Encoding: chunked.

CloudFront's documentation (http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RangeGETs.html) then says: "If the viewer makes a Range GET request and the origin returns Transfer-Encoding: chunked, CloudFront returns the entire object to the viewer instead of the requested range."

Facebook's scraper then chokes because the response it receives is not the size it expected for the requested range.
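For context on the first step: with Transfer-Encoding: chunked, the body is sent as a series of length-prefixed chunks terminated by a zero-length chunk, so the response carries no Content-Length and the total size is only known once the last chunk arrives. The sketch below is an illustrative encoder of that HTTP/1.1 framing (RFC 7230, section 4.1); the function name encodeChunked is hypothetical and not part of hapi, inert, or Node's API, all of which handle this framing for you.

```javascript
// Hypothetical helper illustrating HTTP/1.1 chunked framing (RFC 7230, section 4.1).
// Node's http module does this automatically; this is only for illustration.
function encodeChunked(chunks) {
  const parts = [];
  for (const chunk of chunks) {
    const data = Buffer.from(chunk);
    if (data.length === 0) continue; // a zero-size chunk would terminate the body early
    // Each chunk: size in hexadecimal, CRLF, the bytes themselves, CRLF.
    parts.push(Buffer.from(data.length.toString(16) + '\r\n'), data, Buffer.from('\r\n'));
  }
  // The last chunk has size 0, followed by an empty trailer section and a final CRLF.
  parts.push(Buffer.from('0\r\n\r\n'));
  return Buffer.concat(parts);
}

console.log(JSON.stringify(encodeChunked(['hello', ' world']).toString()));
// prints "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
```

Because the length is spread across the chunk headers rather than declared up front, an intermediary like CloudFront cannot know the full size before streaming begins, which is consistent with the documented rule of returning the entire object instead of a range.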

You can test that here: https://developers.facebook.com/tools/debug/og/object/

Paste in: http://donate.mozilla.org/en-US/thunderbird/

Then click "fetch new scrape information"

Facebook has provided me with a curl command that simulates what their scraper does:

curl -G -v --compressed -H "Range: bytes=0-500000" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://donate.mozilla.org/en-US/thunderbird/share/"

It fails with: curl: (18) transfer closed with 4317 bytes remaining to read

The response also doesn't include Transfer-Encoding: chunked.

If I curl the origin server directly, bypassing CloudFront:

curl -G -vv --compressed -H "Range: bytes=0-500000" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://donate-mozilla-org-us-prod.herokuapp.com/en-US/"

I get back Transfer-Encoding: chunked. My guess is that CloudFront then returns something to Facebook that Facebook doesn't expect.

Thoughts? Can I just turn off Transfer-Encoding: chunked, and if so, how would I do that with the static directory handler?

@kanongil kanongil (Member) commented May 2, 2016

Thanks for the detailed report. I have investigated further, and it appears that you have encountered a bug in how inert handles range requests for compressed responses.

The response to the request should be a plain 200 without any range shenanigans. I will look into a fix.
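The behavior described here, ignoring Range whenever the body will be compressed and answering with a plain 200, can be sketched as a small decision function. This is a hypothetical illustration for a single byte range, not inert's actual implementation; rangeDecision and its parameters rangeHeader, size, and willCompress are names invented for this sketch.

```javascript
// Hypothetical sketch: how a static-file handler might treat a Range header.
// Not inert's implementation; only a single "bytes=" range is handled.
function rangeDecision({ rangeHeader, size, willCompress }) {
  const full = { status: 200, start: 0, end: size - 1 };

  // Range offsets refer to the uncompressed file. Once the body is gzipped its
  // length no longer matches, so the safe answer is the whole resource as a 200.
  if (willCompress || !rangeHeader) {
    return full;
  }

  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader.trim());
  if (!match || (match[1] === '' && match[2] === '')) {
    // Malformed or multi-range headers: a server may ignore Range entirely.
    return full;
  }

  let start;
  let end;
  if (match[1] === '') {
    // Suffix form "bytes=-N": the last N bytes of the file.
    const suffix = Number(match[2]);
    if (suffix === 0) {
      return { status: 416 };
    }
    start = Math.max(size - suffix, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    if (match[2] !== '' && Number(match[2]) < start) {
      return full; // "first > last" is syntactically invalid, so ignore the header
    }
    // Open-ended "bytes=N-" runs to the end; clamp ranges that overshoot the file.
    end = match[2] === '' ? size - 1 : Math.min(Number(match[2]), size - 1);
  }

  if (start >= size) {
    return { status: 416 }; // Range Not Satisfiable
  }
  return { status: 206, start, end };
}

console.log(rangeDecision({ rangeHeader: 'bytes=0-500000', size: 10000, willCompress: true }));
// { status: 200, start: 0, end: 9999 }
```

Serving the full body with a 200 is always a valid response to a Range request, which is why it is the conservative choice whenever the byte offsets cannot be honored.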

@hueniverse hueniverse (Member) commented May 2, 2016

@kanongil anything to do here?

@kanongil kanongil closed this in bd6a007 May 4, 2016
kanongil added a commit to kanongil/inert that referenced this issue May 4, 2016
@kanongil kanongil (Member) commented May 4, 2016

I have published inert@3.2.1 which contains a fix for range requests on compressible contents. This should fix the issue. Let me know how it goes.

@hueniverse hueniverse (Member) commented May 5, 2016

No milestone?

@ScottDowne ScottDowne (Author) commented May 5, 2016

Fantastic! 🎉 I pulled in the changes today, worked like a charm! Thanks very much for the quick response!

It's not shipped to our prod site yet, but I'll do that tomorrow.

@kanongil kanongil added this to the 4.0.0 milestone May 9, 2016
3 participants