Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default Accept header to the linkcheck #5140

Merged
merged 2 commits into from Jul 29, 2018

Conversation

honzajavorek
Copy link
Contributor

@honzajavorek honzajavorek commented Jul 3, 2018

Adds default Accept header to the linkcheck as some servers have troubles to respond correctly without it. The added Accept header resembles what Firefox sets by default.

Feature or Bugfix

  • Feature

Purpose

There are situations where requested server replies with a different content (in my particular case HTTP 404) when there is no accept header, possibly because it evaluates the content negotiation to an API request instead of a browser request. This change adds a default Accept header, which equals to what my Firefox sets out of the box to its requests.

I stumbled upon this when checking a link to https://crates.io/crates/dredd-hooks. While

curl -i https://crates.io/crates/dredd-hooks

returns HTTP 404, following results in an expected HTTP 200 response with HTML body:

curl -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -i https://crates.io/crates/dredd-hooks

There are situations where requested server replies with a different content (in my particular case HTTP 404) when there is no accept header, possibly because it evaluates the content negotiation to an API request instead of a browser request. This change adds a default Accept header, which equals to what my Firefox sets out of the box to its requests.

I stumbled upon this when checking a link to https://crates.io/crates/dredd-hooks. While

curl -i https://crates.io/crates/dredd-hooks

returns HTTP 404, following results in an expected HTTP 200 response with HTML body:

curl -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -i https://crates.io/crates/dredd-hooks
@honzajavorek honzajavorek changed the title Add default Accept header Add default Accept header to the linkcheck Jul 3, 2018
honzajavorek added a commit to apiaryio/dredd that referenced this pull request Jul 3, 2018
@codecov
Copy link

codecov bot commented Jul 3, 2018

Codecov Report

Merging #5140 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5140      +/-   ##
==========================================
- Coverage   82.31%   82.29%   -0.02%     
==========================================
  Files         297      292       -5     
  Lines       39212    38888     -324     
  Branches     6033     5979      -54     
==========================================
- Hits        32276    32003     -273     
+ Misses       5602     5558      -44     
+ Partials     1334     1327       -7
Impacted Files Coverage Δ
sphinx/builders/linkcheck.py 73.8% <100%> (-0.13%) ⬇️
sphinx/quickstart.py
sphinx/apidoc.py
sphinx/errors.py
sphinx/search/__init__.py
sphinx/__init__.py

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1140e81...271e1d1. Read the comment docs.

'allow_redirects': True,
'headers': {
'Accept': ('text/html,application/xhtml+xml,'
'application/xml;q=0.9,*/*;q=0.8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why application/xml is needed. Can Sphinx recognize XML document?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to test it again without application/xml to verify, but in my opinion even without it, it would still solve the issue I had with crates.io.

The reason I added the header like this is to mimic a real browser, 1:1. So I just took whatever my own Firefox is sending.

Since the link check verifies whether the external links in the docs are "not dead" and still accessible for humans reading the docs and clicking the links, displaying them in their browsers, I think it is a correct approach. If browsers are generally fine with XML and prefer it over other media types, then I think it should be there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linkcheck builder uses HTMLParser to parse the response. I'm okay if it can handle any XML data. To send Accept: application/xml means the client can accept it as a data format. But I don't know our builder supports it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Even application/xml not present, creates.io returns its contents:

$ curl -H 'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8' -i https://crates.io/crates/dredd-hooks
HTTP/1.1 200 OK
...

Copy link
Member

@tk0miya tk0miya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits

@tk0miya tk0miya added extensions type:proposal a feature suggestion labels Jul 16, 2018
@tk0miya tk0miya added this to the 1.8.0 milestone Jul 16, 2018
tk0miya added a commit to tk0miya/sphinx that referenced this pull request Jul 29, 2018
tk0miya added a commit to tk0miya/sphinx that referenced this pull request Jul 29, 2018
@tk0miya
Copy link
Member

tk0miya commented Jul 29, 2018

To step forward, I just made #5226 which does not contain application/xml in Accept header.
I'll merge it after tests passed. Please let me know if you have any opinion for #5226.
Thanks,

@tk0miya tk0miya merged commit 271e1d1 into sphinx-doc:master Jul 29, 2018
@honzajavorek
Copy link
Contributor Author

Thank you @tk0miya!

@honzajavorek honzajavorek deleted the patch-1 branch August 7, 2018 14:20
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
extensions type:proposal a feature suggestion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants