Linkchecker: option to override user agent #1331
Comments
From MarioVilas on 2013-12-20 09:55:12+00:00 I'm also getting strange errors from other sites. For example, code.activestate.com throws 405 Method Not Allowed errors (it's not the only site that does so; this may be related to the web server software rather than a specific site configuration), and WordPress blogs also don't seem to like it (they give empty responses). The 405 errors appear to be related to the use of HEAD, which is not mandatory in HTTP. Instead of failing, linkcheck.py should retry with the GET method. |
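The retry suggested in the comment above could look roughly like the sketch below. It uses only the Python standard library and is not linkcheck.py's actual code; the function name `check_link`, its parameters, and the injectable `opener` argument are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def check_link(url, user_agent="Mozilla/5.0", timeout=10.0, opener=None):
    """Return the HTTP status for url, trying HEAD first and GET as a fallback.

    Hypothetical helper for illustration; not part of Sphinx.
    Connection-level failures (urllib.error.URLError) are not caught here.
    """
    opener = opener or urllib.request.build_opener()

    def status_for(method):
        request = urllib.request.Request(
            url, method=method, headers={"User-Agent": user_agent}
        )
        try:
            with opener.open(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses arrive as exceptions; keep the status code.
            return err.code

    status = status_for("HEAD")
    if status >= 400:
        # Some servers reject HEAD (e.g. 405 Method Not Allowed), so retry with GET.
        status = status_for("GET")
    return status
```

A link would then be reported as broken only if the GET request fails as well.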
From Takayuki Shimizukawa on 2013-12-22 03:50:01+00:00 I confirmed with sourceforge.com:
and also confirmed with code.activestate.com:
I think linkcheck should:
However, I wonder why |
From Georg Brandl on 2014-01-12 07:45:48+00:00 This should be in 1.3 as a new feature... |
From Takayuki Shimizukawa on 2014-01-12 08:49:18+00:00 Georg Brandl: Ah, sorry. I was mistaken. |
From Georg Brandl on 2014-01-12 23:06:39+00:00 Well, I should have fixed it with fa7c50ffb46f by retrying. I don't think a selection of the user-agent is necessary anymore. |
I'm not sure whether to open a new issue for this or not. After #6381 added sphinx.util.requests.useragent_header to linkcheck's headers, linkcheck now reports "broken" on links that are not broken. For example, http://doc.pytest.org/, http://keras.io/, http://www.coxlab.org/, and http://www.cvlibs.net/.
The problem is caused by sphinx.util.requests.useragent_header.
Reverting #6381 fixes the problem.
The reason #6381 made the difference was that prior to #6381, linkcheck overrode the default headers of sphinx.util.requests.head to not use sphinx.util.requests.useragent_header.
It looks like a lot of people are now having to add linkcheck ignores due to sites rejecting the linkcheck user-agent, e.g. mtbc/ome-documentation@41f2e06.
I don't know the best way to handle this. At maximum complexity, we could have a linkcheck_user_agents option, similar to linkcheck_ignore but a dictionary mapping URLs to User-Agent strings to use. (With None meaning to pass nothing and let
We could allow specifying a User-Agent as a command-line option to use for all links in that run of the builder; that may be what @kristian-kolev had in mind.
Really, speaking from my own perspective, an option to just disable the use of sphinx.util.requests.useragent_header would be plenty.
I don't know where the magic string
It doesn't appear to be documented anywhere, and Google doesn't turn up any indication that there's anything particularly special about that string. Do we even need it? Apparently it fixed #6378; I'm not clear on how or why. When I try the URL from #6378, I get an SSLCertVerificationError:
|
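To make the proposed mapping concrete, here is a minimal sketch of how a per-URL User-Agent lookup might work. The `linkcheck_user_agents` option is only the suggestion from the comment above and does not exist in Sphinx as far as this thread shows; the helper names, the use of regular-expression keys, the `example.org` entry, and the default string are all assumptions for illustration.

```python
import re
from typing import Dict, Mapping, Optional

DEFAULT_USER_AGENT = "Mozilla/5.0"  # placeholder default, not Sphinx's real header

# Hypothetical configuration: URL pattern -> User-Agent string.
# None means "send no explicit User-Agent header" for matching URLs.
linkcheck_user_agents: Dict[str, Optional[str]] = {
    r"http://keras\.io/": None,
    r"https?://example\.org/": "Mozilla/4.0",
}


def pick_user_agent(
    url: str,
    overrides: Mapping[str, Optional[str]],
    default: Optional[str] = DEFAULT_USER_AGENT,
) -> Optional[str]:
    """Return the User-Agent for url; the first matching pattern wins."""
    for pattern, agent in overrides.items():
        if re.match(pattern, url):
            return agent
    return default


def build_headers(url: str, overrides: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Build request headers, omitting User-Agent when the override is None."""
    agent = pick_user_agent(url, overrides)
    return {} if agent is None else {"User-Agent": agent}
```

With this shape, the simpler request in the same comment (just disabling the custom header) would amount to a single catch-all pattern mapped to None.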
Note: It seems to succeed from my local machine.
But access to stm32duino.com also fails from my local machine.
|
I also don't know where the User-Agent string came from. But I agree it is too old for the real world. So +1 for replacing it with a new one, and +0 for adding a configuration option for those who want to keep using the old User-Agent. |
Huh. I tried at https://www.pythonanywhere.com/try-ipython/ earlier to verify, and it got the 403 too:
...but now that I go back and check the regular requests package on https://www.pythonanywhere.com/try-ipython/, that gets a 403 error too. (Only for the URLs above; most URLs give 200 as normal.)
I...have no idea how to account for that. I assume everything must have something to do with User-Agent strings, because reverting that one change makes my build pass locally and on two different CI servers. But it seems that...I guess...the version of
|
Close #1331: Change default User-Agent header
Now #1331 is merged. It will be included in the next release. |
Thank you! Looks like this was merged into the 2.0 branch, so I'm installing sphinx@2.0 to make things work. (It's still the same on master. I guess not all changes go into master? I guess I don't really understand https://github.com/sphinx-doc/sphinx/blob/master/CONTRIBUTING.rst#branch-model. In any case, sphinx@2.0 is working great.) |
I have now merged the 2.0 branch into the master branch. We sometimes do this by hand. Sorry for the delay! |
linkcheck.py currently hardcodes a 'Mozilla/5.0' user agent to simulate a browser, which works with most sites.
But Sourceforge resets the connection for that particular string. Interestingly enough, it works OK for other user agents, including 'Mozilla/4.0'.
It may be the case that other websites exhibit similar quirks, and it would be nice if we could specify a string to be used as the user agent in conf.py.
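For readers looking for the conf.py setting requested here: later Sphinx releases document a `user_agent` configuration value that is used for HTTP access such as linkcheck. The fragment below is a sketch that assumes your installed Sphinx version supports that option; the string itself is only an example, chosen because 'Mozilla/4.0' is reported above to work with Sourceforge.

```python
# conf.py (fragment)
# Assumption: the installed Sphinx supports the `user_agent` configuration value.
# The string is an illustrative placeholder; use whatever the target sites accept.
user_agent = "Mozilla/4.0"
```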