Set a User-Agent for HTTP requests from :elinks #1417

Merged 1 commit on Apr 27, 2019

@da2x da2x commented Apr 27, 2019

Detailed description

  • The default “Ruby” User-Agent is widely blocked, so replace it.
  • Include “Mozilla/5.0” to pass many blocking filters; for legacy reasons, much of the web expects this token.
  • Include Nanoc software name and version to identify the software making the request.
  • Describe the purpose of the request to help webmasters make informed blocking decisions.

Proposed User-Agent string:

Mozilla/5.0 Nanoc/4.11.2 (link rot checker)
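
For illustration, here is a minimal sketch of how a header like this could be set with Ruby’s Net::HTTP. It is not the actual patch in this pull request: `user_agent_for` and `build_request` are hypothetical helper names, and the version is passed in by hand, whereas the real checker would presumably read it from Nanoc itself.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the User-Agent string proposed above.
# The version is an explicit argument here for illustration only.
def user_agent_for(version)
  "Mozilla/5.0 Nanoc/#{version} (link rot checker)"
end

# Hypothetical helper: builds a HEAD request that carries the custom
# header. Nothing is sent over the network until the request is
# handed to a Net::HTTP connection.
def build_request(url, version)
  uri = URI.parse(url)
  req = Net::HTTP::Head.new(uri)
  # Overrides the "Ruby" User-Agent that Net::HTTP sets by default.
  req['User-Agent'] = user_agent_for(version)
  req
end

req = build_request('https://example.com/', '4.11.2')
puts req['User-Agent']
```

Keeping the string in one small helper would make it easy to change later if the format needs adjusting.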

To do

  • Agree on User-Agent string.
  • Add to change log.

More details

In my own set of 1800 links, this change reduces the number of HTTP 403 Forbidden responses from 482 to 17. (Most of these are probably hosted by a single large provider like Cloudflare.) This could probably be reduced even further by faking a mainstream web browser’s User-Agent string. However, I’d rather live with a handful of blocked tests than start faking User-Agents.

@ddfreyne ddfreyne commented Apr 27, 2019

Ooh, I like it. I think the User Agent that you propose is good, and I suppose it can be changed in the future if needed.

Could be useful to have a test for this, but probably not important. (I also realised that I haven’t really set up proper testing for the external links checker yet anyway…)

@ddfreyne ddfreyne merged commit 7410fed into nanoc:master Apr 27, 2019
21 checks passed