Combat link decay with Nikola, check redirects

Recommend updating permanent redirects to their new location. Warn
about temporary redirects. Follow the redirects and check on their
targets too.
da2x committed Aug 18, 2015
1 parent e36d230 commit beeacb0ee6842cc64ebf096886298bba96702744
Showing with 17 additions and 1 deletion.
  1. +1 −0 CHANGES.txt
  2. +16 −1 nikola/plugins/command/check.py
@@ -4,6 +4,7 @@ New in master
Features
--------

* Checking remote links also checks redirects (nikola check -lr)
* Update suggested license to its latest version (Issue #1950)
* Add Punjabi language, by Jasdeep Singh (Issue #1940)
* New option to use custom, and several ``TEASER_END`` values
@@ -230,12 +230,27 @@ def analyze(self, fname, find_sources=False, check_remote=False):
# Check the remote link works
req_headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0 (Nikola)'}  # I’m a real boy!
resp = requests.head(target, headers=req_headers)

# Retry client errors (4xx) as GET requests
if resp.status_code >= 400 and resp.status_code <= 499:
    time.sleep(0.5)
    resp = requests.get(target, headers=req_headers)

self.checked_remote_targets[target] = resp.status_code
# Follow redirects and see where they lead, redirects to errors will be reported twice
if resp.status_code in [301, 302, 307, 308]:
    redir_status_code = resp.status_code
    time.sleep(0.5)
    resp = requests.head(target, headers=req_headers, allow_redirects=True)
    # Permanent redirects should be updated
    if redir_status_code in [301, 308]:
        self.logger.error("Remote link moved PERMANENTLY to \"{0}\" and should be updated in {1}: {2} [HTTP: {3}]".format(resp.url, filename, target, redir_status_code))
    if redir_status_code in [302, 307]:
        self.logger.warn("Remote link temporarily redirected to \"{0}\" in {1}: {2} [HTTP: {3}]".format(resp.url, filename, target, redir_status_code))
    self.checked_remote_targets[resp.url] = resp.status_code
    self.checked_remote_targets[target] = redir_status_code
else:
    self.checked_remote_targets[target] = resp.status_code

if resp.status_code > 399:  # Error
    self.logger.warn("Broken link in {0}: {1} [Error {2}]".format(filename, target, resp.status_code))
    continue
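
The redirect handling in this hunk can be tried outside Nikola with a small standalone sketch. It is not the plugin's code: `check_remote_link`, `FakeResponse`, and the injected `head` callable are hypothetical names introduced here so the logic runs without network access, and the 4xx GET retry and the sleeps are left out for brevity. It assumes `head` behaves like `requests.head`, which does not follow redirects unless `allow_redirects=True` is passed.

```python
from collections import namedtuple

# Status codes as grouped in the commit: permanent vs. temporary redirects.
PERMANENT = (301, 308)
TEMPORARY = (302, 307)

# Hypothetical stand-in for requests.Response (only the attributes used here).
FakeResponse = namedtuple("FakeResponse", ["status_code", "url"])


def check_remote_link(target, head, checked):
    """Check one link and return a list of (level, message) reports.

    `head(url, allow_redirects=...)` is an assumed callable returning an
    object with `status_code` and `url`, e.g. a wrapper around requests.head.
    `checked` is a dict that records the status code seen for each URL.
    """
    reports = []
    resp = head(target, allow_redirects=False)
    first_status = resp.status_code
    checked[target] = first_status

    if first_status in PERMANENT + TEMPORARY:
        # Follow the redirect chain and record the final target as well.
        resp = head(target, allow_redirects=True)
        checked[resp.url] = resp.status_code
        if first_status in PERMANENT:
            reports.append(("error", "moved permanently to {0}".format(resp.url)))
        else:
            reports.append(("warning", "temporarily redirected to {0}".format(resp.url)))

    # A redirect that ends in an error is reported a second time here.
    if resp.status_code > 399:
        reports.append(("warning", "broken link {0} [Error {1}]".format(target, resp.status_code)))
    return reports
```

In real use, `head` could be something like `lambda url, allow_redirects: requests.head(url, headers=req_headers, allow_redirects=allow_redirects)`. Passing the callable in keeps the redirect policy testable with canned responses.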