New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
catch link redirections #808
Conversation
@@ -148,6 +148,9 @@ def validate_url(url, lint_ctx, user_agent=None): | |||
if e.code == 429: | |||
# too many requests | |||
pass | |||
elif e.code >= 300 and e.code < 400: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that's a bit too forgiving, I'll see if we can solve the issue with nature.com without allowing other broken redirects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requests
doesn't have the issue seen in https://travis-ci.org/galaxyproject/tools-iuc/jobs/371740724 , maybe we can replace this function with something like the following (not tested):
def validate_url(url, lint_ctx, user_agent=None):
is_valid = True
if user_agent:
headers = {"User-Agent": user_agent, 'Accept': '*/*'}
else:
headers = None
r = None
try:
r = requests.get(url, headers=headers, stream=True)
r.raise_for_status()
next(r.iter_content(1000))
except Exception as e:
if r and r.status_code == 429:
# too many requests
pass
else:
is_valid = False
lint_ctx.error("Error '%s' accessing %s" % (e, url))
if is_valid:
lint_ctx.info("URL OK %s" % url)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My above code works, BUT requests
is HTTP-only, so FTP URLs fail with something like:
Applying linter tool_urls... FAIL
.. ERROR: Error 'No connection adapters were found for 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz'' accessing ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jupp, thats why I ended up with the more relaxed mode. Maybe the more relaxed one is ok, if we turn this into a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using requests
for HTTP and urllib2
for FTP would be the best of both worlds, requests
is so much better than Python standard library. I've implemented this in #809 .
No description provided.