catch link redirections #808

Closed
wants to merge 1 commit into from

Conversation

bgruening (Member):

No description provided.

@bgruening requested a review from nsoranzo, April 29, 2018 18:39
@@ -148,6 +148,9 @@ def validate_url(url, lint_ctx, user_agent=None):
            if e.code == 429:
                # too many requests
                pass
            elif e.code >= 300 and e.code < 400:
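Read in context, the patched error handler treats any 3xx status the same way as the existing 429 special case: the URL is not reported as broken. A minimal, self-contained sketch of that classification (the helper name and return values are illustrative, not part of the linter):

```python
def classify_http_error(code):
    """Hypothetical helper mirroring the patched branch logic in
    validate_url: 429 and 3xx are tolerated, everything else is an error."""
    if code == 429:
        return "ignore"  # too many requests: not the tool's fault
    elif 300 <= code < 400:
        return "ignore"  # redirect: treated as reachable by this patch
    else:
        return "error"   # genuine client/server error
```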
nsoranzo (Member):
Maybe that's a bit too forgiving, I'll see if we can solve the issue with nature.com without allowing other broken redirects.

nsoranzo (Member):

requests doesn't have the issue seen in https://travis-ci.org/galaxyproject/tools-iuc/jobs/371740724, maybe we can replace this function with something like the following (not tested):

    def validate_url(url, lint_ctx, user_agent=None):
        is_valid = True
        if user_agent:
            headers = {"User-Agent": user_agent, "Accept": "*/*"}
        else:
            headers = None
        r = None
        try:
            r = requests.get(url, headers=headers, stream=True)
            r.raise_for_status()
            next(r.iter_content(1000))
        except Exception as e:
            # note: `r is not None`, not `if r` (a 429 Response is falsy)
            if r is not None and r.status_code == 429:
                # too many requests
                pass
            else:
                is_valid = False
                lint_ctx.error("Error '%s' accessing %s" % (e, url))
        if is_valid:
            lint_ctx.info("URL OK %s" % url)
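The `stream=True` plus `iter_content` combination is what keeps this check cheap: only the headers and the first chunk of the body are fetched, never the whole file. A standalone sketch of that idea (function name and timeout are illustrative, not from the suggestion above):

```python
import requests


def url_reachable(url, timeout=10):
    """Cheap reachability probe: fetch headers, then at most one
    ~1000-byte chunk of the body, so large downloads are never
    pulled in full. Illustrative only."""
    try:
        r = requests.get(url, stream=True, timeout=timeout)
        r.raise_for_status()
        next(r.iter_content(1000), None)  # read one small chunk at most
        return True
    except requests.RequestException:
        return False
```

Note that any non-HTTP(S) scheme fails immediately with `InvalidSchema` (a `RequestException` subclass), which is exactly the FTP limitation discussed below.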

nsoranzo (Member), Apr 30, 2018:

My above code works, BUT requests is HTTP-only, so FTP URLs fail with something like:

Applying linter tool_urls... FAIL
.. ERROR: Error 'No connection adapters were found for 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz'' accessing ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

bgruening (Member, Author):

Yep, that's why I ended up with the more relaxed mode. Maybe the more relaxed one is OK if we turn this into a warning?

nsoranzo (Member):

I think using requests for HTTP and urllib2 for FTP would be the best of both worlds; requests is so much better than the Python standard library. I've implemented this in #809.
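The split described here can be sketched as a scheme-based dispatch. The helper below is hypothetical and only illustrates the routing idea, not the actual #809 implementation:

```python
from urllib.parse import urlparse


def pick_checker(url):
    """Route by URL scheme: requests only ships HTTP(S) adapters,
    so everything else (e.g. FTP) falls back to the standard
    library's urlopen-based check. Name is illustrative."""
    if urlparse(url).scheme in ("http", "https"):
        return "requests"
    return "urllib"
```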

@bgruening closed this May 6, 2018
@bgruening deleted the redirected_links branch May 6, 2018 10:48
2 participants