Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote files get imported into file store despite 404 response #1318

Closed
ngehlenborg opened this issue Aug 9, 2016 · 3 comments
Closed

remote files get imported into file store despite 404 response #1318

ngehlenborg opened this issue Aug 9, 2016 · 3 comments

Comments

@ngehlenborg
Copy link
Contributor

ngehlenborg commented Aug 9, 2016

Commit: 33de42c

Steps to reproduce

  1. Import rfc-test-missing-file.zip.
  2. Launch workflow on the file at http://gehlenborg.com/wp-content/uploads/rfc/this-file-not-here.txt to trigger import of the file into the file store. Note that said file does not exist on the server.

Observed behavior

  1. Since the file does not exist, the server will respond with a 404.
  2. Refinery imports the error page sent by the server and logs the file as a successful import.

Expected behavior

  1. Download of the file should fail and nothing should get imported.
  2. Ideally, a warning should be displayed that the download failed.
@ngehlenborg
Copy link
Contributor Author

ngehlenborg commented Aug 9, 2016

This issue is most likely at https://github.com/parklab/refinery-platform/blob/c7bcd768b64aa54e18b60e4355537b94406944c7/refinery/file_store/tasks.py#L103. requests.get(...) doesn't seem to raise exceptions, it is necessary to explicitly trigger them on the response object: http://docs.python-requests.org/en/master/user/quickstart/#response-status-codes

The following code example illustrates the problem. Without the response.raise_for_status() no exception will be raised.

import requests


item = "http://httpbin.org/status/404"

try:
    response = requests.get(item, stream=True)
    # the next line is required to trigger an exception on HTTP error codes
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print "Could not open URL '%s'", item
except requests.exceptions.ConnectionError as e:
    print "Could not open URL '%s'. Reason: '%s'", item, e.reason
except ValueError as e:
    print "Could not open URL '%s'. Reason: '%s'", item, e.message

print "%s", response

@ngehlenborg
Copy link
Contributor Author

There are a couple of other instances that might need exception handling for requests.get(...)as well: https://github.com/parklab/refinery-platform/search?q=%22requests.get%22&type=Code

@scottx611x scottx611x modified the milestones: Barre, Next Aug 9, 2016
@scottx611x scottx611x modified the milestones: Canton, Barre Aug 16, 2016
@scottx611x
Copy link
Member

Addressed by d96ff92.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants