Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote: http: raise exception when response with error status code #2794

Merged
merged 6 commits into from
Nov 29, 2019

Conversation

pared
Copy link
Contributor

@pared pared commented Nov 14, 2019

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addresses. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Fixes #2510

@pared pared force-pushed the 2510 branch 2 times, most recently from 4ae7fe0 to 4f898fe Compare November 14, 2019 14:54
@@ -338,3 +338,11 @@ def __init__(self, url, cause=None):
),
cause=cause,
)


class HTTPErrorStatusCodeException(DvcException):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to invent a new exception class for each situation? This would be a nightmare if people will some time in the future use Repo class and try to handle errors. Also creates bloat.

This is question goes to everybody not only @pared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that might be excessive in some situations, though, what alternative do we have?
Do you think I should use requests.HTTPError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably discuss it separately. I didn't mean this a blocker for this particular PR, just used it as an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it can't be requests.HTTPError as we need to descend this from DvcException().

Comment on lines 346 to 347
"Server responded with error status code: '{}' and message: "
"'{}'".format(code, reason)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code and reason should go next to each other, separate by space. This how it is presented everywhere. We might also want to say that this is an HTTP error in the message.

@@ -43,6 +43,7 @@
from tests.func.test_data_cloud import get_ssh_url
from tests.func.test_data_cloud import TEST_AWS_REPO_BUCKET
from tests.func.test_data_cloud import TEST_GCP_REPO_BUCKET
from tests.utils.httpd import ContentMD5Handler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't split imports like that anymore)

@@ -13,3 +17,23 @@ def test_no_traverse_compatibility(dvc_repo):

with pytest.raises(ConfigError):
RemoteHTTP(dvc_repo, config)


@pytest.mark.parametrize("response_code", [404, 403, 500])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in testing it 3 times?


with pytest.raises(HTTPErrorStatusCodeException):
remote._download(
URLInfo(os.path.join(url, "file.txt")), "file.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply use URLInfo(url) / "file.txt" or use URLInfo from the start. This will save you from an error of using os.path instead of posixpath.

self._lock.acquire()
handler_class = ETagHandler if handler == "etag" else ContentMD5Handler
self.response_handler = handler_class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to store this to an attribute?

@efiop
Copy link
Contributor

efiop commented Nov 19, 2019

@Suor @MrOutis Please review.

@efiop
Copy link
Contributor

efiop commented Nov 20, 2019

@pared Please rebase 😉

class HTTPError(DvcException):
def __init__(self, code, reason):
super(HTTPError, self).__init__(
"HTTP error: '{} {}'".format(code, reason)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will show:

HTTPError: HTTP Error: 404 Not Found

request = self._request("GET", from_info.url, stream=True)
response = self._request("GET", from_info.url, stream=True)
if response.status_code != 200:
raise HTTPError(response.status_code, response.reason)
with Tqdm(
total=None if no_progress_bar else self._content_length(from_info),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
total=None if no_progress_bar else self._content_length(from_info),
total=None if no_progress_bar else self._content_length(response),

I wonder why we are still doing this extra request, also ._content_length() doesn't need to work with url really.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside of your PR actually. But since you are here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't the extra request done only if response does not contain Content-Length?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass url (path_info) and not response then you always make a second request. _content_length() should stop accepting urls and might be even inlined, this will prevent this inefficiency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is right, sorry, I'll change accordingly. Though, I would leave _content_length as it is, in order to not obstruct _download method content.

@@ -45,7 +47,7 @@ def _download(self, from_info, to_file, name=None, no_progress_bar=False):
disable=no_progress_bar,
) as pbar:
with open(to_file, "wb") as fd:
for chunk in request.iter_content(chunk_size=self.CHUNK_SIZE):
for chunk in response.iter_content(chunk_size=self.CHUNK_SIZE):
fd.write(chunk)
fd.flush()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need flush. It is probably a remnant from the time we did http resume.

tests/func/test_repro.py Show resolved Hide resolved
Comment on lines 27 to 30
class ErrorStatusRequestHandler(BaseHTTPRequestHandler):
def do_GET(self):
self.send_response(404, message="Not found")
self.end_headers()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need it? Can we simply ask for some missing url?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, 404 is quite easy to generate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question now is, should we leave changes made to StaticFileServer.
@Suor what do you think? I think they should stay, as the previous version was mapping string to its handler class, which was unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cleaner the new way.

remote = RemoteHTTP(dvc_repo, config)

with pytest.raises(HTTPError):
remote._download(URLInfo(url) / "file.txt", "file.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use "missing.txt" to make it more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ghost
Copy link

ghost commented Nov 21, 2019

thanks, @pared ! looks good to me)

@efiop
Copy link
Contributor

efiop commented Nov 25, 2019

@Suor Please take a look 🙂

request = self._request("GET", from_info.url, stream=True)
response = self._request("GET", from_info.url, stream=True)
if response.status_code != 200:
raise HTTPError(response.status_code, response.reason)
with Tqdm(
total=None if no_progress_bar else self._content_length(from_info),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass url (path_info) and not response then you always make a second request. _content_length() should stop accepting urls and might be even inlined, this will prevent this inefficiency.

@pared pared requested a review from Suor November 27, 2019 14:45
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

self._request("HEAD", url_or_request).headers,
)
def _content_length(self, response):
headers = getattr(response, "headers", {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this getattr() anymore, simply request.headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

http remote: check and handle properly access denied and other HTTP codes
3 participants