remote: http: raise exception when response with error status code #2794

pared · 2019-11-14T14:36:50Z

❗ Have you followed the guidelines in the Contributing to DVC list?
📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Suor · 2019-11-14T21:58:58Z

dvc/exceptions.py

@@ -338,3 +338,11 @@ def __init__(self, url, cause=None):
            ),
            cause=cause,
        )
+
+
+class HTTPErrorStatusCodeException(DvcException):


Do we really need to invent a new exception class for each situation? This would be a nightmare if people some day use the Repo class and try to handle errors. It also creates bloat.

This question goes to everybody, not only @pared.

I agree that might be excessive in some situations, though, what alternative do we have?
Do you think I should use requests.HTTPError?

We should probably discuss it separately. I didn't mean this as a blocker for this particular PR, just used it as an example.

And it can't be requests.HTTPError as we need to descend this from DvcException().
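A minimal sketch of the point about descending from a project-wide base class: a caller can catch the base type once and still handle every specific error. `ToolError`, `ToolHTTPError`, and `fetch` are hypothetical stand-ins for illustration, not DVC's actual classes.

```python
class ToolError(Exception):
    """Hypothetical project-wide base exception (stand-in for DvcException)."""


class ToolHTTPError(ToolError):
    """Hypothetical HTTP failure that keeps the status code and reason."""

    def __init__(self, code, reason):
        super().__init__("{} {}".format(code, reason))
        self.code = code
        self.reason = reason


def fetch(status_code, reason):
    # Pretend a request was made and returned this status line.
    if status_code != 200:
        raise ToolHTTPError(status_code, reason)
    return "ok"


def run():
    try:
        return fetch(404, "Not Found")
    except ToolError as exc:
        # Catching the base class is enough; no need to know each subclass.
        return "handled: {}".format(exc)
```

Under this assumption, a third-party exception such as `requests.HTTPError` would escape an `except ToolError` clause, which is the concern raised above.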

Suor · 2019-11-14T22:01:53Z

dvc/exceptions.py

+            "Server responded with error status code: '{}' and message: "
+            "'{}'".format(code, reason)


Code and reason should go next to each other, separated by a space. This is how it is presented everywhere. We might also want to say in the message that this is an HTTP error.
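One way to read this suggestion, as a sketch only: keep the status code and reason phrase together in the conventional "404 Not Found" form and prefix the message with the protocol. `format_http_error` is a hypothetical helper, and the fallback to the standard library's reason phrases is an assumption, not something taken from the PR.

```python
from http import HTTPStatus


def format_http_error(code, reason=None):
    """Build a message such as 'HTTP error: 404 Not Found'."""
    if reason is None:
        try:
            # Fall back to the standard reason phrase for known codes.
            reason = HTTPStatus(code).phrase
        except ValueError:
            reason = "Unknown"
    return "HTTP error: {} {}".format(code, reason)
```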

Suor · 2019-11-14T22:04:42Z

tests/func/test_repro.py

@@ -43,6 +43,7 @@
 from tests.func.test_data_cloud import get_ssh_url
 from tests.func.test_data_cloud import TEST_AWS_REPO_BUCKET
 from tests.func.test_data_cloud import TEST_GCP_REPO_BUCKET
+from tests.utils.httpd import ContentMD5Handler


We don't split imports like that anymore)

Suor · 2019-11-14T22:05:19Z

tests/unit/remote/test_http.py

@@ -13,3 +17,23 @@ def test_no_traverse_compatibility(dvc_repo):

    with pytest.raises(ConfigError):
        RemoteHTTP(dvc_repo, config)
+
+
+@pytest.mark.parametrize("response_code", [404, 403, 500])


Is there any value in testing it 3 times?

Suor · 2019-11-14T22:06:44Z

tests/unit/remote/test_http.py

+
+        with pytest.raises(HTTPErrorStatusCodeException):
+            remote._download(
+                URLInfo(os.path.join(url, "file.txt")), "file.txt"


Simply use URLInfo(url) / "file.txt" or use URLInfo from the start. This will save you from an error of using os.path instead of posixpath.
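To illustrate the os.path pitfall mentioned here: `os.path` resolves to `ntpath` on Windows, where joining inserts a backslash, while URLs always need forward slashes. The sketch below calls `ntpath` and `posixpath` directly so the difference shows on any platform; the URL is made up for the example.

```python
import ntpath
import posixpath

BASE_URL = "http://example.com/data"  # illustrative URL, not from the PR


def join_like_windows(base, name):
    # What os.path.join does when the code runs on Windows.
    return ntpath.join(base, name)


def join_like_url(base, name):
    # POSIX-style joining keeps the forward slash that URLs require.
    return posixpath.join(base, name)
```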

Suor · 2019-11-14T22:09:10Z

tests/utils/httpd.py

        self._lock.acquire()
-        handler_class = ETagHandler if handler == "etag" else ContentMD5Handler
+        self.response_handler = handler_class


Do you really need to store this to an attribute?

efiop · 2019-11-19T13:39:42Z

@Suor @MrOutis Please review.

efiop · 2019-11-20T08:14:12Z

@pared Please rebase 😉

Suor · 2019-11-20T16:24:52Z

dvc/exceptions.py

+class HTTPError(DvcException):
+    def __init__(self, code, reason):
+        super(HTTPError, self).__init__(
+            "HTTP error: '{} {}'".format(code, reason)


This will show:

HTTPError: HTTP Error: 404 Not Found

Suor · 2019-11-20T16:27:41Z

dvc/remote/http.py

-        request = self._request("GET", from_info.url, stream=True)
+        response = self._request("GET", from_info.url, stream=True)
+        if response.status_code != 200:
+            raise HTTPError(response.status_code, response.reason)
        with Tqdm(
            total=None if no_progress_bar else self._content_length(from_info),


Suggested change

-            total=None if no_progress_bar else self._content_length(from_info),
+            total=None if no_progress_bar else self._content_length(response),

I wonder why we are still doing this extra request. Also, ._content_length() doesn't really need to work with a URL.

This is outside of your PR actually. But since you are here)

But isn't the extra request done only if response does not contain Content-Length?

If you pass a URL (path_info) and not the response, then you always make a second request. _content_length() should stop accepting URLs and might even be inlined; this will prevent the inefficiency.

That is right, sorry, I'll change it accordingly. Though I would leave _content_length as it is, so as not to clutter the body of the _download method.
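The idea discussed in this thread can be sketched as follows: read Content-Length from the response that is already in hand instead of issuing a separate HEAD request. This is an illustration under assumptions, not the code that was merged; `content_length` is a hypothetical free function, and a plain dict stands in for the case-insensitive headers mapping that `requests` provides.

```python
from types import SimpleNamespace


def content_length(response):
    """Return the Content-Length of a response as an int, or None if absent."""
    value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


# Stand-ins for real responses: only the `headers` attribute is needed.
with_length = SimpleNamespace(headers={"Content-Length": "1024"})
without_length = SimpleNamespace(headers={})
```

Returning `None` when the header is missing matches the `total=None` case in the diff, where the progress bar has no known total.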

Suor · 2019-11-20T16:29:26Z

dvc/remote/http.py

@@ -45,7 +47,7 @@ def _download(self, from_info, to_file, name=None, no_progress_bar=False):
            disable=no_progress_bar,
        ) as pbar:
            with open(to_file, "wb") as fd:
-                for chunk in request.iter_content(chunk_size=self.CHUNK_SIZE):
+                for chunk in response.iter_content(chunk_size=self.CHUNK_SIZE):
                    fd.write(chunk)
                    fd.flush()


I don't think we need flush. It is probably a remnant from the time we did http resume.
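A small sketch of why the per-chunk flush is likely unnecessary for a plain download: closing the file at the end of the `with` block flushes whatever is still buffered. `write_chunks` and `demo` are hypothetical helpers, and the in-memory chunk list stands in for `response.iter_content(...)`.

```python
import os
import tempfile


def write_chunks(path, chunks):
    """Write an iterable of byte chunks to path without flushing per chunk."""
    with open(path, "wb") as fd:
        for chunk in chunks:
            fd.write(chunk)
    # Leaving the with-block closes the file, which flushes buffered data.


def demo():
    chunks = [b"abc", b"def", b"ghi"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "download.bin")
        write_chunks(path, chunks)
        with open(path, "rb") as fd:
            return fd.read()
```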

tests/func/test_repro.py

Suor · 2019-11-20T16:31:09Z

tests/unit/remote/test_http.py

+    class ErrorStatusRequestHandler(BaseHTTPRequestHandler):
+        def do_GET(self):
+            self.send_response(404, message="Not found")
+            self.end_headers()


Do we actually need it? Can we simply ask for some missing url?

Well, 404 is quite easy to generate

The question now is, should we leave changes made to StaticFileServer.
@Suor what do you think? I think they should stay, as the previous version was mapping string to its handler class, which was unnecessary.

It's cleaner the new way.
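As an illustration of "simply ask for some missing URL": a stock static file handler already answers 404 for a path that does not exist, so no custom error handler is required to produce that status. The sketch uses only the standard library rather than the project's test server; `QuietHandler` and `status_for_missing_file` are hypothetical names, and the `directory` argument assumes Python 3.7 or newer.

```python
import http.client
import tempfile
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        # Silence per-request logging to keep the output clean.
        pass


def status_for_missing_file():
    """Serve an empty directory and return the status for a missing path."""
    with tempfile.TemporaryDirectory() as root:
        handler = partial(QuietHandler, directory=root)
        server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: pick a free port
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            port = server.server_address[1]
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/missing.txt")
            status = conn.getresponse().status
            conn.close()
            return status
        finally:
            server.shutdown()
            server.server_close()
```

The handler class is passed to the server directly, which mirrors the design preferred above over mapping a string name to a handler.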

Suor · 2019-11-21T16:47:47Z

tests/unit/remote/test_http.py

+        remote = RemoteHTTP(dvc_repo, config)
+
+        with pytest.raises(HTTPError):
+            remote._download(URLInfo(url) / "file.txt", "file.txt")


Maybe use "missing.txt" to make it more obvious.

ghost · 2019-11-21T18:30:40Z

thanks, @pared ! looks good to me)

efiop · 2019-11-25T12:45:34Z

@Suor Please take a look 🙂

Suor · 2019-11-27T09:28:52Z

dvc/remote/http.py

-        request = self._request("GET", from_info.url, stream=True)
+        response = self._request("GET", from_info.url, stream=True)
+        if response.status_code != 200:
+            raise HTTPError(response.status_code, response.reason)
        with Tqdm(
            total=None if no_progress_bar else self._content_length(from_info),


If you pass a URL (path_info) and not the response, then you always make a second request. _content_length() should stop accepting URLs and might even be inlined; this will prevent the inefficiency.

Suor

Looks good.

Suor · 2019-11-29T14:18:06Z

dvc/remote/http.py

-            self._request("HEAD", url_or_request).headers,
-        )
+    def _content_length(self, response):
+        headers = getattr(response, "headers", {})


No need for this getattr() anymore, simply use response.headers.

pared force-pushed the 2510 branch 2 times, most recently from 4ae7fe0 to 4f898fe Compare November 14, 2019 14:54

Suor suggested changes Nov 14, 2019

pared force-pushed the 2510 branch 2 times, most recently from 5e06a19 to aa5ca84 Compare November 15, 2019 13:36

pared requested a review from Suor November 15, 2019 16:24

efiop requested a review from a user November 16, 2019 19:06

weekly-digest bot mentioned this pull request Nov 17, 2019

Weekly Digest (10 November, 2019 - 17 November, 2019) #2805

Closed

pared force-pushed the 2510 branch from aa5ca84 to f294fe9 Compare November 20, 2019 09:15

Suor suggested changes Nov 20, 2019

pared force-pushed the 2510 branch from 497f992 to e42f277 Compare November 21, 2019 16:45

Suor reviewed Nov 21, 2019

pared force-pushed the 2510 branch from 30d9846 to 8478397 Compare November 21, 2019 18:22

ghost approved these changes Nov 21, 2019

efiop requested a review from Suor November 22, 2019 03:46

weekly-digest bot mentioned this pull request Nov 24, 2019

Weekly Digest (17 November, 2019 - 24 November, 2019) #2841

Closed

pared force-pushed the 2510 branch from 8478397 to 5d77271 Compare November 27, 2019 09:13

Suor suggested changes Nov 27, 2019

pared requested a review from Suor November 27, 2019 14:45

Suor suggested changes Nov 29, 2019

pared and others added 6 commits November 29, 2019 15:30

remote: http: raise exception when download response with error statu…

72abfb0

…s code

remote: http: rename http error, refactor test

7218ff2

Update dvc/remote/http.py

e084b42

Co-Authored-By: Alexander Schepanovski <suor.web@gmail.com>

HTTPError: reformat error message

c102a78

test: remote: http: clear test file name

f7846a5

remote: http: calculate length basing on response and not head call

03c2719

pared force-pushed the 2510 branch from 63fc3a4 to 03c2719 Compare November 29, 2019 14:31

pared requested a review from Suor November 29, 2019 14:31

efiop merged commit 6ad278f into iterative:master Nov 29, 2019

weekly-digest bot mentioned this pull request Dec 1, 2019

Weekly Digest (24 November, 2019 - 1 December, 2019) #2876

Closed

pared deleted the 2510 branch December 17, 2019 13:11
