Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Range header support to content app #1473

Merged
merged 1 commit into from
Jul 9, 2021

Conversation

bmbouter
Copy link
Member

@bmbouter bmbouter commented Jul 8, 2021

This unskips the Range header and adds support for it to the content
app.

closes #8865

@pulpbot
Copy link
Member

pulpbot commented Jul 8, 2021

Attached issue: https://pulp.plan.io/issues/8865

pulpcore/content/handler.py Outdated Show resolved Hide resolved
@dralley dralley marked this pull request as draft July 8, 2021 16:19
@bmbouter bmbouter force-pushed the fix-range-requests branch 4 times, most recently from 9d7b7b3 to 44d0359 Compare July 8, 2021 20:07
@bmbouter bmbouter marked this pull request as ready for review July 8, 2021 20:09
@@ -809,15 +809,46 @@ def cast_remote_blocking():

remote = await loop.run_in_executor(None, cast_remote_blocking)

async def handle_headers(headers):
rng = request.http_range
Copy link
Contributor

@dralley dralley Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I would maybe do instead is range_start, range_stop = request.http_range.start, request.http_range.stop, because rng mentally maps to "random number generator" for me

But this is a nitpick

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with associating rng with random number generator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this now. ty


range_bytes = range_bytes - rng.start
range_bytes = range_bytes - (int(value) - stop)
response.headers[name] = str(range_bytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this part. Why are we rewriting the content-length?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pass the Content-Length header back unmodified then even though say the range is asking for 10 bytes of 1024, and we're only returning 10 bytes, the Content-Length would show 1024 incorrectly. Also we do this here because some server's don't return the Content-Length at which point we don't know enough to set it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this math for these values are right. Testing with httpie, http HEAD https://fixtures.pulpproject.org/file/1.iso Range:bytes=40-50 returns content-length: 11 content-range: bytes 40-50/1024. Is either the range start or stop auto decremented by aiohttp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests constants below, they seem to be correct after manually checking them against httpie and the fixtures, so I think this math must be correct if the tests pass even if it's not intuitive from the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a little confusing but aiohttp does modify it by one I believe because start is inclusive and end is exclusive. https://docs.aiohttp.org/en/stable/web_reference.html#aiohttp.web.BaseRequest.http_range

Take this test for example, it asserts the content-length is 1200401 and when I run http HEAD https://fixtures.pulpproject.org/file-chunked/1.iso Range:bytes=1048176-2248576 I get

HTTP/1.1 206 Partial Content
cache-control: private
content-length: 1200401
content-range: bytes 1048176-2248576/10485760
content-type: application/octet-stream
date: Fri, 09 Jul 2021 20:35:02 GMT
etag: "60d0dd9f-a00000"
last-modified: Mon, 21 Jun 2021 18:42:39 GMT
server: nginx/1.14.1
set-cookie: dd6ec0a439e94ec1afa8091bd8cb5f4f=84e1263ecb996dab7f6f4ae499835116; path=/; HttpOnly; Secure; SameSite=None

I really really like the thoroughness and attention to detail you're showing here!

Copy link
Contributor

@dralley dralley Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerrod3 Since the ranges start from 0, index 10 is the 11th byte, and likewise in this instance. So I think the math is correct.

Here's what the HTTP standard document says https://datatracker.ietf.org/doc/html/rfc7233#section-4.2

For byte ranges, a sender SHOULD indicate the complete length of the
representation from which the range has been extracted, unless the
complete length is unknown or difficult to determine. An asterisk
character ("*") in place of the complete-length indicates that the
representation length was unknown when the header field was
generated.

The following example illustrates when the complete length of the
selected representation is known by the sender to be 1234 bytes:

 Content-Range: bytes 42-1233/1234

and this second example illustrates when the complete length is
unknown:

 Content-Range: bytes 42-1233/*

A Content-Range field value is invalid if it contains a
byte-range-resp that has a last-byte-pos value less than its
first-byte-pos value, or a complete-length value less than or equal
to its last-byte-pos value. The recipient of an invalid
Content-Range MUST NOT attempt to recombine the received content with
a stored representation.

A server generating a 416 (Range Not Satisfiable) response to a
byte-range request SHOULD send a Content-Range header field with an
unsatisfied-range value, as in the following example:

 Content-Range: bytes */1234

The complete-length in a 416 response indicates the current length of
the selected representation.

The Content-Range header field has no meaning for status codes that
do not explicitly describe its semantic. For this specification,
only the 206 (Partial Content) and 416 (Range Not Satisfiable) status
codes describe a meaning for Content-Range.

Copy link
Contributor

@gerrod3 gerrod3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a test(s) for ranges with files that would be chunked is needed to verify this fix works correctly (see comments below). Use this fixture https://fixtures.pulpproject.org/file-chunked/. Also an additional test for streamed content should be added.

Comment on lines 840 to 855
if rng.start or rng.stop:
start_byte_pos = 0
end_byte_pos = len(data)
if rng.start:
start_byte_pos = max(0, rng.start - downloader._size)
if rng.stop:
end_byte_pos = min(len(data), rng.stop - downloader._size)

data_for_client = data[start_byte_pos:end_byte_pos]
await response.write(data_for_client)
else:
await response.write(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this handles chunking correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a test using the fixture to cause chunking. I think it does because when I developed the patch I modified chunk size here to be 3 bytes for files of size 1024. Adding a test from the start would have been better than that tho.

Comment on lines 852 to 857
if remote.policy != Remote.STREAMED:
await original_handle_data(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using downloader._size will cause range requests to fail for streamed remotes since it won't call the original handle_data which calculates the current size. Probably should track with a local variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this I think would not work. I'll add a streamed test and also use the local var instead of ._size. ty!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test_content_served_streamed_with_range_request_valid_middle_range and it seemed to work as-is. Hmmmm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because it's not using chunking! I'm switching all of these tests to use the chunking fixture...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this new test is failing. I'm fixing it.

@@ -809,15 +809,46 @@ def cast_remote_blocking():

remote = await loop.run_in_executor(None, cast_remote_blocking)

async def handle_headers(headers):
rng = request.http_range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with associating rng with random number generator.

CHANGES/8865.bugfix Outdated Show resolved Hide resolved
@bmbouter bmbouter force-pushed the fix-range-requests branch 2 times, most recently from 1f69b24 to f41d630 Compare July 9, 2021 15:51
This unskips the `Range` header and adds support for it to the content
app.

closes #8865

def setup_download_test(self, policy):
self.assertRaises(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ensure that HTTP 416 is being raised, which is the correct status code.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/416

self.setup_download_test("immediate")
self.do_range_request_download_test()
self.setup_download_test(
"immediate", url="https://fixtures.pulpproject.org/file-chunked/PULP_MANIFEST"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this URL to constants.py and import it to avoid duplicates.

@dralley dralley merged commit df2c8f1 into pulp:master Jul 9, 2021
@bmbouter bmbouter deleted the fix-range-requests branch January 17, 2022 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants