-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Range header support to content app #1473
Conversation
01e8b1d
to
68de39f
Compare
Attached issue: https://pulp.plan.io/issues/8865 |
pulpcore/tests/functional/api/using_plugin/test_distributions.py
Outdated
Show resolved
Hide resolved
9d7b7b3
to
44d0359
Compare
44d0359
to
924b3b7
Compare
pulpcore/content/handler.py
Outdated
@@ -809,15 +809,46 @@ def cast_remote_blocking(): | |||
|
|||
remote = await loop.run_in_executor(None, cast_remote_blocking) | |||
|
|||
async def handle_headers(headers): | |||
rng = request.http_range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I would maybe do instead is range_start, range_stop = request.http_range.start, request.http_range.stop
, because rng
mentally maps to "random number generator" for me
But this is a nitpick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with associating rng
with random number generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change this now. ty
|
||
range_bytes = range_bytes - rng.start | ||
range_bytes = range_bytes - (int(value) - stop) | ||
response.headers[name] = str(range_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this part. Why are we rewriting the content-length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass the Content-Length
header back unmodified then even though say the range is asking for 10 bytes of 1024, and we're only returning 10 bytes, the Content-Length
would show 1024 incorrectly. Also we do this here because some server's don't return the Content-Length
at which point we don't know enough to set it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this math for these values are right. Testing with httpie, http HEAD https://fixtures.pulpproject.org/file/1.iso Range:bytes=40-50
returns content-length: 11 content-range: bytes 40-50/1024
. Is either the range start or stop auto decremented by aiohttp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the tests constants below, they seem to be correct after manually checking them against httpie and the fixtures, so I think this math must be correct if the tests pass even if it's not intuitive from the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a little confusing but aiohttp does modify it by one I believe because start is inclusive and end is exclusive. https://docs.aiohttp.org/en/stable/web_reference.html#aiohttp.web.BaseRequest.http_range
Take this test for example, it asserts the content-length is 1200401 and when I run http HEAD https://fixtures.pulpproject.org/file-chunked/1.iso Range:bytes=1048176-2248576
I get
HTTP/1.1 206 Partial Content
cache-control: private
content-length: 1200401
content-range: bytes 1048176-2248576/10485760
content-type: application/octet-stream
date: Fri, 09 Jul 2021 20:35:02 GMT
etag: "60d0dd9f-a00000"
last-modified: Mon, 21 Jun 2021 18:42:39 GMT
server: nginx/1.14.1
set-cookie: dd6ec0a439e94ec1afa8091bd8cb5f4f=84e1263ecb996dab7f6f4ae499835116; path=/; HttpOnly; Secure; SameSite=None
I really really like the thoroughness and attention to detail you're showing here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gerrod3 Since the ranges start from 0, index 10 is the 11th byte, and likewise in this instance. So I think the math is correct.
Here's what the HTTP standard document says https://datatracker.ietf.org/doc/html/rfc7233#section-4.2
For byte ranges, a sender SHOULD indicate the complete length of the
representation from which the range has been extracted, unless the
complete length is unknown or difficult to determine. An asterisk
character ("*") in place of the complete-length indicates that the
representation length was unknown when the header field was
generated.The following example illustrates when the complete length of the
selected representation is known by the sender to be 1234 bytes:Content-Range: bytes 42-1233/1234
and this second example illustrates when the complete length is
unknown:Content-Range: bytes 42-1233/*
A Content-Range field value is invalid if it contains a
byte-range-resp that has a last-byte-pos value less than its
first-byte-pos value, or a complete-length value less than or equal
to its last-byte-pos value. The recipient of an invalid
Content-Range MUST NOT attempt to recombine the received content with
a stored representation.A server generating a 416 (Range Not Satisfiable) response to a
byte-range request SHOULD send a Content-Range header field with an
unsatisfied-range value, as in the following example:Content-Range: bytes */1234
The complete-length in a 416 response indicates the current length of
the selected representation.The Content-Range header field has no meaning for status codes that
do not explicitly describe its semantic. For this specification,
only the 206 (Partial Content) and 416 (Range Not Satisfiable) status
codes describe a meaning for Content-Range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test(s) for ranges with files that would be chunked is needed to verify this fix works correctly (see comments below). Use this fixture https://fixtures.pulpproject.org/file-chunked/. Also an additional test for streamed content should be added.
pulpcore/content/handler.py
Outdated
if rng.start or rng.stop: | ||
start_byte_pos = 0 | ||
end_byte_pos = len(data) | ||
if rng.start: | ||
start_byte_pos = max(0, rng.start - downloader._size) | ||
if rng.stop: | ||
end_byte_pos = min(len(data), rng.stop - downloader._size) | ||
|
||
data_for_client = data[start_byte_pos:end_byte_pos] | ||
await response.write(data_for_client) | ||
else: | ||
await response.write(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this handles chunking correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a test using the fixture to cause chunking. I think it does because when I developed the patch I modified chunk size here to be 3 bytes for files of size 1024. Adding a test from the start would have been better than that tho.
if remote.policy != Remote.STREAMED: | ||
await original_handle_data(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, using downloader._size
will cause range requests to fail for streamed remotes since it won't call the original handle_data
which calculates the current size. Probably should track with a local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this I think would not work. I'll add a streamed test and also use the local var instead of ._size
. ty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test_content_served_streamed_with_range_request_valid_middle_range
and it seemed to work as-is. Hmmmm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because it's not using chunking! I'm switching all of these tests to use the chunking fixture...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this new test is failing. I'm fixing it.
pulpcore/content/handler.py
Outdated
@@ -809,15 +809,46 @@ def cast_remote_blocking(): | |||
|
|||
remote = await loop.run_in_executor(None, cast_remote_blocking) | |||
|
|||
async def handle_headers(headers): | |||
rng = request.http_range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with associating rng
with random number generator.
1f69b24
to
f41d630
Compare
This unskips the `Range` header and adds support for it to the content app. closes #8865
f41d630
to
65dbb06
Compare
|
||
def setup_download_test(self, policy): | ||
self.assertRaises( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure that HTTP 416 is being raised, which is the correct status code.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/416
self.setup_download_test("immediate") | ||
self.do_range_request_download_test() | ||
self.setup_download_test( | ||
"immediate", url="https://fixtures.pulpproject.org/file-chunked/PULP_MANIFEST" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this URL to constants.py and import it to avoid duplicates.
This unskips the
Range
header and adds support for it to the contentapp.
closes #8865