Match standard HTTP semantics of inclusive end value in Range and Content-Range headers#7632
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to fix issue #7626 by properly handling HTTP Range header semantics. In HTTP, both the start and end positions in Range headers are inclusive (e.g., "bytes=0-9" requests 10 bytes). The PR converts inclusive range end values to exclusive values during parsing for idiomatic C++ processing, and converts them back to inclusive before emitting response headers.
Changes:
- Added HTTP Range header constant to improve code consistency
- Updated server-side range parsing to convert inclusive end values to exclusive for internal processing
- Updated server-side response generation to convert back to inclusive before emitting Content-Range headers
- Updated tests to verify correct inclusive range semantics
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| include/ccf/http_consts.h | Added RANGE constant for consistent header name usage |
| src/node/rpc/file_serving_handlers.h | Server-side: parses inclusive range ends, converts to exclusive for processing, converts back to inclusive for response headers |
| src/snapshots/fetch.h | Client-side: parses inclusive range ends and converts to exclusive, but fails to convert back to inclusive when sending Range headers |
| tests/e2e_operations.py | Updated tests to verify inclusive range semantics in server responses |
|
@eddyashton should we implement the standard exactly (including the error behaviour when reading past the end), and add a custom header we can use internally when we want our different (arguably better) behaviour? That way Range would work as expected for external clients downloading ledger chunks (or snapshots), but still be efficient and avoid the extra round trip internally. |
Discussed offline, but repeating here for posterity. I'd thought that truncating an end-past-the-size response was a hack we added (surely a normal client would send an open-ended range for the remainder, and/or HEAD requests to discover the total content-size?), but it turns out this is actually the standards-mandated behaviour in https://datatracker.ietf.org/doc/html/rfc7233#section-2.1:
That's exactly what we do. So that's already standard behaviour, and we don't have to change anything. |
… and Content-Range headers (microsoft#7632)" This reverts commit b4be36f.
…e and Content-Range headers (microsoft#7632)" (microsoft#7641) This reverts commit 21d5180.
…e and Content-Range headers (microsoft#7632)" (microsoft#7641) This reverts commit 21d5180.
Resolves #7626.
I think the simplest way to do this is to convert during parsing, and back just before emission, but keep our exclusive range end value for processing in C+ - that's more idiomatic, and avoids peppering
+/-1s throughout the code.Note that the
Content-Rangeheader is quite funny, with a total size as well as theContent-Lengthheader, and because we consistently return 206 partial contents even for a complete response, you can end up with something like:This is a (HTTP-standard compatible) response containing all 500 bytes of a 500 byte file. Only the range end value is inclusive.
I've prodded our endpoints a little with
curlandhttpxwhen returning large files. They seem happy, but they were also happy before. It looks like because the server is allowed to return a different range, and there are multiple places where we spell that response-range's true size, they're extremely tolerant of us misparsing the range ends.