Allow slicing of data messages #50
Comments
Received via the SDMX mailbox and related to this issue: "There is my view a feature that would be nice to have for avoiding possible timeout and improving the responsiveness when a client is querying a large amount of data. It would be nice to allow a retrieval by blocks. To do so, it would require two additional query string parameters: LimitNbKeys and StartSearchAfterKey. The LimitNbKeys will define the size of the block and the StartSearchAfterKey will allow to start a search after a defined key for retrieving the next block."
What is the benefit of StartSearchAfterKey vs. using a numerical limit & offset (or range)? A numerical range can translate directly to SQL LIMIT and OFFSET.
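To illustrate the point about LIMIT and OFFSET, here is a minimal sketch, assuming an inclusive, 0-based numerical range and a hypothetical table `obs` with one row per observation. The table, its columns and the function name are illustrative assumptions, not part of any SDMX specification.

```python
import sqlite3
from typing import Tuple


def range_to_limit_offset(first: int, last: int) -> Tuple[int, int]:
    """Convert an inclusive, 0-based range into a (limit, offset) pair."""
    if first < 0 or last < first:
        raise ValueError("invalid range")
    return last - first + 1, first


# Hypothetical in-memory table holding ten observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO obs (id, value) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(10)],
)

# Request observations 3 to 5 (inclusive). The ORDER BY keeps blocks stable.
limit, offset = range_to_limit_offset(3, 5)
rows = conn.execute(
    "SELECT id FROM obs ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
).fetchall()
```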
The Range header unit If a client sends a request with a Range header using the Range unit AWS has advised, in a support ticket, that the Range header is dropped because the Range unit is not This lack of support for the Range unit |
This might be useful for item schemes too (cf. initial request from @buhaiovos)
Re. item schemes, there are some rather large codelists (with up to 1 million codes for example). See related discussion.
@nicharr I updated the description accordingly. The new proposal is compatible with AWS CloudFront.
What about using parameters in the RESTful URL, like StartObs=nnnn&NumObs=nnnn? That would work in any environment, wouldn't it?
Fixed by #185
Today, the SDMX REST web service standard foresees one way of restricting the number of observations: the firstNObservations and lastNObservations URL parameters. But those apply individually to each "series" and not to the whole data message. They are thus not useful when the client needs to restrict the message size, or to chunk the request into smaller pieces that are more manageable for the client.
A nice solution would be provided by the HTTP protocol that includes a way to request only a specific range of data. The related RFC 7233 standard says:
It works in the following way:
The server indicates with the HTTP header field "Accept-Ranges" that it supports range requests for the target resource, and which types of range units, e.g. bytes.
Accept-Ranges: bytes
The client uses the "Range" header field in the GET request to indicate the requested subrange, e.g. for the first 1000 bytes (with 0-based indexes):
Range: bytes=0-999
And for the next 1000 bytes:
Range: bytes=1000-1999
The server, if the range request is admissible, returns the 206 (Partial Content) status code that indicates that the server is successfully fulfilling a range request for the target resource by transferring one or more parts of the selected representation that correspond to the satisfiable ranges found in the request's "Range" header field. It also returns a "Content-Range" header field, describing what range of the selected representation is enclosed, e.g.
Content-Range: bytes 0-999/1234
Note: 1234 is the total number of bytes for the complete resource, or
Content-Range: bytes 0-999/*
Note: * is indicating that the total number of bytes for the complete resource is unknown by the server, or
Content-Range: bytes 0-566/567
Alternatively, the server generates a 416 (Range Not Satisfiable) response and a "Content-Range" header field with an unsatisfied-range value "*", e.g.
Content-Range: bytes */1234
Also see here and here, for examples of requests for multiple sub-ranges.
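On the client side, the exchange described above could be handled roughly as follows. This is a minimal sketch, assuming only the single-range forms of "Content-Range" shown in the examples; the names `ContentRange`, `parse_content_range` and `next_range` are hypothetical helpers, not part of any library.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    unit: str
    first: Optional[int]  # None for an unsatisfied range ("*")
    last: Optional[int]   # None for an unsatisfied range ("*")
    total: Optional[int]  # None when the server reports "*"


# Matches e.g. "bytes 0-999/1234", "bytes 0-999/*" and "bytes */1234".
_PATTERN = re.compile(r"^(\w+) (?:(\d+)-(\d+)|\*)/(\d+|\*)$")


def parse_content_range(value: str) -> ContentRange:
    """Parse a single-range Content-Range header value."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    unit, first, last, total = match.groups()
    return ContentRange(
        unit,
        int(first) if first is not None else None,
        int(last) if last is not None else None,
        None if total == "*" else int(total),
    )


def next_range(received: ContentRange, block_size: int) -> Optional[str]:
    """Return the Range value for the next block, or None when done."""
    if received.last is None:
        return None  # the previous request was not satisfiable
    start = received.last + 1
    if received.total is not None and start >= received.total:
        return None  # the whole resource has been received
    return f"{received.unit}={start}-{start + block_size - 1}"
```

When the total is unknown ("*"), this sketch keeps requesting blocks; the client would then stop on the first 416 response.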
However
Since the "bytes" unit is inappropriate for our purposes, in .Stat Suite we have tested the HTTP "Range" header approach with another range unit: "values". While this works fine in most environments, some users have reported issues in AWS hosting scenarios. Indeed, some cloud hosting services do not allow using range units that have not been registered with IANA. We therefore decided to use a non-standard/proprietary "X-Range" HTTP header instead of "Range". This worked in all environments. The specification we have used is as follows:
The server indicates with the HTTP header field "Accept-Ranges" that it supports range requests for the target resource and the range unit "values":
Accept-Ranges: values
The client uses the "X-Range" header field in the GET request to indicate the subrange of the selected representation data, e.g. for the first 1000 SDMX observation values:
X-Range: values=0-999
And for the next 1000 values:
X-Range: values=1000-1999
The server, if the range request is admissible, returns the 206 (Partial Content) status code that indicates that the server is successfully fulfilling a range request for the target resource by transferring one or more parts of the selected representation that correspond to the satisfiable ranges found in the request's "X-Range" header field. It also returns a "Content-Range" header field, describing what range of the selected representation is enclosed, e.g.
Content-Range: values 0-999/1234
Note: 1234 is the total number of values for the complete resource, or
Content-Range: values 0-999/*
Note: * is indicating that the total number of values for the complete resource is unknown by the server, or
Content-Range: values 0-566/567
Alternatively, the server generates a 416 (Range Not Satisfiable) response and a "Content-Range" header field with an unsatisfied-range value "*", e.g.
Content-Range: values */1234
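The server-side behaviour specified above can be sketched as follows. This is only an illustration under stated assumptions: observation values are held in an in-memory sequence, only a single "values=first-last" range is supported, and a header that cannot be parsed is ignored so that the full content is returned with status 200 (a choice made for this sketch). The function name `apply_x_range` is hypothetical and not taken from .Stat Suite.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Accepts a single range such as "values=0-999".
_X_RANGE = re.compile(r"^values=(\d+)-(\d+)$")


def apply_x_range(
    values: Sequence[float], x_range: str
) -> Tuple[int, Dict[str, str], List[float]]:
    """Return (status code, response headers, body) for an X-Range request."""
    total = len(values)
    headers = {"Accept-Ranges": "values"}

    match = _X_RANGE.match(x_range.strip())
    if match is None or int(match.group(1)) > int(match.group(2)):
        # Assumption of this sketch: ignore the header, send everything.
        return 200, headers, list(values)

    first, last = int(match.group(1)), int(match.group(2))
    if first >= total:
        # Nothing to return: 416 with an unsatisfied-range value "*".
        headers["Content-Range"] = f"values */{total}"
        return 416, headers, []

    last = min(last, total - 1)  # clamp to the last available value
    headers["Content-Range"] = f"values {first}-{last}/{total}"
    return 206, headers, list(values[first:last + 1])
```

With 1234 values available, a request for "values=1000-1999" would be answered with the remaining 234 values and "Content-Range: values 1000-1233/1234".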
Note
This ticket should be addressed together with the ticket "Allow ordering the data query results by dimensions", since slicing/pagination only really makes sense if the order of the observations is deterministic.
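To see why a deterministic order matters, consider this minimal sketch, which sorts the observations by their dimension values before cutting a block. The dimension names (REF_AREA, TIME_PERIOD) and the function `ordered_block` are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def ordered_block(
    observations: Sequence[Dict[str, str]],
    dimension_order: Sequence[str],
    first: int,
    last: int,
) -> List[Dict[str, str]]:
    """Sort observations by the given dimensions, then return the
    inclusive, 0-based block first..last."""
    ordered = sorted(
        observations,
        key=lambda obs: tuple(obs[dim] for dim in dimension_order),
    )
    return ordered[first:last + 1]


# Hypothetical observations, as the storage layer might return them.
obs = [
    {"REF_AREA": "FR", "TIME_PERIOD": "2021", "OBS_VALUE": "3"},
    {"REF_AREA": "DE", "TIME_PERIOD": "2021", "OBS_VALUE": "2"},
    {"REF_AREA": "DE", "TIME_PERIOD": "2020", "OBS_VALUE": "1"},
]

# The same block is returned whatever order the storage layer uses.
block_a = ordered_block(obs, ["REF_AREA", "TIME_PERIOD"], 0, 1)
block_b = ordered_block(list(reversed(obs)), ["REF_AREA", "TIME_PERIOD"], 0, 1)
```

Without the sort, two requests for consecutive blocks could overlap or skip observations if the underlying order changed between the requests.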