Allow slicing of data messages #50

Closed
dosse opened this issue Oct 20, 2017 · 8 comments · Fixed by #185

Comments

@dosse
Contributor

dosse commented Oct 20, 2017

Today, the SDMX REST web service standard foresees only one way of restricting the number of observations: the firstNObservations and lastNObservations URL parameters. But those apply individually to each "series" and not to the whole data message. They are thus not useful when the client needs to restrict the message size, or to chunk the request into smaller pieces that are more manageable for the client.

A nice solution is offered by the HTTP protocol itself, which includes a way to request only a specific range of data. The related standard, RFC 7233, says:

The "Range" header field on a GET request modifies the method semantics to request transfer of only one or more subranges of the selected representation data, rather than the entire selected representation data.

It works in the following way:

  1. The server indicates with the HTTP header field "Accept-Ranges" that it supports range requests for the target resource, and which types of range units, e.g. bytes.
    Accept-Ranges: bytes

  2. The client uses the "Range" header field in the GET request to indicate the requested subrange, e.g. for the first 1000 bytes (with 0-based indexes):
    Range: bytes=0-999
    And for the next 1000 bytes:
    Range: bytes=1000-1999

  3. The server, if the range request is admissible, returns the 206 (Partial Content) status code, which indicates that it is successfully fulfilling the range request by transferring one or more parts of the selected representation that correspond to the satisfiable ranges found in the request's "Range" header field. It also returns a "Content-Range" header field describing which range of the selected representation is enclosed, e.g.
    Content-Range: bytes 0-999/1234
    Note: 1234 is the total number of bytes of the complete resource, or
    Content-Range: bytes 0-999/*
    Note: * indicates that the total number of bytes of the complete resource is unknown to the server, or
    Content-Range: bytes 0-566/567
    Note: here the complete resource has only 567 bytes, so fewer bytes than requested are returned.
    Alternatively, the server generates a 416 (Range Not Satisfiable) response and a "Content-Range" header field with an unsatisfied-range value "*", e.g.
    Content-Range: bytes */1234
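
As a rough illustration of steps 2 and 3 (not taken from RFC 7233 or from any SDMX specification), the Python sketch below handles only a single `first-last` or `first-` range. The function names `parse_single_range` and `range_response` are made up for this example, and multi-range and suffix (`-N`) requests are deliberately left out.

```python
import re
from typing import Optional, Tuple

# Matches "unit=first-last" or "unit=first-" (single range only).
_RANGE_RE = re.compile(r"^(\w+)=(\d+)-(\d*)$")


def parse_single_range(header: str, total: int) -> Optional[Tuple[str, int, int]]:
    """Parse a Range header value against a resource of `total` units.

    Returns (unit, first, last) with `last` clamped to the resource size,
    or None when the range starts beyond the end of the resource.
    """
    match = _RANGE_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unsupported Range syntax: {header!r}")
    unit, first, last_text = match.group(1), int(match.group(2)), match.group(3)
    if last_text and int(last_text) < first:
        raise ValueError(f"invalid range: {header!r}")
    if first >= total:
        return None  # not satisfiable
    last = min(int(last_text), total - 1) if last_text else total - 1
    return unit, first, last


def range_response(header: str, total: int) -> Tuple[int, str]:
    """Return (status code, Content-Range value) for a single-range request."""
    unit = header.split("=", 1)[0].strip()
    parsed = parse_single_range(header, total)
    if parsed is None:
        return 416, f"{unit} */{total}"
    unit, first, last = parsed
    return 206, f"{unit} {first}-{last}/{total}"
```

For instance, `range_response("bytes=0-999", 567)` yields `(206, "bytes 0-566/567")`, and `range_response("bytes=2000-2999", 1234)` yields `(416, "bytes */1234")`.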

Also see here and here, for examples of requests for multiple sub-ranges.

However

Since the bytes unit is inappropriate for our purposes, in .Stat Suite we have tested the HTTP Range header approach with another range unit: values. While this works fine in most environments, some users have reported issues in AWS hosting scenarios. Indeed, some cloud hosting services do not allow range units that have not been registered with IANA.

We therefore decided to use a non-standard (proprietary) X-Range HTTP header instead of Range. This worked in all environments. The specification we have used is as follows:

  1. The server indicates with the HTTP header field "Accept-Ranges" that it supports range requests for the target resource with the range unit values:
    Accept-Ranges: values

  2. The client uses the "X-Range" header field in the GET request to indicate the subrange of the selected representation data, e.g. for the first 1000 SDMX observation values:
    X-Range: values=0-999
    And for the next 1000 values:
    X-Range: values=1000-1999

  3. The server, if the range request is admissible, returns the 206 (Partial Content) status code, which indicates that it is successfully fulfilling the range request by transferring one or more parts of the selected representation that correspond to the satisfiable ranges found in the request's "X-Range" header field. It also returns a "Content-Range" header field describing which range of the selected representation is enclosed, e.g.
    Content-Range: values 0-999/1234
    Note: 1234 is the total number of values of the complete resource, or
    Content-Range: values 0-999/*
    Note: * indicates that the total number of values of the complete resource is unknown to the server, or
    Content-Range: values 0-566/567
    Note: here the complete resource has only 567 values, so fewer values than requested are returned.
    Alternatively, the server generates a 416 (Range Not Satisfiable) response and a "Content-Range" header field with an unsatisfied-range value "*", e.g.
    Content-Range: values */1234
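
To make the exchange above more concrete, here is a hedged Python sketch of a client that walks through a data message in chunks by advancing the X-Range header. It is not the .Stat Suite implementation: `fetch_all` and `make_fake_server` are hypothetical names, and the "server" is an in-memory stand-in so that the example runs without a network.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple

# (status code, response headers, observation values in the body)
Response = Tuple[int, Dict[str, str], List[float]]
Sender = Callable[[Dict[str, str]], Response]


def fetch_all(send: Sender, chunk: int = 1000) -> Iterator[List[float]]:
    """Yield successive chunks, advancing X-Range until the end is reached."""
    first = 0
    while True:
        status, headers, body = send({"X-Range": f"values={first}-{first + chunk - 1}"})
        if status == 416:  # nothing left in the requested range
            return
        yield body
        if status != 206:  # header ignored: the whole message was sent at once
            return
        # Content-Range looks like "values 0-999/1234" or "values 0-999/*".
        m = re.match(r"^values (\d+)-(\d+)/(\d+|\*)$", headers["Content-Range"])
        if m is None:
            raise ValueError("unexpected Content-Range value")
        last, total = int(m.group(2)), m.group(3)
        if total != "*" and last + 1 >= int(total):
            return
        first = last + 1


def make_fake_server(observations: List[float]) -> Sender:
    """In-memory stand-in for a server that follows the three steps above."""
    total = len(observations)

    def send(request_headers: Dict[str, str]) -> Response:
        m = re.match(r"^values=(\d+)-(\d+)$", request_headers["X-Range"])
        if m is None:
            raise ValueError("unsupported X-Range value")
        first, last = int(m.group(1)), min(int(m.group(2)), total - 1)
        if first >= total:
            return 416, {"Content-Range": f"values */{total}"}, []
        return 206, {"Content-Range": f"values {first}-{last}/{total}"}, observations[first:last + 1]

    return send
```

With 2500 observations and the default chunk size, `fetch_all` would issue requests for `values=0-999`, `values=1000-1999` and `values=2000-2999`, and stop after the third response reports `values 2000-2499/2500`.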

Note

This ticket should be addressed together with the ticket "Allow ordering the data query results by dimensions", since slicing/pagination only really makes sense if the order of the observations is deterministic.

@sosna sosna changed the title Allow for slicing of potentially big data messages Allow slicing of data messages Nov 22, 2017
@sosna sosna added this to the 2018 milestone Dec 1, 2017
@sosna sosna added this to To do in 2018 Release Dec 22, 2017
@sosna sosna modified the milestones: 2018, 2019 Apr 24, 2018
@sosna sosna removed this from To do in 2018 Release Apr 24, 2018
@sosna sosna removed this from the 2019 milestone Nov 9, 2018
@sosna
Member

sosna commented Nov 23, 2021

Received via the SDMX mailbox and related to this issue:

"There is my view a feature that would be nice to have for avoiding possible timeout and improving the responsiveness when a client is querying a large amount of data. It would be nice to allow a retrieval by blocks. To do so, it would require two additional query string parameters: LimitNbKeys and StartSearchAfterKey. The LimitNbKeys will define the size of the block and the StartSearchAfterKey will allow to start a search after a defined key for retrieving the next block."

@agent96

agent96 commented Nov 23, 2021

What is the benefit of StartSearchAfterKey over a numerical limit and offset (or range)? A numerical range can translate directly to SQL LIMIT and OFFSET.
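
For readers comparing the two styles, the sketch below puts them side by side using Python's built-in sqlite3 module. It is only an illustration under assumptions: the `observations` table, its `obs_key` and `obs_value` columns, and both function names are invented here, and the parameter names `limit_nb_keys` and `start_search_after_key` merely mirror the proposal quoted earlier in this thread. Both queries rely on an explicit ORDER BY, since slicing only makes sense with a deterministic order.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, float]


def fetch_by_range(conn: sqlite3.Connection, first: int, last: int) -> List[Row]:
    """Offset style: a 0-based inclusive range maps to LIMIT and OFFSET."""
    if first < 0 or last < first:
        raise ValueError("invalid range")
    return conn.execute(
        "SELECT obs_key, obs_value FROM observations "
        "ORDER BY obs_key LIMIT ? OFFSET ?",
        (last - first + 1, first),
    ).fetchall()


def fetch_after_key(
    conn: sqlite3.Connection,
    limit_nb_keys: int,
    start_search_after_key: Optional[str] = None,
) -> List[Row]:
    """Keyset style: resume strictly after the last key of the previous block."""
    if start_search_after_key is None:
        return conn.execute(
            "SELECT obs_key, obs_value FROM observations "
            "ORDER BY obs_key LIMIT ?",
            (limit_nb_keys,),
        ).fetchall()
    return conn.execute(
        "SELECT obs_key, obs_value FROM observations "
        "WHERE obs_key > ? ORDER BY obs_key LIMIT ?",
        (start_search_after_key, limit_nb_keys),
    ).fetchall()


# Hypothetical demo data: 25 observations keyed K0000 .. K0024.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (obs_key TEXT PRIMARY KEY, obs_value REAL)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?)",
    [(f"K{i:04d}", float(i)) for i in range(25)],
)
```

On this demo data, `fetch_by_range(conn, 10, 14)` and `fetch_after_key(conn, 5, "K0009")` return the same five rows. The offset style needs only two numbers from the client, while the keyset style requires the client to carry the last key it received into the next request.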

@sosna sosna added the next label Mar 3, 2022
@sosna sosna added this to the v2.1.0 milestone Mar 8, 2022
@nicharr

nicharr commented Apr 28, 2022

The Range header unit values is unfortunately not supported by AWS CloudFront:

If a client sends a request with a Range header using the Range unit values to an API deployed behind a CloudFront instance, CloudFront will drop the Range header as CloudFront interprets values as an invalid Range unit.

AWS has advised, in a support ticket, that the Range header is dropped because the Range unit is not bytes, citing that bytes is the only range unit defined in the HTTP/1.1 standard: https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.12

This lack of support for the Range unit values in AWS CloudFront is of concern, as many people / organisations would likely want to run SDMX APIs behind a CloudFront instance.

@sosna
Member

sosna commented Aug 10, 2022

This might be useful for item schemes too (cf. initial request from @buhaiovos)

@sosna
Member

sosna commented Sep 1, 2022

Re. item schemes, there are some rather large codelists (with up to 1 million codes for example). See related discussion.

@dosse
Contributor Author

dosse commented Nov 9, 2023

@nicharr I updated the description accordingly. The new proposal is compatible with AWS CloudFront.

@egreising
Member

What about using parameters in the RESTful URL, like StartObs=nnnn&NumObs=nnnn? That would work in any environment, wouldn't it?
I would suggest using "Observations" or "Cases" and not "Values" since with multiple measures we'll have several "values" per "observation" or "case".

@sosna sosna assigned sosna and unassigned stratosn Dec 15, 2023
@sosna sosna linked a pull request Dec 15, 2023 that will close this issue
@sosna sosna added fixed and removed fixed labels Dec 15, 2023
@sosna sosna modified the milestones: v2.1.0, v2.2.0 Jan 9, 2024
@sosna
Member

sosna commented Jul 11, 2024

Fixed by #185

@sosna sosna closed this as completed Jul 11, 2024