Allow slicing of data messages #50

dosse · 2017-10-20T18:29:32Z

Today, the SDMX REST web service standard offers only one way to restrict the number of observations: the firstNObservations and lastNObservations URL parameters. But these apply to each "series" individually, not to the data message as a whole, so they do not help when the client needs to limit the message size or to split a request into smaller chunks that it can handle more easily.

The HTTP protocol offers a neat solution, since it includes a way to request only a specific range of data. The related RFC 7233 standard says:

The "Range" header field on a GET request modifies the method semantics to request transfer of only one or more subranges of the selected representation data, rather than the entire selected representation data.

It works in the following way:

The server indicates with the HTTP header field "Accept-Ranges" that it supports range requests for the target resource, and which types of range units, e.g. bytes.
Accept-Ranges: bytes
The client uses the "Range" header field in the GET request to indicate the requested subrange, e.g. for the first 1000 bytes (with 0-based indexes):
Range: bytes=0-999
And for the next 1000 bytes:
Range: bytes=1000-1999
If the range request is admissible, the server returns the 206 (Partial Content) status code, which indicates that it is successfully fulfilling the range request by transferring one or more parts of the selected representation that correspond to the satisfiable ranges in the request's "Range" header field. It also returns a "Content-Range" header field describing which range of the selected representation is enclosed, e.g.
Content-Range: bytes 0-999/1234
where 1234 is the total number of bytes of the complete resource, or
Content-Range: bytes 0-999/*
where * indicates that the server does not know the total number of bytes of the complete resource, or
Content-Range: bytes 0-566/567
where the complete resource (567 bytes) is shorter than the requested range.
Alternatively, the server generates a 416 (Range Not Satisfiable) response with a "Content-Range" header field whose unsatisfied-range value is "*", e.g.
Content-Range: bytes */1234
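
As an illustration of the three "Content-Range" shapes listed above, here is a minimal Python sketch of how a client might parse them. The names `ContentRange` and `parse_content_range` are made up for this example and are not part of any standard or library; the pattern only covers the forms shown in this description.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    unit: str
    first: Optional[int]  # None for an unsatisfied range ("*")
    last: Optional[int]   # None for an unsatisfied range ("*")
    total: Optional[int]  # None when the server reports "*" (unknown total)


# Matches "<unit> <first>-<last>/<total>", "<unit> <first>-<last>/*"
# and "<unit> */<total>".
_PATTERN = re.compile(r"^(\w+) (?:(\d+)-(\d+)|\*)/(\d+|\*)$")


def parse_content_range(value: str) -> ContentRange:
    """Parse a Content-Range header value (hypothetical helper)."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"malformed Content-Range: {value!r}")
    unit, first, last, total = match.groups()
    return ContentRange(
        unit,
        int(first) if first is not None else None,
        int(last) if last is not None else None,
        None if total == "*" else int(total),
    )
```

Because the unit is parsed generically, the same helper would also accept a non-byte unit such as `values`.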

Also see here and here, for examples of requests for multiple sub-ranges.

However

Since the bytes unit is inappropriate for our purposes, in .Stat Suite we have tested the HTTP Range header approach with another range unit: values. While this works fine in most environments, some users have reported issues in AWS hosting scenarios. Indeed, some cloud hosting services do not allow range units that have not been registered with IANA.

We therefore decided to use a non-standard/proprietary X-Range HTTP header instead of Range. This worked in all environments. The specification we have used is as follows:

The server indicates with the HTTP header field "Accept-Ranges" that it supports range requests for the target resource with the range unit values:
Accept-Ranges: values
The client uses the "X-Range" header field in the GET request to indicate the subrange of the selected representation data, e.g. for the first 1000 SDMX observation values:
X-Range: values=0-999
And for the next 1000 values:
X-Range: values=1000-1999
If the range request is admissible, the server returns the 206 (Partial Content) status code, which indicates that it is successfully fulfilling the range request by transferring one or more parts of the selected representation that correspond to the satisfiable ranges in the request's "X-Range" header field. It also returns a "Content-Range" header field describing which range of the selected representation is enclosed, e.g.
Content-Range: values 0-999/1234
where 1234 is the total number of values of the complete resource, or
Content-Range: values 0-999/*
where * indicates that the server does not know the total number of values of the complete resource, or
Content-Range: values 0-566/567
where the complete resource (567 values) is shorter than the requested range.
Alternatively, the server generates a 416 (Range Not Satisfiable) response with a "Content-Range" header field whose unsatisfied-range value is "*", e.g.
Content-Range: values */1234
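
To make the exchange above concrete, here is a rough Python sketch of a client loop that walks through a data message in chunks of 1000 values. It is only an illustration under stated assumptions: `fetch` is a hypothetical callable standing in for whatever HTTP client is used, taking the request headers and returning the status code, the response headers and the body; `fetch_in_chunks` is likewise a made-up name.

```python
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical transport: request headers in, (status, response headers, body) out.
Fetch = Callable[[Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def fetch_in_chunks(fetch: Fetch, chunk_size: int = 1000) -> Iterator[bytes]:
    """Yield partial responses until the whole resource has been read."""
    first = 0
    while True:
        last = first + chunk_size - 1
        status, headers, body = fetch({"X-Range": f"values={first}-{last}"})
        if status == 416:
            # Range Not Satisfiable: we asked past the end of the resource.
            return
        if status != 206:
            raise RuntimeError(f"unexpected status {status}")
        yield body
        # e.g. "values 0-999/1234" -> "values 0-999" and "1234"
        unit_and_range, total = headers["Content-Range"].split("/")
        served_last = int(unit_and_range.split(" ")[1].split("-")[1])
        if total != "*" and served_last + 1 >= int(total):
            return  # the last value of the resource has been delivered
        # With an unknown total ("*"), keep going until the server answers 416.
        first = served_last + 1
```

The loop continues from the last index the server actually served rather than from the index it requested, so a short final chunk such as `values 2000-2499/2500` ends the iteration cleanly.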

Note

This ticket should be addressed together with the ticket "Allow ordering the data query results by dimensions", since slicing/pagination only really makes sense if the order of the observations is deterministic.


sosna · 2021-11-23T09:37:08Z

Received via the SDMX mailbox and related to this issue:

"There is, in my view, a feature that would be nice to have for avoiding possible timeouts and improving responsiveness when a client is querying a large amount of data. It would be nice to allow retrieval by blocks. To do so, it would require two additional query string parameters: LimitNbKeys and StartSearchAfterKey. LimitNbKeys will define the size of the block, and StartSearchAfterKey will allow a search to start after a defined key in order to retrieve the next block."

agent96 · 2021-11-23T13:26:48Z

What is the benefit of StartSearchAfterKey vs using a numerical limit & offset (or range)? A numerical range can translate directly to SQL LIMIT and OFFSET.
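
For readers comparing the two options, here is a small Python/sqlite3 sketch of both: an inclusive 0-based range mapped to LIMIT and OFFSET, and a "start after key" query in the spirit of StartSearchAfterKey and LimitNbKeys. The table `obs`, its columns and all function names are invented for the example and do not come from any SDMX implementation.

```python
import sqlite3
from typing import List, Tuple


def range_to_limit_offset(first: int, last: int) -> Tuple[int, int]:
    """Map an inclusive 0-based range (e.g. 0-999) to (LIMIT, OFFSET)."""
    return last - first + 1, first


# Hypothetical in-memory table with ten series keys K000..K009.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (series_key TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(f"K{i:03d}", float(i)) for i in range(10)],
)


def page_by_offset(first: int, last: int) -> List[str]:
    """Numerical range: position-based, needs a deterministic ORDER BY."""
    limit, offset = range_to_limit_offset(first, last)
    rows = conn.execute(
        "SELECT series_key FROM obs ORDER BY series_key LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [row[0] for row in rows]


def page_after_key(after_key: str, limit: int) -> List[str]:
    """Key-based: return up to `limit` keys strictly after `after_key`."""
    rows = conn.execute(
        "SELECT series_key FROM obs WHERE series_key > ? "
        "ORDER BY series_key LIMIT ?",
        (after_key, limit),
    )
    return [row[0] for row in rows]
```

On this toy table, `page_by_offset(3, 5)` and `page_after_key("K002", 3)` select the same three keys. One commonly cited difference is that the key-based form lets the database seek on an index instead of counting past skipped rows, and it keeps its place if rows are inserted before the current position; whether that matters here depends on the backend.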

nicharr · 2022-04-28T23:29:24Z

The Range header unit values is unfortunately not supported by AWS CloudFront:

If a client sends a request with a Range header using the Range unit values to an API deployed behind a CloudFront instance, CloudFront will drop the Range header as CloudFront interprets values as an invalid Range unit.

AWS has advised, in a support ticket, that the Range header is dropped because the Range unit is not bytes, citing that bytes is the only range unit defined in the HTTP/1.1 standard: https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.12

This lack of support for the Range unit values in AWS CloudFront is of concern, as many people / organisations would likely want to run SDMX APIs behind a CloudFront instance.

sosna · 2022-08-10T09:30:29Z

This might be useful for item schemes too (cf. initial request from @buhaiovos)

sosna · 2022-09-01T06:17:18Z

Re. item schemes, there are some rather large codelists (with up to 1 million codes for example). See related discussion.

dosse · 2023-11-09T22:46:08Z

@nicharr I updated the description accordingly. The new proposal is compatible with AWS CloudFront.

egreising · 2023-11-10T15:12:03Z

What about using parameters in the RESTful URL, like StartObs=nnnn&NumObs=nnnn? That would work in any environment, wouldn't it?
I would suggest using "Observations" or "Cases" and not "Values" since with multiple measures we'll have several "values" per "observation" or "case".

sosna · 2024-07-11T14:59:30Z

Fixed by #185

chris-beer mentioned this issue Oct 22, 2017

Add support for partial references #45

Closed

sosna added new feature normal and removed normal labels Nov 22, 2017

sosna changed the title ~~Allow for slicing of potentially big data messages~~ Allow slicing of data messages Nov 22, 2017

sosna added this to the 2018 milestone Dec 1, 2017

sosna added this to To do in 2018 Release Dec 22, 2017

sosna modified the milestones: 2018, 2019 Apr 24, 2018

sosna removed this from To do in 2018 Release Apr 24, 2018

sosna assigned stratosn Oct 29, 2018

sosna removed this from the 2019 milestone Nov 9, 2018

dosse mentioned this issue Nov 23, 2018

Add option to query for number of data points to be delivered #69

Closed

dosse mentioned this issue Jun 9, 2021

Allow ordering the data query results by dimensions #151

Closed

sosna added the next label Mar 3, 2022

sosna added this to the v2.1.0 milestone Mar 8, 2022

sosna assigned sosna and unassigned stratosn Dec 15, 2023

sosna mentioned this issue Dec 15, 2023

Add pagination and sorting mechanism #185

Merged

sosna linked a pull request Dec 15, 2023 that will close this issue

Add pagination and sorting mechanism #185

Merged

sosna added fixed and removed fixed labels Dec 15, 2023

sosna modified the milestones: v2.1.0, v2.2.0 Jan 9, 2024

sosna closed this as completed Jul 11, 2024
