Add Statistics Lithuania as a source #29

Closed
bertrandmarc opened this issue Nov 27, 2020 · 6 comments
Labels
bug data-source Issues related to specific web services/data source(s) enh Enhancements & new features


@bertrandmarc

It would be nice to have Statistics Lithuania as a new source, as in the rsdmx R package. Statistics Lithuania provides well-documented SDMX 2.1 and SDMX-JSON services. However, there are several issues that prevent using Statistics Lithuania as a source.

In principle, the new source would look like this (SDMX):

{
  "id": "LSD",
  "documentation": "https://osp.stat.gov.lt/rdb-rest",
  "url": "http://osp-rs.stat.gov.lt/rest_xml",
  "name": "Statistics Lithuania"
}

or (JSON):

{
  "id": "LSD",
  "documentation": "https://osp.stat.gov.lt/rdb-rest",
  "url": "http://osp-rs.stat.gov.lt/rest_json",
  "name": "Statistics Lithuania"
}

Below are the issues that prevent using Statistics Lithuania as a source in sdmx, each with a workaround.

  1. The option verify is not passed properly to requests, making it difficult to use https sources. For instance:
>>> statlt = sdmx.Request('STAT_LT', verify=False)
>>> statlt.session.verify 
True
# This is a workaround
>>> statlt.session.verify = False
  2. Statistics Lithuania replies with an unusual content type: application/force-download
>>> sdmx_msg = statlt.data('S7R239_M2110211')
Traceback (most recent call last):
  File "C:\Users\marcber\PycharmProjects\venv\lib\site-packages\sdmx\api.py", line 439, in get
    Reader = get_reader_for_content_type(content_type)
  File "C:\Users\marcber\PycharmProjects\venv\lib\site-packages\sdmx\reader\__init__.py", line 53, in get_reader_for_content_type
    raise ValueError(f"Unsupported content type: {ctype}") from None
ValueError: Unsupported content type: application/force-download
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\marcber\PycharmProjects\venv\lib\site-packages\sdmx\api.py", line 443, in get
    "content type: %s" % content_type
ValueError: can't determine a reader for response content type: application/force-download;charset=UTF-8
# This is a workaround: fetch the URL directly and parse the raw bytes
>>> import io
>>> import requests
>>> url = statlt.data('S7R239_M2110211', dry_run=True).url
>>> req = requests.get(url, verify=False)
>>> sdmx_msg = sdmx.read_sdmx(io.BytesIO(req.content))
  3. The XML supplied cannot be parsed properly (to be investigated; the problem might be on the Statistics Lithuania side)
>>> sdmx_msg = sdmx.read_sdmx(io.BytesIO(req.content))
2020-11-27 14:05:58,541 sdmx.reader.sdmxml - DEBUG: Truncate sub-microsecond time in <Prepared>
--- <class 'sdmx.message.DataMessage'> ---
[<sdmx.DataMessage>
  <Header>
    id: '044A9368AE724B13819B16D5046E66B6'
    prepared: '2020-11-27T15:05:39.680000+02:00'
    sender: <Agency LSD>
    source: 
    test: True
  DataSet (1)
  dataflow: <DataflowDefinition (missing id)>
  observation_dimension: <sdmx.model._AllDimensions object at 0x00000123B4EABD68>]
--- <class 'sdmx.model.DataStructureDefinition'> ---
[<DataStructureDefinition :M2110211(1.0)>]
--- M2110211 ---
[<DataStructureDefinition :M2110211(1.0)>]
--- Attributes ---
[{'DS_LAST_UPDATE': <AttributeValue: DS_LAST_UPDATE=2020-11-12>, 'DS_REGIONAL': <AttributeValue: DS_REGIONAL=N>, 'DS_TIME_FORMAT': <AttributeValue: DS_TIME_FORMAT=3>, 'OSP_MASYVO_STATUSAS': <AttributeValue: OSP_MASYVO_STATUSAS=A>}]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\marcber\PycharmProjects\venv\lib\site-packages\sdmx\reader\__init__.py", line 148, in read_sdmx
    return reader().read_message(obj, **kwargs)
  File "C:\Users\marcber\PycharmProjects\venv\lib\site-packages\sdmx\reader\sdmxml.py", line 273, in read_message
    raise RuntimeError(f"{uncollected} uncollected items")
RuntimeError: 1 uncollected items
# A workaround is to use the JSON endpoint http://osp-rs.stat.gov.lt/rest_json
>>> url=statltjson.data('S7R239_M2110211', dry_run=True).url
>>> req = requests.get(url, verify=False)
>>> sdmx_msg = sdmx.read_sdmx(io.BytesIO(req.content))

I will try to investigate the issue with the XML endpoint a bit on my own. For the rest, I am sorry, I am not sure I can be of much help, but I'll do my best.

Best,
Bertrand

@khaeru khaeru added data-source Issues related to specific web services/data source(s) enh Enhancements & new features labels Nov 29, 2020
@khaeru khaeru changed the title Whishlist: add Statistics Lithuania as a source Add Statistics Lithuania as a source Nov 29, 2020
@khaeru
Owner

khaeru commented Nov 29, 2020

Thanks for the thorough details here, @bertrandmarc.

Per point (2), each class sdmx.reader.*.Reader has a property content_types that lists the content types it can read:

sdmx/sdmx/reader/sdmxml.py

Lines 205 to 212 in f19948d

class Reader(BaseReader):
    content_types = [
        "application/xml",
        "application/vnd.sdmx.genericdata+xml",
        "application/vnd.sdmx.structure+xml",
        "application/vnd.sdmx.structurespecificdata+xml",
        "text/xml",
    ]

This list can be extended with new items.
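To illustrate why the lookup fails and how extending the list helps, here is a minimal standalone sketch. It does not reproduce the sdmx code: `BaseReader`, `XMLReader`, `READERS`, and `find_reader` are hypothetical names, and the matching logic (strip parameters such as the charset, then compare the media type) is an assumption made for the example.

```python
from typing import List, Type


class BaseReader:
    """Hypothetical stand-in for a reader base class."""

    content_types: List[str] = []


class XMLReader(BaseReader):
    """Hypothetical reader that declares the media types it accepts."""

    content_types = [
        "application/xml",
        "application/vnd.sdmx.genericdata+xml",
        "text/xml",
    ]


# Hypothetical registry of available readers.
READERS = [XMLReader]


def find_reader(content_type: str) -> Type[BaseReader]:
    """Return the first reader class that declares the given media type."""
    # Drop parameters such as ";charset=UTF-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    for reader in READERS:
        if media_type in reader.content_types:
            return reader
    raise ValueError(f"Unsupported content type: {media_type}")


# Before the list is extended, the unusual content type is rejected.
try:
    find_reader("application/force-download;charset=UTF-8")
except ValueError as exc:
    print(exc)

# After adding the new item, the same header value resolves to the reader.
XMLReader.content_types.append("application/force-download")
print(find_reader("application/force-download;charset=UTF-8").__name__)
```

In this sketch the charset parameter is ignored during matching, so only the bare media type needs to appear in the list.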

Per point (3): can you paste the exact URL used for this request? It should be shown in a log message. The SDMX-ML reader does a dump like this when it fails to pick up and incorporate all of the individually parsed items found in the message. Looking at the XML directly should tell us what unusual content is not correctly parsed.

@bertrandmarc
Author

bertrandmarc commented Dec 1, 2020

OK, thank you. I will add a new content type to fix point (2).

About point (3), the URL passed to requests is http://osp-rs.stat.gov.lt/rest_xml/data/S7R239_M2110211 and the redirected url (from req.url) is https://osp-rs.stat.gov.lt/ords/ipospp/ospp/rest_xml/data/S7R239_M2110211.

For your information, this URL is parsed without apparent errors by the R package rsdmx:

readSDMX(providerId='LSD', resource="data", flowRef='S7R239_M2110211')

@khaeru
Owner

khaeru commented Dec 1, 2020

http://osp-rs.stat.gov.lt/rest_xml/data/S7R239_M2110211 and the redirected url (from req.url) is https://osp-rs.stat.gov.lt/ords/ipospp/ospp/rest_xml/data/S7R239_M2110211.

Visiting these in a browser returns the following 400 error:

<?xml version="1.0" encoding="UTF-8"?>
<mes:GenericData xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:g="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message https://osp-rs.stat.gov.lt/xsd_scheme/SDMXMessage.xsd http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common https://osp-rs.stat.gov.lt/xsd_scheme/SDMXCommon.xsd http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic https://osp-rs.stat.gov.lt/xsd_scheme/SDMXDataGeneric.xsd">
   <mes:Header>
      <mes:ID>2B78DAFFD8AC4140B3AA7DD05591E28C</mes:ID>
      <mes:Test>true</mes:Test>
      <mes:Prepared>2020-12-01T12:34:49.935000000+02:00</mes:Prepared>
      <mes:Sender id="LSD"/>
      <mes:Structure structureID="LSD" dimensionAtObservation="AllDimensions">
         <com:Structure>
            <URN>urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=LSD.LSD(1.0)</URN>
         </com:Structure>
      </mes:Structure>
   </mes:Header>
   <footer:Footer>
      <footer:Message code="400" severity="Error">
         <com:Text xml:lang="lt">Neteisinga parametro Request reikšmė</com:Text>
         <com:Text xml:lang="en">Bad parameter Request value</com:Text>
      </footer:Message>
   </footer:Footer>
</mes:GenericData>

Are there any query parameters in the URL?
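An error response of this kind is still a well-formed SDMX-ML message, so the footer can be inspected programmatically. The following is a minimal sketch using only the Python standard library; `footer_errors` is a hypothetical helper name, and `SAMPLE` is a trimmed copy of the response above that keeps only the namespace declarations and the footer.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the response shown above.
NS = {
    "mes": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "com": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
    "footer": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer",
}
# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<mes:GenericData
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
    xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer">
  <footer:Footer>
    <footer:Message code="400" severity="Error">
      <com:Text xml:lang="lt">Neteisinga parametro Request reikšmė</com:Text>
      <com:Text xml:lang="en">Bad parameter Request value</com:Text>
    </footer:Message>
  </footer:Footer>
</mes:GenericData>
""".encode("utf-8")


def footer_errors(xml_bytes):
    """Return (code, severity, {lang: text}) for each footer message."""
    root = ET.fromstring(xml_bytes)
    errors = []
    for message in root.findall("footer:Footer/footer:Message", NS):
        texts = {
            text.get(XML_LANG): text.text
            for text in message.findall("com:Text", NS)
        }
        errors.append((message.get("code"), message.get("severity"), texts))
    return errors


for code, severity, texts in footer_errors(SAMPLE):
    print(code, severity, texts.get("en"))
```

A check like this before handing the response to a full parser makes it easier to tell a server-side error apart from a parsing problem.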

@bertrandmarc
Author

Sorry, there was a typo in the URLs. I have just fixed it by editing my previous message.

@khaeru
Owner

khaeru commented Dec 1, 2020

Great, thanks. So the XML looks, in part, like this:

<?xml version="1.0" encoding="UTF-8"?>
<mes:GenericData …>
   <mes:Header>
      <mes:ID>F7534B25376142ECB32ACB52DEC29020</mes:ID>
      <mes:Test>true</mes:Test>
      <mes:Prepared>2020-12-01T21:25:24.447000000+02:00</mes:Prepared>
      <mes:Sender id="LSD"/>
      <mes:Structure structureID="M2110211" dimensionAtObservation="AllDimensions">
         <com:Structure>
            <URN>urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=LSD.M2110211(1.0)</URN>
         </com:Structure>
      </mes:Structure>
   </mes:Header>
  <mes:DataSet structureRef="M2110211">
    …
  </mes:DataSet>
</mes:GenericData>

A wild guess: usually the <mes:Structure structureID="FOO" …> and the matching <mes:DataSet structureRef="FOO"> use some value ("FOO") that is only meaningful within the message, and that differs from the actual ID of the DSD given by the <URN> inside the <mes:Structure> tag. In this case, these IDs are the same. Perhaps this causes the reader code to fail to clean up the parsed object.
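To check a hypothesis like this, the three identifiers can be pulled out of a message and compared. This is a minimal standard-library sketch, not the sdmx reader: `structure_ids` and `URN_PATTERN` are hypothetical names, the URN pattern is an assumption that covers URNs shaped like the one above, and `SAMPLE` is a trimmed copy of the message with an empty DataSet.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "mes": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "com": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
}

# Assumed URN shape: ...=AGENCY.ID(VERSION); the ID is the part after the
# last dot that precedes the opening parenthesis.
URN_PATTERN = re.compile(
    r"=(?P<agency>.+)\.(?P<id>[^.(]+)\((?P<version>[^)]+)\)$"
)

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<mes:GenericData
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <mes:Header>
    <mes:Structure structureID="M2110211" dimensionAtObservation="AllDimensions">
      <com:Structure>
        <URN>urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=LSD.M2110211(1.0)</URN>
      </com:Structure>
    </mes:Structure>
  </mes:Header>
  <mes:DataSet structureRef="M2110211"/>
</mes:GenericData>
""".encode("utf-8")


def structure_ids(xml_bytes):
    """Collect the message-local structure IDs and the DSD ID from the URN."""
    root = ET.fromstring(xml_bytes)
    structure = root.find("mes:Header/mes:Structure", NS)
    # <URN> carries no namespace prefix in this message, so match it bare.
    urn = structure.find("com:Structure/URN", NS).text
    match = URN_PATTERN.search(urn)
    return {
        "structureID": structure.get("structureID"),
        "structureRef": root.find("mes:DataSet", NS).get("structureRef"),
        "dsd_id": match.group("id"),
    }


ids = structure_ids(SAMPLE)
print(ids)
print("all identical:", len(set(ids.values())) == 1)
```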

I will debug further!

@khaeru
Owner

khaeru commented Dec 11, 2020

A wild guess…

This turned out to be a wrong guess 😅 As mentioned at #33 (comment), the issue is visible in the dump above:

--- <class 'sdmx.message.DataMessage'> ---
[…]
--- <class 'sdmx.model.DataStructureDefinition'> ---
[…]
--- M2110211 ---
[…]
--- Attributes ---
[{'DS_LAST_UPDATE': <AttributeValue: DS_LAST_UPDATE=2020-11-12>, 'DS_REGIONAL': <AttributeValue: DS_REGIONAL=N>, 'DS_TIME_FORMAT': <AttributeValue: DS_TIME_FORMAT=3>, 'OSP_MASYVO_STATUSAS': <AttributeValue: OSP_MASYVO_STATUSAS=A>}]
…
RuntimeError: 1 uncollected items

So it's normal for the DataMessage and DSD to be on the stack, but the AttributeValues should not be there. This was because the code did not collect them and attach them to their DataSet. #33 fixed this, so closing.
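The failure mode can be sketched with a toy stack-based reader. Everything below is hypothetical and greatly simplified (`ToyReader`, `EXPECTED_LEFTOVERS`, and `parse` are invented for illustration and are not the sdmx implementation); it only shows how parsed items that are never attached to a parent end up counted as uncollected.

```python
class ToyReader:
    """Toy stack-based reader; not the actual sdmx implementation."""

    # Kinds of items that may legitimately remain when parsing ends.
    EXPECTED_LEFTOVERS = {"DataMessage", "DataStructureDefinition"}

    def __init__(self):
        self.stack = {}

    def push(self, kind, item):
        """Store a parsed item until a parent object collects it."""
        self.stack.setdefault(kind, []).append(item)

    def pop_all(self, kind):
        """Remove and return every stored item of the given kind."""
        return self.stack.pop(kind, [])

    def finish(self):
        """Raise if anything unexpected is still on the stack."""
        uncollected = sum(
            len(items)
            for kind, items in self.stack.items()
            if kind not in self.EXPECTED_LEFTOVERS
        )
        if uncollected:
            raise RuntimeError(f"{uncollected} uncollected items")


def parse(collect_attributes):
    """Simulate parsing a message whose data set carries attributes."""
    reader = ToyReader()
    reader.push("DataMessage", "message")
    reader.push("DataStructureDefinition", "M2110211")
    reader.push("Attributes", {"DS_REGIONAL": "N", "DS_TIME_FORMAT": "3"})

    dataset = {"attrib": {}}
    if collect_attributes:
        # The fix in this sketch: attach the attributes to their data set.
        for attrs in reader.pop_all("Attributes"):
            dataset["attrib"].update(attrs)

    reader.finish()
    return dataset


try:
    parse(collect_attributes=False)
except RuntimeError as exc:
    print(exc)

print(parse(collect_attributes=True))
```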

@khaeru khaeru closed this as completed Dec 11, 2020