Wrong reference resolution if there are few dependent artefacts with the same ID but different agencies or version #164
Comments
Steps to reproduce:

    from io import BytesIO

    from sdmx.reader.xml.v21 import Reader


    def ignore_none_tag():
        """
        The XML reader fails with NotImplemented when it encounters a <None/> tag.
        """
        # Workaround: register no handler for either event of the bare tag
        Reader.parser["None", "start"] = None
        Reader.parser["None", "end"] = None


    def read_xml(file):
        # Read the whole file into memory and hand the reader a binary stream
        with open(file, "rb") as f:
            content = f.read()
        response_io = BytesIO(content)
        return Reader().read_message(response_io)


    def main():
        ignore_none_tag()
        message = read_xml("IMF_STA_DSD_GFS(4.0.1).xml")
        # print core_representation for COUNTRY dimension
        print(message.structure["DSD_GFS"].dimensions.components[0].concept_identity.core_representation)


    if __name__ == "__main__":
        main()
|
Thanks for the report, including a test specimen and code. I will try to reproduce and fix. I recall there was an earlier bug (#116) that the SDMX-ML reader would return a StructureMessage that only had a subset of artifacts, when some had the same ID but different maintainers and/or versions. That was addressed in #124. The issue you describe is similar, but distinct: it's about the proper/expected association between artifacts within the message when some have the same ID. One point of information that could help: is there a particular query (I guess against IMF) that returns this set of artefacts that you include in your ZIP file? This isn't essential, but would help us ensure any fix is durable. |
As a note to self: this might be resolved by adjusting Reader.pop_resolved_ref() / Reader.get_single() to operate by URN rather than a combination of id and/or version. Since every (artifact class, maintainer ID, artifact ID, version) combination will have a URN that is (by definition) unique, this could disambiguate and even simplify some of the existing code. |
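The idea of keying on URN can be sketched as follows. This is only an illustration, not the sdmx1 implementation: the `Artefact` class and the `by_urn` dict are hypothetical stand-ins, and only the URN layout (`urn:sdmx:org.sdmx.infomodel.<package>.<class>=<agency>:<id>(<version>)`) follows the SDMX convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artefact:
    """Hypothetical stand-in for a maintainable artefact; not an sdmx1 class."""

    package: str  # e.g. "conceptscheme"
    cls: str  # e.g. "ConceptScheme"
    agency: str
    id: str
    version: str

    @property
    def urn(self) -> str:
        # SDMX URN layout: class path, then agency:id(version)
        return (
            f"urn:sdmx:org.sdmx.infomodel.{self.package}.{self.cls}"
            f"={self.agency}:{self.id}({self.version})"
        )


# The two concept schemes from the report, which share the ID "CS_MASTER"
parsed = [
    Artefact("conceptscheme", "ConceptScheme", "IMF_STA", "CS_MASTER", "1.0.1"),
    Artefact("conceptscheme", "ConceptScheme", "IMF", "CS_MASTER", "4.0.0"),
]

# Keyed by URN, artefacts that share an ID no longer collide
by_urn = {a.urn: a for a in parsed}

ref = "urn:sdmx:org.sdmx.infomodel.conceptscheme.ConceptScheme=IMF:CS_MASTER(4.0.0)"
resolved = by_urn[ref]
print(resolved.agency, resolved.version)
```

Because the URN already encodes class, maintainer, ID, and version, a single dict lookup replaces any separate matching on those fields.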
I think the latter. Here we can benefit from the very useful

    import sdmx

    sdmx.install_schemas()
    sdmx.validate_xml("IMF_STA_DSD_GFS(4.0.1).xml")

After pruning away content that triggers other, unrelated errors, I see:
This indicates that, per the XML Schema Documents, the tag should be
We already have a parser for the first of these:
Lines 1521 to 1523 in fa936b2
…and I find that when I replace all instances of the invalid tag, then sdmx.read_xml() parses the file without raising any exception.
I'll proceed to investigate the main issue here with this modified file. But it would still be helpful if you could say where this specimen is coming from. This would indicate whether/where we can report the invalid SDMX-ML to the data provider. |
- Manually reduced while still reproducing the reported issue.
- Correct invalid <None/> to <str:None/>.
Here is the SDMX query:
However, the data provider is hidden behind a private network, so I don't think it is super useful, at least for now. The platform supports a bunch of features that you might be interested in, such as SDMX 3.0 Hierarchy support, SDMX 2.1 HierarchicalCodelist support, version wildcards, and a lot of other 3.0 stuff. So, I will send you a message as soon as the platform is exposed to the public. |
cc: @FedorYatsenko |
Okay, thanks for this information. I also did not know Fedor was your colleague. From the sounds of it, I guess you are working with the IMF statistics division. It is definitely useful to have "real-world" examples of different kinds of SDMX applications, including:
This is because there is no real "reference implementation" of SDMX, and the samples published with the standards are not really comprehensive—there is a lot of possible usage for which no official example or test suite exists. At the same time, the primary goal for this sdmx1 package is to faithfully implement the standards, rather than align with quirks of other implementations (server or client software). So what's most valuable is cases where I can say confidently, "Aha, I can take this REST API or this provider's XML/JSON as a clear example of what the standards described, and test |
We aren't from the IMF. We are working with a platform that consumes, stores, and provides statistical data according to SDMX and also provides analytic tools on top of it. The provided example is a public IMF dataflow (and related DSD), but the platform is generic. For analytics purposes, we are also working on an SDMX client. We have our own implementation of the SDMX information model in Python, and we are even thinking of releasing it as open source as well. We also have sdmx-to-pandas and pandas-to-sdmx converters, which are outside of the standard but super useful. Our colleagues from a parallel team have already open-sourced an SDMX information model implementation in Java.
What I mean by that message is that the platform I described might be that "real-world" example, especially from the SDMX 3.0 perspective, because I didn't find any public DataSource in
I perfectly understand that the scope of this library is only the SDMX standard, and we don't encourage you to implement anything outside of it. Our overall goal is pretty much the same. |
Great stuff, thanks for letting me know the context. |
Hi @khaeru
I used the sdmx1 library to parse an XML file (attached) and discovered the following issue.
IMF_STA_DSD_GFS(4.0.1).xml.zip
XmlReader saves artefacts onto the stack based on resource ID; however, the resource ID is not unique (agency + resource ID + version is unique).
We have a few ConceptSchemes with the same ID but a different agency or version: in this case, IMF_STA:CS_MASTER(1.0.1) for the STATUS attribute and IMF:CS_MASTER(4.0.0) for the COUNTRY dimension.
A Concept that references one of those ConceptSchemes can't be resolved correctly, and None appears in core_representation.
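To make the ambiguity concrete, here is a minimal sketch in plain Python. The `Scheme` tuple, the `stack` list, and the `lookup()` helper are hypothetical and do not mirror the reader's internal data structures; they only show why an ID alone cannot pick one of the two schemes.

```python
from typing import List, NamedTuple, Optional


class Scheme(NamedTuple):
    """Hypothetical record identifying a concept scheme."""

    agency: str
    id: str
    version: str


# The two concept schemes named in this report
stack = [
    Scheme("IMF_STA", "CS_MASTER", "1.0.1"),  # used by the STATUS attribute
    Scheme("IMF", "CS_MASTER", "4.0.0"),  # used by the COUNTRY dimension
]


def lookup(
    id: str, agency: Optional[str] = None, version: Optional[str] = None
) -> List[Scheme]:
    """Return every scheme matching the given fields; None means 'any'."""
    return [
        s
        for s in stack
        if s.id == id
        and agency in (None, s.agency)
        and version in (None, s.version)
    ]


by_id_only = lookup("CS_MASTER")
fully_qualified = lookup("CS_MASTER", agency="IMF", version="4.0.0")

print(len(by_id_only))  # more than one candidate: the reference is ambiguous
print(fully_qualified)  # exactly one candidate
```

With only the ID there are two candidates and no principled way to choose; adding agency and version narrows the match to a single scheme.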
Here is
![image](https://private-user-images.githubusercontent.com/21194534/301875546-73874546-5fde-45b4-9056-028e78b4ecc7.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1NTk2MTgsIm5iZiI6MTcyMTU1OTMxOCwicGF0aCI6Ii8yMTE5NDUzNC8zMDE4NzU1NDYtNzM4NzQ1NDYtNWZkZS00NWI0LTkwNTYtMDI4ZTc4YjRlY2M3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIxVDEwNTUxOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWE2YWNmMWZmZTU5YTlhZTEzMDg4N2NiODMxMDYxZmFlOGUxMzhiOGEwYzY1MGRiNDMzZTcyZGYwMjNkODZjNWImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.G-TbqfOy0DbZiITBuoNO1h38OXG2rxedE5tlyC_xwu8)
stack state when the COUNTRY dimension tries to resolve its reference to the CS_MASTER concept scheme. It contains a dict with several CS_MASTER ConceptSchemes, but it is not possible to access them by ID.
Is this a known issue?
I think the problem might be much more complex in the same case with SDMX 3.0 where version might have a wildcard.
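A rough sketch of why wildcards complicate resolution: an exact key lookup is no longer enough, because a reference may match several stored versions and one of them must be selected. The wildcard syntax below (a `*` in place of any version component) is a deliberate simplification for illustration and is not the actual SDMX 3.0 wildcard syntax; `matches()` and `latest_match()` are hypothetical helpers.

```python
from typing import List, Optional


def matches(version: str, pattern: str) -> bool:
    """True if every pattern component is '*' or equals the version component."""
    v_parts = version.split(".")
    p_parts = pattern.split(".")
    return len(v_parts) == len(p_parts) and all(
        p in ("*", v) for v, p in zip(v_parts, p_parts)
    )


def latest_match(versions: List[str], pattern: str) -> Optional[str]:
    """Pick the numerically highest version that satisfies the pattern."""
    candidates = [v for v in versions if matches(v, pattern)]
    if not candidates:
        return None
    # Compare component-wise as integers, so "4.10.0" sorts above "4.9.0"
    return max(candidates, key=lambda v: tuple(int(x) for x in v.split(".")))


available = ["1.0.1", "4.0.0", "4.1.0"]

print(latest_match(available, "4.*.*"))
print(latest_match(available, "1.0.1"))
print(latest_match(available, "2.*.*"))
```

Under these assumptions a reference resolves to a set of candidates plus a selection rule, rather than to a single dictionary key.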