Issues with pycsw mapping ISO-DIF #657
Comments
I will be working full-time on pycsw during the sprint. I am focusing on understanding, and possibly improving, the mapping from ISO to other profiles such as DIF. I opened a related issue at: geopython/pycsw#657
Regarding the last part of the issue, the one about keywords (distinguishing between ISO keywords with and without a thesaurus_name): would it make sense to have a column (which can be empty) to specify the 'dialect'/'flavour' of the ISO record, in my case GCMD? We could then add some logic in the core code to distinguish between keywords with and without a thesaurus_name, which would affect the transformation into a specific output profile.
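To make the proposal concrete, here is a minimal sketch of how keywords could be grouped by thesaurus title when reading an ISO record. It uses only the standard library and is not pycsw code: the function name `keywords_by_thesaurus` and the trimmed `SAMPLE` record are hypothetical, and a real record nests `gmd:descriptiveKeywords` deeper, under the identification info.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespace URIs.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def keywords_by_thesaurus(iso_xml):
    """Return {thesaurus title or None: [keyword, ...]} for an ISO record.

    Hypothetical helper, not part of pycsw. Keywords whose MD_Keywords
    block has no thesaurusName are collected under the key None.
    """
    root = ET.fromstring(iso_xml)
    grouped = {}
    # iter() finds MD_Keywords blocks at any depth in the record.
    for block in root.iter("{%s}MD_Keywords" % NS["gmd"]):
        title = block.findtext(
            "gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString",
            default=None,
            namespaces=NS,
        )
        title = title.strip() if title and title.strip() else None
        for kw in block.findall("gmd:keyword/gco:CharacterString", NS):
            if kw.text and kw.text.strip():
                grouped.setdefault(title, []).append(kw.text.strip())
    return grouped


# Trimmed, illustrative record: one GCMD keyword and one free keyword.
SAMPLE = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:descriptiveKeywords>
    <gmd:MD_Keywords>
      <gmd:keyword>
        <gco:CharacterString>
          EARTH SCIENCE > Atmosphere > Atmospheric Winds > Surface Winds
        </gco:CharacterString>
      </gmd:keyword>
      <gmd:thesaurusName>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>GCMD Science Keywords</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:thesaurusName>
    </gmd:MD_Keywords>
  </gmd:descriptiveKeywords>
  <gmd:descriptiveKeywords>
    <gmd:MD_Keywords>
      <gmd:keyword><gco:CharacterString>wind</gco:CharacterString></gmd:keyword>
    </gmd:MD_Keywords>
  </gmd:descriptiveKeywords>
</gmd:MD_Metadata>
"""

grouped = keywords_by_thesaurus(SAMPLE)
```

With the thesaurus title kept next to each keyword, an output profile could decide per keyword how to serialize it, instead of working from one comma-joined string.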
I may have found a little hack to tune the output the way I needed, by modifying 'dif.py':

    # keywords
    val = util.getqattr(result, context.md_core_model['mappings']['pycsw:Keywords'])
    if val:
        for kw in val.split(','):
            # GCMD keywords are '>'-separated paths: Category > Topic > Term > Variable levels
            values = [v.strip() for v in kw.split('>')]
            if len(values) >= 3:  # Category, Topic and Term are all required below
                parameters = etree.SubElement(node, util.nspath_eval('dif:Parameters', NAMESPACES))
                etree.SubElement(parameters, util.nspath_eval('dif:Category', NAMESPACES)).text = values[0]
                etree.SubElement(parameters, util.nspath_eval('dif:Topic', NAMESPACES)).text = values[1]
                etree.SubElement(parameters, util.nspath_eval('dif:Term', NAMESPACES)).text = values[2]
                # any remaining path segments become Variable_Level_1, Variable_Level_2, ...
                for i, v in enumerate(values[3:]):
                    etree.SubElement(parameters, util.nspath_eval(f'dif:Variable_Level_{i+1}', NAMESPACES)).text = v
            else:
                etree.SubElement(node, util.nspath_eval('dif:Keywords', NAMESPACES)).text = kw

Note, this will work only for my specific case where I am sure the ...

The code above will return output such as:

    <dif:Parameters>
      <dif:Category>Earth Science</dif:Category>
      <dif:Topic>Atmosphere</dif:Topic>
      <dif:Term>Atmospheric radiation</dif:Term>
      <dif:Variable_Level_1>Reflectance</dif:Variable_Level_1>
    </dif:Parameters>

from a keyword such as:

    <gmd:keyword>
      <gco:CharacterString>
        EARTH SCIENCE > Atmosphere > Atmospheric Winds > Surface Winds
      </gco:CharacterString>
    </gmd:keyword>
@epifanio is this still an issue?
Description
Problem: mapping of ISO records to DIF (using GCMD DIF type/subtype vocabulary).
Given an ISO-compliant metadata Record, I encountered some issues in the mapping to DIF at different levels. Listing two examples:
Environment
Steps to Reproduce
Indexing the following ISO Record:
Results in the following DIF profile
The DIF output doesn't match the information available in the original ISO source.
Data Access
Currently the protocols are just the same as the ISO records.
Current DIF output
Expected DIF9.7 output
Dataset landing page
Current ISO output
As Related_URL using type DATASET LANDING PAGE.
Expected DIF output
Current DIF output:
Expected DIF output
Additional Information
There are other issues related to how the ISO keywords are mapped to DIF, in particular the GCMD Science Keywords.
In ISO we have:
see reference ISO
This is mapped from apiso:Subject into csw:Keywords, which is then mapped to dif:Keyword in dif.py.
In principle it should instead be mapped in DIF 9 as Parameters (with subelements) when the thesaurus name is GCMD, and as Keyword (string) for any other thesaurus name.
As this is too complicated, I would try to get only the GCMD thesaurus; thus I need to map all ISO entries to Parameters in this structure:
See http://metadata.nersc.no/oai?verb=ListRecords&metadataPrefix=dif for example
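The rule described above (Parameters for GCMD keywords, a plain Keyword for anything else) could be sketched roughly as follows. This is a standalone illustration using the standard library, not the actual dif.py code: `add_keyword` and `dif` are hypothetical helpers, the namespace URI is an assumption to be checked against the NAMESPACES dict in dif.py, and the sketch puts no cap on the number of Variable_Level_n elements.

```python
import xml.etree.ElementTree as ET

# Assumed DIF namespace URI; verify against NAMESPACES in dif.py.
DIF_NS = "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"


def dif(tag):
    """Return the namespace-qualified name for a DIF element (hypothetical helper)."""
    return "{%s}%s" % (DIF_NS, tag)


def add_keyword(node, keyword, thesaurus=None):
    """Append one keyword to `node` and return the new element.

    A keyword from a GCMD thesaurus with at least Category > Topic > Term
    becomes a Parameters element with subelements; anything else becomes
    a plain Keyword string.
    """
    parts = [p.strip() for p in keyword.split(">")]
    is_gcmd = thesaurus is not None and "GCMD" in thesaurus.upper()
    if is_gcmd and len(parts) >= 3:
        params = ET.SubElement(node, dif("Parameters"))
        # zip() stops after the three fixed levels.
        for tag, value in zip(("Category", "Topic", "Term"), parts):
            ET.SubElement(params, dif(tag)).text = value
        # Remaining segments become Variable_Level_1, Variable_Level_2, ...
        for i, value in enumerate(parts[3:], start=1):
            ET.SubElement(params, dif("Variable_Level_%d" % i)).text = value
        return params
    elem = ET.SubElement(node, dif("Keyword"))
    elem.text = keyword.strip()
    return elem


root = ET.Element(dif("DIF"))
add_keyword(
    root,
    "EARTH SCIENCE > Atmosphere > Atmospheric Winds > Surface Winds",
    thesaurus="GCMD Science Keywords",
)
add_keyword(root, "wind")  # no thesaurus: stays a plain Keyword
```

Making this decision per keyword requires the thesaurus name to survive indexing, which is the information currently lost when everything is flattened into csw:Keywords.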