Duplicate URI prefixes error during sssom parse #408

Closed
ehartley opened this issue Aug 8, 2023 · 1 comment · Fixed by #409
ehartley commented Aug 8, 2023

sssom parse fails with a Duplicate URI prefixes error when the -C merged option is used and the input metadata.yml file maps a URI prefix to a different prefix than the default prefix map does.

For both version 0.3.36 and 0.3.39, running:

sssom parse -I obographs-json -m config/metadata.yml -C merged -F IAO:0100001 -F oboInOwl:consider tmp/obsolete.json -o reports/obsolete.sssom.tsv

produces this error:

Traceback (most recent call last):
  File "/usr/local/bin/sssom", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sssom/cli.py", line 215, in parse
    parse_file(
  File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 93, in parse_file
    mapping_predicates = get_list_of_predicate_iri(
  File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 192, in get_list_of_predicate_iri
    p_iri = extract_iri(p, prefix_map)
  File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 210, in extract_iri
    converter = Converter.from_prefix_map(prefix_map)
  File "/usr/local/lib/python3.10/dist-packages/curies/api.py", line 551, in from_prefix_map
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/curies/api.py", line 333, in __init__
    raise DuplicateURIPrefixes(duplicate_uri_prefixes)
curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:

http://id.nlm.nih.gov/mesh/:
        prefix='MESH' uri_prefix='http://id.nlm.nih.gov/mesh/' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='mesh' uri_prefix='http://id.nlm.nih.gov/mesh/' prefix_synonyms=[] uri_prefix_synonyms=[]

http://uri.neuinfo.org/nif/nifstd/nlx_subcell_:
        prefix='NIF_Subcellular' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_subcell_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='nlx.sub' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_subcell_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://purl.obolibrary.org/obo/EHDAA2_:
        prefix='RETIRED_EHDAA2' uri_prefix='http://purl.obolibrary.org/obo/EHDAA2_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='EHDAA2' uri_prefix='http://purl.obolibrary.org/obo/EHDAA2_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://uri.neuinfo.org/nif/nifstd/nlx_anat_:
        prefix='NLXANAT' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_anat_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='nlx.anat' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_anat_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://uri.neuinfo.org/nif/nifstd/birnlex_:
        prefix='BIRNLEX' uri_prefix='http://uri.neuinfo.org/nif/nifstd/birnlex_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='birnlex' uri_prefix='http://uri.neuinfo.org/nif/nifstd/birnlex_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://purl.obolibrary.org/obo/NCIT_:
        prefix='ncithesaurus' uri_prefix='http://purl.obolibrary.org/obo/NCIT_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='NCIT' uri_prefix='http://purl.obolibrary.org/obo/NCIT_' prefix_synonyms=[] uri_prefix_synonyms=[]

The metadata.yml file defines the MESH, NIF_Subcellular, RETIRED_EHDAA2, NLXANAT, BIRNLEX, and ncithesaurus prefixes. However, the obsolete.json file being parsed uses both the RETIRED_EHDAA2 and EHDAA2 prefixes and both the ncithesaurus and NCIT prefixes.

So, the desired functionality would be to allow multiple prefixes with the same URI. It might be helpful to get a notification about the duplicate URI prefixes, but they shouldn't cause sssom parse to fail.
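As a rough illustration of the "notify but don't fail" behaviour requested above, here is a minimal sketch in plain Python. It is not sssom-py or curies code: the function names `find_duplicate_uri_prefixes` and `warn_on_duplicates` are hypothetical, and the prefix map is assumed to be a simple dict from prefix to URI prefix.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def find_duplicate_uri_prefixes(prefix_map):
    """Group prefixes by URI prefix and keep only URI prefixes used more than once.

    Hypothetical helper: `prefix_map` is assumed to be a dict of prefix -> URI prefix.
    """
    by_uri = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        by_uri[uri_prefix].append(prefix)
    return {uri: prefixes for uri, prefixes in by_uri.items() if len(prefixes) > 1}


def warn_on_duplicates(prefix_map):
    """Log a warning for each shared URI prefix instead of raising an error."""
    duplicates = find_duplicate_uri_prefixes(prefix_map)
    for uri_prefix, prefixes in duplicates.items():
        logger.warning(
            "URI prefix %s is shared by prefixes: %s", uri_prefix, ", ".join(prefixes)
        )
    return duplicates


example_map = {
    "ncithesaurus": "http://purl.obolibrary.org/obo/NCIT_",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "MESH": "http://id.nlm.nih.gov/mesh/",
}
shared = warn_on_duplicates(example_map)
```

With the example map, `shared` contains one entry for the NCIT URI prefix listing both `ncithesaurus` and `NCIT`, and parsing could then continue.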

This issue is related to #269.

matentzn (Collaborator) commented Aug 9, 2023

This is a high priority issue to sort out.

The user-supplied metadata always entirely trumps whatever the Bioregistry EPM says. So what needs to happen:

  • Ensure that in metadata-only mode, the converter uses only the metadata (I think this is the case)
  • Ensure that in merged mode, where some prefixes are supplied by the user and the rest are obtained from the curies converter, that
    1. the user-supplied prefixes trump the converter prefixes
    2. the user-supplied uri-prefixes trump the converter uri-prefixes (this is what I think is going wrong here)

Technically speaking, if the user supplies a pair prefix: http://uri.pre.fix/ then

  1. all mentions of the prefix need to be removed from the context prior to constructing the converter
  2. all mentions of http://uri.pre.fix/ need to be removed from the context prior to constructing the converter
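The two removal steps above can be sketched as a small merge function. This is only an illustration under stated assumptions, not the code from the fix: `merge_prefix_maps` is a hypothetical name, and both the user-supplied map and the default context are assumed to be plain dicts from prefix to URI prefix.

```python
def merge_prefix_maps(user_map, default_map):
    """Merge two prefix maps so that user-supplied entries always win.

    Hypothetical helper. A default entry is dropped when either
    (1) its prefix is also defined by the user, or
    (2) its URI prefix is also used by the user.
    """
    user_uri_prefixes = set(user_map.values())
    merged = {
        prefix: uri_prefix
        for prefix, uri_prefix in default_map.items()
        if prefix not in user_map and uri_prefix not in user_uri_prefixes
    }
    # User entries are added last; none of them can collide with what remains.
    merged.update(user_map)
    return merged


user_map = {
    "MESH": "http://id.nlm.nih.gov/mesh/",
    "ncithesaurus": "http://purl.obolibrary.org/obo/NCIT_",
}
default_map = {
    "mesh": "http://id.nlm.nih.gov/mesh/",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "EHDAA2": "http://purl.obolibrary.org/obo/EHDAA2_",
}
merged = merge_prefix_maps(user_map, default_map)
```

In this example the default `mesh` and `NCIT` entries are removed because the user map already claims their URI prefixes, so the merged map has no duplicate URI prefixes. One consequence of this rule is that a default prefix such as `NCIT` stops being available once the user maps a different prefix to the same URI prefix.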

matentzn added the bug and priority labels Aug 9, 2023
hrshdhgd added a commit that referenced this issue Aug 11, 2023
- [x] Fixes #408 
- [x] Fixes #269 

This way, when a `prefix_map` has duplicate `uri_prefix` or `prefix`
from the user, `curies` will not throw an error. `sssom-py` already
gives priority to a user-defined prefix map over the default one (which
now is EPM from bioregistry).

---------

Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>