Conversation

@hrshdhgd
Contributor

The test for parsing oaei-ordo-hp.rdf was commented out. This PR uncomments it and wires it properly into test_cli.py to verify that it passes.

@hrshdhgd hrshdhgd requested review from cmungall and matentzn May 31, 2023 19:37
@hrshdhgd
Contributor Author

@matentzn: OK, so I'm running: sssom parse tests/data/oaei-ordo-hp.rdf -o test.tsv --input-format alignment-api-xml

and the test.tsv begins

# curie_map:
#   HP: http://purl.obolibrary.org/obo/HP_
#   orphanet.ordo: http://www.orpha.net/ORDO/Orphanet_
#   owl: http://www.w3.org/2002/07/owl#
#   rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
#   rdfs: http://www.w3.org/2000/01/rdf-schema#
#   semapv: https://w3id.org/semapv/
#   skos: http://www.w3.org/2004/02/skos/core#
#   sssom: https://w3id.org/sssom/
# license: https://w3id.org/sssom/license/unspecified
# mapping_set_id: https://w3id.org/sssom/mappings/105b111d-1f45-44e3-bb6d-8c33670537b7
# object_source: http://purl.obolibrary.org/hp.owl
# subject_source: http://purl.obolibrary.org/ordo.owl
subject_id	predicate_id	object_id	mapping_justification	confidence
orphanet.ordo:100047	owl:equivalentClass	HP:0100681	semapv:UnspecifiedMatching	0.27449
orphanet.ordo:100069	owl:equivalentClass	HP:0030219	semapv:UnspecifiedMatching	0.875
orphanet.ordo:100088	owl:equivalentClass	HP:0002890	semapv:UnspecifiedMatching	0.875

Is the object_source value incorrect? The reason I ask is test_conversion.py is throwing an error stating that the value is incorrect.
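
For anyone wanting to inspect the header values without opening the file by hand, here is a minimal sketch. It assumes the metadata is the "#"-commented block shown above and only collects top-level "key: value" pairs, skipping nested entries such as those under curie_map. The helper names read_header_metadata and looks_like_uri are hypothetical and not part of sssom-py; a real implementation would more likely use a proper YAML parser.

```python
from typing import Dict


def read_header_metadata(text: str) -> Dict[str, str]:
    """Collect top-level 'key: value' pairs from a '#'-commented header."""
    metadata: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        body = line[1:]
        if body.startswith(" "):
            body = body[1:]  # drop the single space after '#'
        if not body or body[0].isspace():
            continue  # still indented: a nested entry (e.g. under curie_map)
        key, sep, value = body.partition(":")
        value = value.strip()
        if sep and value:
            metadata[key.strip()] = value
    return metadata


def looks_like_uri(value: str) -> bool:
    """Rough check (an assumption): treat http(s) values as URIs, not CURIEs."""
    return value.startswith(("http://", "https://"))


sample = """# curie_map:
#   HP: http://purl.obolibrary.org/obo/HP_
# license: https://w3id.org/sssom/license/unspecified
# object_source: http://purl.obolibrary.org/hp.owl
# subject_source: http://purl.obolibrary.org/ordo.owl
subject_id\tpredicate_id\tobject_id
"""

metadata = read_header_metadata(sample)
for key in ("object_source", "subject_source"):
    print(key, metadata[key], "URI" if looks_like_uri(metadata[key]) else "CURIE")
```

With the header above, both object_source and subject_source come out as full URIs rather than CURIEs, which may be what test_conversion.py is objecting to.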

sssom/parsers.py Outdated
SUBJECT_SOURCE
] = (
e.firstChild.nodeValue
)  # CURIEfy this node value, and if it ends with ".extension", replace it with "_"
Collaborator

What is your plan here regarding curification if a URI prefix is not in the prefix map?

Contributor Author

I may have to either brute-force the CURIE or check whether this information is available via a curie_map in one of the YAML files.

Collaborator

I would vote for leaving the URIs as is if the URI prefix is not known to the prefix map. We should produce a single warning at the end of the pipeline which lists the first 5 URIs that didn't have a proper CURIE, and log a warning "the extracted file is not valid SSSOM because some of the URIs could not be converted. Please provide appropriate URI prefixes and repeat the process".

Charlie prefers not to process the node at all if the URI prefix is unknown; this is an alternative, but only if we can override it with a flag, and then we are back to the first point.
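
To make the first option concrete, here is a rough sketch, not the actual sssom-py code. It assumes a plain dict prefix map (CURIE prefix to URI prefix); the names curie_or_uri and convert_all and the max_reported parameter are hypothetical. Unknown URIs pass through unchanged and a single warning is logged at the end listing the first five.

```python
import logging
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def curie_or_uri(uri: str, prefix_map: Dict[str, str]) -> Tuple[str, bool]:
    """Return (value, converted); the longest matching URI prefix wins."""
    best = None
    for prefix, expansion in prefix_map.items():
        if uri.startswith(expansion) and (best is None or len(expansion) > len(best[1])):
            best = (prefix, expansion)
    if best is None:
        return uri, False  # unknown prefix: leave the URI as is
    prefix, expansion = best
    return f"{prefix}:{uri[len(expansion):]}", True


def convert_all(
    uris: Iterable[str], prefix_map: Dict[str, str], max_reported: int = 5
) -> Tuple[List[str], List[str]]:
    """Convert every URI, then emit one warning covering all failures."""
    values: List[str] = []
    failed: List[str] = []
    for uri in uris:
        value, converted = curie_or_uri(uri, prefix_map)
        values.append(value)
        if not converted:
            failed.append(uri)
    if failed:
        logger.warning(
            "The extracted file is not valid SSSOM because some of the URIs "
            "could not be converted (first %d: %s). Please provide appropriate "
            "URI prefixes and repeat the process.",
            min(len(failed), max_reported),
            ", ".join(failed[:max_reported]),
        )
    return values, failed


prefix_map = {"HP": "http://purl.obolibrary.org/obo/HP_"}
values, failed = convert_all(
    ["http://purl.obolibrary.org/obo/HP_0100681", "http://purl.obolibrary.org/hp.owl"],
    prefix_map,
)
```

Returning the list of failures alongside the values would also make it easy to add the stricter behaviour behind a flag later, since the caller can decide whether to drop or keep the unconverted entries.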

Contributor Author

That is the problem. The SSSOM definition for subject_source and object_source is EntityReference. Does this mean only CURIEs? Would uriOrCurie be a better solution?

Contributor Author

Oh you don't want it to be a URI at all.

Collaborator

I think a URI is a CURIE where the prefix is http and the URI expansion is http, so everywhere there is an entity reference, you should be able to drop in a URI.

It's fine to write a URI; just also write the warning I mentioned above.

Collaborator

@matentzn matentzn left a comment

What is missing in this PR?

@hrshdhgd
Contributor Author

hrshdhgd commented Jun 14, 2023

What is missing in this PR?

Just the curification.

@hrshdhgd hrshdhgd requested review from matentzn and removed request for cmungall June 16, 2023 15:21
Collaborator

@matentzn matentzn left a comment

Looks great, two minor comments!

@hrshdhgd hrshdhgd merged commit 6bfb4d3 into master Jun 19, 2023
@hrshdhgd hrshdhgd deleted the h2-patch3 branch June 19, 2023 18:21
3 participants