Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Begin using curies.Converter in more places #397

Merged
merged 42 commits into from
Jul 27, 2023
Merged

Conversation

cthoyt
Copy link
Member

@cthoyt cthoyt commented Jul 22, 2023

Part of #363

This PR does the following:

  1. Adds a minimum version of curies that has the strict compress and expand functions
  2. Rewrites the SPARQL utils and RDF utils to use curies functionality
  3. Updates custom curie_from_uri to use curies (will make a follow-up PR that replaces this completely)

matentzn
matentzn previously approved these changes Jul 24, 2023
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!! THANK YOU!

@hrshdhgd I will leave it to you to deal with the PR - I left 1/2 comments here and there to check, but nothing critical.

src/sssom/cli.py Show resolved Hide resolved
src/sssom/sparql_util.py Show resolved Hide resolved
src/sssom/sparql_util.py Show resolved Hide resolved
src/sssom/util.py Outdated Show resolved Hide resolved
src/sssom/util.py Outdated Show resolved Hide resolved
src/sssom/util.py Outdated Show resolved Hide resolved
@cthoyt
Copy link
Member Author

cthoyt commented Jul 24, 2023

@matentzn @hrshdhgd it appears the issues in the build now are due to poetry. I will again give a gentle nudge to totally drop poetry, it is way more of an issue than it helps

@hrshdhgd
Copy link
Contributor

hrshdhgd commented Jul 24, 2023

I ran this PR locally and there are 4 errors as of now. First one is a CLI error which will be fixed automatically once the following are fixed.

Three pytest errors occurring as of now:

FAILED tests/test_parsers.py::TestParse::test_parse_alignment_minidom - curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:
FAILED tests/test_parsers.py::TestParse::test_parse_obographs - curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:
FAILED tests/test_parsers.py::TestParse::test_parse_sssom_rdf - ValueError: CURIE appeared where there should be a URI, and could not be standardized: oio:hasBroadSynonym

From what I can tell the first two errors are thrown by curies/api.py.

The third error is because the prefix oio: is not in the prefix_map.

@hrshdhgd
Copy link
Contributor

it appears the issues in the build now are due to poetry. I will again give a gentle nudge to totally drop poetry, it is way more of an issue than it helps

I agree but it seems to me that only GitHub environments seem to be problematic. Locally, it seems to work fine. Plus Chris insists all our projects following a standard architecture, hence my hands are sort of tied 🙂 .

@cthoyt
Copy link
Member Author

cthoyt commented Jul 24, 2023

@hrshdhgd thanks for following up on all of this. can you take care of looking into these tests that have inconsistent data? Great to see that we are implicitly now checking data integrity using curies!

src/sssom/context.py Outdated Show resolved Hide resolved
src/sssom/sparql_util.py Outdated Show resolved Hide resolved
src/sssom/context.py Outdated Show resolved Hide resolved
tests/test_parsers.py Outdated Show resolved Hide resolved
# f"{self.obographs_file} has the wrong number of mappings.",
# )

@unittest.skip(reason="Not sure what is broken in this graph")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but no; what exactly is going on here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking into it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've figured out what this was testing and written a more detailed explanation of what it was doing before and why we don't need it anymore.

The other option is we can make the OBO parser much less lenient, and then we will be throwing exceptions left and right (not recommended)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so yes, I have now added a detailed explanation of why this test is not needed. We should keep it skipped, or even better, remove it.

cthoyt added a commit to mapping-commons/sssom that referenced this pull request Jul 25, 2023
There are issues in mapping-commons/sssom-py#397
because the base schema does not have a valid prefix map - `dc` and
`dcterms` are both used as prefixes for the same URI prefix. This
happens because LinkML makes some inference about what needs to be there
- there are some assorted uses throughout the schema definition with
both DC and DCTERMS. This PR updates the schema to only use DCTERMS and
puts explicit entries for DCTERMS in the prefix map to address this. It
then regenerates the whole project.
.gitignore Outdated Show resolved Hide resolved
@cthoyt
Copy link
Member Author

cthoyt commented Jul 25, 2023

@hrshdhgd it appears this PR is passing, even with the Poetry changes removed.

@hrshdhgd
Copy link
Contributor

hrshdhgd commented Jul 25, 2023

@hrshdhgd it appears this PR is passing, even with the Poetry changes removed.

Yes , I had a quick solution that worked! It won't sustain for future PRs if we're using the latest versions of poetry.

@hrshdhgd
Copy link
Contributor

hrshdhgd commented Jul 25, 2023

I merged #399 to main. Now if you merge the main branch down to this one, all should be well. I'll let you do it since you're actively working on this branch. This will eliminate the conflict.

@cthoyt cthoyt requested a review from matentzn July 26, 2023 15:04
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have convinced myself that deleting that broken obographs test was the right thing to do! Thanks for your work on this, I am happy now; the one comment is purely an FYI, the behaviour is desireable.

@@ -117,21 +118,10 @@ def test_parse_obographs(self):
write_table(msdf, file)
self.assertEqual(
len(msdf.df),
9881,
8099,
Copy link
Collaborator

@matentzn matentzn Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cthoyt I reinstated this test. Analysing the warnings, I noticed that the 1800 less parsed entities (which is good, since they are not in the prefix map) are due to biopragmatics/bioregistry#917.

I think this is entirely independent of this PR, so I will approve this, this is just FYI.

@matentzn
Copy link
Collaborator

@hrshdhgd ready for your review and merge, I am happy with it!

Copy link
Contributor

@hrshdhgd hrshdhgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much cleaner than before! Thanks @cthoyt !

@hrshdhgd hrshdhgd merged commit f3be757 into master Jul 27, 2023
6 checks passed
@hrshdhgd hrshdhgd deleted the improve-sparql-util branch July 27, 2023 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants