Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for CurieNamespace with embedded catalog. #244

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hsolbrig
Copy link
Contributor

@hsolbrig hsolbrig commented Feb 2, 2023

It is my understanding that we are supposed to use the curies package, but it isn't clear how one would add maps incrementally (see: tests/test_utils/test_curienamespace.py#113 as an example). Any solution that passes test_curienamespace.py should do.

It is my understanding that we are supposed to use the curies package, but it isn't clear how one
would add maps incrementally (see: tests/test_utils/test_curienamespace.py#113 as an example). Any
solution that passes test_curienamespace.py should do.
@kervel
Copy link

kervel commented Feb 4, 2023

i guess the test fails because the curienamespace catalog was already populated from test_enum which ran before this test...

@kervel
Copy link

kervel commented Feb 8, 2023

Wouild it be an option to make the catalog an instance variable instead of a class variable ? if we want to scope namespaces somehow that will be easier ? And it doesn't require passing a curieNamespaces object to each and every function call because we can still have a module-level defined singleton in generated code ?

Other way to fix it is some pre-test code that cleans up the catalog

@cmungall
Copy link
Member

comments from @kervel on slack:

fails tests because of global prefix catalog in the curienamespaces class. If we would change the order of the tests it would probably pass, but i think moving the catalog out of a class variable so that it becomes scoped sounds like a more robust solution. I cannot estimate the impact this would have on other uses of the prefix catalog

I think I agree but I also think I am lacking a bit of context here. It would be helpful to state here or in the initial comment what the motivation for this PR is, and a high level description of what it seeks to achieve.

@kervel
Copy link

kervel commented Feb 23, 2023

i'm trying to state the motivation here:

the idea is that the python data classes, both when used to generate json and to parse json, should be able to interpret the type designator URI/CURIES in the most broad/flexible way possible. This is especially true if the range of a type designator field is "uriorcurie".

There are several uri's to designate a type: one has both the native and the assigned (via class_uri uri's for a class), which should yield the same behaviour when used as a type designator. And then both of them can be shortened to curies using any of the prefixes in the prefix map. If a user defines multiple prefixes to the same uri namespaces (for instance because he includes yaml written by someboyd else), he would expect all known prefixes to just work.

so

The idea is for the python data classes to use the metadata already available to do this translation, rather than (like we do in pydantic) just having an exhaustive list of accepted uri's / curies as string values (which makes sense for pydantic because there we don't even depend on linkml-runtime)

so this PR adds functions that translate uries to curies and back using metadata that is in linkml runtime curie namespaces, an object which is already available in the python dataclasses right now.

@kervel
Copy link

kervel commented Feb 27, 2023

hello, the reason this PR doesn't use the urieconverter from https://github.com/cthoyt/curies is laid out in the first comment here. So i created an issue there biopragmatics/curies#33 to see if that can be fixed (author replied in the mean time)

@hsolbrig biopragmatics/curies#34 . is the following okay ?

  • factor out a separate class CurieNamespaceCatalog that manages the catalog as an instance variable ?
  • make this class a thin wrapper around the curies package ? Do we even need a wrapper ?

then we could do

catalog = CurieNamespaceCatalog()
LINKML = CurieNamespace('linkml', 'https://w3id.org/linkml/', catalog=catalog)
SHEX = CurieNamespace('shex', 'http://www.w3.org/ns/shex#', catalog=catalog)
XSD = CurieNamespace('xsd', 'http://www.w3.org/2001/XMLSchema#', catalog=catalog)
DEFAULT_ = CurieNamespace('', 'http://example.org/', catalog=catalog)

or alternatively

catalog = CurieNamespaceCatalog()
LINKML = catalog.namespace('linkml', 'https://w3id.org/linkml/')
SHEX = catalog.namespace('shex', 'http://www.w3.org/ns/shex#')
XSD = catalog.namespace('xsd', 'http://www.w3.org/2001/XMLSchema#')
DEFAULT_ = catalog.namespace('', 'http://example.org/')

@cthoyt
Copy link
Contributor

cthoyt commented Feb 27, 2023

@kervel @hsolbrig the new curies v0.4.3 with incremental construction has been released to PyPI. You can upgrade and use something like the following:

import curies

converter = curies.Converter(records=[])
converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")

@kervel
Copy link

kervel commented Mar 1, 2023

hello, the test done by @hsolbrig still fails when trying to convert to the new curies v0.4.3:

        # Test incremental add
        CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)
        self.assertEqual('tester:hip_boots', namespaceCatalog.to_curie('http://fester.bester/tester#hip_boots'))

        # Test multiple prefixes for same suffix
        CurieNamespace('ns17', URIRef('http://fester.bester/tester#')).addTo(converter)

here, ns17 is actually a synonym. so curies refuses to take it as a new prefix. off course this could be fixed so that curies automatically converts this to a synonym for an existing Record, but that would complicate things.

this would also fail (a bit further down the same test)

        # Test multiple uris for same prefix
        # The following should be benign
        CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)

        # Issue warnings for now on this
        # TODO: test that we log the following
        #       'Prefix: tester already references http://fester.bester/tester# -
        #       not updated to http://fester.notsogood/tester#'
        CurieNamespace('tester', URIRef('http://fester.notsogood/tester#')).addTo(converter)

@cthoyt
Copy link
Contributor

cthoyt commented Mar 1, 2023

It's possible I could implement something like add_prefix_synonym and add_uri_prefix_synonym to curies.Converter if that would be a good stop-gap before making some more fundamental improvements to the way you're doing bookkeeping

@kervel
Copy link

kervel commented Mar 1, 2023

@cthoyt i'd rather first get some feedback from the people that know this code base better than i do. the originally proposed api in linkml-runtime doesn't even make a distinction between a synonym and a new prefix (as can be seen from the tests).

@kervel
Copy link

kervel commented Mar 1, 2023

i created a working proposal here #251 ... i (for now) now bypass the incremental building. code is way too complicated: i need to detect synonyms and i can have transitive equality if you both have prefix synonyms and uri synonyms

not feeling very confident on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants