[pydanticgen] Embed extra metadata in modules, classes, and fields #2036

Merged
merged 12 commits into from
May 11, 2024

sneakers-the-rat
Collaborator

Fix: #2005

Related to:

I feel like this comes up a lot. with the new template system it was pretty easy to implement.

this PR adds all metadata that isn't explicitly excluded - either from being already represented by the template model, or by being present in their meta_exclude classvars - to a linkml_meta attribute in modules, classes, and fields.
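A minimal sketch of that exclusion rule, for illustration only. `TemplateSketch`, `ClassSketch`, `represented`, and `embedded_meta` are hypothetical names and not the generator's actual classes; only the `meta_exclude` classvar name comes from the description above, and the `slots` list is made-up sample data.

```python
from typing import Any, ClassVar, Dict, List


class TemplateSketch:
    """Hypothetical stand-in for a pydanticgen template model."""

    # Keys the template already renders itself, so they need no copy in linkml_meta.
    represented: ClassVar[List[str]] = []
    # Keys deliberately left out of linkml_meta.
    meta_exclude: ClassVar[List[str]] = []


class ClassSketch(TemplateSketch):
    represented = ["name", "description", "is_a"]
    meta_exclude = ["slots"]


def embedded_meta(definition: Dict[str, Any], template: type) -> Dict[str, Any]:
    """Keep every key that is neither represented by the template nor excluded."""
    skip = set(template.represented) | set(template.meta_exclude)
    return {key: value for key, value in definition.items() if key not in skip}


# Illustrative class definition, loosely based on the Person example.
person = {
    "name": "Person",
    "description": "A person (alive, dead, undead, or fictional).",
    "is_a": "NamedThing",
    "mixins": ["HasAliases"],
    "class_uri": "schema:Person",
    "slots": ["primary_email", "age_in_years"],
}
meta = embedded_meta(person, ClassSketch)
```

Here `meta` keeps only `mixins` and `class_uri`: the name, description, and parent are already expressed by the generated class itself, and `slots` is dropped by the exclusion list.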

opening this as a draft because i figure there is plenty of disagreement to be had about where to put them, what should be excluded, etc. but this general framework works.

Currently using json_schema_extra in linkml 2 to store it, because the metadata field isn't really for this, but we can also talk about where that should go - bonus of that is we get all the extra metadata in pydantic's generated json schema for free :)

but anyway here's an overview using personinfo.yaml as a sample:

Adds a LinkMLMeta class that is basically a subclass of dict:

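from typing import Any, Dict

from pydantic import ConfigDict, RootModel


class LinkMLMeta(RootModel):
    # The wrapped dict holds the metadata itself.
    root: Dict[str, Any] = {}
    # frozen=True blocks reassigning `root`; it does not freeze the dict's contents.
    model_config = ConfigDict(frozen=True)

    def __getattr__(self, key: str):
        # Delegate unknown attributes (e.g. .keys(), .get()) to the wrapped dict.
        return getattr(self.root, key)

    def __getitem__(self, key: str):
        return self.root[key]

    def __setitem__(self, key: str, value):
        self.root[key] = value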

then schema metadata looks like this:

linkml_meta = LinkMLMeta(
    {
        "default_curi_maps": ["semweb_context"],
        "default_prefix": "personinfo",
        "default_range": "string",
        "description": "Information about people, based on "
        "[schema.org](http://schema.org)",
        "emit_prefixes": ["rdf", "rdfs", "xsd", "skos"],
        "id": "https://w3id.org/linkml/examples/personinfo",
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",
        "name": "personinfo",
        "prefixes": {
            "CODE": {
                "prefix_prefix": "CODE",
                "prefix_reference": "http://example.org/code/",
            },
            "...": "...",
        },
        "source_file": "examples/PersonSchema/personinfo.yaml",
        "subsets": {
            "basic_subset": {
                "description": "A subset of the schema that "
                "handles basic information",
                "from_schema": "https://w3id.org/linkml/examples/personinfo",
                "name": "basic_subset",
            }
        },
    }
)

class metadata is like this:

class Person(HasAliases, NamedThing):
    """
    A person (alive, dead, undead, or fictional).
    """

    linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
        {
            "class_uri": "schema:Person",
            "from_schema": "https://w3id.org/linkml/examples/personinfo",
            "in_subset": ["basic_subset"],
            "mixins": ["HasAliases"],
            "slot_usage": {
                "age_in_years": {"name": "age_in_years", "recommended": True},
                "primary_email": {"name": "primary_email", "pattern": "^\\S+@[\\S+\\.]+\\S+"},
            },
        }
    )

attribute metadata is like this:

started_at_time: Optional[date] = Field(
    None,
    json_schema_extra={
        "linkml_meta": {
            "alias": "started_at_time",
            "domain_of": ["Event", "Relationship"],
            "slot_uri": "prov:startedAtTime",
        }
    },
)

Access to metadata is simple and uniform, even if the attribute version is a little verbose:

# schema
module.linkml_meta
# class
Person.linkml_meta
# attribute
Person.model_fields['age_in_years'].json_schema_extra['linkml_meta']
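A small self-contained sketch of the class and attribute access paths, plus the JSON schema side effect mentioned earlier. It assumes pydantic v2 is installed; `Event` is a toy model rather than real generator output, a plain dict stands in for `LinkMLMeta`, and `field_meta` is a hypothetical helper.

```python
from datetime import date
from typing import Any, ClassVar, Dict, Optional

from pydantic import BaseModel, Field


class Event(BaseModel):
    """Toy stand-in for a generated class; not actual generator output."""

    # Class-level metadata; a plain dict stands in for LinkMLMeta here.
    linkml_meta: ClassVar[Dict[str, Any]] = {
        "from_schema": "https://w3id.org/linkml/examples/personinfo"
    }

    started_at_time: Optional[date] = Field(
        None,
        json_schema_extra={"linkml_meta": {"slot_uri": "prov:startedAtTime"}},
    )


def field_meta(model: type, field_name: str) -> Dict[str, Any]:
    """Hypothetical helper: return a field's embedded metadata, or {} if absent."""
    extra = model.model_fields[field_name].json_schema_extra
    if isinstance(extra, dict):
        return extra.get("linkml_meta", {})
    return {}


class_meta = Event.linkml_meta
slot_meta = field_meta(Event, "started_at_time")
# json_schema_extra is merged into the property's entry in the generated JSON schema.
schema_meta = Event.model_json_schema()["properties"]["started_at_time"]["linkml_meta"]
```

A helper like `field_meta` is one way to hide the verbose attribute lookup; whether something like it belongs in generated modules is a separate design question.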

we can do more sophisticated transformations of the embedded values like casting prefixes to a specific prefix class, filtering "alias" if it is identical to the name of the attribute, etc. too. That'll be easier once PRs like #2019 get merged and the rendering logic for each type of object is more separated - i'd prefer to wait on that so i can clean this up, i don't really like adding to the junk heap i made at the bottom of serialize() lol, but figured it was worth getting a draft on the books so we have something to point to when this comes up, as it often does.
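As a rough illustration of those two transformations (not what this PR implements), the sketch below casts prefix dicts to a typed class and drops a redundant alias. `Prefix`, `cast_prefixes`, and `drop_redundant_alias` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Prefix:
    """Hypothetical typed container for one entry of the "prefixes" mapping."""

    prefix_prefix: str
    prefix_reference: str


def cast_prefixes(raw: Dict[str, Dict[str, str]]) -> Dict[str, Prefix]:
    """Turn each plain prefix dict into a Prefix instance."""
    return {name: Prefix(**fields) for name, fields in raw.items()}


def drop_redundant_alias(field_name: str, meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of meta without "alias" when it only repeats the field name."""
    return {
        key: value
        for key, value in meta.items()
        if not (key == "alias" and value == field_name)
    }


prefixes = cast_prefixes(
    {"CODE": {"prefix_prefix": "CODE", "prefix_reference": "http://example.org/code/"}}
)
slot_meta = drop_redundant_alias(
    "started_at_time",
    {"alias": "started_at_time", "slot_uri": "prov:startedAtTime"},
)
```

An alias that differs from the attribute name is kept, since in that case it carries information the field name does not.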

no tests yet, wanted to wait for feedback before trying to finish it

codecov bot commented Apr 2, 2024

Codecov Report

Attention: Patch coverage is 73.97260% with 19 lines in your changes are missing coverage. Please review.

Project coverage is 80.62%. Comparing base (7b0c00d) to head (424d75f).

Files                                          Patch %   Lines
linkml/generators/pydanticgen/pydanticgen.py   65.38%    14 Missing and 4 partials ⚠️
linkml/generators/pydanticgen/template.py      93.75%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2036      +/-   ##
==========================================
- Coverage   80.67%   80.62%   -0.05%     
==========================================
  Files         107      108       +1     
  Lines       11943    12011      +68     
  Branches     3415     3433      +18     
==========================================
+ Hits         9635     9684      +49     
- Misses       1743     1757      +14     
- Partials      565      570       +5     

@sneakers-the-rat sneakers-the-rat marked this pull request as ready for review April 2, 2024 02:42
@sneakers-the-rat
Collaborator Author

alright, this is ready to check out - again i have held off adding tests for this until i get a better idea if this is the kind of thing we want, but if i get a nod i'll go ahead and add them

@cmungall
Member

cmungall commented Apr 2, 2024

How about adding some command line options to control

  • whether metadata is included (off by default, to avoid surprises?)
  • the extent of the metadata
  • whether the base class is inlined in the module (default) or using a runtime import (future)

@sneakers-the-rat
Collaborator Author

whether metadata is included (off by default, to avoid surprises?)
the extent of the metadata

this is already in there, but not as a cli option. will add!

whether the base class is inlined in the module (default) or using a runtime import (future)

want this in this PR or in one after we make the runtime import? :)

@cmungall
Member

cmungall commented Apr 5, 2024

let's save the inlined vs runtime as a separate PR. Incremental is good! (as is preserving default behavior)

@markdoerr

Hi @sneakers-the-rat and @cmungall,
could you please state somewhere in the documentation which metadata is included in the generated pydantic output?
For example, I have a linkml model with slots containing a "slot_uri", and these do not appear in the pydantic (v2) output (with or without the --metauris flag set). In fact, the output of gen-pydantic is exactly the same with the --metadata or the --no-metadata command line flag set :( Any advice? Thanks. I am currently using linkml 1.7.8.

@sneakers-the-rat
Copy link
Collaborator Author

After this PR, all metadata will be (optionally) included.

Until then, all fields that are in the template models have some representation in the generated pydantic models https://linkml.io/linkml/generators/pydantic.html#templates

@markdoerr

Thanks, @sneakers-the-rat,
for the fast reaction :)
If I understand you correctly, all of these metadata will be supported after the PR: CommonMetadata.
How much effort would it be just to mention this link/information in the pydanticgen documentation / README?
(I am a big fan of explicit information ;).

Looking forward to the merge - do you have a rough time estimate for when it is scheduled?

One last question: will URIs (like slot_uri) also be transferred to the pydantic output?

@sneakers-the-rat
Collaborator Author

all of these metadata will be supported after the PR

Yes, all metadata. It will be configurable depending on if you want literally every field in the source schema, only those fields not represented by the template models, or no metadata.
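The three options could be sketched roughly like this. The enum and its member names are hypothetical and chosen only for illustration; the final parameter name and values were still undecided at this point in the thread.

```python
from enum import Enum
from typing import Any, Dict, Iterable


class MetaMode(Enum):
    """Hypothetical names for the three options described above."""

    FULL = "full"                    # literally every field in the source schema
    UNREPRESENTED = "unrepresented"  # only fields the template models do not render
    NONE = "none"                    # no metadata at all


def select_meta(
    definition: Dict[str, Any], represented: Iterable[str], mode: MetaMode
) -> Dict[str, Any]:
    """Pick the metadata to embed for one schema element under the given mode."""
    if mode is MetaMode.NONE:
        return {}
    if mode is MetaMode.FULL:
        return dict(definition)
    skip = set(represented)
    return {key: value for key, value in definition.items() if key not in skip}


# Illustrative slot definition; "name" and "range" are assumed to be rendered already.
slot = {"name": "started_at_time", "range": "date", "slot_uri": "prov:startedAtTime"}
represented = ["name", "range"]
full = select_meta(slot, represented, MetaMode.FULL)
partial = select_meta(slot, represented, MetaMode.UNREPRESENTED)
nothing = select_meta(slot, represented, MetaMode.NONE)
```

In the middle mode, `slot_uri` survives because nothing in the generated field would otherwise carry it.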

See an example in the first post in this issue.

How much effort would it be just to mention this link/information in the pydanticgen documentation / README?

Behavior of the template classes is already documented. This PR will also be documented after we reach a final form for it.

do you have a rough time estimate for when it is scheduled?

I'm AFK until next week. Sometime after that. Still need to write tests and docs and decide implementation details.

will URIs (like slot_uri) also be transferred to the pydantic output?

All metadata

@markdoerr

Thanks a lot, @sneakers-the-rat,
great enhancement :) - if you need someone for testing, please do not hesitate to contact me.

@markdoerr

Hi @sneakers-the-rat,
I tested your extension with a simple example and it does what I need :)
I hope it finds its way into the next release soon. Thanks 👍

@djarecka
Contributor

djarecka commented May 2, 2024

@sneakers-the-rat - thanks a lot for working on this! I was trying to test it, but I'm not completely sure how I can use template to ask for specific fields to be included, e.g. aliases

@sneakers-the-rat
Collaborator Author

Aha, what I think we'll do is expose that as a param instead of having to fiddle with the template classes, since that's likely to come up a lot.

do you mean to override the default meta_exclude, or do you mean you want metadata inclusion to be "opt-in" and include only those fields you explicitly specify?

@cmungall cmungall merged commit b5313f9 into linkml:main May 11, 2024
17 checks passed
@sneakers-the-rat
Collaborator Author

this was not quite done but i can follow on in another PR.

Successfully merging this pull request may close these issues.

adding metadata to pydantic fields
4 participants