Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pydanticgen: Inject imports, slots, and classes #1916

Merged
merged 3 commits into from
Feb 14, 2024

Conversation

sneakers-the-rat
Copy link
Collaborator

Prerequisite of: #1887

This is both a step towards cleaning up the pydantic generator and a prerequisite to implementing arrays.

There is one failing compliance test, but it is for the dataclass generator from code that i definitely didn't touch, so not sure what that's about. added tests pass, lmk if we need additional cases :)

Motivation

As we try and make more complex models in pydantic, we need to expose more of the generation machinery to allow for custom behavior. Particularly for arrays, we'll need to optionally include extra accessory classes to handle the special type checking needed for the arrays, and we want to have flexibility to inject source code or imports to make that work, depending on the preference of the person doing the generation. We could continue to add extra special cases in the jinja template or generator, but even better would be to have more standardized hooks for each of the different components of a generated pydantic file. This will be needed in any sort of eventual plugin system, where we might want to have a plugin define what extra classes it provides, what additional imports it depends on, etc.

Implementation

This PR does three things:

Inject Imports

Currently additional imports are handled as special cases - eg.

{% if uses_numpy -%}
import numpy as np
{%- endif %}

An additional field imports is added to the PydanticGenerator with the admittedly complex signature:

imports: Optional[Dict[str, Union[None, Dict[str, str], List[Union[str, Dict[str, str]]]]]] = None

This allows injecting four forms of imports:

  • Modules: {'typing': None}
  • Classes from modules: {'typing': ['Dict', 'List']}
  • Aliased modules: {'typing': {'as': 'tp'}
  • Aliased classes from modules: {'typing': [{'name': 'Dict', 'as': 'AliasedDict'}]}

So, for example:

imports = {
    'typing': ['Dict', 'List', 'Union'],
    'types': None,
    'numpy': {'as': 'np'},
    'collections': [{'name': 'OrderedDict', 'as': 'odict'}]
}

yields

from typing import (
    Dict,
    List,
    Union
)
import types
import numpy as np
from collections import (
    OrderedDict as odict
)

Caveats

This is still, to me, undesirably implicit, as it depends on passing a bunch of anonymous dicts and whatnot around, but I didn't want to go and add a bunch of TypedDicts and pydantic models in here since that isn't how the rest of the code is styled. I would be happy to do that though, since i really don't like the type of this field lol.

Another option would be to flatten the whole thing like:

imports = [
  {'type': 'module', 'name': 'typing'},
  {'type': 'module', 'name': 'numpy', 'as': 'np'},
  {'type': 'class', 'name': 'Dict', 'from': 'typing', 'as': 'DictRenamed'}
]

what that gains in potential clarity it sacrifices in ergonomics. In the future if we choose to replace the existing imports within the template with a more generalized import system, we'll need to write logic to merge sets of imports, and it's sort of an open question to me which version is better

When I merge the PR for arrays i'll also remove the existing numpy import and replace it with a result of the array generation process which will indicate whether or not numpy needs to be imported.

Inject Classes

An additional field is added to PydanticGenerator:

injected_classes: Optional[List[Union[Type, str]]] = None

So one is able to do:

class MyClass:
    hey = 'what up'

string_class = """
class StringClass:
    this_one_is = 'a string'
"""

gen = PydanticGenerator(injected_classes = [MyClass, string_class])

and have them show up in the generated module.

This allows a live class definition or a string for a class definition to be injected into the resulting module. This will be necessary for arrays, but is also a useful thing in general: eg. currently the BaseModel is defined in the jinja template, but that is undesirable since it's not possible to introspect it at test time, so it can't be tested independently.

Related to: https://github.com/orgs/linkml/discussions/1820 - we could have a LinkMLMeta class that contains all the extra properties that usually get dropped in the pydantic models. eg how i do this in nwb-linkml here:

https://github.com/p2p-ld/nwb-linkml/blob/4ee97263ed4338a0cf76c19c23553038f85a9eae/nwb_linkml/src/nwb_linkml/generators/pydantic.py#L63-L66

https://github.com/p2p-ld/nwb-linkml/blob/4ee97263ed4338a0cf76c19c23553038f85a9eae/nwb_linkml/src/nwb_linkml/generators/pydantic.py#L483-L495

https://github.com/p2p-ld/nwb-linkml/blob/4ee97263ed4338a0cf76c19c23553038f85a9eae/nwb_linkml/src/nwb_linkml/generators/pydantic.py#L674-L675

Caveats

This is handled separately within the template string generation function, rather than in the template itself, because inspecting for the source code requires access to the python context where the object is imported. We could alternatively add an additional method to the PydanticGenerator that stringifies classes, but since the build order within serialize is a little complex I wasn't quite sure where that should go a priori.

It's also a little delicate since in order for inspect to get the source code, the it has to be in a module with __file__, so eg. it can't be defined in an interactive session or other potential other weird cases, so that's why the ability to pass a pre-stringified class is present.

Inject Slots to BaseModel

This one is super simple. Sometimes you want to have a field present on all the generated classes - eg. for nwb-linkml i needed to add a field to store the hdf5 path for every class, but more generally this would be how we add the linkml_meta slot to every model without special-casing it.

A field is added to PydanticGenerator with signature:

injected_fields: Optional[List[str]] = None

where additional fields can be injected to the basemodel such that

gen = PydanticGenerator(injected_fields = (
            ['object_id: Optional[str] = Field(None, description="Unique UUID for each object")']
        )

results in

class ConfiguredBaseModel(BaseModel):
    model_config = ConfigDict(
        validate_assignment=True,
        validate_default=True,
        extra = 'forbid',
        arbitrary_types_allowed=True,
        use_enum_values = True)
        
    object_id: Optional[str] = Field(None, description="Unique UUID for each object")

This replaces the pass (but includes it if injected_fields = None

Future cleanup

Build stages with Results objects

Currently there is a relatively large amount of implicit behavior between the pydantic default template and the generator itself, so while I'm not touching it in this PR, this is a step towards one piece of that by making an explicit system to include imports - eg., in the future it might be nice to handle all imports this way, in particular by having discrete build stages for each of the classes/slots that yield a BuildResult that indicate what other classes and imports they depend on (this will be how arrays work).

Unification with dataclass generator

I didn't try and port this to the PythonGenerator since it has a different inheritance lineage, but it might be nice to make a single PythonGenerator metaclass/mixin for all the python generators (eg. incl SqlAlchemy), renaming this one DataclassGenerator, to contain all the logic that is special to python generators like this.

Modularize template

It would be nice to able to reuse common logic to make the generator more concise and less prone to errors. So for example, I couldn't reuse the logic to generate a model slot for injected_fields and thus could only accept a pre-formatted field string. It's also relatively difficult to tell the format of what should go into the jinja template - we lose all the structure of the ClassDefinition et al. classes and instead go into anonymous dictionary land.

It would be nice to instead

  • make use of jinja's macro and import system,
  • break up each part of the template, such that
  • each template module has an adjoining dataclass or pydantic model that defines the properties for that module.

@sneakers-the-rat
Copy link
Collaborator Author

PS would y'all rather if i raise PRs from my own fork or from this repo? i forgot to ask when i was visiting

Copy link

codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (bf736c2) 80.13% compared to head (a5f022e) 80.15%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1916      +/-   ##
==========================================
+ Coverage   80.13%   80.15%   +0.02%     
==========================================
  Files         100      100              
  Lines       11503    11517      +14     
  Branches     2903     2906       +3     
==========================================
+ Hits         9218     9232      +14     
  Misses       1732     1732              
  Partials      553      553              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@pkalita-lbl pkalita-lbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic! Thanks for taking the time to write up the nice explanation of the changes and thinking about this as a foundation for future improvements. Just going to hold off on merging for now to give @kevinschaper a chance to weigh in.

Also, working on a branch here is totally fine; we're not strict about using forks.

@kevinschaper
Copy link
Contributor

This is my new lifetime favorite pull request.

I've wanted the "model" passed to the "view" (I can't help but think about the generator like it's an old timey MVC web app that it makes python instead of html) to be better structured for a long time and not had time to dig in and improve it, so I support any kind of stronger and more explicit typing, even (especially?) when it diverges from the current style.

I think the modularity of templates could really help break the generator code up into being more modular as well, I've nibbled away at the one big deeply nested loop, but not as much as I'd like.

Thank you so much for this PR, the suggested future directions, and hopefully future PRs as well!

@pkalita-lbl pkalita-lbl merged commit a182104 into main Feb 14, 2024
10 checks passed
@pkalita-lbl pkalita-lbl deleted the pydanticgen-dependency-injection branch February 14, 2024 16:26
@sneakers-the-rat
Copy link
Collaborator Author

Hell yes! I have been hesitant to start pulling upstream just bc I figured yall had a lot going on and didnt want to step on anyones toes, but if yall like this kind of thing and work with this kind of code velocity then theres a lot more where that came from ♥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants