-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pydanticgen: Inject imports, slots, and classes #1916
Conversation
PS would y'all rather if i raise PRs from my own fork or from this repo? i forgot to ask when i was visiting |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1916 +/- ##
==========================================
+ Coverage 80.13% 80.15% +0.02%
==========================================
Files 100 100
Lines 11503 11517 +14
Branches 2903 2906 +3
==========================================
+ Hits 9218 9232 +14
Misses 1732 1732
Partials 553 553 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic! Thanks for taking the time to write up the nice explanation of the changes and thinking about this as a foundation for future improvements. Just going to hold off on merging for now to give @kevinschaper a chance to weigh in.
Also, working on a branch here is totally fine; we're not strict about using forks.
This is my new lifetime favorite pull request. I've wanted the "model" passed to the "view" (I can't help but think about the generator like it's an old timey MVC web app that it makes python instead of html) to be better structured for a long time and not had time to dig in and improve it, so I support any kind of stronger and more explicit typing, even (especially?) when it diverges from the current style. I think the modularity of templates could really help break the generator code up into being more modular as well, I've nibbled away at the one big deeply nested loop, but not as much as I'd like. Thank you so much for this PR, the suggested future directions, and hopefully future PRs as well! |
Hell yes! I have been hesitant to start pulling upstream just bc I figured yall had a lot going on and didnt want to step on anyones toes, but if yall like this kind of thing and work with this kind of code velocity then theres a lot more where that came from ♥ |
Prerequisite of: #1887
This is both a step towards cleaning up the pydantic generator and a prerequisite to implementing arrays.
There is one failing compliance test, but it is for the dataclass generator from code that i definitely didn't touch, so not sure what that's about. added tests pass, lmk if we need additional cases :)
Motivation
As we try and make more complex models in pydantic, we need to expose more of the generation machinery to allow for custom behavior. Particularly for arrays, we'll need to optionally include extra accessory classes to handle the special type checking needed for the arrays, and we want to have flexibility to inject source code or imports to make that work, depending on the preference of the person doing the generation. We could continue to add extra special cases in the jinja template or generator, but even better would be to have more standardized hooks for each of the different components of a generated pydantic file. This will be needed in any sort of eventual plugin system, where we might want to have a plugin define what extra classes it provides, what additional imports it depends on, etc.
Implementation
This PR does three things:
Inject Imports
Currently additional imports are handled as special cases - eg.
linkml/linkml/generators/pydanticgen.py
Lines 46 to 48 in 55cc87f
An additional field
imports
is added to thePydanticGenerator
with the admittedly complex signature:This allows injecting four forms of imports:
{'typing': None}
{'typing': ['Dict', 'List']}
{'typing': {'as': 'tp'}
{'typing': [{'name': 'Dict', 'as': 'AliasedDict'}]}
So, for example:
yields
Caveats
This is still, to me, undesirably implicit, as it depends on passing a bunch of anonymous dicts and whatnot around, but I didn't want to go and add a bunch of
TypedDict
s and pydantic models in here since that isn't how the rest of the code is styled. I would be happy to do that though, since i really don't like the type of this field lol.Another option would be to flatten the whole thing like:
what that gains in potential clarity it sacrifices in ergonomics. In the future if we choose to replace the existing imports within the template with a more generalized import system, we'll need to write logic to merge sets of imports, and it's sort of an open question to me which version is better
When I merge the PR for arrays i'll also remove the existing numpy import and replace it with a result of the array generation process which will indicate whether or not numpy needs to be imported.
Inject Classes
An additional field is added to
PydanticGenerator
:So one is able to do:
and have them show up in the generated module.
This allows a live class definition or a string for a class definition to be injected into the resulting module. This will be necessary for arrays, but is also a useful thing in general: eg. currently the BaseModel is defined in the jinja template, but that is undesirable since it's not possible to introspect it at test time, so it can't be tested independently.
Related to: https://github.com/orgs/linkml/discussions/1820 - we could have a
LinkMLMeta
class that contains all the extra properties that usually get dropped in the pydantic models. eg how i do this innwb-linkml
here:https://github.com/p2p-ld/nwb-linkml/blob/4ee97263ed4338a0cf76c19c23553038f85a9eae/nwb_linkml/src/nwb_linkml/generators/pydantic.py#L63-L66
https://github.com/p2p-ld/nwb-linkml/blob/4ee97263ed4338a0cf76c19c23553038f85a9eae/nwb_linkml/src/nwb_linkml/generators/pydantic.py#L483-L495
https://github.com/p2p-ld/nwb-linkml/blob/4ee97263ed4338a0cf76c19c23553038f85a9eae/nwb_linkml/src/nwb_linkml/generators/pydantic.py#L674-L675
Caveats
This is handled separately within the template string generation function, rather than in the template itself, because inspecting for the source code requires access to the python context where the object is imported. We could alternatively add an additional method to the
PydanticGenerator
that stringifies classes, but since the build order withinserialize
is a little complex I wasn't quite sure where that should go a priori.It's also a little delicate since in order for
inspect
to get the source code, the it has to be in a module with__file__
, so eg. it can't be defined in an interactive session or other potential other weird cases, so that's why the ability to pass a pre-stringified class is present.Inject Slots to BaseModel
This one is super simple. Sometimes you want to have a field present on all the generated classes - eg. for
nwb-linkml
i needed to add a field to store the hdf5 path for every class, but more generally this would be how we add thelinkml_meta
slot to every model without special-casing it.A field is added to
PydanticGenerator
with signature:where additional fields can be injected to the basemodel such that
results in
This replaces the
pass
(but includes it ifinjected_fields = None
Future cleanup
Build stages with
Results
objectsCurrently there is a relatively large amount of implicit behavior between the pydantic default template and the generator itself, so while I'm not touching it in this PR, this is a step towards one piece of that by making an explicit system to include imports - eg., in the future it might be nice to handle all imports this way, in particular by having discrete build stages for each of the classes/slots that yield a
BuildResult
that indicate what other classes and imports they depend on (this will be how arrays work).Unification with
dataclass
generatorI didn't try and port this to the
PythonGenerator
since it has a different inheritance lineage, but it might be nice to make a singlePythonGenerator
metaclass/mixin for all the python generators (eg. inclSqlAlchemy
), renaming this oneDataclassGenerator
, to contain all the logic that is special to python generators like this.Modularize template
It would be nice to able to reuse common logic to make the generator more concise and less prone to errors. So for example, I couldn't reuse the logic to generate a model slot for
injected_fields
and thus could only accept a pre-formatted field string. It's also relatively difficult to tell the format of what should go into the jinja template - we lose all the structure of theClassDefinition
et al. classes and instead go into anonymous dictionary land.It would be nice to instead
dataclass
or pydantic model that defines the properties for that module.