Proposed Principle: Avoid generator-specific schema rewrites #3647

cmungall · 2026-06-12T17:16:31Z

cmungall
Jun 12, 2026
Maintainer

Historically, some generators have allowed particular options that change the semantics of the schema. Classic examples include allowing extra attributes for jsonschemagen and pydanticgen. There are a number of objections to this:

the SoT (source of truth) should be the schema; in principle it should not be possible to "override" it
having two target validators behave differently is surprising and can lead to confusion

However, pragmatically there may be reasons to want this kind of behavior. Sometimes it might be a short-term measure while we wait for sufficient expressivity in the metamodel. There might also be reasons that are particular to the target formalism. E.g. when converting to strict relational DDL, not all schema features are supported, so there may be tradeoffs to be made downstream of the schema itself.

overriding a schema should be discouraged, but well-defined schema transforms should be permitted (this may seem merely terminological but in fact this framing allows us to formalize and reason about things better)
The schema should be the SoT as far as possible
transform options should not be tied to a particular generator; consistent terminology / option names should be used across generators
generator options that perform transforms may be deprecated, or at least discouraged, given a sufficient timeline, as increased metamodel expressivity comes online

Examples of transforms include:

schema relaxations, e.g. allowing extra attributes
class hierarchy unwinding operations when targeting non-OO formalisms
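
To make the "well-defined schema transform" framing concrete, here is a minimal sketch of the two example transforms written as generator-independent functions. It assumes a schema held as a plain Python dict with `classes`, `is_a` and `attributes` keys; the `extra_attributes` key and both function names are hypothetical, chosen only for illustration, and this is not how any existing generator implements these options.

```python
from copy import deepcopy


def relax_extra_attributes(schema: dict) -> dict:
    """Return a copy of the schema in which every class permits extra attributes."""
    out = deepcopy(schema)
    for cls in out["classes"].values():
        # "extra_attributes" is a hypothetical key used for illustration only.
        cls["extra_attributes"] = "allow"
    return out


def unwind_hierarchy(schema: dict) -> dict:
    """Return a copy in which each class carries its inherited attributes directly."""
    classes = schema["classes"]

    def collect(name: str) -> dict:
        # Walk up the is_a chain; attributes on the child override the parent's.
        cls = classes[name]
        parent = cls.get("is_a")
        attrs = collect(parent) if parent else {}
        attrs.update(cls.get("attributes", {}))
        return attrs

    out = deepcopy(schema)
    for name, cls in out["classes"].items():
        cls["attributes"] = deepcopy(collect(name))
        cls.pop("is_a", None)  # the flattened class no longer needs a parent
    return out


schema = {
    "classes": {
        "NamedThing": {"attributes": {"id": {"range": "string"}}},
        "Person": {"is_a": "NamedThing", "attributes": {"age": {"range": "integer"}}},
    }
}

flat = unwind_hierarchy(schema)
relaxed = relax_extra_attributes(schema)
```

Because each transform takes a schema and returns a new schema, the result could be handed to any generator, and the input schema stays untouched as the source of truth.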

sneakers-the-rat · 2026-06-13T02:15:57Z

sneakers-the-rat
Jun 13, 2026
Collaborator

IMO the major reason, aside from semantic fidelity, to have generators keep representations across frameworks as close as possible to the schema is that the entire purpose of generating representations in different frameworks is to be able to do interoperability across them - that they are all "the same thing" but in different modalities.

Yes, true, different frameworks have different capabilities, that's a given, so interop will always be slightly lossy. The thing we really need for "true" interop is also to be able to have generation be bidirectional - if I already have my schema expressed in json schema, or as pydantic models, to be able to juice that back into linkml, but that's obviously a ways off and a more difficult problem.

The thing that would really make that impossible is if, rather than pushing desired features into the schema layer, we left them hanging around in the generators, so that the question of interoperability explodes from "mapping domain to domain" to "mapping (domain * all the possible overrides) to (domain * all the possible overrides)." Even if we did have a really awesome means of recording all those overrides as a set of transformation options (I'm a big fan of the linkml map idea, of course), that still implies that we need to make any interoperability layer accommodate the product of all those extra possibilities, and come up with ways to represent not only the schema, but the transforms applied in every framework. This is possible in frameworks like Python and pydantic where we can just stick a private dict in the module, or in json schema with some annotation object, but not really with e.g. SQL DDL and others.

Things like unwinding inheritance hierarchy to me are part of adapting to other frameworks, so as a much more mildly held opinion, IMO that would be great to flatten out as a single layer as being different frameworks (e.g. frameworkgen, frameworkgen-flat) rather than having a nested parameterization of framework*options - but again, mildly held opinion, the general principle is the same, preserve capacity for interop by limiting the complexity to a single-layer star topology, framework<->linkml with linkml as the center of the star.
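
A rough back-of-the-envelope sketch of that explosion, assuming (purely for illustration) that every framework has the same number of independent on/off generator options; the function names and the numbers are hypothetical:

```python
def representations(n_frameworks: int, n_options: int = 0) -> int:
    """Distinct representations an interop layer has to understand.

    With no generator-specific options there is one per framework (the star
    topology). Each independent on/off option doubles the count per framework.
    """
    return n_frameworks * 2 ** n_options


def direct_pairs(n_representations: int) -> int:
    """Unordered pairs, if representations were mapped to each other directly."""
    return n_representations * (n_representations - 1) // 2


star = representations(4)               # 4 frameworks, no overrides
with_overrides = representations(4, 3)  # 4 frameworks, 3 on/off options each
pairs = direct_pairs(with_overrides)

print(star, with_overrides, pairs)  # prints: 4 32 496
```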

This intersects with another longstanding problem, and that's the tooling, fluidity of modifications, and versionability (is that a word?) of the metamodel. Two things are true: a) we don't want to go hogwild and make the metamodel a moving target, and b) we do need to make it just as easy to make changes in the metamodel as in the generators, to avoid the temptation of a quick hack. To make this possible it needs to be possible to e.g. say "my schema is tied to version x of the metamodel, so I want to use the tooling for version x of the metamodel." For that we need to move the linkml model artifacts out of linkml-runtime and into linkml-model, a package whose version === the metamodel version, and then pin that version in linkml and linkml-runtime such that if I have a schema at metamodel version x, I can set a hard pin of linkml-model=x.y.z and then the dependency resolver can ensure the version of linkml I get on install is compatible with that. That makes it possible to make needed changes to the metamodel while giving people predictable behavior and a clear upgrade path. There are a lot of things that could get pruned and cleaned up from the metamodel to make it easier to understand and support, but as long as we are required to be infinitely backwards compatible because there's no way to protect against downstream breakage, we can't do that.
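
As a sketch of what "my schema is tied to version x" could look like at runtime, the check below compares the metamodel version a schema requires against the installed version of the model package. It assumes the proposal above, i.e. that the package version equals the metamodel version; `check_metamodel_pin` is a hypothetical helper, not an existing linkml function, and the version strings are placeholders.

```python
from importlib.metadata import PackageNotFoundError, version


def check_metamodel_pin(required: str, package: str = "linkml-model", lookup=version) -> None:
    """Raise RuntimeError unless the installed package version equals `required`.

    `lookup` defaults to importlib.metadata.version and can be swapped out for
    testing. Assumes the package version is identical to the metamodel version.
    """
    try:
        installed = lookup(package)
    except PackageNotFoundError as exc:
        raise RuntimeError(
            f"{package} is not installed; schema requires metamodel {required}"
        ) from exc
    if installed != required:
        raise RuntimeError(
            f"schema requires metamodel {required}, but {package} {installed} is installed"
        )


# Placeholder versions, with a stubbed lookup instead of a real install.
check_metamodel_pin("1.2.3", lookup=lambda _name: "1.2.3")  # passes silently
```

In practice the dependency resolver would enforce the pin at install time; a check like this would only catch an environment that has drifted afterwards.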

So, briefly, IMO we should have a standard that generators do not mutate the schema. Over time, deprecate the places where they do by moving any desired behaviors into the metamodel. And to make that feasible, make the metamodel easier and safer to mutate. Thank you for attending my TED talk


sneakers-the-rat · 2026-06-13T04:08:25Z

sneakers-the-rat
Jun 13, 2026
Collaborator

On the more specific question of schema-level extra data, I continued the prior issue over here: #1595. Just linking for discoverability between multiple discussions of a related topic.
