Include custom annotations in generated schemas / code #1618

Open
simontaurus opened this issue Sep 13, 2023 · 7 comments
Labels
community-generated developer-days smallish tickets that can be considered "maintenance" and fixed within a single session enhancement New feature or request generator-pydantic

Comments

@simontaurus

simontaurus commented Sep 13, 2023

Is your feature request related to a problem? Please describe.
We are currently working on a schema repository, https://opensemantic.world/, that combines JSON Schema and JSON-LD, similar to #474.
Along the way we noticed that LinkML (great work!) already addresses many of the problems we have identified, especially the bridges to OWL and SHACL.
However, since we focus on both programmatic and graphical interfaces, we would also need some custom annotations
for custom vocabulary, e.g. the keywords used by https://github.com/json-editor/json-editor to generate HTML forms.

Describe the solution you'd like
An option to transfer

  • a) existing annotations (like range, even when inlined_as_list: false)
  • b) custom annotations (like hidden: true for autogenerated fields in forms)

into JSON Schema and Pydantic classes, e.g.:
classes:
  Person:
  Container:
    attributes:
      id:
        options:
          hidden: true
      persons:
        multivalued: true
        inlined_as_list: true
        range: Person
{
    "$defs": {
        "Container": {
            "description": "",
            "properties": {
                "id": {
                    "options": {
                        "hidden": true
                    },
                    "type": "string"
                },
                "persons": {
                    "items": {
                        "type": "string"
                    },
                    "type": "array",
                    "range": "#/$defs/Person"
                }
            },
            "title": "Container",
            "type": "object"
        },
        "Person": {}
    }
}
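from typing import List, Optional
from pydantic import BaseModel, Field

class ConfiguredBaseModel(BaseModel):
    pass

class Person(ConfiguredBaseModel):
    pass

class Container(ConfiguredBaseModel):
    id: str = Field(..., options={'hidden': True})  # custom annotation from b)
    persons: Optional[List[str]] = Field(default_factory=list, range='Person')  # range from a)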

Note: the extra parameter for Field() is also generated by https://github.com/koxudaxi/datamodel-code-generator

A possible solution for b) could be to use LinkML annotations and dump their values as a dict into the JSON Schema property and the Pydantic Field when the generator is called with --include-annotations=True.
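A minimal sketch of what b) could look like inside a generator, assuming the annotations arrive as a plain dict and the proposed flag is modelled as an `include_annotations` boolean. `inject_into_property` and `render_field` are hypothetical helpers for illustration, not existing LinkML functions:

```python
from typing import Any, Dict


def inject_into_property(prop: Dict[str, Any], annotations: Dict[str, Any],
                         include_annotations: bool) -> Dict[str, Any]:
    """Return a copy of a JSON Schema property with annotation values merged in."""
    result = dict(prop)
    if include_annotations:
        # Each annotation tag becomes a property key; its value is dumped as-is.
        result.update(annotations)
    return result


def render_field(first_arg: str, annotations: Dict[str, Any],
                 include_annotations: bool) -> str:
    """Render Pydantic Field(...) source text with annotations as extra kwargs."""
    args = [first_arg]
    if include_annotations:
        # repr() turns dict/bool values into valid Python literals.
        args += [f"{tag}={value!r}" for tag, value in annotations.items()]
    return f"Field({', '.join(args)})"


annotations = {"options": {"hidden": True}}
print(inject_into_property({"type": "string"}, annotations, include_annotations=True))
# {'type': 'string', 'options': {'hidden': True}}
print(render_field("...", annotations, include_annotations=True))
# Field(..., options={'hidden': True})
```

Annotation tags would have to be valid Python identifiers to be emitted as Field() keyword arguments. Pydantic v2 also deprecates extra Field() kwargs in favour of json_schema_extra, so a real generator might need to emit that form instead.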

How important is this feature? Select from the options below:
• Important - it's a blocker and can't do work without it (but of course I understand if this is out-of-scope)

When will use cases depending on this become relevant? Select from the options below:
• Mid-term - 2-4 months

Additional context
Related:

@simontaurus simontaurus changed the title Custom annotations Include custom annotations in generated schemas / code Sep 13, 2023
@kevinschaper kevinschaper added the developer-days smallish tickets that can be considered "maintenance" and fixed within a single session label Nov 10, 2023
@cmungall
Member

cmungall commented Jan 30, 2024

This is in scope and would be a useful feature. We want to think carefully about how best to do this. See also #1830

I think using options in Pydantic makes sense. It would be nice if there were an equivalent for dataclasses. @pkalita-lbl, does the JSON Schema approach above seem reasonable?

@pkalita-lbl
Contributor

I think encoding the extra information in annotations, along with an optional generator flag like --inject-annotations, could work well for the JSON Schema generator. We'd just want to be careful to warn the user about edge cases, for example when an annotation would overwrite a JSON Schema keyword produced by the existing generation logic:

slots:
  age:
    minimum_value: 0  # this will generate `minimum: 0` in JSON Schema
    annotations:
      hidden: true
      minimum: 10  # uh oh!
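A sketch of such a warning, assuming the generator has already built the property dict and has the slot's annotations available as a dict. `check_annotation_conflicts` is a hypothetical name, not part of the current generator:

```python
import warnings
from typing import Any, Dict, List


def check_annotation_conflicts(generated: Dict[str, Any],
                               annotations: Dict[str, Any],
                               slot_name: str) -> List[str]:
    """Warn about annotation tags that would overwrite generated JSON Schema keywords."""
    conflicts = sorted(set(generated) & set(annotations))
    for key in conflicts:
        warnings.warn(
            f"annotation '{key}' on slot '{slot_name}' overwrites the generated "
            f"value {generated[key]!r} with {annotations[key]!r}",
            UserWarning,
        )
    return conflicts


# The `age` slot above: minimum_value: 0 generates "minimum": 0.
generated = {"type": "integer", "minimum": 0}
annotations = {"hidden": True, "minimum": 10}
print(check_annotation_conflicts(generated, annotations, "age"))  # ['minimum']
```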

@cmungall
Member

Do we also have to worry about the scenario where a future version of JSON Schema introduces an annotations keyword?

@simontaurus
Author

Following @pkalita-lbl's approach, the annotations key itself would not collide with JSON Schema keywords, since it is not exported; only the keys and sub-objects below it are.
So running

slots:
  age:
    minimum_value: 0  # this will generate `minimum: 0` in JSON Schema
    annotations:
      options: 
        hidden: true
      minimum: 10  # uh oh!

with --inject-annotations would lead to

{
  "type": "object",
  "properties": {
    "age": {
      "type": "integer",
      "minimum": 10,
      "options": {
        "hidden": true
      }
    }
  }
}

with minimum overwritten by the annotation. I think we can put the responsibility on the user, and could additionally introduce --inject-annotation-prefix 'x-', resulting in

{
  "type": "object",
  "properties": {
    "age": {
      "type": "integer",
      "minimum": 0,
      "x-options": {
        "hidden": true
      },
      "x-minimum": 10
    }
  }
}

or even --inject-annotation-prefix-on-conflict, resulting in

{
  "type": "object",
  "properties": {
    "age": {
      "type": "integer",
      "minimum": 0,
      "options": {
        "hidden": true
      },
      "x-minimum": 10
    }
  }
}
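The three behaviours discussed in this comment (overwrite, prefix every annotation, prefix only on conflict) can be sketched as a single function. The mode strings and `inject_annotations` are hypothetical; the flag names in this thread are still only proposals:

```python
from typing import Any, Dict


def inject_annotations(prop: Dict[str, Any], annotations: Dict[str, Any],
                       mode: str = "overwrite", prefix: str = "x-") -> Dict[str, Any]:
    """Merge slot annotations into a generated JSON Schema property.

    mode:
      "overwrite"          annotation values replace generated keywords
      "prefix"             every annotation key gets `prefix`
      "prefix-on-conflict" only keys that clash with generated keywords get `prefix`
    """
    if mode not in ("overwrite", "prefix", "prefix-on-conflict"):
        raise ValueError(f"unknown mode: {mode}")
    result = dict(prop)  # never mutate the generator's own dict
    for key, value in annotations.items():
        if mode == "prefix" or (mode == "prefix-on-conflict" and key in prop):
            key = prefix + key
        result[key] = value
    return result


age = {"type": "integer", "minimum": 0}
ann = {"options": {"hidden": True}, "minimum": 10}
for mode in ("overwrite", "prefix", "prefix-on-conflict"):
    print(mode, inject_annotations(age, ann, mode))
```

One caveat of prefixing: a prefixed key such as x-minimum could itself clash with an existing key, so a real implementation would probably still want to warn in that case.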

@pkalita-lbl
Contributor

I think we can put the responsibility on the user

I agree. I just wanted to flag it as something we should be clear about and communicate to the user (through documentation, runtime warnings, etc.) so that no one is surprised.

@jsheunis
Copy link
Contributor

jsheunis commented May 9, 2024

I'd like to register my interest in this feature as well, in relation to the automatic generation of user interfaces from schemas (similar to @simontaurus's use case). I'm focusing specifically on SHACL. I've adapted shaclgen.py in my local clone enough to see custom annotations on slots (both directly on slots, and on slots whose range is a custom type that carries annotations) flow through to the exported SHACL. Perhaps my comments add some more substance to the use case:

  1. I am following the design at https://datashapes.org/forms.html quite closely, i.e. using the DASH vocabulary to define constraints for how specific fields are to be edited/viewed.
  2. I am hoping this is just a result of my ignorance, but I am not sure how I could coerce the annotations into specific types on the SHACL side. E.g. if I have the following schema:
    id: https://example.org/test-schema
    name: myschema
    
    prefixes:
      dash: http://datashapes.org/dash#
      myschema: https://example.org/test-schema/
    
    default_prefix: myschema
    
    emit_prefixes:
      - dash
    
    imports: https://w3id.org/linkml/types
    
    slots:
      my_attr:
        range: string
        annotations:
          dash:singleLine: true
    
    classes:
      MyClass:
        abstract: true
        slots:
          - my_attr
    
    I want the SHACL to be:
    @prefix dash: <http://datashapes.org/dash#> .
    @prefix myschema: <https://example.org/test-schema/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
    myschema:MyClass a sh:NodeShape ;
        sh:closed false ;
        sh:ignoredProperties ( rdf:type ) ;
        sh:property [ sh:datatype xsd:string ;
                dash:singleLine true ;
                sh:maxCount 1 ;
                sh:nodeKind sh:Literal ;
                sh:order 0 ;
                sh:path myschema:my_attr ] ;
        sh:targetClass myschema:MyClass .
    
    i.e. the annotation tag dash:singleLine should be recognized as a CURIE and the annotation value true as an xsd:boolean, rather than both being treated as string literals. I don't have any ideas yet about how to achieve this.
  3. I encountered a scenario where the annotations of a slot could conflict with the annotations of a custom type used as that slot's range. E.g.:
    types:
      NameString:
        typeof: string
        uri: myschema:NameString
        annotations:
          dash:singleLine: false
    
    slots:
      my_attr:
        range: NameString
        annotations:
          dash:singleLine: true
    
    Here, the question is whether the resulting property shape of my_attr in SHACL should have dash:singleLine true ; or dash:singleLine false ;. I thought such a scenario was worth mentioning, so that we can consider how generator code should deal with it if needed.
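For points 2 and 3, here is a sketch of one possible approach using only the standard library. It assumes annotation values arrive as native Python types (so true is a bool), treats a tag or value as a CURIE only when the part before the ':' is a declared prefix (slightly stricter than searching for ':' alone), and resolves the slot/type conflict by letting the slot's annotation override its range type's, since the slot is the more specific definition. All helper names are hypothetical, and the precedence rule is an assumption, not something decided in this thread:

```python
from typing import Any, Dict, List


def is_curie(text: str, prefixes: Dict[str, str]) -> bool:
    """True if text looks like prefix:local with a declared prefix."""
    prefix, sep, _ = text.partition(":")
    return bool(sep) and prefix in prefixes


def to_turtle_term(value: Any, prefixes: Dict[str, str]) -> str:
    """Render an annotation value as a Turtle term."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"  # Turtle shorthand for xsd:boolean
    if isinstance(value, int):
        return str(value)  # Turtle shorthand for xsd:integer
    text = str(value)
    if is_curie(text, prefixes):
        return text  # keep declared CURIEs as IRIs, not string literals
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'  # fall back to a plain string literal


def effective_annotations(type_annotations: Dict[str, Any],
                          slot_annotations: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed policy: slot annotations override those of the slot's range type."""
    merged = dict(type_annotations)
    merged.update(slot_annotations)
    return merged


def property_shape_lines(type_annotations: Dict[str, Any],
                         slot_annotations: Dict[str, Any],
                         prefixes: Dict[str, str]) -> List[str]:
    """Predicate-object lines to add to a slot's sh:property block."""
    lines = []
    for tag, value in effective_annotations(type_annotations, slot_annotations).items():
        if not is_curie(tag, prefixes):
            continue  # assumption: tags that are not CURIEs cannot be predicates here
        lines.append(f"{tag} {to_turtle_term(value, prefixes)} ;")
    return lines


prefixes = {"dash": "http://datashapes.org/dash#",
            "myschema": "https://example.org/test-schema/"}
print(property_shape_lines({"dash:singleLine": False},
                           {"dash:singleLine": True}, prefixes))
# ['dash:singleLine true ;']
```

A bare true in Turtle is already shorthand for "true"^^xsd:boolean, so booleans need no explicit datatype annotation.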

If anyone has more insight to share about these challenges, I would love to hear it! I'm happy to create a PR for the shaclgen improvements in terms of annotations once the uncertainties are resolved.

Disclaimers:

  • I have used LinkML for some months now, but I'm almost entirely unfamiliar with the code internals
  • I haven't looked into the base Generators class that shaclgen and similar generators inherit from, so I'm not sure whether some of the functionality in my local patches would be general enough to apply to the base class rather than to shaclgen specifically.

jsheunis added a commit to jsheunis/linkml that referenced this issue May 15, 2024
This is for the SHACL generator in response to linkml#1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)
cmungall added a commit that referenced this issue May 29, 2024
…part of shacl shapes (#2111)

* Add --include-annotations option for shaclgen

This is for the SHACL generator in response to #1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)

* fix linting

* Update snapshot data in 'test_scripts' after updating kitchen sink schema for shaclgen annotation tests

* Update shaclgen.py

Add a TODO comment

---------

Co-authored-by: Chris Mungall <cjm@berkeleybop.org>
vincentkelleher pushed a commit to vincentkelleher/linkml that referenced this issue Jun 5, 2024
…part of shacl shapes (linkml#2111)

cmungall added a commit that referenced this issue Jun 7, 2024
* Implement equals_string and equals_string_in

* Remove renaming§

* Add validation rules

* Add validation for equals_string and equals_string_in in schema loader

* Revert renaming

* Remove obsolete code

* Remove obsolete code

* Fix codespell errors

* Resolve flake errors

* Reforamt files

* Fix lint errors

* fix lint errors

* Add unit tests for equals_string and equals_string_in

* Make quality checks happy (#2136)

* Update poetry lockfile

* hotwo on deprecation

* `shaclgen`: Add `--include-annotations` option to let annotations be part of shacl shapes (#2111)

* Add --include-annotations option for shaclgen

This is for the SHACL generator in response to #1618.
Code is added to shaclgen.py to:
- allow users to specify the --include-annotations tag if they
want annotations (on classes, slots, and types) to be included
in the exported SHACL shapes
- determine the datatype of both annotation tag and value (a
CURIE is identified by searching for the ':' character)
- add the correct triples to the shacl output (to a nodeshape
for classes, and to a property shape for slots and slots with typesas ranges)

* fix linting

* Update snapshot data in 'test_scripts' after updating kitchen sink schema for shaclgen annotation tests

* Update shaclgen.py

Add a TODO comment

---------

Co-authored-by: Chris Mungall <cjm@berkeleybop.org>

* Erdiagram include upstream (#2139)

* Include upstream classes into ERD diagram of selected entitites

Add docs for —include-upstream

* Fix unit test for Py3.9

* Update poetry lockfile

* Implement equals_string and equals_string_in

* Resolve flake errors

* fix lint errors

* Fix tests for equals_string_in feature

Signed-off-by: Vincent Kelleher <vincent.kelleher@gaia-x.eu>

* Fix gen shacl test

* Fix unit tests

* Reformat code

* Fix missing type

* Reformt

* Fix lint errors

* Fix lint errors

* Fix unti tests

* Format imports; ensure that tox and pre-commit agree on a ruff version

---------

Signed-off-by: Vincent Kelleher <vincent.kelleher@gaia-x.eu>
Co-authored-by: Anja Strunk <anja.strunk@cloudandheat.com>
Co-authored-by: Vlad Korolev <vlad@v-lad.org>
Co-authored-by: cmungall <50745+cmungall@users.noreply.github.com>
Co-authored-by: Sierra Taylor Moxon <sierra.taylor@gmail.com>
Co-authored-by: Stephan Heunis <s.heunis@fz-juelich.de>
Co-authored-by: Chris Mungall <cjm@berkeleybop.org>
Co-authored-by: Vincent Kelleher <vincent.kelleher@gaia-x.eu>
Co-authored-by: anjastrunk <119566837+anjastrunk@users.noreply.github.com>
vincentkelleher pushed a commit to vincentkelleher/linkml that referenced this issue Jun 10, 2024
@sneakers-the-rat
Collaborator

Saw this when searching for related issues for a PR. This is done, at least for pydanticgen, via the metadata_mode property, which can include all schema metadata in generated models, including annotations.
