<a href="https://colab.research.google.com/github/lsloan/json-schema-to-python/blob/master/JSON_Schema_to_Python_classes_(feat_Caliper).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is an experiment in generating code from JSON schemata, then using that code to produce JSON that complies with the original schemata.  The goal is to use as little domain-specific code as possible: general JSON Schema tools generate the classes that represent the objects to be encoded as JSON, and a small amount of additional code may be added to facilitate that encoding.

Primary resources:

* [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator): "This code generator creates pydantic v1 and v2 model, dataclasses.dataclass and typing.TypedDict from an openapi file and others."
* [caliper-spec JSON schemata](https://github.com/1EdTech/caliper-spec/tree/60f7cb7/json_schema/schema_1_1): a set of JSON schemata for a publicly available standard.
* [jsonschema](https://github.com/python-jsonschema/jsonschema): a Python implementation of JSON Schema validation.
* [check-jsonschema](https://github.com/python-jsonschema/check-jsonschema): a command-line validator built on `jsonschema`.


# JSON Schema to Python classes (feat. Caliper)

## Initialize

In [None]:
%reload_ext autoreload
%autoreload all --print

# Install datamodel-code-generator to process JSON schemata
# Install newer pydantic (v2); Colab comes with v1
%pip install -Uq \
  datamodel-code-generator[http]==0.22.0 \
  pydantic==2.4.2 \
  jsonschema==4.19.1 \
  check-jsonschema==0.27.0

# Get a set of JSON schemata, cloned from a GitHub repo
!test -d caliper-spec || \
  git clone -b json_schema https://github.com/1EdTech/caliper-spec.git

# Install jq if it's not already available, for editing JSON schemata
# Note: `apt` couldn't get jq-1.7, only jq-1.6
# !jq -V > /dev/null 2>&1 || apt install jq
!jq -V > /dev/null 2>&1 || \
  (wget -O jq https://github.com/jqlang/jq/releases/download/jq-1.7/jq-linux-i386 && \
  chmod +x jq && mv jq /usr/local/bin)

## Clean up JSON schemata

The JSON schemata chosen are a work in progress.  They need some changes to make them usable for this purpose.  The changes made here should not be specific to this experiment and will likely be contributed back to the schemata maintainers.

Changes include…
* Append `.json` to URIs in `$ref` properties. (Completed)  
  The schemata refer to each other and are contained in files with a `.json` extension on their names.  However, the references don't include that extension.
* Remove unnecessary regular expressions. (Work in progress)  
  Regular expressions were used throughout the schemata in `pattern` properties.  In most cases, the regular expressions are of the format `^…static text…$`.  Those could be expressed more simply as `const` properties, with the regular expression symbols removed.
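
The regex cleanup described above could be sketched in Python roughly as follows.  This is only an illustration, not the approach the notebook has settled on: `literal_from_pattern` and `patterns_to_const` are hypothetical helper names, and the sketch assumes that any `pattern` key holding a string is a schema keyword (a key named `pattern` inside an `enum` or `default` value would be misread).

```python
# Characters that carry regex meaning when left unescaped.
_META = set(".^$*+?{}[]()|\\")


def literal_from_pattern(pattern):
    """Return the static text if `pattern` is `^static text$`, else None."""
    if not (pattern.startswith("^") and pattern.endswith("$")):
        return None
    body = pattern[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Accept an escaped metacharacter such as `\.`;
            # reject classes such as `\d` and a dangling backslash.
            if i + 1 < len(body) and body[i + 1] in _META:
                out.append(body[i + 1])
                i += 2
                continue
            return None
        if ch in _META:
            return None  # a real regex; leave it alone
        out.append(ch)
        i += 1
    return "".join(out)


def patterns_to_const(node):
    """Return a copy of a parsed schema with static patterns turned into `const`."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            literal = None
            if key == "pattern" and isinstance(value, str):
                literal = literal_from_pattern(value)
            if literal is not None:
                result["const"] = literal
            else:
                result[key] = patterns_to_const(value)
        return result
    if isinstance(node, list):
        return [patterns_to_const(item) for item in node]
    return node
```

The sketch is deliberately conservative: a pattern containing an unescaped `.` is treated as a genuine regular expression and left unchanged, so such patterns would still need a manual review.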

In [2]:
# Modify each JSON schema file and save a new copy
# * Add ".json" to "$ref" properties
!mkdir -p /content/schema; \
  cd /content/caliper-spec/json_schema/schema_1_1; \
  for f in *.json; do \
    jq '(.. | objects."$ref" | strings) |= \
        sub("(?<hashOrEol>#|$)"; ".json\(.hashOrEol)")' "$f" \
    > "/content/schema/$f"; \
  done

# TODO: Also fix required `type` properties and other unnecessary regexes
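
For comparison, the `$ref` rewrite could also be done in Python, which may be convenient if later cleanup steps are written in Python as well.  This is a sketch under assumptions: `add_json_suffix`, `fix_refs`, and `rewrite_schema_file` are hypothetical names, and unlike the `jq` filter above, the sketch skips references that are purely local (starting with `#`) or that already end in `.json`.

```python
import json
from pathlib import Path


def add_json_suffix(ref):
    """Insert `.json` before the fragment (or at the end) of a `$ref` URI."""
    base, hash_mark, fragment = ref.partition("#")
    if not base or base.endswith(".json"):
        return ref  # local reference or already suffixed; leave untouched
    return f"{base}.json{hash_mark}{fragment}"


def fix_refs(node):
    """Return a copy of a parsed schema with every string `$ref` rewritten."""
    if isinstance(node, dict):
        return {
            key: add_json_suffix(value)
            if key == "$ref" and isinstance(value, str)
            else fix_refs(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fix_refs(item) for item in node]
    return node


def rewrite_schema_file(src, dst):
    """Read one schema file, fix its references, and write the result."""
    schema = json.loads(Path(src).read_text())
    Path(dst).write_text(json.dumps(fix_refs(schema), indent=2))
```

Looping `rewrite_schema_file` over the `*.json` files in `/content/caliper-spec/json_schema/schema_1_1` would mirror the shell loop above.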

## Generate model classes from schemata

An attempt to build the classes directly from a remote schema…

```
datamodel-codegen --debug \
--url https://raw.githubusercontent.com/1EdTech/caliper-spec/json_schema/json_schema/schema_1_1/Entity.json \
  --output generatedModel.py
```

…fails with the message `TypeError: keywords must be strings`.

In [3]:
!datamodel-codegen \
  --input-file-type jsonschema \
  --use-title-as-name \
  --output-model-type pydantic_v2.BaseModel \
  --input schema/Agent.json \
  --output generatedModel.py
#  --use-one-literal-as-default \
#   --input schema/Event.json \

# Without `--use-title-as-name` datamodel-codegen sometimes drops the last
# character of class names

%ls -l generatedModel.py

-rw-r--r-- 1 root root 1962 Sep 29 16:30 generatedModel.py


## Use the generated classes

In [4]:
from pydantic import ValidationError
import generatedModel as genmo

try:
    # m = genmo.Agent(id='123', name='Bond', field_context='007', type='SecretAgent')
    # m = genmo.Agent(id='123', name='Bond', context='http://purl.imsglobal.org/ctx/caliper/v1p1')
    m = genmo.Agent(id='123', name='Bond', type='Agent')
    print(m)
except ValidationError as v:
    print(v)

# TODO: Modify schema to automatically use correct type

# print(30 * '- ')

# m = genmo.Agent(id='123', name='Bond', field_context='007')
# print(m)
# m.type='SecretAgent'
# print(m)


extensions=None id='123' dateCreated=None dateModified=None description=None name='Bond' type='Agent' field_context=None


## Serialize instantiated objects to JSON

In [5]:
# FIXME: @context is not set
print(m.model_dump_json())
print(m.model_dump_json(exclude_unset=True))
print(m.model_dump_json(exclude_none=True))
# exclude_defaults=False is pydantic's default, so this repeats the first call
print(m.model_dump_json(exclude_defaults=False))

{"extensions":null,"id":"123","dateCreated":null,"dateModified":null,"description":null,"name":"Bond","type":"Agent","field_context":null}
{"id":"123","name":"Bond","type":"Agent"}
{"id":"123","name":"Bond","type":"Agent"}
{"extensions":null,"id":"123","dateCreated":null,"dateModified":null,"description":null,"name":"Bond","type":"Agent","field_context":null}


## Validate the JSON

### Programmatic validation

Calling the validator this way fails.  It's unable to open the other files referenced in each `$ref` property.  It claims the URI is incorrect.  Prefixing `file://` to the bare URI doesn't seem to help.

In [6]:
from os import chdir
import json
from jsonschema import validate

try:
    chdir('/content/schema')
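    # Use a context manager so the schema file is closed after loading
    with open('Agent.json') as schema_file:
        schema = json.load(schema_file)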
    validate(instance=m.model_dump(), schema=schema)
    # validator unable to open "$ref" files; URI is incorrect?
except Exception as e:
    print('Error!')
    print(e)

Error!
Unresolvable: CaliperTypeDefinitions.json#/extensions
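
One possible explanation, offered as a guess rather than a confirmed diagnosis: a relative reference such as `CaliperTypeDefinitions.json#/extensions` can only be resolved relative to some base URI, and a schema loaded with `json.load` carries no record of the file it came from.  The standard-library sketch below shows what resolving such a reference against the schema file's own location involves.  `resolve_ref`, `follow_pointer`, and `load_ref` are hypothetical helpers, not part of `jsonschema`, and the path handling assumes a POSIX file system such as Colab's.

```python
import json
from pathlib import Path
from urllib.parse import unquote, urldefrag, urljoin, urlparse


def resolve_ref(schema_path, ref):
    """Split a relative `$ref` into (target file, JSON pointer)."""
    # The schema file's own location serves as the base URI.
    base_uri = Path(schema_path).resolve().as_uri()
    target_uri, pointer = urldefrag(urljoin(base_uri, ref))
    return Path(unquote(urlparse(target_uri).path)), unquote(pointer)


def follow_pointer(document, pointer):
    """Walk a JSON pointer such as `/extensions` through parsed JSON."""
    node = document
    for token in pointer.split("/")[1:]:
        # Undo JSON pointer escaping: `~1` means `/`, `~0` means `~`.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def load_ref(schema_path, ref):
    """Load the subschema that `ref` points to, relative to `schema_path`."""
    target_file, pointer = resolve_ref(schema_path, ref)
    with open(target_file) as f:
        return follow_pointer(json.load(f), pointer)
```

For example, `load_ref('/content/schema/Agent.json', 'CaliperTypeDefinitions.json#/extensions')` would open `/content/schema/CaliperTypeDefinitions.json` and return whatever sits under its `extensions` key.  A validator needs the same base-URI information to do this on its own.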


### CLI validation

The `check-jsonschema` CLI is built on the same `jsonschema` library used for the programmatic validation above, yet validation succeeds here.  Find out why.

Find out whether `check-jsonschema` can be called programmatically.

If `null` values are included, the validator reports `None is not of type 'object'`, `None is not of type 'string'`, etc.

Input is provided by process substitution because `check-jsonschema` doesn't support STDIN.  (Yet.  I'm working on it.)
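
Until one of those questions is answered, a temporary file is one way to drive the CLI from Python without process substitution.  The sketch below is an assumption-laden illustration: `check_with_cli` is a hypothetical helper, it relies on `check-jsonschema` being on the `PATH`, and it assumes the tool exits with status 0 when validation succeeds.  The `run` parameter exists only so the subprocess call can be replaced in tests.

```python
import subprocess
import tempfile
from pathlib import Path


def check_with_cli(instance_json, schema_file, run=subprocess.run):
    """Validate a JSON string by handing check-jsonschema a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        instance_file = Path(tmp) / "instance.json"
        instance_file.write_text(instance_json)
        result = run(
            ["check-jsonschema", "--schemafile", str(schema_file), str(instance_file)],
            capture_output=True,  # collect the tool's report instead of printing it
            text=True,
        )
    # Return a success flag plus whatever the tool reported.
    return result.returncode == 0, result.stdout + result.stderr
```

A call such as `check_with_cli(m.model_dump_json(exclude_none=True), '/content/schema/Agent.json')` would then stand in for the shell command in the next cell.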



In [7]:
!check-jsonschema --schemafile /content/schema/Agent.json <(printf '{m.model_dump_json(exclude_none=True)}')

ok -- validation done
