# Generate JSON Schemas Automatically using Python

## What is JSON Schema?


JSON Schema is a widely used declarative language for defining the structure, constraints, and validation rules of JSON data. It provides a standardized way to describe the expected format and properties of JSON documents, so developers can validate data exchanged between different systems or components.

Check out the JSON Schema Declaration Language at: https://json-schema.org/

## Benefits
* Validates Data Exchange 
* Provides a Solution to Schema Evolution
* Creates Schema Documentation

## JSON Schema & Python

Python has a few libraries that let you work with JSON Schema. One such library is `jsonschema`, which checks JSON documents against JSON Schemas from Python.
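
As a rough illustration of what that looks like, here is a minimal sketch using `jsonschema.validate`. The `quote_schema` dictionary and the `check` helper are names I made up for this example, and the import is guarded because `jsonschema` is a third-party package that may not be installed.

```python
try:
    import jsonschema  # third-party: pip install jsonschema
except ImportError:
    jsonschema = None

# Hand-written schema for illustration only.
quote_schema = {
    "type": "object",
    "properties": {
        "_id": {"type": "integer"},
        "author": {"type": "string"},
    },
    "required": ["_id", "author"],
}


def check(document, schema):
    """Return None if the document is valid, else the validator's message."""
    if jsonschema is None:
        raise RuntimeError("the jsonschema package is not installed")
    try:
        jsonschema.validate(instance=document, schema=schema)
    except jsonschema.ValidationError as err:
        return err.message
    return None
```

Calling `check({"_id": 1}, quote_schema)` should return a message about the missing `author` property, while a complete document returns `None`.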

To my knowledge, there is no Python package that takes a JSON document as input and outputs an auto-generated JSON Schema, so I started developing one that does.

# Getting Started with jsonautoschema

This library is still in development, so the package cannot yet be installed via `pip`. The source code can be found at: https://github.com/jaycroft/jsonautoschema

## generate_schema()
The generate_schema() function takes a JSON array or object, analyzes its structure, and returns a standard schema that you can use to validate JSON documents. It comes with a number of optional arguments that you can use to refine the schema and add information to it. The arguments are listed below:

    json_doc: Union[dict, list],
    schema_url: str = None,
    uri: str = None,
    title: str = None,
    description: str = None,
    required_cols: Union[str, list] = None,
    nullable_cols: Union[str, list] = None,
    version: str = None
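
To give a feel for the kind of structural analysis involved, here is a minimal sketch of type inference over deserialized JSON. This is not the jsonautoschema source; `infer_schema` is a hypothetical helper, and it makes simplifying assumptions: every key is treated as required, and an array's item schema is taken from its first element only.

```python
from typing import Any


def infer_schema(value: Any) -> dict:
    """Infer a basic JSON Schema fragment from a deserialized JSON value."""
    if value is None:
        return {"type": "null"}
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:  # assumption: the first element is representative
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            "required": list(value),  # assumption: every observed key is required
        }
    raise TypeError(f"not a JSON-compatible value: {type(value).__name__}")
```

The recursion mirrors the JSON data model: scalars map to a single type, while objects and arrays recurse into their members.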


## Deserializing JSON
For the generate_schema() function to work, you need to deserialize the JSON document into Python-compatible data types. This is fairly straightforward and can be done with the standard `json` library.

In [1]:
import json

json_document = '''
{
    "_id": 1,
    "author": "Albert Einstein",
    "quote": "Everything should be as simple as it can be, but not simpler!",
    "source": "The Archives"
}
'''

json_data = json.loads(json_document)
json_data

{'_id': 1,
 'author': 'Albert Einstein',
 'quote': 'Everything should be as simple as it can be, but not simpler!',
 'source': 'The Archives'}
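
For reference, the short sketch below shows which Python type `json.loads` produces for each kind of JSON value. The `samples` and `python_types` names are just for this illustration.

```python
import json

# One small JSON text per JSON value kind.
samples = {
    "object": '{"a": 1}',
    "array": "[1, 2]",
    "string": '"text"',
    "integer": "1",
    "number": "1.5",
    "boolean": "true",
    "null": "null",
}

# Map each JSON kind to the name of the Python type json.loads returns.
python_types = {
    kind: type(json.loads(text)).__name__ for kind, text in samples.items()
}
print(python_types)
```

Objects become `dict` and arrays become `list`, which is why generate_schema() accepts those two types as its root input.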

## Using Defaults

In [2]:
import jsonautoschema
schema = jsonautoschema.generate_schema(json_data)
schema

{'type': 'object',
 'properties': {'_id': {'type': 'integer'},
  'author': {'type': 'string'},
  'quote': {'type': 'string'},
  'source': {'type': 'string'}},
 'required': ['_id', 'author', 'quote', 'source']}

In [3]:
jsonautoschema.validate_json(json_data,schema)

Meets Schema Requirements


## Required & Nullable Columns
With the generate_schema() function, you can set required and nullable columns. You can specify each as a string or a list of strings in Python.

In [4]:
import jsonautoschema
schema = jsonautoschema.generate_schema(
    json_data,
    required_cols=['author', 'source'],
    nullable_cols=['quote'],
)
schema

{'type': 'object',
 'properties': {'_id': {'type': 'integer'},
  'author': {'type': 'string'},
  'quote': {'type': ['string', 'null']},
  'source': {'type': 'string'}},
 'required': ['author', 'source']}

In [5]:
jsonautoschema.validate_json(json_data,schema)

Meets Schema Requirements
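
The effect of these two options can be sketched in plain Python. The helper below is hypothetical (it is not taken from jsonautoschema) and assumes an object schema with a `properties` key: it overrides `required` and widens the `type` of each nullable column to include `"null"`.

```python
import copy
from typing import List, Optional, Union

Columns = Optional[Union[str, List[str]]]


def _as_list(cols: Columns) -> Optional[List[str]]:
    """Accept a single column name or a list of names."""
    if cols is None:
        return None
    return [cols] if isinstance(cols, str) else list(cols)


def apply_column_options(schema: dict, required_cols: Columns = None,
                         nullable_cols: Columns = None) -> dict:
    """Return a copy of an object schema with required/nullable overrides."""
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    required = _as_list(required_cols)
    if required is not None:
        result["required"] = required
    for name in _as_list(nullable_cols) or []:
        prop = result["properties"][name]
        current = prop["type"]
        types = [current] if isinstance(current, str) else list(current)
        if "null" not in types:
            types.append("null")
        prop["type"] = types
    return result
```

In JSON Schema, a list under `type` means the value may match any one of the listed types, so `['string', 'null']` accepts either a string or `null`.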


## Versioning & Schema Documentation
As your web application evolves, your JSON Schemas are likely to change. Within the generate_schema() function, you can add a version number and descriptive information to document changes to your schema. See the code below for an example:

In [6]:
import jsonautoschema
schema = jsonautoschema.generate_schema(
    json_data,
    version='1.0.0',
    title='Quotes',
    description='Standard Quote Format',
)
schema

{'type': 'object',
 'properties': {'_id': {'type': 'integer'},
  'author': {'type': 'string'},
  'quote': {'type': 'string'},
  'source': {'type': 'string'}},
 'required': ['_id', 'author', 'quote', 'source'],
 'title': 'Quotes',
 'description': 'Standard Quote Format',
 'version': '1.0.0'}

In [7]:
jsonautoschema.validate_json(json_data,schema)

Meets Schema Requirements
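
Here is a minimal sketch of how this kind of metadata could be attached to an existing schema. `with_metadata` is a hypothetical helper, not part of jsonautoschema. Note that `title` and `description` are standard JSON Schema annotation keywords, whereas `version` is not defined by the specification; validators typically ignore keywords they do not recognize, so it acts as documentation only.

```python
from typing import Optional


def with_metadata(schema: dict, title: Optional[str] = None,
                  description: Optional[str] = None,
                  version: Optional[str] = None) -> dict:
    """Return a copy of the schema with any provided metadata appended."""
    annotated = dict(schema)  # shallow copy; nested values are shared
    for key, value in (("title", title),
                       ("description", description),
                       ("version", version)):
        if value is not None:  # skip options the caller left unset
            annotated[key] = value
    return annotated
```

Because only the provided values are added, a schema generated with defaults stays free of empty metadata fields.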


## Interpreting JSON Structures
The generate_schema() function automatically detects the root JSON structure. In the last few examples, we looked at a JSON document whose root structure is an object. The code below shows an example of a JSON document whose root structure is an array.

In [8]:
import json

json_document = '''
[
    {
        "_id": 1,
        "author": "Albert Einstein",
        "quote": "Everything should be as simple as it can be, but not simpler!",
        "source": "The Archives"
    }
]
'''

json_data = json.loads(json_document)
json_data

[{'_id': 1,
  'author': 'Albert Einstein',
  'quote': 'Everything should be as simple as it can be, but not simpler!',
  'source': 'The Archives'}]

In [9]:
import jsonautoschema
schema = jsonautoschema.generate_schema(
    json_data,
    required_cols=['author', 'source'],
    nullable_cols=['quote'],
)
schema

{'type': 'array',
 'items': {'type': 'object',
  'properties': {'_id': {'type': 'integer'},
   'author': {'type': 'string'},
   'quote': {'type': ['string', 'null']},
   'source': {'type': 'string'}},
  'required': ['author', 'source']}}
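
To make the meaning of an array schema like this concrete, here is a small sketch of a checker that understands only the keywords used in this post (`type`, `properties`, `required`, and `items`). It is a teaching aid under those assumptions, not a replacement for a real validator such as `jsonschema`, and the `conforms` name is my own.

```python
def _type_ok(value, type_name):
    """Check one JSON Schema type name against a Python value."""
    if type_name == "null":
        return value is None
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":  # bool is an int subclass, so exclude it
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "object":
        return isinstance(value, dict)
    return False


def conforms(value, schema):
    """Return True if value satisfies the supported subset of the schema."""
    declared = schema.get("type")
    if declared is not None:
        names = [declared] if isinstance(declared, str) else declared
        if not any(_type_ok(value, name) for name in names):
            return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(conforms(value[k], props[k]) for k in value if k in props)
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True
```

With the array schema above, a list whose objects omit `_id` or carry a `null` quote still conforms, while a list containing an object without `source` does not.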