# Describing Hierarchical Data with JSON
The JSON data format is widely used to describe data with a hierarchical structure. 
Such structures are built from nested sets of rules, like "addresses have street numbers" and "street numbers are positive integers."
This notebook shows you how to create simple JSON documents and then enforce their format requirements using JSON Schema.

In [None]:
from jsonschema import validate, ValidationError
import json

## First, Reading and Writing a JSON Document in Python
JSON documents map neatly onto Python's dictionary data structure and many of its other built-in data types.
Accordingly, the [JSON module](https://docs.python.org/3/library/json.html) ships with Python's standard library for converting to and from this data format.

Let's start with some simple data: only one level of nesting and only two types of data, integers and strings.

In [None]:
my_address = {
    'number': 9700,
    'direction': 'S',
    'street_name': 'Cass',
    'street_suffix': 'Ave',
    'city': 'Lemont', 
    'state': 'IL',
    'zip': 60439
}

Converting this to JSON is simple:

In [None]:
print(json.dumps(my_address, indent=2))

JSON-format data looks almost exactly like Python code ([JSON's syntax is from a programming language](https://en.wikipedia.org/wiki/JSON#History)). 
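There are differences: JSON only allows double quotes for marking strings and only allows strings as keys. These rarely cause problems, but as the next two cells show, `json.dumps` silently converts non-string keys to strings and `json.loads` rejects single-quoted strings.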

In [None]:
json.dumps({1: 1})

In [None]:
try:
    json.loads("{'1': 1}")
except json.JSONDecodeError as e:
    print(e)
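A round trip makes these rules concrete. The sketch below is only an illustration: the `address` dictionary and the temporary file are made-up stand-ins, not part of this notebook's running example. It writes a document to disk with `json.dump`, reads it back with `json.load`, and then shows that an integer key does not survive the trip.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical data with string keys only, so nothing is lost in JSON
address = {'number': 9700, 'street_name': 'Cass', 'zip': 60439}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'address.json'
    with path.open('w') as fp:
        json.dump(address, fp, indent=2)  # Write JSON text to disk
    with path.open() as fp:
        restored = json.load(fp)  # Parse it back into a dict

print(restored == address)

# Integer keys are converted to strings on the way out...
text = json.dumps({1: 1})
print(text)

# ...so they come back as strings, not integers
print(json.loads(text) == {1: 1})
```

If your data relies on non-string keys, you will need to convert them back yourself after loading.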

In short, it is pretty easy to write and understand what a JSON document can look like. 

The challenge is defining what it _must_ look like for your data.

## JSON Schemas
[JSON Schemas](https://json-schema.org/) define which keys and values a document must have to belong to a certain format.

Every JSON schema is itself a JSON document, and it typically starts with a short preamble that identifies it as a schema.

In [None]:
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",  # Identifies which version of JSON Schema is used
    "title": "Address",  # A short name for the data type
    "description": "A location recognized by the US Postal Service",  # A longer-form explanation
    "type": "object"  # What type of data it is (more on this later)
}

The rest of the schema is a series of options and nested structures defining the allowed keys and their types:

In [None]:
schema['properties'] = {
    'number': {
        'title': 'House number',
        'description': 'The position of the building along a street.',
        'type': 'integer',
    },
    'direction': {
        'title': 'Direction',
        'description': 'Direction of the street relative to the center of the numbering system.',
        'type': 'string'
    }
}

The similarity between the preamble and the description of each property is intentional.
The format is recursive: the description of each property is itself a valid JSON schema.

Schemas are composed of schemas, just as documents are composed of documents.

In [None]:
print(json.dumps(schema, indent=2))

## Validating a JSON Document
[Many libraries](https://json-schema.org/implementations.html) exist for checking whether a JSON document matches a schema.
We'll show you how to use one of the common Python ones, [`jsonschema`](https://python-jsonschema.readthedocs.io/en/stable/).

In [None]:
validate(my_address, schema)  # That is all it takes; no exception means the document is valid

In [None]:
try: 
    validate({'number': '1'}, schema)
except ValidationError as e:
    print(e)

The validator detects when a document doesn't fit the schema and reports why.

## Scheming a Better Schema
The example schema above is very simple. It only checks that each value has the proper type, but there's a lot more that makes an address an address.

For one, the house number, street name, and ZIP code are required. To enforce that, we add the ["required"](https://json-schema.org/understanding-json-schema/reference/object.html#required-properties) option to the description of the address.

In [None]:
schema['required'] = ['number', 'street_name', 'zip']

In [None]:
try: 
    validate({'number': 4, 'street_name': 'Main'}, schema)
except ValidationError as e:
    print(e.message)

In [None]:
validate({'number': 4, 'street_name': 'Main', 'zip': 41400}, schema)

Another example is requiring that street numbers be positive. Add that requirement by adding an option to the "number" property.



In [None]:
schema['properties']['number']['exclusiveMinimum'] = 0

In [None]:
schema['properties']['number']

In [None]:
try: 
    validate({'number': -1, 'street_name': 'Main', 'zip': 41400}, schema)
except ValidationError as e:
    print(e.message)

The options available for checking a value depend on its type. For example, text strings [can have required lengths or match certain patterns.](https://json-schema.org/understanding-json-schema/reference/string.html)

Every `type` has its own set of options.
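As an illustration of string-specific options, here is a minimal sketch of a schema for a two-letter state abbreviation. The name `state_schema` and the exact rules are assumptions made for this example. The `minLength`, `maxLength`, and `pattern` keywords are standard JSON Schema options for strings.

```python
from jsonschema import validate, ValidationError

# Hypothetical schema for a two-letter state abbreviation such as "IL"
state_schema = {
    'type': 'string',
    'minLength': 2,
    'maxLength': 2,
    'pattern': '^[A-Z]{2}$',  # Exactly two uppercase letters
}

validate('IL', state_schema)  # Passes silently

rejected = []
for value in ['Illinois', 'il', 17]:
    try:
        validate(value, state_schema)
    except ValidationError as e:
        rejected.append(value)
        print(e.message)
```

The length limits are redundant with the pattern here; they are included only to show both kinds of option side by side.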

## Other Types of Data
Hierarchical formats can be more than simple single-level mappings of names to numbers or text. Fields within the document can be lists of data or other documents.

In [None]:
schema['properties']['residents'] = {
    'type': 'array',
    'description': 'Names of residents associated with this address.',
    'items': {
        'type': 'string'
    }
}

In [None]:
validate({
    'number': 1, 'street_name': 'Main',
    'zip': 41400,
    'residents': ['Logan Ward'],
}, schema)

The description of the [array type](https://json-schema.org/understanding-json-schema/reference/array.html) should feel similar. It has some standard fields (e.g., type, description) and others that are specific to its type (e.g., items). 
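Arrays have their own type-specific options beyond `items`. The sketch below uses a made-up `residents_schema` to show two standard ones, `minItems` and `uniqueItems`; whether these rules suit real address data is an assumption for the sake of the example.

```python
from jsonschema import validate, ValidationError

# Hypothetical, stricter variant of a list of resident names
residents_schema = {
    'type': 'array',
    'items': {'type': 'string'},
    'minItems': 1,        # At least one resident
    'uniqueItems': True,  # No name listed twice
}

validate(['Logan Ward'], residents_schema)  # Passes silently

failures = {}
test_cases = [('empty', []), ('duplicate', ['A', 'A']), ('wrong item type', ['A', 3])]
for label, value in test_cases:
    try:
        validate(value, residents_schema)
    except ValidationError as e:
        failures[label] = e.message

for label, message in failures.items():
    print(f'{label}: {message}')
```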

The type of a field can be another document (known as an "object" to JSON Schema).

In [None]:
schema['properties']['building'] = {
    'type': 'object',
    'description': 'Description of the building at this address.',
    'required': ['function'],
    'properties': {
        'function': {
            'enum': ['residential', 'commercial', 'industrial'],  # Fixes the allowed values
            'description': 'Use of the building.'
        },
        'floors': {
            'type': 'integer',
            'description': 'Number of stories in the building.',
            'minimum': 1,
        }
    }
}
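When a nested field fails validation, the error records where in the hierarchy the problem sits. A minimal sketch, using a cut-down stand-in schema (`nested_schema`) rather than the full address schema:

```python
from jsonschema import validate, ValidationError

# Stand-in schema: an object nested inside an object
nested_schema = {
    'type': 'object',
    'properties': {
        'building': {
            'type': 'object',
            'properties': {
                'function': {'enum': ['residential', 'commercial', 'industrial']},
            },
        },
    },
}

try:
    validate({'building': {'function': 'farm'}}, nested_schema)
    error_path = None
except ValidationError as e:
    # absolute_path lists the keys leading from the top of the document to the bad value
    error_path = list(e.absolute_path)
    print(error_path, e.message)
```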

**Warm-up Exercise**: Add that the house at 1 Main is a 1-story residential building. 

In [None]:
raise NotImplementedError()
validate({
    'number': 1, 'street_name': 'Main',
    'zip': 41400,
    'residents': ['Logan Ward'],
}, schema)

Double click this cell to see an answer.
<code hidden>
{
    'number': 1, 'street_name': 'Main',
    'zip': 41400,
    'residents': ['Logan Ward'],
    'building': {'function': 'residential', 'floors': 1}
}
</code>

## Exercise: Describe Alloys and Heat Treatment Schedules
Build a schema that describes heat treatment schedules like the two examples below.

In [None]:
treatment_a = {
    'name': 'long aged',
    'steps': [
        {'type': 'ramp', 'rate': 1.5, 'temperature': 160},
        {'type': 'hold', 'time': 8, 'temperature': 160},
        {'type': 'ramp', 'rate': 100, 'temperature': 100},
        {'type': 'quench', 'medium': 'water'},
    ]
}

In [None]:
treatment_b = {
    'name': 'rapid',
    'steps': [
        {'type': 'ramp', 'rate': 1.5, 'temperature': 160},
        {'type': 'hold', 'time': 0.5, 'temperature': 320},
        {'type': 'quench', 'medium': 'water'},
    ]
}

Build a schema that: 
- [ ] Ensures that documents have a "name" field
- [ ] Ensures that there are at least 3 steps (see [array](https://json-schema.org/understanding-json-schema/reference/array.html))
- [ ] Ensures that each step contains a type of ramp, hold, or quench
- [ ] Describes that times are in hours, rates are in C/s, and temperatures are in C.

In [None]:
schema = {}
raise NotImplementedError()

In [None]:
for x in [treatment_b, treatment_a]:
    validate(x, schema) 

Double click for an answer.
<code hidden>
schema = {
    'description': 'Heat treatment schedule for an alloy',
    'type': 'object',
    'required': ['name', 'steps'],  # Documents must have a name and a list of steps
    'properties': {
        'name': {
            'description': 'A recognizable name for this schedule.',
            'type': 'string',
        },
        'steps': {
            'description': 'Each step of a treatment schedule',
            'type': 'array',
            'minItems': 3,  # "minContains" would only apply together with "contains"
            'items': {
                'type': 'object',
                'required': ['type'],  # Every step must state what kind of step it is
                'properties': {
                    'type': {'description': 'Type of the step', 'enum': ['ramp', 'hold', 'quench']},
                    'time': {'description': 'Duration of the hold time (units: hr)', 'type': 'number'},
                    'rate': {'description': 'Rate of temperature change during a ramp (units: C/s)', 'type': 'number'},
                    'temperature': {'description': 'Hold temperature or end temperature of a ramp (units: C)', 'type': 'number'},
                    'medium': {'description': 'Quench medium', 'type': 'string'}
                }
            }
        }
    }
}
</code>

Bonus steps:
- Enforce that times must be positive
- Prevent other keys from being allowed
- Ensure that "medium" is set if "quench" is the type. (Hint: `oneOf`)
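For the last bonus step, one possible approach (a sketch, not the only answer) is to describe each kind of step as a separate branch of `oneOf`, so that a quench step must carry a "medium". The name `step_schema` and the helper `is_valid` are made up for this illustration.

```python
from jsonschema import validate, ValidationError

# Hypothetical schema for a single step; exactly one branch must match
step_schema = {
    'type': 'object',
    'oneOf': [
        # Quench steps must name their medium
        {'properties': {'type': {'const': 'quench'}}, 'required': ['type', 'medium']},
        # Ramp and hold steps have no such requirement
        {'properties': {'type': {'enum': ['ramp', 'hold']}}, 'required': ['type']},
    ],
}


def is_valid(step):
    """Return True if the step matches step_schema, False otherwise."""
    try:
        validate(step, step_schema)
        return True
    except ValidationError:
        return False


print(is_valid({'type': 'quench', 'medium': 'water'}))
print(is_valid({'type': 'quench'}))
print(is_valid({'type': 'hold', 'time': 8, 'temperature': 160}))
```

The `if`/`then` keywords of JSON Schema can express the same rule and may read more naturally when there are many step types.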

## Learning More
This notebook only scratches the surface of JSON Schema. Good steps to learn next include:

1. Going through the [Step-by-Step from JSONSchema.org](https://json-schema.org/learn/getting-started-step-by-step.html)
1. Splitting the heat treatment example into multiple files, then assembling the heat treatment schedule as a [complex schema](https://json-schema.org/understanding-json-schema/structuring.html).
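Before splitting a schema across files, a useful intermediate step is to reuse a definition within a single schema. The sketch below keeps a step definition under `$defs` and points to it with `$ref`; the names `schedule_schema` and `step` are made up for this example, and it is deliberately smaller than the exercise schema.

```python
from jsonschema import validate, ValidationError

schedule_schema = {
    '$schema': 'https://json-schema.org/draft/2020-12/schema',
    'type': 'object',
    'properties': {
        # Each item must match the reusable definition below
        'steps': {'type': 'array', 'items': {'$ref': '#/$defs/step'}},
    },
    '$defs': {
        'step': {
            'type': 'object',
            'required': ['type'],
            'properties': {'type': {'enum': ['ramp', 'hold', 'quench']}},
        },
    },
}

validate({'steps': [{'type': 'ramp'}, {'type': 'quench'}]}, schedule_schema)  # Passes silently

try:
    validate({'steps': [{'type': 'anneal'}]}, schedule_schema)
    ref_was_followed = False
except ValidationError as e:
    ref_was_followed = True
    print(e.message)
```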