# JSON Schema

The prior lesson demonstrated communicating between a RESTful web server and a client.  Recall that we sent HTTP POST messages with a JSON body to a server and received JSON responses from GET queries.  One thing the example did not do was validate the format of these messages.  Or rather, there was one element of ad-hoc validation, in that the server required the field "name" to be present in a user record.

Using JSON Schema, we can more precisely specify all the elements that may be present in an acceptable JSON document, including which are required versus optional, and indicate datatypes and nesting of containers.  A JSON Schema can contain varying levels of detail.  We will look at some possible schemata to define a valid user with varying degrees of specificity.

Let us start by loading the standard library `json` module and the third-party `jsonschema` module.  We also parse JSON strings describing several users to validate.

In [1]:
import json
from jsonschema import validate, ValidationError

In [2]:
guido = json.loads("""{
  "name": "Guido van Rossum",
  "password": "unladenswallow",
  "details": {
    "profession": "ex-BDFL"
  }
}""")

In [3]:
david = json.loads("""{
  "name": "David Mertz",
  "password": "badpassword",
  "details": {
    "profession": "Data Scientist",
    "publisher": "INE"
  },
  "lucky_numbers": [12, 42, 55, 87]
}""")

In [4]:
intruder = json.loads("""{
  "password": "P4cC!^*8chWz8", 
  "profession": "Hacker"
}""")

# Validation

A JSON Schema is itself a JSON document following certain specifications.  At the simplest, it needs to specify a type for the JSON being validated. The module `jsonschema` expects Python objects as both `instance` and `schema` arguments.  If you are beginning with JSON—which is, after all, the point of using it—you need to use the `json` module to convert both to Python objects first.

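As a minimal sketch of that conversion step, the block below parses a schema and an instance from JSON text and then validates one against the other.  The names `schema_text`, `instance_text`, `schema_obj`, and `instance_obj` are illustrative assumptions, not part of the lesson's own code.

```python
import json
from jsonschema import validate

# Hypothetical JSON text, e.g. as read from a file or an HTTP request body
schema_text = '{"type": "object"}'
instance_text = '{"name": "Guido van Rossum"}'

# jsonschema works on Python objects, so parse both documents first
schema_obj = json.loads(schema_text)
instance_obj = json.loads(instance_text)

# validate() returns None when the instance conforms to the schema
result = validate(instance=instance_obj, schema=schema_obj)
```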
The API the `jsonschema` module uses might be surprising.  It raises an exception on failure, but passes silently on success.  Let us look at a couple of examples.

## Checking Scalars

In [5]:
try:
    validate(instance=99, schema={"type": "number"})
    print("99 is a number")
except ValidationError as err:
    print(err)    

99 is a number


In [6]:
try:
    validate(99, {"type": "string"})
    print("99 is a string")
except ValidationError as err:
    print(err)

99 is not of type 'string'

Failed validating 'type' in schema:
    {'type': 'string'}

On instance:
    99


In [7]:
try:
    validate("99", {"type": "number"})
    print("'99' is a number")
except ValidationError as err:
    print(err)

'99' is not of type 'number'

Failed validating 'type' in schema:
    {'type': 'number'}

On instance:
    '99'


## A Test Function

I find it easier to wrap the exception-raising API in a function that returns either the error description as a string or `None` as a sentinel for "no errors."

In [8]:
def not_valid(instance, schema):
    try:
        validate(instance, schema)
        return None
    except ValidationError as err:
        return str(err)

The following is the pattern we will use for the remaining examples.

In [9]:
# The "walrus operator" requires Python 3.8+
if msg := not_valid("Ooops", {"type": "array"}):
    print(msg)

'Ooops' is not of type 'array'

Failed validating 'type' in schema:
    {'type': 'array'}

On instance:
    'Ooops'


# Checking Users

The simple examples above do not check structured collections.  All user JSON records are what JavaScript calls "objects" and Python calls dicts.  For a JSON object, we need to define both the type and the properties we expect it to have.  We may specify keys as required, but by default validation does not prohibit extra "cargo" under keys we have not specified.  Very often this is exactly the desired behavior; JSON often carries extra information that might be used by other consumers, while a particular consumer only needs to ensure that the parts it cares about are present.

In [10]:
schema = json.loads("""{
  "type" : "object",
  "required": ["name"],
  "properties" : {
    "name" : {"type" : "string"}
    }
}""")

Validate standard users.

In [11]:
for user in [guido, david]:
    if msg := not_valid(user, schema):
        print(msg, "\n--------------------")
    else:
        print(f"User {user['name']} validates correctly")

User Guido van Rossum validates correctly
User David Mertz validates correctly


The schema in this first pass suffices to check the constraint the server in the prior lesson imposed.  In fact, it checks slightly more in guaranteeing that the field "name" is a string.

In [12]:
barbara_feldon = json.loads("""{
  "name": 99, 
  "details": {"profession": "CONTROL Agent"}
}""")

We have two not-quite-conformant user JSON documents to validate. Each fails in a different way.

In [13]:
for user in [barbara_feldon, intruder]:
    if msg := not_valid(user, schema):
        print(msg, "\n--------------------")
    else:
        print(f"User {user['name']} validates correctly")

99 is not of type 'string'

Failed validating 'type' in schema['properties']['name']:
    {'type': 'string'}

On instance['name']:
    99 
--------------------
'name' is a required property

Failed validating 'required' in schema:
    {'properties': {'name': {'type': 'string'}},
     'required': ['name'],
     'type': 'object'}

On instance:
    {'password': 'P4cC!^*8chWz8', 'profession': 'Hacker'} 
--------------------

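As noted at the start of this section, keys outside `properties` are permitted by default.  If a consumer instead wants to reject them, JSON Schema provides the `additionalProperties` keyword.  The following is a minimal sketch rather than part of the lesson's schema; the names `strict_schema` and `extra` are illustrative assumptions.

```python
from jsonschema import validate, ValidationError

# Same shape as the schema above, but refusing keys not listed in "properties"
strict_schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
    "additionalProperties": False,
}

# Hypothetical record carrying one key the strict schema does not mention
extra = {"name": "Guido van Rossum", "password": "unladenswallow"}

try:
    validate(extra, strict_schema)
    rejected = False
except ValidationError as err:
    rejected = True
    print(err.message)
```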

## Nested Structure

A JSON Schema allows specification of nested structures, including type and cardinality, and also may optionally contain a number of annotations to describe the schema itself.  Let us add a few. In the expanded schema, we will require a password along with a name.  Notice that we describe several aspects of what the field "lucky_numbers" might look like, but we do not make it required.  Guido had none, but David did; both should validate.

In [14]:
schema = json.loads("""{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/user.schema.json",
  "title": "User",
  "description": "A User of Our Computer System",
  "type" : "object",
  "required": ["name", "password"],
  "properties" : {
     "name" : {"type" : "string"},
     "password": {
         "description": "Use special characters and mixed case",
         "type": "string"},
     "lucky_numbers": {
         "description": "Up to 6 favorite numbers 1-100",
         "type": "array",
         "items": {
           "type": "number",
           "minimum": 1,
           "maximum": 100
         },
         "uniqueItems": true,
         "minItems": 0,
         "maxItems": 6
    }
  }
}""")

Our existing users continue to validate without a problem.

In [15]:
for user in [guido, david]:
    if msg := not_valid(user, schema):
        print(msg, "\n--------------------")
    else:
        print(f"User {user['name']} validates correctly")

User Guido van Rossum validates correctly
User David Mertz validates correctly

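The expanded schema names draft-07 in its `$schema` field, and a schema can itself be checked against that specification before any instance is validated.  The sketch below assumes the `Draft7Validator` class from `jsonschema`; the two small schemas are illustrative only and are not the user schema from this lesson.

```python
from jsonschema import Draft7Validator, SchemaError

good_schema = {"type": "array", "items": {"type": "number"}, "maxItems": 6}
bad_schema = {"type": "array", "maxItems": "six"}  # maxItems must be an integer

# check_schema() passes silently for a well-formed schema
Draft7Validator.check_schema(good_schema)

# ... and raises SchemaError for a malformed one
try:
    Draft7Validator.check_schema(bad_schema)
    schema_ok = True
except SchemaError as err:
    schema_ok = False
    print(err.message)
```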

There are a few ways that validation might fail with the expanded schema.  Obviously, "password" was added as a required field, but the pattern there is identical to that for "name".  The field "lucky_numbers" has more going on.  It might be omitted altogether for a valid user, but if it is included, it can only be an array (Python list) of numbers between 1 and 100; moreover, it can hold at most six numbers, all of which must be distinct.

In [16]:
the_count = json.loads("""{
  "name": "Count von Count",
  "password": "fourbananas",
  "lucky_numbers": ["one", "two", "three"]
}""")

if msg := not_valid(the_count, schema):
    print(msg, "\n--------------------")
else:
    print(f"User {the_count['name']} validates correctly")

'one' is not of type 'number'

Failed validating 'type' in schema['properties']['lucky_numbers']['items']:
    {'maximum': 100, 'minimum': 1, 'type': 'number'}

On instance['lucky_numbers'][0]:
    'one' 
--------------------


In [17]:
cantor = json.loads("""{
  "name": "Georg Cantor",
  "password": "omega_aleph",
  "lucky_numbers": [1, 2, 3, 4, 5, 6, 7, 8]
}""")

if msg := not_valid(cantor, schema):
    print(msg, "\n--------------------")
else:
    print(f"User {cantor['name']} validates correctly")

[1, 2, 3, 4, 5, 6, 7, 8] is too long

Failed validating 'maxItems' in schema['properties']['lucky_numbers']:
    {'description': 'Up to 6 favorite numbers 1-100',
     'items': {'maximum': 100, 'minimum': 1, 'type': 'number'},
     'maxItems': 6,
     'minItems': 0,
     'type': 'array',
     'uniqueItems': True}

On instance['lucky_numbers']:
    [1, 2, 3, 4, 5, 6, 7, 8] 
--------------------


In [18]:
revolution_9 = json.loads("""{
  "name": "Yoko Ono",
  "password": "grapefruit",
  "lucky_numbers": [9, 9, 9]
}""")

if msg := not_valid(revolution_9, schema):
    print(msg, "\n--------------------")
else:
    print(f"User {revolution_9['name']} validates correctly")

[9, 9, 9] has non-unique elements

Failed validating 'uniqueItems' in schema['properties']['lucky_numbers']:
    {'description': 'Up to 6 favorite numbers 1-100',
     'items': {'maximum': 100, 'minimum': 1, 'type': 'number'},
     'maxItems': 6,
     'minItems': 0,
     'type': 'array',
     'uniqueItems': True}

On instance['lucky_numbers']:
    [9, 9, 9] 
--------------------


In [19]:
go_big = json.loads("""{
  "name": "Leslie Knope",
  "password": "ilovepawnee",
  "lucky_numbers": [1000000, 200000]
}""")

if msg := not_valid(go_big, schema):
    print(msg, "\n--------------------")
else:
    print(f"User {go_big['name']} validates correctly")

1000000 is greater than the maximum of 100

Failed validating 'maximum' in schema['properties']['lucky_numbers']['items']:
    {'maximum': 100, 'minimum': 1, 'type': 'number'}

On instance['lucky_numbers'][0]:
    1000000 
--------------------

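Each failing example above reports a single problem, because `validate()` raises one exception and stops.  When an instance may break several rules at once, a validator object can list all of them.  The following is a sketch under stated assumptions: it uses `Draft7Validator.iter_errors()` from `jsonschema`, a cut-down copy of the "lucky_numbers" rules, and a made-up instance named `numbers`.

```python
from jsonschema import Draft7Validator

# A cut-down version of the "lucky_numbers" rules used above
numbers_schema = {
    "type": "array",
    "items": {"type": "number", "minimum": 1, "maximum": 100},
    "uniqueItems": True,
    "maxItems": 6,
}

# Hypothetical instance breaking two rules: a duplicate and an out-of-range value
numbers = [9, 9, 1000000]

validator = Draft7Validator(numbers_schema)

# iter_errors() yields every ValidationError instead of raising the first one
errors = list(validator.iter_errors(numbers))

# err.validator holds the name of the schema keyword that failed
failed_keywords = sorted(err.validator for err in errors)
for err in errors:
    print(err.message)
```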