# Goal

#### Description

In this project our goal is to validate one dictionary structure against a template dictionary.

A typical example is handling JSON input in an API: you want to validate the received JSON against a template to make sure it conforms to that template (i.e. the keys and nesting are identical and each value has the expected data type; the values themselves do not matter).

To keep things simple we'll assume that values can be either single values (like an integer, string, etc), or a dictionary, itself only containing single values or other dictionaries, recursively. In other words, we're not going to deal with lists as possible values. Also, to keep things simple, we'll assume that all keys are **required**, and that no extra keys are permitted.

In practice we would not have these simplifying assumptions, and although we could definitely write this ourselves, there are many third-party libraries that already do this (such as `jsonschema`, `marshmallow`, and many more, some of which I'll cover lightly in later videos).
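To make the simplifying assumptions concrete, here is a minimal sketch that walks a nested template and lists every leaf as a dotted path with its expected type. The helper name `flatten_template` and the small `example` dictionary are hypothetical, used only for illustration; they are not part of the solution below.

```python
def flatten_template(template, prefix=''):
    """Yield (dotted_path, expected_type) pairs for every leaf of a nested template."""
    for key, value in template.items():
        path = f'{prefix}.{key}' if prefix else str(key)
        if isinstance(value, dict):
            # Nested dictionary: descend and extend the dotted path.
            yield from flatten_template(value, path)
        else:
            # Leaf: by assumption, the value is the expected type.
            yield path, value


# Hypothetical template, smaller than the one used in this project.
example = {
    'id': int,
    'name': {'first': str, 'last': str},
}
leaves = dict(flatten_template(example))
print(sorted(leaves))  # ['id', 'name.first', 'name.last']
```

Because lists are excluded, the only structural decision at each step is "dictionary or leaf", which is what keeps the recursion this simple.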

For example you might have this template:

In [74]:
template = {
    'user_id': int,
    'name': {
        'first': str,
        'last': str
    },
    'bio': {
        'dob': {
            'year': int,
            'month': int,
            'day': int
        },
        'birthplace': {
            'country': str,
            'city': str
        }
    }
}

So, a JSON document such as this would match the template:

In [69]:
john = {
    'user_id': 100,
    'name': {
        'first': 'John',
        'last': 'Cleese'
    },
    'bio': {
        'dob': {
            'year': 1939,
            'month': 11,
            'day': 27
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Weston-super-Mare'
        }
    }
}

But this one would **not** match the template (missing key):

In [70]:
eric = {
    'user_id': 101,
    'name': {
        'first': 'Eric',
        'last': 'Idle'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 3,
            'day': 29
        },
        'birthplace': {
            'country': 'United Kingdom'
        }
    }
}

And neither would this one (wrong data type):

In [71]:
michael = {
    'user_id': 102,
    'name': {
        'first': 'Michael',
        'last': 'Palin'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 'May',
            'day': 5
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Sheffield'
        }
    }
}

Write a function such as this:

In [5]:
def validate(data, template):
    # implement
    # and return True/False
    # in the case of False, return a string describing 
    # the first error encountered
    # in the case of True, string can be empty
    return state, error

That should return this:
* `validate(john, template) --> True, ''`
* `validate(eric, template) --> False, 'mismatched keys: bio.birthplace.city'`
* `validate(michael, template) --> False, 'bad type: bio.dob.month'`

Better yet, use exceptions instead of return codes and strings!

#### Solution

##### `match_keys`

The first function we'll write is `match_keys(data, valid, path)`.
- `data`: any dictionary (independent of depth in the original dictionary)
- `valid`: a dictionary template that `data` must conform to.
- `path`: a breadcrumb to specify the depth of `data` in the original dictionary.

We want this function to be silent if all goes well or raise a custom exception if something goes wrong. 

To jump ahead a little: `match_keys` checks that all required keys are present and that there are no extra ones, raising one custom exception if not. Soon we'll write `match_types`, which checks that all values are of the correct type, raising another custom exception if not. After that, we will wrap these two functions in a larger function that calls itself recursively.

Both of these exceptions are specific exceptions of a more general schema exception. That is to say, these specific exceptions can inherit from a general schema exception which itself inherits from `Exception`. This will make more sense later on. For now, we will write the exceptions.

In [19]:
class SchemaError(Exception):
    pass

class SchemaKeyMismatch(SchemaError):
    pass

class SchemaTypeMismatch(SchemaError, TypeError):  # we'll come back to the reason why this inherits `TypeError`
    pass

In [26]:
def match_keys(data: dict, valid: dict, path: str):
    data_keys = data.keys()
    valid_keys = valid.keys()

    extra_keys = data_keys - valid_keys
    missing_keys = valid_keys - data_keys

    if extra_keys or missing_keys:
        # Sort the keys so the message is deterministic (sets are unordered).
        extras_msg = (
            "Extra keys: " + ", ".join(path + '.' + key for key in sorted(map(str, extra_keys)))
        ) if extra_keys else ''
        missing_msg = (
            "Missing keys: " + ", ".join(path + '.' + key for key in sorted(map(str, missing_keys)))
        ) if missing_keys else ''

        err_msg = ', '.join((missing_msg, extras_msg))
        raise SchemaKeyMismatch(err_msg)

Testing the function:

In [27]:
t = {'a': int, 'b': int, 'c': int, 'd': int}
d = {'a': 'wrong type', 'b': 100, 'c': 200, 'd': {'wrong': 'type'}}
match_keys(d, t, 'some.path')

In [28]:
t = {'a': int, 'b': int, 'c': int, 'd': int}
d = {'a': 'test', 'b': 'test', 'z': 'extra'}
match_keys(d, t, 'some.path')

SchemaKeyMismatch: Missing keys: some.path.c, some.path.d, Extra keys: some.path.z
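One detail worth noticing: because `', '.join((missing_msg, extras_msg))` always joins both parts, a message with only missing keys (or only extra keys) ends up with a dangling `, `. A possible variant, sketched here as a standalone helper with the hypothetical name `key_mismatch_message`, builds the message from the non-empty parts only:

```python
def key_mismatch_message(data, valid, path):
    """Hypothetical helper: describe missing/extra keys, or return '' if the keys match."""
    missing = sorted(str(key) for key in valid.keys() - data.keys())
    extra = sorted(str(key) for key in data.keys() - valid.keys())

    parts = []
    if missing:
        parts.append("Missing keys: " + ", ".join(f"{path}.{key}" for key in missing))
    if extra:
        parts.append("Extra keys: " + ", ".join(f"{path}.{key}" for key in extra))

    # Joining only the non-empty parts avoids a dangling separator.
    return ", ".join(parts)


t = {'a': int, 'b': int, 'c': int, 'd': int}
print(key_mismatch_message({'a': 1, 'b': 2, 'z': 3}, t, 'some.path'))
# Missing keys: some.path.c, some.path.d, Extra keys: some.path.z
print(key_mismatch_message({'a': 1, 'b': 2, 'c': 3}, t, 'some.path'))
# Missing keys: some.path.d
```

The rest of this notebook keeps the original `match_keys`, so the outputs shown later still come from that version.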

##### `match_types`

Now, we're going to do type-checking. We will assume that this function will follow the previous function, and therefore, we can assume that we have all the required keys and none extra. 

This function will be very similar in layout to the previous one. Instead of a `valid` argument, we will have `template`, which looks something like this:
```python
template = {
    'user_id': int,
    'name': {
        'first': str,
        'last': str
    }
}
```

Notice how the values are types **unless** they're dictionaries, in which case the value is the nested template dictionary itself. Our code will have to make this distinction.

In [78]:
def match_types(data, template, path):
    for key, value in template.items():
        if isinstance(value, dict):
            template_type = dict
        else:
            template_type = value 

        data_value = data[key]
        if not isinstance(data_value, template_type):
            err_msg = f"Incorrect Type:\nAt {path}.{key}.\nExpected {template_type.__name__}.\nGot {type(data_value).__name__}"
            raise SchemaTypeMismatch(err_msg)     

Testing the function:

In [79]:
t = {'a': int, 'b': str, 'c': int, 'd': dict}
d = {'a': 100, 'b': 'a', 'c': 200, 'd': {'some':'value'}}
match_types(d, t, 'some.path')

In [80]:
t = {'a': int, 'b': str, 'c': int, 'd': dict}
d = {'a': 'str', 'b': 'a', 'c': 200, 'd': {'some':'value'}}
match_types(d, t, 'some.path')

SchemaTypeMismatch: Incorrect Type:
At some.path.a.
Expected int.
Got str
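A caveat about `isinstance`-based checks: in Python, `bool` is a subclass of `int`, so a value of `True` would pass a check against `int`. Whether that matters depends on the data you expect. If it does, a stricter comparison could look like the sketch below; the helper name `strict_isinstance` is hypothetical and is not used elsewhere in this notebook.

```python
def strict_isinstance(value, expected_type):
    """Like isinstance, but do not accept a bool where an int is expected."""
    if expected_type is int and isinstance(value, bool):
        return False
    return isinstance(value, expected_type)


print(isinstance(True, int))          # True: bool is a subclass of int
print(strict_isinstance(True, int))   # False
print(strict_isinstance(5, int))      # True
print(strict_isinstance(True, bool))  # True
```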

##### Combine together to build recursive function

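Every nested dictionary is validated in exactly the same way as the top-level one, so this function should work when called on a dictionary at any depth. That is what makes the recursion possible.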

In [81]:
def recurse_validate(data, template, path):
    match_keys(data, template, path)  # raises SchemaKeyMismatch, which bubbles up to the caller
    match_types(data, template, path)  # raises SchemaTypeMismatch, which bubbles up to the caller

    # Now that the keys and types at this level are correct, validate the keys and types
    # of any sub-dictionaries. Get all keys whose template values are dictionaries and recurse on them.
    keys_with_dict_as_value = {key for key, value in template.items() if isinstance(value, dict)}

    for key in keys_with_dict_as_value:
        sub_data = data[key]
        sub_template = template[key]
        sub_path = path + "." + str(key)
        recurse_validate(sub_data, sub_template, sub_path)     

##### Make User-Facing Function

This step isn't 100% necessary; it just abstracts away the implementation and gives the user a cleaner interface.

At the top of this file, we defined some example data (`john`, `eric` and `michael`) and a template. We'll test our function on those dictionaries against our template:

In [82]:
def validate(data, template):
    recurse_validate(data, template, path='root')

In [83]:
validate(john, template)

In [84]:
validate(eric, template)

SchemaKeyMismatch: Missing keys: root.bio.birthplace.city, 

In [85]:
validate(michael, template)

SchemaTypeMismatch: Incorrect Type:
At root.bio.dob.month.
Expected int.
Got str

So why did we create three custom exceptions, two of which inherit from the third? Here is the code:
```python
class SchemaError(Exception):
    pass

class SchemaKeyMismatch(SchemaError):
    pass

class SchemaTypeMismatch(SchemaError, TypeError):
    pass
```

This allows for finer granularity in our `try-except` blocks. Let's explain with the final custom exception as an example.

`SchemaTypeMismatch` is a subclass of both `SchemaError` and `TypeError`, so an instance of it is also an instance of each. If a `try-except` block catches only `TypeError` and a `SchemaTypeMismatch` is raised inside it, the `except TypeError` clause will catch it.
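A quick way to see this relationship is to inspect an instance directly. The sketch below repeats two of the class definitions so that it runs on its own:

```python
class SchemaError(Exception):
    pass


class SchemaTypeMismatch(SchemaError, TypeError):
    pass


ex = SchemaTypeMismatch('bad type')

# One exception object, three matching types.
print(isinstance(ex, SchemaTypeMismatch))  # True
print(isinstance(ex, SchemaError))         # True
print(isinstance(ex, TypeError))           # True

# The method resolution order lists every class an `except` clause could match.
mro_names = [cls.__name__ for cls in SchemaTypeMismatch.__mro__]
print(mro_names)
```

An `except` clause matches when the raised exception is an instance of the named class, which is why any of these three names will catch it.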

`michael` gives us a `SchemaTypeMismatch`; if we look out for a `TypeError`, we will catch this `SchemaTypeMismatch` exception:

In [87]:
try:
    validate(michael, template)
except TypeError as ex:
    print(ex)

Incorrect Type:
At root.bio.dob.month.
Expected int.
Got str


We can also look out for the general `SchemaError` exception, or catch the specific custom exception directly:

In [88]:
try:
    validate(michael, template)
except SchemaError as ex:
    print(ex)

Incorrect Type:
At root.bio.dob.month.
Expected int.
Got str


In [89]:
try:
    validate(michael, template)
except SchemaTypeMismatch as ex:
    print(ex)

Incorrect Type:
At root.bio.dob.month.
Expected int.
Got str


Another advantage of using these exceptions is that it allows us to deal with each specific exception differently:

In [91]:
try:
    validate(eric, template)
except SchemaKeyMismatch as ex:
    print('mismatched keys, doing some specific handling for that', ex)
except SchemaTypeMismatch as ex:
    print('mismatched types, doing some specific handling for that', ex)
except SchemaError as ex:
    print('general exception, doing some specific handling for that', ex)

mismatched keys, doing some specific handling for that Missing keys: root.bio.birthplace.city, 


In [93]:
try:
    validate(michael, template)
except SchemaKeyMismatch as ex:
    print('mismatched keys, doing some specific handling for that', ex)
except SchemaTypeMismatch as ex:
    print('mismatched types, doing some specific handling for that', ex)
except SchemaError as ex:
    print('general exception, doing some specific handling for that', ex)

mismatched types, doing some specific handling for that Incorrect Type:
At root.bio.dob.month.
Expected int.
Got str


Of course, our code never raises `SchemaError` directly, and its only subclasses, `SchemaKeyMismatch` and `SchemaTypeMismatch`, are both handled by earlier clauses, so the `SchemaError` exception block will never run.

Below, we get a `SchemaTypeMismatch`, but we don't catch that directly. `SchemaTypeMismatch` inherits from `SchemaError`, but we don't catch that either...

However, it also inherits from `TypeError`, and we are catching that, so it gets caught!

In [94]:
try:
    validate(michael, template)
except SchemaKeyMismatch as ex:
    print('mismatched keys, doing some specific handling for that', ex)
except TypeError as ex:
    print('type exception, doing some specific handling for that', ex)

type exception, doing some specific handling for that Incorrect Type:
At root.bio.dob.month.
Expected int.
Got str
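Finally, if a caller still wants the `(state, error)` return values described in the Goal section, the exception-based validator can be wrapped in a thin adapter. The sketch below is self-contained, so it uses a compact stand-in validator rather than the functions above. The stand-in, its messages, the name `validate_as_tuple`, and the small `tmpl` template are all assumptions made for illustration.

```python
class SchemaError(Exception):
    pass


def validate(data, template, path='root'):
    """Compact stand-in for the validator above: raises SchemaError on the first problem."""
    if data.keys() != template.keys():
        raise SchemaError(f'mismatched keys at {path}')
    for key, expected in template.items():
        sub_path = f'{path}.{key}'
        if isinstance(expected, dict):
            # Nested template: the data must be a dict too, then recurse.
            if not isinstance(data[key], dict):
                raise SchemaError(f'bad type: {sub_path}')
            validate(data[key], expected, sub_path)
        elif not isinstance(data[key], expected):
            raise SchemaError(f'bad type: {sub_path}')


def validate_as_tuple(data, template):
    """Hypothetical adapter: translate exceptions into a (state, error) pair."""
    try:
        validate(data, template)
    except SchemaError as ex:
        return False, str(ex)
    return True, ''


tmpl = {'id': int, 'name': {'first': str}}
print(validate_as_tuple({'id': 1, 'name': {'first': 'John'}}, tmpl))    # (True, '')
print(validate_as_tuple({'id': 'x', 'name': {'first': 'John'}}, tmpl))  # (False, 'bad type: root.id')
```

Catching `SchemaError` in the adapter is enough here because every schema-related exception inherits from it.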
