### Project 1

In this project our goal is to validate one dictionary structure against a template dictionary.

A typical example is handling JSON input in an API: you want to validate the received JSON against a template to make sure it conforms. Here, conforming means that the keys and the nesting are identical and that each value has the expected data type; the values themselves do not matter.

To keep things simple we'll assume that values can be either single values (like an integer, string, etc), or a dictionary, itself only containing single values or other dictionaries, recursively. In other words, we're not going to deal with lists as possible values. Also, to keep things simple, we'll assume that all keys are **required**, and that no extra keys are permitted.

In practice we would not make these simplifying assumptions. Although we could certainly write a more general validator ourselves, many third-party libraries already do this (such as `jsonschema`, `marshmallow`, and many more, some of which I'll cover lightly in later videos).

For example you might have this template:

In [33]:
template = {
    'user_id': int,
    'name': {
        'first': str,
        'last': str
    },
    'bio': {
        'dob': {
            'year': int,
            'month': int,
            'day': int
        },
        'birthplace': {
            'country': str,
            'city': str
        }
    }
}

So, a JSON document such as this would match the template:

In [34]:
john = {
    'user_id': 100,
    'name': {
        'first': 'John',
        'last': 'Cleese'
    },
    'bio': {
        'dob': {
            'year': 1939,
            'month': 11,
            'day': 27
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Weston-super-Mare'
        }
    }
}

But this one would **not** match the template (missing key):

In [35]:
eric = {
    'user_id': 101,
    'name': {
        'first': 'Eric',
        'last': 'Idle'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 3,
            'day': 29
        },
        'birthplace': {
            'country': 'United Kingdom'
        }
    }
}

And neither would this one (wrong data type):

In [36]:
michael = {
    'user_id': 102,
    'name': {
        'first': 'Michael',
        'last': 'Palin'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 'May',
            'day': 5
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Sheffield'
        }
    }
}

Write a function such as this:

In [37]:
def match_keys(data, valid, path):
    data_keys = data.keys()
    valid_keys = valid.keys()
    # keys present in the data but not allowed by the template
    extra_keys = data_keys - valid_keys
    # keys required by the template but absent from the data
    missing_keys = valid_keys - data_keys

    if missing_keys or extra_keys:
        missing_msg = ('missing keys:' +
                       ','.join({path + '.' + str(key)
                                 for key in missing_keys})) if missing_keys else ''
        extras_msg = ('extra keys:' +
                      ','.join({path + '.' + str(key)
                                for key in extra_keys})) if extra_keys else ''
        return False, ' '.join((missing_msg, extras_msg))
    else:
        return True, None
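
The key comparison above relies on dictionary key views supporting set operations. As a minimal illustration (the two dictionaries here are made up for the example), subtracting one key view from another yields a plain set:

```python
# Hypothetical template and data, used only to show the set arithmetic
template_keys = {'a': int, 'b': int, 'e': str}.keys()
data_keys = {'a': 1, 'b': 2, 'c': 3}.keys()

missing = template_keys - data_keys   # required by the template, absent from data
extra = data_keys - template_keys     # present in data, not allowed by the template

print(missing)  # {'e'}
print(extra)    # {'c'}
```

Because the results are sets, the order in which several missing or extra keys appear in an error message is not guaranteed.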

In [38]:
t = {'a': int, 'b': int, 'c': int, 'd' : {}}
d = {'a' : 'wrong type', 'b' : 100, 'c': 200, 'd': {'wrong', 'type'}}
is_ok, err_msg = match_keys(d,t,'some.path')
print(is_ok, err_msg)

True None


In [39]:
t = {'a': int, 'b': int, 'c': int, 'd' : {}, 'e':None}
d = {'a' : 'wrong type', 'b' : 100, 'c': 200, 'd': {'wrong', 'type'}}
is_ok, err_msg = match_keys(d,t,'some.path')
print(is_ok, err_msg)

False missing keys:some.path.e 


In [40]:
t = {'a': int, 'd' : {}, 'e':None}
d = {'a' : 'wrong type', 'b' : 100, 'c': 200, 'd': {'wrong', 'type'}}
is_ok, err_msg = match_keys(d,t,'some.path')
print(is_ok, err_msg)

False missing keys:some.path.e extra keys:some.path.c,some.path.b


In [41]:
def match_types(data, template, path):
    # assume here that the keys have already been matched OK
    # but do not assume that the keys are necessarily in the same
    # order in both the data and the template
    for key, value in template.items():
        # a nested template only requires the data value to be a dict here;
        # its contents are checked separately
        if isinstance(value, dict):
            template_type = dict
        else:
            template_type = value
        data_value = data.get(key, object())
        if not isinstance(data_value, template_type):
            # str(key) so that non-string keys do not break the message
            err_msg = ('incorrect type: ' + path + '.' + str(key) +
                       ' -> expected ' + template_type.__name__ +
                       ', found ' + type(data_value).__name__)
            return False, err_msg
    return True, None

In [42]:
t = {'a': int, 'b': str, 'c': {"dt": int}}
d = {'a': 100, 'b': 'test', 'c': 'value'}
match_types(d,t, 'some.path')

(False, 'incorrect type: some.path.c -> expected dict, found str')
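
One caveat with an `isinstance`-based check: in Python, `bool` is a subclass of `int`, so a value such as `True` would pass where the template says `int`. If that matters for your data, a stricter comparison could look like the sketch below. The helper name `is_strict_instance` is hypothetical and not part of the code above.

```python
def is_strict_instance(value, expected_type):
    """Like isinstance, but reject bool where int is expected."""
    if expected_type is int and isinstance(value, bool):
        return False
    return isinstance(value, expected_type)


print(isinstance(True, int))            # True
print(is_strict_instance(True, int))    # False
print(is_strict_instance(5, int))       # True
print(is_strict_instance(True, bool))   # True
```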

In [43]:
def recursive_validate(data, template, path):
    is_ok, err_msg = match_keys(data, template, path)
    if not is_ok:
        return False, err_msg

    is_ok, err_msg = match_types(data, template, path)
    if not is_ok:
        return False, err_msg

    dictionary_type_keys = {key for key, value in template.items()
                            if isinstance(value, dict)}
    for key in dictionary_type_keys:
        sub_path = path + '.' + str(key)
        sub_temp = template[key]
        sub_data = data[key]
        is_ok, err_msg = recursive_validate(sub_data, sub_temp, sub_path)
        if not is_ok:
            return False, err_msg
    return True, None

In [44]:
is_ok, err_msg = recursive_validate(john, template, 'root')
print(is_ok, err_msg)

True None


In [46]:
is_ok, err_msg = recursive_validate(michael, template, 'root')
print(is_ok, err_msg)

False incorrect type: root.bio.dob.month -> expected int, found str


In [47]:
def validate(data, template):
    return recursive_validate(data, template, '')

In [50]:
persons = ((john, 'John'), (eric, 'Eric'), (michael, 'Michael'))
for person, name in persons:
    is_ok, err_msg = validate(person, template)
    print(f'{name}: valid={is_ok}: {err_msg}')


John: valid=True: None
Eric: valid=False: missing keys:.bio.birthplace.city 
Michael: valid=False: incorrect type: .bio.dob.month -> expected int, found str


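The function should return results along these lines (the exact message text is up to you; the implementation above returns `None` rather than `''` on success, words its messages differently, and prefixes each path with a `.`):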
* `validate(john, template) --> True, ''`
* `validate(eric, template) --> False, 'mismatched keys: bio.birthplace.city'`
* `validate(michael, template) --> False, 'bad type: bio.dob.month'`

Better yet, use exceptions instead of return codes and strings!
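
As one possible way to do that, here is a minimal sketch under the same simplifying assumptions (nested dictionaries only, all keys required, no extra keys). The exception class names and the `_join` helper are hypothetical choices made for this example. This version also builds paths without the leading dot, and checks each nested dictionary as soon as it is reached rather than after all types at that level.

```python
class SchemaError(Exception):
    """Base class for validation failures (hypothetical name)."""


class SchemaKeyMismatch(SchemaError):
    """Raised when keys are missing or unexpected."""


class SchemaTypeMismatch(SchemaError):
    """Raised when a value has the wrong type."""


def _join(path, key):
    # Build a dotted path without a leading dot at the top level
    return f'{path}.{key}' if path else str(key)


def validate(data, template, path=''):
    """Return None if data conforms to template, otherwise raise."""
    missing = template.keys() - data.keys()
    extra = data.keys() - template.keys()
    if missing or extra:
        parts = []
        if missing:
            parts.append('missing keys: ' +
                         ', '.join(sorted(_join(path, k) for k in missing)))
        if extra:
            parts.append('extra keys: ' +
                         ', '.join(sorted(_join(path, k) for k in extra)))
        raise SchemaKeyMismatch('; '.join(parts))

    for key, expected in template.items():
        expected_type = dict if isinstance(expected, dict) else expected
        value = data[key]
        if not isinstance(value, expected_type):
            raise SchemaTypeMismatch(
                f'incorrect type: {_join(path, key)} -> expected '
                f'{expected_type.__name__}, found {type(value).__name__}')
        if isinstance(expected, dict):
            # Recurse into the nested template with the extended path
            validate(value, expected, _join(path, key))


# Small made-up template and document to show the calling pattern
template = {'name': {'first': str, 'last': str}, 'age': int}
try:
    validate({'name': {'first': 'Eric'}, 'age': 80}, template)
except SchemaError as ex:
    print(type(ex).__name__, ex)  # SchemaKeyMismatch missing keys: name.last
```

With separate exception classes, callers can catch `SchemaError` to handle any validation failure, or catch a specific subclass when key and type problems need different handling.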