## TDDA: Test-Driven Data Analysis

In this notebook, we'll review a Python library, [TDDA](https://github.com/tdda/tdda), which takes data inputs (such as NumPy arrays or Pandas DataFrames) and discovers a set of constraints that describe them. You can then save those constraints as JSON and verify new data against them.

In [None]:
import pandas as pd
import numpy as np
from tdda.constraints.pdconstraints import discover_constraints, verify_df

In [None]:
df = pd.read_csv('../data/iot_example.csv')

## Basic Data Quality Check

In [None]:
df.sample(10)

In [None]:
df.dtypes

## Use `discover_constraints` to build the constraint object

In [None]:
constraints = discover_constraints(df)

In [None]:
constraints

In [None]:
constraints.fields
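To build intuition for what constraint discovery does, here is a minimal pure-Python sketch. It is not TDDA's implementation: the helper `discover_simple_constraints` and the sample records are hypothetical, and it only derives a type name, a min/max range, and an observed null count per field.

```python
def discover_simple_constraints(records):
    """Derive simple per-field constraints from a list of dicts.

    Illustrative only: TDDA's real discovery logic is richer than this.
    """
    constraints = {}
    fields = sorted({key for row in records for key in row})
    for field in fields:
        values = [row.get(field) for row in records]
        present = [v for v in values if v is not None]
        # Treat the observed number of nulls as the allowed maximum
        field_constraints = {"max_nulls": len(values) - len(present)}
        if present:
            types = {type(v).__name__ for v in present}
            # Record type and range only when every non-null value agrees
            if len(types) == 1:
                field_constraints["type"] = types.pop()
                field_constraints["min"] = min(present)
                field_constraints["max"] = max(present)
        constraints[field] = field_constraints
    return constraints


# Hypothetical sample records standing in for a few IoT readings
sample = [
    {"temperature": 21, "username": "alice"},
    {"temperature": 25, "username": "bob"},
    {"temperature": 19, "username": "carol"},
]

discovered = discover_simple_constraints(sample)
for field, rules in discovered.items():
    print(field, rules)
```

The point of the sketch is that the constraints come from the data itself, so they describe what was observed rather than what you intended the data to look like.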

## Now write the constraints to a file

In [None]:
with open('../data/ignore-iot_constraints.tdda', 'w') as f:
    f.write(constraints.to_json())

In [None]:
cat ../data/ignore-iot_constraints.tdda
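Because the saved file is JSON, it can also be inspected programmatically. The sketch below parses an illustrative JSON string rather than the real file; the field names, values, and exact constraint keys are assumptions and may differ from what your TDDA version writes.

```python
import json

# Illustrative JSON in the general shape of a .tdda file. The fields and
# values are made up, and the exact keys may differ by TDDA version.
sample_tdda = """
{
    "fields": {
        "temperature": {"type": "int", "min": 19, "max": 25, "max_nulls": 0},
        "username": {"type": "string", "min_length": 3, "max_length": 5}
    }
}
"""

parsed = json.loads(sample_tdda)

# Map each field to the sorted list of constraint kinds recorded for it
kinds_by_field = {
    field: sorted(rules) for field, rules in parsed["fields"].items()
}
for field, kinds in kinds_by_field.items():
    print(f"{field}: {', '.join(kinds)}")
```

To run this against the file written above, you could replace `json.loads(sample_tdda)` with `json.load` on an open handle to `'../data/ignore-iot_constraints.tdda'`.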

## Exercise: What types of constraints are being extracted? How does this compare with defining your own schema?

### Now, let's read in our other IoT dataset :D (can anyone guess what will happen?)

In [None]:
new_df = pd.read_csv('../data/iot_example_with_nulls.csv')

## We use `verify_df` to pass in the new dataframe, along with the filepath to our saved constraints

In [None]:
v = verify_df(new_df, '../data/ignore-iot_constraints.tdda')
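Conceptually, verification checks each saved constraint against the new data and records whether it passes or fails. The following is a minimal pure-Python sketch of that idea, not TDDA's implementation; `verify_records`, the constraint dictionary, and the sample records are all hypothetical.

```python
def verify_records(records, constraints):
    """Count passing and failing constraint checks for a list of dicts.

    Illustrative only: this is not how TDDA implements verification.
    """
    passes, failures = 0, []
    for field, rules in constraints.items():
        values = [row.get(field) for row in records]
        present = [v for v in values if v is not None]
        # One check per supported constraint kind, each taking the limit
        checks = {
            "min": lambda limit: all(v >= limit for v in present),
            "max": lambda limit: all(v <= limit for v in present),
            "max_nulls": lambda limit: len(values) - len(present) <= limit,
        }
        for kind, limit in rules.items():
            if checks[kind](limit):
                passes += 1
            else:
                failures.append((field, kind))
    return passes, failures


# Hypothetical constraints, as if discovered from clean data
constraints = {"temperature": {"min": 19, "max": 25, "max_nulls": 0}}

# Hypothetical new data containing a null and an out-of-range value
new_records = [
    {"temperature": 22},
    {"temperature": None},
    {"temperature": 30},
]

passes, failures = verify_records(new_records, constraints)
print("passes:", passes)
print("failures:", failures)
```

Here the null violates `max_nulls` and the reading of 30 violates `max`, while `min` still holds. A dataset with unexpected nulls fails verification in the same way.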

## We can now inspect the passes and failures and look at the output

In [None]:
v

In [None]:
v.passes

In [None]:
v.failures

In [None]:
print(v)

## In addition, we can take a look at the passes and failures in a dataframe

In [None]:
v.to_frame()

## Exercise: How could we fix the schema or separate data so all tests pass?