## DataFrame Validation with TDDA (Test-Driven Data Analysis)

In this notebook, we'll review [TDDA](https://github.com/tdda/tdda), a library that inspects a dataset (such as a pandas DataFrame) and **automatically** builds a set of constraints describing it. You can then save those constraints as JSON and verify new data against them.

In [None]:
import pandas as pd
import numpy as np
from tdda.constraints.pdconstraints import discover_constraints, verify_df
from tdda.constraints import detect_df 

In [None]:
df = pd.read_csv('data/iot_example.csv')

## Basic Data Quality Check

In [None]:
df.head()

In [None]:
df.dtypes

## Use `discover_constraints` to build the constraint object

In [None]:
constraints = discover_constraints(df)

In [None]:
constraints
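
To build intuition for what discovery produces, the toy sketch below derives a few of the same kinds of constraints (type, minimum, maximum, null allowance) from plain Python columns. It is not TDDA's implementation: the function name `toy_discover` and the sample columns are made up for illustration, and real TDDA output contains more constraint kinds than shown here.

```python
from typing import Any, Dict, List


def toy_discover(columns: Dict[str, List[Any]]) -> Dict[str, Dict[str, Any]]:
    """Derive simple per-column constraints from observed values.

    Illustrative only: this is NOT how TDDA is implemented.
    """
    constraints: Dict[str, Dict[str, Any]] = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        field: Dict[str, Any] = {}
        # Record type and range only when all non-null values share one type,
        # so that min() and max() are well defined.
        types = {type(v).__name__ for v in present}
        if len(types) == 1:
            field["type"] = types.pop()
            field["min"] = min(present)
            field["max"] = max(present)
        # A column with no nulls in the sample is assumed to allow none.
        if len(present) == len(values):
            field["max_nulls"] = 0
        constraints[name] = field
    return constraints


# Hypothetical sample data, not taken from iot_example.csv.
sample = {
    "temperature": [21.5, 19.0, 23.25],
    "device_id": [3, 1, None],
}
toy = toy_discover(sample)
print(toy)
```

The key design point carries over to the real library: constraints are inferred from whatever the sample happens to contain, so a narrow sample yields constraints that are tighter than the true domain.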

## Let's write the constraints to a file

In [None]:
with open('data/iot_constraints.tdda', 'w') as f:
    f.write(constraints.to_json())

In [None]:
cat data/iot_constraints.tdda

### TDDA automatically infers the constraints from our data. We can also edit them so that they match our needs exactly.
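
Because the saved constraints are JSON, one way to adjust them is to load the document, edit a field's entry, and write it back. The sketch below works on an in-memory document that mimics the `fields` layout of a constraints file. The field name `temperature`, its values, and the helper `update_field` are hypothetical, so substitute names from your own file.

```python
import json

# Hypothetical constraints document; a real one comes from constraints.to_json().
raw = json.dumps({
    "fields": {
        "temperature": {"type": "real", "min": 12.5, "max": 41.0, "max_nulls": 0}
    }
})


def update_field(doc: dict, field: str, **changes) -> dict:
    """Return a copy of doc with `changes` applied to one field's constraints."""
    updated = json.loads(json.dumps(doc))  # deep copy via a JSON round trip
    updated["fields"][field].update(changes)
    return updated


doc = json.loads(raw)
# Widen the allowed range using domain knowledge rather than the observed sample.
edited = update_field(doc, "temperature", min=-40.0, max=85.0)
print(json.dumps(edited, indent=4))
```

Working on a copy keeps the discovered constraints intact, which makes it easy to compare what was inferred against what was chosen by hand.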

#### Now, let's read in another IoT dataset (can anyone guess what will happen?)

In [None]:
new_df = pd.read_csv('data/iot_example_production.csv')

## We use `verify_df` to pass in the new DataFrame, along with the file path to our saved constraints.

In [None]:
v = verify_df(new_df, 'data/iot_constraints.tdda')

## Let's inspect the results:

In [None]:
print(f'Number of passed tests: {v.passes}')

In [None]:
print(f'Number of failed tests: {v.failures}')

In [None]:
print(str(v))

## In addition, we can view the passes and failures as a DataFrame

In [None]:
v.to_frame()

## But what if we want to see exactly where our errors are?

In [None]:
D = detect_df(new_df, 'data/iot_constraints.tdda', per_constraint=True, output_fields=[])

In [None]:
D.detected()

## What's Next?
 - Modify the schema based on domain expertise
 - Add logic for data that doesn't satisfy the constraints
 - Change code (if necessary)
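
As one possible shape for the second item, the sketch below stops a pipeline when any constraint fails. It assumes only the `passes` and `failures` counts printed earlier in this notebook; the exception class `DataValidationError` and the helper `require_no_failures` are hypothetical names, not part of TDDA.

```python
class DataValidationError(Exception):
    """Raised when new data violates the saved constraints."""


def require_no_failures(passes: int, failures: int) -> None:
    """Raise if any constraint failed; otherwise return quietly."""
    if failures > 0:
        total = passes + failures
        raise DataValidationError(
            f"{failures} of {total} constraints failed; refusing to continue."
        )


# In this notebook the call would be require_no_failures(v.passes, v.failures).
require_no_failures(passes=10, failures=0)  # all constraints satisfied: no error

try:
    require_no_failures(passes=8, failures=2)  # made-up counts for illustration
except DataValidationError as exc:
    message = str(exc)
    print(message)
```

Raising an exception is only one policy. Depending on the pipeline, logging the failures or quarantining the offending rows may be more appropriate.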
