# PyRasgo Missing Data

This notebook explains how to identify missing data with `pyrasgo`.

### Packages

This tutorial uses:
* [pandas](https://pandas.pydata.org/docs/)
* [PyRasgo](https://app.gitbook.com/@rasgo/s/rasgo-docs/pyrasgo-0.1/dataframe-prep)

In [1]:
import pandas as pd
import pyrasgo

## Connect to Rasgo

Enter your email and password to create an account. This account gives you free access to the Rasgo API, which will calculate dataframe profiles, generate feature importance scores, and produce feature explainability for your analysis. In addition, this account allows you to maintain access to your analysis and share it with your colleagues.

**Note:** This only needs to be run the first time you use `pyrasgo`.

In [2]:
#pyrasgo.register(email='<your email>', password='<your password>')

Enter the email and password you used at registration to connect to Rasgo.

In [3]:
rasgo = pyrasgo.login(email='<your email>', password='<your password>')

## Creating the data

For this example, we will create a dataframe with four columns of different types (text, numeric, boolean, and datetime), which we will then use to introduce missing values.

In [4]:
df = pd.DataFrame({'A': ['text']*20,
                   'B': [1, 2.2]*10,
                   'C': [True, False]*10,
                   'D': pd.to_datetime('2020-01-01')
                  })

Next, delete some of the entries to create missing data.

In [5]:
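# Column A (text): blank out rows 0, 1, and 10
df.iloc[0,0] = None
df.iloc[1,0] = None
df.iloc[10,0] = None
# Column B (numeric): blank out rows 5 and 7
df.iloc[5,1] = None
df.iloc[7,1] = None
# Column C (boolean): blank out rows 4, 5, 9, and 12
df.iloc[4,2] = None
df.iloc[5,2] = None
df.iloc[9,2] = None
df.iloc[12,2] = None
# Column D (datetime): blank out rows 2 and 12
df.iloc[2,3] = None
df.iloc[12,3] = None
df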

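|   | A | B | C | D |
|---|---|---|---|---|
| 0 |  | 1.0 | 1.0 | 2020-01-01 |
| 1 |  | 2.2 | 0.0 | 2020-01-01 |
| 2 | text | 1.0 | 1.0 | NaT |
| 3 | text | 2.2 | 0.0 | 2020-01-01 |
| 4 | text | 1.0 |  | 2020-01-01 |
| 5 | text |  |  | 2020-01-01 |
| 6 | text | 1.0 | 1.0 | 2020-01-01 |
| 7 | text |  | 0.0 | 2020-01-01 |
| 8 | text | 1.0 | 1.0 | 2020-01-01 |
| 9 | text | 2.2 |  | 2020-01-01 |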
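The cell-by-cell assignments above can also be written more compactly. The following is a minimal sketch, not part of the original notebook: the `missing_rows` mapping is a hypothetical name introduced for illustration, and `Series.mask` is swapped in for the individual `iloc` assignments. It assumes the default integer index, so row labels and row positions coincide.

```python
import pandas as pd

# Rebuild the example dataframe from this section.
df = pd.DataFrame({'A': ['text'] * 20,
                   'B': [1, 2.2] * 10,
                   'C': [True, False] * 10,
                   'D': pd.to_datetime('2020-01-01')})

# Hypothetical mapping (not part of the original notebook):
# column name -> row positions to blank out.
missing_rows = {'A': [0, 1, 10],
                'B': [5, 7],
                'C': [4, 5, 9, 12],
                'D': [2, 12]}

for column, rows in missing_rows.items():
    # Series.mask replaces values with a missing marker where the condition is True.
    df[column] = df[column].mask(df.index.isin(rows))

# Per-column count of missing values.
print(df.isna().sum())
```

The positions mirror the `iloc` assignments above, so the per-column counts should come out as 3, 2, 4, and 2 for columns A through D.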


## Identify missing data

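The function `evaluate.missing_data` identifies missing data in the dataframe. In the output shown here, it prints the count of nulls for each column and returns the rows that contain at least one missing value.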

In [6]:
missing = rasgo.evaluate.missing_data(df)
missing

Column               Count of Nulls
-----------------------------------
A                    3     
B                    2     
C                    4     
D                    2     
-----------------------------------


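|   | A | B | C | D |
|---|---|---|---|---|
| 0 |  | 1.0 | 1.0 | 2020-01-01 |
| 1 |  | 2.2 | 0.0 | 2020-01-01 |
| 2 | text | 1.0 | 1.0 | NaT |
| 4 | text | 1.0 |  | 2020-01-01 |
| 5 | text |  |  | 2020-01-01 |
| 7 | text |  | 0.0 | 2020-01-01 |
| 9 | text | 2.2 |  | 2020-01-01 |
| 10 |  | 1.0 | 1.0 | 2020-01-01 |
| 12 | text | 1.0 |  | NaT |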
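To cross-check a result like this without the Rasgo API, the same two summaries can be computed with plain pandas. This is a sketch under stated assumptions, not how `pyrasgo` works internally: it uses a small stand-in dataframe named `sample` (hypothetical, not the tutorial's data) and only the standard `isna`, `sum`, and `any` methods.

```python
import numpy as np
import pandas as pd

# Small stand-in dataframe (hypothetical, not the tutorial's data) with a few gaps.
sample = pd.DataFrame({
    'A': ['text', None, 'text', 'text'],
    'B': [1.0, 2.2, np.nan, 1.0],
    'D': pd.to_datetime(['2020-01-01', None, '2020-01-01', '2020-01-01']),
})

# Count of nulls per column.
null_counts = sample.isna().sum()

# Rows that contain at least one missing value.
rows_with_nulls = sample[sample.isna().any(axis=1)]

print(null_counts)
print(rows_with_nulls)
```

Replacing `sample` with the tutorial's `df` should give a per-column count and a filtered set of rows comparable to the output above.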
