# Data Screening Steps

### What is Data Screening?

* Enables you to make sure you've met all your assumptions and dealt with any outliers and data-entry errors
* Each type of analysis will have different types of data screening

### 🚨 The Big Important Rule 🚨

> For __hypothesis testing__, we traditionally use a `p < 0.05` (less than) criterion because we're looking for a statistically significant relationship

> But for __data screening__ we use a much more stringent criterion of `p < 0.001`: anything with `p > 0.001` (greater than) is kept, because we want to make sure the data is astronomically wild before we remove it
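
To get a feel for how much stricter the screening criterion is, here is a minimal sketch that turns both criteria into chi-square cutoffs (the kind of cutoff often compared against Mahalanobis distances). The choice of three variables and the helper name `chi2_cutoff` are assumptions made for illustration only.

```python
from scipy.stats import chi2

HYPOTHESIS_ALPHA = 0.05   # conventional hypothesis-testing criterion
SCREENING_ALPHA = 0.001   # stricter data-screening criterion


def chi2_cutoff(alpha: float, n_variables: int) -> float:
    """Chi-square value that only a fraction `alpha` of scores exceed."""
    return float(chi2.ppf(1 - alpha, df=n_variables))


n_variables = 3  # hypothetical number of variables being screened
lenient_cutoff = chi2_cutoff(HYPOTHESIS_ALPHA, n_variables)
strict_cutoff = chi2_cutoff(SCREENING_ALPHA, n_variables)

# A score has to be far more extreme to pass the screening cutoff.
print(f"alpha=0.05 cutoff: {lenient_cutoff:.2f}, alpha=0.001 cutoff: {strict_cutoff:.2f}")
```

With three variables, the screening cutoff is roughly twice the size of the 0.05 cutoff, which is why only truly extreme cases get flagged.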

### Cleaning Steps for each Hypothesis

1. Strip the data set down to the relevant columns only
2. Check and fix __Accuracy__
3. Check and fix __Missing__ data
4. Check and fix __Outliers__
5. Confirm any of the relevant __Assumptions__ with statistical tests
    * Additivity
    * Normality
    * Linearity
    * Homogeneity
    * Homoscedasticity

#### 1. Select Relevant Variables Only

* a
* b
* c
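
A minimal sketch of this step, assuming the hypothesis only needs a handful of the survey columns. The toy frame and the particular columns chosen are hypothetical; swap in whatever your own hypothesis requires.

```python
import pandas as pd

# Toy stand-in for the survey data (illustrative values only).
df_demo = pd.DataFrame({
    "VICCRED": [9.0, 10.0, 7.0],
    "VICBELIEVE": [10.0, 10.0, 8.0],
    "AGE": [48.0, 64.0, 24.0],
    "Unnamed: 69": ["x", "y", "z"],  # bookkeeping column we don't need
})

# Hypothetical choice of columns for one hypothesis.
relevant_columns = ["VICCRED", "VICBELIEVE", "AGE"]

# .copy() so later cleaning does not touch the original frame.
screened = df_demo.loc[:, relevant_columns].copy()
print(screened.shape)
```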

#### 2. Check for Accuracy

* a
* b
* c
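
One possible accuracy check is to look for values outside the range a rating item can legally take. The sketch below assumes a 1 to 10 rating scale and uses made-up values; both are assumptions, so check the actual codebook before reusing the bounds.

```python
import pandas as pd

# Made-up ratings with two impossible entries (99 and 0).
ratings = pd.DataFrame({
    "VICCRED": [9, 10, 7, 99],
    "VICBLAME": [1, 2, 1, 0],
})

SCALE_MIN, SCALE_MAX = 1, 10  # assumed scale bounds

# True wherever a value falls outside the allowed range.
out_of_range = (ratings < SCALE_MIN) | (ratings > SCALE_MAX)
n_bad_per_column = out_of_range.sum()

# Replace impossible values with NaN so they are handled as missing data.
cleaned = ratings.mask(out_of_range)
print(n_bad_per_column)
```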

#### 3. Identify & Fix Missing Data

* a
* b
* c
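
A rough sketch of how missing data could be summarized per column and per participant. The toy values and the 50% row threshold are hypothetical; the right threshold depends on your analysis.

```python
import numpy as np
import pandas as pd

# Toy responses with some missing answers (illustrative values only).
responses = pd.DataFrame({
    "VICCRED": [9.0, np.nan, 7.0, 5.0],
    "VICBELIEVE": [10.0, np.nan, 8.0, 4.0],
    "VICHONEST": [10.0, np.nan, np.nan, 4.0],
    "VICBLAME": [1.0, 2.0, 1.0, 1.0],
})

# Percentage of missing values in each column and in each row.
pct_missing_by_column = responses.isna().mean() * 100
pct_missing_by_row = responses.isna().mean(axis=1) * 100

MAX_ROW_MISSING = 50  # hypothetical threshold, in percent

# Keep participants who answered enough of the items.
kept = responses[pct_missing_by_row <= MAX_ROW_MISSING]
print(pct_missing_by_column)
```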

#### 4. Identify & Fix Outliers

* a
* b
* c
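
One common way to flag multivariate outliers is the Mahalanobis distance compared against a chi-square cutoff at the 0.001 screening criterion. This is a sketch of that approach on simulated data with one planted extreme case, not necessarily the method used for this data set.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

# Simulated ratings (not real data) with one planted extreme case in row 0.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(5, 1, size=(50, 3)),
    columns=["VICCRED", "VICBELIEVE", "VICHONEST"],
)
data.loc[0] = [40.0, -30.0, 40.0]

values = data.to_numpy()
centered = values - values.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(values, rowvar=False))

# Squared Mahalanobis distance of every row from the column means.
d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Chi-square cutoff at the 0.001 criterion, df = number of variables.
cutoff = chi2.ppf(1 - 0.001, df=data.shape[1])
is_outlier = d2 > cutoff
print(data[is_outlier])
```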


#### 5. Verify Assumptions Hold True

* a
* b
* c
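
As one example of confirming an assumption with a statistical test, the sketch below runs Levene's test for homogeneity of variance and judges it against the 0.001 screening criterion. The two groups are made-up numbers with identical spread, chosen only to keep the example deterministic.

```python
import numpy as np
from scipy.stats import levene

# Two made-up groups with identical spread (group_b is group_a shifted by 1).
group_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
group_b = group_a + 1.0

# Levene's test: a small p-value suggests the group variances differ.
statistic, p_value = levene(group_a, group_b)

SCREENING_ALPHA = 0.001
# Treat the assumption as met unless the test is significant at 0.001.
assumption_met = p_value > SCREENING_ALPHA
print(f"W = {statistic:.3f}, p = {p_value:.3f}, met = {assumption_met}")
```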

```{admonition} An extra exercise
:class: extra-credit
An "extra credit" exercise is presented here.
```

What I'm looking for is some evidence of CSS

In [3]:
import numpy as np
import pandas as pd
from matplotlib import pyplot
import scipy
import plotly
import seaborn as sns

In [8]:
df = pd.read_csv("mock-jury-stalking-data.csv")

# Set options to display all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

df.head()



Unnamed: 0,ATTN-CIV1-1,ATTN-CIV1-2,ATTN-CIV1-3,ATTN-CIV1-4,ATTN-CIV1-5,ATTN-CIV1-6,ATTN-CIV1-7,DEC-RATE-CIV1,DEC-CIV1-1,WHY-DEC-CIV1,DEC-CIV1-2,COMPENSATORY-CIV1\n,ATTN-CIV3-1,ATTN-CIV3-2,ATTN-CIV3-3,ATTN-CIV3-3.1,ATTN-CIV3-2.1,ATTN-CIV3-4,ATTN-CIV3-4.1,DEC-RATE-CIV3,DEC-CIV3-1,WHY-DEC-CIV3,DEC-CIV3-2,COMPENSATORY-CIV3\n,ATTN-CRIM-1,ATTN-CRIM-2,ATTN-CRIM-3,ATTN-CRIM-4,ATTN-CRIM-5,ATTN-CRIM-6,ATTN-CRIM-7,DEC-RATE-CRIM,DEC-CRIM-1,WHY-DEC-CRIM,DEC-CRIM-2,COMPENSATORY-CRIM\n,VICCRED,VICBELIEVE,VICHONEST,VICBLAME,VICRESP,VICDISTRESS,VICFEAR,VICANNOY,VICFLATTER,VICSYMP,VICANGER,VICGREED,VICLIKE,VICSELFISH,PERPCRED,PERPBELIEVE,PERPHONEST,PERPBLAME,PERPRESP,PERPDANGER,PERPDISTRESS,PERPFEAR,PERPSYMP,PERPANGER,GENDER,AGE,CITIZEN,RACE,JURYSERVE,TIMESSERVE,JURYCRIME,JURYOUTCOME,JURYUNANIMOUS,Unnamed: 69,Unnamed: 70
0,1.0,2.0,5.0,1.0,1.0,2.0,1.0,7.0,1.0,"Defendant admits to being highly emotional, ye...",1.0,5000.0,,,,,,,,,,,,,,,,,,,,,,,,,9.0,10.0,10.0,1.0,1.0,8.0,6.0,10.0,2.0,10.0,1.0,1.0,6.0,1.0,3.0,3.0,2.0,10.0,10.0,5.0,5.0,4.0,2.0,6.0,1.0,48.0,1.0,3,2.0,,,,,A2VE5IV9OD2SK1,civ1
1,1.0,2.0,5.0,1.0,1.0,3.0,1.0,15.0,1.0,I felt that a reasonable person would be very ...,1.0,10000.0,,,,,,,,,,,,,,,,,,,,,,,,,10.0,10.0,10.0,2.0,2.0,8.0,9.0,10.0,1.0,9.0,1.0,1.0,9.0,1.0,2.0,2.0,2.0,10.0,10.0,8.0,8.0,7.0,1.0,9.0,2.0,64.0,1.0,3,2.0,,,,,A25FJAJGTWFMP,civ1
2,1.0,2.0,5.0,1.0,1.0,2.0,1.0,8.0,1.0,Her stories are very elaborate in how the emai...,1.0,5000.0,,,,,,,,,,,,,,,,,,,,,,,,,7.0,8.0,9.0,1.0,7.0,9.0,8.0,10.0,1.0,1.0,9.0,1.0,9.0,1.0,1.0,4.0,5.0,10.0,10.0,9.0,9.0,9.0,1.0,9.0,2.0,24.0,1.0,1,2.0,,,,,A39KJNWAFOD7N1,civ1
3,1.0,2.0,5.0,1.0,1.0,3.0,1.0,6.0,2.0,"If he had been asked to stop in writing, like ...",2.0,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,4.0,4.0,1.0,1.0,7.0,8.0,6.0,3.0,6.0,1.0,1.0,5.0,5.0,6.0,7.0,7.0,8.0,7.0,2.0,1.0,1.0,6.0,2.0,1.0,33.0,1.0,3,2.0,,,,,A1U46YK7C5HEY1,civ1
4,1.0,2.0,5.0,1.0,1.0,3.0,1.0,1.0,2.0,I believe there is a lot of circumstantial evi...,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,5.0,6.0,1.0,1.0,6.0,1.0,4.0,1.0,10.0,1.0,1.0,6.0,1.0,10.0,10.0,10.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,4.0,40.0,1.0,8,2.0,,,,,A3NMU6AVMQ0QDB,civ1
