# Data Screening Steps

### What is Data Screening?

* Makes sure you've met all of your assumptions and dealt with any outliers and errors
* Each type of analysis calls for its own kind of data screening

### 🚨 The Big Important Rule 🚨

> For __hypothesis testing__, we traditionally use a `p < 0.05` (less than) criterion because we're looking for a statistically significant relationship

> But for __data screening__ we use a much more stringent criterion of `p < 0.001` (still less than, just far smaller) because we want to make sure a data point is astronomically wild before we remove it
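
To see how much stricter the screening criterion is, the sketch below converts each cutoff into the two-tailed z-score a single value would have to exceed under a normal distribution. It assumes `scipy` is available (it is imported later in this notebook); the constant names are illustrative, not part of any library.

```python
from scipy import stats

# Illustrative names for the two criteria described above
ALPHA_HYPOTHESIS = 0.05
ALPHA_SCREENING = 0.001

# Two-tailed critical z: half of alpha sits in each tail
z_hypothesis = stats.norm.ppf(1 - ALPHA_HYPOTHESIS / 2)
z_screening = stats.norm.ppf(1 - ALPHA_SCREENING / 2)

print(f"p < {ALPHA_HYPOTHESIS}:  |z| > {z_hypothesis:.2f}")  # |z| > 1.96
print(f"p < {ALPHA_SCREENING}: |z| > {z_screening:.2f}")     # |z| > 3.29
```

Under this criterion a value has to sit more than about 3.29 standard deviations from the mean before it counts as a problem, compared with 1.96 for an ordinary significance test.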

### Cleaning Steps for each Hypothesis

1. Strip the data set down to the relevant columns only
2. Check and fix __Accuracy__
3. Check and fix __Missing__ data
4. Check and fix __Outliers__
5. Confirm the relevant __Assumptions__ with statistical tests
    * Additivity
    * Normality
    * Linearity
    * Homogeneity
    * Homoscedasticity

#### 1. Select Relevant Variables Only

* a
* b
* c
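
A minimal pandas sketch of this step, assuming a hypothesis about victim and perpetrator credibility (`VICCRED`, `PERPCRED`) split by `GENDER`. Those column names come from the jury data loaded further down, but the hypothesis and the values in the toy frame are made up for illustration.

```python
import pandas as pd

# Made-up toy frame; column names borrowed from the jury data set
raw = pd.DataFrame({
    "VICCRED": [9.0, 10.0, 7.0],
    "PERPCRED": [3.0, 2.0, 1.0],
    "GENDER": [1.0, 2.0, 2.0],
    "WHY-DEC-CIV1": ["free text", "free text", "free text"],
})

# Hypothetical list of the variables this hypothesis actually uses
relevant = ["VICCRED", "PERPCRED", "GENDER"]

# .copy() makes an independent frame, so later fixes never write back into `raw`
screened = raw[relevant].copy()
print(screened.columns.tolist())  # ['VICCRED', 'PERPCRED', 'GENDER']
```

Working on a trimmed copy keeps every later screening step focused on the variables that matter for one hypothesis.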

#### 2. Check for Accuracy

* a
* b
* c
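
One way to check accuracy is to compare every value against the range its scale allows and blank out anything impossible. This sketch assumes 1 to 10 rating scales (the victim and perpetrator ratings in the data preview appear to fall in that range, but the codebook is the real authority); `fix_out_of_range` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def fix_out_of_range(df, columns, low, high):
    """Return a copy of df where values in `columns` outside [low, high] become NaN."""
    fixed = df.copy()
    in_range = (fixed[columns] >= low) & (fixed[columns] <= high)
    # .where keeps values where the mask is True and puts NaN everywhere else
    fixed[columns] = fixed[columns].where(in_range)
    return fixed

# Made-up ratings: 11 and 0 are impossible on an assumed 1-10 scale
ratings = pd.DataFrame({"VICCRED": [9, 10, 11, 7], "PERPCRED": [3, 0, 2, 1]})

print(ratings.describe().loc[["min", "max"]])  # spot the bad values first
cleaned = fix_out_of_range(ratings, ["VICCRED", "PERPCRED"], low=1, high=10)
print(cleaned)
```

Turning impossible values into `NaN` hands them off to the missing-data step instead of silently deleting whole rows.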

#### 3. Identify & Fix Missing Data

* a
* b
* c
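
A sketch of measuring missingness per participant (row) and per variable (column). The 5% cutoff is an assumption, a commonly used rule of thumb rather than something this notebook specifies, and `missing_summary` is a hypothetical helper; here rows above the cutoff are simply dropped.

```python
import numpy as np
import pandas as pd

def missing_summary(df, max_pct=5.0):
    """Percent missing per row and per column, plus the rows at or under max_pct."""
    pct_by_row = df.isna().mean(axis=1) * 100
    pct_by_col = df.isna().mean(axis=0) * 100
    kept = df[pct_by_row <= max_pct]
    return pct_by_row, pct_by_col, kept

# Made-up data with 0, 1, or 2 missing answers per row
toy = pd.DataFrame({
    "a": [1, np.nan, np.nan, 4],
    "b": [1, 2, np.nan, 4],
    "c": [1, 2, 3, 4],
    "d": [1, 2, 3, np.nan],
})

by_row, by_col, kept = missing_summary(toy)
print(by_row.tolist())  # [0.0, 25.0, 50.0, 25.0]
print(by_col.tolist())  # [50.0, 25.0, 0.0, 25.0]
print(len(kept))        # 1
```

Looking at the column percentages as well helps tell a careless participant apart from a question almost nobody answered.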

#### 4. Identify & Fix Outliers

* a
* b
* c
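
For several continuous variables at once, a common approach is Mahalanobis distance: each row's squared distance from the multivariate mean is compared against a chi-square cutoff whose degrees of freedom equal the number of variables, using the p < .001 screening criterion. This is a minimal NumPy/SciPy sketch; `mahalanobis_outliers` is a hypothetical helper, it drops incomplete rows first, and it assumes the covariance matrix is invertible.

```python
import numpy as np
import pandas as pd
from scipy import stats

def mahalanobis_outliers(frame, alpha=0.001):
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square cutoff."""
    complete = frame.dropna()
    X = complete.to_numpy(dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Squared distance for every row: diff_i @ inv_cov @ diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return pd.Series(d2 > cutoff, index=complete.index), cutoff

# Made-up data: 200 ordinary rows plus one planted extreme row
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
data.loc[200] = [50.0, 50.0, 50.0]

flags, cutoff = mahalanobis_outliers(data)
print(round(cutoff, 2))             # 16.27
print(flags[flags].index.tolist())  # rows flagged as outliers
```

Because the cutoff comes from p < .001 rather than p < .05, only rows that are extreme across the variables jointly get flagged.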


#### 5. Verify Assumptions Hold True

* a
* b
* c
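
Two of these assumptions can be checked with standard SciPy tests, again judged against the p < .001 screening criterion: normality with the Shapiro-Wilk test on residuals (each score minus its group mean), and homogeneity of variance with Levene's test across groups. This is a sketch with a hypothetical helper and simulated data; additivity, linearity, and homoscedasticity need other checks (for example, correlation tables and residual plots) that are not shown here.

```python
import numpy as np
from scipy import stats

def screen_assumptions(groups, alpha=0.001):
    """Shapiro-Wilk on pooled residuals and Levene across groups, judged at alpha."""
    # Residuals: each score minus its own group's mean, pooled into one array
    residuals = np.concatenate([np.asarray(g) - np.mean(g) for g in groups])
    _, p_normal = stats.shapiro(residuals)
    _, p_levene = stats.levene(*groups)
    return {
        "normality_p": float(p_normal),
        "normality_violated": bool(p_normal < alpha),
        "homogeneity_p": float(p_levene),
        "homogeneity_violated": bool(p_levene < alpha),
    }

# Simulated scores for two conditions
rng = np.random.default_rng(42)
similar = [rng.normal(5, 1, 100), rng.normal(6, 1, 100)]
unequal = [rng.normal(5, 1, 100), rng.normal(5, 10, 100)]

print(screen_assumptions(similar))
print(screen_assumptions(unequal)["homogeneity_violated"])  # True
```

Testing residuals rather than raw scores keeps a genuine difference between group means from masquerading as non-normality.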

```{admonition} An extra exercise
:class: extra-credit
An "extra credit" exercise is presented here.
```

<style>p { color: red; }</style>
<p>What I'm looking for is some evidence of CSS</p>

```python
import ipywidgets as widgets
from IPython.display import HTML, display

# Create custom styled widgets
button = widgets.Button(
    description="Click me!",
    button_style='success',
    layout=widgets.Layout(width='200px', height='50px')
)

output = widgets.Output()

def on_button_click(b):
    with output:
        print("Button clicked!")

button.on_click(on_button_click)

display(widgets.VBox([button, output]))
```


```python
external_resources = """
<!-- Bootstrap CSS -->
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">

<!-- jQuery -->
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>

<!-- Your custom code -->
<div class="container">
    <div class="row">
        <div class="col-md-6">
            <button class="btn btn-primary" id="myBtn">Bootstrap Button</button>
        </div>
    </div>
</div>

<script>
$(document).ready(function(){
    $("#myBtn").click(function(){
        alert("jQuery is working!");
    });
});
</script>
"""

display(HTML(external_resources))
```

```python
def load_custom_css():
    return """
    <style>
    /* Define your CSS variables */
    :root {
        --primary-color: #3498db;
        --secondary-color: #2c3e50;
        --accent-color: #e74c3c;
    }
    
    .notebook-section {
        margin: 20px 0;
        padding: 20px;
        border-left: 4px solid var(--primary-color);
        background-color: #f8f9fa;
    }
    
    .notebook-card {
        box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        border-radius: 8px;
        padding: 16px;
        margin: 16px 0;
    }
    </style>
    """

display(HTML(load_custom_css()))
```

<div class="notebook-section">Something here</div>
<div class="notebook-card">and this is also here bro</div>

```python
from IPython.display import HTML, display

# Load Tailwind CSS from CDN
tailwind_setup = """
<script src="https://cdn.tailwindcss.com"></script>
<script>
    // Optional: Configure Tailwind
    tailwind.config = {
        theme: {
            extend: {
                colors: {
                    primary: '#3b82f6',
                    secondary: '#8b5cf6',
                    accent: '#ec4899'
                }
            }
        }
    }
</script>
"""

display(HTML(tailwind_setup))
```

```python
tailwind_content = """
<div class="min-h-screen bg-gradient-to-br from-blue-50 to-indigo-100 p-8">
    <div class="max-w-6xl mx-auto">
        <h1 class="text-4xl font-bold text-center text-gray-800 mb-12">
            Tailwind CSS in Jupyter Notebook
        </h1>
        
        <!-- Flex Container 1: Horizontal Layout -->
        <div class="flex flex-wrap gap-6 mb-12 justify-center">
            <div class="flex-1 min-w-[250px] bg-white rounded-xl shadow-lg p-6 hover:shadow-xl transition-shadow duration-300">
                <div class="bg-blue-100 w-16 h-16 rounded-full flex items-center justify-center mb-4">
                    <span class="text-2xl">📊</span>
                </div>
                <h3 class="text-xl font-semibold text-gray-800 mb-2">Data Analysis</h3>
                <p class="text-gray-600">Powerful tools for exploring and understanding your data</p>
            </div>
            
            <div class="flex-1 min-w-[250px] bg-white rounded-xl shadow-lg p-6 hover:shadow-xl transition-shadow duration-300">
                <div class="bg-purple-100 w-16 h-16 rounded-full flex items-center justify-center mb-4">
                    <span class="text-2xl">🤖</span>
                </div>
                <h3 class="text-xl font-semibold text-gray-800 mb-2">Machine Learning</h3>
                <p class="text-gray-600">Build and deploy ML models with ease</p>
            </div>
            
            <div class="flex-1 min-w-[250px] bg-white rounded-xl shadow-lg p-6 hover:shadow-xl transition-shadow duration-300">
                <div class="bg-pink-100 w-16 h-16 rounded-full flex items-center justify-center mb-4">
                    <span class="text-2xl">🎨</span>
                </div>
                <h3 class="text-xl font-semibold text-gray-800 mb-2">Visualization</h3>
                <p class="text-gray-600">Create stunning visual representations of your data</p>
            </div>
        </div>
        
        <!-- Flex Container 2: Vertical Centering -->
        <div class="flex items-center justify-center bg-gradient-to-r from-indigo-500 to-purple-600 rounded-2xl p-12 mb-12">
            <div class="text-center text-white max-w-2xl">
                <h2 class="text-3xl font-bold mb-4">Centered Content</h2>
                <p class="text-xl opacity-90 mb-8">This content is perfectly centered using flexbox</p>
                <button class="bg-white text-indigo-600 font-semibold py-3 px-8 rounded-full hover:bg-gray-100 transition-colors duration-300">
                    Get Started
                </button>
            </div>
        </div>
        
        <!-- Flex Container 3: Responsive Grid-like Layout -->
        <div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-6">
            <div class="bg-gradient-to-br from-cyan-500 to-blue-500 rounded-xl p-6 text-white">
                <div class="text-3xl font-bold mb-2">24k+</div>
                <div class="opacity-90">Active Users</div>
            </div>
            <div class="bg-gradient-to-br from-emerald-500 to-teal-500 rounded-xl p-6 text-white">
                <div class="text-3xl font-bold mb-2">98%</div>
                <div class="opacity-90">Satisfaction</div>
            </div>
            <div class="bg-gradient-to-br from-amber-500 to-orange-500 rounded-xl p-6 text-white">
                <div class="text-3xl font-bold mb-2">150+</div>
                <div class="opacity-90">Countries</div>
            </div>
            <div class="bg-gradient-to-br from-rose-500 to-pink-500 rounded-xl p-6 text-white">
                <div class="text-3xl font-bold mb-2">24/7</div>
                <div class="opacity-90">Support</div>
            </div>
        </div>
    </div>
</div>
"""

display(HTML(tailwind_content))
```

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot
import scipy
import plotly
import seaborn as sns
```

```python
df = pd.read_csv("mock-jury-stalking-data.csv")

# Set options to display all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

df.head()
```



```text
Unnamed: 0,ATTN-CIV1-1,ATTN-CIV1-2,ATTN-CIV1-3,ATTN-CIV1-4,ATTN-CIV1-5,ATTN-CIV1-6,ATTN-CIV1-7,DEC-RATE-CIV1,DEC-CIV1-1,WHY-DEC-CIV1,DEC-CIV1-2,COMPENSATORY-CIV1\n,ATTN-CIV3-1,ATTN-CIV3-2,ATTN-CIV3-3,ATTN-CIV3-3.1,ATTN-CIV3-2.1,ATTN-CIV3-4,ATTN-CIV3-4.1,DEC-RATE-CIV3,DEC-CIV3-1,WHY-DEC-CIV3,DEC-CIV3-2,COMPENSATORY-CIV3\n,ATTN-CRIM-1,ATTN-CRIM-2,ATTN-CRIM-3,ATTN-CRIM-4,ATTN-CRIM-5,ATTN-CRIM-6,ATTN-CRIM-7,DEC-RATE-CRIM,DEC-CRIM-1,WHY-DEC-CRIM,DEC-CRIM-2,COMPENSATORY-CRIM\n,VICCRED,VICBELIEVE,VICHONEST,VICBLAME,VICRESP,VICDISTRESS,VICFEAR,VICANNOY,VICFLATTER,VICSYMP,VICANGER,VICGREED,VICLIKE,VICSELFISH,PERPCRED,PERPBELIEVE,PERPHONEST,PERPBLAME,PERPRESP,PERPDANGER,PERPDISTRESS,PERPFEAR,PERPSYMP,PERPANGER,GENDER,AGE,CITIZEN,RACE,JURYSERVE,TIMESSERVE,JURYCRIME,JURYOUTCOME,JURYUNANIMOUS,Unnamed: 69,Unnamed: 70
0,1.0,2.0,5.0,1.0,1.0,2.0,1.0,7.0,1.0,"Defendant admits to being highly emotional, ye...",1.0,5000.0,,,,,,,,,,,,,,,,,,,,,,,,,9.0,10.0,10.0,1.0,1.0,8.0,6.0,10.0,2.0,10.0,1.0,1.0,6.0,1.0,3.0,3.0,2.0,10.0,10.0,5.0,5.0,4.0,2.0,6.0,1.0,48.0,1.0,3,2.0,,,,,A2VE5IV9OD2SK1,civ1
1,1.0,2.0,5.0,1.0,1.0,3.0,1.0,15.0,1.0,I felt that a reasonable person would be very ...,1.0,10000.0,,,,,,,,,,,,,,,,,,,,,,,,,10.0,10.0,10.0,2.0,2.0,8.0,9.0,10.0,1.0,9.0,1.0,1.0,9.0,1.0,2.0,2.0,2.0,10.0,10.0,8.0,8.0,7.0,1.0,9.0,2.0,64.0,1.0,3,2.0,,,,,A25FJAJGTWFMP,civ1
2,1.0,2.0,5.0,1.0,1.0,2.0,1.0,8.0,1.0,Her stories are very elaborate in how the emai...,1.0,5000.0,,,,,,,,,,,,,,,,,,,,,,,,,7.0,8.0,9.0,1.0,7.0,9.0,8.0,10.0,1.0,1.0,9.0,1.0,9.0,1.0,1.0,4.0,5.0,10.0,10.0,9.0,9.0,9.0,1.0,9.0,2.0,24.0,1.0,1,2.0,,,,,A39KJNWAFOD7N1,civ1
3,1.0,2.0,5.0,1.0,1.0,3.0,1.0,6.0,2.0,"If he had been asked to stop in writing, like ...",2.0,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,4.0,4.0,1.0,1.0,7.0,8.0,6.0,3.0,6.0,1.0,1.0,5.0,5.0,6.0,7.0,7.0,8.0,7.0,2.0,1.0,1.0,6.0,2.0,1.0,33.0,1.0,3,2.0,,,,,A1U46YK7C5HEY1,civ1
4,1.0,2.0,5.0,1.0,1.0,3.0,1.0,1.0,2.0,I believe there is a lot of circumstantial evi...,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,5.0,6.0,1.0,1.0,6.0,1.0,4.0,1.0,10.0,1.0,1.0,6.0,1.0,10.0,10.0,10.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,4.0,40.0,1.0,8,2.0,,,,,A3NMU6AVMQ0QDB,civ1
```
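
The preview suggests some tidying before step 1: several headers, such as `COMPENSATORY-CIV1\n`, appear to end in a newline, and trailing `Unnamed:` columns come from blank header cells. The sketch below strips whitespace from the names and drops columns that are empty in every row; `tidy_columns` is a hypothetical helper, and it would not catch a literal backslash-n typed into the header text. The last two values in each previewed row look like an ID and a condition label, so `Unnamed: 69` and `Unnamed: 70` are probably not empty and would be kept. If `Unnamed: 0` is just a saved row index, `pd.read_csv(..., index_col=0)` would absorb it instead.

```python
import numpy as np
import pandas as pd

def tidy_columns(frame):
    """Strip surrounding whitespace (including newlines) from column names
    and drop columns that are empty in every row."""
    out = frame.copy()
    out.columns = out.columns.str.strip()
    return out.dropna(axis=1, how="all")

# Made-up frame mimicking the issues in the preview
toy = pd.DataFrame({
    "COMPENSATORY-CIV1\n": [5000.0, np.nan],
    "EMPTY_COLUMN": [np.nan, np.nan],  # hypothetical all-blank column
    "GENDER": [1.0, 2.0],
})

print(tidy_columns(toy).columns.tolist())  # ['COMPENSATORY-CIV1', 'GENDER']
```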
