# CSCA 5622 Intro to Machine Learning: Final Project

## Likelihood of Student Criminal Activity

Using public datasets from the U.S. Department of Education, this project looks for state-level correlations among preschool enrollment, teacher certification, and criminal offenses occurring at school.


#### DATA INFO

**Teacher Credentials:**
- The `teacher_creds` data frame records number and percentage of public school classroom teachers (in full-time equivalents), by certification status and years of experience, by state: School Year 2013-14.

- Table reads (for US Totals): Of all 3,138,535 classroom teachers (FTE), 3,084,697 (98.3%) met all state licensing/certification requirements. Data reported in this table represent 100.0% of responding schools.

- *Source: U.S. Department of Education, Office for Civil Rights, Civil Rights Data Collection, 2013-14, available at http://ocrdata.ed.gov. Data notes are available at http://ocrdata.ed.gov/downloads/DataNotes.docx*
 
**Preschool Enrollment:**
- The `preschool` data frame records number and percentage of public school students enrolled in Preschool, by race/ethnicity, disability status, and English proficiency, by state: School Year 2015-16.

- Table reads (for US Totals): Of all 1,536,982 public school students enrolled in Preschool, 17,964 (1.2%) were American Indian or Alaska Native, and 313,601 (20.4%) were students with disabilities served under the Individuals with Disabilities Education Act (IDEA). Data reported in this table represent 100.0% of responding schools.

- *Source: U.S. Department of Education, Office for Civil Rights, Civil Rights Data Collection, 2015-16, available at http://ocrdata.ed.gov. Data notes are available at https://ocrdata.ed.gov/Downloads/Data-Notes-2015-16-CRDC.pdf*

**School Incidents:**
- The `incidents` data frame records number of incidents, by state: School Year 2015-16.

- Table reads (for US): The number of incidents of sexual assault was 9,255. Data reported in this table represent 98.0% of responding schools.

- *Source: U.S. Department of Education, Office for Civil Rights, Civil Rights Data Collection, 2015-16, available at http://ocrdata.ed.gov. Data notes are available at https://ocrdata.ed.gov/Downloads/Data-Notes-2015-16-CRDC.pdf.*

##### Disclaimers 
- Because of limited data availability, the teacher credentials data is from school year 2013-2014, while the other two data sets are from 2015-2016.
- The school incidents data carries a footer disclaimer: "Interpret data in this row with caution. Data are missing for more than 15 percent of schools."
- Each data set covers only a single school year; results would likely be more reliable with several years of data.

**The results of this experiment will inherently be inaccurate and should not be interpreted as factual.**
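
As a rough illustration of the comparison this project aims for, the sketch below joins three small state-level tables on a shared `State` column and computes pairwise Pearson correlations. Everything in it is an assumption for illustration only: the frame names (`teacher_demo`, `preschool_demo`, `incidents_demo`), the column names, and the values are hypothetical and are not taken from the CRDC tables.

```python
import pandas as pd

# Hypothetical toy values for four made-up states; NOT taken from the CRDC tables.
teacher_demo = pd.DataFrame(
    {"State": ["A", "B", "C", "D"], "certified_pct": [98.0, 95.5, 99.1, 97.2]}
)
preschool_demo = pd.DataFrame(
    {"State": ["A", "B", "C", "D"], "preschool_enrolled": [1200, 800, 1500, 950]}
)
incidents_demo = pd.DataFrame(
    {"State": ["A", "B", "C", "D"], "incident_count": [30, 55, 20, 41]}
)

# Inner joins keep only the states that appear in all three tables.
merged = teacher_demo.merge(preschool_demo, on="State").merge(incidents_demo, on="State")

# Pairwise Pearson correlation between the numeric columns.
corr_matrix = merged.drop(columns="State").corr(method="pearson")
print(corr_matrix.round(2))
```

With the real data frames, the relevant columns would first need to be selected and given consistent names, and the `State` labels would need to match across tables for the join to keep every state.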

In [77]:
%matplotlib inline
import numpy as np
import scipy as sp
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

import statsmodels.formula.api as smf
import statsmodels.api as sm
from pathlib import Path
from ISLP import load_data

# Apply seaborn's default theme (darkgrid style with a light background)
sns.set()

# Load & Prepare Teacher Credentials Data
- Load the Excel sheet and exclude the top three rows of formatting.
- Rename long column names to shorter acronyms. See comments in the code for mappings.
- Remove the empty `NaN` column at index 0 of the Excel sheet.
- Drop the trailing rows of metadata and formatting (index labels 52 to 55).

In [78]:
# -- Load Teacher Credentials DataFrame
file_path = Path.cwd().joinpath("../../data/teacher-certification-and-years-of-experience.xlsx")
teacher_creds = pd.read_excel(
    file_path,
    header=2,
    skiprows=[0, 1, 2], # -- Exclude the top three rows of formatting
    
    names=[    
        "C0",
        "State",
        "CT (FTE)",
        # "Classroom Teachers (FTE)",
        "MR (FTE)",
        # "Meeting All State Licensing/Certification Requirements (FTE)",
        "MR (P)",
        # "Meeting All State Licensing/Certification Requirements (P)",
        "FY (FTE)",
        # "Classroom Teachers in their First Year of Teaching (FTE)",
        "FY (P)",
        # "Classroom Teachers in their First Year of Teaching (P)",
        "SY (FTE)",
        "SY (P)",
        "schools",
        "schools (P)",
      ]
)

# -- Remove the empty leading column (all NaN)
teacher_creds.drop(["C0"], axis=1, inplace=True)

# -- Drop the trailing metadata rows (index labels 52 to 55)
teacher_creds.drop([52, 53, 54, 55], inplace=True)
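
The cleanup above depends on a fixed column name and fixed row labels. As a possible alternative, and not the method this notebook uses, the sketch below drops empty columns and footer rows by content rather than by position. The `raw` frame is a hypothetical stand-in for the sheet right after `pd.read_excel`, with invented values, so the real file may need different checks.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded sheet: an empty leading column,
# three data rows, and two trailing footer rows. All values are invented.
raw = pd.DataFrame(
    {
        "C0": [np.nan] * 5,
        "State": ["Alabama", "Alaska", "Arizona", "Footer note text", np.nan],
        "CT (FTE)": [100.0, 50.0, 75.0, np.nan, np.nan],
    }
)

# Drop any column that is entirely empty instead of naming "C0" explicitly.
cleaned = raw.dropna(axis=1, how="all")

# Keep only rows with a numeric teacher count; footer rows have none.
has_count = pd.to_numeric(cleaned["CT (FTE)"], errors="coerce").notna()
cleaned = cleaned[has_count].reset_index(drop=True)

print(cleaned)
```

Filtering by content keeps working if the number of footer rows changes between files, at the cost of silently dropping any state row whose teacher count is missing.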