# Iowa State Prison Recidivism

**source**:
<br><br>
https://data.iowa.gov/Public-Safety/3-Year-Recidivism-for-Offenders-Released-from-Pris/mw8r-vqy4/data
<br><br>
- Statistics about inmate recidivism within 3 years of being released from prison in Iowa.
Covers the time period from 
<br><br>
***
**Features**:
<br><br>
##### Fiscal Year Released:
- Fiscal year (year ending June 30) in which the offender was released from prison.
<br><br>

##### Recidivism Reporting Year:
- Fiscal year (year ending June 30) that marks the end of the 3-year tracking period. For example, offenders who exited prison in FY 2012 appear in recidivism reporting year FY 2015.
<br><br>
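Since the reporting year should always be the release year plus three, that relationship can be checked row by row. Below is a minimal sketch, assuming the columns have already been renamed to `yr_released` and `report_year` (the names used later in this notebook) and hold integer years. The function name and toy values are hypothetical.

```python
import pandas as pd

def check_tracking_window(df, released="yr_released", reported="report_year", years=3):
    """Return the rows whose reporting year is NOT `years` after the release year."""
    gap = df[reported] - df[released]
    return df.loc[gap != years]

# Hypothetical toy rows; the last one breaks the 3-year rule
toy = pd.DataFrame({"yr_released": [2012, 2013, 2014],
                    "report_year": [2015, 2016, 2018]})
bad = check_tracking_window(toy)
print(bad)
```

An empty result on the real data would confirm that every row follows the 3-year window.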

##### Race-Ethnicity: 
- Offender's race and ethnicity.
<br><br>

##### Age At Release:
- Offender's age group at release from prison.
<br><br>

##### Convicting Offense Classification: 
- Maximum penalties:
    - *A Felony* : Life 
    - *B Felony* : 25 or 50 years 
    - *C Felony* : 10 years 
    - *D Felony* : 5 years
    - *Aggravated Misdemeanor* : 2 years
    - *Serious Misdemeanor* : 1 year
    - *Simple Misdemeanor* : 30 days
<br><br>
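Because the classes are ordered by maximum penalty, they can also be treated as an ordinal severity scale instead of unordered labels. The sketch below is one way to do that, under two assumptions: the column holds exactly the label strings listed above, and any other label (for example a hypothetical "Other Felony") should become NaN. The ranks are my own illustration and are not part of the dataset.

```python
import pandas as pd

# Ordinal rank derived from the maximum penalties listed above (higher = more severe).
# Assumes these exact label strings; anything else maps to NaN.
SEVERITY = {
    "Simple Misdemeanor": 1,      # 30 days
    "Serious Misdemeanor": 2,     # 1 year
    "Aggravated Misdemeanor": 3,  # 2 years
    "D Felony": 4,                # 5 years
    "C Felony": 5,                # 10 years
    "B Felony": 6,                # 25 or 50 years
    "A Felony": 7,                # life
}

def severity_rank(classes: pd.Series) -> pd.Series:
    """Map offense-class labels to an ordinal rank; unknown labels become NaN."""
    return classes.map(SEVERITY)

ranks = severity_rank(pd.Series(["D Felony", "A Felony", "Other Felony"]))
print(ranks.tolist())
```

The same mapping would apply to the new conviction classification column.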

##### Convicting Offense Type: 
- General category for the most serious offense for which the offender was placed in prison.
<br><br>

##### Convicting Offense Subtype: 
- Further classification of the most serious offense for which the offender was placed in prison.
<br><br>

##### Release Type: 
- Reason for the offender's release from prison.
<br><br>

##### Main Supervising District:
- The Judicial District supervising the offender for the longest time during the tracking period.
<br><br>

##### Recidivism - Return to Prison:	
- *No* : No Recidivism
- *Yes* : Prison admission for any reason within the 3-year tracking period
<br><br>

##### Days to Recidivism:
- Number of days it took before the offender returned to prison.
<br><br>

##### New Conviction Offense Classification: 
- New conviction maximum penalties: 
    - *A Felony* : Life
    - *B Felony* : 25 or 50 years
    - *C Felony* : 10 years
    - *D Felony* : 5 years
    - *Aggravated Misdemeanor* : 2 years
    - *Serious Misdemeanor* : 1 year
    - *Simple Misdemeanor* : 30 days
<br><br>

##### New Conviction Offense Type: 
- General category for the new conviction while the offender is out of prison.
<br><br>

##### New Conviction Offense Sub Type: 
- Further classification of the new conviction.
<br><br>

##### Part of Target Population: 
- Whether the offender belongs to the population the Department of Corrections targets with specific strategies to reduce recidivism among prisoners on parole.
<br><br>

##### Recidivism Type:
- There is no description of this feature. It pertains only to recidivists and to the category of recidivism they fall into.
<br><br>

##### Sex:
- Offender's sex.

## Obtain Data
<br><br>
***

In [1]:
# Import all necessary modules using bs_ds
from bs_ds.imports import *
# Set plot style
# from jupyterthemes import jtplot
# jtplot.style(theme='monokai', context='notebook')

View our documentation at https://bs-ds.readthedocs.io/en/latest/bs_ds.html
For convenient loading of standard modules :
>> from bs_ds.imports import *



| Module/Package | Handle |
|---|---|
| pandas | pd |
| numpy | np |
| matplotlib | mpl |
| matplotlib.pyplot | plt |
| seaborn | sns |


In [2]:
# CSV exported from the data.iowa.gov page linked above, saved in the working directory
recidivists = pd.read_csv('3-Year_Recidivism_for_Offenders_Released_from_Prison_in_Iowa.csv')
print(recidivists.shape)
display(recidivists.head())
recidivists.info()


# Preprocessing
<br><br>
***

There are five features that relate only to recidivists. For the initial analysis, they are split off into a separate dataset for post-classification analysis.
<br><br>
__Separate data and rename columns__:
<br><br>
Dataset 1 : 
_prison_ 
- All rows, and all columns except those that pertain only to recidivists.<br><br>

Dataset 2 : 
_recidivists_
- Only the recidivist rows, with all columns, including the recidivist-only ones:

    - Days to Recidivism
    - New Conviction Offense Classification
    - New Conviction Offense Type
    - New Conviction Offense Sub Type
    - Recidivism Type

## Separate Datasets
<br><br>
***
**Rename columns**

In [None]:
# Get column names to map dictionary of new column titles
colnames_orig = recidivists.columns

# New column names
colnames_short = ('yr_released','report_year','race_ethnicity',
                  'age_released','crime_class','crime_type',
                  'crime_subtype','release_type', 'super_dist',
                  'recidivist','days','new_class','new_type',
                  'new_sub','target_pop','recid_type','sex')

# Zip old columns and new columns together
column_legend = dict(zip(colnames_orig,colnames_short))
column_legend
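One caveat with `dict(zip(...))`: `zip` stops at the shorter input, so a missing or extra short name would silently leave a column unrenamed. A minimal defensive sketch follows; the helper name is hypothetical, and the two example column names come from the feature headings above.

```python
def build_column_legend(original, short):
    """Pair old and new column names, refusing to silently truncate or duplicate."""
    original, short = list(original), list(short)
    if len(original) != len(short):
        raise ValueError(f"{len(original)} columns but {len(short)} new names")
    if len(set(short)) != len(short):
        raise ValueError("new column names must be unique")
    return dict(zip(original, short))

legend = build_column_legend(["Fiscal Year Released", "Sex"], ["yr_released", "sex"])
print(legend)
```

In the cell above this would be called as `build_column_legend(colnames_orig, colnames_short)`.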

In [None]:
# Map new names to correct columns
recidivists.rename(mapper=column_legend, axis=1, inplace=True)

# Separate columns that pertain only to recidivists
prison = recidivists.drop(['days',
                           'new_class',
                           'new_type', 
                           'new_sub',
                           'recid_type'], axis=1)
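The split rests on the premise that these five columns carry information only for recidivists. A quick way to check that premise is to count non-null values in each dropped column among rows where `recidivist == 'No'`. The sketch below uses hypothetical toy values, including the `"No Recidivism"` label. A column like `recid_type` may legitimately hold a placeholder category for non-recidivists, so a nonzero count calls for a closer look and does not by itself mean an error.

```python
import pandas as pd

RECID_ONLY = ["days", "new_class", "new_type", "new_sub", "recid_type"]

def non_null_outside_recidivists(df, cols=RECID_ONLY, flag="recidivist"):
    """Count non-null values in each recidivist-only column among rows flagged 'No'."""
    others = df.loc[df[flag] == "No", cols]
    return others.notna().sum()

# Hypothetical toy rows
toy = pd.DataFrame({
    "recidivist": ["Yes", "No", "No"],
    "days": [120, None, None],
    "new_class": ["D Felony", None, None],
    "new_type": ["Drug", None, None],
    "new_sub": ["Trafficking", None, None],
    "recid_type": ["New", "No Recidivism", "No Recidivism"],
})
counts = non_null_outside_recidivists(toy)
print(counts)
```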

### Dataset 1
***
- 26,020 rows, one per offender release record
- 12 columns

In [None]:
# Dataset 1 
print(prison.shape)
display(prison.head())
display(prison.describe())
prison.info()

### Dataset 2 
***
- 8681 rows
- 17 columns
- May drop one of the type columns if they prove redundant

In [None]:
# Dataset 2
recidivists = recidivists.loc[recidivists.recidivist == 'Yes']
print(f"Shape (rows,columns): {recidivists.shape}\n")
recidivists.info()

## Scrub 
<br><br>
***
- Dataset 1

In [None]:
# import pandas wrapper tools for inspection and data cleaning
from bs_ds.bamboo import check_null, check_unique, check_numeric
prison.info()

###  Missing and Unique Values, Data types
<br><br>
***

In [None]:
# Check null values
check_null(prison)

__Missing Values__

***

In [None]:
print('NaN in age_released:')
display(prison.loc[prison.age_released.isna()])  # These and the 'sex' NaNs will drop with 'race_ethnicity'

print('NaN in race_ethnicity:')
display(prison.loc[prison.race_ethnicity.isna()])  # Drop these

# looking at time blocks that are missing by viewing the two numerical columns for these rows 
print('NaN in super_dist column:')
display(prison.loc[prison.super_dist.isna()].describe())

# how many of the same rows are missing
print('NaN in both super_dist and release_type:')
display(prison.loc[(prison.super_dist.isna())&(prison.release_type.isna())].describe())

print('NaN in only release_type')
display(prison.loc[prison.release_type.isna()])

The columns *super_dist* and *release_type* are each missing a significant number of values; together, over 40% of rows are missing values in those two columns. Because the missing *release_type* values all come from adjacent rows, dropping them could cost the data significance, given the temporal element of the dataset. Based on the release year and report year, I am assuming the rows follow some kind of order. Even if the offenders were not released in exactly the order of the rows, these rows still come from the same general time period, and changing practices within the prison over time could affect the recidivism rates of former inmates. For now, those rows will stay in the set. Rows with missing values in the other columns will be dropped.
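Both claims above can be checked directly: the combined share of rows missing *super_dist* or *release_type*, and whether the missing *release_type* rows form one contiguous block. Below is a minimal sketch on toy data. The function name and values are hypothetical, and "contiguous" here means consecutive row positions, not index labels. Run on `prison`, it would give the real figures.

```python
import numpy as np
import pandas as pd

def missing_summary(df, cols=("super_dist", "release_type"), block_col="release_type"):
    """Share of rows missing any of `cols`, and whether `block_col` gaps are one block."""
    share_any = df[list(cols)].isna().any(axis=1).mean()
    # Row positions (not labels) where block_col is missing
    pos = np.flatnonzero(df[block_col].isna().to_numpy())
    contiguous = bool(len(pos) == 0 or (pos[-1] - pos[0] + 1) == len(pos))
    return share_any, contiguous

# Hypothetical toy rows with placeholder labels
toy = pd.DataFrame({
    "super_dist": ["D1", None, None, "D5", None],
    "release_type": ["Parole", None, None, "Parole", "Parole"],
})
share, block = missing_summary(toy)
print(share, block)
```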

In [None]:
# Drop rows missing race_ethnicity or sex (this also removes the age_released NaNs), then confirm the drop
prison = prison.loc[prison.race_ethnicity.notna() & prison.sex.notna()]
check_null(prison)
df = prison

**Unique Values**

***

In [None]:
check_unique(prison)

**Drop hidden N/A's and other suspicious rows**
***
- **race_ethnicity** : White-, Black-, N/A-
    These labels are incomplete or belong in another category.
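The notebook flags these labels but does not yet drop them. Here is a minimal sketch, assuming the incomplete labels are the ones with nothing after the hyphen once whitespace is stripped (e.g. `"White -"` or `"N/A -"`). The exact spacing in the raw data is an assumption, and the toy labels are hypothetical.

```python
import pandas as pd

def drop_incomplete_race(df, col="race_ethnicity"):
    """Drop rows whose label ends in a bare '-' after stripping whitespace."""
    labels = df[col].astype(str).str.strip()
    return df.loc[~labels.str.endswith("-")]

# Hypothetical toy labels
toy = pd.DataFrame({"race_ethnicity": ["White - Non-Hispanic", "White -",
                                       "Black - Hispanic", "N/A -"]})
kept = drop_incomplete_race(toy)
print(kept)
```

On the real data, `check_unique(prison)` should be rerun afterwards to confirm that only complete labels remain.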

### Encoding Boolean Values
<br><br>
***

In [None]:
bools = ['recidivist', 'target_pop', 'sex']
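The cell above only lists the two-valued columns. The sketch below shows one way to encode them as 0/1. The feature notes confirm `'Yes'`/`'No'` only for `recidivist`; the `target_pop` and `sex` labels are assumptions, so the function raises on any unmapped label rather than silently producing NaN.

```python
import pandas as pd

# Only recidivist's Yes/No labels are confirmed by the feature notes; the rest are assumed.
BOOL_MAPS = {
    "recidivist": {"No": 0, "Yes": 1},
    "target_pop": {"No": 0, "Yes": 1},   # assumption
    "sex": {"Female": 0, "Male": 1},     # assumption
}

def encode_bools(df, maps=BOOL_MAPS):
    """Return a copy with each two-valued column mapped to 0/1; unmapped labels raise."""
    out = df.copy()
    for col, mapping in maps.items():
        encoded = out[col].map(mapping)
        if encoded.isna().any():
            unknown = sorted(set(out.loc[encoded.isna(), col].astype(str)))
            raise ValueError(f"{col}: unmapped labels {unknown}")
        out[col] = encoded.astype(int)
    return out

toy = pd.DataFrame({"recidivist": ["Yes", "No"], "target_pop": ["No", "Yes"],
                    "sex": ["Male", "Female"]})
enc = encode_bools(toy)
print(enc)
```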

# EDA
<br><br>
**Visualizations**
***

In [None]:
from bs_ds.bamboo import LabelLibrary, plot_hist_scat, multiplot

# Make a list of variables to label-encode, leaving out the year columns for EDA
df_code = df
to_encode = [col for col in df_code.columns if col not in ('yr_released', 'report_year')]

# fit and transform labels with LabelEncoder wrapper class
labels = LabelLibrary().fit_transform(df_code, to_encode)
print(labels.shape)
labels.head()
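`LabelLibrary` is described here only as a `LabelEncoder` wrapper, and its internals are not shown. For readers without `bs_ds`, the following is a pandas-only stand-in. It is not the library's implementation: each listed column is replaced by integer codes from a pandas categorical (codes follow sorted label order, as `LabelEncoder` does), and a code-to-label legend is kept for reading plots afterwards. The toy values are hypothetical.

```python
import pandas as pd

def label_encode(df, cols):
    """Replace each listed column with integer codes; return codes and a code->label legend."""
    out = df.copy()
    legend = {}
    for col in cols:
        cat = out[col].astype("category")     # categories are sorted labels
        legend[col] = dict(enumerate(cat.cat.categories))
        out[col] = cat.cat.codes               # NaN would become -1
    return out, legend

toy = pd.DataFrame({"yr_released": [2012, 2013, 2012],
                    "sex": ["Male", "Female", "Male"],
                    "crime_class": ["D Felony", "C Felony", "D Felony"]})
to_encode = [c for c in toy.columns if c not in ("yr_released", "report_year")]
codes, legend = label_encode(toy, to_encode)
print(codes)
print(legend)
```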

In [None]:
plot_hist_scat(labels, target='recidivist')
plt.show()
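A numeric companion to the plots is the recidivism rate within each category of a feature, with group sizes so that small groups are not over-read. Below is a minimal sketch meant to run on the unencoded `df` (string labels). The function name and the toy age-group labels are hypothetical.

```python
import pandas as pd

def recidivism_rate_by(df, col, target="recidivist"):
    """Share of 'Yes' in `target` for each level of `col`, with the group size."""
    is_recid = df[target].eq("Yes")
    grouped = is_recid.groupby(df[col])
    summary = pd.DataFrame({"rate": grouped.mean(), "n": grouped.size()})
    return summary.sort_values("rate", ascending=False)

toy = pd.DataFrame({"age_released": ["Under 25", "Under 25", "55 and Older"],
                    "recidivist": ["Yes", "No", "No"]})
out = recidivism_rate_by(toy, "age_released")
print(out)
```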

In [None]:
multiplot(labels)
plt.show()


### One Hot Encoded
<br><br>
***

In [None]:
dummies = pd.get_dummies(df_code, sparse=True)

In [None]:
dummies.head()
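`pd.get_dummies(df_code, sparse=True)` encodes every string column and keeps one dummy per level. If these dummies feed a linear model, one alternative is to list the columns explicitly and pass `drop_first=True`, which drops one level per feature so the dummy sets are not perfectly collinear. This is a design option I'm illustrating, not what the notebook currently does. The toy labels are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({"yr_released": [2012, 2013],
                    "sex": ["Male", "Female"],
                    "release_type": ["Parole", "Discharged"]})
# Encode only the listed columns; the first (alphabetical) level of each is dropped
dummies_small = pd.get_dummies(toy, columns=["sex", "release_type"], drop_first=True)
print(dummies_small.columns.tolist())
```

Untouched columns such as `yr_released` come first, followed by the dummy columns.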