## Background 

In this analysis, we estimate how many individuals are potentially eligible for, or affected by, expungement in Massachusetts.

In [81]:
import numpy as np
import pandas as pd
import math

## Data Source

The data comes from the Massachusetts State Police CrimeSOLV system, which contains 'crime data from police agencies in Massachusetts which report data to the Crime Reporting Unit in the format of the "National Incident Based Reporting System." The majority of police agencies now use this way of reporting their crime data.'

Unfortunately, there is no API for retrieving the data, but a report can be configured and downloaded as a CSV at https://masscrime.chs.state.ma.us. Using the reporting system, the report type is selected as shown:

<img src="../img/report_selection.png" width="400">

Columns are selected and arranged in the following manner:

<img src="../img/report_columns.png" width="800">

With the following filters:

<table>
  <tr>
      <td><img src="../img/select_age.png" width="400"></td>
      <td><img src="../img/select_offense.png" width="400"></td>
      <td><img src="../img/select_unarmed.png" width="400"></td>
    </tr>
</table>    

This report can then be downloaded in CSV list format for further analysis.

In [84]:
# Load data from the downloaded CSV, skipping the first 5 lines of the file
arrests = pd.read_csv(
    "../data/arrests_by_age.csv", skiprows=5,
    names=["Date", "Offense", "Armed", "Age", "Count", "dummy"],
    index_col=False)

# Convert counts to numbers before aggregating so that "sum" adds values
arrests["Count"] = pd.to_numeric(arrests["Count"], errors="coerce")

# Remove empty entries, drop unused columns, and total the counts per group
arrests = (
    arrests[arrests["Count"].notnull()]
    .drop(["Armed", "dummy"], axis=1)
    .groupby(["Date", "Offense", "Age"])
    .agg("sum")
    .reset_index()
)
arrests.head()

Unnamed: 0,Date,Offense,Age,Count
0,1991,Aggravated Assault,15,1.0
1,1991,All Other Larceny,14,1.0
2,1991,All Other Offenses,13,1.0
3,1991,Burglary/Breaking & Entering,10,1.0
4,1991,Burglary/Breaking & Entering,14,1.0


In [86]:
# Certain offenses are never eligible
disqualifying_offenses = [
    # Offenses that result in, or are intended to cause, death or serious bodily injury
    'Murder and Nonnegligent Manslaughter',
    'Aggravated Assault',
    'Negligent Manslaughter',
    'Rape',
    'Sodomy',
    'Sexual Assault With An Object'
]
# Keep only arrests whose offense is not in the disqualifying list
clean_arrests = arrests[~arrests['Offense'].isin(disqualifying_offenses)].copy()
clean_arrests["Offense"].unique()

array(['All Other Larceny', 'All Other Offenses',
       'Burglary/Breaking & Entering',
       'Destruction/Damage/Vandalism of Property', 'Disorderly Conduct',
       'Driving Under the Influence', 'Drug/Narcotic Violations',
       'Liquor Law Violations', 'Motor Vehicle Theft', 'Simple Assault',
       'Theft From Building', 'Theft From Motor Vehicle',
       'Trespass of Real Property', 'Arson', 'Counterfeiting/Forgery',
       'Drunkenness', 'False Pretenses/Swindle/Confidence Game',
       'Family Offenses (Nonviolent)', 'Intimidation', 'Robbery',
       'Runaway', 'Shoplifting', 'Stolen Property Offenses',
       'Theft of Motor Vehicle Parts/Accessories',
       'Weapon Law Violations', 'Credit Card/Automatic Teller Fraud',
       'Drug Equipment Violations', 'Fondling', 'Kidnapping/Abduction',
       'Prostitution', 'Purse-snatching', 'Bad Checks',
       'Curfew/Loitering/Vagrancy Violations', 'Embezzlement',
       'Impersonation', 'Statutory Rape',
       'Theft From Coin 

In [102]:
# Assign a crime type to each offense.
# NOTE: this is a placeholder and is incorrect, since every offense is labeled a misdemeanor
crime_type = {offense: "misdemeanor" for offense in clean_arrests["Offense"].unique()}
crime_type.get('Welfare Fraud')

'misdemeanor'
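
Because the placeholder mapping labels every offense a misdemeanor, any logic that treats felonies differently never takes the felony branch. The following is a minimal sketch of one way to separate the two classes, assuming a hand-maintained set of felony offense names. The helper `build_crime_type` and the example set are hypothetical and for illustration only; they are not a legal classification, and the real set would have to be checked against the statute.

```python
def build_crime_type(offenses, felony_offenses):
    """Map each offense name to 'felony' or 'misdemeanor'.

    `felony_offenses` is a hand-maintained set (an assumption of this sketch).
    Names that do not appear in `offenses` raise an error, which guards
    against spelling mismatches with the report's offense labels.
    """
    offenses = list(offenses)
    felony_offenses = set(felony_offenses)
    unknown = felony_offenses - set(offenses)
    if unknown:
        raise ValueError(f"Unknown offense names: {sorted(unknown)}")
    return {
        offense: "felony" if offense in felony_offenses else "misdemeanor"
        for offense in offenses
    }

# Illustrative placeholder only, not a verified classification
example_felonies = {"Robbery"}
example = build_crime_type(["Robbery", "Shoplifting"], example_felonies)
print(example)  # {'Robbery': 'felony', 'Shoplifting': 'misdemeanor'}
```

In the notebook, the first argument would be `clean_arrests["Offense"].unique()`, so the resulting dictionary could stand in for `crime_type` once the felony set has been reviewed.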

In [88]:
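from datetime import date

def eligible(offense_year, offense, age):
    """Return True if an arrest meets the age and waiting-period criteria used here."""
    # Arrests at age 21 or older are not eligible
    if age >= 21:
        return False
    # Waiting period: 7 years for a felony, 3 years otherwise
    offense_class = crime_type.get(offense)
    cutoff = 7 if offense_class == 'felony' else 3
    years_since = date.today().year - offense_year
    return years_since >= cutoff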
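
Since `eligible` reads the current year from `date.today()`, the final count changes depending on when the notebook is run. The sketch below shows one way to make the same rule reproducible by passing the reference year and the offense mapping explicitly. The function name `eligible_as_of`, its extra parameters, and the sample rows are assumptions made for illustration, not part of the original analysis.

```python
from datetime import date

def eligible_as_of(offense_year, offense, age, crime_type, as_of_year=None):
    """Same rule as `eligible`, but with an explicit reference year."""
    if as_of_year is None:
        as_of_year = date.today().year
    # Arrests at age 21 or older are not eligible
    if age >= 21:
        return False
    # Waiting period: 7 years for a felony, 3 years otherwise
    cutoff = 7 if crime_type.get(offense) == "felony" else 3
    return (as_of_year - offense_year) >= cutoff

# Hypothetical mapping and rows, used only to exercise each branch
sample_types = {"Robbery": "felony", "Shoplifting": "misdemeanor"}
rows = [
    (2015, "Robbery", 17),      # felony, 5 years elapsed: too recent
    (2016, "Shoplifting", 17),  # misdemeanor, 4 years elapsed: eligible
    (2010, "Shoplifting", 25),  # age 21 or older: not eligible
    (2012, "Robbery", 18),      # felony, 8 years elapsed: eligible
]
results = [eligible_as_of(y, o, a, sample_types, as_of_year=2020) for y, o, a in rows]
print(results)  # [False, True, False, True]
```

Pinning `as_of_year` lets a reported total be tied to a specific year rather than to the date the cell happened to be executed.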

In [89]:
# Pass the offense name; `eligible` looks up its class in `crime_type`
clean_arrests["Eligible"] = clean_arrests.apply(
    lambda row: eligible(row["Date"], row["Offense"], row["Age"]), axis=1)

In [94]:
clean_arrests[clean_arrests["Eligible"]]["Count"].sum()

461879.0