# Prosecution severity based on sentencing days

### Prosecution severity
##### Data:
1. case_offense_v01.csv (currently merged in bonds_analysis_01.ipynb, though ideally the merge could happen anywhere?)
 
##### Steps
1. Load Data 
    * Load the files listed above
 
2. Create sentencing days for each offense class
    * Use the felony chart, adjusting for priors?
    * Create a dataframe of the minimum and maximum sentencing days observed for each convicted and charged offense code.
    * Group by offense code and take the minimum of each sentencing-days column.
    * **What is the right metric when one offense class has multiple sentencing-day values? Currently using the minimum.**
    * **Should I be referring to the felony chart instead? How should I adjust for priors?**

3. Merge with convicted and charged offense classes
    * Rename the columns of the sentencing df (step #2 output) to min_char_offense_sentences, max_char_offense_sentences, and charged_offense_code, then left-merge onto the charged offense code.
    * Repeat the same for convicted offenses. 

4. Data Quality Fixes
    * Fix offense class strings ("1.0" to "1", etc.).
    * Keep only rows whose sex is M, F, or U; drop the roughly 500 cases coded X.
 
5. Determining Delta
    * Based on sentencing days for an offense class
        * "delta_min" is "min_conv_offense_sentences" minus "min_char_offense_sentences", i.e. the minimum sentencing days for the convicted offense minus the minimum for the charged offense.
        * "delta_max" is "max_conv_offense_sentences" minus "max_char_offense_sentences", i.e. the same difference computed on the maximum sentencing days.
    * In cases where the charged offense code matches the convicted offense code, set delta_min and delta_max to zero.
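
The data quality fixes listed in the steps above are not implemented in the cells of this section. A minimal sketch of what they might look like follows. The helper names `clean_offense_class` and `keep_known_sex` and the toy frame are hypothetical, and the sketch assumes the stray ".0" comes from integer classes being read as floats.

```python
import pandas as pd


def clean_offense_class(series: pd.Series) -> pd.Series:
    """Normalize offense class labels so that "1.0" and "1" compare equal."""
    cleaned = series.astype("string").str.strip()
    # Drop a trailing ".0" (assumed to come from integer classes read as floats).
    return cleaned.str.replace(r"\.0$", "", regex=True)


def keep_known_sex(df: pd.DataFrame, sex_col: str = "sex",
                   allowed=("M", "F", "U")) -> pd.DataFrame:
    """Keep rows whose sex value is in `allowed`; anything else (e.g. X) is dropped."""
    return df[df[sex_col].isin(allowed)].copy()


# Toy data for illustration only; these are not rows from case_offense_v01.csv.
toy = pd.DataFrame(
    {
        "offense_class": ["1.0", "1", "A1", None, "2.0 "],
        "sex": ["M", "F", "X", "U", "M"],
    }
)
toy["offense_class"] = clean_offense_class(toy["offense_class"])
toy = keep_known_sex(toy)
print(toy)
```

Missing offense classes stay missing rather than becoming the string "None", so they can still be filtered with `isna()` later.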

In [21]:
import pandas as pd
import numpy as np
import os
from sklearn.preprocessing import MinMaxScaler
# Show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)
# Render figures at retina resolution (I am blind without this)
%config InlineBackend.figure_format = 'retina'

### Load the required files
file_dir = "/workspaces/esg-controversy-tracker/nc_acis-main/20_intermediate_data"
offenses = pd.read_csv(os.path.join(file_dir, "case_offense_v01.csv"), low_memory=False, index_col=[0])

# Keep only the columns used in the severity analysis
offs = offenses[
    [
        "case_id",
        "date_of_birth",
        "key_county_num",
        "key_year",
        "court_type",
        "race",
        "sex",
        "process_served",
        "case_creation_date",
        "case_trial_date",
        "court_attorney_type",
        "bond_type",
        "bond_amount",
        "charged_offense_date",
        "charged_offense_code",
        "offense_class",
        "min_sentence",
        "max_sentence",
        "convicted_offense_code",
        "disposition",
        "disposition_date",
        "min_sentence_day", 
        "max_sentence_day"
    ]
]
offs.head(5)

Unnamed: 0,case_id,date_of_birth,key_county_num,key_year,court_type,race,sex,process_served,case_creation_date,case_trial_date,court_attorney_type,bond_type,bond_amount,charged_offense_date,charged_offense_code,offense_class,min_sentence,max_sentence,convicted_offense_code,disposition,disposition_date,min_sentence_day,max_sentence_day
0,1973006239,7f69fac81ed85a6a41996f079cf9adcbd1ea89098e756a...,0,73,CRS,W,M,W,2013-10-22,1974-05-09,R,SEC,10000.0,1973-05-20,3599,,,,,SI,1973-08-07,,
1,1973006239,7f69fac81ed85a6a41996f079cf9adcbd1ea89098e756a...,0,73,CRS,W,M,W,2013-10-22,1974-05-09,R,SEC,10000.0,1973-05-20,3599,,,,,VD,2013-10-18,,
2,1973007828,7f69fac81ed85a6a41996f079cf9adcbd1ea89098e756a...,0,73,CRS,W,M,W,2013-10-22,1974-05-09,R,SEC,1.0,1973-05-19,3599,,,,,SI,1973-08-07,,
3,1973007828,7f69fac81ed85a6a41996f079cf9adcbd1ea89098e756a...,0,73,CRS,W,M,W,2013-10-22,1974-05-09,R,SEC,1.0,1973-05-19,3599,,,,,VD,2013-10-18,,
4,1973013840,7f69fac81ed85a6a41996f079cf9adcbd1ea89098e756a...,0,73,CRS,W,M,W,2013-10-22,1974-05-09,R,SEC,3000.0,1973-10-13,3599,,,,,SI,1974-01-22,,


### 2. Create sentencing days for each offense class


In [22]:
# Common names for the sentencing-day columns in the stacked lookup table
sentencing_cols = {"min_sentence_day": "min_offense_code_sentencing",
                   "max_sentence_day": "max_offense_code_sentencing"}

# Observed sentencing days keyed by the convicted offense code
convicted_sentencing = offs[['min_sentence_day', 'max_sentence_day', 'convicted_offense_code']].copy()
convicted_sentencing.rename(columns={**sentencing_cols, "convicted_offense_code": "offense_code"}, inplace=True)

# The same sentencing days keyed by the charged offense code
charged_sentencing = offs[['min_sentence_day', 'max_sentence_day', 'charged_offense_code']].copy()
charged_sentencing.rename(columns={**sentencing_cols, "charged_offense_code": "offense_code"}, inplace=True)

# Stack both views and drop rows missing a code or either sentencing value
sentencing_df = pd.concat([convicted_sentencing, charged_sentencing])
sentencing_df.dropna(inplace=True)
sentencing_df.reset_index(inplace=True, drop=True)

# One row per offense code: the smallest observed value of each sentencing column
sentencing_df = sentencing_df[["offense_code", "min_offense_code_sentencing", "max_offense_code_sentencing"]].groupby("offense_code").min().reset_index()
sentencing_df.head()

Unnamed: 0,offense_code,min_offense_code_sentencing,max_offense_code_sentencing
0,910.0,4320.0,5550.0
1,916.0,480.0,870.0
2,920.0,300.0,420.0
3,922.0,75.0,75.0
4,930.0,180.0,480.0
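
Because the cell above applies `.min()` to both columns, `max_offense_code_sentencing` holds the smallest observed `max_sentence_day` for a code, not the largest. To help with the open question of which metric to use, here is a sketch that computes several candidate summaries side by side using pandas named aggregation. The toy values and the output column names (`min_days_lowest` and so on) are made up for illustration.

```python
import pandas as pd

# Toy sentencing records; values are illustrative, not taken from the real data.
toy = pd.DataFrame(
    {
        "offense_code": [910, 910, 910, 916, 916],
        "min_sentence_day": [4320, 5000, 6000, 480, 600],
        "max_sentence_day": [5550, 6100, 7300, 870, 900],
    }
)

# Named aggregation: one output column per (input column, statistic) pair.
summary = toy.groupby("offense_code").agg(
    min_days_lowest=("min_sentence_day", "min"),      # what the notebook uses
    min_days_median=("min_sentence_day", "median"),   # alternative: typical value
    max_days_lowest=("max_sentence_day", "min"),      # what the notebook uses
    max_days_highest=("max_sentence_day", "max"),     # alternative: true upper bound
).reset_index()
print(summary)
```

Comparing the columns shows how sensitive the later deltas would be to the choice of metric before committing to one.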


### 3. Merge with charged and convicted offense codes

In [23]:
# Lookup keyed by charged offense code
charged_sentencing = sentencing_df.rename(columns={"min_offense_code_sentencing": "min_char_offense_sentences", 
                                                   "max_offense_code_sentencing":"max_char_offense_sentences",
                                                   "offense_code":"charged_offense_code"})

# Left merge keeps every case row; codes absent from the lookup get NaN
offenses_ranked = pd.merge(offs, charged_sentencing, how='left', on="charged_offense_code")

# Same lookup keyed by convicted offense code
convicted_sentencing = sentencing_df.rename(columns={"min_offense_code_sentencing": "min_conv_offense_sentences", 
                                                   "max_offense_code_sentencing":"max_conv_offense_sentences",
                                                   "offense_code":"convicted_offense_code"})
offenses_ranked = pd.merge(offenses_ranked, convicted_sentencing, how='left', on="convicted_offense_code")
offenses_ranked.sample(5)

Unnamed: 0,case_id,date_of_birth,key_county_num,key_year,court_type,race,sex,process_served,case_creation_date,case_trial_date,court_attorney_type,bond_type,bond_amount,charged_offense_date,charged_offense_code,offense_class,min_sentence,max_sentence,convicted_offense_code,disposition,disposition_date,min_sentence_day,max_sentence_day,min_char_offense_sentences,max_char_offense_sentences,min_conv_offense_sentences,max_conv_offense_sentences
21132403,9102015710839,de3a6cfac79a6449538d3209b52705c487494b33770c05...,910,15,CR,B,M,C,2015-02-14,2015-05-08,,,,2015-02-14,5461,,,,,VD,2015-07-24,,,1.0,1.0,,
445008,202018700011,edd2a89f0a7ad7d32eb03c6e08c030f860ae75c40dea44...,20,18,CRS,W,M,C,2018-01-08,2019-03-25,A,,,2018-01-07,5491,,,,,VD,2019-03-25,,,2.0,2.0,,
736067,602017700752,06b577aec28c8242c53d1443e1e72d9cf9cded18d5b597...,60,17,CR,W,M,C,2017-02-27,2017-04-17,,,,2017-02-27,5461,,,,,VD,2017-04-05,,,1.0,1.0,,
365650,102006000534,ba0c7a1598513c7f983ebd9a64f0edfff41a52a9dcab0c...,10,6,CR,H,M,C,2006-03-24,2006-05-02,,,,2006-03-18,5441,,,,,VD,2019-05-20,,,0.0,0.0,,
13899739,5902019210674,af2540f4ccf3bd855447790a00e309005d53938163925b...,590,19,CR,B,F,S,2019-03-21,2020-01-27,P,,,2019-03-20,1336,,,,,JU,2020-01-27,,,10.0,10.0,,
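
A left merge against a lookup table silently duplicates rows if the lookup has repeated keys, and it leaves NaN where a code has no match. The sketch below shows one way to guard against both, using the `validate` and `indicator` arguments of `DataFrame.merge`. The toy frames are hypothetical and only reuse the notebook's column names.

```python
import pandas as pd

# Toy case rows and a toy per-code lookup (illustrative values only).
cases = pd.DataFrame(
    {"case_id": [1, 2, 3], "charged_offense_code": [5450, 4725, 9999]}
)
lookup = pd.DataFrame(
    {
        "charged_offense_code": [5450, 4725],
        "min_char_offense_sentences": [1.0, 1.0],
    }
)

merged = cases.merge(
    lookup,
    on="charged_offense_code",
    how="left",
    validate="many_to_one",  # raises MergeError if lookup codes are not unique
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
)

# Rows whose charged code has no sentencing information in the lookup.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), unmatched["case_id"].tolist())
```

With `many_to_one` validation in place, the row count after the merge always equals the row count of the case table.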


### 5. Create delta as the difference between convicted and charged sentencing days

In [24]:

# Delta = sentencing days for the convicted offense minus those for the charged offense
offenses_ranked["delta_min"] = offenses_ranked["min_conv_offense_sentences"] - offenses_ranked["min_char_offense_sentences"]
offenses_ranked["delta_max"] = offenses_ranked["max_conv_offense_sentences"] - offenses_ranked["max_char_offense_sentences"]

# If the charged and convicted codes are the same, set both deltas to 0.
# This also replaces NaN deltas for matching codes that have no sentencing lookup.
same_code = offenses_ranked["charged_offense_code"] == offenses_ranked["convicted_offense_code"]
offenses_ranked.loc[same_code, ["delta_min", "delta_max"]] = 0
offenses_ranked.sample(5)

Unnamed: 0,case_id,date_of_birth,key_county_num,key_year,court_type,race,sex,process_served,case_creation_date,case_trial_date,court_attorney_type,bond_type,bond_amount,charged_offense_date,charged_offense_code,offense_class,min_sentence,max_sentence,convicted_offense_code,disposition,disposition_date,min_sentence_day,max_sentence_day,min_char_offense_sentences,max_char_offense_sentences,min_conv_offense_sentences,max_conv_offense_sentences,delta_min,delta_max
2301587,1202019714106,966649c3986427442da6c9f1c85375ce1ca3a50726f2ca...,120,19,CR,B,M,C,2019-07-29,2019-09-20,R,,,2019-07-28,5450,,,,4450.0,JU,2019-09-20,,,1.0,1.0,5.0,5.0,4.0,4.0
7981872,3502018002513,3668f4d251fd44d1cc0d787eec57fefd378d2a23e88483...,350,18,CR,W,M,C,2018-04-10,2019-03-18,P,SEC,2000.0,2017-12-12,4725,,,,,,,,,1.0,1.0,,,,
20953590,9102014708433,9d95fbbe19b2aec080dd9ab0428f6cf058b876904f3f35...,910,14,CR,B,M,C,2014-02-09,2014-03-28,R,,,2014-02-08,5450,,,,4450.0,JU,2014-03-28,,,1.0,1.0,5.0,5.0,4.0,4.0
22652347,9602018707795,44b35630d1b1d37be222ff3c16226891e461581938da64...,960,18,CR,W,F,C,2018-12-20,2019-02-21,W,,,2018-12-20,4725,,,,4573.0,JU,2019-02-21,,,1.0,1.0,1.0,1.0,0.0,0.0
8511716,4002011091829,fe483263e3fa889f1daa878897ad45b68fc6ce179f4272...,400,11,CR,W,M,M,2011-10-30,2013-05-31,R,WPA,,2011-10-29,5446,2.0,,,5446.0,JU,2013-05-31,,,1.0,1.0,1.0,1.0,0.0,0.0
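
To make the behaviour of the delta cell easier to follow, here is a small worked example on toy rows that mirror the cases seen in the sample above: a changed code, a missing conviction, and matching codes with and without sentencing lookups. The values are invented for illustration; only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "charged_offense_code": [5450, 4725, 5446, 3599],
        "convicted_offense_code": [4450, np.nan, 5446, 3599],
        "min_char_offense_sentences": [1.0, 1.0, 1.0, np.nan],
        "min_conv_offense_sentences": [5.0, np.nan, 1.0, np.nan],
    }
)

# Convicted minus charged: positive means the convicted code carries more days.
toy["delta_min"] = toy["min_conv_offense_sentences"] - toy["min_char_offense_sentences"]

# Matching codes get a delta of 0, even when the lookup had no sentencing days.
same_code = toy["charged_offense_code"] == toy["convicted_offense_code"]
toy.loc[same_code, "delta_min"] = 0

print(toy["delta_min"].tolist())  # [4.0, nan, 0.0, 0.0]
```

The second row stays NaN because a missing convicted code never compares equal to the charged code, so rows without a conviction are excluded from the delta rather than counted as zero.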


In [25]:
offenses_ranked['delta_min'].value_counts()

delta_min
 0.0       3317062
 59.0       876314
 4.0        655888
 1.0        311961
 55.0        55354
            ...   
-446.0           1
-143.0           1
 660.0           1
-2396.0          1
 190.0           1
Name: count, Length: 527, dtype: int64

In [26]:
offenses_ranked['delta_max'].value_counts()

delta_max
 0.0       3236948
 59.0       875542
 4.0        656772
 1.0        263427
 2.0         68149
            ...   
-2730.0          1
 219.0           1
-162.0           1
 719.0           1
-324.0           1
Name: count, Length: 749, dtype: int64