# Information about ICE raids in February 2017

By Christian McDonald, with much help from Cody Winchester

Of note: There is a problem with converting the .xlsx file as it came from Phil. I had to open it in Excel and `Save As` first before csvkit would properly process its contents.

Right now this works with only one of the three available files.

`ICE - ORR - 4200 released 01Feb2017 and Reinstated with ICE Warrant.xlsx` was opened and re-saved with `Save As` as `reinstated.xlsx` in `data-feb-raw`.

According to Travis County, these are "All inmates who had ICE Detainers released by policy on 02/01/2017 which were subsequently reinstated by ICE providing a Warrant"

This:
- converts the spreadsheet to csv
- parses the csv into two tables
    + First is a list of the people and info about them
    + Second is the list of charges, with the booking_id of the person charged

### to do some day

Cody says we should turn this into a Python class so we can reuse it across files and projects, since we'll likely work with ICE data often.
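A rough sketch of what that class could look like, assuming every file keeps the layout this notebook handles: a blank row or a `Booking No` header right before each person's row, then that person's charges (with a `Charge` subheader to skip). `DetainerReport` and `split_name` are hypothetical names, not anything that exists yet. The parsing rules mirror the loop in cell 6, except that separator rows are never mistaken for a person.

```python
import csv


def split_name(name_str):
    """Split 'LAST,REST' into (last, rest); rest is '' when there is no comma."""
    last, _, rest = name_str.partition(',')
    return last.strip(), rest.strip()


class DetainerReport:
    """Hypothetical reusable parser for the ICE detainer report CSV.

    ASSUMES the layout of reinstated-trimmed.csv: a blank row or a
    'Booking No' header comes right before each person's row, and the
    rows after it are that person's charges (plus a 'Charge' subheader).
    """

    PEOPLE_HEADERS = ['booking_id', 'name_last', 'name_rest', 'race', 'sex',
                      'jail_custody', 'age', 'booking_date', 'release_date',
                      'nativity']
    CHARGES_HEADERS = ['booking_id', 'authority', 'charge_id',
                       'charge_description', 'level', 'sentence',
                       'bond_amount', 'disp_date', 'disp_type']

    def __init__(self, rows):
        self.people = []
        self.charges = []
        self._parse(rows)

    @classmethod
    def from_csv(cls, path):
        """Build a report straight from a trimmed CSV file."""
        with open(path, newline='') as infile:
            return cls(list(csv.reader(infile)))

    @staticmethod
    def _is_separator(row):
        # blank rows and the 'Booking No' header both mean "next row is a person"
        return ''.join(row) == '' or row[0] == 'Booking No'

    def _parse(self, rows):
        new_record = False
        booking_id = None
        for row in rows:
            row = [x.strip() for x in row]
            if self._is_separator(row):
                new_record = True
                continue
            if new_record:
                # person row: booking_id, 'LAST,REST', then seven more columns
                booking_id = row[0]
                name_last, name_rest = split_name(row[1])
                self.people.append([booking_id, name_last, name_rest] + row[2:9])
                new_record = False
            elif len(row) > 1 and row[1] != 'Charge':
                # charge row: column 3 is unused in the source, so skip it
                self.charges.append([booking_id] + row[0:3] + row[4:9])
```

Usage would be something like `report = DetainerReport.from_csv('data-feb/reinstated-trimmed.csv')`, then `pd.DataFrame(report.people, columns=DetainerReport.PEOPLE_HEADERS)`.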


In [1]:
%%bash
# Convert to csv
in2csv data-feb-raw/reinstated.xlsx > data-feb/reinstated.csv



In [2]:
%%bash
# Trim the crud: drop the first 5 lines and the last 5 lines of the converted file.
# sed strips the padding spaces some versions of wc add to the count.
count=$(wc -l < data-feb/reinstated.csv | sed 's/ //g')
trim=$((count - 10))
tail -n +6 data-feb/reinstated.csv | head -n "$trim" > data-feb/reinstated-trimmed.csv
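If the shell steps ever get folded into a Python class, the same trim is just a list slice. This sketch assumes, like the bash cell, exactly five junk lines at the top and five at the bottom; `trim_report_lines` and `trim_csv_file` are hypothetical helpers. Like `wc -l`, it works on physical lines, so a quoted field with an embedded newline would throw both versions off.

```python
def trim_report_lines(lines, head=5, tail=5):
    """Drop `head` lines from the top and `tail` lines from the bottom.

    Mirrors `tail -n +6 | head -n $((count - 10))`, ASSUMING five junk
    lines at each end of the report.
    """
    if head + tail >= len(lines):
        return []
    # len(lines) - tail (rather than -tail) keeps tail=0 working
    return lines[head:len(lines) - tail]


def trim_csv_file(src, dst, head=5, tail=5):
    """Write a trimmed copy of src to dst, keeping line endings as-is."""
    with open(src, newline='') as infile:
        lines = infile.readlines()
    with open(dst, 'w', newline='') as outfile:
        outfile.writelines(trim_report_lines(lines, head, tail))
```

For example, `trim_csv_file('data-feb/reinstated.csv', 'data-feb/reinstated-trimmed.csv')` should produce the same file as the bash cell.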


In [3]:
%%bash
# report line count, stripping spaces
wc -l < data-feb/reinstated-trimmed.csv | sed 's/ //g'


201


In [4]:
# python imports
import csv
import pandas as pd
import numpy as np

In [5]:
# helper functions

# This is the name unmangler
def name_unmangler(name_str):
    """Split a `LAST,REST` name into a (last, rest) tuple.

    If there is no comma, the whole string is the last name and rest is ''.
    """
    last, _, rest = name_str.partition(',')
    return (last.strip(), rest.strip())


In [6]:
# read the trimmed csv and split it into two lists: people and charges
with open('data-feb/reinstated-trimmed.csv', 'r') as infile:
    data = csv.reader(infile, delimiter=',')
    data = list(data)
    
    # setting up our people list, with headers
    people = []
    
    # headers for people list
    people_headers = [
            'booking_id',
            'name_last',
            'name_rest',
            'race',
            'sex',
            'jail_custody',
            'age',
            'booking_date',
            'release_date',
            'nativity'
        ]
    
    # setting up our charges list, with headers
    charges = []
    
    charges_headers = [
            'booking_id',
            'authority',
            'charge_id',
            'charge_description',
            'level',
            'sentence',
            'bond_amount',
            'disp_date',
            'disp_type'
        
    ]
    
    booking_id_tracker = ''
    
    # set initial defaults    
    new_record = False
    booking_id = None

    # We are working through the data in the csv
    for row in data:
        row = [x.strip() for x in row]

        if new_record:
            # setting vars for each column of the row
            booking_id = row[0]
            booking_id_tracker = booking_id #sets the tracker to this record
            name = row[1]
            name_last = name_unmangler(name)[0]
            name_rest = name_unmangler(name)[1]
            race = row[2]
            sex = row[3]
            jail_custody = row[4]
            age = row[5]
            booking_date = row[6]
            release_date = row[7]
            nativity = row[8]
            
            # adding a row to the people list with vars pulled from current row
            people.append([
                    booking_id,
                    name_last,
                    name_rest,
                    race,
                    sex,
                    jail_custody,
                    age,
                    booking_date,
                    release_date,
                    nativity
                ])
        
            # reset the flag: the rows that follow are this person's charges
            new_record = False
        
        else:
            # Skip blank rows, the 'Booking No' header and the 'Charge' subheader
            if ''.join(row) != '' and row[1] != 'Charge' and row[0] != 'Booking No':
                #setting vars for each column of the row being parsed
                authority = row[0]
                charge_id = row[1]
                charge_description = row[2]
                skip_3 = row[3]
                level = row[4]
                sentence = row[5]
                bond_amount = row[6]
                disp_date = row[7]
                disp_type = row[8]

                charges.append([
                        booking_id_tracker,
                        authority,
                        charge_id,
                        charge_description,
                        level,
                        sentence,
                        bond_amount,
                        disp_date,
                        disp_type
                    ])
        
        # Set the flag at the end of the loop: a blank row or the
        # 'Booking No' header means the next row is a new person.
        if ''.join(row).strip() == '' or row[0] == 'Booking No':
            # set flag
            new_record = True

# At this point we have two lists of lists:
# `people` holds one row per person
# `charges` holds one row per charge, with the person's booking_id in the first column

In [7]:
# This creates the peopleDF dataframe from the list we created from the csv
peopleDF = pd.DataFrame(people, columns=people_headers)

# This resets the index to the booking_id, which is supposed to be unique.
# This may or may not be a great idea
peopleDF = peopleDF.set_index('booking_id')
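One way to find out whether indexing on booking_id is a good idea is to make pandas check it. `DataFrame.set_index` accepts `verify_integrity=True`, which raises `ValueError` on duplicate keys. The hypothetical helper below also prints the repeated IDs first, so a failure says which bookings to look at.

```python
import pandas as pd


def index_by_booking_id(df):
    """Return df indexed by booking_id, refusing to proceed on duplicates.

    Hypothetical helper: lists repeated booking_ids, then lets
    set_index(verify_integrity=True) raise ValueError if any exist.
    """
    dupes = df.loc[df['booking_id'].duplicated(keep=False), 'booking_id']
    if not dupes.empty:
        print('repeated booking_id values:', sorted(dupes.unique()))
    return df.set_index('booking_id', verify_integrity=True)
```

It would replace the two lines above as `peopleDF = index_by_booking_id(pd.DataFrame(people, columns=people_headers))`.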

In [8]:
peopleDF.head()

| booking_id | name_last | name_rest | race | sex | jail_custody | age | booking_date | release_date | nativity |
|---|---|---|---|---|---|---|---|---|---|
| 1541823 | RAMIREZ-MARTINEZ | ALEJANDRO | W | M | IN CUSTODY | 29 | 2015-11-02 |  | Mexico (Use only when state is unknown) |
| 1601778 | GONZALEZ | WILLIAM | W | M | IN CUSTODY | 25 | 2016-01-14 |  | Distrito Federal, Mexico |
| 1621712 | MOCTEZUMA-MORENO | JOSE R | W | M | RELEASED | 33 | 2016-06-12 | 2017-02-09 | Mexico (Use only when state is unknown) |
| 1624441 | CONTRERAS | BETO | W | M | IN CUSTODY | 42 | 2016-07-03 |  | Mexico (Use only when state is unknown) |
| 1625127 | MORA | JOSE EUSTASIO | W | M | IN CUSTODY | 32 | 2016-07-09 |  | Mexico (Use only when state is unknown) |


In [9]:
# this is the number of people
len(peopleDF.index)

31

In [10]:
# This creates the chargesDF dataframe from the charges list we created from the csv
chargesDF = pd.DataFrame(charges, columns=charges_headers)

In [11]:
# peek at chargesDF
chargesDF.head()

|  | booking_id | authority | charge_id | charge_description | level | sentence | bond_amount | disp_date | disp_type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1541823 | NEWART | 36010001 | INDECENCY W/CHILD SEXUAL CONTACT | F2 |  | 70000.0 |  |  |
| 1 | 1541823 | ICEW | 4200 | ICE DETAINER | F* |  |  |  |  |
| 2 | 1601778 | NEW | 48010019 | EVADING ARREST DET W/PREV CONVICTION | FS |  |  | 2016-01-15 | NO CHARGES FILED |
| 3 | 1601778 | NEW | 48010020 | EVADING ARREST DET W/VEH | F3 | 7 years |  |  |  |
| 4 | 1601778 | NEW | 54040011 | DRIVING WHILE INTOXICATED 3RD OR MORE | F3 | 7 years |  |  |  |


In [12]:
# this is the number of all charges, including the ICE detainers
len(chargesDF.index)

108

## Looking at charges

Now that we have dataframes for our people and our charges, it's time to look at how these charges break down for these folks. Some questions to ask:

- What is the lowest charge that any of these folks has?
- Were any held only on misdemeanors?
- Is there some charge guaranteed to get you a new warrant?

In [13]:
# This filter removes the ICE detainers from the list of charges
# so they aren't in the analysis, since everyone has them.
# All ICE detainers have charge_id '4200' (a string, since it came from the csv)
chargesDF_noice = chargesDF[(chargesDF['charge_id'] != '4200')]

In [14]:
# total number of charges, excluding ICE detainers
len(chargesDF_noice.index)

77
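A quick consistency check: 108 − 77 = 31 detainer rows, the same as the 31 people, which suggests each person carries exactly one ICE detainer. The sketch below (hypothetical helper `detainers_per_person`) confirms that per person instead of only in aggregate. `charge_id` is compared as the string `'4200'` because everything comes out of the CSV as text.

```python
import pandas as pd


def detainers_per_person(charges_df, detainer_id='4200'):
    """Count ICE detainer rows (charge_id == detainer_id) for each booking_id."""
    is_detainer = charges_df['charge_id'] == detainer_id
    # summing a boolean Series per group counts the True values
    return is_detainer.groupby(charges_df['booking_id']).sum()
```

On the real data, `(detainers_per_person(chargesDF) == 1).all()` should be `True` if the 31 = 31 match is not a coincidence.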

In [15]:
# This groups the charges to see the most common ones
# I did check that each charge_id maps to exactly one description
(chargesDF_noice.groupby('charge_description')
    .agg({'charge_id': 'size'})
    .sort_values('charge_id', ascending=False))
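The count above is only meaningful if `charge_id` and `charge_description` pair up one-to-one. A small check for both directions, as a sketch; `one_to_one` is a hypothetical helper name.

```python
import pandas as pd


def one_to_one(df, col_a, col_b):
    """True when each col_a value pairs with one col_b value, and vice versa."""
    a_to_b = df.groupby(col_a)[col_b].nunique().le(1).all()
    b_to_a = df.groupby(col_b)[col_a].nunique().le(1).all()
    return bool(a_to_b and b_to_a)
```

Here that would be `one_to_one(chargesDF_noice, 'charge_id', 'charge_description')`.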

| charge_description | charge_id |
|---|---|
| DRIVING WHILE INTOXICATED 3RD OR MORE | 13 |
| POSS CS PG1<1G (FS) | 6 |
| DRIVING WHILE INTOXICATED | 5 |
| AGG ASSLT W/DEADLY WEAPON | 5 |
| INDECENCY W/CHILD SEXUAL CONTACT | 3 |
| INTERFER W/EMERGENCY CALL (MA) | 3 |
| MOTION TO REVOKE PROBATION | 3 |
| TRAFFIC OFFENSE MULTIPLE | 3 |
| DRIVING WHILE INTOXICATED BAC>=0.15 | 3 |
| EVADING ARREST DETENTION (MA) | 2 |


In [16]:
# this gives me a matrix with the number of charges at each level for each person
# in other words, joe had 2 felonies and one misdemeanor
# idea is to find those with just misdemeanors
chargesDF_matrix = (chargesDF_noice
                    .groupby(['booking_id', 'level'])
                    .size()
                    .to_frame('charges'))

chargesDF_matrix
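The result above is really a long table with one row per (booking_id, level) pair. If an actual person-by-level matrix is easier to scan, `unstack` pivots the `level` index into columns and fills missing combinations with 0. `level_matrix` is a hypothetical name; `chargesDF_matrix['charges'].unstack(fill_value=0)` gets the same thing from the existing frame.

```python
import pandas as pd


def level_matrix(charges_df):
    """One row per booking_id, one column per level; cells count charges."""
    return (charges_df
            .groupby(['booking_id', 'level'])
            .size()
            .unstack(fill_value=0))
```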

| booking_id | level | charges |
|---|---|---|
| 1541823 | F2 | 1 |
| 1601778 | F3 | 3 |
| 1601778 | FS | 3 |
| 1601778 | MB | 1 |
| 1621712 | F3 | 1 |
| 1624441 | F3 | 1 |
| 1624441 | FS | 1 |
| 1624441 | MA | 1 |
| 1625127 | F2 | 3 |
| 1625127 | F3 | 1 |


In [22]:
# filter the multiindex on its second level ('level')
# This gets me everyone with an F2 charge, but it's
# not quite what I want (the goal is people with only misdemeanors).
chargesDF_matrix[chargesDF_matrix.index.get_level_values('level') == 'F2']

| booking_id | level | charges |
|---|---|---|
| 1541823 | F2 | 1 |
| 1625127 | F2 | 3 |
| 1628855 | F2 | 1 |
| 1628918 | F2 | 2 |
| 1629071 | F2 | 1 |
| 1640430 | F2 | 1 |
| 1642527 | F2 | 2 |
| 1643228 | F2 | 1 |
| 1643408 | F2 | 2 |


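A sketch toward the misdemeanor question, under a loudly stated assumption: level codes starting with `M` (like `MA`, `MB`) are misdemeanors and everything else (`F2`, `F3`, `FS`) is a felony. `LEVEL_ORDER` guesses a most-to-least-serious ranking; only some of those codes appear in the output above, so check it against the county's code list before relying on it. Run it on `chargesDF_noice`, since the `F*` detainer rows would otherwise disqualify everyone. The function names are hypothetical.

```python
import pandas as pd

# ASSUMPTION: Texas-style level codes, most serious first. Only F2, F3, FS,
# MA and MB show up in the output above; F1 and MC are guesses.
LEVEL_ORDER = ['F1', 'F2', 'F3', 'FS', 'MA', 'MB', 'MC']
LEVEL_RANK = {code: rank for rank, code in enumerate(LEVEL_ORDER)}


def misdemeanor_only(charges_df):
    """Sorted booking_ids whose every charge has a level starting with 'M'."""
    is_misd = charges_df['level'].str.startswith('M')
    all_misd = is_misd.groupby(charges_df['booking_id']).all()
    return sorted(all_misd[all_misd].index)


def most_serious_level(charges_df):
    """Most serious level per booking_id; codes missing from LEVEL_ORDER are ignored."""
    rank = charges_df['level'].map(LEVEL_RANK)  # unknown codes become NaN
    best = rank.groupby(charges_df['booking_id']).min()  # min skips NaN
    return best.map(lambda r: LEVEL_ORDER[int(r)] if pd.notna(r) else None)
```

`misdemeanor_only(chargesDF_noice)` answers "held only on misdemeanors?", and `most_serious_level(chargesDF_noice).value_counts()` shows how the people spread across their top charge level, which gets at the "lowest charge" question.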