# What is Jupyter Notebook?

#### Notebook document
Notebook documents (or “notebooks”, all lower case) are documents produced by the Jupyter Notebook App, which contain both computer code (e.g. Python) and rich text elements (paragraphs, equations, figures, links, etc.). Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc.) and executable documents which can be run to perform data analysis.

#### Jupyter Notebook App
The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser. The Jupyter Notebook App can be executed on a local desktop requiring no internet access (as described in this document) or can be installed on a remote server and accessed through the internet.

In addition to displaying, editing and running notebook documents, the Jupyter Notebook App has a “Dashboard” (Notebook Dashboard), a “control panel” that shows local files and lets you open notebook documents or shut down their kernels.

#### Kernel
A notebook kernel is a “computational engine” that executes the code contained in a notebook document. The IPython kernel, referenced in this guide, executes Python code. Kernels for many other languages exist (official kernels).

When you open a notebook document, the associated kernel is automatically launched. When the notebook is executed (either cell-by-cell or with menu Cell -> Run All), the kernel performs the computation and produces the results. Depending on the type of computations, the kernel may consume significant CPU and RAM. Note that the RAM is not released until the kernel is shut down.

#### Notebook Dashboard
The Notebook Dashboard is the component which is shown first when you launch the Jupyter Notebook App. The Notebook Dashboard is mainly used to open notebook documents and to manage the running kernels (view them and shut them down).

The Notebook Dashboard has other features similar to a file manager, namely navigating folders and renaming/deleting files.

# What is Pandas?

Pandas is a package commonly used for data analysis. It simplifies the loading of data from external sources such as text files and databases, and provides ways of analysing and manipulating data once it is loaded into your computer. The features provided in pandas automate and simplify many common tasks that would take many lines of code to write in the basic Python language.

#### Pandas can work with the following
* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet.
* Ordered and unordered (not necessarily fixed-frequency) time series data.
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.
* Any other form of observational / statistical data sets. The data actually need not be labelled at all to be placed into a pandas data structure.

#### Pandas Data Structures
##### Series
* A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data.
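
As a minimal sketch of the description above, the following builds a Series first from data alone and then with explicit labels. The values and labels are made up for illustration.

```python
import pandas as pd

# A Series built from data alone gets a default integer index (0, 1, 2, ...).
counts = pd.Series([4, 7, -5, 3])
print(counts.index.tolist())

# Supplying an index attaches a label to each value.
labelled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(labelled["a"])
```

Looking a value up by label (`labelled["a"]`) rather than by position is what distinguishes a Series from a plain array.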

##### DataFrame
* A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
* The DataFrame has both a row and column index; it can be thought of as a dict of Series, all sharing the same index.
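
The "dict of Series" view can be sketched as follows. The population figures are the ones that appear later in this notebook; the region names are illustrative assumptions.

```python
import pandas as pd

# Two Series that share the same index (state abbreviations).
population = pd.Series({"AL": 4858979, "AK": 738432})
region = pd.Series({"AL": "South", "AK": "West"})  # illustrative values

# A dict of Series becomes a DataFrame: the keys are the column names
# and the shared index becomes the row index.
states = pd.DataFrame({"TOTPOP": population, "REGION": region})

print(states.loc["AK", "TOTPOP"])
print(states.dtypes)
```

Each column keeps its own type here: `TOTPOP` is numeric while `REGION` holds strings.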

# What is Requests?

Requests is an Apache 2.0 licensed HTTP library, written in Python. It is designed to make HTTP simple for humans to use from Python code. This means you don’t have to manually add query strings to URLs, or form-encode your POST data. Don’t worry if that made no sense to you. It will in due time.

#### What can Requests do?

Requests will allow you to send HTTP/1.1 requests using Python. With it, you can add content like headers, form data, multipart files, and parameters using plain Python data structures such as dictionaries. It also allows you to access the response data from Python in the same way.
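
As a small sketch of what "no manual query strings" means, the block below builds a request without sending it, so it runs offline. The URL and parameters are those of the BJS page used later in this notebook; the header value is a hypothetical placeholder.

```python
import requests

# Build (but do not send) a GET request.
request = requests.Request(
    "GET",
    "https://www.bjs.gov/index.cfm",
    params={"ty": "pbdetail", "iid": 6187},
    headers={"User-Agent": "notebook-example"},  # hypothetical header value
)
prepared = request.prepare()

# Requests has encoded the params dict into the query string.
print(prepared.url)
```

In everyday use `requests.get(url, params=...)` performs the same encoding and sends the request in one call.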

In programming, a library is a collection or pre-configured selection of routines, functions, and operations that a program can use. These elements are often referred to as modules, and stored in object format.

# What is BeautifulSoup?

Beautiful Soup is a third-party Python library from Crummy.

The library is designed for quick turnaround projects like screen-scraping.

#### What can Beautiful Soup do?
Beautiful Soup parses anything you give it and does the tree traversal for you. You can use it to:

* Find all the links of a website.
* Find all the links whose URLs match "foo.com".
* Find the table heading that has bold text, then return that text.
* Find every "a" element that has an href attribute.
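
The tasks listed above can be sketched on a small, made-up HTML fragment, which stands in for a downloaded page so that the example runs offline.

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment used only for illustration.
html = """
<table><tr><th>Count</th><th><b>Year</b></th></tr></table>
<a href="http://foo.com/one">one</a>
<a href="http://bar.com/two">two</a>
<a name="anchor-only">no link</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Every "a" element that has an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

# Only the links whose URL mentions foo.com.
foo_links = [href for href in links if "foo.com" in href]

# The text of the first table heading that contains bold text.
bold_heading = next(
    th.find("b").get_text() for th in soup.find_all("th") if th.find("b")
)

print(links, foo_links, bold_heading)
```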

# Loading Libraries

In [208]:
import pandas as pd
import requests
import zipfile
from bs4 import BeautifulSoup
from urllib.request import urlopen

## Data downloaded from the source website using web scraping

We scrape the source website for the data: urlopen fetches the page, Beautiful Soup parses it, and Requests downloads the data file.

In [209]:
url = "https://www.bjs.gov/index.cfm?ty=pbdetail&iid=6187"
page = urlopen(url)

# Parse the downloaded HTML into a BeautifulSoup tree
soup = BeautifulSoup(page, "html.parser")

In [210]:
# Find all paragraph tags
all_paragraphs = soup.find_all('p')

In [211]:
# Iterating through all the paragraph tags 

i = 0
for paragraph in all_paragraphs:
    i += 1
    print(paragraph)
    print("paragraph number %d" % (i))
    print("*********************************************************************************")
    print(" ")

<p>E. Ann Carson, Ph.D., <em>BJS Statistician</em></p>
paragraph number 1
*********************************************************************************
 
<p>January 9, 2018    NCJ 251149</p>
paragraph number 2
*********************************************************************************
 
<p><p>Presents final counts of prisoners under the jurisdiction of state and federal correctional authorities at year-end 2016, including admissions, releases, noncitizen inmates, and inmates age 17 or younger. The report describes prisoner populations by jurisdiction, most serious offense, and demographic characteristics. Selected findings on prison capacity and prisoners held in private prisons, local jails, the U.S. military, and U.S. territories are also included. Findings are based on data from BJS's National Prisoner Statistics program, which collects data from state departments of correction and the Federal Bureau of Prisons.</p></p>
paragraph number 3
**********************************

In [212]:
# Get the 7th paragraph of the HTML page and append the href of each 'a' tag to a list
i = 0
links = []
for paragraph in all_paragraphs:
    i += 1
    if i == 7:
        for a in paragraph.find_all('a'):
            print(a['href'])
            links.append(a['href'])

# Download the actual source data from the website            
downloadString = "https://www.bjs.gov" + links[3]
r = requests.get(downloadString,allow_redirects=True)

# Save the downloaded archive to disk as downloads.zip (write() returns the number of bytes written)
open('downloads.zip', 'wb').write(r.content)

/content/pub/press/p16pr.cfm
/content/pub/pdf/p16_sum.pdf
/content/pub/pdf/p16.pdf
/content/pub/sheets/p16.zip


40773
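
The download URL above is built by string concatenation and the zip link is picked by its position in the list. One alternative, sketched here under the assumption that the links stay site-relative, uses `urllib.parse.urljoin` from the standard library and selects the archive by its extension. The links are copied from the output above so that the sketch runs offline.

```python
from urllib.parse import urljoin

base_url = "https://www.bjs.gov/index.cfm?ty=pbdetail&iid=6187"

# The relative links printed above.
links = [
    "/content/pub/press/p16pr.cfm",
    "/content/pub/pdf/p16_sum.pdf",
    "/content/pub/pdf/p16.pdf",
    "/content/pub/sheets/p16.zip",
]

# Pick the zip archive by its extension rather than by list position.
zip_link = next(link for link in links if link.endswith(".zip"))

# urljoin resolves the site-relative path against the page URL.
download_url = urljoin(base_url, zip_link)
print(download_url)
```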

In [213]:
# Unzip the downloaded zip file into the constituent files
zip_ref = zipfile.ZipFile("downloads.zip", 'r')
zip_ref.extractall('LoveDataWeekFilesScraping')
zip_ref.close()
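
An equivalent way to write the extraction is with a `with` statement, which closes the archive even if extraction fails. The sketch below builds a tiny archive in memory (a hypothetical one-file zip) so that it does not depend on the download.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Build a tiny archive in memory so the sketch does not need a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "a,b\n1,2\n")  # hypothetical file

# The with-statement closes the archive automatically.
with tempfile.TemporaryDirectory() as target_dir:
    with zipfile.ZipFile(buffer, "r") as archive:
        names = archive.namelist()
        archive.extractall(target_dir)
    extracted = sorted(p.name for p in Path(target_dir).iterdir())

print(names, extracted)
```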

# From this point onwards we will be working with the already downloaded and relatively cleaned data

## Working with downloaded files

In [224]:
# List all files in the folder LoveDataWeekFiles
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir('LoveDataWeekFiles') if isfile(join('LoveDataWeekFiles', f))]

In [227]:
# fileData here is a list of dataframes, in the order the files are listed
fileData = []
for file in onlyfiles:
    path = join('LoveDataWeekFiles', file)
    if file.endswith('.csv'):
        print(file)
        fileData.append(pd.read_csv(path, encoding='latin1'))
    elif file.endswith('.tsv'):
        print(file)
        # Tab-separated files need an explicit separator
        fileData.append(pd.read_csv(path, encoding='latin1', sep='\t'))

1860-slavery-data.csv
2015-jaildata.csv
popnumb.csv
Prison-Data.tsv
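
Because `listdir` returns files in arbitrary order, positions in `fileData` can differ between machines. A possible variation, sketched below with a hypothetical helper `load_tables` and throwaway files, keys each dataframe by its file name instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Map each supported extension to the separator pd.read_csv should use.
SEPARATORS = {".csv": ",", ".tsv": "\t"}

def load_tables(folder):
    """Return {file name: DataFrame} for every .csv/.tsv file in folder."""
    tables = {}
    for path in sorted(Path(folder).iterdir()):
        sep = SEPARATORS.get(path.suffix.lower())
        if path.is_file() and sep is not None:
            tables[path.name] = pd.read_csv(path, sep=sep, encoding="latin1")
    return tables

# Demonstrate on throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.csv").write_text("x,y\n1,2\n")
    Path(folder, "b.tsv").write_text("x\ty\n3\t4\n")
    Path(folder, "notes.txt").write_text("ignored")
    tables = load_tables(folder)

print(list(tables))
```

With this layout a dataset is fetched as `tables["popnumb.csv"]` rather than by a numeric position.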


####  Details of the data downloaded
* fileData[0] = 1860-slavery-data.csv
* fileData[1] = 2015-jaildata.csv
* fileData[2] = popnumb.csv
* fileData[3] = Prison-Data.tsv

#### Working with slavery data

* <b>Selecting specific subset of columns from the slavery dataframe</b>

In [220]:
slaveryRaw = fileData[0].loc[:,['Column 1','Column 2', 'Column 3','Column 5','Column 373','Column 38','Column 39', 'Column 40','Column 71', 'Column 72', 'Column 73','Column 104','Column 105','Column 106','Column 243','Column 244','Column 245','Column 246','Column 247','Column 248','Column 249','Column 250','Column 251','Column 252','Column 253','Column 254','Column 255','Column 256','Column 257','Column 258','Column 259','Column 260','Column 261','Column 262','Column 263','Column 264','Column 265','Column 275','Column 276']]
slaveryRaw.head()

Unnamed: 0,Column 1,Column 2,Column 3,Column 5,Column 373,Column 38,Column 39,Column 40,Column 71,Column 72,...,Column 258,Column 259,Column 260,Column 261,Column 262,Column 263,Column 264,Column 265,Column 275,Column 276
0,S,860,1,CONNECTICUT,0,221851,229653,451504,4136,4491,...,-1,-1,-1,-1,-1,-1,-1,-1,94831,460147
1,C,860,1,FAIRFIELD,10,36614,39186,75800,790,886,...,-1,-1,-1,-1,-1,-1,-1,-1,16102,77476
2,C,860,1,HARTFORD,30,43766,44877,88643,671,648,...,-1,-1,-1,-1,-1,-1,-1,-1,17927,89962
3,C,860,1,LITCHFIELD,50,23001,23206,46207,577,534,...,-1,-1,-1,-1,-1,-1,-1,-1,9701,47318
4,C,860,1,MIDDLESEX,70,14771,15751,30522,153,184,...,-1,-1,-1,-1,-1,-1,-1,-1,7068,30859


* <b>Renaming the columns for better understanding and ease of working</b>

In [222]:
slaveryRaw.columns = ['CountyState','Year','StateCode','NameStateCounty','CountyID','WhiteMales','WhiteFemales','AggrWhites','FreeColoredMales','FreeColoredFemales','AggrColored','MaleSlaves','FemaleSlaves','AggrSlaves','OneSlave','TwoSlaves','ThreeSlaves','FourSlaves','FiveSlaves','SixSlaves','SevenSlaves','EigthSlaves','NineSlaves','TenFourSlaves','FifteenNineSlaves','TwentyNineSlaves','ThirtyNineSlaves','FortyNineSlaves','FiftySixNineSlaves','SeventyNineSlaves','OneHundSlaves','TwoHundSlaves','ThreeHundSlaves','OneThouSlaves','GreaterSlaves','TotalSlaveholders','TotalSlaves','Families','TotalFree']
slaveryRaw.head()

Unnamed: 0,CountyState,Year,StateCode,NameStateCounty,CountyID,WhiteMales,WhiteFemales,AggrWhites,FreeColoredMales,FreeColoredFemales,...,SeventyNineSlaves,OneHundSlaves,TwoHundSlaves,ThreeHundSlaves,OneThouSlaves,GreaterSlaves,TotalSlaveholders,TotalSlaves,Families,TotalFree
0,S,860,1,CONNECTICUT,0,221851,229653,451504,4136,4491,...,-1,-1,-1,-1,-1,-1,-1,-1,94831,460147
1,C,860,1,FAIRFIELD,10,36614,39186,75800,790,886,...,-1,-1,-1,-1,-1,-1,-1,-1,16102,77476
2,C,860,1,HARTFORD,30,43766,44877,88643,671,648,...,-1,-1,-1,-1,-1,-1,-1,-1,17927,89962
3,C,860,1,LITCHFIELD,50,23001,23206,46207,577,534,...,-1,-1,-1,-1,-1,-1,-1,-1,9701,47318
4,C,860,1,MIDDLESEX,70,14771,15751,30522,153,184,...,-1,-1,-1,-1,-1,-1,-1,-1,7068,30859
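
The selection and renaming steps above use two long lists that must stay in the same order. A possible alternative, sketched on a made-up frame with the same style of generic headers, drives both steps from a single mapping.

```python
import pandas as pd

# A made-up frame that mimics the generic "Column N" headers.
raw = pd.DataFrame({
    "Column 1": ["S", "C"],
    "Column 2": [860, 860],
    "Column 4": ["unused", "unused"],
    "Column 5": ["CONNECTICUT", "FAIRFIELD"],
})

# One mapping drives both the selection and the renaming, so the two
# steps cannot drift out of step with each other.
names = {
    "Column 1": "CountyState",
    "Column 2": "Year",
    "Column 5": "NameStateCounty",
}
subset = raw[list(names)].rename(columns=names)
print(subset.columns.tolist())
```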


#### Removing leading and trailing whitespace from the state and county names

In [411]:
slaveryRaw['NameStateCounty'] = slaveryRaw['NameStateCounty'].str.strip()
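
A short illustration of what `.str.strip()` does, using made-up padded names:

```python
import pandas as pd

names = pd.Series(["  CONNECTICUT ", "FAIRFIELD   ", "NEW HAMPSHIRE"])

# .str.strip() trims whitespace at both ends and leaves inner spaces alone.
cleaned = names.str.strip()
print(cleaned.tolist())
```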

* <b>Performing a sanity check on the data</b>

In [223]:
slaveryRaw[(slaveryRaw['CountyState'] == 'S')]

Unnamed: 0,CountyState,Year,StateCode,NameStateCounty,CountyID,WhiteMales,WhiteFemales,AggrWhites,FreeColoredMales,FreeColoredFemales,...,SeventyNineSlaves,OneHundSlaves,TwoHundSlaves,ThreeHundSlaves,OneThouSlaves,GreaterSlaves,TotalSlaveholders,TotalSlaves,Families,TotalFree
0,S,860,1,CONNECTICUT,0,221851,229653,451504,4136,4491,...,-1,-1,-1,-1,-1,-1,-1,-1,94831,460147
17,S,860,2,MAINE,0,316527,310420,626947,659,668,...,-1,-1,-1,-1,-1,-1,-1,-1,120863,628279
34,S,860,3,MASSACHUSETTS,0,592231,629201,1221432,4469,5133,...,-1,-1,-1,-1,-1,-1,-1,-1,251287,1231066
48,S,860,4,NEW HAMPSHIRE,0,159563,166016,325579,253,241,...,-1,-1,-1,-1,-1,-1,-1,-1,69018,326073
56,S,860,5,RHODE ISLAND,0,82294,88355,170649,1831,2121,...,-1,-1,-1,-1,-1,-1,-1,-1,35209,174620
69,S,860,6,VERMONT,0,158406,155963,314369,371,338,...,-1,-1,-1,-1,-1,-1,-1,-1,63781,315098
73,S,860,11,DELAWARE,0,45940,44649,90589,9889,9940,...,0,0,0,0,0,0,587,1798,18966,110418
91,S,860,12,NEW JERSEY,0,322733,323966,646699,12312,13006,...,-1,-1,-1,-1,-1,-1,-1,-1,130348,672017
128,S,860,13,NEW YORK,0,1910279,1921311,3831590,23178,25827,...,-1,-1,-1,-1,-1,-1,-1,-1,758420,3880735
207,S,860,14,PENNSYLVANIA,0,1427943,1421316,2849259,26473,30476,...,-1,-1,-1,-1,-1,-1,-1,-1,524558,2906215


#### Replacing the negative placeholder values (-1, -2, -8, -9) in all columns with 0

In [405]:
for column in slaveryRaw:
    # Set every occurrence of the negative codes to 0
    slaveryRaw.loc[slaveryRaw[column].isin([-1, -2, -8, -9]), column] = 0
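
The loop above can also be written as a single call. The sketch below uses `DataFrame.replace` on a small made-up frame; whether 0 is the right stand-in for these codes is an analysis decision that this example does not settle.

```python
import pandas as pd

frame = pd.DataFrame({
    "NameStateCounty": ["CONNECTICUT", "DELAWARE"],
    "TotalSlaves": [-1, 1798],
    "Families": [94831, 18966],
})

# The codes that the loop above sets to 0.
SENTINELS = [-1, -2, -8, -9]

# DataFrame.replace swaps every listed value for 0 across all columns
# and returns a new frame.
cleaned = frame.replace(SENTINELS, 0)
print(cleaned["TotalSlaves"].tolist())
```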

In [407]:
print(slaveryRaw.columns)
slaveryRaw.head()

Index(['CountyState', 'Year', 'StateCode', 'NameStateCounty', 'CountyID',
       'WhiteMales', 'WhiteFemales', 'AggrWhites', 'FreeColoredMales',
       'FreeColoredFemales', 'AggrColored', 'MaleSlaves', 'FemaleSlaves',
       'AggrSlaves', 'OneSlave', 'TwoSlaves', 'ThreeSlaves', 'FourSlaves',
       'FiveSlaves', 'SixSlaves', 'SevenSlaves', 'EigthSlaves', 'NineSlaves',
       'TenFourSlaves', 'FifteenNineSlaves', 'TwentyNineSlaves',
       'ThirtyNineSlaves', 'FortyNineSlaves', 'FiftySixNineSlaves',
       'SeventyNineSlaves', 'OneHundSlaves', 'TwoHundSlaves',
       'ThreeHundSlaves', 'OneThouSlaves', 'GreaterSlaves',
       'TotalSlaveholders', 'TotalSlaves', 'Families', 'TotalFree'],
      dtype='object')


Unnamed: 0,CountyState,Year,StateCode,NameStateCounty,CountyID,WhiteMales,WhiteFemales,AggrWhites,FreeColoredMales,FreeColoredFemales,...,SeventyNineSlaves,OneHundSlaves,TwoHundSlaves,ThreeHundSlaves,OneThouSlaves,GreaterSlaves,TotalSlaveholders,TotalSlaves,Families,TotalFree
0,S,860,1,CONNECTICUT,0,221851,229653,451504,4136,4491,...,0,0,0,0,0,0,0,0,94831,460147
1,C,860,1,FAIRFIELD,10,36614,39186,75800,790,886,...,0,0,0,0,0,0,0,0,16102,77476
2,C,860,1,HARTFORD,30,43766,44877,88643,671,648,...,0,0,0,0,0,0,0,0,17927,89962
3,C,860,1,LITCHFIELD,50,23001,23206,46207,577,534,...,0,0,0,0,0,0,0,0,9701,47318
4,C,860,1,MIDDLESEX,70,14771,15751,30522,153,184,...,0,0,0,0,0,0,0,0,7068,30859


#### Working with general population data

In [229]:
populationData = fileData[2]
populationData.head()

Unnamed: 0,ID,STATE,STATECODE,TOTPOP
0,0400000US01,Alabama,AL,4858979
1,0400000US02,Alaska,AK,738432
2,0400000US04,Arizona,AZ,6828065
3,0400000US05,Arkansas,AR,2978204
4,0400000US06,California,CA,39144818


* <b>Converting the State column values into uppercase</b>

In [231]:
populationData['STATE'] = populationData['STATE'].str.upper()
populationData.head()

Unnamed: 0,ID,STATE,STATECODE,TOTPOP
0,0400000US01,ALABAMA,AL,4858979
1,0400000US02,ALASKA,AK,738432
2,0400000US04,ARIZONA,AZ,6828065
3,0400000US05,ARKANSAS,AR,2978204
4,0400000US06,CALIFORNIA,CA,39144818


#### Working with prison data

In [235]:
prisonDataRaw = fileData[3]
prisonDataRaw.head()

Unnamed: 0,YEAR,STATEID,STATE,REGION,CUSGT1M,CUSGT1F,CUSLT1M,CUSLT1F,CUSUNSM,CUSUNSF,...,DTHHOMOM,DTHHOMOF,DTHPERSM,DTHPERSF,DTHOTHM,DTHOTHF,DTHTOTM,DTHTOTF,HANDLEM,HANDLEF
0,1978,1,AL,3,-2,-2,-2,-2,-2,-2,...,-1,-1,6,0,0,0,-1,-1,-9,-9
1,1978,2,AK,4,-2,-2,-2,-2,-2,-2,...,-1,-1,0,0,0,0,-1,-1,-9,-9
2,1978,4,AZ,4,-2,-2,-2,-2,-2,-2,...,-1,-1,5,0,0,0,-1,-1,-9,-9
3,1978,5,AR,3,-2,-2,-2,-2,-2,-2,...,-1,-1,0,0,0,0,-1,-1,-9,-9
4,1978,6,CA,4,-2,-2,-2,-2,-2,-2,...,-1,-1,0,0,43,1,-1,-1,-9,-9


#### Getting a subset of the data which is of use to us

In [262]:
prisonDataSubset = prisonDataRaw.loc[:,['YEAR','STATE','STATEID','REGION','CUSLT18F','CUSLT18M','BLACK','BLACKM','BLACKF','HISP','HISPM','HISPF','WHITE','WHITEM','WHITEF','ASIAN','ASIANM','ASIANF','AIAN','AIANM','AIANF','NHOPI','NHPIM','NHPIF','OTHERRACE','ADDRACEM','ADDRACEF','TWORACE','TWORACEM','TWORACEF','RACEDK','UNKRACEM','UNKRACEF','NONCITZ','CUSCTZNM','CUSCTZNF','ADMIS','ADTOTM','ADTOTF','CONFPOP','CUSTOTM','CUSTOTF','RELEASE','RLTOTF','RLTOTM','CONV','CUSUNSM','CUSUNSF']]
print(prisonDataSubset.columns)
prisonDataSubset.head()

Index(['YEAR', 'STATE', 'STATEID', 'REGION', 'CUSLT18F', 'CUSLT18M', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RLTOTF', 'RLTOTM', 'CONV',
       'CUSUNSM', 'CUSUNSF'],
      dtype='object')


Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)


Unnamed: 0,YEAR,STATE,STATEID,REGION,CUSLT18F,CUSLT18M,BLACK,BLACKM,BLACKF,HISP,...,ADTOTF,CONFPOP,CUSTOTM,CUSTOTF,RELEASE,RLTOTF,RLTOTM,CONV,CUSUNSM,CUSUNSF
0,1978,AL,1,3,-1,-1,,3116,158,,...,184,,-2,-2,,161,2823,,-2,-2
1,1978,AK,2,4,-1,-1,,176,9,,...,10,,-2,-2,,9,313,,-2,-2
2,1978,AZ,4,4,-1,-1,,641,42,,...,134,,-2,-2,,141,1546,,-2,-2
3,1978,AR,5,3,-1,-1,,1304,49,,...,107,,-2,-2,,107,1801,,-2,-2
4,1978,CA,6,4,-1,-1,,6743,379,,...,725,,-2,-2,,549,9658,,-2,-2
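
The warning above appears because some requested labels (for example BLACK, whose values show as empty in the output) are not columns of the raw data. A minimal sketch of the `.reindex()` alternative that the warning suggests, on a made-up one-row frame:

```python
import pandas as pd

raw = pd.DataFrame({"YEAR": [1978], "STATE": ["AL"], "BLACKM": [3116]})

# "BLACK" is not a column of raw. reindex adds it, filled with NaN,
# instead of relying on the deprecated .loc behaviour.
wanted = ["YEAR", "STATE", "BLACK", "BLACKM"]
subset = raw.reindex(columns=wanted)

# Listing the missing labels makes the gap explicit.
missing = [name for name in wanted if name not in raw.columns]
print(missing)
```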


#### Adding new columns to help us match the data with the jail data for future use

In [263]:
prisonDataSubset['RTIID'] = 'NA'
prisonDataSubset['NAME'] = 'NA'
prisonDataSubset['CITY'] = 'NA'
prisonDataSubset['ZIP'] = 'NA'
prisonDataSubset['CNTYCODE'] = 'NA'
prisonDataSubset['FELONY'] = 'NA'
prisonDataSubset['MISD'] = 'NA'
prisonDataSubset['UNCONV'] = 'NA'
prisonDataSubset['WEEKN'] = 'NA'
prisonDataSubset['STATEABBR'] = 'NA'
print(prisonDataSubset.columns)

Index(['YEAR', 'STATE', 'STATEID', 'REGION', 'CUSLT18F', 'CUSLT18M', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RLTOTF', 'RLTOTM', 'CONV',
       'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP', 'CNTYCODE',
       'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'STATEABBR'],
      dtype='object')
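
The same placeholder columns can be added in one step. This sketch uses `DataFrame.assign` with a shortened, illustrative list of column names on a made-up frame.

```python
import pandas as pd

frame = pd.DataFrame({"YEAR": [1978, 1979], "STATE": ["AL", "AK"]})

# Columns that exist only in the other dataset (shortened list).
placeholder_columns = ["RTIID", "NAME", "CITY"]

# assign returns a new frame, so the original is left untouched.
padded = frame.assign(**{name: "NA" for name in placeholder_columns})
print(padded.columns.tolist())
```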


#### Renaming a few columns to make them match with the jail data

In [264]:
prisonDataSubset = prisonDataSubset.rename(columns={'RLTOTF' : 'RELEASEF','RLTOTM' : 'RELEASEM'})
prisonDataSubset = prisonDataSubset.rename(columns={'CUSLT18F' : 'JUVF', 'CUSLT18M' : 'JUVM'})
print(prisonDataSubset.columns)
prisonDataSubset.head()

Index(['YEAR', 'STATE', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK', 'BLACKM',
       'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM', 'WHITEF',
       'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF', 'NHOPI', 'NHPIM',
       'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF', 'TWORACE', 'TWORACEM',
       'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF', 'NONCITZ', 'CUSCTZNM',
       'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF', 'CONFPOP', 'CUSTOTM',
       'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM', 'CONV', 'CUSUNSM',
       'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP', 'CNTYCODE', 'FELONY', 'MISD',
       'UNCONV', 'WEEKN', 'STATEABBR'],
      dtype='object')


Unnamed: 0,YEAR,STATE,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,RTIID,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATEABBR
0,1978,AL,1,3,-1,-1,,3116,158,,...,,,,,,,,,,
1,1978,AK,2,4,-1,-1,,176,9,,...,,,,,,,,,,
2,1978,AZ,4,4,-1,-1,,641,42,,...,,,,,,,,,,
3,1978,AR,5,3,-1,-1,,1304,49,,...,,,,,,,,,,
4,1978,CA,6,4,-1,-1,,6743,379,,...,,,,,,,,,,


#### Setting the STATEABBR column equal to STATE

In [265]:
prisonDataSubset['STATEABBR'] = prisonDataSubset['STATE']
prisonDataSubset.head()

Unnamed: 0,YEAR,STATE,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,RTIID,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATEABBR
0,1978,AL,1,3,-1,-1,,3116,158,,...,,,,,,,,,,AL
1,1978,AK,2,4,-1,-1,,176,9,,...,,,,,,,,,,AK
2,1978,AZ,4,4,-1,-1,,641,42,,...,,,,,,,,,,AZ
3,1978,AR,5,3,-1,-1,,1304,49,,...,,,,,,,,,,AR
4,1978,CA,6,4,-1,-1,,6743,379,,...,,,,,,,,,,CA


#### We can notice that the STATE column does not contain the complete name of the states. We would like to update this information. In order to do that, we will perform a join between the populationData and prisonDataSubset by joining on columns STATEABBR and STATECODE.

##### In order to find the respective column names, we add suffixes to the names of the dataframe columns
* prisonDataSubset columns will contain the suffix _prison
* populationData columns will contain the suffix _population

##### Note the syntax of the join command:
* set_index() - This function turns a column into the index of a dataframe. join() matches rows on the index, so each dataframe needs its key column set as the index first.

* Since we set 'STATEABBR' and 'STATECODE' as indices, they are removed from the columns of their dataframes and serve instead as the index of the resulting dataframe.

In [266]:
prisonDataSubset = prisonDataSubset.set_index('STATEABBR').join(populationData.set_index('STATECODE'),lsuffix='_prison', rsuffix='_population')
print(prisonDataSubset.columns)
prisonDataSubset.head()

Index(['YEAR', 'STATE_prison', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM',
       'CONV', 'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP',
       'CNTYCODE', 'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'ID',
       'STATE_population', 'TOTPOP'],
      dtype='object')


Unnamed: 0,YEAR,STATE_prison,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,ID,STATE_population,TOTPOP
AK,1978,AK,2,4,-1,-1,,176,9,,...,,,,,,,,0400000US02,ALASKA,738432.0
AK,1979,AK,2,4,-1,-1,,81,7,,...,,,,,,,,0400000US02,ALASKA,738432.0
AK,1980,AK,2,4,-1,-1,,89,5,,...,,,,,,,,0400000US02,ALASKA,738432.0
AK,1981,AK,2,4,-1,-1,,126,13,,...,,,,,,,,0400000US02,ALASKA,738432.0
AK,1982,AK,2,4,-1,-1,,91,3,,...,,,,,,,,0400000US02,ALASKA,738432.0
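
The join can be reproduced on two tiny made-up frames to see how the suffixes and the index behave. Both frames have a STATE column, so the suffixes are what keep the two apart.

```python
import pandas as pd

prison = pd.DataFrame({
    "YEAR": [1978, 1979],
    "STATE": ["AK", "AK"],
    "STATEABBR": ["AK", "AK"],
})
population = pd.DataFrame({
    "STATE": ["ALASKA", "ALABAMA"],
    "STATECODE": ["AK", "AL"],
})

# join() defaults to a left join: every prison row is kept and the
# matching population row is attached by index value.
joined = prison.set_index("STATEABBR").join(
    population.set_index("STATECODE"),
    lsuffix="_prison",
    rsuffix="_population",
)
print(joined.columns.tolist())
print(joined["STATE_population"].tolist())
```

The key columns no longer appear among the columns; their values ("AK") form the index of `joined`.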


#### Removing extra columns from the resulting dataframe

In [267]:
prisonDataSubset = prisonDataSubset.drop(columns=['ID','TOTPOP'])
print(prisonDataSubset.columns)
prisonDataSubset.head()

Index(['YEAR', 'STATE_prison', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM',
       'CONV', 'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP',
       'CNTYCODE', 'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'STATE_population'],
      dtype='object')


Unnamed: 0,YEAR,STATE_prison,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,RTIID,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATE_population
AK,1978,AK,2,4,-1,-1,,176,9,,...,,,,,,,,,,ALASKA
AK,1979,AK,2,4,-1,-1,,81,7,,...,,,,,,,,,,ALASKA
AK,1980,AK,2,4,-1,-1,,89,5,,...,,,,,,,,,,ALASKA
AK,1981,AK,2,4,-1,-1,,126,13,,...,,,,,,,,,,ALASKA
AK,1982,AK,2,4,-1,-1,,91,3,,...,,,,,,,,,,ALASKA


#### Since we have the needed information in the dataframe, we can rename the columns

In [283]:
prisonDataSubset = prisonDataSubset.rename(columns = {'STATE_prison':'STATEABBR', 'STATE_population':'STATE'})
prisonDataSubset = prisonDataSubset.reset_index()
#prisonDataSubset = prisonDataSubset.drop(columns= ['index'])
print(prisonDataSubset.columns)
prisonDataSubset.head()

Index(['YEAR', 'STATEABBR', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM',
       'CONV', 'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP',
       'CNTYCODE', 'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'STATE'],
      dtype='object')


Unnamed: 0,YEAR,STATEABBR,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,RTIID,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATE
0,1978,AK,2,4,-1,-1,,176,9,,...,,,,,,,,,,ALASKA
1,1979,AK,2,4,-1,-1,,81,7,,...,,,,,,,,,,ALASKA
2,1980,AK,2,4,-1,-1,,89,5,,...,,,,,,,,,,ALASKA
3,1981,AK,2,4,-1,-1,,126,13,,...,,,,,,,,,,ALASKA
4,1982,AK,2,4,-1,-1,,91,3,,...,,,,,,,,,,ALASKA


#### Working with jail data
* Loading the data into an individual dataframe instead of a list of dataframes

In [271]:
jailDataRaw = fileData[1]
jailDataRaw.drop(columns='Unnamed: 0')

Unnamed: 0,RTIID,GID,JURISID,COUNTY,NAME,YEAR,CITY,STATE,ZIP,STATEFIPS,...,CORRSTAFFF,CORRSTAFFF_FLAG,OTHERSTAFF,OTHERSTAFF_FLAG,OTHERSTAFFM,OTHERSTAFFM_FLAG,OTHERSTAFFF,OTHERSTAFFF_FLAG,TOTALSTAFF,TOTALSTAFF_FLAG
0,10956003,011002002061000000000,11002002,Baldwin County ...,Baldwin County Sheriff's Office ...,2015,Bay Minette,(01) Alabama,36507,1,...,31.0,(0) Reported,28.0,(0) Reported,3.0,(0) Reported,25.0,(0) Reported,119.0,(0) Reported
1,10956005,011004004061000000000,11004004,Bibb County ...,Bibb County Sheriffs Department ...,2015,Brent,(01) Alabama,35034,1,...,6.0,(1) Estimated by respondent,0.0,(0) Reported,0.0,(0) Reported,0.0,(0) Reported,15.0,(1) Estimated by respondent
2,10956016,011015015061000000000,11015015,Cleburne County ...,Cleburne County Sheriffs Office ...,2015,Heflin,(01) Alabama,36264,1,...,5.0,(0) Reported,1.0,(0) Reported,1.0,(0) Reported,0.0,(0) Reported,16.0,(0) Reported
3,10956023,011022022061000000000,11022022,Cullman County ...,Cullman County Sheriffs Office ...,2015,Cullman,(01) Alabama,35055,1,...,8.0,(0) Reported,15.0,(0) Reported,8.0,(0) Reported,7.0,(0) Reported,42.0,(0) Reported
4,10956029,011028028061000000000,11028028,Etowah County ...,Etowah County Sheriffs Office ...,2015,Gadsden,(01) Alabama,35901,1,...,11.0,(1) Estimated by respondent,23.0,(1) Estimated by respondent,15.0,(1) Estimated by respondent,8.0,(1) Estimated by respondent,84.0,(1) Estimated by respondent
5,10956030,011029029061000000000,11029029,Fayette County ...,Fayette County Sheriffs Office ...,2015,Fayette,(01) Alabama,35555,1,...,6.0,(0) Reported,1.0,(0) Reported,1.0,(0) Reported,0.0,(0) Reported,12.0,(0) Reported
6,10956036,011035035061000000000,11035035,"Dale County, Henry County, Houston County ...",Houston County Sheriffs Office ...,2015,Dothan,(01) Alabama,36303,1,...,21.0,(0) Reported,11.0,(0) Reported,4.0,(0) Reported,7.0,(0) Reported,74.0,(0) Reported
7,10950037,011037037060000000100,11037037,"Jefferson County, Shelby County ...",Jefferson County Sheriff's Office ...,2015,Birmingham,(01) Alabama,35203,1,...,40.0,(0) Reported,17.0,(0) Reported,3.0,(0) Reported,14.0,(0) Reported,165.0,(0) Reported
8,10956040,011040040061000000000,11040040,Lawrence County ...,Lawrence County Sheriffs Office ...,2015,Moulton,(01) Alabama,35650,1,...,2.0,(0) Reported,5.0,(1) Estimated by respondent,0.0,(0) Reported,5.0,(0) Reported,18.0,(0) Reported
9,10956041,011041041061000000000,11041041,Lee County ...,Lee County Sheriffs Office ...,2015,Opelika,(01) Alabama,36801,1,...,16.0,(0) Reported,13.0,(0) Reported,6.0,(0) Reported,7.0,(0) Reported,68.0,(0) Reported


#### Selecting a subset of the dataframe

In [272]:
jailDataSubset = jailDataRaw.loc[:,['YEAR','RTIID','NAME','CITY','STATE','STATEID','STATEABBR','ZIP','CNTYCODE','JUVF','JUVM','BLACK','HISP','WHITE','ASIAN','AIAN','NHOPI','OTHERRACE','TWORACE','RACEDK','NONCITZ','ADMIS','CONFPOP','RELEASE','RELEASEF','RELEASEM','CONV','FELONY','MISD','UNCONV','WEEKN']]
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATE', 'STATEID', 'STATEABBR', 'ZIP',
       'CNTYCODE', 'JUVF', 'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN',
       'NHOPI', 'OTHERRACE', 'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS',
       'CONFPOP', 'RELEASE', 'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD',
       'UNCONV', 'WEEKN'],
      dtype='object')


Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)


Unnamed: 0,YEAR,RTIID,NAME,CITY,STATE,STATEID,STATEABBR,ZIP,CNTYCODE,JUVF,...,ADMIS,CONFPOP,RELEASE,RELEASEF,RELEASEM,CONV,FELONY,MISD,UNCONV,WEEKN
0,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,(01) Alabama,,,36507,1003,0,...,7718,456,7834,2116,5718,49,187.0,269.0,407,
1,2015,10956005,Bibb County Sheriffs Department ...,Brent,(01) Alabama,,,35034,1007,0,...,1224,85,1224,282,942,40,85.0,0.0,45,
2,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,(01) Alabama,,,36264,1029,0,...,731,59,727,182,545,10,40.0,19.0,49,
3,2015,10956023,Cullman County Sheriffs Office ...,Cullman,(01) Alabama,,,35055,1043,0,...,4418,325,4327,865,3462,210,141.0,95.0,115,5.0
4,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,(01) Alabama,,,35901,1055,0,...,6180,634,6270,653,5617,66,220.0,165.0,568,


#### Adding columns to the dataframe to make it compatible with the prison dataframe

In [273]:
jailDataSubset['REGION']='NA'
jailDataSubset['BLACKM']='NA'
jailDataSubset['BLACKF']='NA'
jailDataSubset['HISPM']='NA'
jailDataSubset['HISPF']='NA'
jailDataSubset['WHITEM']='NA'
jailDataSubset['WHITEF']='NA'
jailDataSubset['ASIANM']='NA'
jailDataSubset['ASIANF']='NA'
jailDataSubset['AIANM']='NA'
jailDataSubset['AIANF']='NA'
jailDataSubset['NHPIM']='NA'
jailDataSubset['NHPIF']='NA'
jailDataSubset['ADDRACEM']='NA'
jailDataSubset['ADDRACEF']='NA'
jailDataSubset['TWORACEM']='NA'
jailDataSubset['TWORACEF']='NA'
jailDataSubset['UNKRACEM']='NA'
jailDataSubset['UNKRACEF']='NA'
jailDataSubset['CUSCTZNM']='NA'
jailDataSubset['CUSCTZNF']='NA'
jailDataSubset['ADTOTM']='NA'
jailDataSubset['ADTOTF']='NA'
jailDataSubset['CUSTOTM']='NA'
jailDataSubset['CUSTOTF']='NA'
jailDataSubset['CUSUNSM']='NA'
jailDataSubset['CUSUNSF']='NA'
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATE', 'STATEID', 'STATEABBR', 'ZIP',
       'CNTYCODE', 'JUVF', 'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN',
       'NHOPI', 'OTHERRACE', 'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS',
       'CONFPOP', 'RELEASE', 'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD',
       'UNCONV', 'WEEKN', 'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF',
       'WHITEM', 'WHITEF', 'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM',
       'NHPIF', 'ADDRACEM', 'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM',
       'UNKRACEF', 'CUSCTZNM', 'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM',
       'CUSTOTF', 'CUSUNSM', 'CUSUNSF'],
      dtype='object')


Unnamed: 0,YEAR,RTIID,NAME,CITY,STATE,STATEID,STATEABBR,ZIP,CNTYCODE,JUVF,...,UNKRACEM,UNKRACEF,CUSCTZNM,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF
0,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,(01) Alabama,,,36507,1003,0,...,,,,,,,,,,
1,2015,10956005,Bibb County Sheriffs Department ...,Brent,(01) Alabama,,,35034,1007,0,...,,,,,,,,,,
2,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,(01) Alabama,,,36264,1029,0,...,,,,,,,,,,
3,2015,10956023,Cullman County Sheriffs Office ...,Cullman,(01) Alabama,,,35055,1043,0,...,,,,,,,,,,
4,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,(01) Alabama,,,35901,1055,0,...,,,,,,,,,,
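
Another way to add the missing columns is to derive them from the other dataframe's column list instead of typing each name. The sketch below is only an illustration: `jail_demo` and `prison_demo` are small made-up frames, not the workshop data, and the `'NA'` placeholder mirrors the cell above.

```python
import pandas as pd

# Toy stand-ins for the jail and prison dataframes (made-up values).
jail_demo = pd.DataFrame({"YEAR": [2015, 2015], "BLACK": [53, 24]})
prison_demo = pd.DataFrame({"YEAR": [1978], "BLACKM": [176], "BLACKF": [9]})

# Columns that exist in the prison frame but not in the jail frame.
missing = [c for c in prison_demo.columns if c not in jail_demo.columns]

# Keep the jail columns, append the missing ones, and fill them with 'NA'.
jail_aligned = jail_demo.reindex(
    columns=list(jail_demo.columns) + missing, fill_value="NA"
)
print(jail_aligned)
```

Deriving the list this way keeps the two frames in step if the prison columns change later.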


### Data Cleaning steps

The STATE column contains both the numeric state code and the state name, for example "(01) Alabama". We want to separate them into independent columns. This step will help us use the data efficiently later.

In [274]:
# Split at the first space only; expand=True returns the two parts as separate columns
jailDataSubset[['A', 'B']] = jailDataSubset['STATE'].str.split(' ', n=1, expand=True)
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATE', 'STATEID', 'STATEABBR', 'ZIP',
       'CNTYCODE', 'JUVF', 'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN',
       'NHOPI', 'OTHERRACE', 'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS',
       'CONFPOP', 'RELEASE', 'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD',
       'UNCONV', 'WEEKN', 'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF',
       'WHITEM', 'WHITEF', 'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM',
       'NHPIF', 'ADDRACEM', 'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM',
       'UNKRACEF', 'CUSCTZNM', 'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM',
       'CUSTOTF', 'CUSUNSM', 'CUSUNSF', 'A', 'B'],
      dtype='object')


Unnamed: 0,YEAR,RTIID,NAME,CITY,STATE,STATEID,STATEABBR,ZIP,CNTYCODE,JUVF,...,CUSCTZNM,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,A,B
0,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,(01) Alabama,,,36507,1003,0,...,,,,,,,,,(01),Alabama
1,2015,10956005,Bibb County Sheriffs Department ...,Brent,(01) Alabama,,,35034,1007,0,...,,,,,,,,,(01),Alabama
2,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,(01) Alabama,,,36264,1029,0,...,,,,,,,,,(01),Alabama
3,2015,10956023,Cullman County Sheriffs Office ...,Cullman,(01) Alabama,,,35055,1043,0,...,,,,,,,,,(01),Alabama
4,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,(01) Alabama,,,35901,1055,0,...,,,,,,,,,(01),Alabama
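
To see what the split does to a state whose name has more than one word, it can be tried on a tiny series first. This is a sketch on made-up values (the `state` series is hypothetical, not read from the jail data).

```python
import pandas as pd

# Two sample values in the same "(code) name" layout as the STATE column.
state = pd.Series(["(01) Alabama", "(37) North Carolina"])

# n=1 splits only at the first space, so multi-word names stay intact.
parts = state.str.split(" ", n=1, expand=True)
parts.columns = ["A", "B"]
print(parts)
```

Without the limit of one split, "North Carolina" would be broken into two pieces and the result would have three columns.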


#### Dropping the STATE and STATEID columns to make sure we can use the new columns A and B instead.

In [275]:
jailDataSubset = jailDataSubset.drop(columns=['STATEID','STATE'])
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATEABBR', 'ZIP', 'CNTYCODE', 'JUVF',
       'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN', 'NHOPI', 'OTHERRACE',
       'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS', 'CONFPOP', 'RELEASE',
       'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD', 'UNCONV', 'WEEKN',
       'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF', 'WHITEM', 'WHITEF',
       'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM', 'NHPIF', 'ADDRACEM',
       'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM', 'UNKRACEF', 'CUSCTZNM',
       'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM', 'CUSTOTF', 'CUSUNSM',
       'CUSUNSF', 'A', 'B'],
      dtype='object')


Unnamed: 0,YEAR,RTIID,NAME,CITY,STATEABBR,ZIP,CNTYCODE,JUVF,JUVM,BLACK,...,CUSCTZNM,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,A,B
0,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,,36507,1003,0,0,53,...,,,,,,,,,(01),Alabama
1,2015,10956005,Bibb County Sheriffs Department ...,Brent,,35034,1007,0,0,24,...,,,,,,,,,(01),Alabama
2,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,,36264,1029,0,0,9,...,,,,,,,,,(01),Alabama
3,2015,10956023,Cullman County Sheriffs Office ...,Cullman,,35055,1043,0,0,37,...,,,,,,,,,(01),Alabama
4,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,,35901,1055,0,2,150,...,,,,,,,,,(01),Alabama


#### Rename A and B to STATEID and STATE respectively. Also convert the STATE column values to uppercase.

In [276]:
jailDataSubset = jailDataSubset.rename(columns= {'A':'STATEID', 'B':'STATE'})
jailDataSubset['STATE'] = jailDataSubset['STATE'].str.upper()
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATEABBR', 'ZIP', 'CNTYCODE', 'JUVF',
       'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN', 'NHOPI', 'OTHERRACE',
       'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS', 'CONFPOP', 'RELEASE',
       'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD', 'UNCONV', 'WEEKN',
       'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF', 'WHITEM', 'WHITEF',
       'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM', 'NHPIF', 'ADDRACEM',
       'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM', 'UNKRACEF', 'CUSCTZNM',
       'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM', 'CUSTOTF', 'CUSUNSM',
       'CUSUNSF', 'STATEID', 'STATE'],
      dtype='object')


Unnamed: 0,YEAR,RTIID,NAME,CITY,STATEABBR,ZIP,CNTYCODE,JUVF,JUVM,BLACK,...,CUSCTZNM,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,STATEID,STATE
0,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,,36507,1003,0,0,53,...,,,,,,,,,(01),ALABAMA
1,2015,10956005,Bibb County Sheriffs Department ...,Brent,,35034,1007,0,0,24,...,,,,,,,,,(01),ALABAMA
2,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,,36264,1029,0,0,9,...,,,,,,,,,(01),ALABAMA
3,2015,10956023,Cullman County Sheriffs Office ...,Cullman,,35055,1043,0,0,37,...,,,,,,,,,(01),ALABAMA
4,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,,35901,1055,0,2,150,...,,,,,,,,,(01),ALABAMA


#### Removing the extra characters from STATEID column and converting it into integer values

In [277]:
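# regex=False treats the parentheses as literal characters rather than regex syntax
jailDataSubset['STATEID'] = jailDataSubset['STATEID'].str.replace("(", '', regex=False)
jailDataSubset['STATEID'] = jailDataSubset['STATEID'].str.replace(")", '', regex=False)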
jailDataSubset['STATEID'] = pd.to_numeric(jailDataSubset['STATEID'])
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATEABBR', 'ZIP', 'CNTYCODE', 'JUVF',
       'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN', 'NHOPI', 'OTHERRACE',
       'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS', 'CONFPOP', 'RELEASE',
       'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD', 'UNCONV', 'WEEKN',
       'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF', 'WHITEM', 'WHITEF',
       'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM', 'NHPIF', 'ADDRACEM',
       'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM', 'UNKRACEF', 'CUSCTZNM',
       'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM', 'CUSTOTF', 'CUSUNSM',
       'CUSUNSF', 'STATEID', 'STATE'],
      dtype='object')


Unnamed: 0,YEAR,RTIID,NAME,CITY,STATEABBR,ZIP,CNTYCODE,JUVF,JUVM,BLACK,...,CUSCTZNM,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,STATEID,STATE
0,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,,36507,1003,0,0,53,...,,,,,,,,,1,ALABAMA
1,2015,10956005,Bibb County Sheriffs Department ...,Brent,,35034,1007,0,0,24,...,,,,,,,,,1,ALABAMA
2,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,,36264,1029,0,0,9,...,,,,,,,,,1,ALABAMA
3,2015,10956023,Cullman County Sheriffs Office ...,Cullman,,35055,1043,0,0,37,...,,,,,,,,,1,ALABAMA
4,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,,35901,1055,0,2,150,...,,,,,,,,,1,ALABAMA
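
The split, rename, clean-up and conversion steps can also be collapsed into a single pattern match. The following is a minimal sketch, assuming every value follows the "(digits) name" layout seen above; the `state` series is made up for illustration.

```python
import pandas as pd

state = pd.Series(["(01) Alabama", "(37) North Carolina"])

# Named groups become column names; \d+ captures the digits inside the parentheses.
extracted = state.str.extract(r"\((?P<STATEID>\d+)\)\s+(?P<STATE>.+)")
extracted["STATEID"] = pd.to_numeric(extracted["STATEID"])
extracted["STATE"] = extracted["STATE"].str.upper()
print(extracted)
```

Rows that do not match the pattern come back as missing values, which makes unexpected formats easy to spot.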


#### In order to make the jail dataframe similar to the prison dataframe, we need to add a STATECODE column. We can do that by joining the jail dataframe with the population dataframe on STATE.

In [278]:
jailDataSubset = jailDataSubset.set_index('STATE').join(populationData.set_index('STATE'),lsuffix='_jail', rsuffix='_population')
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['YEAR', 'RTIID', 'NAME', 'CITY', 'STATEABBR', 'ZIP', 'CNTYCODE', 'JUVF',
       'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN', 'NHOPI', 'OTHERRACE',
       'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS', 'CONFPOP', 'RELEASE',
       'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD', 'UNCONV', 'WEEKN',
       'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF', 'WHITEM', 'WHITEF',
       'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM', 'NHPIF', 'ADDRACEM',
       'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM', 'UNKRACEF', 'CUSCTZNM',
       'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM', 'CUSTOTF', 'CUSUNSM',
       'CUSUNSF', 'STATEID', 'ID', 'STATECODE', 'TOTPOP'],
      dtype='object')


STATE,YEAR,RTIID,NAME,CITY,STATEABBR,ZIP,CNTYCODE,JUVF,JUVM,BLACK,...,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,STATEID,ID,STATECODE,TOTPOP
ALABAMA,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,,36507,1003,0,0,53,...,,,,,,,1,0400000US01,AL,4858979
ALABAMA,2015,10956005,Bibb County Sheriffs Department ...,Brent,,35034,1007,0,0,24,...,,,,,,,1,0400000US01,AL,4858979
ALABAMA,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,,36264,1029,0,0,9,...,,,,,,,1,0400000US01,AL,4858979
ALABAMA,2015,10956023,Cullman County Sheriffs Office ...,Cullman,,35055,1043,0,0,37,...,,,,,,,1,0400000US01,AL,4858979
ALABAMA,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,,35901,1055,0,2,150,...,,,,,,,1,0400000US01,AL,4858979
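
The same lookup can be written with `merge`, which can also check that each state appears only once in the lookup table. This is a sketch on made-up frames (`jail_demo` and `population_demo` are hypothetical); only the STATE and STATECODE column names come from the notebook.

```python
import pandas as pd

jail_demo = pd.DataFrame(
    {"STATE": ["ALABAMA", "ALABAMA", "ALASKA"], "CONFPOP": [10, 20, 30]}
)
population_demo = pd.DataFrame(
    {"STATE": ["ALABAMA", "ALASKA"], "STATECODE": ["AL", "AK"]}
)

# how="left" keeps every jail row; validate raises an error if a state
# appears more than once in the population table.
merged = jail_demo.merge(
    population_demo, on="STATE", how="left", validate="many_to_one"
)
print(merged)
```

With `merge`, STATE stays a regular column, so no index reset is needed afterwards.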


#### Removing extra columns from the resulting dataframe and resetting the index. Resetting restores the default numeric index and turns STATE back into a regular column, which we need later.

In [279]:
jailDataSubset = jailDataSubset.drop(columns= ['STATEABBR','ID','TOTPOP'])
jailDataSubset = jailDataSubset.reset_index()
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['STATE', 'YEAR', 'RTIID', 'NAME', 'CITY', 'ZIP', 'CNTYCODE', 'JUVF',
       'JUVM', 'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN', 'NHOPI', 'OTHERRACE',
       'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS', 'CONFPOP', 'RELEASE',
       'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD', 'UNCONV', 'WEEKN',
       'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF', 'WHITEM', 'WHITEF',
       'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM', 'NHPIF', 'ADDRACEM',
       'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM', 'UNKRACEF', 'CUSCTZNM',
       'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM', 'CUSTOTF', 'CUSUNSM',
       'CUSUNSF', 'STATEID', 'STATECODE'],
      dtype='object')


Unnamed: 0,STATE,YEAR,RTIID,NAME,CITY,ZIP,CNTYCODE,JUVF,JUVM,BLACK,...,CUSCTZNM,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,STATEID,STATECODE
0,ALABAMA,2015,10956003,Baldwin County Sheriff's Office ...,Bay Minette,36507,1003,0,0,53,...,,,,,,,,,1,AL
1,ALABAMA,2015,10956005,Bibb County Sheriffs Department ...,Brent,35034,1007,0,0,24,...,,,,,,,,,1,AL
2,ALABAMA,2015,10956016,Cleburne County Sheriffs Office ...,Heflin,36264,1029,0,0,9,...,,,,,,,,,1,AL
3,ALABAMA,2015,10956023,Cullman County Sheriffs Office ...,Cullman,35055,1043,0,0,37,...,,,,,,,,,1,AL
4,ALABAMA,2015,10956029,Etowah County Sheriffs Office ...,Gadsden,35901,1055,0,2,150,...,,,,,,,,,1,AL
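
Note that pandas has two similarly named methods, `reset_index` and `reindex`, which do different things. The short sketch below contrasts them on a made-up frame.

```python
import pandas as pd

demo = pd.DataFrame(
    {"CONFPOP": [10, 20]},
    index=pd.Index(["ALABAMA", "ALASKA"], name="STATE"),
)

# reset_index moves STATE back into a column and installs a 0..n-1 index.
as_column = demo.reset_index()

# reindex keeps STATE as the index and conforms the rows to the given labels;
# labels that were not present (TEXAS here) get missing values.
conformed = demo.reindex(["ALASKA", "TEXAS"])

print(as_column)
print(conformed)
```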


#### Cleaning the county name column using split and renaming the STATECODE column to STATEABBR

In [280]:
# Keep the first word of NAME in A and the remainder in B
jailDataSubset[['A', 'B']] = jailDataSubset['NAME'].str.split(' ', n=1, expand=True)
jailDataSubset = jailDataSubset.drop(columns=['B','NAME'])
jailDataSubset = jailDataSubset.rename(columns={'A':'NAME'})
jailDataSubset['NAME'] = jailDataSubset['NAME'].str.upper()
imprisonment['NAME'] = imprisonment['NAME'].str.strip()
jailDataSubset = jailDataSubset.rename(columns={'STATECODE':'STATEABBR'})
print(jailDataSubset.columns)
jailDataSubset.head()

Index(['STATE', 'YEAR', 'RTIID', 'CITY', 'ZIP', 'CNTYCODE', 'JUVF', 'JUVM',
       'BLACK', 'HISP', 'WHITE', 'ASIAN', 'AIAN', 'NHOPI', 'OTHERRACE',
       'TWORACE', 'RACEDK', 'NONCITZ', 'ADMIS', 'CONFPOP', 'RELEASE',
       'RELEASEF', 'RELEASEM', 'CONV', 'FELONY', 'MISD', 'UNCONV', 'WEEKN',
       'REGION', 'BLACKM', 'BLACKF', 'HISPM', 'HISPF', 'WHITEM', 'WHITEF',
       'ASIANM', 'ASIANF', 'AIANM', 'AIANF', 'NHPIM', 'NHPIF', 'ADDRACEM',
       'ADDRACEF', 'TWORACEM', 'TWORACEF', 'UNKRACEM', 'UNKRACEF', 'CUSCTZNM',
       'CUSCTZNF', 'ADTOTM', 'ADTOTF', 'CUSTOTM', 'CUSTOTF', 'CUSUNSM',
       'CUSUNSF', 'STATEID', 'STATEABBR', 'NAME'],
      dtype='object')


Unnamed: 0,STATE,YEAR,RTIID,CITY,ZIP,CNTYCODE,JUVF,JUVM,BLACK,HISP,...,CUSCTZNF,ADTOTM,ADTOTF,CUSTOTM,CUSTOTF,CUSUNSM,CUSUNSF,STATEID,STATEABBR,NAME
0,ALABAMA,2015,10956003,Bay Minette,36507,1003,0,0,53,67,...,,,,,,,,1,AL,BALDWIN
1,ALABAMA,2015,10956005,Brent,35034,1007,0,0,24,0,...,,,,,,,,1,AL,BIBB
2,ALABAMA,2015,10956016,Heflin,36264,1029,0,0,9,2,...,,,,,,,,1,AL,CLEBURNE
3,ALABAMA,2015,10956023,Cullman,35055,1043,0,0,37,19,...,,,,,,,,1,AL,CULLMAN
4,ALABAMA,2015,10956029,Gadsden,35901,1055,0,2,150,80,...,,,,,,,,1,AL,ETOWAH
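
Taking the first word works for the single-word county names shown above, but it would shorten a multi-word name. The sketch below compares it with keeping everything before the word "County". The second sample name is made up for illustration, and the pattern assumes the word "County" appears in the name.

```python
import pandas as pd

name = pd.Series(["Baldwin County Sheriff's Office", "St. Clair County Jail"])

# First word only, as in the cell above.
first_word = name.str.split(" ", n=1).str[0].str.upper()

# Alternative: keep everything before the word "County".
before_county = name.str.extract(r"^(.*?)\s+County", expand=False).str.upper()

print(first_word.tolist())
print(before_county.tolist())
```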


## Appending the jail dataset and prison dataset

Before we append the two dataframes, we add an identification column that records which dataset each row came from.

In [285]:
jailDataSubset['CODE'] = 'JAIL'
prisonDataSubset['CODE'] = 'PRISON'

In [329]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
imprisonment = pd.concat([prisonDataSubset, jailDataSubset], sort=False)
print(imprisonment.columns)
imprisonment.head()

Index(['YEAR', 'STATEABBR', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM',
       'CONV', 'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP',
       'CNTYCODE', 'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'STATE', 'CODE'],
      dtype='object')


Unnamed: 0,YEAR,STATEABBR,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATE,CODE
0,1978,AK,2,4,-1,-1,,176,9,,...,,,,,,,,,ALASKA,PRISON
1,1979,AK,2,4,-1,-1,,81,7,,...,,,,,,,,,ALASKA,PRISON
2,1980,AK,2,4,-1,-1,,89,5,,...,,,,,,,,,ALASKA,PRISON
3,1981,AK,2,4,-1,-1,,126,13,,...,,,,,,,,,ALASKA,PRISON
4,1982,AK,2,4,-1,-1,,91,3,,...,,,,,,,,,ALASKA,PRISON
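
The effect of stacking two frames with different columns is easier to see on a small example. This sketch uses made-up frames; unlike the cell above it also passes `ignore_index=True`, which is an optional choice that renumbers the rows.

```python
import pandas as pd

prison_demo = pd.DataFrame({"YEAR": [1978], "CUSTOTM": [176]})
jail_demo = pd.DataFrame({"YEAR": [2015], "CONFPOP": [53]})

# Tag each frame so the source of every row survives the stacking.
prison_demo["CODE"] = "PRISON"
jail_demo["CODE"] = "JAIL"

# Columns present in only one frame are filled with NaN in the other's rows;
# ignore_index=True renumbers the rows 0..n-1 instead of repeating labels.
combined = pd.concat([prison_demo, jail_demo], sort=False, ignore_index=True)
print(combined)
```

Without `ignore_index=True`, both frames keep their own row labels, so labels such as 0 appear once per source frame.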


## Cleaning the data from the dataframe

We notice that many columns contain negative placeholder values (-1, -2, -8 and -9). We clean them up by replacing these values with zeros.

In [330]:
# Replace the placeholder codes with 0 in every column
for column in imprisonment:
    imprisonment.loc[imprisonment[column].isin([-1, -2, -8, -9]), column] = 0

In [331]:
print(imprisonment.columns)
imprisonment.head()

Index(['YEAR', 'STATEABBR', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM',
       'CONV', 'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP',
       'CNTYCODE', 'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'STATE', 'CODE'],
      dtype='object')


Unnamed: 0,YEAR,STATEABBR,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATE,CODE
0,1978,AK,2,4,0,0,,176,9,,...,,,,,,,,,ALASKA,PRISON
1,1979,AK,2,4,0,0,,81,7,,...,,,,,,,,,ALASKA,PRISON
2,1980,AK,2,4,0,0,,89,5,,...,,,,,,,,,ALASKA,PRISON
3,1981,AK,2,4,0,0,,126,13,,...,,,,,,,,,ALASKA,PRISON
4,1982,AK,2,4,0,0,,91,3,,...,,,,,,,,,ALASKA,PRISON
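
Whether to replace the placeholder codes with zeros or with missing values depends on what is computed afterwards. The sketch below shows both on a made-up frame; treating the four codes as "no data" markers is an assumption, since the notebook does not say what they mean.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {"JUVF": [-1, 3, -9], "JUVM": [-2, -8, 5], "STATEABBR": ["AK", "AK", "AL"]}
)
sentinels = [-1, -2, -8, -9]

# Zeros keep the columns numeric and sum cleanly, but they count in averages.
zero_filled = demo.replace(sentinels, 0)

# NaN marks the values as missing, so mean() skips them.
nan_filled = demo.replace(sentinels, np.nan)

print(zero_filled["JUVF"].mean())
print(nan_filled["JUVF"].mean())
```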


## Performing basic operations on the data

We can make simple calculations, such as adding two columns to fill in a third. Here CONFPOP for each prison row is set to CUSTOTM plus CUSTOTF.

In [332]:
# Select the prison rows once, then set CONFPOP to CUSTOTM plus CUSTOTF
isPrison = imprisonment['CODE'] == 'PRISON'
imprisonment.loc[isPrison, ['CONFPOP']] = imprisonment.loc[isPrison, 'CUSTOTM'] + imprisonment.loc[isPrison, 'CUSTOTF']

In [398]:
imprisonment.loc[imprisonment['CODE'] == 'PRISON',['CONFPOP','CUSTOTM','CUSTOTF','CODE']]


Unnamed: 0,CONFPOP,CUSTOTM,CUSTOTF,CODE
0,0.0,0,0,PRISON
1,0.0,0,0,PRISON
2,0.0,0,0,PRISON
3,0.0,0,0,PRISON
4,0.0,0,0,PRISON
5,1350.0,1284,66,PRISON
6,1698.0,1627,71,PRISON
7,1929.0,1826,103,PRISON
8,2447.0,2344,103,PRISON
9,2118.0,2022,96,PRISON
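
The same pattern, a boolean mask plus a row-wise sum, can be tried on a small frame. This is a sketch: the prison figures are copied from two rows of the output above, while the JAIL row is made up to show that unmasked rows are left alone.

```python
import pandas as pd

demo = pd.DataFrame({
    "CODE": ["PRISON", "PRISON", "JAIL"],
    "CUSTOTM": [1284, 1627, 0],
    "CUSTOTF": [66, 71, 0],
    "CONFPOP": [0, 0, 500],
})

# True for the rows that should be updated.
is_prison = demo["CODE"] == "PRISON"

# Sum the two columns across each selected row and write the result back.
demo.loc[is_prison, "CONFPOP"] = demo.loc[is_prison, ["CUSTOTM", "CUSTOTF"]].sum(axis=1)
print(demo)
```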


# Simple practice exercise
* Load the file named Testing.csv, located in the testing folder under LoveDataWeekFiles, into a dataframe called "practice"
* Extract just the first name of the county from the column County
* Perform a join with Population dataframe to get the StateAbbr and StateId values
* Rename the column TOTPOP (the total population of the state) to TotalPopulation
* Create a new column called StateCounty by concatenating the County first name and the State name

# Joining Imprisonment and Slavery dataframes

In [400]:
print(imprisonment.columns)
imprisonment.head()

Index(['YEAR', 'STATEABBR', 'STATEID', 'REGION', 'JUVF', 'JUVM', 'BLACK',
       'BLACKM', 'BLACKF', 'HISP', 'HISPM', 'HISPF', 'WHITE', 'WHITEM',
       'WHITEF', 'ASIAN', 'ASIANM', 'ASIANF', 'AIAN', 'AIANM', 'AIANF',
       'NHOPI', 'NHPIM', 'NHPIF', 'OTHERRACE', 'ADDRACEM', 'ADDRACEF',
       'TWORACE', 'TWORACEM', 'TWORACEF', 'RACEDK', 'UNKRACEM', 'UNKRACEF',
       'NONCITZ', 'CUSCTZNM', 'CUSCTZNF', 'ADMIS', 'ADTOTM', 'ADTOTF',
       'CONFPOP', 'CUSTOTM', 'CUSTOTF', 'RELEASE', 'RELEASEF', 'RELEASEM',
       'CONV', 'CUSUNSM', 'CUSUNSF', 'RTIID', 'NAME', 'CITY', 'ZIP',
       'CNTYCODE', 'FELONY', 'MISD', 'UNCONV', 'WEEKN', 'STATE', 'CODE'],
      dtype='object')


Unnamed: 0,YEAR,STATEABBR,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,HISP,...,NAME,CITY,ZIP,CNTYCODE,FELONY,MISD,UNCONV,WEEKN,STATE,CODE
0,1978,AK,2,4,0,0,,176,9,,...,,,,,,,,,ALASKA,PRISON
1,1979,AK,2,4,0,0,,81,7,,...,,,,,,,,,ALASKA,PRISON
2,1980,AK,2,4,0,0,,89,5,,...,,,,,,,,,ALASKA,PRISON
3,1981,AK,2,4,0,0,,126,13,,...,,,,,,,,,ALASKA,PRISON
4,1982,AK,2,4,0,0,,91,3,,...,,,,,,,,,ALASKA,PRISON


In [409]:
print(slaveryRaw.columns)
slaveryRaw.head()

Index(['CountyState', 'Year', 'StateCode', 'NameStateCounty', 'CountyID',
       'WhiteMales', 'WhiteFemales', 'AggrWhites', 'FreeColoredMales',
       'FreeColoredFemales', 'AggrColored', 'MaleSlaves', 'FemaleSlaves',
       'AggrSlaves', 'OneSlave', 'TwoSlaves', 'ThreeSlaves', 'FourSlaves',
       'FiveSlaves', 'SixSlaves', 'SevenSlaves', 'EigthSlaves', 'NineSlaves',
       'TenFourSlaves', 'FifteenNineSlaves', 'TwentyNineSlaves',
       'ThirtyNineSlaves', 'FortyNineSlaves', 'FiftySixNineSlaves',
       'SeventyNineSlaves', 'OneHundSlaves', 'TwoHundSlaves',
       'ThreeHundSlaves', 'OneThouSlaves', 'GreaterSlaves',
       'TotalSlaveholders', 'TotalSlaves', 'Families', 'TotalFree'],
      dtype='object')


Unnamed: 0,CountyState,Year,StateCode,NameStateCounty,CountyID,WhiteMales,WhiteFemales,AggrWhites,FreeColoredMales,FreeColoredFemales,...,SeventyNineSlaves,OneHundSlaves,TwoHundSlaves,ThreeHundSlaves,OneThouSlaves,GreaterSlaves,TotalSlaveholders,TotalSlaves,Families,TotalFree
0,S,860,1,CONNECTICUT,0,221851,229653,451504,4136,4491,...,0,0,0,0,0,0,0,0,94831,460147
1,C,860,1,FAIRFIELD,10,36614,39186,75800,790,886,...,0,0,0,0,0,0,0,0,16102,77476
2,C,860,1,HARTFORD,30,43766,44877,88643,671,648,...,0,0,0,0,0,0,0,0,17927,89962
3,C,860,1,LITCHFIELD,50,23001,23206,46207,577,534,...,0,0,0,0,0,0,0,0,9701,47318
4,C,860,1,MIDDLESEX,70,14771,15751,30522,153,184,...,0,0,0,0,0,0,0,0,7068,30859


In [420]:
final = imprisonment.set_index('NAME').join(slaveryRaw.set_index('NameStateCounty'),lsuffix='_imprison', rsuffix='_slavery', how='right')
final = final.reset_index()
final = final.rename(columns = {'index':'COUNTY'})
final.head()

Unnamed: 0,index,YEAR,STATEABBR,STATEID,REGION,JUVF,JUVM,BLACK,BLACKM,BLACKF,...,SeventyNineSlaves,OneHundSlaves,TwoHundSlaves,ThreeHundSlaves,OneThouSlaves,GreaterSlaves,TotalSlaveholders,TotalSlaves,Families,TotalFree
0,ABBEVILLE,,,,,,,,,,...,20,9,0,0,0,0,1467,20502,2244,11883
1,ACCOMACK,,,,,,,,,,...,0,0,0,0,0,0,773,4507,2892,14079
2,ADAIR,2015.0,KY,18.0,,0.0,0.0,20.0,,,...,0,0,0,0,0,0,0,0,188,984
3,ADAIR,2015.0,KY,18.0,,0.0,0.0,20.0,,,...,0,0,0,0,0,0,33,86,1475,8445
4,ADAIR,2015.0,KY,18.0,,0.0,0.0,20.0,,,...,0,0,0,0,0,0,331,1602,1382,7907
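
In the output above, ADAIR appears three times because the county name occurs more than once in the slavery data, and every occurrence is matched to the same jail row. The sketch below reproduces that behaviour on made-up frames with `merge`, adding the `indicator` column to show which rows found a match.

```python
import pandas as pd

left = pd.DataFrame({"NAME": ["ADAIR", "BIBB"], "STATEABBR": ["KY", "AL"]})
right = pd.DataFrame({
    "NameStateCounty": ["ADAIR", "ADAIR", "ABBEVILLE"],
    "TotalSlaves": [0, 86, 20502],
})

# how="right" keeps every row of the right frame; rows of the left frame
# without a partner (BIBB) are dropped, and repeated keys repeat the match.
joined = left.merge(
    right, left_on="NAME", right_on="NameStateCounty", how="right", indicator=True
)
print(joined)
```

If county names repeat across states, joining on the name together with a state column would be one way to keep counties from different states apart.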


In [435]:
final.to_csv('ImprisonSlaveryCombined.csv')
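
By default `to_csv` also writes the row index as an unnamed first column, which shows up as "Unnamed: 0" when the file is read back. The sketch below demonstrates this with in-memory buffers and a made-up frame instead of a real file.

```python
import io
import pandas as pd

demo = pd.DataFrame({"COUNTY": ["ADAIR", "BIBB"], "TotalSlaves": [86, 0]})

# Default: the index is written as an extra first column with no header.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out of the file.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```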