# U.S.A. Crimes in 2014 #

In [1]:
import pandas as pd
import numpy as np
import matplotlib as mtl
import matplotlib.pyplot as plt
import seaborn as sns
import os
import folium
import json
from branca.colormap import linear

pd.options.mode.chained_assignment = None

First, **we read the original *dataset* and convert it into a *pandas dataframe*.**

In [2]:
crimes2014 = os.path.join('data', 'Table_5_Crime_in_the_United_States_by_State_2014.xls')
data1 = pd.read_excel(crimes2014)
data1

Unnamed: 0,State,Area,Unnamed: 2,Population,Violent crime,Murder and nonnegligent manslaughter,Rape (revised definition),Rape (legacy definition),Robbery,Aggravated assault,Property crime,Burglary,Larceny-theft,Motor vehicle theft
0,ALABAMA,Metropolitan Statistical Area,Area actually reporting,3692100,16204.0,232,1435.0,1033.0,4131.0,10406.0,118226.0,29903.0,80430.0,7893.0
1,ALABAMA,Cities outside metropolitan areas,Area actually reporting,529129,2605.0,17,294.0,214.0,363.0,1931.0,21747.0,5005.0,15516.0,1226.0
2,ALABAMA,Nonmetropolitan counties,Area actually reporting,628148,1247.0,22,199.0,133.0,86.0,940.0,8683.0,3357.0,4663.0,663.0
3,ALABAMA,State Total,Total,4849377,20727.0,276,2005.0,1436.0,4701.0,13745.0,154094.0,39715.0,104238.0,10141.0
4,ALASKA,Metropolitan Statistical Area,Area actually reporting,351408,2899.0,17,442.0,329.0,557.0,1883.0,13905.0,1626.0,11148.0,1131.0
5,ALASKA,Cities outside metropolitan areas,Area actually reporting,128681,722.0,3,128.0,93.0,45.0,546.0,3362.0,413.0,2702.0,247.0
6,ALASKA,Nonmetropolitan counties,Area actually reporting,256643,1037.0,21,196.0,128.0,25.0,795.0,2949.0,1097.0,1500.0,352.0
7,ALASKA,State Total,Total,736732,4684.0,41,771.0,555.0,629.0,3243.0,20334.0,3150.0,15445.0,1739.0
8,ARIZONA,Metropolitan Statistical Area,Area actually reporting,6382968,24006.0,288,2762.0,2024.0,6117.0,14839.0,201780.0,39504.0,146202.0,16074.0
9,ARIZONA,Cities outside metropolitan areas,Area actually reporting,123617,1721.0,20,452.0,323.0,55.0,1194.0,7020.0,2111.0,3996.0,913.0


In the *dataframe* there is a column with an uninformative name: "**Unnamed: 2**". We rename it to "**DetailArea**".

In [3]:
# Rename the column by label instead of mutating the column index in place
data1 = data1.rename(columns={'Unnamed: 2': 'DetailArea'})
data1

Unnamed: 0,State,Area,DetailArea,Population,Violent crime,Murder and nonnegligent manslaughter,Rape (revised definition),Rape (legacy definition),Robbery,Aggravated assault,Property crime,Burglary,Larceny-theft,Motor vehicle theft
0,ALABAMA,Metropolitan Statistical Area,Area actually reporting,3692100,16204.0,232,1435.0,1033.0,4131.0,10406.0,118226.0,29903.0,80430.0,7893.0
1,ALABAMA,Cities outside metropolitan areas,Area actually reporting,529129,2605.0,17,294.0,214.0,363.0,1931.0,21747.0,5005.0,15516.0,1226.0
2,ALABAMA,Nonmetropolitan counties,Area actually reporting,628148,1247.0,22,199.0,133.0,86.0,940.0,8683.0,3357.0,4663.0,663.0
3,ALABAMA,State Total,Total,4849377,20727.0,276,2005.0,1436.0,4701.0,13745.0,154094.0,39715.0,104238.0,10141.0
4,ALASKA,Metropolitan Statistical Area,Area actually reporting,351408,2899.0,17,442.0,329.0,557.0,1883.0,13905.0,1626.0,11148.0,1131.0
5,ALASKA,Cities outside metropolitan areas,Area actually reporting,128681,722.0,3,128.0,93.0,45.0,546.0,3362.0,413.0,2702.0,247.0
6,ALASKA,Nonmetropolitan counties,Area actually reporting,256643,1037.0,21,196.0,128.0,25.0,795.0,2949.0,1097.0,1500.0,352.0
7,ALASKA,State Total,Total,736732,4684.0,41,771.0,555.0,629.0,3243.0,20334.0,3150.0,15445.0,1739.0
8,ARIZONA,Metropolitan Statistical Area,Area actually reporting,6382968,24006.0,288,2762.0,2024.0,6117.0,14839.0,201780.0,39504.0,146202.0,16074.0
9,ARIZONA,Cities outside metropolitan areas,Area actually reporting,123617,1721.0,20,452.0,323.0,55.0,1194.0,7020.0,2111.0,3996.0,913.0


The *dataframe* contains rows that are useless for our purpose: those whose "**Area**" column holds the *string* "**State Total**" or "**Total**". We delete them from the original dataset.

In [4]:
data1 = data1[data1.Area != 'State Total']
data1 = data1[data1.Area != 'Total']
data1

Unnamed: 0,State,Area,DetailArea,Population,Violent crime,Murder and nonnegligent manslaughter,Rape (revised definition),Rape (legacy definition),Robbery,Aggravated assault,Property crime,Burglary,Larceny-theft,Motor vehicle theft
0,ALABAMA,Metropolitan Statistical Area,Area actually reporting,3692100,16204.0,232,1435.0,1033.0,4131.0,10406.0,118226.0,29903.0,80430.0,7893.0
1,ALABAMA,Cities outside metropolitan areas,Area actually reporting,529129,2605.0,17,294.0,214.0,363.0,1931.0,21747.0,5005.0,15516.0,1226.0
2,ALABAMA,Nonmetropolitan counties,Area actually reporting,628148,1247.0,22,199.0,133.0,86.0,940.0,8683.0,3357.0,4663.0,663.0
4,ALASKA,Metropolitan Statistical Area,Area actually reporting,351408,2899.0,17,442.0,329.0,557.0,1883.0,13905.0,1626.0,11148.0,1131.0
5,ALASKA,Cities outside metropolitan areas,Area actually reporting,128681,722.0,3,128.0,93.0,45.0,546.0,3362.0,413.0,2702.0,247.0
6,ALASKA,Nonmetropolitan counties,Area actually reporting,256643,1037.0,21,196.0,128.0,25.0,795.0,2949.0,1097.0,1500.0,352.0
8,ARIZONA,Metropolitan Statistical Area,Area actually reporting,6382968,24006.0,288,2762.0,2024.0,6117.0,14839.0,201780.0,39504.0,146202.0,16074.0
9,ARIZONA,Cities outside metropolitan areas,Area actually reporting,123617,1721.0,20,452.0,323.0,55.0,1194.0,7020.0,2111.0,3996.0,913.0
10,ARIZONA,Nonmetropolitan counties,Area actually reporting,224899,433.0,1,13.0,10.0,24.0,395.0,2074.0,836.0,1069.0,169.0
12,ARKANSAS,Metropolitan Statistical Area,Area actually reporting,1818380,9939.0,109,958.0,699.0,1671.0,7201.0,65236.0,15309.0,46042.0,3885.0
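
The two filters above can also be written as a single membership test. The following is only a sketch on a made-up three-row frame (a subset of the columns, with values copied from the tables above); the names `df`, `aggregate_labels` and `filtered` are illustrative and not part of the notebook.

```python
import pandas as pd

# Miniature frame: two ordinary rows and one aggregate "State Total" row.
df = pd.DataFrame({
    "State": ["ALABAMA", "ALABAMA", "ALASKA"],
    "Area": ["Metropolitan Statistical Area", "State Total", "Nonmetropolitan counties"],
    "Violent crime": [16204.0, 20727.0, 1037.0],
})

# Keep only the rows whose Area is not one of the aggregate labels.
aggregate_labels = ["State Total", "Total"]
filtered = df[~df["Area"].isin(aggregate_labels)]
print(filtered)
```

Dropping the aggregate rows matters because the later `groupby(...).sum()` would otherwise count each state's total twice.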


The values in the columns "**Population**" and "**Murder and nonnegligent manslaughter**" are stored as *objects*, so we convert them to *float*.

In [5]:
data1['Population'] = pd.to_numeric(data1['Population'], errors='coerce')
data1['Murder and nonnegligent manslaughter'] = pd.to_numeric(data1['Murder and nonnegligent manslaughter'], errors='coerce')
data1

Unnamed: 0,State,Area,DetailArea,Population,Violent crime,Murder and nonnegligent manslaughter,Rape (revised definition),Rape (legacy definition),Robbery,Aggravated assault,Property crime,Burglary,Larceny-theft,Motor vehicle theft
0,ALABAMA,Metropolitan Statistical Area,Area actually reporting,3692100.0,16204.0,232.0,1435.0,1033.0,4131.0,10406.0,118226.0,29903.0,80430.0,7893.0
1,ALABAMA,Cities outside metropolitan areas,Area actually reporting,529129.0,2605.0,17.0,294.0,214.0,363.0,1931.0,21747.0,5005.0,15516.0,1226.0
2,ALABAMA,Nonmetropolitan counties,Area actually reporting,628148.0,1247.0,22.0,199.0,133.0,86.0,940.0,8683.0,3357.0,4663.0,663.0
4,ALASKA,Metropolitan Statistical Area,Area actually reporting,351408.0,2899.0,17.0,442.0,329.0,557.0,1883.0,13905.0,1626.0,11148.0,1131.0
5,ALASKA,Cities outside metropolitan areas,Area actually reporting,128681.0,722.0,3.0,128.0,93.0,45.0,546.0,3362.0,413.0,2702.0,247.0
6,ALASKA,Nonmetropolitan counties,Area actually reporting,256643.0,1037.0,21.0,196.0,128.0,25.0,795.0,2949.0,1097.0,1500.0,352.0
8,ARIZONA,Metropolitan Statistical Area,Area actually reporting,6382968.0,24006.0,288.0,2762.0,2024.0,6117.0,14839.0,201780.0,39504.0,146202.0,16074.0
9,ARIZONA,Cities outside metropolitan areas,Area actually reporting,123617.0,1721.0,20.0,452.0,323.0,55.0,1194.0,7020.0,2111.0,3996.0,913.0
10,ARIZONA,Nonmetropolitan counties,Area actually reporting,224899.0,433.0,1.0,13.0,10.0,24.0,395.0,2074.0,836.0,1069.0,169.0
12,ARKANSAS,Metropolitan Statistical Area,Area actually reporting,1818380.0,9939.0,109.0,958.0,699.0,1671.0,7201.0,65236.0,15309.0,46042.0,3885.0
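
As a small illustration of what `errors='coerce'` does, the sketch below converts a made-up object column. The `"n/a"` cell is an assumption added for the example; the notebook does not show which cells of the real file fail to parse.

```python
import pandas as pd

# Object-typed column: two numeric strings and one made-up unparseable cell.
raw = pd.Series(["3692100", "529129", "n/a"], dtype=object)

# Unparseable values become NaN instead of raising an error.
clean = pd.to_numeric(raw, errors="coerce")

print(clean.dtype)         # a float dtype, because NaN cannot be stored as an integer
print(clean.isna().sum())  # number of cells that could not be parsed
```

A column that contains at least one coerced `NaN` ends up as float, which may be why the converted columns above display with a trailing `.0`.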


We load the **GeoJSON** file describing the geographical areas of the **U.S.** states.

In [6]:
us_states = os.path.join('data','us-states.json')

# Use a context manager so the file handle is closed after reading
with open(us_states) as f:
    USstates = json.load(f)
USstates

{'features': [{'geometry': {'coordinates': [[[-87.359296, 35.00118],
      [-85.606675, 34.984749],
      [-85.431413, 34.124869],
      [-85.184951, 32.859696],
      [-85.069935, 32.580372],
      [-84.960397, 32.421541],
      [-85.004212, 32.322956],
      [-84.889196, 32.262709],
      [-85.058981, 32.13674],
      [-85.053504, 32.01077],
      [-85.141136, 31.840985],
      [-85.042551, 31.539753],
      [-85.113751, 31.27686],
      [-85.004212, 31.003013],
      [-85.497137, 30.997536],
      [-87.600282, 30.997536],
      [-87.633143, 30.86609],
      [-87.408589, 30.674397],
      [-87.446927, 30.510088],
      [-87.37025, 30.427934],
      [-87.518128, 30.280057],
      [-87.655051, 30.247195],
      [-87.90699, 30.411504],
      [-87.934375, 30.657966],
      [-88.011052, 30.685351],
      [-88.10416, 30.499135],
      [-88.137022, 30.318396],
      [-88.394438, 30.367688],
      [-88.471115, 31.895754],
      [-88.241084, 33.796253],
      [-88.098683, 34.891641],
      [-

We convert the values under the "**features**" key into a new *dataframe*.

In [7]:
df_reg = pd.DataFrame(USstates['features'])
df_reg

Unnamed: 0,geometry,id,properties,type
0,"{'type': 'Polygon', 'coordinates': [[[-87.3592...",AL,{'name': 'Alabama'},Feature
1,"{'type': 'MultiPolygon', 'coordinates': [[[[-1...",AK,{'name': 'Alaska'},Feature
2,"{'type': 'Polygon', 'coordinates': [[[-109.042...",AZ,{'name': 'Arizona'},Feature
3,"{'type': 'Polygon', 'coordinates': [[[-94.4738...",AR,{'name': 'Arkansas'},Feature
4,"{'type': 'Polygon', 'coordinates': [[[-123.233...",CA,{'name': 'California'},Feature
5,"{'type': 'Polygon', 'coordinates': [[[-107.919...",CO,{'name': 'Colorado'},Feature
6,"{'type': 'Polygon', 'coordinates': [[[-73.0535...",CT,{'name': 'Connecticut'},Feature
7,"{'type': 'Polygon', 'coordinates': [[[-75.4140...",DE,{'name': 'Delaware'},Feature
8,"{'type': 'Polygon', 'coordinates': [[[-85.4971...",FL,{'name': 'Florida'},Feature
9,"{'type': 'Polygon', 'coordinates': [[[-83.1091...",GA,{'name': 'Georgia'},Feature


The "**id**" column contains the ids of all the states.

In [8]:
df_reg['id']

0     AL
1     AK
2     AZ
3     AR
4     CA
5     CO
6     CT
7     DE
8     FL
9     GA
10    HI
11    ID
12    IL
13    IN
14    IA
15    KS
16    KY
17    LA
18    ME
19    MD
20    MA
21    MI
22    MN
23    MS
24    MO
25    MT
26    NE
27    NV
28    NH
29    NJ
30    NM
31    NY
32    NC
33    ND
34    OH
35    OK
36    OR
37    PA
38    RI
39    SC
40    SD
41    TN
42    TX
43    UT
44    VT
45    VA
46    WA
47    WV
48    WI
49    WY
Name: id, dtype: object

**We insert all the ids inside a new *dataframe* named "mapping". We will use this *dataframe* to map all the states inside the main *dataframe* with their ids.**

In [9]:
mapping = pd.DataFrame(data = df_reg['id'], columns=['id'])

mapping

Unnamed: 0,id
0,AL
1,AK
2,AZ
3,AR
4,CA
5,CO
6,CT
7,DE
8,FL
9,GA


**We associate each state id with the corresponding state name, taken from the "properties" column of df_reg.**

In [10]:
# Iterate over the column itself instead of hard-coding the number of states
mapping['name'] = [props['name'] for props in df_reg['properties']]
mapping

Unnamed: 0,id,name
0,AL,Alabama
1,AK,Alaska
2,AZ,Arizona
3,AR,Arkansas
4,CA,California
5,CO,Colorado
6,CT,Connecticut
7,DE,Delaware
8,FL,Florida
9,GA,Georgia


In the main *dataframe* the state names are written entirely in capital letters, so **for consistency we convert all the state names in the "mapping" *dataframe* to upper case.**

In [11]:
mapping['name'] = mapping['name'].str.upper()
mapping

Unnamed: 0,id,name
0,AL,ALABAMA
1,AK,ALASKA
2,AZ,ARIZONA
3,AR,ARKANSAS
4,CA,CALIFORNIA
5,CO,COLORADO
6,CT,CONNECTICUT
7,DE,DELAWARE
8,FL,FLORIDA
9,GA,GEORGIA


**We set the state names as row indexes inside "mapping".**

In [12]:
mapping.set_index('name', inplace=True)
mapping

Unnamed: 0_level_0,id
name,Unnamed: 1_level_1
ALABAMA,AL
ALASKA,AK
ARIZONA,AZ
ARKANSAS,AR
CALIFORNIA,CA
COLORADO,CO
CONNECTICUT,CT
DELAWARE,DE
FLORIDA,FL
GEORGIA,GA
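
The steps above (collect the ids, attach the names, upper-case them, index by name) can be condensed into one pass over the GeoJSON features. This is a minimal sketch on a hypothetical two-feature excerpt that has the same keys as the file loaded earlier; `features` and `name_to_id` are names introduced only for the example.

```python
import pandas as pd

# Hypothetical excerpt with the same structure as USstates['features'].
features = [
    {"type": "Feature", "id": "AL", "properties": {"name": "Alabama"}},
    {"type": "Feature", "id": "AK", "properties": {"name": "Alaska"}},
]

# Upper-cased state name -> two-letter id, built in a single comprehension.
name_to_id = pd.Series(
    {f["properties"]["name"].upper(): f["id"] for f in features},
    name="id",
)
name_to_id.index.name = "name"
print(name_to_id)
```

The result is a Series rather than a one-column dataframe, but it can be assigned to another frame by index in the same way as `mapping['id']`.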


We calculate a new *dataframe* with the **number of violent crimes per state**. The procedure is the same one used later to calculate the total population *dataframe*; the only difference is that here we sum the number of violent crimes.

In [13]:
stateCrimes = data1.groupby('State')[['Violent crime']].sum()
stateCrimes

Unnamed: 0_level_0,Violent crime
State,Unnamed: 1_level_1
ALABAMA,20056.0
ALASKA,4658.0
ARIZONA,26160.0
ARKANSAS,13628.0
CALIFORNIA,153688.0
COLORADO,16362.0
CONNECTICUT,8522.0
DELAWARE,4576.0
DISTRICT OF COLUMBIA,8199.0
FLORIDA,107241.0
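
To make the aggregation concrete, the sketch below repeats the group-and-sum on the six Alabama and Alaska rows shown in the tables above (only the two relevant columns are kept; `rows` and `totals` are illustrative names).

```python
import pandas as pd

# Violent crime counts of the three area types for two states,
# copied from the cleaned table above.
rows = pd.DataFrame({
    "State": ["ALABAMA"] * 3 + ["ALASKA"] * 3,
    "Violent crime": [16204.0, 2605.0, 1247.0, 2899.0, 722.0, 1037.0],
})

# One row per state, summing over the area types.
totals = rows.groupby("State")[["Violent crime"]].sum()
print(totals)
```

The sums reproduce the first two rows of the output above (20056 for Alabama and 4658 for Alaska), which differ from the "State Total" rows that were removed earlier.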


**We add to the *dataframe* of violent crimes per state the corresponding id of each state**, taken from the "mapping" *dataframe*.

In [14]:
stateCrimes['id'] = mapping['id']
stateCrimes

Unnamed: 0_level_0,Violent crime,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
ALABAMA,20056.0,AL
ALASKA,4658.0,AK
ARIZONA,26160.0,AZ
ARKANSAS,13628.0,AR
CALIFORNIA,153688.0,CA
COLORADO,16362.0,CO
CONNECTICUT,8522.0,CT
DELAWARE,4576.0,DE
DISTRICT OF COLUMBIA,8199.0,
FLORIDA,107241.0,FL


**We noticed that the main dataset contains states that are not present in the GeoJSON**. As a matter of fact, the violent crimes per state *dataframe* has rows with *NaN* in the "id" column. These states are **DISTRICT OF COLUMBIA** and **PUERTO RICO**. We delete these two rows from the crimes per state *dataframe*: since they are not present in the GeoJSON file, their data cannot be used to visualize the number of crimes on the map.

In [15]:
stateCrimes = stateCrimes.dropna()
stateCrimes

Unnamed: 0_level_0,Violent crime,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
ALABAMA,20056.0,AL
ALASKA,4658.0,AK
ARIZONA,26160.0,AZ
ARKANSAS,13628.0,AR
CALIFORNIA,153688.0,CA
COLORADO,16362.0,CO
CONNECTICUT,8522.0,CT
DELAWARE,4576.0,DE
FLORIDA,107241.0,FL
GEORGIA,37317.0,GA
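
The assignment `stateCrimes['id'] = mapping['id']` works because pandas aligns the two objects on their index labels. The sketch below shows this on a three-row example with values taken from the tables above; `crimes`, `ids`, `unmatched` and `mapped` are illustrative names, and restricting `dropna` to the `id` column is a suggestion rather than what the notebook does.

```python
import pandas as pd

crimes = pd.DataFrame(
    {"Violent crime": [20056.0, 8199.0, 107241.0]},
    index=["ALABAMA", "DISTRICT OF COLUMBIA", "FLORIDA"],
)
# Lookup without an entry for DISTRICT OF COLUMBIA, as in the GeoJSON.
ids = pd.Series({"ALABAMA": "AL", "FLORIDA": "FL"}, name="id")

# Assignment aligns on the index; labels missing from `ids` receive NaN.
crimes["id"] = ids

# List the states that could not be matched before dropping them.
unmatched = crimes.index[crimes["id"].isna()].tolist()
mapped = crimes.dropna(subset=["id"])
print(unmatched)
print(mapped)
```

Limiting `dropna` to the `id` column avoids silently discarding a state that has an id but a missing crime count.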


We set up the **color palette from the minimum number of violent crimes per state to the maximum value**.

In [16]:
colormap = linear.Paired.scale(
    stateCrimes.loc[:,'Violent crime'].min(),
    stateCrimes.loc[:,'Violent crime'].max())


colormap

In the next step we create a dictionary which has the state id as key and the number of violent crimes for the corresponding state as value.

In [17]:
datadict = {stateCrimes.loc[i]['id']:stateCrimes.loc[i]['Violent crime'] for i in stateCrimes.index}
datadict

{'AK': 4658.0,
 'AL': 20056.0,
 'AR': 13628.0,
 'AZ': 26160.0,
 'CA': 153688.0,
 'CO': 16362.0,
 'CT': 8522.0,
 'DE': 4576.0,
 'FL': 107241.0,
 'GA': 37317.0,
 'HI': 3680.0,
 'IA': 8134.0,
 'ID': 3467.0,
 'IL': 46475.0,
 'IN': 22342.0,
 'KS': 9944.0,
 'KY': 9304.0,
 'LA': 23275.0,
 'MA': 25943.0,
 'MD': 26661.0,
 'ME': 1700.0,
 'MI': 41843.0,
 'MN': 12504.0,
 'MO': 26846.0,
 'MS': 5257.0,
 'MT': 3221.0,
 'NC': 32235.0,
 'ND': 1959.0,
 'NE': 5117.0,
 'NH': 2448.0,
 'NJ': 23346.0,
 'NM': 12328.0,
 'NV': 18033.0,
 'NY': 75281.0,
 'OH': 31339.0,
 'OK': 15744.0,
 'OR': 8675.0,
 'PA': 40041.0,
 'RI': 2313.0,
 'SC': 23680.0,
 'SD': 2686.0,
 'TN': 39740.0,
 'TX': 109017.0,
 'UT': 6316.0,
 'VA': 16331.0,
 'VT': 622.0,
 'WA': 20107.0,
 'WI': 16496.0,
 'WV': 4932.0,
 'WY': 1107.0}
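
The same dictionary can be obtained without a Python-level loop. A minimal sketch on a two-row stand-in for `stateCrimes` (the name `state_crimes` is used here only to avoid suggesting it is the real frame):

```python
import pandas as pd

# Two-row stand-in with the same columns as stateCrimes.
state_crimes = pd.DataFrame(
    {"Violent crime": [20056.0, 4658.0], "id": ["AL", "AK"]},
    index=["ALABAMA", "ALASKA"],
)

# Index by id, pick the value column and convert it to a plain dict.
datadict = state_crimes.set_index("id")["Violent crime"].to_dict()
print(datadict)
```

This gives the same id-to-value pairs as the comprehension and avoids one `.loc` lookup per key.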

We use this new *dictionary* to plot all the violent crime values on a map. As you can see, **the closer a state's color is to red, the higher its number of violent crimes. States whose color tends toward blue have a lower number of violent crimes.**

In [18]:
kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m
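
The style function above looks up `datadict[feature['id']]` directly, so it would raise a `KeyError` if the GeoJSON contained a feature whose id is missing from the dictionary. The sketch below shows one possible way to guard against that with a fallback color. `make_style_function`, `missing_color` and `fake_colormap` are hypothetical names, and the colormap is replaced by a plain function so that the example runs without folium or branca.

```python
def make_style_function(colormap, values, missing_color="#cccccc"):
    """Build a style function with the same keys as the one used above.

    `colormap` is any callable mapping a number to a color string and
    `values` maps a feature id to its number.
    """
    def style_function(feature):
        value = values.get(feature["id"])
        # Fall back to a neutral color when the id has no value.
        fill = missing_color if value is None else colormap(value)
        return {
            "fillColor": fill,
            "color": "black",
            "weight": 1,
            "dashArray": "2, 2",
        }
    return style_function


# Stand-in for the real colormap, only for demonstration.
def fake_colormap(value):
    return "#ff0000" if value > 10000 else "#0000ff"


style = make_style_function(fake_colormap, {"AL": 20056.0, "VT": 622.0})
print(style({"id": "AL"})["fillColor"])
print(style({"id": "DC"})["fillColor"])
```

Because the same map code is repeated for every crime type in this notebook, a factory of this kind could also be reused by passing a different dictionary each time.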

**We noticed from the map that all the states have a relatively low number of violent crimes compared with California, which has the highest number of violent crimes in 2014. We also noticed that few states have a medium number of violent crimes; as a matter of fact, only New York state has a fairly well defined green color. Other states with a high number of violent crimes are Texas and Florida.**

In the next cell **we transform the data by applying the natural logarithm** to the number of violent crimes of each state. In this way **we try to obtain a much clearer visualization on the map, reducing the large gap between the highest value (for California) and the other states**.
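
As a rough illustration of why the logarithm helps, the sketch below compares where a mid-sized state falls on the color scale before and after the transformation. The three raw values (Vermont, California, Alabama) are copied from the dictionary above; the variable names are illustrative.

```python
import numpy as np

raw_min, raw_max = 622.0, 153688.0   # Vermont and California
alabama = 20056.0

log_min, log_max = np.log(raw_min), np.log(raw_max)

# Position of Alabama between the minimum (0) and the maximum (1).
pos_raw = (alabama - raw_min) / (raw_max - raw_min)
pos_log = (np.log(alabama) - log_min) / (log_max - log_min)

print(f"raw scale: {pos_raw:.2f}")   # close to the low end of the palette
print(f"log scale: {pos_log:.2f}")   # past the middle of the palette
```

On the raw scale most states are squeezed into the lowest part of the palette because California dominates the range; on the log scale they spread across it. The original counts can be recovered with `np.exp` if the legend needs to be read in raw units.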

In [19]:
stateCrimes['Violent crime'] = np.log(stateCrimes['Violent crime'])

colormap = linear.Paired.scale(
    stateCrimes.loc[:,'Violent crime'].min(),
    stateCrimes.loc[:,'Violent crime'].max())

datadict = {stateCrimes.loc[i]['id']:stateCrimes.loc[i]['Violent crime'] for i in stateCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

From this new map it is easy to see **a much clearer distribution of violent crimes in the USA**. The states with the highest number of violent crimes are still **California**, **Texas** and **Florida**, but other states with a high number of violent crimes are now visible as well.

In [20]:
stateCrimes[stateCrimes.loc[:, 'Violent crime'] == stateCrimes.loc[:,'Violent crime'].min()]

Unnamed: 0_level_0,Violent crime,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
VERMONT,6.43294,VT


In [21]:
stateCrimes[stateCrimes.loc[:, 'Violent crime'] == stateCrimes.loc[:,'Violent crime'].max()]

Unnamed: 0_level_0,Violent crime,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,11.94268,CA
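
The two lookups above compare the whole column against its minimum or maximum. When only the label of the extreme row is needed, `idxmin` and `idxmax` give it directly. A minimal sketch on five raw (not log-transformed) values copied from the dictionary shown earlier; this is a subset chosen for the example, not the full data.

```python
import pandas as pd

# Subset of the id -> violent crime dictionary shown earlier.
violent = pd.Series(
    {"CA": 153688.0, "TX": 109017.0, "AK": 4658.0, "WY": 1107.0, "VT": 622.0}
)

lowest = violent.idxmin()
highest = violent.idxmax()
print(lowest, highest)
```

Since the logarithm is monotonic, the state with the smallest or largest value is the same on the raw and on the log scale.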


**Using the population map built in the next cells, we can see a strong relation between violent crimes and states with a large population or big metropolitan areas**. States with a large population tend to have a higher number of violent crimes (e.g. **New York**, **Illinois**, **Massachusetts**) and states with a smaller population tend to have a lower number of violent crimes (e.g. **Maine**, **Vermont**, **North/South Dakota**, **Montana**, **Idaho**). The state with the lowest number of violent crimes is **Vermont**. It is also noteworthy that **Alaska has a high number of violent crimes although it is one of the least populated states**.
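
One way to make the remark about Alaska concrete is to divide the counts by the population. The sketch below does this for three states, using the violent crime and population sums that appear in the notebook's own outputs; the `per_100k` column is an addition for illustration and is not computed anywhere in the notebook.

```python
import pandas as pd

# Violent crime and population sums for three states, as shown in the outputs.
counts = pd.DataFrame(
    {
        "Violent crime": [4658.0, 153688.0, 107241.0],
        "Population": [736732.0, 38802500.0, 19893297.0],
    },
    index=["ALASKA", "CALIFORNIA", "FLORIDA"],
)

# Violent crimes per 100,000 inhabitants.
counts["per_100k"] = counts["Violent crime"] / counts["Population"] * 100_000
print(counts.sort_values("per_100k", ascending=False))
```

With these figures Alaska has a higher rate than both Florida and California, even though its absolute count is far lower.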

We calculate a new *dataframe* which contains the **population of each state**. To do that we sum the population of the *Metropolitan Statistical Area*, *Cities outside metropolitan areas* and *Nonmetropolitan counties* rows, grouping the results by state.

In [22]:
statePopulation = data1.groupby('State')[['Population']].sum()
statePopulation

Unnamed: 0_level_0,Population
State,Unnamed: 1_level_1
ALABAMA,4849377.0
ALASKA,736732.0
ARIZONA,6731484.0
ARKANSAS,2966369.0
CALIFORNIA,38802500.0
COLORADO,5355866.0
CONNECTICUT,3596677.0
DELAWARE,935614.0
DISTRICT OF COLUMBIA,658893.0
FLORIDA,19893297.0
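
The relation between population and violent crimes can also be checked numerically. The sketch below computes the Pearson correlation of the log-transformed values for the nine states that appear in both output previews (District of Columbia is left out because it is dropped before mapping). It is only a partial check on the rows visible here, not on the full dataset.

```python
import numpy as np
import pandas as pd

# Population and violent crime sums copied from the two previews above.
preview = pd.DataFrame(
    {
        "Population": [4849377.0, 736732.0, 6731484.0, 2966369.0, 38802500.0,
                       5355866.0, 3596677.0, 935614.0, 19893297.0],
        "Violent crime": [20056.0, 4658.0, 26160.0, 13628.0, 153688.0,
                          16362.0, 8522.0, 4576.0, 107241.0],
    },
    index=["ALABAMA", "ALASKA", "ARIZONA", "ARKANSAS", "CALIFORNIA",
           "COLORADO", "CONNECTICUT", "DELAWARE", "FLORIDA"],
)

# Pearson correlation between the two log-transformed columns.
r = np.log(preview["Population"]).corr(np.log(preview["Violent crime"]))
print(round(r, 3))
```

A value close to 1 on this subset is consistent with the visual impression from the maps that more populous states report more violent crimes.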


In [23]:
statePopulation['id'] = mapping['id']
statePopulation = statePopulation.dropna()

colormap = linear.Paired.scale(
    statePopulation.loc[:,'Population'].min(),
    statePopulation.loc[:,'Population'].max())

datadict = {statePopulation.loc[i]['id']:statePopulation.loc[i]['Population'] for i in statePopulation.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

For this data, and for the data that follows, we noticed that **it is always useful to transform the numbers with the *logarithmic function* in order to have a clearer view of the values across the whole USA**.

In [24]:
statePopulation['Population'] = np.log(statePopulation['Population'])

colormap = linear.Paired.scale(
    statePopulation.loc[:,'Population'].min(),
    statePopulation.loc[:,'Population'].max())

datadict = {statePopulation.loc[i]['id']:statePopulation.loc[i]['Population'] for i in statePopulation.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [25]:
statePopulation[statePopulation.loc[:, 'Population'] == statePopulation.loc[:,'Population'].min()]

Unnamed: 0_level_0,Population,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
WYOMING,13.277918,WY


In [26]:
statePopulation[statePopulation.loc[:, 'Population'] == statePopulation.loc[:,'Population'].max()]

Unnamed: 0_level_0,Population,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,17.473995,CA


The state with the highest population is **California** and **Wyoming** has the lowest population.

In the next cells we repeat the same operations **in order to obtain a map visualization of property crimes for each state**.

In [27]:
PropertyCrimes = data1.groupby('State')[['Property crime']].sum()
PropertyCrimes 

Unnamed: 0_level_0,Property crime
State,Unnamed: 1_level_1
ALABAMA,148656.0
ALASKA,20216.0
ARIZONA,210874.0
ARKANSAS,94331.0
CALIFORNIA,947029.0
COLORADO,133847.0
CONNECTICUT,69070.0
DELAWARE,27900.0
DISTRICT OF COLUMBIA,34147.0
FLORIDA,677744.0


In [28]:
PropertyCrimes['id'] = mapping['id']
PropertyCrimes = PropertyCrimes.dropna()

colormap = linear.Paired.scale(
    PropertyCrimes.loc[:,'Property crime'].min(),
    PropertyCrimes.loc[:,'Property crime'].max())

datadict = {PropertyCrimes.loc[i]['id']:PropertyCrimes.loc[i]['Property crime'] for i in PropertyCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

**We can see from the map that the same considerations made for violent crimes also apply to property crimes. In particular, the states with the highest number of property crimes are again California, Texas and Florida. In this case Texas stands out more for property crimes than it did for violent crimes.**

Again, in the next cell **we transform the data by applying the logarithmic function**.

In [29]:
PropertyCrimes['Property crime'] = np.log(PropertyCrimes['Property crime'])

colormap = linear.Paired.scale(
    PropertyCrimes.loc[:,'Property crime'].min(),
    PropertyCrimes.loc[:,'Property crime'].max())

datadict = {PropertyCrimes.loc[i]['id']:PropertyCrimes.loc[i]['Property crime'] for i in PropertyCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

From this clearer map, **a relation between population and the number of property crimes is apparently not visible**. We can also see that **the three states with the highest number of property crimes are the same as for violent crimes, and the states with the lowest number of property crimes are largely the same as well**.

In [30]:
PropertyCrimes[PropertyCrimes.loc[:, 'Property crime'] == PropertyCrimes.loc[:,'Property crime'].min()]

Unnamed: 0_level_0,Property crime,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
VERMONT,9.163877,VT


In [31]:
PropertyCrimes[PropertyCrimes.loc[:, 'Property crime'] == PropertyCrimes.loc[:,'Property crime'].max()]

Unnamed: 0_level_0,Property crime,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,13.761085,CA


Again **California** is the state with the highest number of crimes (property crimes in this case) and as for violent crimes **Vermont** is the "safest" state.

In [32]:
murderCrimes = data1.groupby('State')[['Murder and nonnegligent manslaughter']].sum()
murderCrimes

Unnamed: 0_level_0,Murder and nonnegligent manslaughter
State,Unnamed: 1_level_1
ALABAMA,271.0
ALASKA,41.0
ARIZONA,309.0
ARKANSAS,159.0
CALIFORNIA,1699.0
COLORADO,149.0
CONNECTICUT,86.0
DELAWARE,54.0
DISTRICT OF COLUMBIA,105.0
FLORIDA,1148.0


In [33]:
murderCrimes['id'] = mapping['id']
murderCrimes = murderCrimes.dropna()

colormap = linear.Paired.scale(
    murderCrimes.loc[:,'Murder and nonnegligent manslaughter'].min(),
    murderCrimes.loc[:,'Murder and nonnegligent manslaughter'].max())

datadict = {murderCrimes.loc[i]['id']:murderCrimes.loc[i]['Murder and nonnegligent manslaughter'] for i in murderCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [34]:
murderCrimes['Murder and nonnegligent manslaughter'] = np.log(murderCrimes['Murder and nonnegligent manslaughter'])

colormap = linear.Paired.scale(
    murderCrimes.loc[:,'Murder and nonnegligent manslaughter'].min(),
    murderCrimes.loc[:,'Murder and nonnegligent manslaughter'].max())

datadict = {murderCrimes.loc[i]['id']:murderCrimes.loc[i]['Murder and nonnegligent manslaughter'] for i in murderCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [35]:
murderCrimes[murderCrimes.loc[:, 'Murder and nonnegligent manslaughter'] == murderCrimes.loc[:,'Murder and nonnegligent manslaughter'].min()]

Unnamed: 0_level_0,Murder and nonnegligent manslaughter,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
VERMONT,2.302585,VT


In [36]:
murderCrimes[murderCrimes.loc[:, 'Murder and nonnegligent manslaughter'] == murderCrimes.loc[:,'Murder and nonnegligent manslaughter'].max()]

Unnamed: 0_level_0,Murder and nonnegligent manslaughter,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,7.437795,CA


**The same maximum and minimum also hold for murders.**

In [37]:
rapeCrimes = data1.groupby('State')[['Rape (revised definition)']].sum()
rapeCrimes

Unnamed: 0_level_0,Rape (revised definition)
State,Unnamed: 1_level_1
ALABAMA,1928.0
ALASKA,766.0
ARIZONA,3227.0
ARKANSAS,1637.0
CALIFORNIA,11526.0
COLORADO,2995.0
CONNECTICUT,782.0
DELAWARE,386.0
DISTRICT OF COLUMBIA,472.0
FLORIDA,8535.0


In [38]:
rapeCrimes['id'] = mapping['id']
rapeCrimes = rapeCrimes.dropna()

colormap = linear.Paired.scale(
    rapeCrimes.loc[:,'Rape (revised definition)'].min(),
    rapeCrimes.loc[:,'Rape (revised definition)'].max())

datadict = {rapeCrimes.loc[i]['id']:rapeCrimes.loc[i]['Rape (revised definition)'] for i in rapeCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [39]:
rapeCrimes['Rape (revised definition)'] = np.log(rapeCrimes['Rape (revised definition)'])

colormap = linear.Paired.scale(
    rapeCrimes.loc[:,'Rape (revised definition)'].min(),
    rapeCrimes.loc[:,'Rape (revised definition)'].max())

datadict = {rapeCrimes.loc[i]['id']:rapeCrimes.loc[i]['Rape (revised definition)'] for i in rapeCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [40]:
rapeCrimes[rapeCrimes.loc[:, 'Rape (revised definition)'] == rapeCrimes.loc[:,'Rape (revised definition)'].min()]

Unnamed: 0_level_0,Rape (revised definition),id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
VERMONT,4.70048,VT


In [41]:
rapeCrimes[rapeCrimes.loc[:, 'Rape (revised definition)'] == rapeCrimes.loc[:,'Rape (revised definition)'].max()]

Unnamed: 0_level_0,Rape (revised definition),id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,9.352361,CA


**The same maximum and minimum also hold for rape (revised definition).**

In [42]:
rapeCrimesLeg = data1.groupby('State')[['Rape (legacy definition)']].sum()
rapeCrimesLeg

Unnamed: 0_level_0,Rape (legacy definition)
State,Unnamed: 1_level_1
ALABAMA,1380.0
ALASKA,550.0
ARIZONA,2357.0
ARKANSAS,1096.0
CALIFORNIA,8397.0
COLORADO,2090.0
CONNECTICUT,571.0
DELAWARE,249.0
DISTRICT OF COLUMBIA,352.0
FLORIDA,6031.0


In [43]:
rapeCrimesLeg['id'] = mapping['id']
rapeCrimesLeg = rapeCrimesLeg.dropna()

colormap = linear.Paired.scale(
    rapeCrimesLeg.loc[:,'Rape (legacy definition)'].min(),
    rapeCrimesLeg.loc[:,'Rape (legacy definition)'].max())

datadict = {rapeCrimesLeg.loc[i]['id']:rapeCrimesLeg.loc[i]['Rape (legacy definition)'] for i in rapeCrimesLeg.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [44]:
rapeCrimesLeg['Rape (legacy definition)'] = np.log(rapeCrimesLeg['Rape (legacy definition)'])

colormap = linear.Paired.scale(
    rapeCrimesLeg.loc[:,'Rape (legacy definition)'].min(),
    rapeCrimesLeg.loc[:,'Rape (legacy definition)'].max())

datadict = {rapeCrimesLeg.loc[i]['id']:rapeCrimesLeg.loc[i]['Rape (legacy definition)'] for i in rapeCrimesLeg.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [45]:
rapeCrimesLeg[rapeCrimesLeg.loc[:, 'Rape (legacy definition)'] == rapeCrimesLeg.loc[:,'Rape (legacy definition)'].min()]

Unnamed: 0_level_0,Rape (legacy definition),id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
VERMONT,4.59512,VT


In [46]:
rapeCrimesLeg[rapeCrimesLeg.loc[:, 'Rape (legacy definition)'] == rapeCrimesLeg.loc[:,'Rape (legacy definition)'].max()]

Unnamed: 0_level_0,Rape (legacy definition),id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,9.03563,CA


**The same maximum and minimum also hold for rape (legacy definition).**

In [47]:
robberyCrimes = data1.groupby('State')[['Robbery']].sum()
robberyCrimes

Unnamed: 0_level_0,Robbery
State,Unnamed: 1_level_1
ALABAMA,4580.0
ALASKA,627.0
ARIZONA,6196.0
ARKANSAS,1997.0
CALIFORNIA,48673.0
COLORADO,3024.0
CONNECTICUT,3159.0
DELAWARE,1269.0
DISTRICT OF COLUMBIA,3497.0
FLORIDA,24869.0


In [48]:
robberyCrimes['id'] = mapping['id']
robberyCrimes = robberyCrimes.dropna()

colormap = linear.Paired.scale(
    robberyCrimes.loc[:,'Robbery'].min(),
    robberyCrimes.loc[:,'Robbery'].max())

datadict = {robberyCrimes.loc[i]['id']:robberyCrimes.loc[i]['Robbery'] for i in robberyCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [49]:
robberyCrimes['Robbery'] = np.log(robberyCrimes['Robbery'])

colormap = linear.Paired.scale(
    robberyCrimes.loc[:,'Robbery'].min(),
    robberyCrimes.loc[:,'Robbery'].max())

datadict = {robberyCrimes.loc[i]['id']:robberyCrimes.loc[i]['Robbery'] for i in robberyCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [50]:
robberyCrimes[robberyCrimes.loc[:, 'Robbery'] == robberyCrimes.loc[:,'Robbery'].min()]

Unnamed: 0_level_0,Robbery,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
WYOMING,3.970292,WY


In [51]:
robberyCrimes[robberyCrimes.loc[:, 'Robbery'] == robberyCrimes.loc[:,'Robbery'].max()]

Unnamed: 0_level_0,Robbery,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,10.79288,CA


**For robbery, the states with the maximum and minimum number of crimes are exactly the states with the highest and lowest population in the USA.**
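Because the extremes follow population, one possible follow-up check is to divide the counts by population. The sketch below is only an illustration: it uses the two "State Total" rows visible in the raw table at the top of the notebook, and the column name `Robbery per 100k` is a hypothetical helper that does not appear elsewhere in this analysis.

```python
import pandas as pd

# State totals copied from the "State Total" rows of the raw table
totals = pd.DataFrame(
    {"Population": [4849377, 736732], "Robbery": [4701.0, 629.0]},
    index=["ALABAMA", "ALASKA"],
)

# Hypothetical helper column: robberies per 100,000 inhabitants
totals["Robbery per 100k"] = totals["Robbery"] / totals["Population"] * 100_000

print(totals.round(1))
```

In raw counts Alabama has roughly seven times as many robberies as Alaska, while the two rates per 100,000 inhabitants are much closer to each other.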

In [52]:
assaultCrimes = data1.groupby('State')[['Aggravated assault']].sum()
assaultCrimes

Unnamed: 0_level_0,Aggravated assault
State,Unnamed: 1_level_1
ALABAMA,13277.0
ALASKA,3224.0
ARIZONA,16428.0
ARKANSAS,9835.0
CALIFORNIA,91790.0
COLORADO,10194.0
CONNECTICUT,4495.0
DELAWARE,2867.0
DISTRICT OF COLUMBIA,4125.0
FLORIDA,72689.0


In [53]:
assaultCrimes['id'] = mapping['id']
assaultCrimes = assaultCrimes.dropna()

colormap = linear.Paired.scale(
    assaultCrimes.loc[:,'Aggravated assault'].min(),
    assaultCrimes.loc[:,'Aggravated assault'].max())

datadict = {assaultCrimes.loc[i]['id']:assaultCrimes.loc[i]['Aggravated assault'] for i in assaultCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [54]:
assaultCrimes['Aggravated assault'] = np.log(assaultCrimes['Aggravated assault'])

colormap = linear.Paired.scale(
    assaultCrimes.loc[:,'Aggravated assault'].min(),
    assaultCrimes.loc[:,'Aggravated assault'].max())

datadict = {assaultCrimes.loc[i]['id']:assaultCrimes.loc[i]['Aggravated assault'] for i in assaultCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [55]:
assaultCrimes[assaultCrimes.loc[:, 'Aggravated assault'] == assaultCrimes.loc[:,'Aggravated assault'].min()]

Unnamed: 0_level_0,Aggravated assault,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
VERMONT,6.068426,VT


In [None]:
assaultCrimes[assaultCrimes.loc[:, 'Aggravated assault'] == assaultCrimes.loc[:,'Aggravated assault'].max()]

Unnamed: 0_level_0,Aggravated assault,id
State,Unnamed: 1_level_1,Unnamed: 2_level_1
CALIFORNIA,11.427259,CA


**Again California has the highest level of crimes and Vermont the lowest.**
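The minimum and maximum lookups above filter the whole dataframe with a boolean comparison. A shorter alternative, sketched here on a subset of the robbery counts shown earlier, is `idxmin` and `idxmax`, which return the index label of the extreme value directly. The variable names are illustrative only.

```python
import pandas as pd

# Illustrative subset: robbery counts copied from the table shown earlier
robbery = pd.Series(
    {"ALABAMA": 4580.0, "ALASKA": 627.0, "ARIZONA": 6196.0,
     "ARKANSAS": 1997.0, "CALIFORNIA": 48673.0},
    name="Robbery",
)

# idxmin/idxmax return the index label (here the state) of the extreme value
lowest_state = robbery.idxmin()
highest_state = robbery.idxmax()

print(lowest_state, highest_state)
```

Within this subset the lowest state is Alaska, not Wyoming, simply because Wyoming is not among the rows used in the sketch.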

In [None]:
burglaryCrimes = data1.groupby('State')[['Burglary']].sum()
burglaryCrimes

In [None]:
burglaryCrimes['id'] = mapping['id']
burglaryCrimes = burglaryCrimes.dropna()

colormap = linear.Paired.scale(
    burglaryCrimes.loc[:,'Burglary'].min(),
    burglaryCrimes.loc[:,'Burglary'].max())

datadict = {burglaryCrimes.loc[i]['id']:burglaryCrimes.loc[i]['Burglary'] for i in burglaryCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [None]:
burglaryCrimes['Burglary'] = np.log(burglaryCrimes['Burglary'])

colormap = linear.Paired.scale(
    burglaryCrimes.loc[:,'Burglary'].min(),
    burglaryCrimes.loc[:,'Burglary'].max())

datadict = {burglaryCrimes.loc[i]['id']:burglaryCrimes.loc[i]['Burglary'] for i in burglaryCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [None]:
burglaryCrimes[burglaryCrimes.loc[:, 'Burglary'] == burglaryCrimes.loc[:,'Burglary'].min()]

In [None]:
burglaryCrimes[burglaryCrimes.loc[:, 'Burglary'] == burglaryCrimes.loc[:,'Burglary'].max()]

**Again California has the highest level and in this case Wyoming has the lowest level of crimes.**

In [None]:
larcenyCrimes = data1.groupby('State')[['Larceny-theft']].sum()
larcenyCrimes

In [None]:
larcenyCrimes['id'] = mapping['id']
larcenyCrimes = larcenyCrimes.dropna()

colormap = linear.Paired.scale(
    larcenyCrimes.loc[:,'Larceny-theft'].min(),
    larcenyCrimes.loc[:,'Larceny-theft'].max())

datadict = {larcenyCrimes.loc[i]['id']:larcenyCrimes.loc[i]['Larceny-theft'] for i in larcenyCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [None]:
larcenyCrimes['Larceny-theft'] = np.log(larcenyCrimes['Larceny-theft'])

colormap = linear.Paired.scale(
    larcenyCrimes.loc[:,'Larceny-theft'].min(),
    larcenyCrimes.loc[:,'Larceny-theft'].max())

datadict = {larcenyCrimes.loc[i]['id']:larcenyCrimes.loc[i]['Larceny-theft'] for i in larcenyCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [None]:
larcenyCrimes[larcenyCrimes.loc[:, 'Larceny-theft'] == larcenyCrimes.loc[:,'Larceny-theft'].min()]

In [None]:
larcenyCrimes[larcenyCrimes.loc[:, 'Larceny-theft'] == larcenyCrimes.loc[:,'Larceny-theft'].max()]

**Again California has the highest level and in this case Vermont has the lowest level of crimes.**

In [None]:
motorTheftCrimes = data1.groupby('State')[['Motor vehicle theft']].sum()
motorTheftCrimes

In [None]:
motorTheftCrimes['id'] = mapping['id']
motorTheftCrimes = motorTheftCrimes.dropna()

colormap = linear.Paired.scale(
    motorTheftCrimes.loc[:,'Motor vehicle theft'].min(),
    motorTheftCrimes.loc[:,'Motor vehicle theft'].max())

datadict = {motorTheftCrimes.loc[i]['id']:motorTheftCrimes.loc[i]['Motor vehicle theft'] for i in motorTheftCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [None]:
motorTheftCrimes['Motor vehicle theft'] = np.log(motorTheftCrimes['Motor vehicle theft'])

colormap = linear.Paired.scale(
    motorTheftCrimes.loc[:,'Motor vehicle theft'].min(),
    motorTheftCrimes.loc[:,'Motor vehicle theft'].max())

datadict = {motorTheftCrimes.loc[i]['id']:motorTheftCrimes.loc[i]['Motor vehicle theft'] for i in motorTheftCrimes.index}

kw = {'location': [48, -102], 'zoom_start': 4}
m = folium.Map(**kw, tiles='cartodBpositron')

folium.GeoJson(open(us_states),
       style_function=lambda feature: {
        'fillColor': colormap(datadict[feature['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '2, 2'
    }).add_to(m)

colormap.add_to(m)
m

In [None]:
motorTheftCrimes[motorTheftCrimes.loc[:, 'Motor vehicle theft'] == motorTheftCrimes.loc[:,'Motor vehicle theft'].min()]

In [None]:
motorTheftCrimes[motorTheftCrimes.loc[:, 'Motor vehicle theft'] == motorTheftCrimes.loc[:,'Motor vehicle theft'].max()]

**Again California has the highest level and in this case Vermont has the lowest level of crimes.**

**In conclusion, looking at the maps and at the minimum and maximum values, we can say that each state holds roughly the same position in the ranking for every type of crime; the differences between crimes are small. In terms of absolute counts, the most "dangerous" state is California and the "safest" states are Wyoming and Vermont.**
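The claim that states keep roughly the same rank across crimes can be checked numerically. The following is a minimal sketch, assuming only the ten states whose robbery and aggravated assault counts are printed above; it ranks the states for each crime and correlates the two rankings (a Pearson correlation on ranks, which is the Spearman rank correlation).

```python
import pandas as pd

states = ["ALABAMA", "ALASKA", "ARIZONA", "ARKANSAS", "CALIFORNIA",
          "COLORADO", "CONNECTICUT", "DELAWARE", "DISTRICT OF COLUMBIA", "FLORIDA"]

# Counts copied from the robbery and aggravated assault tables above
counts = pd.DataFrame(
    {"Robbery": [4580.0, 627.0, 6196.0, 1997.0, 48673.0,
                 3024.0, 3159.0, 1269.0, 3497.0, 24869.0],
     "Aggravated assault": [13277.0, 3224.0, 16428.0, 9835.0, 91790.0,
                            10194.0, 4495.0, 2867.0, 4125.0, 72689.0]},
    index=states,
)

# Rank 1 is the state with the most crimes of each type
ranks = counts.rank(ascending=False)

# Correlating the ranks gives the Spearman rank correlation
rho = ranks["Robbery"].corr(ranks["Aggravated assault"])

print(ranks)
print(rho)
```

A value of `rho` close to 1 means the two rankings largely agree; this covers only the ten states listed, not the full dataset.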

In [None]:
data1

In [None]:
data2 = data1.copy()
del data2['Area']
del data2['DetailArea']

In [None]:
data2 = data2.groupby('State').sum()

data2

In [None]:
data2.describe()

In [None]:
%matplotlib inline
histograms = data2.hist(figsize=(16,20))

The histograms above show how the per-state counts of each crime are distributed in the USA.
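To make explicit what each histogram counts, the sketch below bins the ten robbery values printed earlier in this section. It assumes the default of 10 equal-width bins that `DataFrame.hist` uses, and calls `np.histogram` so the counts can be inspected without drawing a plot.

```python
import numpy as np

# Robbery counts for the ten states listed in the table earlier in this section
robbery = np.array([4580.0, 627.0, 6196.0, 1997.0, 48673.0,
                    3024.0, 3159.0, 1269.0, 3497.0, 24869.0])

# 10 equal-width bins between the minimum and the maximum value
counts, edges = np.histogram(robbery, bins=10)

print(counts)
print(edges)
```

Most of these states fall into the first bin, while a few large states stretch the axis to the right, which is the skewed shape visible in the plots.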

In [None]:
fig, axes = plt.subplots(nrows=6, ncols=2, figsize=(12,30))

colNameList = list(data2.columns)
# Draw one box plot per column, filling the grid row by row;
# any subplot left over stays empty
for ax, col in zip(axes.flat, colNameList):
    data2.boxplot(column=col, ax=ax)

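Box plots of the per-state crime counts in the USA for 2014. A box plot provides an easy way to summarize the distribution of a numerical variable graphically: the line inside the box marks the median (not the mean), the box spans the first to the third quartile, and points beyond the whiskers are drawn as outliers.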
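The quantities a box plot draws can also be computed directly. This is a minimal sketch on the ten aggravated assault counts printed earlier, assuming the usual whisker rule of 1.5 times the interquartile range (the matplotlib default); the variable names are illustrative.

```python
import pandas as pd

# Aggravated assault counts copied from the table earlier in this section
assault = pd.Series(
    {"ALABAMA": 13277.0, "ALASKA": 3224.0, "ARIZONA": 16428.0,
     "ARKANSAS": 9835.0, "CALIFORNIA": 91790.0, "COLORADO": 10194.0,
     "CONNECTICUT": 4495.0, "DELAWARE": 2867.0,
     "DISTRICT OF COLUMBIA": 4125.0, "FLORIDA": 72689.0}
)

q1 = assault.quantile(0.25)   # lower edge of the box
median = assault.median()     # line inside the box
q3 = assault.quantile(0.75)   # upper edge of the box
iqr = q3 - q1

# Values beyond 1.5 * IQR from the box are drawn as outlier points
outliers = assault[(assault < q1 - 1.5 * iqr) | (assault > q3 + 1.5 * iqr)]

print(q1, median, q3)
print(outliers)
```

In this subset the median is far below the mean because the two largest states pull the mean upward, which is why the box plot reports the median.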

In [None]:
correlations = data2.corr()
correlations

In [None]:
f, ax = plt.subplots(figsize=(17, 13))
# np.bool was removed from recent NumPy releases; the builtin bool is equivalent
sns.heatmap(correlations, mask=np.zeros_like(correlations, dtype=bool),
            cmap=sns.diverging_palette(220, 10, as_cmap=True, center="light"),
            square=True, ax=ax)

This is a heatmap of the correlation matrix. The diagonal elements show the maximum correlation, which is expected because each one is the correlation of a variable with itself. Apart from that, we can see that many variables are __highly correlated__ and play an important role in determining the crime counts in the USA. The five most correlated pairs can be seen below.

In [None]:
# Keep only the entries strictly below the diagonal (k=-1), so that each pair
# appears once and self-correlations are excluded
mask = np.tril(np.ones(correlations.shape), k=-1).astype(bool)
correlationsApp = correlations.where(mask)
mostCorr = correlationsApp.stack().nlargest(5)

# five most correlated pairs of variables
mostCorr

The two definitions of _Rape_ (revised and legacy) show the highest correlation, which is expected since both measure the same category of crime. The most important result is the correlation between __Aggravated assault and Violent crime__: it tells us that aggravated assaults contribute heavily to violent crimes. Similarly, *Burglary* is highly correlated with Property crime. Another notable feature is that Robbery is highly correlated with Population, which suggests that the number of robberies grows with the size of the population.
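The selection of the most correlated pairs can be reproduced on a small table. The sketch below uses four columns from the first ten rows of the raw data shown at the top of the notebook (area-level rows, not state totals), so its numbers are illustrative and will not match the `mostCorr` output above.

```python
import numpy as np
import pandas as pd

# Four columns from the first ten rows of the raw table (area-level rows)
df = pd.DataFrame({
    "Population": [3692100, 529129, 628148, 4849377, 351408,
                   128681, 256643, 736732, 6382968, 123617],
    "Violent crime": [16204.0, 2605.0, 1247.0, 20727.0, 2899.0,
                      722.0, 1037.0, 4684.0, 24006.0, 1721.0],
    "Robbery": [4131.0, 363.0, 86.0, 4701.0, 557.0,
                45.0, 25.0, 629.0, 6117.0, 55.0],
    "Aggravated assault": [10406.0, 1931.0, 940.0, 13745.0, 1883.0,
                           546.0, 795.0, 3243.0, 14839.0, 1194.0],
})

corr = df.corr()

# k=-1 keeps only the entries strictly below the diagonal,
# so each pair is counted once and self-correlations are dropped
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
pairs = corr.where(mask).stack().dropna()

# Three most correlated pairs, indexed by (row variable, column variable)
top = pairs.nlargest(3)
print(top)
```

With four variables there are six distinct pairs; masking the upper triangle is what prevents every pair from being listed twice.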

**Later we use these correlations in our regression to compute a linear fit and try to predict the crime data.**