### Crime Rate

In [1]:
import pandas as pd
import os

Crime rate data for the United States is taken from the following page:<br>
https://corgis-edu.github.io/corgis/csv/state_crime/

Rates are the number of reported offenses per 100,000 population.
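As a quick check of this definition, a rate can be recomputed from a raw total and the population. The sketch below uses the Alabama 2011 figures that appear in the `df.head()` output later in this notebook; the helper name `rate_per_100k` is hypothetical and is not part of the dataset or of pandas.

```python
def rate_per_100k(offenses: int, population: int) -> float:
    """Return the number of reported offenses per 100,000 inhabitants."""
    return offenses / population * 100_000


# Alabama, 2011 (values copied from the df.head() output in this notebook)
population = 4_803_689
property_total = 173_192   # Data.Totals.Property.All
violent_total = 20_166     # Data.Totals.Violent.All

property_rate = round(rate_per_100k(property_total, population), 1)
violent_rate = round(rate_per_100k(violent_total, population), 1)
print(property_rate, violent_rate)
```

The two results agree with the `Data.Rates.Property.All` (3605.4) and `Data.Rates.Violent.All` (419.8) values reported for that row.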

- Population: The number of people living in this state at the time the report was created.
- CrimePropertyRate: The rate of all property crimes, including burglaries, larcenies, and motor vehicle thefts (taken from `Data.Rates.Property.All`).
- CrimeViolentRate: The rate of all violent crimes, including assaults, murders, rapes, and robberies (taken from `Data.Rates.Violent.All`).

### We extract the data from our S3 bucket

In [2]:
from private.s3_aws import access_key, secret_access_key

In [4]:
df = pd.read_csv("s3://rawdatagrupo07/state_crime.csv",
    storage_options={
        "key": access_key,
        "secret": secret_access_key
    },
)
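The credentials above are imported from a private module. One possible alternative, shown only as a sketch, is to build the same `storage_options` dictionary from environment variables. The helper `s3_storage_options` is hypothetical, and using the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variable names is an assumption, not something this notebook does.

```python
import os


def s3_storage_options(env=os.environ):
    """Build the storage_options dict for pd.read_csv from environment variables.

    Assumption: the credentials are exported as AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY. The notebook itself reads them from private.s3_aws.
    """
    return {
        "key": env["AWS_ACCESS_KEY_ID"],
        "secret": env["AWS_SECRET_ACCESS_KEY"],
    }


# Usage (requires the s3fs package and valid credentials):
# df = pd.read_csv("s3://rawdatagrupo07/state_crime.csv",
#                  storage_options=s3_storage_options())
```

A missing variable raises `KeyError` immediately, which surfaces a configuration problem before any request is sent.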

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3115 entries, 0 to 3114
Data columns (total 21 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   State                          3115 non-null   object 
 1   Year                           3115 non-null   int64  
 2   Data.Population                3115 non-null   int64  
 3   Data.Rates.Property.All        3115 non-null   float64
 4   Data.Rates.Property.Burglary   3115 non-null   float64
 5   Data.Rates.Property.Larceny    3115 non-null   float64
 6   Data.Rates.Property.Motor      3115 non-null   float64
 7   Data.Rates.Violent.All         3115 non-null   float64
 8   Data.Rates.Violent.Assault     3115 non-null   float64
 9   Data.Rates.Violent.Murder      3115 non-null   float64
 10  Data.Rates.Violent.Rape        3115 non-null   float64
 11  Data.Rates.Violent.Robbery     3115 non-null   float64
 12  Data.Totals.Property.All       3115 non-null   i

We keep only the data from 2011 onward.

In [4]:
df = df[df["Year"]>=2011]
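The comparison is inclusive, so rows for 2011 itself are kept. A minimal sketch on a toy frame (the `toy` and `recent` names and the sample years are made up for illustration):

```python
import pandas as pd

# Four made-up rows spanning the 2011 cutoff
toy = pd.DataFrame({"State": ["Alabama"] * 4, "Year": [2009, 2010, 2011, 2012]})

# Boolean mask: True for rows whose Year is 2011 or later
recent = toy[toy["Year"] >= 2011]
print(recent["Year"].tolist())  # [2011, 2012]
```

Note that the filtered frame keeps the original row labels, which is why the cleaned data later in this notebook has an index starting at 51 rather than 0.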

The state names are replaced with their abbreviations.

In [5]:
us_state_to_abbrev = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
    "United States": "US"
}

In [6]:
# Build a lookup table of state abbreviations and full names.
states_id = pd.DataFrame({
    "StateId": list(us_state_to_abbrev.values()),
    "StateName": list(us_state_to_abbrev.keys()),
})

In [8]:
states_id.to_csv(os.path.join(os.getcwd(),'..','_clean_data','states_id.csv'),index=False)

In [9]:
df2 = pd.DataFrame()

In [10]:
df2['State'] = df.State.astype(str).apply(lambda x: us_state_to_abbrev[x])
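Indexing the dictionary inside a `lambda` raises `KeyError` as soon as a name is missing from the mapping. If it is more convenient to list the offending names first, `Series.map` can be used instead, since it returns `NaN` for unknown keys. The sketch below uses a shortened stand-in dictionary and a deliberately invalid name; both are illustrative and not part of the dataset.

```python
import pandas as pd

# Shortened stand-in for us_state_to_abbrev
abbrev = {"Alabama": "AL", "New York": "NY"}

# "Atlantis" is a deliberately invalid name used to show the failure mode
states = pd.Series(["Alabama", "New York", "Atlantis"])

mapped = states.map(abbrev)  # unknown names become NaN instead of raising
unmapped = states[mapped.isna()].unique().tolist()
print(unmapped)  # ['Atlantis']
```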

In [11]:
df.head()

Unnamed: 0,State,Year,Data.Population,Data.Rates.Property.All,Data.Rates.Property.Burglary,Data.Rates.Property.Larceny,Data.Rates.Property.Motor,Data.Rates.Violent.All,Data.Rates.Violent.Assault,Data.Rates.Violent.Murder,...,Data.Rates.Violent.Robbery,Data.Totals.Property.All,Data.Totals.Property.Burglary,Data.Totals.Property.Larceny,Data.Totals.Property.Motor,Data.Totals.Violent.All,Data.Totals.Violent.Assault,Data.Totals.Violent.Murder,Data.Totals.Violent.Rape,Data.Totals.Violent.Robbery
51,Alabama,2011,4803689,3605.4,1064.2,2319.3,222.0,419.8,282.9,6.2,...,102.1,173192,51119,111411,10662,20166,13591,299,1370,4906
52,Alabama,2012,4822023,3502.2,984.7,2312.8,204.8,449.9,311.8,7.1,...,104.1,168878,47481,111523,9874,21693,15035,342,1296,5020
53,Alabama,2013,4833722,3351.3,877.8,2254.8,218.7,430.8,285.2,7.2,...,96.2,161993,42429,108993,10571,20826,13787,347,2044,4648
54,Alabama,2014,4849377,3177.6,819.0,2149.5,209.1,427.4,283.4,5.7,...,96.9,154094,39715,104238,10141,20727,13745,276,2005,4701
55,Alabama,2015,4858979,2978.9,725.6,2040.7,212.7,472.4,328.3,7.2,...,94.9,144746,35255,99156,10335,22952,15954,348,2039,4611


In [12]:
df2['Year'] = df['Year']

In [13]:
df2['Population'] = df['Data.Population']

In [14]:
df2['CrimePropertyRate'] = df['Data.Rates.Property.All']
df2['CrimeViolentRate'] = df['Data.Rates.Violent.All']
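The column-by-column copies above could also be written as a single select-and-rename step. This is only a sketch on a one-row toy frame: the `raw`, `column_names`, and `tidy` names are hypothetical, and the `State` column here keeps the full name because the abbreviation step is separate.

```python
import pandas as pd

# One-row toy frame using the Alabama 2011 values from df.head()
raw = pd.DataFrame({
    "State": ["Alabama"],
    "Year": [2011],
    "Data.Population": [4803689],
    "Data.Rates.Property.All": [3605.4],
    "Data.Rates.Violent.All": [419.8],
    "Data.Rates.Violent.Murder": [6.2],  # a column that is not kept
})

# Source column name -> name used in the cleaned table
column_names = {
    "Data.Population": "Population",
    "Data.Rates.Property.All": "CrimePropertyRate",
    "Data.Rates.Violent.All": "CrimeViolentRate",
}

# Select the wanted columns, then rename them in one pass
tidy = raw[["State", "Year", *column_names]].rename(columns=column_names)
print(tidy.columns.tolist())
```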

In [15]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 468 entries, 51 to 3114
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   State              468 non-null    object 
 1   Year               468 non-null    int64  
 2   Population         468 non-null    int64  
 3   CrimePropertyRate  468 non-null    float64
 4   CrimeViolentRate   468 non-null    float64
dtypes: float64(2), int64(2), object(1)
memory usage: 21.9+ KB
