# *Top and bottom level analysis for Baltimore City Crime*

## Importing Packages
- Below are the packages I will be using for my report on Baltimore City Crime.
- I will be using pandas to read in my dataset, since it is stored as csv files. I will also be using the Altair package because of its high-level customization and visualization power.
- You will also see that I am disabling the max rows limit because I am using a dataset that is very large (500,000 records).
- You will also see that I am enabling the json data transformer. It writes the chart data to a JSON file instead of embedding it in the notebook, which helps handle my large dataset so that I can create visualizations without crashing the notebook.

In [1]:
import csv
import pandas as pd
import altair as alt
from altair import datum
alt.data_transformers.disable_max_rows()
alt.data_transformers.enable('json')

DataTransformerRegistry.enable('json')

## Taking in the Data
- The code below reads in two csv files containing criminal activity within Baltimore City, one covering 2011-2016 and one covering 1965-2020.
- I only read in certain columns from each csv, because the columns listed in the code below are the only ones shared between the two files.
- I then create a variable called "CrimeReport". It holds the data from the two csv files after they have been merged.
- Once the two files are merged into one, I drop any duplicates from the CrimeReport dataframe so that I have a more accurate data set to work from.
- Finally, I export the CrimeReport dataframe into its own csv to save on memory.

In [2]:
# Columns shared by both files
shared_columns = ['CrimeDate','CrimeTime','CrimeCode','Description','Inside/Outside','Weapon','District','Neighborhood','Total Incidents']
Report_2011_2016 = pd.read_csv("BPD_Crime_2011-2016.csv", usecols=shared_columns)
Report_2014_2020 = pd.read_csv("BPD_Part_1_Victim_Based_Crime_Data.csv", usecols=shared_columns)
# Stack the two reports, then drop rows that appear in both
CrimeReport = pd.concat([Report_2011_2016,Report_2014_2020])
CrimeReport = CrimeReport.drop_duplicates(shared_columns)
CrimeReport.to_csv("Baltimore City Crime Report 2011-2020.csv", index=False)
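
The merge-and-deduplicate step above can be sketched without pandas, using only the standard `csv` module on two tiny in-memory files. The sample rows, the shortened column list, and the helper `read_shared` are all made up for illustration; this is only a rough picture of the logic, not the notebook's actual data path.

```python
import csv
import io

# Hypothetical, shortened column list; the notebook shares nine columns.
SHARED_COLUMNS = ["CrimeDate", "CrimeTime", "CrimeCode", "District"]

# Hypothetical sample files; the second one has an extra column that is ignored.
file_a = io.StringIO(
    "CrimeDate,CrimeTime,CrimeCode,District\n"
    "01/05/2015,13:30:00,4E,CENTRAL\n"
    "01/06/2015,02:15:00,6D,NORTHERN\n"
)
file_b = io.StringIO(
    "CrimeDate,CrimeTime,CrimeCode,District,Extra\n"
    "01/06/2015,02:15:00,6D,NORTHERN,x\n"
    "01/07/2015,21:00:00,3AF,EASTERN,y\n"
)

def read_shared(handle, columns):
    """Read a CSV and keep only the shared columns, one tuple per row."""
    return [tuple(row[c] for c in columns) for row in csv.DictReader(handle)]

# Stack the two files, like pd.concat
combined = read_shared(file_a, SHARED_COLUMNS) + read_shared(file_b, SHARED_COLUMNS)

# dict.fromkeys keeps the first occurrence of each row and preserves order,
# which mirrors the default keep-first behavior of drop_duplicates
deduplicated = list(dict.fromkeys(combined))
print(len(combined), len(deduplicated))
```

The row that appears in both sample files survives only once, which is the effect I am relying on when the two reports overlap in time.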

## Cleaning Data Part 1
 - The first line reads the new csv file back into the variable CrimeReport.
 - The next eight lines clean the data set and add columns to it: I derive the Hour, Month, and Year columns, convert CrimeDate to a datetime, and change all "NaN" values to "Unknown" or "No Neighborhood" so that, from a reporting standpoint, I can see how much data is missing.
 - The last three lines filter out any row whose year is earlier than 2011 and save the result. I do this because the years before 2011 only contain partial entries, and I did not want them to skew the metrics in my charts later in the project.

In [3]:
CrimeReport = pd.read_csv("Baltimore City Crime Report 2011-2020.csv")
CrimeReport['Hour'] = CrimeReport['CrimeTime'].str[0:2]
CrimeReport['Month'] = CrimeReport['CrimeDate'].str[0:2]
CrimeReport['Year'] = CrimeReport['CrimeDate'].str[6:].astype('int64')
CrimeReport['CrimeDate'] = pd.to_datetime(CrimeReport['CrimeDate'],format='%m/%d/%Y')
CrimeReport['District'] = CrimeReport['District'].fillna('Unknown')
CrimeReport['Weapon'] = CrimeReport['Weapon'].fillna('Unknown')
CrimeReport['Inside/Outside'] = CrimeReport['Inside/Outside'].fillna('Unknown')
CrimeReport['Neighborhood'] = CrimeReport['Neighborhood'].fillna('No Neighborhood')
Index=CrimeReport[CrimeReport['Year'] < 2011 ].index
CrimeReport.drop(Index,inplace=True)
CrimeReport.to_csv("Baltimore City Crime Report 2011-2020.csv", index=False)
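
The string slices used above only work if every value has a fixed-width layout. The sketch below assumes `CrimeDate` looks like `MM/DD/YYYY` and `CrimeTime` looks like `HH:MM:SS` (an assumption about the files, not something I verified for every row) and shows that parsing the values gives the same Hour, Month, and Year as slicing. The helper name `derive_fields` is hypothetical.

```python
from datetime import datetime

def derive_fields(crime_date: str, crime_time: str) -> dict:
    """Derive Hour, Month and Year from 'MM/DD/YYYY' and 'HH:MM:SS' strings."""
    parsed = datetime.strptime(f"{crime_date} {crime_time}", "%m/%d/%Y %H:%M:%S")
    return {"Hour": parsed.hour, "Month": parsed.month, "Year": parsed.year}

# Hypothetical sample values
sample = derive_fields("07/04/2015", "21:05:00")

# Slicing agrees with parsing only when the format is fixed-width
assert "07/04/2015"[0:2] == f"{sample['Month']:02d}"
assert "07/04/2015"[6:] == str(sample["Year"])
assert "21:05:00"[0:2] == f"{sample['Hour']:02d}"
```

Unlike slicing, `strptime` raises a `ValueError` on a value that does not match the expected format, so a check like this could flag malformed rows before they reach the charts.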

## Cleaning Data Part 2
- The two lines below sort the frame in descending order, so that the most recent year and crime date come first, and then reset the index.

In [5]:
CrimeReport = CrimeReport.sort_values(by=['Year','CrimeDate'],ascending=False)
CrimeReport = CrimeReport.reset_index(drop=True)
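
Because `Year` is derived from `CrimeDate`, sorting by both gives the same order as sorting by the date alone. A small sketch with made-up records shows this in plain Python:

```python
from datetime import date

# Hypothetical records; Year always matches the year of CrimeDate
records = [
    {"Year": 2014, "CrimeDate": date(2014, 12, 31)},
    {"Year": 2016, "CrimeDate": date(2016, 1, 2)},
    {"Year": 2015, "CrimeDate": date(2015, 6, 15)},
    {"Year": 2016, "CrimeDate": date(2016, 3, 9)},
]

# Descending sort on (Year, CrimeDate), as in the cell above
by_year_then_date = sorted(records, key=lambda r: (r["Year"], r["CrimeDate"]), reverse=True)

# Descending sort on CrimeDate only
by_date_only = sorted(records, key=lambda r: r["CrimeDate"], reverse=True)

print([r["CrimeDate"].isoformat() for r in by_year_then_date])
```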

## Cleaning Data Part 3
- The following block of code replaces a lot of data values within certain columns. The reason is that a column such as 'District' would contain the same district spelled in different ways or misspelled. So I replace those strings with one uniform string per value, so that the metrics later on are not misleading or incorrect.

In [6]:
replacements=[("I", "Inside"),("O", "Outside"),('UNKNOWN','unknown'),('Unknown','unknown'),('Central','CENTRAL'),
              ('NORTHEAST','NORTHEASTERN'),('NORTHESTERN','NORTHEASTERN'),('NORTHWEST','NORTHWESTERN'),
              ('BELAIR-EDISON','Belair-Edison'),('BROOKLYN','Brooklyn'),('CANTON','Canton'),('CHERRY HILL','Cherry Hill'),
              ('COLDSTREAM HOMESTEAD','Coldstream Homestead Montebello'),('DOWNTOWN','Downtown'),('FELLS POINT','Fells Point'),
              ('FRANKFORD','Frankford'),('INNER HARBOR','Inner Harbor'),('SANDTOWN-WINCHESTER','Sandtown-Winchester'),('UPTON','Upton'),
              ('WASHINGTON VILLAGE','Washington Village/Pigtown'),('BROADWAY EAST','Broadway East'),('CARROLLTON RIDGE','Carrollton Ridge'),
              ('CENTRAL PARK HEIGHTS','Central Park Heights'),('CHARLES VILLAGE','Charles Village'),('EAST BALTIMORE MIDWA','East Baltimore Midway'),
              ('ELLWOOD PARK/MONUMEN','Ellwood Park/Monument'),('HAMPDEN','Hampden'),('MONDAWMIN','Mondawmin'),('MOUNT VERNON','Mount Vernon'),
              ('MID-TOWN BELVEDERE','Mid-Town Belvedere'),('MCELDERRY PARK','McElderry Park'),('OLIVER','Oliver'),('RESERVOIR HILL','Reservoir Hill'),
              ('SOUTHEAST','SOUTHEASTERN'),('SOUTHESTERN','SOUTHEASTERN'),('SOUTHWEST','SOUTHWESTERN'),('Gay Street','CENTRAL'),("FIRE", "FIREARM")]
for each_replacement in replacements:
    assert len(each_replacement) == 2 and isinstance(each_replacement, tuple)
    CrimeReport.replace(each_replacement[0], each_replacement[1], inplace=True)
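
The loop above replaces a value wherever a whole cell equals the search string, in any column of the frame. One alternative, sketched here in plain Python with only a few of the pairs from the list, is to scope each mapping to the column it belongs to. The names `COLUMN_REPLACEMENTS` and `normalize`, and the sample row, are hypothetical.

```python
# Hypothetical per-column mappings, using a few pairs from the list above
COLUMN_REPLACEMENTS = {
    "Inside/Outside": {"I": "Inside", "O": "Outside"},
    "District": {"NORTHEAST": "NORTHEASTERN", "NORTHESTERN": "NORTHEASTERN"},
    "Weapon": {"FIRE": "FIREARM"},
}

def normalize(row: dict) -> dict:
    """Replace whole cell values only, and only in the column they belong to."""
    cleaned = dict(row)
    for column, mapping in COLUMN_REPLACEMENTS.items():
        if column in cleaned:
            # Values without a mapping are left unchanged
            cleaned[column] = mapping.get(cleaned[column], cleaned[column])
    return cleaned

# Hypothetical row; the "I" in CrimeCode must not become "Inside"
row = {"Inside/Outside": "I", "District": "NORTHESTERN", "Weapon": "FIRE", "CrimeCode": "I"}
result = normalize(row)
print(result)
```

Scoping by column means a short code such as "I" is only rewritten in the Inside/Outside column, at the cost of a slightly longer mapping.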

## TOP-LEVEL VIEW OF THE CRIME IN BALTIMORE CITY FROM DIFFERENT FILTERS

## Criminal Activity Reported by Districts from 2011-2020
- The bar graph below shows that, out of all nine Baltimore City districts, the Northeastern district is the most dangerous, with over 80,000 reports from 2011-2020.
- The Southeastern district comes in a close second with almost 75,000 reports.
- At the very bottom you will see an "unknown" bar. It counts the criminal reports that were filed without a district. This raises the questions of why this happens and how we can make sure that each report is placed at the correct location.

In [44]:
base=alt.Chart(CrimeReport).encode(x=alt.X('sum(Total Incidents)',title=None),y=alt.Y('District',sort='-x',title='Baltimore City Districts')).properties(title='Crime Reported by District from 2011-Current')
bars=base.mark_bar().encode(color=alt.Color('District:N',legend=None))
text=base.mark_text(align='left',baseline='middle',dx=3,fontSize=15,).encode(text='count()')
# configure_title returns a new chart, so it is applied to the final layered chart
(bars+text).properties(height=500,width=800).configure_title(fontSize=20,font='Courier',anchor='start',color='black')
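
In the chart above the bars use `sum(Total Incidents)` while the text labels use `count()`. The two aggregates agree only when every row has a `Total Incidents` value of 1. The sketch below uses made-up rows to show where they can differ.

```python
from collections import defaultdict

# Hypothetical rows: (District, Total Incidents)
rows = [("NORTHEASTERN", 1), ("NORTHEASTERN", 2), ("SOUTHEASTERN", 1), ("unknown", 1)]

incident_sum = defaultdict(int)  # what sum(Total Incidents) measures
row_count = defaultdict(int)     # what count() measures

for district, total_incidents in rows:
    incident_sum[district] += total_incidents
    row_count[district] += 1

for district in incident_sum:
    print(district, incident_sum[district], row_count[district])
```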

## Criminal Activity from the most active Neighborhoods in Baltimore City.

- The chart below keeps only the neighborhoods with more than 7,000 total incidents.
- We can see from the report that Downtown has the most reported cases, with 17,500+ reports.

In [8]:
base=alt.Chart(CrimeReport).transform_aggregate(Total_Incidents='sum(Total Incidents)',groupby=['Neighborhood','District'],).transform_filter("datum.Total_Incidents > 7000").properties(title='Top 7 Highest Crime Reported Neighborhoods from 2011-Current')
bars=base.mark_bar().encode(x=alt.X('Total_Incidents:Q',title=None),y=alt.Y('Neighborhood',title='Baltimore City Neighborhood',sort='-x'),color=alt.Color('Neighborhood:N',legend=None),tooltip=['Total_Incidents:Q','District'])
# configure_title returns a new chart, so it is applied to the chart that is displayed
bars.properties(height=500,width=800).configure_title(fontSize=20,font='Courier',anchor='start',color='black')
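
The chart above selects neighborhoods with a fixed threshold rather than a fixed count, so the number of bars depends on the data. The sketch below contrasts the two approaches on made-up totals (the numbers are deliberately small and not taken from the dataset).

```python
from collections import Counter

# Hypothetical totals per neighborhood, not real figures
totals = Counter({"Downtown": 175, "Frankford": 90, "Belair-Edison": 80, "Canton": 65})

# Threshold filter, the same idea as transform_filter("datum.Total_Incidents > 7000")
THRESHOLD = 70
above_threshold = {name: n for name, n in totals.items() if n > THRESHOLD}

# Fixed-size ranking: always returns exactly N entries when enough exist
top_two = totals.most_common(2)

print(above_threshold)
print(top_two)
```

A threshold keeps the cutoff meaningful as the data grows, while a top-N ranking keeps the chart the same size.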

## Weapons reported being used in criminal offenses from 2011-2020

- As you can see, none of the Knife, Other, Firearm, or Hands weapon types reached a total of over 50,000 from 2011-2020.
- It is interesting to find that more crimes were recorded with an unknown weapon type, where no weapon was entered, than with any named weapon type.
- There were close to 400,000 reports that did not list whether or not a weapon was used in the criminal act.
- Some questions can be asked here, such as why and how we are reporting almost 400,000 unknown weapon types from 2011-2020.
- I also wanted to know which of these weapon types were used inside or outside, so I split the graph into columns by the "Inside/Outside" data column.
- The strange but interesting part is that a substantial number of reports, across the weapon types and the unknown type, do not properly record whether the crime happened inside or outside. As in the other graphs, the unknown weapon type is very significant.

In [130]:
base=alt.Chart(CrimeReport).mark_bar().encode(x=alt.X('sum(Total Incidents)',title=None),y=alt.Y('Weapon',sort='-x',title='Weapon Type'),color=alt.Color('Weapon:N',legend=None),tooltip=['sum(Total Incidents):Q'],column='Inside/Outside').properties(title='Weapon counts reported in Crime Report from 2011-Current')
#text=base.mark_text(align='left',baseline='middle',dx=3,fontSize=15,).encode(text='count()')
# configure_title returns a new chart, so it is applied to the chart that is displayed
base.properties(height=300,width=500).configure_title(fontSize=20,font='Courier',anchor='start',color='black')

## The Report of Criminal Charges given from 2011-2020
- As you will see from the report, over a span of almost nine years the most common charge in Baltimore City is Larceny (theft of personal property).
- The shocking find for me is that Homicide is the second least common charge given in Baltimore City.
- I have structured the code so that when you click on a specific bar, only the line associated with that bar is shown.
- I have also made the line graph interactive, meaning you are able to scroll and zoom to see each day and which crimes took place on it.

In [166]:
interval=alt.selection_multi(encodings=['color'])

base=alt.Chart(CrimeReport).encode(
    x=alt.X('sum(Total Incidents)',sort='y',title=None),
    y=alt.Y('Description',sort='-x',title='Criminal Acts')
).properties(title='Types of Crime Reported from 2011-Current')

bars=base.mark_bar().encode(
    color=alt.condition(interval,'Description:N',alt.value('lightgray'),legend=None),
    tooltip=['sum(Total Incidents):Q']
).properties(
    selection=interval
)

#text=base.mark_text(align='center',baseline='middle',dx=30,dy=-10,fontSize=15).encode(text='count()')

Time=alt.Chart(CrimeReport).mark_line().encode(
    x=alt.X('CrimeDate:T'),
    y=alt.Y('sum(Total Incidents)'),
    color=alt.Color('Description',legend=None,scale=alt.Scale(scheme='dark2'))
).interactive().transform_filter(
    interval
)
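# Title styling is applied to the combined chart, since configure_title returns a new chart
(bars.properties(height=400,width=700) & Time.properties(height=400,width=700)).configure_title(fontSize=20,font='Courier',anchor='start',color='black')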

## Crime activity throughout a nine year span within Baltimore City
- You can see that from 2011-2013 criminal activity stayed consistent and flat across those three years.
- Continuing onward from the end of 2013, you see a spike in 2014, then the highest recorded year in 2015, and then a slow descent in 2016.
- It then falls drastically and continues to decrease slowly.

## Criminal Report from a 12 month calendar view from 2011-2020
- You can see that at the start of the year criminal activity is somewhat high in January, and toward the middle of the year, during the summer months (May-Aug), you see a surge and then a consistently high level of crime.

In [141]:
interval=alt.selection_multi(encodings=['color'])

base=alt.Chart(CrimeReport).encode(
    y=alt.Y('sum(Total Incidents)',sort='y',title=None),
    x=alt.X('CrimeDate:T',timeUnit='year')
)
bars=base.mark_bar().encode(
    color=alt.condition(interval,'Year:N',alt.value('lightgray'),legend=None)
).properties(
    selection=interval
)

text=base.mark_text(align='center',baseline='middle',dx=30,dy=-10,fontSize=15).encode(text='count()')

Time=alt.Chart(CrimeReport).mark_line().encode(
    x=alt.X('CrimeDate:T'),
    y=alt.Y('sum(Total Incidents)')
    ,color=alt.Color('Year',legend=None,scale=alt.Scale(scheme='dark2')),strokeDash='Year'
).interactive().transform_filter(
    interval
)

base1=alt.Chart(CrimeReport).encode(
    y=alt.Y('sum(Total Incidents):Q',title=None),
    x=alt.X('CrimeDate:T',timeUnit='month')
)
bars1=base1.mark_bar().encode(
    color=alt.Color('Month:N',legend=None)
)
text1=base1.mark_text(align='center',dx=30,dy=-10,baseline='middle',fontSize=15).encode(text='sum(Total Incidents):Q')

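# Parentheses make sure the title and size apply to the whole layered chart, not only to text1
Month=(bars1+text1).properties(title='Crime Reported Monthly from 2011-Current',height=400,width=700)
Year=((bars+text).properties(title='Crime Reported Yearly from 2011-Current',height=400,width=700))

# Title styling is applied to the combined chart, since configure_title returns a new chart
((Year | Month) & Time.properties(height=400,width=700)).configure_title(fontSize=20,font='Courier',anchor='start',color='black')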

## Crime Looked at per Hour by Day

- For the chart below I decided to create a heatmap of how much crime is committed over the course of the day, hour by hour.
- You can see that the number of criminal acts is relatively low during the early morning, roughly from 2am-7am.
- The hot spots fall roughly from 12pm-11pm, and you will notice a gradual color change from 7am-12pm as activity builds toward them.

In [167]:
alt.Chart(CrimeReport).mark_rect().encode(
    x=alt.X('Hour'),
    y=alt.Y('CrimeDate:O',timeUnit='date'),
    color='count()',
    column='Month'
)
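
Each cell of the heatmap above is a count of rows that share the same month (the column facet), day of the month (y), and hour (x). A plain-Python sketch of that grouping, on made-up timestamps and assuming the `MM/DD/YYYY HH:MM:SS` layout, looks like this:

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps, not taken from the dataset
timestamps = [
    "07/04/2015 21:05:00",
    "07/04/2015 21:40:00",
    "08/04/2015 03:10:00",
    "07/12/2015 21:00:00",
]

# One counter entry per (month, day of month, hour) cell
cell_counts = Counter()
for stamp in timestamps:
    parsed = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    cell_counts[(parsed.month, parsed.day, parsed.hour)] += 1

for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

Note that the year is not part of the key, so the same calendar day from different years lands in the same cell, just as in the chart.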

## Here is the breakdown of the three highest years in Baltimore City

- Below you will see the three highest years of crime in Baltimore City (2014, 2015, and 2016), broken down by the nine districts.
- You will also notice that the Northeastern and Southeastern districts are still the highest reporting districts for criminal activity.
- There could be a reason for this; however, it will be looked into further in phase two of this report.

In [144]:
base=alt.Chart(CrimeReport).mark_bar().encode(x=alt.X('sum(Total Incidents)',title=None),y=alt.Y('District',sort='-x',title='Baltimore City Districts'),color=alt.Color('District:N',legend=None),tooltip=['sum(Total Incidents):Q','District'],column='Year').transform_filter(alt.FieldOneOfPredicate(field='Year', oneOf=[2014,2015,2016]))
# Style the title of this chart (base); configure_title returns a new chart
base.properties(height=400,width=800).configure_title(fontSize=20,font='Courier',anchor='start',color='black')
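
The filter above hard-codes the years 2014, 2015, and 2016. As a rough sketch, the three highest years could also be derived from the yearly totals. The rows below are made up, and the name `top_years` is hypothetical.

```python
from collections import Counter

# Hypothetical (Year, Total Incidents) rows, not real figures
rows = [(2013, 3), (2014, 5), (2015, 7), (2016, 6), (2017, 4), (2015, 1)]

# Sum Total Incidents per year
per_year = Counter()
for year, total_incidents in rows:
    per_year[year] += total_incidents

# Take the three years with the largest totals, then order them chronologically
top_years = sorted(year for year, _ in per_year.most_common(3))
print(top_years)
```

A list built this way could be passed as the `oneOf` value of the filter, so the chart would follow the data if the ranking ever changed.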

In [145]:
base=alt.Chart(CrimeReport).mark_bar().encode(x=alt.X('sum(Total Incidents)',title=None),y=alt.Y('Description',sort='-x',title='Types of Crime'),color=alt.Color('Description:N',legend=None),tooltip=['sum(Total Incidents):Q','Description'],column='Year').transform_filter(alt.FieldOneOfPredicate(field='Year', oneOf=[2014,2015,2016]))
# Style the title of this chart (base); configure_title returns a new chart
base.properties(height=400,width=800).configure_title(fontSize=20,font='Courier',anchor='start',color='black')