# Capstone Project - The Battle of the Neighborhoods - complete
### Analysis of safe locations in Baltimore

## Problem description

The goal of this project is to find a safe location where a family with two children, moving from Europe, could find an apartment in **Baltimore, US**.

The intention is to determine which districts have the least violence, based on data from the official **Baltimore Police Department** web pages.

The idea is to list the **top 5 districts** as the better locations.


## Data used to solve the problem

Since the problem is to find the best place to live in Baltimore with respect to the level of crime that can be expected in a particular district, the most reliable data will be police reports.

The following data sources will help to solve the problem:

- **Data set 1**: Baltimore Police Department data sheet of all Part I crime incidents within Baltimore City: https://www.baltimorepolice.org/crime-stats/open-data

- **Data set 2**: Wikipedia list of Baltimore neighborhoods: https://en.wikipedia.org/wiki/List_of_Baltimore_neighborhoods


#### Library imports

In [1]:

import numpy as np
import pandas as pd
!pip install opencage

%matplotlib inline 
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') # optional: for ggplot-like style
print ('Matplotlib version: ', mpl.__version__) # >= 2.0.0

import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans

print('Libraries are imported')

Matplotlib version:  3.0.2
Libraries are imported


#### Downloading and reading police department data

In [2]:
# Load the crime data; the first CSV column (CrimeDate) becomes the index
baltimore_crime_df = pd.read_csv('https://raw.githubusercontent.com/perozovni/perog/master/Baltimore_crimes.csv', index_col=0)
# Drop the columns that are not needed for a district-level analysis
baltimore_crime_df.drop(['CrimeCode','Location','Inside/Outside','Neighborhood','Longitude','Latitude','Location 1','Weapon','Total Incidents'], axis = 1, inplace = True)
# Remove rows in which every remaining column is empty
baltimore_crime_df.dropna(how="all", inplace=True)
baltimore_crime_df.head()

| CrimeDate | CrimeTime | Description | Post | District | Premise |
|---|---|---|---|---|---|
| 08/31/2017 | 16:04:00 | ROBBERY - STREET | 843.0 | SOUTHWESTERN | NaN |
| 08/31/2017 | 14:30:00 | LARCENY | 511.0 | NORTHERN | HOSP/NURS. |
| 01/19/2017 | 19:35:00 | ROBBERY - STREET | 211.0 | SOUTHEASTERN | STREET |
| 08/30/2017 | 22:30:00 | LARCENY FROM AUTO | 612.0 | NORTHWESTERN | ALLEY |
| 01/02/2017 | 08:50:00 | ROBBERY - RESIDENCE | 425.0 | NORTHEASTERN | ROW/TOWNHO |


#### List the most violent districts

In [3]:
baltimore_crime_df['District'].value_counts()

NORTHEASTERN    12
SOUTHEASTERN    12
EASTERN         10
NORTHWESTERN     9
SOUTHERN         9
CENTRAL          8
SOUTHWESTERN     7
NORTHERN         6
WESTERN          6
Name: District, dtype: int64
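Since the stated goal is a list of the top 5 districts, the counts above can be turned into a ranking directly. The following is only a minimal sketch: `toy_df` is a hypothetical stand-in for `baltimore_crime_df`, rebuilt from the counts printed above, and `keep="all"` is chosen here so that tied districts are not silently dropped.

```python
import pandas as pd

# Hypothetical stand-in for baltimore_crime_df, rebuilt from the counts shown above
counts = {
    "NORTHEASTERN": 12, "SOUTHEASTERN": 12, "EASTERN": 10,
    "NORTHWESTERN": 9, "SOUTHERN": 9, "CENTRAL": 8,
    "SOUTHWESTERN": 7, "NORTHERN": 6, "WESTERN": 6,
}
toy_df = pd.DataFrame(
    {"District": [d for d, n in counts.items() for _ in range(n)]}
)

# Incidents per district, then the five smallest counts.
# keep="all" also returns every district tied with the fifth place.
district_counts = toy_df["District"].value_counts()
safest = district_counts.nsmallest(5, keep="all")
print(safest)
```

With these counts the result holds six districts rather than five, because NORTHWESTERN and SOUTHERN share the fifth-lowest count.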

#### List crime descriptions

In [4]:
baltimore_crime_df['Description'].value_counts()

COMMON ASSAULT         18
AUTO THEFT             17
LARCENY                12
AGG. ASSAULT           12
LARCENY FROM AUTO      11
ROBBERY - STREET        5
BURGLARY                5
ROBBERY - RESIDENCE     2
ASSAULT BY THREAT       1
Name: Description, dtype: int64
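The list above mixes violent offences (assaults, robberies) with property offences (larceny, auto theft, burglary). If the interest is specifically in violence, the rows could be filtered before counting per district. The sketch below is an illustration only: the `VIOLENT` set is an assumption made here about which descriptions to treat as violent, and `toy_df` holds just the five sample rows shown earlier, not the full data set.

```python
import pandas as pd

# Assumption: descriptions treated as "violent" for this illustration
VIOLENT = {
    "COMMON ASSAULT", "AGG. ASSAULT", "ASSAULT BY THREAT",
    "ROBBERY - STREET", "ROBBERY - RESIDENCE",
}

# The five sample rows displayed by head() above
toy_df = pd.DataFrame({
    "Description": ["ROBBERY - STREET", "LARCENY", "ROBBERY - STREET",
                    "LARCENY FROM AUTO", "ROBBERY - RESIDENCE"],
    "District": ["SOUTHWESTERN", "NORTHERN", "SOUTHEASTERN",
                 "NORTHWESTERN", "NORTHEASTERN"],
})

# Keep only violent incidents, then count them per district
violent_df = toy_df[toy_df["Description"].isin(VIOLENT)]
violent_by_district = violent_df["District"].value_counts()
print(violent_by_district)
```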

## Methodology

### Cross-tabulation of crime premise by city district

In [5]:
baltimore_crime_cat = pd.pivot_table(baltimore_crime_df,values=['CrimeTime'],index=['District'],columns=['Premise'],
                               aggfunc=len,
                               fill_value=0,
                               margins=True)
baltimore_crime_cat

Count of `CrimeTime` records per Premise (columns) and District (rows):

| District | ALLEY | APT/CONDO | GARAGE ON | GAS STATIO | HOSP/NURS. | OFFICE BUI | OTHER - OU | PARK | PARKING LO | ROW/TOWNHO | STREET | YARD | All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CENTRAL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 6 |
| EASTERN | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 | 6 |
| NORTHEASTERN | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 4 | 5 | 0 | 11 |
| NORTHERN | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 4 |
| NORTHWESTERN | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 4 | 0 | 8 |
| SOUTHEASTERN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 8 | 0 | 10 |
| SOUTHERN | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 6 | 0 | 8 |
| SOUTHWESTERN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 4 |
| WESTERN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 5 |
| All | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 4 | 9 | 38 | 1 | 62 |
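A table like the one above counts incidents per District and Premise, which is a cross-tabulation rather than a correlation. As a hedged alternative, `pd.crosstab` should give the same counts without needing a `values` column. The sketch below uses a small hypothetical `toy_df` (not the real data) and passes `values` as a plain string, so the columns stay single-level.

```python
import pandas as pd

# Hypothetical sample rows, used only to illustrate the two calls
toy_df = pd.DataFrame({
    "CrimeTime": ["16:04:00", "14:30:00", "19:35:00", "22:30:00", "08:50:00"],
    "District": ["CENTRAL", "CENTRAL", "WESTERN", "NORTHERN", "CENTRAL"],
    "Premise": ["STREET", "STREET", "STREET", "PARK", "ROW/TOWNHO"],
})

# Counting rows with pivot_table, in the style used in this notebook
by_pivot = pd.pivot_table(
    toy_df, values="CrimeTime", index="District", columns="Premise",
    aggfunc=len, fill_value=0, margins=True,
)

# The same counts with crosstab; margins=True adds the "All" totals
by_crosstab = pd.crosstab(toy_df["District"], toy_df["Premise"], margins=True)

print(by_pivot)
print(by_crosstab)
```

Rows with a missing Premise are left out by both calls by default, which would explain why the `All` total in such a table can be lower than the number of incidents in the data.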


### Cross-tabulation of police post and city district by crime description

In [6]:
baltimore_crime_cat = pd.pivot_table(baltimore_crime_df,values=['CrimeTime'],index=['Post','District'],columns=['Description'],
                               aggfunc=len,
                               fill_value=0,
                               margins=True)
baltimore_crime_cat

Count of `CrimeTime` records per Description (columns) and Post/District (rows):

| Post | District | AGG. ASSAULT | ASSAULT BY THREAT | AUTO THEFT | BURGLARY | COMMON ASSAULT | LARCENY | LARCENY FROM AUTO | ROBBERY - RESIDENCE | ROBBERY - STREET | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 111.0 | CENTRAL | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 121.0 | CENTRAL | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 3 |
| 123.0 | CENTRAL | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
| 132.0 | CENTRAL | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 142.0 | CENTRAL | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 211.0 | SOUTHEASTERN | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 4 |
| 223.0 | SOUTHEASTERN | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 224.0 | SOUTHEASTERN | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 232.0 | SOUTHEASTERN | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 233.0 | SOUTHEASTERN | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |


### Basic statistical details

In [7]:
baltimore_crime_cat.describe()

| Statistic | AGG. ASSAULT | ASSAULT BY THREAT | AUTO THEFT | BURGLARY | COMMON ASSAULT | LARCENY | LARCENY FROM AUTO | ROBBERY - RESIDENCE | ROBBERY - STREET | All |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 54.0 | 54.0 | 54.0 | 54.0 | 54.0 | 54.0 | 54.0 | 54.0 | 54.0 | 54.0 |
| mean | 0.444444 | 0.037037 | 0.62963 | 0.185185 | 0.555556 | 0.444444 | 0.407407 | 0.074074 | 0.148148 | 2.925926 |
| std | 1.65594 | 0.190626 | 2.325383 | 0.728764 | 2.080155 | 1.65594 | 1.535965 | 0.32805 | 0.595816 | 10.579175 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 75% | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| max | 12.0 | 1.0 | 17.0 | 5.0 | 15.0 | 12.0 | 11.0 | 2.0 | 4.0 | 79.0 |
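Because the pivot table was built with `margins=True`, its `All` totals row is also fed into `describe()`, which is the likely source of the maximum of 79 in the `All` column and of the inflated standard deviations. A minimal sketch of dropping that row first is given below; `toy_df` is a hypothetical sample, and a single-level `District` index is assumed here for simplicity, whereas the notebook's table is indexed by Post and District.

```python
import pandas as pd

# Hypothetical sample rows, used only for illustration
toy_df = pd.DataFrame({
    "CrimeTime": ["16:04:00", "14:30:00", "19:35:00", "22:30:00", "08:50:00"],
    "District": ["CENTRAL", "CENTRAL", "CENTRAL", "WESTERN", "NORTHERN"],
    "Description": ["COMMON ASSAULT", "COMMON ASSAULT", "LARCENY",
                    "AUTO THEFT", "LARCENY"],
})

table = pd.pivot_table(
    toy_df, values="CrimeTime", index="District", columns="Description",
    aggfunc=len, fill_value=0, margins=True,
)

# Drop the "All" totals row so it is not counted as one more district
per_district = table.drop(index="All")
summary = per_district.describe()
print(summary)
```

In this toy example the maximum of the `All` column falls from 5 (the grand total) to 3 (the busiest district) once the totals row is excluded.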


### Discussion and Conclusion
As stated initially, the problem was to identify the safest part of Baltimore City for a family to live in.
The data analysis showed that the SOUTHEASTERN district, tied with NORTHEASTERN at 12 recorded incidents, is among the most violent districts, while the WESTERN district, tied with NORTHERN at 6 recorded incidents, is among the least violent.

#### Taking into account the crime data from the Baltimore Police Department, the conclusion is that the safest district in Baltimore would be the Western district.