# Introduction: Business Problem

**1.1 Background** <br>
The average American moves about eleven times in their lifetime. This brings us to the
question: **Do people move until they find a place to settle down where they truly feel happy,
or do our wants and needs change over time, prompting us to eventually leave a town we
once called home for a new area that will bring us satisfaction? Or, do we too often move to
a new area without knowing exactly what we’re getting into, forcing us to turn tail and run at
the first sign of discomfort?**
To minimize the chances of this happening, we should always do proper research when
planning our next move in life. Consider the following factors when picking a new place to
live so you don’t end up wasting your valuable time and money making a move you’ll end
up regretting. Safety is a top concern when moving to a new area. If you don’t feel safe in
your own home, you’re not going to be able to enjoy living there.

**1.2 Problem** <br>
The London crime statistics dataset found on Kaggle records the crimes in each borough of
London from 2008 to 2016. Because 2016 is the latest year available, we consider only the data
for that year, which is already dated: the crime rates in each borough
may have changed since then.
This project aims to select the safest borough in London based on the total number of crimes, explore
the neighborhoods of that borough to find the 10 most common venues in each
neighborhood, and finally cluster the neighborhoods using k-means clustering.

**1.3 Interest** <br>
Expats who are considering relocating to London will be interested in identifying the safest
borough in London and exploring its neighborhoods and the common venues around each
neighborhood.

# Data Acquisition and Cleaning

**2.1 Data Acquisition** <br>
The data for this project combines three sources. The first
source is a London crime dataset that records the crime counts per borough in
London. The dataset contains the following columns:<br>
● lsoa_code : code for Lower Super Output Area in Greater London. <br>
● borough : Common name for London borough.<br>
● major_category : High level categorization of crime <br>
● minor_category : Low level categorization of crime within major category. <br>
● value : monthly reported count of categorical crime in given borough <br>
● year : Year of reported counts, 2008-2016 <br>
● month : Month of reported counts, 1-12 <br> <br>

Data set URL: https://www.kaggle.com/jboysen/london-crime


The second source of data is scraped from a Wikipedia page that lists the London
boroughs. This page contains additional information about the boroughs in the following
columns:<br>
● Borough : The names of the 33 London boroughs.<br>
● Inner : Categorizing the borough as an Inner London borough or an Outer London
Borough.<br>
● Status : Categorizing the borough as Royal, City or other borough.<br>
● Local authority : The local authority assigned to the borough.<br>
● Political control : The political party that controls the borough.<br>
● Headquarters: Headquarters of the Boroughs.<br>
● Area (sq mi) : Area of the borough in square miles.<br>
● Population (2013 est)[1] : The population in the borough recorded during the year
2013.<br>
● Co-ordinates : The latitude and longitude of the boroughs.<br>
● Nr. in map : The number assigned to each borough to represent visually on a map.<br><br>
The third data source is the list of neighborhoods in the Royal Borough of Kingston upon
Thames, as found on a Wikipedia page. This dataset is created from scratch using the list of
neighborhoods available on the site, with the following columns:<br>
● Neighborhood: Name of the neighborhood in the Borough.<br>
● Borough: Name of the Borough. <br>
● Latitude: Latitude of the Borough. <br>
● Longitude: Longitude of the Borough.<br> <br>
**2.2 Data Cleaning** <br>
The three sources of data are prepared separately. From the
London crime data, only the crimes recorded during the most recent year (2016) are selected. The
data is then pivoted on the major crime categories to get the total crimes per borough for each
major category.
<br><br>
The second **dataset is scraped from a Wikipedia page using the Beautiful Soup library** in
Python. This library lets us extract the data in the tabular format shown on the
website. After the web scraping, string manipulation is required to get the names of the
boroughs into the correct form. This is important because the
two datasets are merged on the borough names.<br>
<br>
The two datasets are merged on the Borough names to form a new dataset that combines
the necessary information in one dataset. The purpose of this dataset is to
visualize the crime rates in each borough and identify the borough with the least crimes
recorded during the year 2016.
<br><br>
After visualizing the crime in each borough, we can find the borough with the lowest crime
rate and tag it as the safest borough. The third source of data is acquired
from the list of neighborhoods in the safest borough on Wikipedia. This dataset is created
from scratch: a pandas data frame is created with the names of the neighborhoods and
the name of the borough, with the latitude and longitude left blank.
<br><br>
The coordinates of the neighborhoods are obtained using **Google Maps API geocoding**
to get the final dataset.
<br><br>
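
The geocoding step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used in this project: `fill_coordinates` and `fake_geocode` are hypothetical names, the neighborhood names are invented, and the stub returns placeholder numbers instead of calling a real geocoding service.

```python
import pandas as pd

def fill_coordinates(df, geocode):
    """Return a copy of df with Latitude/Longitude filled in by `geocode`.

    `geocode` is any callable (hypothetical interface) that maps an address
    string to a (latitude, longitude) tuple, or None when nothing is found.
    """
    out = df.copy()
    lats, lons = [], []
    for name, borough in zip(out['Neighborhood'], out['Borough']):
        result = geocode(f'{name}, {borough}, London, United Kingdom')
        lat, lon = result if result is not None else (None, None)
        lats.append(lat)
        lons.append(lon)
    out['Latitude'] = lats
    out['Longitude'] = lons
    return out

# Stub geocoder with placeholder values, standing in for a real service.
def fake_geocode(address):
    lookup = {'Alpha, Example Borough, London, United Kingdom': (1.0, 2.0)}
    return lookup.get(address)

# Invented neighborhoods with blank coordinates, mirroring the layout described above.
neighborhoods = pd.DataFrame({
    'Neighborhood': ['Alpha', 'Beta'],
    'Borough': ['Example Borough', 'Example Borough'],
    'Latitude': [None, None],
    'Longitude': [None, None],
})

located = fill_coordinates(neighborhoods, fake_geocode)
print(located)
```

Passing the geocoder in as a callable keeps the data frame logic independent of the service. With geopy, for example, `Nominatim(user_agent=...).geocode(address)` returns either `None` or a location object with `latitude` and `longitude` attributes, so a small wrapper around it could be supplied in place of the stub.
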
The new dataset is used to generate the 10 most common venues for each neighborhood
using the Foursquare API. Finally, the **k-means clustering algorithm** is used to cluster similar
neighborhoods together.
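
The "most common venues" step can be illustrated with a small pandas sketch. The venue records below are invented, and `most_common_venues` is a hypothetical helper; the real records would come from the Foursquare API, and the clustering step is not shown here.

```python
import pandas as pd

# Hypothetical venue records, one row per venue found near a neighborhood.
venues = pd.DataFrame({
    'Neighborhood': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta'],
    'Venue Category': ['Cafe', 'Pub', 'Cafe', 'Park', 'Park', 'Pub'],
})

def most_common_venues(df, top_n=10):
    """List each neighborhood's venue categories, most frequent first."""
    counts = (df.groupby(['Neighborhood', 'Venue Category'])
                .size()
                .reset_index(name='Count')
                .sort_values(['Neighborhood', 'Count', 'Venue Category'],
                             ascending=[True, False, True]))
    # Rows are already ordered by frequency, so head(top_n) keeps the top categories.
    return {name: group['Venue Category'].head(top_n).tolist()
            for name, group in counts.groupby('Neighborhood')}

top = most_common_venues(venues, top_n=2)
print(top)
```

Ties are broken alphabetically by category name, which is an arbitrary choice made only to keep the ranking deterministic.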

# Data

Based on the definition of our problem, the factors that will influence our decision are: <br>

* The total number of crimes committed in each borough during the most recent year (2016).
* The most common venues in each neighborhood of the safest borough selected.
<br>
The following data sources will be needed to extract/generate the required information:
<br>

* <u>Part 1: Preprocessing a real world data set from Kaggle showing the London Crimes from 2008 to 2016</u>: A dataset consisting of the crime statistics of each borough in London obtained from Kaggle
* <u>Part 2: Scraping additional information of the different Boroughs in London from a Wikipedia page</u>: More information regarding the boroughs of London is scraped using the Beautiful Soup library
* <u>Part 3: Creating a new dataset of the Neighborhoods of the safest borough in London and generating their co-ordinates</u>: The co-ordinates of each neighborhood will be obtained using Google Maps API geocoding

### Part 1: Preprocessing a real world data set from Kaggle showing the London Crimes from 2008 to 2016

In [10]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from bs4 import BeautifulSoup # library for web scraping

!conda install -c conda-forge geocoder --yes
import geocoder

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# function for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


### Define Foursquare Credentials and Version

In [11]:
CLIENT_ID = 'your-foursquare-client-id' # placeholder: supply your own Foursquare ID
CLIENT_SECRET = 'your-foursquare-client-secret' # placeholder: supply your own Foursquare Secret

VERSION = '20180604'
LIMIT = 30


In [12]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,lsoa_code,borough,major_category,minor_category,value,year,month
0,E01001116,Croydon,Burglary,Burglary in Other Buildings,0,2016,11
1,E01001646,Greenwich,Violence Against the Person,Other violence,0,2016,11
2,E01000677,Bromley,Violence Against the Person,Other violence,0,2015,5
3,E01003774,Redbridge,Burglary,Burglary in Other Buildings,0,2016,3
4,E01004563,Wandsworth,Robbery,Personal Property,0,2008,6


**Accessing the most recent crime rates (2016)**

In [13]:
# Taking only the most recent year (2016) and dropping the rest
df_data_1.drop(df_data_1.index[df_data_1['year'] != 2016], inplace = True)

# Removing all the entries where the crime count is zero
df_data_1 = df_data_1[df_data_1.value != 0]

# Reset the index and dropping the previous index
df_data_1 = df_data_1.reset_index(drop=True)

In [14]:
# Shape of the data frame
df_data_1.shape

(10130, 7)

In [15]:
# View the top of the dataset 
df_data_1.head()

Unnamed: 0,lsoa_code,borough,major_category,minor_category,value,year,month
0,E01004177,Sutton,Theft and Handling,Theft/Taking of Pedal Cycle,1,2016,8
1,E01000733,Bromley,Criminal Damage,Criminal Damage To Motor Vehicle,1,2016,4
2,E01003989,Southwark,Theft and Handling,Theft From Shops,4,2016,8
3,E01002276,Havering,Burglary,Burglary in a Dwelling,1,2016,8
4,E01003674,Redbridge,Drugs,Possession Of Drugs,2,2016,11


In [16]:
# Change the column names
df_data_1.columns = ['LSOA_Code', 'Borough','Major_Category','Minor_Category','No_of_Crimes','Year','Month']
df_data_1.head()

Unnamed: 0,LSOA_Code,Borough,Major_Category,Minor_Category,No_of_Crimes,Year,Month
0,E01004177,Sutton,Theft and Handling,Theft/Taking of Pedal Cycle,1,2016,8
1,E01000733,Bromley,Criminal Damage,Criminal Damage To Motor Vehicle,1,2016,4
2,E01003989,Southwark,Theft and Handling,Theft From Shops,4,2016,8
3,E01002276,Havering,Burglary,Burglary in a Dwelling,1,2016,8
4,E01003674,Redbridge,Drugs,Possession Of Drugs,2,2016,11


In [17]:
# View the information of the dataset 
df_data_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10130 entries, 0 to 10129
Data columns (total 7 columns):
LSOA_Code         10130 non-null object
Borough           10130 non-null object
Major_Category    10130 non-null object
Minor_Category    10130 non-null object
No_of_Crimes      10130 non-null int64
Year              10130 non-null int64
Month             10130 non-null int64
dtypes: int64(3), object(4)
memory usage: 554.1+ KB


**Number of crime records in each Borough**

In [18]:
df_data_1['Borough'].value_counts()

Southwark                 458
Croydon                   439
Lambeth                   436
Ealing                    408
Newham                    404
Tower Hamlets             389
Brent                     387
Haringey                  368
Wandsworth                359
Barnet                    356
Enfield                   351
Hackney                   350
Lewisham                  345
Westminster               337
Islington                 336
Waltham Forest            335
Camden                    324
Hillingdon                323
Greenwich                 315
Redbridge                 312
Bromley                   311
Hounslow                  308
Havering                  271
Barking and Dagenham      266
Hammersmith and Fulham    261
Kensington and Chelsea    255
Bexley                    222
Harrow                    208
Merton                    205
Richmond upon Thames      190
Sutton                    167
Kingston upon Thames      130
City of London              4
Name: Borough, dtype: int64
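
Note that `value_counts()` counts rows (records) per borough, while each row carries its own count in `No_of_Crimes`. A minimal sketch of the difference, using an invented three-row frame rather than the real data:

```python
import pandas as pd

# Toy data: borough 'A' has two records, borough 'B' has one.
toy = pd.DataFrame({
    'Borough': ['A', 'A', 'B'],
    'No_of_Crimes': [4, 1, 2],
})

# Number of rows (records) per borough
records_per_borough = toy['Borough'].value_counts()

# Number of crimes per borough: the sum of the per-record counts
crimes_per_borough = toy.groupby('Borough')['No_of_Crimes'].sum()

print(records_per_borough)
print(crimes_per_borough)
```

The two results differ whenever a record reports more than one crime, so a ranking of boroughs by total crimes should be based on the summed column.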

**Number of crime records per major category**

In [19]:
df_data_1['Major_Category'].value_counts()

Theft and Handling             3313
Violence Against the Person    3225
Criminal Damage                1257
Burglary                       1116
Drugs                           552
Robbery                         364
Other Notifiable Offences       303
Name: Major_Category, dtype: int64

**Pivoting the table to view the no. of crimes for each major category in each Borough**

In [20]:
London_crime = pd.pivot_table(df_data_1,values=['No_of_Crimes'],
                               index=['Borough'],
                               columns=['Major_Category'],
                               aggfunc='sum',fill_value=0)
London_crime.head()

Unnamed: 0_level_0,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes
Major_Category,Burglary,Criminal Damage,Drugs,Other Notifiable Offences,Robbery,Theft and Handling,Violence Against the Person
Borough,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
Barking and Dagenham,46,62,21,18,11,133,143
Barnet,83,48,22,18,12,218,172
Bexley,28,58,23,5,4,102,128
Brent,67,54,38,18,16,224,235
Bromley,50,50,12,11,5,197,174


In [21]:
# Reset the index
London_crime.reset_index(inplace = True)

In [22]:
# Total crimes per Borough (numeric columns only, so the Borough name is skipped)
London_crime['Total'] = London_crime.sum(axis=1, numeric_only=True)
London_crime.head(33)

Unnamed: 0_level_0,Borough,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,No_of_Crimes,Total
Major_Category,Unnamed: 1_level_1,Burglary,Criminal Damage,Drugs,Other Notifiable Offences,Robbery,Theft and Handling,Violence Against the Person,Unnamed: 9_level_1
0,Barking and Dagenham,46,62,21,18,11,133,143,434
1,Barnet,83,48,22,18,12,218,172,573
2,Bexley,28,58,23,5,4,102,128,348
3,Brent,67,54,38,18,16,224,235,652
4,Bromley,50,50,12,11,5,197,174,499
5,Camden,37,69,66,7,20,291,223,713
6,City of London,0,0,0,0,0,3,1,4
7,Croydon,49,86,29,20,22,293,301,800
8,Ealing,65,67,34,13,26,253,248,706
9,Enfield,52,37,33,16,25,211,207,581


**Removing the multi index so that it will be easier to merge**

In [23]:
London_crime.columns = London_crime.columns.map(''.join)
London_crime.head()

Unnamed: 0,Borough,No_of_CrimesBurglary,No_of_CrimesCriminal Damage,No_of_CrimesDrugs,No_of_CrimesOther Notifiable Offences,No_of_CrimesRobbery,No_of_CrimesTheft and Handling,No_of_CrimesViolence Against the Person,Total
0,Barking and Dagenham,46,62,21,18,11,133,143,434
1,Barnet,83,48,22,18,12,218,172,573
2,Bexley,28,58,23,5,4,102,128,348
3,Brent,67,54,38,18,16,224,235,652
4,Bromley,50,50,12,11,5,197,174,499
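
As an alternative to joining the two header levels, the multi-level header can be avoided altogether. The sketch below uses an invented three-row frame and assumes that passing `values` as a plain string (rather than a one-element list) is acceptable for the analysis; it is not the approach taken in the cells above.

```python
import pandas as pd

# Toy records in the same layout as the crime data
toy = pd.DataFrame({
    'Borough': ['A', 'A', 'B'],
    'Major_Category': ['Burglary', 'Drugs', 'Burglary'],
    'No_of_Crimes': [2, 1, 3],
})

# A plain string for `values` keeps the resulting columns single-level.
flat = pd.pivot_table(toy, values='No_of_Crimes', index='Borough',
                      columns='Major_Category', aggfunc='sum', fill_value=0)
flat.columns.name = None          # drop the 'Major_Category' label on the column axis
flat = flat.reset_index()         # turn the Borough index into a regular column
flat['Total'] = flat.sum(axis=1, numeric_only=True)

print(flat)
```

With this form the column names are already the category names, so no join or manual renaming is needed before merging.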


**Renaming the columns**

In [24]:
London_crime.columns = ['Borough','Burglary', 'Criminal Damage','Drugs','Other Notifiable Offences',
                        'Robbery','Theft and Handling','Violence Against the Person','Total']
London_crime.head()

Unnamed: 0,Borough,Burglary,Criminal Damage,Drugs,Other Notifiable Offences,Robbery,Theft and Handling,Violence Against the Person,Total
0,Barking and Dagenham,46,62,21,18,11,133,143,434
1,Barnet,83,48,22,18,12,218,172,573
2,Bexley,28,58,23,5,4,102,128,348
3,Brent,67,54,38,18,16,224,235,652
4,Bromley,50,50,12,11,5,197,174,499


In [25]:
# Shape of the data set 
London_crime.shape

(33, 9)

In [26]:
# View the Columns in the data frame
London_crime.columns.tolist()

['Borough',
 'Burglary',
 'Criminal Damage',
 'Drugs',
 'Other Notifiable Offences',
 'Robbery',
 'Theft and Handling',
 'Violence Against the Person',
 'Total']

### Part 2: Scraping additional information of the different Boroughs in London from a Wikipedia page

**Using Beautiful Soup to scrape the latitude and longitude of the boroughs in London**

URL: https://en.wikipedia.org/wiki/List_of_London_boroughs

In [29]:
# getting data from internet
import requests
wikipedia_link='https://en.wikipedia.org/wiki/List_of_London_boroughs'
raw_wikipedia_page= requests.get(wikipedia_link).text

# using beautiful soup to parse the HTML/XML codes.
soup = BeautifulSoup(raw_wikipedia_page,'xml')
print(soup.prettify())

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="UTF-8"/>
  <title>
   List of London boroughs - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"XgoaFwpAAEYAAAZtH4QAAAAP","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_London_boroughs","wgTitle":"List of London boroughs","wgCurRevisionId":931680068,"wgRevisionId":931680068,"wgArticleId":28092685,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories

In [30]:
# extracting the raw table inside that webpage
table = soup.find_all('table', {'class':'wikitable sortable'})
print(table)

[<table class="wikitable sortable" style="font-size:100%" width="100%">
<tbody><tr>
<th>Borough
</th>
<th>Inner
</th>
<th>Status
</th>
<th>Local authority
</th>
<th>Political control
</th>
<th>Headquarters
</th>
<th>Area (sq mi)
</th>
<th>Population (2013 est)<sup class="reference" id="cite_ref-1"><a href="#cite_note-1">[1]</a></sup>
</th>
<th>Co-ordinates
</th>
<th><span style="background:#67BCD3"> Nr. in map </span>
</th></tr>
<tr>
<td><a href="/wiki/London_Borough_of_Barking_and_Dagenham" title="London Borough of Barking and Dagenham">Barking and Dagenham</a> <sup class="reference" id="cite_ref-2"><a href="#cite_note-2">[note 1]</a></sup>
</td>
<td>
</td>
<td>
</td>
<td><a href="/wiki/Barking_and_Dagenham_London_Borough_Council" title="Barking and Dagenham London Borough Council">Barking and Dagenham London Borough Council</a>
</td>
<td><a href="/wiki/Labour_Party_(UK)" title="Labour Party (UK)">Labour</a>
</td>
<td><a class="new" href="/w/index.php?title=Barking_Town_Hall&amp;actio

**Converting the table into a data frame**

In [31]:
London_table = pd.read_html(str(table[0]), index_col=None, header=0)[0]
London_table.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E,25
1,Barnet,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20


**The second table on the site contains the additional borough, i.e. the City of London**

In [32]:
# Read in the second table 
London_table1 = pd.read_html(str(table[1]), index_col=None, header=0)[0]

# Rename the columns to match the previous table to append the tables.

London_table1.columns = ['Borough','Inner','Status','Local authority','Political control',
                         'Headquarters','Area (sq mi)','Population (2013 est)[1]','Co-ordinates','Nr. in map']

# View the table
London_table1

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,City of London,([note 5],Sui generis;City;Ceremonial county,Corporation of London;Inner Temple;Middle Temple,?,Guildhall,1.12,7000,51°30′56″N 0°05′32″W﻿ / ﻿51.5155°N 0.0922°W,1


**Append the data frames together**

In [33]:
# A continuous index value will be maintained 
# across the rows in the new combined data frame. 

London_table = pd.concat([London_table, London_table1], ignore_index = True) 
London_table.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E,25
1,Barnet,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20


**Check if the last row was appended correctly**

In [34]:
London_table.tail()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
28,Tower Hamlets,,,Tower Hamlets London Borough Council,Labour,"Town Hall, Mulberry Place, 5 Clove Crescent",7.63,272890,51°30′36″N 0°00′21″W﻿ / ﻿51.5099°N 0.0059°W,8
29,Waltham Forest,,,Waltham Forest London Borough Council,Labour,"Waltham Forest Town Hall, Forest Road",14.99,265797,51°35′27″N 0°00′48″W﻿ / ﻿51.5908°N 0.0134°W,28
30,Wandsworth,,,Wandsworth London Borough Council,Conservative,"The Town Hall, Wandsworth High Street",13.23,310516,51°27′24″N 0°11′28″W﻿ / ﻿51.4567°N 0.1910°W,5
31,Westminster,,City,Westminster City Council,Conservative,"Westminster City Hall, 64 Victoria Street",8.29,226841,51°29′50″N 0°08′14″W﻿ / ﻿51.4973°N 0.1372°W,2
32,City of London,([note 5],Sui generis;City;Ceremonial county,Corporation of London;Inner Temple;Middle Temple,?,Guildhall,1.12,7000,51°30′56″N 0°05′32″W﻿ / ﻿51.5155°N 0.0922°W,1


**View the information of the data set**

In [35]:
London_table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 10 columns):
Borough                     33 non-null object
Inner                       4 non-null object
Status                      5 non-null object
Local authority             33 non-null object
Political control           33 non-null object
Headquarters                33 non-null object
Area (sq mi)                33 non-null float64
Population (2013 est)[1]    33 non-null int64
Co-ordinates                33 non-null object
Nr. in map                  33 non-null int64
dtypes: float64(1), int64(2), object(7)
memory usage: 2.7+ KB
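
The `Co-ordinates` column is stored as text that mixes degrees/minutes/seconds with a decimal form. A possible way to turn it into numeric latitude and longitude is sketched below; `parse_coordinates` is a hypothetical helper, and the regular expression assumes the decimal part always looks like `51.5607°N 0.1557°E`, as in the rows shown above.

```python
import re

# Matches the decimal part of the string, e.g. "51.5607°N 0.1557°E"
DECIMAL_COORDS = re.compile(r'(\d+(?:\.\d+)?)°([NS])\s+(\d+(?:\.\d+)?)°([EW])')

def parse_coordinates(text):
    """Return (latitude, longitude) as signed floats, or None if no match."""
    match = DECIMAL_COORDS.search(text)
    if match is None:
        return None
    lat, ns, lon, ew = match.groups()
    latitude = float(lat) if ns == 'N' else -float(lat)
    longitude = float(lon) if ew == 'E' else -float(lon)   # west is negative
    return latitude, longitude

east = parse_coordinates('51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E')
west = parse_coordinates('51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W')
print(east, west)
```

Applied to the whole column (for instance with `London_table['Co-ordinates'].apply(parse_coordinates)`), this would give tuples that can be split into separate latitude and longitude columns.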


In [36]:
# Removing the footnote markers ('note 1' to 'note 5') from the data set
London_table = London_table.replace(r'note [1-5]','', regex=True) 

# View the top of the data set
London_table.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E,25
1,Barnet,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20


In [37]:
#Check the type of the newly created table
type(London_table)

pandas.core.frame.DataFrame

In [38]:
# Shape of the data frame
London_table.shape

(33, 10)

In [39]:
#Check if the Borough in both the data frames match.
set(df_data_1.Borough) - set(London_table.Borough)

{'Barking and Dagenham', 'Greenwich', 'Hammersmith and Fulham'}

In [40]:
#Find the index of the Boroughs that didn't match
print("The index of first borough is",London_table.index[London_table['Borough'] == 'Barking and Dagenham []'].tolist())
print("The index of second borough is",London_table.index[London_table['Borough'] == 'Greenwich []'].tolist())
print("The index of third borough is",London_table.index[London_table['Borough'] == 'Hammersmith and Fulham []'].tolist())

The index of first borough is [0]
The index of second borough is [9]
The index of third borough is [11]


In [41]:
#Changing the Borough names to match the other data frame
London_table.iloc[0,0] = 'Barking and Dagenham'
London_table.iloc[9,0] = 'Greenwich'
London_table.iloc[11,0] = 'Hammersmith and Fulham'

In [42]:
#Check if the Borough names in both data sets match
set(df_data_1.Borough) - set(London_table.Borough)

set()
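
The footnote markers could also be stripped in one pass, brackets included, which would make the manual fixes by row position unnecessary. A minimal sketch on two borough names taken from the table above, assuming every marker has the form `[note N]`:

```python
import pandas as pd

names = pd.Series(['Barking and Dagenham [note 1]', 'Barnet'])

# Remove any bracketed marker together with the whitespace in front of it.
cleaned = names.str.replace(r'\s*\[.*?\]', '', regex=True).str.strip()

print(cleaned.tolist())
```

Because the pattern matches on content rather than on row index, it keeps working if the order of the rows on the Wikipedia page changes.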

In [43]:
#We can combine both the data frames together
Ld_crime = pd.merge(London_crime, London_table, on='Borough')
Ld_crime.head(10)

Unnamed: 0,Borough,Burglary,Criminal Damage,Drugs,Other Notifiable Offences,Robbery,Theft and Handling,Violence Against the Person,Total,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham,46,62,21,18,11,133,143,434,,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E,25
1,Barnet,83,48,22,18,12,218,172,573,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
2,Bexley,28,58,23,5,4,102,128,348,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
3,Brent,67,54,38,18,16,224,235,652,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
4,Bromley,50,50,12,11,5,197,174,499,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20
5,Camden,37,69,66,7,20,291,223,713,,,Camden London Borough Council,Labour,"Camden Town Hall, Judd Street",8.4,229719,51°31′44″N 0°07′32″W﻿ / ﻿51.5290°N 0.1255°W,11
6,City of London,0,0,0,0,0,3,1,4,([],Sui generis;City;Ceremonial county,Corporation of London;Inner Temple;Middle Temple,?,Guildhall,1.12,7000,51°30′56″N 0°05′32″W﻿ / ﻿51.5155°N 0.0922°W,1
7,Croydon,49,86,29,20,22,293,301,800,,,Croydon London Borough Council,Labour,"Bernard Weatherill House, Mint Walk",33.41,372752,51°22′17″N 0°05′52″W﻿ / ﻿51.3714°N 0.0977°W,19
8,Ealing,65,67,34,13,26,253,248,706,,,Ealing London Borough Council,Labour,"Perceval House, 14-16 Uxbridge Road",21.44,342494,51°30′47″N 0°18′32″W﻿ / ﻿51.5130°N 0.3089°W,13
9,Enfield,52,37,33,16,25,211,207,581,,,Enfield London Borough Council,Labour,"Civic Centre, Silver Street",31.74,320524,51°39′14″N 0°04′48″W﻿ / ﻿51.6538°N 0.0799°W,30


In [44]:
Ld_crime.shape

(33, 18)

In [45]:
set(df_data_1.Borough) - set(Ld_crime.Borough)

set()
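
The default inner merge silently drops boroughs whose names do not match, and the set difference above only checks one direction. One way to check both sides at once is an outer merge with `indicator=True`. The frames below are invented toy data used only to illustrate the idea:

```python
import pandas as pd

# Toy frames: one borough name still carries a leftover '[]' marker.
left = pd.DataFrame({'Borough': ['Barnet', 'Greenwich'], 'Total': [10, 20]})
right = pd.DataFrame({'Borough': ['Barnet', 'Greenwich []'], 'Area': [1.0, 2.0]})

# The '_merge' column reports whether each key was found in both frames.
check = pd.merge(left, right, on='Borough', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['Borough', '_merge']]

print(unmatched)
```

An empty `unmatched` frame would indicate that every borough name lines up in both data sets.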

In [46]:
#Rearranging the Columns

# List of Column names of the data frame 
list(Ld_crime)

['Borough',
 'Burglary',
 'Criminal Damage',
 'Drugs',
 'Other Notifiable Offences',
 'Robbery',
 'Theft and Handling',
 'Violence Against the Person',
 'Total',
 'Inner',
 'Status',
 'Local authority',
 'Political control',
 'Headquarters',
 'Area (sq mi)',
 'Population (2013 est)[1]',
 'Co-ordinates',
 'Nr. in map']

In [47]:
# Keep only the columns needed for the analysis, in the desired order
columnsTitles = ['Borough','Local authority','Political control','Headquarters',
                 'Area (sq mi)','Population (2013 est)[1]','Co-ordinates',
                 'Burglary','Criminal Damage','Drugs','Other Notifiable Offences',
                 'Robbery','Theft and Handling','Violence Against the Person','Total']

Ld_crime = Ld_crime[columnsTitles]

Ld_crime.head()

Unnamed: 0,Borough,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Burglary,Criminal Damage,Drugs,Other Notifiable Offences,Robbery,Theft and Handling,Violence Against the Person,Total
0,Barking and Dagenham,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E,46,62,21,18,11,133,143,434
1,Barnet,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,83,48,22,18,12,218,172,573
2,Bexley,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,28,58,23,5,4,102,128,348
3,Brent,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,67,54,38,18,16,224,235,652
4,Bromley,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,50,50,12,11,5,197,174,499
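
Since the merged table now holds both the crime totals and the population of each borough, a crime rate per 1,000 residents can be derived. This is a hedged sketch of one possible normalization, not a step taken in this notebook, which ranks boroughs by total crimes. It uses only the first three rows shown above, so the resulting order is purely illustrative.

```python
import pandas as pd

# First three rows of the merged table (values copied from the output above)
sample = pd.DataFrame({
    'Borough': ['Barking and Dagenham', 'Barnet', 'Bexley'],
    'Population (2013 est)[1]': [194352, 369088, 236687],
    'Total': [434, 573, 348],
})

# Recorded crimes per 1,000 residents
sample['Crimes per 1,000 residents'] = (
    sample['Total'] / sample['Population (2013 est)[1]'] * 1000
)

# Lowest rate first
ranked = sample.sort_values('Crimes per 1,000 residents').reset_index(drop=True)
print(ranked[['Borough', 'Crimes per 1,000 residents']])
```

Normalizing by population guards against very small boroughs looking safest simply because few people live there.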
