# Capstone Project: The Battle of the Neighbourhoods (Week 1)
### Course 9: Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we try to build a model that finds the optimal neighbourhood for opening a new business. As an example, we take the business type to be an **Italian restaurant**.

## Data <a name="data"></a>

Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

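import pandas as pd # library for tabular data
# show every column and row when a DataFrame is displayed
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import re # regular expressions for cleaning the scraped text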

In [2]:
# read_html returns a list of DataFrames, one per HTML table found on the page
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_London_boroughs')

# df = pd.read_csv('london-borough-profiles.csv', encoding = "ISO-8859-1")

In [3]:
# stack the first two tables; a column missing from one table is filled with NaN
df = pd.concat([dfs[0], dfs[1]])
df.head()

| | Borough | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map | Population(2011 est) | Nr. inmap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham [note 1] | | | Barking and Dagenham London Borough Council | Labour | Town Hall, 1 Town Square | 13.93 | 194352.0 | 51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E | 25.0 | | |
| 1 | Barnet | | | Barnet London Borough Council | Conservative | Barnet House, 2 Bristol Avenue, Colindale | 33.49 | 369088.0 | 51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W | 31.0 | | |
| 2 | Bexley | | | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687.0 | 51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E | 23.0 | | |
| 3 | Brent | | | Brent London Borough Council | Labour | Brent Civic Centre, Engineers Way | 16.7 | 317264.0 | 51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W | 12.0 | | |
| 4 | Bromley | | | Bromley London Borough Council | Conservative | Civic Centre, Stockwell Close | 57.97 | 317899.0 | 51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E | 20.0 | | |

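The concatenated frame contains columns such as `Population(2011 est)` and `Nr. inmap` that appear to exist in only one of the two scraped tables. `pd.concat` keeps the union of all column names and fills the gaps with `NaN`. The sketch below illustrates this with two small made-up frames (toy data, not the Wikipedia tables); `ignore_index=True` is shown as one way to get a fresh index, whereas the notebook resets the index in a later cell.

```python
import pandas as pd

# Hypothetical toy frames whose column names differ slightly,
# as the two scraped tables appear to.
boroughs = pd.DataFrame({"Borough": ["Barnet", "Bexley"], "Nr. in map": [31, 23]})
city = pd.DataFrame({"Borough": ["City of London"], "Nr. inmap": [1]})

# concat keeps the union of the columns and fills missing cells with NaN;
# ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0.
combined = pd.concat([boroughs, city], ignore_index=True)

print(combined)
print(combined.isna().sum())  # number of missing values per column
```

Counting missing values per column in this way is a quick check of which columns are only partially populated after stacking.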

In [4]:
london_df = df[['Borough', 'Co-ordinates']].reset_index(drop=True)

In [5]:
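# strip footnote markers such as "[note 1]" and any leftover whitespace from the borough names
note_regex = re.compile(r"\[note \d]")
london_df['Borough'] = london_df['Borough'].str.replace(pat=note_regex, repl='', regex=True).str.strip()

# each cell holds "DMS / decimal"; split it into the two notations
london_df[['Co-ordinates_DMS', 'Co-ordinates_DEC']] = london_df['Co-ordinates'].str.split(pat=' / ', expand=True)
# the decimal part is "lat lon"; split it on whitespace
london_df[['Latitude', 'Longitude']] = london_df['Co-ordinates_DEC'].str.split(expand=True)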

london_df.head()

| | Borough | Co-ordinates | Co-ordinates_DMS | Co-ordinates_DEC | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E | 51°33′39″N 0°09′21″E﻿ | ﻿51.5607°N 0.1557°E | ﻿51.5607°N | 0.1557°E |
| 1 | Barnet | 51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W | 51°37′31″N 0°09′06″W﻿ | ﻿51.6252°N 0.1517°W | ﻿51.6252°N | 0.1517°W |
| 2 | Bexley | 51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E | 51°27′18″N 0°09′02″E﻿ | ﻿51.4549°N 0.1505°E | ﻿51.4549°N | 0.1505°E |
| 3 | Brent | 51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W | 51°33′32″N 0°16′54″W﻿ | ﻿51.5588°N 0.2817°W | ﻿51.5588°N | 0.2817°W |
| 4 | Bromley | 51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E | 51°24′14″N 0°01′11″E﻿ | ﻿51.4039°N 0.0198°E | ﻿51.4039°N | 0.0198°E |

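Each coordinate cell carries the same position twice, once in degrees/minutes/seconds (DMS) and once in decimal degrees. As a sanity check on the split, the DMS half can be converted with `degrees + minutes/60 + seconds/3600` and compared with the decimal half. The sketch below is an illustration only: `dms_to_decimal` and `DMS_PATTERN` are hypothetical names that are not part of the notebook, and the pattern assumes the `°`, `′` and `″` symbols seen in the output above.

```python
import re

# Matches e.g. "51°33′39″N": degrees, minutes, seconds, hemisphere letter.
# Seconds may carry a fractional part on some pages (assumption).
DMS_PATTERN = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")

def dms_to_decimal(dms_str):
    """Convert one DMS coordinate string to signed decimal degrees."""
    match = DMS_PATTERN.search(dms_str)
    if match is None:
        raise ValueError(f"not a DMS coordinate: {dms_str!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention
    return -value if hemisphere in "SW" else value

# Values taken from the first two rows of the table
lat = dms_to_decimal("51°33′39″N")
lon = dms_to_decimal("0°09′06″W")
print(round(lat, 4), round(lon, 4))
```

Small differences in the last decimal place are expected, since the DMS values are rounded to whole seconds.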

In [6]:
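def clean_dec(dec_str):
    """Convert a decimal-degree string such as '0.1517°W' to a signed float."""
    # southern and western hemispheres are negative
    sign = -1 if re.search('[swSW]', dec_str) else 1
    # drop the degree sign together with the hemisphere letter that follows it
    dec_str = re.sub(r'°.', '', dec_str)
    # drop spaces and the invisible U+FEFF characters left over from the scraped page
    dec_str = re.sub(' ', '', dec_str)
    dec_str = re.sub('\ufeff', '', dec_str)

    return sign * float(dec_str)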

london_df['Latitude'] = london_df['Latitude'].apply(clean_dec)
london_df['Longitude'] = london_df['Longitude'].apply(clean_dec)

london_df = london_df[['Borough', 'Latitude', 'Longitude']]

london_df.head()

| | Borough | Latitude | Longitude |
|---|---|---|---|
| 0 | Barking and Dagenham | 51.5607 | 0.1557 |
| 1 | Barnet | 51.6252 | -0.1517 |
| 2 | Bexley | 51.4549 | 0.1505 |
| 3 | Brent | 51.5588 | -0.2817 |
| 4 | Bromley | 51.4039 | 0.0198 |

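`clean_dec` removes unwanted characters step by step and will raise a bare `float` conversion error if a cell has an unexpected shape. A stricter alternative, sketched here under the assumption that every cell looks like `51.6252°N` (optionally wrapped in an invisible U+FEFF character), is to match the whole string with one pattern and fail with a descriptive message otherwise. The names `DEC_PATTERN` and `parse_decimal_degrees` are hypothetical and not part of the notebook.

```python
import re

# Optional U+FEFF, a decimal number, the degree sign, a hemisphere letter.
DEC_PATTERN = re.compile(r"^\ufeff?\s*(\d+(?:\.\d+)?)°\s*([NSEW])\s*\ufeff?$")

def parse_decimal_degrees(text):
    """Parse a string like '0.1517°W' into a signed float, or raise ValueError."""
    match = DEC_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unexpected coordinate format: {text!r}")
    value, hemisphere = match.groups()
    # South and West are negative by convention
    return -float(value) if hemisphere in "SW" else float(value)

# Sample strings in the shape seen in the Latitude and Longitude columns
samples = ["\ufeff51.6252°N", "0.1517°W"]
print([parse_decimal_degrees(s) for s in samples])
```

Such a function could be passed to `Series.apply` in the same way as `clean_dec`; the trade-off is that any cell in a different format stops the run instead of being silently coerced.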