# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## <span style="color:darkred">Table of contents</span>
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## <span style="color:darkred"> 1. Introduction: Business Problem <a name="introduction"></a></span>

In this project, we investigate how the poverty rate of a London borough relates to the facilities, venues and amenities available there, and how these insights can support targeted local and business development across London boroughs.

<br><br>
<center><img src="poverty_rate.jpg"
     alt="London Poverty Rates"
     width = "700"
     style="float: centre; margin-left: 10px;" /></center><br><br>


This will be done through a comparative study of the facilities and amenities in the two boroughs at opposite ends of London's poverty-rate range, **Tower Hamlets (TH)** and **Bromley (Brom)**, as shown in the figure above.

We will use the data science principles and techniques learned to build a model of these boroughs from the venues and facilities available in each area, and to provide insights to stakeholders such as local authorities and chambers of commerce.

### Questions to answer:

1. What types of facilities (venues) and amenities are available in wards (neighbourhoods) with different poverty levels?
2. How do venues change with spending power?
3. Which venues are distinctive of each borough?
4. What suggestions and recommendations follow from the findings?

Answering these questions yields findings that can guide targeted development in the remaining London boroughs, so that unnecessary development is avoided and overall budgets are sustained.

======================================================================================================================== 

## <span style="color:darkred">2. Data <a name="data"></a></span>

Based on the problem definition, the factors that will influence the analysis in this project are:

* the number and type of venues and facilities available in the surrounding area of these boroughs
* the most frequent venues and facilities in each borough

To define the surrounding area of each borough, we will use:
* London borough poverty information from: **https://www.trustforlondon.org.uk/data/poverty-borough**. *Accessed: 11/03/2019*
* latitudes and longitudes of Tower Hamlets and Bromley, obtained from: **https://www.distancesto.com/coordinates/gb/**. *Accessed: 11/03/2019*
* venues, with their types and locations, in every borough, obtained using the **Foursquare API**

Further information about the boroughs can be found at:
* Bromley: **https://en.wikipedia.org/wiki/London_Borough_of_Bromley** 
* Tower Hamlets: **https://en.wikipedia.org/wiki/London_Borough_of_Tower_Hamlets**

## Borough Locations 
Based on the information obtained from https://www.distancesto.com/coordinates/gb/, the latitude and longitude of TH and Brom are as follows:

In [2]:
TH_coordinates = (51.520261, -0.02934)
BROM_coordinates = (51.367971, 0.070062)
london_coordinates = (51.509865, -0.118092)

Let's visualise the locations of these boroughs on a map of London:

In [3]:
import folium
from folium.features import DivIcon

london_map = folium.Map(location = london_coordinates, zoom_start = 10)

folium.Circle(
    radius=2500,   # the radius is calculated based on the area coverage of the borough 
    location= TH_coordinates,
    color= 'crimson',
    fill=False,
).add_to(london_map)

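folium.Marker(
    TH_coordinates, 
    popup='Tower Hamlets', 
    # folium.Icon only accepts a fixed set of marker colours; 'darkred' is the closest to crimson
    icon=folium.Icon(color='darkred', 
    icon_color='white', icon='info-sign', angle=0, prefix='glyphicon')  # 'info-sign' is a Glyphicon name
).add_to(london_map)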

folium.Circle(
    radius=6900, # the radius is calculated based on the area coverage of the borough 
    location= BROM_coordinates, 
    popup = 'Bromley',
    color='darkblue',
    fill=False,
).add_to(london_map)

folium.Marker(
    BROM_coordinates, 
    popup='Bromley', 
    icon=folium.Icon(color='darkblue', 
    icon_color='white', icon='info-sign', angle=0, prefix='glyphicon')  # 'info-sign' is a Glyphicon name
).add_to(london_map)



london_map

Preliminary observations from the map above: Tower Hamlets lies very close to central London, whereas Bromley lies at the boundary of the M25 ring road, about a 2-hour drive from the centre. In terms of area, Tower Hamlets covers about 19.77 km² and Bromley about 150.2 km². We can use these figures as distance references when retrieving venues, their types and locations with the Foursquare API.
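The circle radii used on the map (2500 m and 6900 m) are consistent with treating each borough as a circle of equal area, so that r = sqrt(A / pi). The sketch below reproduces that arithmetic from the quoted areas. The equal-area-circle reading and the helper name `equivalent_radius_m` are assumptions, not something stated in the notebook.

```python
import math

def equivalent_radius_m(area_km2):
    """Radius in metres of a circle with the given area in km^2 (hypothetical helper)."""
    return math.sqrt(area_km2 / math.pi) * 1000

TH_AREA_KM2 = 19.77    # area quoted above for Tower Hamlets
BROM_AREA_KM2 = 150.2  # area quoted above for Bromley

TH_radius = equivalent_radius_m(TH_AREA_KM2)
BROM_radius = equivalent_radius_m(BROM_AREA_KM2)

# Round to the nearest 100 m to compare with the radii passed to folium.Circle
print(round(TH_radius, -2), round(BROM_radius, -2))
```

A circle is only a rough stand-in for a borough boundary, so the radius is best read as a scale reference and not as an exact limit.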

Now that we have the information about the boroughs, let's load the ward (neighbourhood) information for each borough.

In [4]:
import pandas as pd

TH_data = pd.read_csv('TH_neighbourhoods.csv')
TH_data.head()

|  | neighbourhood | latitude | longitude |
|---|---|---|---|
| 0 | Bethnal Green | 51.526962 | -0.06674 |
| 1 | Blackwall and Cubitt Town | 51.495182 | -0.009826 |
| 2 | Bow East | 51.528309 | -0.019482 |
| 3 | Bow West | 51.528309 | -0.019482 |
| 4 | Canary Wharf | 51.505219 | -0.0189 |


In [5]:
print('Borough of Tower Hamlets has {} neighbourhoods.'.format(
        TH_data['neighbourhood'].nunique()  # number of distinct ward names
    )
)

Borough of Tower Hamlets has 15 neighbourhoods.


In [6]:
BROM_data = pd.read_csv('BROM_neighbourhoods.csv')
BROM_data.head()

|  | neighbourhood | latitude | longitude |
|---|---|---|---|
| 0 | Bickley | 51.40174 | 0.043712 |
| 1 | Biggin Hill | 51.331959 | 0.029057 |
| 2 | Bromley Common & Keston | 51.375875 | 0.043819 |
| 3 | Bromley Town | 51.402805 | 0.014814 |
| 4 | Chelsfield & Pratts Bottom | 51.357943 | 0.127288 |


In [7]:
print('Borough of Bromley has {} neighbourhoods.'.format(
        BROM_data['neighbourhood'].nunique()  # number of distinct ward names
    )
)

Borough of Bromley has 20 neighbourhoods.


Now, let's confirm on the map that all the wards fall within the area coverage we defined earlier for each borough.

In [8]:
london_map = folium.Map(location = london_coordinates, zoom_start = 10)

folium.Circle(
    radius=2500,   # the radius is calculated based on the area coverage of the borough 
    location= TH_coordinates,
    color= 'crimson',
    fill=False,
).add_to(london_map)

# add markers to map
for lat, lng, label in zip(TH_data['latitude'], TH_data['longitude'], TH_data['neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(london_map) 

folium.Circle(
    radius=6900, # the radius is calculated based on the area coverage of the borough 
    location= BROM_coordinates, 
    popup = 'Bromley',
    color='darkblue',
    fill=False,
).add_to(london_map)

# add markers to map
for lat, lng, label in zip(BROM_data['latitude'], BROM_data['longitude'], BROM_data['neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(london_map) 


london_map
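The visual check on the map can be complemented with a numeric one. The sketch below computes the great-circle (haversine) distance from the Tower Hamlets centre to two of the ward coordinates listed earlier and compares it with the 2500 m circle radius. The helper name `haversine_m` and the choice of sample wards are illustrative assumptions; a full check would loop over every row of `TH_data` and `BROM_data`.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

TH_coordinates = (51.520261, -0.02934)
TH_RADIUS_M = 2500  # circle radius used on the map

# Two of the ward rows shown in the TH_data.head() output
sample_wards = [
    ('Bethnal Green', 51.526962, -0.06674),
    ('Canary Wharf', 51.505219, -0.0189),
]

ward_distances = {
    name: haversine_m(TH_coordinates[0], TH_coordinates[1], lat, lng)
    for name, lat, lng in sample_wards
}
for name, dist in ward_distances.items():
    status = 'inside' if dist <= TH_RADIUS_M else 'outside'
    print('{}: {:.0f} m from centre ({})'.format(name, dist, status))
```

A ward whose coordinate falls slightly outside the circle still belongs to the borough; the circle only approximates the borough's extent.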

## Foursquare API

Now that we have our location candidates, let's use the Foursquare API to get information on venues in each of the wards within each borough.

As this is an exploratory project, we will retrieve the venues based on the area of each borough. We will then carry out the necessary manipulation and analysis to achieve our objectives.

### Define Foursquare Credentials and Version 

In [9]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# confirm the credentials are set without echoing them into the notebook output
print('Foursquare credentials set.')

Foursquare credentials set.


In [10]:
# importing the necessary libraries for the tasks 

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0 or later)
import pandas as pd
import json # library to handle JSON files
import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
%matplotlib inline 

In [17]:
# function to repeat the same process of retrieving venues for all the neighbourhoods 

def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists and name the columns in one step,
    # so that an empty result still yields a correctly labelled dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category'])
    
    return nearby_venues
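The function above indexes straight into the JSON response, so a venue without a category or a response without groups would raise an error. As a minimal sketch, the parsing step can be isolated into a helper that tolerates those cases. The name `extract_venue_rows` and the hand-written payload are hypothetical; only the field names (`response`, `groups`, `items`, `venue`, `location`, `categories`) come from the code above.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one venues/explore JSON payload into row tuples (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written payload mimicking the fields accessed above (not real API output)
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe', 'location': {'lat': 51.5, 'lng': -0.06},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorised Place', 'location': {'lat': 51.51, 'lng': -0.07},
               'categories': []}},
]}]}}

rows = extract_venue_rows(fake_payload, 'Bethnal Green', 51.526962, -0.06674)
empty_rows = extract_venue_rows({'response': {}}, 'Bethnal Green', 51.526962, -0.06674)
print(rows)
print(empty_rows)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the row-building logic without network access or credentials.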

### Tower Hamlets: let's get the venues within a radius of 500 metres of each neighbourhood

In [18]:
TH_venues = getNearbyVenues(names=TH_data['neighbourhood'],
                                   latitudes = TH_data['latitude'],
                                   longitudes = TH_data['longitude']
                                  )



Bethnal Green
Blackwall and Cubitt Town
Bow East
Bow West
Canary Wharf
Island Gardens 
Lansbury
Limehouse 
Poplar
Shadwell
Spitalfields and Banglatown
St Katharine's and Wapping 
Stepney Green
Weavers
Whitechapel


In [19]:
print(TH_venues.shape)

TH_venues.to_csv('TH_venues', index = False)
TH_venues.head()

(520, 7)


|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bethnal Green | 51.526962 | -0.06674 | The King's Arms | 51.525754 | -0.065868 | Pub |
| 1 | Bethnal Green | 51.526962 | -0.06674 | Sam's Cafe | 51.526424 | -0.065056 | Café |
| 2 | Bethnal Green | 51.526962 | -0.06674 | Woolidando | 51.526377 | -0.066518 | Café |
| 3 | Bethnal Green | 51.526962 | -0.06674 | Jonestown | 51.526092 | -0.067936 | Coffee Shop |
| 4 | Bethnal Green | 51.526962 | -0.06674 | E Pellicci | 51.526516 | -0.063426 | Café |


In [20]:
TH_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bethnal Green | 63 | 63 | 63 | 63 | 63 | 63 |
| Blackwall and Cubitt Town | 32 | 32 | 32 | 32 | 32 | 32 |
| Bow East | 13 | 13 | 13 | 13 | 13 | 13 |
| Bow West | 13 | 13 | 13 | 13 | 13 | 13 |
| Canary Wharf | 100 | 100 | 100 | 100 | 100 | 100 |
| Island Gardens | 30 | 30 | 30 | 30 | 30 | 30 |
| Lansbury | 16 | 16 | 16 | 16 | 16 | 16 |
| Limehouse | 28 | 28 | 28 | 28 | 28 | 28 |
| Poplar | 13 | 13 | 13 | 13 | 13 | 13 |
| Shadwell | 22 | 22 | 22 | 22 | 22 | 22 |


#### Let's find out how many unique categories we obtained for Tower Hamlets

In [21]:
print('There are {} unique venue categories in Tower Hamlets.'.format(TH_venues['Venue Category'].nunique()))

There are 156 unique venue categories in Tower Hamlets.


### Bromley: let's get the venues within a radius of 500 metres of each neighbourhood

In [22]:
BROM_venues = getNearbyVenues(names=BROM_data['neighbourhood'],
                                   latitudes = BROM_data['latitude'],
                                   longitudes = BROM_data['longitude']
                                  )



Bickley
Biggin Hill
Bromley Common & Keston
Bromley Town
Chelsfield & Pratts Bottom
Chislehurst
Copers Cope
Cray Valley East
Crystal Palace
Darwin
Farnborough & Croftton
Hayes & Coney Hall
Kelsey & Eden Park
Mottingham & Chislehurst North
Orpington
Penge & Cator
Petts Wood & Knoll
Plaistow & Sunbridge 
Shortlangs 
West Wickham 


In [23]:
print(BROM_venues.shape)

BROM_venues.to_csv('BROM_venues', index = False)
BROM_venues.head()

(193, 7)


|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bickley | 51.40174 | 0.043712 | Bickley Railway Station (BKL) | 51.400032 | 0.045351 | Train Station |
| 1 | Bickley | 51.40174 | 0.043712 | Bickley Park Cricket Club | 51.401508 | 0.046263 | Cricket Ground |
| 2 | Bickley | 51.40174 | 0.043712 | J Henry Flooring | 51.401227 | 0.040652 | Home Service |
| 3 | Bickley | 51.40174 | 0.043712 | Village Sandwich Bar | 51.399301 | 0.047951 | Café |
| 4 | Biggin Hill | 51.331959 | 0.029057 | Biggin Hill Airport (BQH) (Biggin Hill Airport) | 51.331794 | 0.028845 | Airport |


In [24]:
BROM_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bickley | 4 | 4 | 4 | 4 | 4 | 4 |
| Biggin Hill | 4 | 4 | 4 | 4 | 4 | 4 |
| Bromley Common & Keston | 5 | 5 | 5 | 5 | 5 | 5 |
| Bromley Town | 45 | 45 | 45 | 45 | 45 | 45 |
| Chelsfield & Pratts Bottom | 3 | 3 | 3 | 3 | 3 | 3 |
| Chislehurst | 6 | 6 | 6 | 6 | 6 | 6 |
| Copers Cope | 4 | 4 | 4 | 4 | 4 | 4 |
| Cray Valley East | 28 | 28 | 28 | 28 | 28 | 28 |
| Crystal Palace | 22 | 22 | 22 | 22 | 22 | 22 |
| Darwin | 3 | 3 | 3 | 3 | 3 | 3 |


In [25]:
print('There are {} unique venue categories in Bromley.'.format(BROM_venues['Venue Category'].nunique()))

There are 76 unique venue categories in Bromley.


#### In summary:
* Total venues in Tower Hamlets neighbourhoods returned by Foursquare: **520**
* Total venues in Bromley neighbourhoods returned by Foursquare: **193**
* Total unique venue categories in Tower Hamlets = **156**
* Total unique venue categories in Bromley = **76**
* Tower Hamlets venues dataframe is called: **TH_venues**
* Bromley venues dataframe is called: **BROM_venues**
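To put the totals on a comparable footing, they can be divided by the number of neighbourhoods queried in each borough (15 and 20). The sketch below is plain arithmetic on the figures reported above; the variable names are illustrative.

```python
TH_total_venues, TH_neighbourhoods = 520, 15
BROM_total_venues, BROM_neighbourhoods = 193, 20

# Average number of venues returned per neighbourhood query
TH_mean = TH_total_venues / TH_neighbourhoods
BROM_mean = BROM_total_venues / BROM_neighbourhoods

print('Tower Hamlets: {:.1f} venues per neighbourhood'.format(TH_mean))
print('Bromley: {:.1f} venues per neighbourhood'.format(BROM_mean))
```

These averages should be read with care: each query is capped at `LIMIT = 100` venues (Canary Wharf returns exactly 100), and Bow East and Bow West share the same coordinates, so their venues are counted twice.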

## <span style="color:darkred">3. Methodology <a name="methodology"></a></span>

In this project, we concentrate only on the two boroughs with the largest gap in poverty rate.

In the first step, we will look at the top 5 most common venues in each borough, which provides an overview of the types of venues popular in the boroughs.
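One way to obtain the most common venue categories is to count the values of the `Venue Category` column. The sketch below applies this to the five Bethnal Green rows shown in the `TH_venues.head()` output; the helper name `top_categories` is hypothetical, and the real analysis would run on the full `TH_venues` and `BROM_venues` dataframes.

```python
import pandas as pd

def top_categories(venues, n=5):
    """Return the n most frequent venue categories with their counts (hypothetical helper)."""
    return venues['Venue Category'].value_counts().head(n)

# The five Bethnal Green rows shown in the TH_venues.head() output
sample = pd.DataFrame({
    'Neighborhood': ['Bethnal Green'] * 5,
    'Venue': ["The King's Arms", "Sam's Cafe", 'Woolidando', 'Jonestown', 'E Pellicci'],
    'Venue Category': ['Pub', 'Café', 'Café', 'Coffee Shop', 'Café'],
})

top = top_categories(sample)
print(top)
```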

In the second step, we will cluster the neighbourhoods of each borough to investigate how the clusters form.
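The clustering algorithm is not specified at this point. As one possible sketch, assuming k-means from scikit-learn applied to the share of each venue category per neighbourhood, the step could look as follows. The toy dataframe, the neighbourhood labels A to D and the choice of two clusters are invented purely for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data, invented for illustration only (not Foursquare output)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Venue Category': ['Café', 'Pub', 'Café', 'Pub', 'Park', 'Gym', 'Park', 'Gym'],
})

# Share of each category within each neighbourhood (each row sums to 1)
profile = pd.crosstab(toy['Neighborhood'], toy['Venue Category'], normalize='index')

# k=2 is an arbitrary choice for the toy data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = pd.Series(kmeans.fit_predict(profile.values), index=profile.index)
print(clusters)
```

Neighbourhoods with the same category mix end up in the same cluster; with real data, the number of clusters would need to be chosen and justified separately.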

In the third and final step, we will drill into each cluster to obtain reviews of some of the venues and gauge the quality of service at venues in each borough.

With this analysis, we will be able to draw some conclusions on how the poverty rate of a particular borough affects the venues in the area, using Foursquare geospatial data.

======================================================================================================================== 

## <span style="color:darkred">4. Analysis <a name="analysis"></a></span>

### Analyse Each Borough 

======================================================================================================================== 

## <span style="color:darkred">5. Results and Discussion <a name="results"></a></span>

======================================================================================================================== 

## <span style="color:darkred">6. Conclusions <a name="conclusion"></a></span>

======================================================================================================================== 