# Capstone Project: The Battle of Neighbourhoods-1

## CLUSTERING ANALYSIS TO OPEN A SUPERMARKET BUSINESS IN LONDON

## 1. Introduction/Business Understanding

### 1.1 Description of the problem

>Our client wants to open a supermarket in London. London is a competitive market, so the client wants to take the distribution of supermarkets and other commercial venues into account when deciding on the best place to open. They would also like to see population and earnings figures by borough, but this is secondary data, to be consulted only when two candidate locations are otherwise hard to separate.

### 1.2 Discussion of the background
>London is one of the largest cities in Europe and the capital of the United Kingdom. Its population is more than 9 million. It is also an attractive destination for tourists, with nearly 21 million visitors per year. Given this diversity, a supermarket offering a wide range of products from all around the world looks like a promising business. This analysis aims to help choose a location for the supermarket that maximises revenue.

### 1.3 Target Audience
>The target audience is our client, AAA Company. The aim of this analysis is to recommend a suitable location for opening a supermarket in London.

### 1.4 Success Criteria
>This project will be considered successful if it shows the distribution of supermarkets and other commercial venues across London, together with the population and average earnings of the same areas.


## 2. Data
* __Scraped from Wikipedia__: List of areas of London, used to get London boroughs, their locations and postal codes.
* Foursquare API, used to extract information about venues such as cafés, restaurants and supermarkets in each London borough.
* The population of the boroughs of Greater London, from citypopulation.de
* Earnings by Place of Residence, Borough, for London, from https://data.london.gov.uk/dataset/earnings-place-residence-borough


> __Import necessary libraries__  
>__Scrape data from Wikipedia__

In [3]:
import numpy as np # library to handle data

import pandas as pd # library for data analysis

import requests
from io import StringIO



url="https://en.wikipedia.org/wiki/List_of_areas_of_London"
data_url=requests.get(url).text
# parse the HTML and keep the single table that contains "Location"
# (wrapping the text in StringIO avoids the pandas deprecation of passing literal HTML)
df, = pd.read_html(StringIO(data_url), match="Location", skiprows=None)
df.head()

# rename columns
columns=["Location", "Borough", "Post town", "PostalCode", "Dial code", "OS grid ref"]

df.columns=columns
# drop irrelevant columns
df2=df.drop(["Dial code", "OS grid ref"], axis=1)
# strip trailing footnote markers such as "[7]" from borough names
df2['Borough'] = df2['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
# split comma-separated postal codes so that each code gets its own row
df3 = df2.drop('PostalCode', axis=1).join(df['PostalCode'].str.split(',', expand=True)
                                       .stack().reset_index(level=1, drop=True).rename('PostalCode'))
df4 = df3.reset_index(drop=True)
# keep only rows whose post town is LONDON
df4 = df4[df4["Post town"] == "LONDON"].reset_index(drop=True)

df4.head()

import os, ssl
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context

from geopy.geocoders import Nominatim

#Get coordinates of London
address = 'London, UK'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))



The geographical coordinates of London are 51.5073219, -0.1276474.
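
The borough clean-up and the postal-code split in the cell above can be hard to follow in one chained expression. The following is a minimal sketch of the same idea on a toy table, assuming a pandas version that provides `DataFrame.explode`; the rows are illustrative placeholders, not scraped data, and `explode` is swapped in for the `split`/`stack`/`join` chain used in the notebook.

```python
import pandas as pd

# Toy rows that mimic the shape of the scraped table (illustrative values only).
toy = pd.DataFrame({
    "Location": ["Area A", "Area B"],
    "Borough": ["Borough X[7]", "Borough Y"],
    "PostalCode": ["W3, W4", "SE2"],
})

# Remove a trailing footnote marker such as "[7]" from the borough name.
toy["Borough"] = toy["Borough"].str.replace(r"\[\d+\]$", "", regex=True)

# Turn "W3, W4" into a list, then give each postal code its own row.
toy["PostalCode"] = toy["PostalCode"].str.split(",")
tidy = toy.explode("PostalCode").reset_index(drop=True)

# Splitting on "," leaves a leading space on later codes, so strip it.
tidy["PostalCode"] = tidy["PostalCode"].str.strip()

print(tidy)
```

Stripping whitespace after the split may also be worth doing in the notebook itself, since codes after the first comma otherwise keep a leading space.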


__Convert an address into latitude and longitude values__

In [4]:
import geocoder

# Geocode each postal code with the ArcGIS provider of the geocoder package
def get_latlng(postal_code, max_retries=5):
    """Return [latitude, longitude] for a London postal code, or [None, None] if every attempt fails."""
    # Retry a bounded number of times so that a code which never resolves cannot loop forever
    for _ in range(max_retries):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        if g.latlng is not None:
            return g.latlng
    return [None, None]

postal_codes = df4['PostalCode']
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]


In [5]:
len(coordinates)

356
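
Several areas may share a postal code, in which case the cell above geocodes the same code more than once. A minimal sketch of one way to avoid that is shown below, assuming a hypothetical helper `geocode_unique` that takes the lookup function as a parameter. The `fake_lookup` function is a stand-in so the sketch runs without network access; its two coordinate pairs are copied from the table shown later in this notebook.

```python
from typing import Callable, Dict, List, Optional, Sequence

Coords = Optional[List[float]]


def geocode_unique(codes: Sequence[str],
                   lookup: Callable[[str], Coords],
                   max_retries: int = 3) -> List[Coords]:
    """Geocode each distinct code once and reuse the result for repeats."""
    cache: Dict[str, Coords] = {}
    for code in codes:
        key = code.strip()
        if key in cache:
            continue  # already looked up, no second request needed
        result = None
        for _ in range(max_retries):
            result = lookup(key)
            if result is not None:
                break
        cache[key] = result  # stays None if every attempt failed
    return [cache[code.strip()] for code in codes]


# Offline stand-in for a real geocoder call (hypothetical, for illustration).
calls = []

def fake_lookup(code):
    calls.append(code)
    return {"W3": [51.51324, -0.26746], "SE2": [51.49245, 0.12127]}.get(code)


coords = geocode_unique(["W3", " W3", "SE2", "XX"], fake_lookup)
print(coords)
```

In the notebook, `lookup` could be something like `lambda c: geocoder.arcgis('{}, London, United Kingdom'.format(c)).latlng`. Codes that never resolve come back as `None`, so they can be filtered out before mapping.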

In [6]:
# Copy the London dataframe so that adding coordinates does not modify df4
df_loc = df4.copy()

# Add the geocoded latitude and longitude as new columns (rows align by index)
df_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_loc['Latitude'] = df_coordinates['Latitude']
df_loc['Longitude'] = df_coordinates['Longitude']
df_loc.head()

| | Location | Borough | Post town | PostalCode | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 | 51.49245 | 0.12127 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3 | 51.51324 | -0.26746 |
| 2 | Acton | Ealing, Hammersmith and Fulham | LONDON | W4 | 51.48944 | -0.26194 |
| 3 | Aldgate | City | LONDON | EC3 | 51.512 | -0.08058 |
| 4 | Aldwych | Westminster | LONDON | WC2 | 51.51651 | -0.11968 |


> __Print the coordinates of London and create a map of London with its boroughs__

In [7]:
import folium
print('The geographical coordinates of "London" are: {}, {}.'.format(latitude, longitude))
# create map of London using latitude and longitude values
map_London = folium.Map(location=[latitude,longitude], zoom_start=11)

# add one marker per area, labelled with its borough
for lat, lng, borough in zip(df_loc['Latitude'], df_loc['Longitude']
                                          , df_loc['Borough']):
    
    # parse_html=True escapes the label text so it is shown literally in the popup
    label = folium.Popup('{}'.format(borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_London)  
   
map_London

The geographical coordinates of "London" are: 51.5073219, -0.1276474.


__Using the Foursquare API to explore the neighborhoods__ 

In [277]:
# @hidden_cell
CLIENT_ID = 'hidden' # your Foursquare ID
CLIENT_SECRET = 'hidden' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: hidden
CLIENT_SECRET: hidden
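
With the credentials defined, a venue request for a given coordinate can be assembled. The sketch below only builds the request URL and does not call the API. It assumes the legacy Foursquare v2 `venues/explore` endpoint, which matches the `VERSION` style used above but may no longer be available to new accounts. The helper name `build_explore_url`, the environment variable names, and the `radius` and `limit` defaults are all hypothetical choices for illustration.

```python
import os
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a legacy v2 venues/explore URL (assumed endpoint, no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search centre
        "radius": radius,                # search radius in metres (assumed default)
        "limit": limit,                  # maximum number of venues (assumed default)
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Reading credentials from the environment keeps them out of the notebook;
# the variable names here are hypothetical.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "hidden")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "hidden")

url = build_explore_url(51.5073219, -0.1276474, client_id, client_secret)
print(url.split("?")[0])
```

The resulting URL could then be passed to `requests.get(url).json()` once valid credentials are in place.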
