# Capstone Project - The Battle of Neighborhoods

This is the capstone project for the IBM Data Science Professional Certificate. This notebook is based on [Capstone Project - The Battle of the Neighborhoods (Week 2)](https://cocl.us/coursera_capstone_notebook).

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

##  Introduction <a name="introduction"></a>

This research project aims to use Big Data techniques to gain insight into opening a new business in a major city.

The main idea is to understand market conditions in order to select the best location for a new business, based on basic demographics and current information about a specific geographical region.

London is one of the largest cities in Europe, with a population of over 9M. It is also one of the world's leading tourist destinations, with about 20M visitors each year. With these figures in mind, London looks like a good fit for our research project.

Convenience stores are small retail shops that sell a wide variety of items. They stock essential goods, such as groceries, as well as other everyday items. Their target customers are usually people from the surrounding neighbourhood.


## Business Problem 

This research project aims to identify the best place to open a new convenience store in London by analyzing the demographic data of each neighbourhood as well as the number of similar businesses in the area.




## Data Section <a name="data"></a>

This project needs two kinds of data: information about the population of the target city, in this case London, and information about similar businesses in each neighbourhood.

For the first, Wikipedia can supply the needed information about districts, neighbourhoods and population. To obtain data about similar businesses (supermarkets, groceries, convenience stores), Foursquare is the selected tool.


### Borough/Neighbourhood Candidates

Wikipedia offers a lot of free, public information. This [link](https://en.wikipedia.org/wiki/List_of_areas_of_London) lists the neighbourhoods of London together with their boroughs.



Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
!pip3 install geopy # comment out this line if geopy is already installed
!pip3 install folium==0.5.0 # comment out this line if folium is already installed
!pip3 install geocoder
!pip3 install lxml
!pip3 install shapely
!pip3 install pyproj
#!pip3 install pygeocoder
print("Installation Done!!")

Installation Done!!


In [2]:
import json # library to handle JSON files
import math
import numpy as np # library to handle data in a vectorized manner
import os
import pandas as pd # library for data analysis
import pickle
import requests # library to handle requests
import time
import warnings

import folium # map rendering library
from folium import plugins
from folium.plugins import HeatMap
#from pygeocoder import Geocoder as g2
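import geocoder # geocoding client used below for the ArcGIS lookups
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values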
import geopy.distance
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors
import pyproj
import shapely.geometry

from sklearn.cluster import KMeans # import k-means from clustering stage
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


app = Nominatim(user_agent="coursera_project") # instantiate a new Nominatim client

warnings.filterwarnings("ignore")
print('Libraries imported.')

Libraries imported.


The reference point in London is Trafalgar Square.

In [3]:
address = 'Trafalgar Square, London, United Kingdom'
london_center = [51.50739, -0.12764]
print('Coordinate of {}: {}'.format(address, london_center))

Coordinate of Trafalgar Square, London, United Kingdom: [51.50739, -0.12764]


Auxiliary functions are defined to convert coordinates and calculate distances.

In [4]:
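# Common functions to work with geocoordinates.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    # NOTE: London lies in UTM zone 30; zone 33 is kept here so the coordinates match the recorded outputs.
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')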
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
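
As a quick sanity check on the projected distances, the great-circle distance can also be computed directly from latitude and longitude. The sketch below is not part of the original notebook: the function name `haversine_m`, the spherical-Earth assumption, the mean radius of 6,371 km and the second sample point are all chosen for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumes a spherical Earth)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Trafalgar Square to a hypothetical point 0.01 degrees further east.
d = haversine_m(51.50739, -0.12764, 51.50739, -0.11764)
print(round(d), 'm')
```

Comparing a few such values against `calc_xy_distance` on the projected coordinates gives a rough idea of how much distortion the chosen projection introduces.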


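Now, candidate areas around Trafalgar Square are defined on a hexagonal grid with 600 m spacing, keeping only centers within about 6 km of the square.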

In [5]:
london_center_x, london_center_y = lonlat_to_xy(london_center[1], london_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = london_center_x - 6000
x_step = 600
y_min = london_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(london_center_x, london_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
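
The grid above staggers every other row by half a step and spaces rows by `step * sqrt(3) / 2`, so each cell center sits one step away from its six nearest neighbours. A minimal sketch of that construction, working in plain Cartesian metres with a hypothetical helper `hex_grid_centers` (not a function from this notebook), might look like this:

```python
import math

def hex_grid_centers(cx, cy, radius, step):
    """Return (x, y) centers of a hexagonal grid lying within `radius` of (cx, cy)."""
    k = math.sqrt(3) / 2  # row spacing factor for a hexagonal layout
    rows = int(radius // (step * k)) + 1
    cols = int(radius // step) + 1
    centers = []
    for i in range(-rows, rows + 1):
        y = cy + i * step * k
        x_offset = step / 2 if i % 2 else 0.0  # stagger odd rows by half a step
        for j in range(-cols, cols + 1):
            x = cx + j * step + x_offset
            if math.hypot(x - cx, y - cy) <= radius:
                centers.append((x, y))
    return centers

# Small example: one ring of cells around the origin.
centers = hex_grid_centers(0.0, 0.0, radius=601, step=600)
nearest = min(math.dist(a, b) for a in centers for b in centers if a != b)
print(len(centers), 'centers, nearest-neighbour spacing', round(nearest, 3))
```

The slightly enlarged radius (601 rather than 600) mirrors the `<= 6001` check in the notebook and avoids losing boundary cells to floating-point rounding.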


Map representing the areas created around London Center.

In [6]:
# create map using latitude and longitude values
map_london = folium.Map(location=london_center, zoom_start=13)

# add markers to map
for lat, lng in zip(latitudes, longitudes):    
    folium.Circle(
        [lat, lng],
        radius=300,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [7]:
def get_address_by_location(latitude, longitude, language="en"):
    """Reverse-geocode a location and return the raw Nominatim result.
    Retries until the request succeeds."""
    # build coordinates string to pass to reverse() function
    coordinates = f"{latitude}, {longitude}"
    # sleep for a second to respect Usage Policy
    time.sleep(1)
    try:
        return app.reverse(coordinates, language=language, zoom=16).raw
    except Exception as e:
        print(str(e))
        # retry with the same language that was originally requested
        return get_address_by_location(latitude, longitude, language=language)

def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

In [8]:
column_names_address = ['Address', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
csv_london_grid = 'london_grid_address.csv'

if os.path.isfile(csv_london_grid):
    print('Load Data From CSV File.')
    addresses = pd.read_csv(csv_london_grid, sep=',', names=column_names_address, skiprows=1)
else:
    print('Retrieve Data from Geocoder')
    
    addresses = pd.DataFrame(columns=column_names_address)
    
    aux_address = []
    
    for lat, lng in zip(latitudes, longitudes):    
        res = get_address_by_location(lat, lng)
        aux_address.append((res, lat, lng))
        address = res['display_name']
        borough = res['address'].get('city_district', '')
        # prefer the neighbourhood name, fall back to the suburb, else leave empty
        neighborhood = res['address'].get('neighbourhood', res['address'].get('suburb', ''))
        addresses = addresses.append({'Address': address, 
                                      'Borough': borough, 
                                      'Neighborhood': neighborhood, 
                                      'Latitude':lat ,
                                      'Longitude':lng}, ignore_index=True)
        print(' .', end='')
    print(' done.')
        
    addresses.sort_values(['Borough','Neighborhood'],inplace=True)
    addresses.to_csv(csv_london_grid, index = False)


Load Data From CSV File.


In [9]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
csv_filename = 'london_neighborhoods.csv'

if os.path.isfile(csv_filename):
    print('Load Data From CSV File.')
    neighborhoods = pd.read_csv(csv_filename, sep=',', names=column_names, skiprows=1)
else:
    print('Retrieve Data from Wikipedia/Geocoder')
    # Create Dataset from wikipedia url.
    df = pd.read_html('https://en.wikipedia.org/wiki/List_of_areas_of_London')[1]
    # define the dataframe columns
    
    # instantiate the dataframe
    neighborhoods = pd.DataFrame(columns=column_names)

    # get location raw data
    for i,row in df.iterrows():
        index_special_note = len(row[1]) if str.find(row[1],'[') == -1 else str.find(row[1],'[')
        borobrough = row[1][0:index_special_note]
        neighborhood = row['Location']    
        latitude = np.nan
        longitude = np.nan      
        res = geocoder.arcgis(f"'{row['Location']}, London, United Kingdom'")
        if res:
            location = res.json        
            latitude = location['lat']
            longitude = location['lng']
        print(f"{borobrough} - {neighborhood} {latitude} {longitude} ")
        neighborhoods = neighborhoods.append({'Borough': borobrough, 'Neighborhood': neighborhood, 'Latitude':latitude ,'Longitude':longitude}, ignore_index=True)
    neighborhoods.to_csv(csv_filename, index = True)

neighborhoods.sort_values(['Borough','Neighborhood'],inplace=True)


Load Data From CSV File.


In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 60 boroughs and 532 neighborhoods.


In [11]:
borough_column_names = ['Borough', 'Population', 'Household', 'Hectares', 'Density', 'GAP', 'Latitude', 'Longitude']    
    
csv_london_boroughs = 'london_boroughs.csv'
if os.path.isfile(csv_london_boroughs):
    print('Load Data From CSV File.')
    df_borough = pd.read_csv(csv_london_boroughs, sep=',', names=borough_column_names, skiprows=1)
else:
    print('Retrieve Data from Wikipedia/Geocoder')

    df_london_boroughs = pd.read_csv('london-borough-profiles.csv', sep=',', encoding='ISO-8859-1')[['Area_name', 
                                                                                       'GLA_Population_Estimate_2017',
                                                                                       'GLA_Household_Estimate_2017',
                                                                                       'Inland_Area_(Hectares)',
                                                                                       'Population_density_(per_hectare)_2017',
                                                                                       'Gross_Annual_Pay,_(2016)']]
    df_borough = pd.DataFrame(columns=borough_column_names)
    # get location raw data
    for i,row in df_london_boroughs.iterrows():
        print(row)
        if row[0] in ('Inner London', 'Outer London', 'London', 'England', 'United Kingdom'):
            continue
        
        borobrough = row[0]
        population = int(row[1])
        household = float(np.nan if row[2] == '.' else row[2])
        hectares = float(np.nan if row[3] == '.' else row[3].replace(',',''))
        density = float(np.nan if row[4] == '.' else row[4])
        gap = float(np.nan if row[5] == '.' else row[5])
        latitude = np.nan
        longitude = np.nan      
        res = geocoder.arcgis(f"'{row[0]}, London, United Kingdom'")
        if res:
            location = res.json        
            latitude = location['lat']
            longitude = location['lng']
        print(f"{borobrough} - {population} {household} {hectares} {density} {gap} {latitude} {longitude} ")
        df_borough = df_borough.append({'Borough': borobrough,
                                        'Population':population, 
                                        'Household':household, 
                                        'Hectares':hectares,
                                        'Density':density, 
                                        'GAP':gap,
                                        'Latitude':latitude,
                                        'Longitude':longitude}, ignore_index=True)
    df_borough.to_csv(csv_london_boroughs, index = True)
    
    



Load Data From CSV File.


In [12]:

# create map using latitude and longitude values
map_london = folium.Map(location=london_center, zoom_start=10)

# add markers to map
for lat, lng, borough, population in zip(df_borough['Latitude'], df_borough['Longitude'], df_borough['Borough'], df_borough['Population']):
    label = '{}, {}'.format(borough, population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

#### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get information on venues that stock a range of everyday items in each neighborhood, within the Shop & Service category.

After scraping the Foursquare categories, we chose the following categories:
'Bagel Shop', 'Supermarket', 'Food Stand', 'Sandwich Place', 'Grocery Store', 'Deli / Bodega', 'Market', 'Organic Grocery',
'Food & Drink Shop', 'Convenience Store'.

See https://developer.foursquare.com/docs/resources/categories

In [13]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

# Category IDs for food retail venues were taken from the Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

shop_and_service_category = '4d4b7105d754a06378d81259' # 'Root' category used for the venue search

food_and_drink_shop = '4d4b7105d754a06378d81259' # same ID as shop_and_service_category

specific_retails_categories = ['4bf58dd8d48988d179941735', '52f2ab2ebcbc57f1066b8b46', '56aa371be4b08b9a8d57350b', 
                               '4bf58dd8d48988d1c5941735', '4bf58dd8d48988d118951735', '4bf58dd8d48988d146941735', 
                               '50be8ee891d4fa8dcc7199a7', '4bf58dd8d48988d1f9941735', '4d954b0ea243a5684a65b473']



def belongs_to_food_retail(categories, specific_filter=None):
    """Return (is_food_retail, is_specific) for a list of (name, id) category tuples."""
    food_retails_words = ['food', 'grocery', 'market']
    food_and_drink_store = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        # Keyword match on the category name
        for r in food_retails_words:
            if r in category_name:
                food_and_drink_store = True

        # Exact match against the list of specific category IDs
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            food_and_drink_store = True

    return food_and_drink_store, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    # These replacements target German address suffixes and leave London addresses unchanged
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    # Use the credentials passed as arguments rather than the module-level globals
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:
        # Any request or parsing failure yields an empty venue list
        venues = []
    return venues

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results0 = requests.get(url).json()
        if results0['meta']['code'] != 200:
            print('Error in {}'.format(name))
            continue
        
        results = results0["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET:<redacted>


In [14]:
def get_target_category_venues(target_category, specific_category, lats, lons):
    res_venues = {}
    location_venues = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any venue (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, target_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        in_area_venue = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_food_retail, specific = belongs_to_food_retail(venue_categories, specific_filter=specific_category)
            if is_food_retail:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                venue = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_food_retail, x, y)
                if venue_distance<=300:
                    in_area_venue.append(venue)
                res_venues[venue_id] = venue
        location_venues.append(in_area_venue)
        print(' .', end='')
    print(' done.')
    return res_venues, location_venues

In [15]:
# Try to load from local file system in case we did this before
venues = {}
location_venues = []
loaded = False

try:
    with open('venues_350.pkl', 'rb') as f:
        venues = pickle.load(f)
    with open('location_venues_350.pkl', 'rb') as f:
        location_venues = pickle.load(f)
    print('Venues data loaded.')
    loaded = True
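except (OSError, pickle.UnpicklingError, EOFError):
    # No usable cache on disk; fall through to the API download below
    pass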

# If load failed use the Foursquare API to get the data
if not loaded:
    venues, location_venues = get_target_category_venues(food_and_drink_shop, 
                                                         specific_retails_categories, 
                                                         addresses['Latitude'], 
                                                         addresses['Longitude'])
    
    # Let's persist this in the local file system
    with open('venues_350.pkl', 'wb') as f:
        pickle.dump(venues, f)
    with open('location_venues_350.pkl', 'wb') as f:
        pickle.dump(location_venues, f)
        



Venues data loaded.


In [16]:
print('Total number of food retails:', len(venues))
print('Average number of food retail in neighborhood:', np.array([len(r) for r in location_venues]).mean())

Total number of food retails: 987
Average number of food retail in neighborhood: 2.4450549450549453


In [17]:
print('List of all shops')
print('-----------------------')
for r in list(venues.values())[:10]:
    print(r)
print('...')
print('Total:', len(venues))

List of all shops
-----------------------
('5220a0a111d2a0bf140e557f', 'Little Waitrose & Partners', 51.510019326158094, -0.0865131711466347, '41-46 King William St, London, Greater London, EC4R 9AN, United Kingdom', 234, True, -544119.0118779244, 5815256.205399979)
('4b0147c6f964a520fa4122e3', 'Tesco Metro', 51.510600375349426, -0.08498921063619491, '6 Eastcheap, Eastcheap, London, Greater London, EC3M 1AE, United Kingdom', 250, True, -544000.6234906035, 5815298.154794292)
('577e02e5498e861ea281f5b0', 'Little Waitrose & Partners', 51.50733438217989, -0.10574364011063153, '20 Stamford St, London, Greater London, SE1 9LJ, United Kingdom', 347, True, -545504.8680319489, 5815239.539530788)
('55acad2a498ee3c74050f2e0', "Sainsbury's Local", 51.51247670860394, -0.10390408006902385, '30 - 31 New Bridge Street, London, Greater London, EC4V 6DA, United Kingdom', 304, True, -545258.4745963605, 5815779.864679381)
('4ce783d4fe90a35dd4fb3c0e', 'Tesco Express', 51.51266202743872, -0.1044344902038574

In [18]:
venues

{'5220a0a111d2a0bf140e557f': ('5220a0a111d2a0bf140e557f',
  'Little Waitrose & Partners',
  51.510019326158094,
  -0.0865131711466347,
  '41-46 King William St, London, Greater London, EC4R 9AN, United Kingdom',
  234,
  True,
  -544119.0118779244,
  5815256.205399979),
 '4b0147c6f964a520fa4122e3': ('4b0147c6f964a520fa4122e3',
  'Tesco Metro',
  51.510600375349426,
  -0.08498921063619491,
  '6 Eastcheap, Eastcheap, London, Greater London, EC3M 1AE, United Kingdom',
  250,
  True,
  -544000.6234906035,
  5815298.154794292),
 '577e02e5498e861ea281f5b0': ('577e02e5498e861ea281f5b0',
  'Little Waitrose & Partners',
  51.50733438217989,
  -0.10574364011063153,
  '20 Stamford St, London, Greater London, SE1 9LJ, United Kingdom',
  347,
  True,
  -545504.8680319489,
  5815239.539530788),
 '55acad2a498ee3c74050f2e0': ('55acad2a498ee3c74050f2e0',
  "Sainsbury's Local",
  51.51247670860394,
  -0.10390408006902385,
  '30 - 31 New Bridge Street, London, Greater London, EC4V 6DA, United Kingdom',
 

In [19]:
map_london = folium.Map(location=london_center, zoom_start=13)
folium.Marker(london_center, popup='Trafalgar Square').add_to(map_london)
for res in venues.values():
    name = res[1]
    lat = res[2] 
    lon = res[3]
    address = res[4]
    
    color = 'blue'
    label = '{}, {}'.format(name, address)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], 
                        radius=3, 
                        color=color, 
                        fill=True, 
                        fill_color=color, 
                        popup=label,
                        fill_opacity=1).add_to(map_london)
map_london

## Methodology <a name="methodology"></a>

In this project we will direct our efforts at detecting areas of London that have a low density of food retailers, using boroughs as the source of location information.

In the first step we obtained information about London demographics, such as neighbourhoods and population density.

The second step in our analysis will be the calculation and exploration of '**retail density**' across different areas of London.

In the third and final step we will focus on the most promising areas and, within those, create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two food retailers within a radius of 250 meters**, and we want locations **without a convenience store within a radius of 400 meters**. We will present a map of all such locations and also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses that should be a starting point for the final 'street level' exploration and search for an optimal venue location by stakeholders.
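
The two stakeholder requirements can be expressed as a simple filter over candidate locations. The sketch below is illustrative only: it works on projected (x, y) coordinates in metres, and the function names `count_within` and `meets_requirements`, the tuple layout and the sample coordinates are assumptions rather than code from this notebook.

```python
import math

def count_within(point, venues_xy, radius):
    """Count venues whose planar distance to `point` is at most `radius` metres."""
    return sum(1 for v in venues_xy if math.dist(point, v) <= radius)

def meets_requirements(point, food_retail_xy, convenience_xy,
                       max_retail=2, retail_radius=250, convenience_radius=400):
    """True if the location has at most `max_retail` food retailers within
    `retail_radius` metres and no convenience store within `convenience_radius` metres."""
    few_retailers = count_within(point, food_retail_xy, retail_radius) <= max_retail
    no_convenience = count_within(point, convenience_xy, convenience_radius) == 0
    return few_retailers and no_convenience

# Hypothetical coordinates in metres, for illustration only.
food_retail = [(100, 0), (0, 200), (900, 900)]
convenience = [(0, 450)]
candidates = [(0, 0), (0, 300), (1000, 1000)]

good = [c for c in candidates if meets_requirements(c, food_retail, convenience)]
print(good)
```

Keeping the thresholds as keyword arguments makes it easy to try other radii if the stakeholder requirements change.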

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional info from our raw data. First, let's count the **number of food retailers in every candidate area**:

In [20]:
location_count = [len(res) for res in location_venues]

addresses['Food Retails'] = location_count

print('Average number of food stores in every area with radius=300m:', np.array(location_count).mean())



Average number of food stores in every area with radius=300m: 2.4450549450549453


Calculate the distance to the closest food retailer from each area center.

In [21]:
closest_distances_to_food_retail = []

for area_lat, area_lon in zip(addresses['Latitude'],addresses['Longitude']):
    min_distance = 10000
    for res in venues.values():
        res_lat= res[2]
        res_lon = res[3]
        d = geopy.distance.distance((area_lat,area_lon),(res_lat, res_lon)).km * 1000
        if d<min_distance:
            min_distance = d
    closest_distances_to_food_retail.append(min_distance)

addresses['Distance to Food Retail'] = closest_distances_to_food_retail

In [22]:
print('Average distance to closest food retail from each area center:', addresses['Distance to Food Retail'].mean())

Average distance to closest food retail from each area center: 214.64441115718344


OK, so **on average a food retailer can be found within ~200m** of every candidate area center. That's fairly close, so we need to filter our areas carefully!

Let's create a map showing a **heatmap / density of food suppliers** and try to extract some meaningful info from that. Also, let's show the **borders of London boroughs** on our map and a few circles indicating distances of 1km, 2km and 3km from **Trafalgar Square**.

In [28]:
london_boroughs_file = 'Others/london_boroughs.json'

with open(london_boroughs_file) as f:
    london_boroughs = json.load(f)
 

def boroughs_style(feature):
    return { 'color': 'black', 'fill': False, 'weight' : 0.5 }

In [24]:
food_retails_latlons = [[res[2], res[3]] for res in venues.values()]

In [25]:
map_london = folium.Map(location=london_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_london) #cartodbpositron cartodbdark_matter
HeatMap(food_retails_latlons).add_to(map_london)
folium.Marker(london_center).add_to(map_london)
folium.Circle(london_center, radius=1000, fill=False, color='white').add_to(map_london)
folium.Circle(london_center, radius=2000, fill=False, color='white').add_to(map_london)
folium.Circle(london_center, radius=3000, fill=False, color='white').add_to(map_london)
folium.GeoJson(london_boroughs, style_function=boroughs_style, name='geojson').add_to(map_london)
map_london

It looks like areas with a low density of food retailers close to the city center can be found **south, south-west and west of Trafalgar Square**.

In [26]:
distances_to_city_center = []

for area_lat, area_lon in zip(addresses['Latitude'],addresses['Longitude']):
    res_lat= london_center[0]
    res_lon = london_center[1]
    distance = geopy.distance.distance((area_lat,area_lon),(res_lat, res_lon)).km * 1000
    distances_to_city_center.append(distance)

addresses['Distance to Trafalgar Square'] = distances_to_city_center

In [27]:
distance_meters = 150.  # minimum distance to the closest food retailer, in metres
distance_from_city_center = 2000  # maximum distance to Trafalgar Square, in metres
# Combine both conditions in one mask and sort without modifying a slice in place
target_locations = addresses[
    (addresses['Distance to Food Retail'] > distance_meters)
    & (addresses['Distance to Trafalgar Square'] < distance_from_city_center)
].sort_values(['Distance to Food Retail', 'Distance to Trafalgar Square'], ascending=[False, True])
target_locations.head(15)

| Index | Address | Borough | Neighborhood | Latitude | Longitude | Food Retails | Distance to Food Retail | Distance to Trafalgar Square |
|---|---|---|---|---|---|---|---|---|
| 198 | Aldford Street, Mayfair, Westminster, London, ... | | Mayfair | 51.508591 | -0.154206 | 0 | 481.049143 | 1849.256537 |
| 159 | Groom Place, Belgravia, Westminster, London, G... | | Belgravia | 51.499572 | -0.151143 | 0 | 393.2587 | 1849.25651 |
| 123 | Lambeth Bridge, Westminster, Millbank, Westmin... | | Millbank | 51.493859 | -0.123054 | 0 | 361.85799 | 1538.709716 |
| 183 | Waterloo Bridge, St Clement Danes, Covent Gard... | | St Clement Danes | 51.509042 | -0.115121 | 0 | 347.971368 | 888.384673 |
| 184 | Blackfriars Underpass, Blackfriars, City of Lo... | | | 51.510143 | -0.106775 | 0 | 320.601943 | 1480.652435 |
| 181 | Waterloo Place, St. James's, Covent Garden, We... | | St. James's | 51.506839 | -0.131813 | 0 | 310.878591 | 296.123697 |
| 124 | Lambeth Walk, Lambeth, London Borough of Lambe... | London Borough of Lambeth | Lambeth | 51.494961 | -0.11471 | 1 | 284.516566 | 1648.765839 |
| 161 | Birdcage Walk, Westminster, Victoria, Westmins... | | Victoria | 51.501778 | -0.134455 | 1 | 264.849932 | 783.466658 |
| 125 | Geraldine Street, Elephant and Castle, London ... | London Borough of Southwark | Elephant and Castle | 51.496061 | -0.106366 | 1 | 246.829504 | 1941.850004 |
| 199 | Berkeley Street, St. James's, Mayfair, Westmin... | | St. James's | 51.509695 | -0.145861 | 1 | 246.82256 | 1290.758469 |


This concludes our analysis. We have produced a list of up to 15 addresses representing centers of zones with a low number of food retailers nearby, all fairly close to the city center (less than 2km from Trafalgar Square). From this point forward, each area could be analyzed using demographic information to estimate the potential clients in each area.
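
The methodology also calls for grouping promising locations into zones with k-means. The notebook imports `KMeans` from scikit-learn for that purpose; the sketch below instead uses a small plain-Python Lloyd's iteration so that it runs without extra dependencies. The function name `kmeans`, the deterministic initialisation and the sample points are assumptions made for illustration, not the notebook's own clustering step.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster 2-D points with Lloyd's algorithm; returns (centers, labels)."""
    # Deterministic initialisation: take k points spread evenly through the list.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return centers, labels

# Hypothetical candidate locations in projected metres.
points = [(0, 0), (0, 100), (100, 0), (5000, 5000), (5000, 5100), (5100, 5000)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Each resulting center could then be reverse-geocoded to give stakeholders an address to start their street-level search from.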

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of food retailers in London (~1000 in our initial area of interest, which was 12x12km around Trafalgar Square), there are pockets of low food retail density fairly close to the city center. The highest concentration of food retailers was detected north and east of Trafalgar Square.

The purpose of this analysis was only to provide info on areas close to the center of London that are not crowded with existing food retailers. It is entirely possible that there is a very good reason for the small number of our target venues in any of those areas, reasons which would make them unsuitable for a new business regardless of the lack of competition in the area.

Recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also takes other factors into account and meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify London areas close to the center with a low number of food retailers, in order to aid stakeholders in narrowing down the search for an optimal location for a new convenience store. By calculating the density distribution of similar businesses from Foursquare data, we first identified general areas that justify further analysis.

The final decision on the optimal convenience store location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.

## References <a name="references"></a>
[Capstone Project - The Battle of the Neighborhoods (Week 2)](https://cocl.us/coursera_capstone_notebook)

[1] https://data.london.gov.uk/ - London Demographics Information.

[2] https://developer.foursquare.com/docs/resources/categories - Foursquare venue categories.