<h1>Table of contents</h1>
<ul>
    <li><a href="#ref1">Introduction / Problem definition </a></li>
    <li><a href="#ref2">Data </a></li>
    <li><a href="#ref3">Methodology </a></li>
    <li><a href="#ref4">Grid Search</a></li>
</ul>

Let's start by importing all the libraries needed to run the analysis.

In [1]:
import pandas as pd
import numpy as np 
import folium
import json
import math
from shapely.geometry import shape, Point
import requests
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated

# Data preparation
## GeoJson data:
The data obtained from the British government provides the boundaries of all British Local Authority Areas. Since the analysis focuses on London, the file needs to be modified by dropping every area that is not within the London metropolitan area. The London boroughs are easy to recognize because their codes run from E09000001 to E09000033. Later on, this new file will be used to exclude any venues that are outside London and to display the choropleth map with only the boroughs in the analysis.

In [2]:
# Create a list with only the codes from the London Metropolitan Area
LondonBoroughsCodes = list(range(9000001,9000034))
LondonBoroughs = []
for item in LondonBoroughsCodes:
    LondonBoroughs.append('E0{}'.format(item))
 


In [4]:
#Drop the features for Local Authorities outside the London area. Open the file, loop through all the features and keep only the ones belonging to the London area
with open('Local_Authority_Districts__December_2017__Boundaries_in_the_UK__WGS84_.geojson', 'r') as f_in:
    data2 = json.load(f_in)
    features_filtered=[]
    for i in range(len(data2['features'])):
        if data2['features'][i]['properties']['lad17cd']  in LondonBoroughs:
            features_filtered.append(data2['features'][i])
            
    data2['features'] = features_filtered  
    with open('London_Boroughs_Boundaries.geojson', 'w') as f_out:
        json.dump(data2, f_out)
        print('File London_Boroughs_Boundaries.geojson created')



File London_Boroughs_Boundaries.geojson created




## Foursquare data:
The objective is to obtain all the venues of the selected categories that are inside the London metropolitan area. The challenge is that Foursquare limits both the number of requests and the number of venues per result for free accounts. A single Foursquare request, centred on the centre of London, with a radius large enough to cover the whole metropolitan area and including all the categories of interest, could be used; however, only 50 venues would be retrieved due to Foursquare limits.

To ensure that all the venues of each category are retrieved from the Foursquare database, the London metropolitan area is swept with circles whose radius is small enough not to saturate the request with more than 50 results. Going further, each sweep of the area looks for venues of only one category. In that way, the number of sweeps of the London metropolitan area matches exactly the number of distinct venue types that the target audience of the analysis is interested in.
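
The sweep is only complete if no single request hits the cap. As a minimal sketch (the helper `saturated_points`, the `results_by_point` dictionary and the constant `RESULT_CAP` are illustrative names, not part of this notebook), mesh points whose response came back full could be flagged so that they can be queried again with a smaller radius:

```python
RESULT_CAP = 50  # per-request venue cap quoted in the text (assumption for this sketch)

def saturated_points(results_by_point, cap=RESULT_CAP):
    """Return the mesh points whose request returned at least `cap` venues.

    `results_by_point` maps a (lat, long) tuple to the list of venues
    returned for the circle centred on that point.
    """
    return [point for point, venues in results_by_point.items() if len(venues) >= cap]

# Toy data: the first circle came back full, the second did not
example = {
    (51.50, -0.12): ['venue'] * 50,
    (51.55, -0.10): ['venue'] * 12,
}
flagged = saturated_points(example)
```

A non-empty `flagged` list would suggest that the chosen radius is too large for that category in those parts of the city.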

### Creation of the mesh of points

Foursquare only returns venues around a given centre and within a given radius. We therefore need to create a mesh of points (longitude, latitude) that will be used as centres in the Foursquare requests and that will cover the entire area of all London neighbourhoods. The layout of circles that covers the area with the least overlap is similar to the one in the next picture.

<img src=https://upload.wikimedia.org/wikipedia/commons/c/c6/Circle_covering_-_Hexagonal_pattern.png>

The distances between the centre points of the circles, as a function of the radius, are given by the following expressions:

<img src=mesh_definition.jpg>
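
As read from the mesh code further down in this notebook, the spacings appear to be √3·r between neighbouring centres in a row and 3·r between rows of the same grid, with a second grid shifted by half a step in both directions. The following sketch checks numerically, in flat kilometre coordinates, that such a layout leaves no point further than r from a centre. The function names and the rectangle size are illustrative assumptions.

```python
import math

def hex_cover_centres(radius, width, height):
    """Centres of circles of `radius` covering a width x height rectangle (flat units)."""
    dx = math.sqrt(3) * radius   # spacing between centres within a row
    dy = 3 * radius              # spacing between rows of the same grid
    centres = []
    for i in range(-1, int(width / dx) + 2):
        for j in range(-1, int(height / dy) + 2):
            centres.append((i * dx, j * dy))                    # base grid
            centres.append((i * dx + dx / 2, j * dy + dy / 2))  # grid shifted by half a step
    return centres

def max_gap(centres, width, height, step=0.25):
    """Largest distance from a sampled point of the rectangle to its nearest centre."""
    worst = 0.0
    y = 0.0
    while y <= height:
        x = 0.0
        while x <= width:
            nearest = min(math.hypot(x - cx, y - cy) for cx, cy in centres)
            worst = max(worst, nearest)
            x += step
        y += step
    return worst

centres = hex_cover_centres(radius=2.5, width=20.0, height=15.0)
gap = max_gap(centres, 20.0, 15.0)
print('Largest sampled distance to a centre: {:.3f} km'.format(gap))
```

If the sampled gap stays at or below the radius, every sampled point lies inside at least one circle.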

Once the distance between points is known, we only need to define where to place the mesh. The answer lies in the GeoJSON data from the previous step: the minimum and maximum longitude and latitude of the geographical points defining the boundaries of all the neighbourhoods of London give the four vertices of the rectangle that bounds the mesh.



In [6]:
#Create a df with all the points in the London borough boundaries to obtain the min and max Long and Lat.
boroughs_boundary = []
with open('London_Boroughs_Boundaries.geojson') as f:
    data = json.load(f)
    for element in data['features']:
        if element['properties']['lad17cd']  in LondonBoroughs:
            for coordinategroup in element['geometry']['coordinates']:
                for coordinate in coordinategroup:
                    boroughs_boundary.append([[element['properties']['lad17cd'], coordinate[0], coordinate[1]]])

boroughs_boundary_df = pd.DataFrame([item for element in boroughs_boundary for item in element])
boroughs_boundary_df.columns = ['Borough_code', 'Longitude', 'Latitude']

min_long=boroughs_boundary_df['Longitude'].min()
max_long=boroughs_boundary_df['Longitude'].max()
min_lat=boroughs_boundary_df['Latitude'].min()
max_lat=boroughs_boundary_df['Latitude'].max()

print('The number of geographical points defining the boundaries of London is {}'.format(boroughs_boundary_df.shape[0]))
print('The metropolitan area of London is between {} and {} degrees Latitude and {} and {} degrees Longitude'.format(min_lat, max_lat, min_long, max_long))
#create a new geometry by merging all London boroughs into one area



The number of geographical points defining the boundaries of London is 1272
The metropolitan area of London is between 51.287296026000035 and 51.69236207100005 degrees Latitude and -0.511281856999972 and 0.332213805000038 degrees Longitude


The conversion between km and degrees of latitude and longitude is:

Latitude: 1 deg = 110.574 km

Longitude: 1 deg = 111.320*cos(latitude) km
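
As a small sketch of how these two factors turn a distance in km into steps in degrees (the function name `km_to_degrees` and the example latitude of 51.5 degrees are assumptions made for illustration):

```python
import math

KM_PER_DEG_LAT = 110.574          # km per degree of latitude
KM_PER_DEG_LON_EQUATOR = 111.320  # km per degree of longitude at the equator

def km_to_degrees(distance_km, latitude_deg):
    """Convert a distance in km to (delta_latitude, delta_longitude) in degrees."""
    delta_lat = distance_km / KM_PER_DEG_LAT
    # a degree of longitude shrinks with the cosine of the latitude
    delta_lon = distance_km / (KM_PER_DEG_LON_EQUATOR * math.cos(math.radians(latitude_deg)))
    return delta_lat, delta_lon

d_lat, d_lon = km_to_degrees(2.5, 51.5)
print('2.5 km is about {:.4f} deg of latitude and {:.4f} deg of longitude'.format(d_lat, d_lon))
```

At London's latitude the same distance spans more degrees of longitude than of latitude, which is why the two directions need separate step sizes.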

In [7]:
#The next piece of code generates a mesh of geo points that covers the whole area of London
radius = 2.5 #radius in km
delta_longitude = math.sqrt(3)*radius/(111.32*math.cos(math.radians(boroughs_boundary_df['Latitude'].mean())))
delta_latitude = 3*radius/110.574

mesh_points = []
#To ensure we cover the entire area we start one step before the min long and lat and go one step beyond the maximums
for i in range(-1, int((max_long - min_long)/delta_longitude)+2):
    for j in range(-1, int((max_lat - min_lat)/delta_latitude)+2):
        mesh_points.append([min_lat + j*delta_latitude, min_long + i*delta_longitude])
        mesh_points.append([min_lat + delta_latitude/2 +j*delta_latitude, min_long +delta_longitude/2 + i*delta_longitude])
print('Mesh for London area created')
London_lat=boroughs_boundary_df['Latitude'].mean()
London_lon=boroughs_boundary_df['Longitude'].mean()


Mesh for London area created


In [8]:
#The next piece of code draws the mesh on a map with the London area highlighted
map_london = folium.Map(location=[London_lat, London_lon], zoom_start=10, tiles='openstreetmap')
# folium.Choropleth replaces the Map.choropleth method, which recent folium releases no longer provide
folium.Choropleth(
    geo_data='London_Boroughs_Boundaries.geojson',
    fill_color='orange',
    fill_opacity=0.4,
    line_opacity=0.6,
    legend_name='London Boroughs'
).add_to(map_london)

for point in mesh_points:
    folium.Circle(point, radius=radius*1000, color='green', fill_color='green', fill_opacity=0.3).add_to(map_london)
map_london

As the map shows, the mesh covers the whole area of London; however, a number of circles lie entirely outside any London area. The next step is to remove them from the mesh. For each point in the mesh, the six vertices of the hexagon inscribed in its circle are checked, and the circle is discarded if all six vertices fall outside London.
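
The six test points can be generated by stepping around the circle in 60 degree increments. The sketch below is only an illustration: `hexagon_vertices`, `keep_circle` and the `is_inside` callback are hypothetical names, and the longitude factor uses the latitude of each centre, whereas the notebook code uses the mean latitude of the boundary points.

```python
import math

def hexagon_vertices(centre, radius_km):
    """Six points on the circle of `radius_km` around `centre` = (lat, long), 60 degrees apart."""
    lat, lon = centre
    km_per_deg_lat = 110.574
    km_per_deg_lon = 111.320 * math.cos(math.radians(lat))
    vertices = []
    for k in range(6):
        angle = math.radians(90 + 60 * k)       # start due north, then step 60 degrees
        north_km = radius_km * math.sin(angle)
        east_km = radius_km * math.cos(angle)
        vertices.append((lat + north_km / km_per_deg_lat, lon + east_km / km_per_deg_lon))
    return vertices

def keep_circle(centre, radius_km, is_inside):
    """Keep a circle if at least one of its six test points satisfies `is_inside`."""
    return any(is_inside(vertex) for vertex in hexagon_vertices(centre, radius_km))

# Toy check: pretend that "London" is everything north of 51.5 degrees latitude
kept = keep_circle((51.5, -0.1), 2.5, lambda vertex: vertex[0] > 51.5)
```

Because only six points are tested, a circle that overlaps London only between two test points would still be discarded; a denser sampling of the circumference would reduce that risk.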

In [9]:
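#Definition of a function that checks whether a given geopoint [lat, long] is inside any polygon of a GeoJSON boundary file
_polygons_cache = {}
def check_london_area(geopoint, geojson):
    # parse the file only once per path; this function is called several times per mesh point
    if geojson not in _polygons_cache:
        with open(geojson) as f:
            js = json.load(f)
        _polygons_cache[geojson] = [shape(feature['geometry']) for feature in js['features']]
    # shapely points are (x, y) = (longitude, latitude)
    point = Point(geopoint[1], geopoint[0])
    # check each polygon to see if it contains the point
    return any(polygon.contains(point) for polygon in _polygons_cache[geojson])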
mesh_points_within_london = []
# offsets (in degrees) from a circle centre to the six vertices of its inscribed hexagon
d_lat = radius/110.574
d_long = math.sqrt(3)*radius/(2*111.32*math.cos(math.radians(boroughs_boundary_df['Latitude'].mean())))
#Loop through all the points in the mesh and keep those whose hexagon has at least one vertex inside London
for point in mesh_points:
    vertices = [
        [point[0] + d_lat, point[1]],
        [point[0] + d_lat/2, point[1] + d_long],
        [point[0] - d_lat/2, point[1] + d_long],
        [point[0] - d_lat, point[1]],
        [point[0] + d_lat/2, point[1] - d_long],
        [point[0] - d_lat/2, point[1] - d_long]]
    if any(check_london_area(vertex, 'London_Boroughs_Boundaries.geojson') for vertex in vertices):
        mesh_points_within_london.append(point)
    # draw the six test points on the map
    for vertex in vertices:
        folium.CircleMarker(vertex, radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=0.7, parse_html=False).add_to(map_london)
map_london


In [13]:
print('From an original mesh of {} points, only {} fall into London Area and will be considered.'.format(len(mesh_points),len(mesh_points_within_london)))

From an original mesh of 256 points, only 130 fall into London Area and will be considered.


In [11]:
#The next piece of code draws the map with the final mesh used:
map_london_filtered_mesh = folium.Map(location=[London_lat, London_lon], zoom_start=10, tiles='openstreetmap')
# folium.Choropleth replaces the Map.choropleth method, which recent folium releases no longer provide
folium.Choropleth(
    geo_data='London_Boroughs_Boundaries.geojson',
    fill_color='orange',
    fill_opacity=0.4,
    line_opacity=0.6,
    legend_name='London Boroughs'
).add_to(map_london_filtered_mesh)

for point in mesh_points_within_london:
    folium.Circle(point, radius=radius*1000, color='green', fill_color='green', fill_opacity=0.3).add_to(map_london_filtered_mesh)
map_london_filtered_mesh

### Foursquare request

In total, 15 different venue categories have been selected for the analysis. They all fall into the main groups that the project focuses on, which are listed next together with their Foursquare venue category codes. This information has been obtained from the Foursquare documentation (*https://developer.foursquare.com/docs/build-with-foursquare/categories/*)
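
Since each category needs its own sweep of the mesh, the size of the job can be estimated up front. This is a rough sketch that assumes exactly one request per mesh point and category, with no retries:

```python
mesh_point_count = 130   # mesh points kept after filtering (reported above)
category_count = 15      # venue categories selected for the analysis

# one request per (mesh point, category) pair
total_requests = mesh_point_count * category_count
print('Requests needed for a full sweep: {}'.format(total_requests))
```

That is 1,950 requests for a full sweep, which may be why the categories are later split into two groups that are run separately.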


In [14]:
#Definition of the Foursquare category codes in a dictionary
categories_group = {
    'Health': {
        'Hospital': '4bf58dd8d48988d196941735',
        'Doctor': '4bf58dd8d48988d177941735',
        'Pharmacy': '4bf58dd8d48988d10f951735'},
    'Transport': {
        'Trains_Platforms': '4f4531504b9074f6e4fb0102',
        'Underground': '4bf58dd8d48988d1fd931735',
        'Light_rail': '4bf58dd8d48988d1fc931735'},
    'Well-Being': {
        'Park': '4bf58dd8d48988d163941735',
        'Pool': '4bf58dd8d48988d15e941735',
        'Gym': '4bf58dd8d48988d175941735'},
    'Dailyneeds': {
        'Fruit_And_Veg_Shop': '52f2ab2ebcbc57f1066b8b1c',
        'Supermarket': '52f2ab2ebcbc57f1066b8b46',
        'Shopping_Mall': '4bf58dd8d48988d1fd941735'},
    'Education':{
        'Elementary_School': '4f4533804b9074f6e4fb0105',
        'Middle_School': '4f4533814b9074f6e4fb0106',
        'Preschool': '52e81612bcbc57f1066b7a45'}

    }
categories_group_1={
    'Health': {
        'Hospital': '4bf58dd8d48988d196941735',
        'Doctor': '4bf58dd8d48988d177941735',
        'Pharmacy': '4bf58dd8d48988d10f951735'},
    'Transport': {
        'Trains_Platforms': '4f4531504b9074f6e4fb0102',
        'Underground': '4bf58dd8d48988d1fd931735',
        'Light_rail': '4bf58dd8d48988d1fc931735'}
    }
categories_group_2={
    'Well-Being': {
        'Park': '4bf58dd8d48988d163941735',
        'Pool': '4bf58dd8d48988d15e941735',
        'Gym': '4bf58dd8d48988d175941735'},
    'Dailyneeds': {
        'Fruit_And_Veg_Shop': '52f2ab2ebcbc57f1066b8b1c',
        'Supermarket': '52f2ab2ebcbc57f1066b8b46',
        'Shopping_Mall': '4bf58dd8d48988d1fd941735'},
    'Education':{
        'Elementary_School': '4f4533804b9074f6e4fb0105',
        'Middle_School': '4f4533814b9074f6e4fb0106',
        'Preschool': '52e81612bcbc57f1066b7a45'}

    }


In [None]:
import os

# Credentials are read from environment variables (assumed names) so that they are not stored in the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50 # maximum number of venues per request, as described above
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

def getVenues(mesh_points, group, category_name, category_code, radius=2000):
    """Query Foursquare once per mesh point and save the de-duplicated venues to '<category_name>.csv'."""
    venues_list = []
    for mesh_point in mesh_points:

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            mesh_point[0],
            mesh_point[1],
            VERSION,
            category_code,
            radius,
            LIMIT)

        # make the GET request
        json_results = requests.get(url).json()
        # keep only the venues; a failed request (e.g. quota exceeded) has no
        # 'venues' key and contributes nothing
        results = json_results.get('response', {}).get('venues', [])

        # keep only the relevant information for each venue
        venues_list.extend(
            (v['name'], v['location']['lat'], v['location']['lng'], group, category_name)
            for v in results)

    # build the DataFrame once, after all mesh points have been queried
    nearby_venues = pd.DataFrame(venues_list, columns=['Venue', 'Latitude', 'Longitude', 'Group', 'Category'])
    # overlapping circles return the same venue more than once
    nearby_venues.drop_duplicates(inplace=True)
    nearby_venues.to_csv('{}.csv'.format(category_name))
    return nearby_venues


In [None]:

# Assign each venue to the London borough whose polygon contains it
def find_borough(names, lats, longs, geojson):
    # load the GeoJSON file containing the borough boundaries and build each polygon once
    with open(geojson) as f:
        js = json.load(f)
    boroughs = [(shape(feature['geometry']), feature['properties']['lad17nm']) for feature in js['features']]
    venues_borough = []
    for name, lat, long in zip(names, lats, longs):
        # shapely points are (x, y) = (longitude, latitude)
        point = Point(long, lat)
        # check each polygon to see if it contains the point
        for polygon, borough_name in boroughs:
            if polygon.contains(point):
                venues_borough.append([name, lat, long, borough_name])
                break  # stop at the first borough that contains the venue
    venues_borough_df = pd.DataFrame(venues_borough, columns=['Name', 'Latitude', 'Longitude', 'Borough'])
    return venues_borough_df

In [None]:
trialmesh = mesh_points_within_london[:2] 
print(trialmesh)

In [None]:
for group_name, categories in categories_group_1.items():
    print('Obtaining venues for group {}'.format(group_name))
    for name, code  in categories.items():
        print ('Using code:{} to generate {}.csv'.format(code,name))
        print ('...')
        getVenues(mesh_points_within_london, group_name, name, code, radius*1000 )


In [None]:
for group_name, categories in categories_group_2.items():
    print('Obtaining venues for group {}'.format(group_name))
    for name, code  in categories.items():
        print ('Using code:{} to generate {}.csv'.format(code,name))
        print ('...')
        getVenues(mesh_points_within_london, group_name, name, code, radius*1000 )


In [None]:
trains_platforms_df = pd.read_csv('Trains_Platforms.csv')
underground_df = pd.read_csv('Underground.csv')
Elementary_School_df = pd.read_csv('Elementary_School.csv')
Supermarket_df = pd.read_csv('Supermarket.csv')



In [None]:
for lat, long in zip(Elementary_School_df['Latitude'],Elementary_School_df['Longitude']):
    folium.CircleMarker([lat,long], radius=2, color='yellow', fill=True, fill_color='yellow', fill_opacity=0.7, parse_html=False).add_to(map_london)
for lat, long in zip(Supermarket_df['Latitude'],Supermarket_df['Longitude']):
    folium.CircleMarker([lat,long], radius=2, color='orange', fill=True, fill_color='orange', fill_opacity=0.7, parse_html=False).add_to(map_london)

map_london