# Capstone Project - The Battle of the Neighborhoods

### Applied Data Science by IBM/Coursera

## Introduction

### Background - Optimal neighborhoods for newcomers



London, the capital of the UK, attracts migrants from both within the country and abroad. In 2018/19, more than 340 thousand people moved into London. Newly arrived families often struggle to find the best place to settle.

### Relocation Problem

In this project, we aim to find the optimal location for a family that is new to a city. Specifically, this report is targeted at expats or immigrants interested in moving to **London**, UK.

Since every family has its own characteristics and needs, some basic assumptions about family composition and preferences have to be made. Of all basic needs, education is usually the most critical when choosing a home location, since proximity is often a determining factor in school admission. Neighborhoods with sufficient educational institutions for children will then be ranked by their **proximity to the city center**, as this makes it more convenient for the parents to commute to work.

Once the related data has been gathered from various sources, a data science model will be built to identify a few of the most suitable neighbourhoods based on these criteria. The advantages of each area will then be clearly described so that the new family can choose the best possible final location.

## Data

Based on the definition of our problem, the factors that will influence our decision are:
* number of schools in the neighborhood (any type of schools)
* number of schools for young kids in the neighborhood 
* distance of neighborhood from city center

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.
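As a rough illustration of what a regularly spaced grid can look like, the sketch below builds a hexagonal grid in plain Python. The function name `hex_grid` and its parameters are assumptions made for this example; the notebook's own grid cell uses a slightly different row alignment, so the number of points will not necessarily match. Coordinates are assumed to be in metres in a projected (Cartesian) system.

```python
import math

def hex_grid(center_x, center_y, radius, step):
    """Return (x, y) centers of a hexagonal grid lying within `radius` of the center."""
    row_height = step * math.sqrt(3) / 2  # vertical spacing between rows
    n_rows = int(radius / row_height)
    n_cols = int(radius / step) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        # Shift every other row by half a step so all six neighbours are equidistant
        x_offset = step / 2 if i % 2 else 0.0
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * step + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

# 600 m spacing inside a 6 km circle, with the center placed at the origin
grid = hex_grid(0.0, 0.0, radius=6000, step=600)
print(len(grid), 'grid points')
```

With a hexagonal layout every cell center has six neighbours at exactly `step` metres, which a square grid cannot offer (its diagonal neighbours are farther away).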

The following data sources will be needed to extract/generate the required information:
* the centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using the **geocoder package in geopy**
* the number of schools, clinics and recreational facilities, and their locations, in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the London center will be obtained using the **geocoder classes in geopy**, by geocoding Trafalgar Square, a well-known landmark in central London.
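Since distance from the city center is one of the decision factors, it can be useful to have an independent check on any projected distance. The sketch below computes the great-circle (haversine) distance in plain Python. The function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions of this example; the two coordinate pairs are copied from outputs recorded in this notebook (Trafalgar Square and the first candidate location).

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

trafalgar = (51.508037, -0.12804941070724718)  # from the geocoding output
candidate = (51.45512, -0.136259)              # first candidate location in the table
d = haversine_m(*trafalgar, *candidate)
print('Great-circle distance: {:.0f} m'.format(d))
```

The result may differ somewhat from the 'Distance from center' column, because that column is measured in a projected coordinate system rather than on the sphere.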

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np 
import json 
from geopy.geocoders import Nominatim 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 

In [2]:
# Geocode the city-center landmark with Nominatim (geopy)

address = 'Trafalgar Square, London'

geolocator = Nominatim(user_agent="tn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Trafalgar Square are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Trafalgar Square are 51.508037, -0.12804941070724718.


In [3]:
#!pip install shapely
#!pip install pyproj
import pyproj
import shapely #.geometry
import math

In [4]:
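# NOTE: London lies in UTM zone 30 (EPSG:32630). Zone 33 (EPSG:32633) is kept here because
# the outputs recorded below were produced with it; it introduces some distance distortion.
# pyproj.transform is deprecated, so Transformer objects are used instead.
_to_xy = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32633', always_xy=True)
_to_lonlat = pyproj.Transformer.from_crs('EPSG:32633', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    # WGS84 longitude/latitude (degrees) -> UTM x/y (metres)
    x, y = _to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    # UTM x/y (metres) -> WGS84 longitude/latitude (degrees)
    lon, lat = _to_lonlat.transform(x, y)
    return lon, lat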

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Trafalgar Square longitude={}, latitude={}'.format(longitude, latitude))
x, y = lonlat_to_xy(longitude,latitude)
print('Trafalgar Square UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Trafalgar Square longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Trafalgar Square longitude=-0.12804941070724718, latitude=51.508037
Trafalgar Square UTM X=-547023.4456756007, Y=5815641.585630685
Trafalgar Square longitude=-0.1280494107072397, latitude=51.508037




In [5]:
london_center_x, london_center_y = lonlat_to_xy(longitude, latitude) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = london_center_x - 6000
x_step = 600
y_min = london_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(london_center_x, london_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')


364 candidate neighborhood centers generated.




In [6]:
map_london = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker([latitude, longitude], popup='Trafalgar Square').add_to(map_london)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_london)
map_london

Reverse geocoding check

In [7]:
def get_address(agent, lat, long):
    try:
        geolocator = Nominatim(user_agent=agent)
        location = geolocator.reverse([lat, long])
        address = location[0]
        return address
    except Exception:  # e.g. geocoder timeout or no result for this point
        return None

addr = get_address('user_agent',latitude, longitude)
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(latitude, longitude, addr))

Reverse geocoding check
-----------------------
Address of [51.508037, -0.12804941070724718] is: Trafalgar Square, St. James's, Covent Garden, City of Westminster, London, Greater London, England, WC2, United Kingdom


In [8]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address('user_agent', lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Greater London, England', '') # We don't need the region part of the address
    address = address.replace(', United Kingdom', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [9]:
df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(5)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | Franconia Road, Abbeville Village, Clapham Par... | 51.45512 | -0.136259 | -548823.445676 | 5809926.0 | 5992.495307 |
| 1 | Mandrell Road, Clapham Park, London Borough of... | 51.456222 | -0.127923 | -548223.445676 | 5809926.0 | 5840.3767 |
| 2 | Brixton Hill Court, Hayter Road, Clapham Park,... | 51.457323 | -0.119587 | -547623.445676 | 5809926.0 | 5747.173218 |
| 3 | Saint Vincent's Community Centre, Probert Road... | 51.458423 | -0.11125 | -547023.445676 | 5809926.0 | 5715.767665 |
| 4 | Alice Walker Close, Herne Hill, London Borough... | 51.459523 | -0.102912 | -546423.445676 | 5809926.0 | 5747.173218 |


In [10]:
df_locations.to_pickle('./locations.pkl')    

### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get information on schools in each neighborhood.

We're interested in venues in the 'school' category, but only those that are schools for kids. 
Business schools, driving schools and the like are not institutions for children, so we don't care about those. 
We will therefore include in our list only venues that have 'school' or a related keyword in the category name. 
We have also defined a subset of schools for young children, including pre-schools, nurseries, and primary and junior schools. We'll make sure to detect and include all the subcategories of the specific 'children school' category, as we need information on schools for young children in the neighborhood.

In [11]:
CLIENT_ID = 'ID' # your Foursquare ID
CLIENT_SECRET = 'SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'TOKEN' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30

In [12]:
# Category IDs from Foursquare web site

school_category = '4bf58dd8d48988d13b941735' # category for all school venues

elementary_sch ='4f4533804b9074f6e4fb0105'
nursery_sch = '4f4533814b9074f6e4fb0107'
pre_sch = '52e81612bcbc57f1066b7a45'

children_sch = [elementary_sch, nursery_sch, pre_sch]
                                 
def is_school(categories, specific_filter=None):
    school_words = ['school', 'primary', 'secondary', 'nursery', 'academy', 'junior school']
    school = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in school_words:
            if r in category_name:
                school = True
        if 'driving school' in category_name:
            school = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            school = True
    return school, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', United Kingdom', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # request failed or the response had an unexpected shape
        venues = []
    return venues
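
To make the response handling above easier to follow, here is a small offline sketch that parses a hand-written payload. The payload is hypothetical: it mirrors only the fields that `get_venues_near_location` reads, and every value in it (IDs, names, address) is made up for illustration. The helper name `parse_venues` is likewise an assumption of this example.

```python
# Hypothetical payload containing only the fields the notebook reads
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {
            'id': 'venue-1',
            'name': 'Example Primary School',
            'categories': [{'name': 'Elementary School', 'id': 'cat-elementary'}],
            'location': {
                'lat': 51.5, 'lng': -0.12, 'distance': 120,
                'formattedAddress': ['1 Example Road', 'London', 'United Kingdom'],
            },
        }},
    ]}]}
}

def parse_venues(payload):
    """Flatten an 'explore'-style payload into the tuples used by the notebook."""
    venues = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        loc = venue['location']
        categories = [(c['name'], c['id']) for c in venue['categories']]
        # Join the address lines and drop the country, as format_address does
        address = ', '.join(loc['formattedAddress']).replace(', United Kingdom', '')
        venues.append((venue['id'], venue['name'], categories,
                       (loc['lat'], loc['lng']), address, loc['distance']))
    return venues

venues = parse_venues(sample_response)
print(venues[0])
```

Testing the parsing step against a fixed payload like this makes it possible to check the tuple layout without spending API quota.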

In [13]:
import pickle

def get_schools(lats, lons):
    schools = {}
    children_schools = {}
    location_schools = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any schools (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, school_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_schools = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_sch, is_children = is_school(venue_categories, specific_filter=children_sch)
            if is_sch:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                school = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_children, x, y)
                if venue_distance<=300:
                    area_schools.append(school)
                schools[venue_id] = school
                if is_children:
                    children_schools[venue_id] = school
        location_schools.append(area_schools)
        print(' .', end='')
    print(' done.')
    return schools, children_schools, location_schools

# Try to load from local file system in case we did this before
schools = {}
children_schools = {}
location_schools = []
loaded = False
try:
    # File names must match the ones written by the pickle.dump calls below
    with open('schools_350.pkl', 'rb') as f:
        schools = pickle.load(f)
    with open('children_schools_350.pkl', 'rb') as f:
        children_schools = pickle.load(f)
    with open('location_schools_350.pkl', 'rb') as f:
        location_schools = pickle.load(f)
    print('School data loaded.')
    loaded = True
except Exception:  # cache files missing or unreadable; fall back to the API
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    schools, children_schools, location_schools = get_schools(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('schools_350.pkl', 'wb') as f:
        pickle.dump(schools, f)
    with open('children_schools_350.pkl', 'wb') as f:
        pickle.dump(children_schools, f)
    with open('location_schools_350.pkl', 'wb') as f:
        pickle.dump(location_schools, f)
        

School data loaded.


In [14]:
import numpy as np

print('Total number of schools:', len(schools))
print('Total number of Children schools:', len(children_schools))
print('Percentage of Children schools: {:.2f}%'.format(len(children_schools) / len(schools) * 100))
print('Average number of schools in neighborhood:', np.array([len(r) for r in location_schools]).mean())

Total number of schools: 122
Total number of Children schools: 25
Percentage of Children schools: 20.49%
Average number of schools in neighborhood: 7.107142857142857


In [15]:
print('List of all schools')
print('-----------------------')
for r in list(schools.values())[:10]:
    print(r)
print('...')
print('Total:', len(schools))

List of all schools
-----------------------
('4e6f7ae4b9933190ed891abe', 'Evelyn Grace Academy', 51.460843343701484, -0.1041640126890318, '255 Shakespeare Road, London, Greater London, SE24 0QN', 170, False, -546478.937760159, 5810089.639489229)
('5064613ce4b0768b7de1635d', 'Jessop Primary School', 51.46052672633242, -0.10053268350047895, 'Lowden Road', 199, True, -546236.1450621937, 5810001.9136700155)
('4ccf0f9772106dcbf9fbac99', 'Clapham Manor Primary School', 51.4645516601882, -0.1374341723644726, 'Belmont Road, London, Greater London', 138, True, -548684.332951342, 5810983.020392459)
('5757f9b1498ebbab5f52a282', 'Empress Music Products', 51.4677538, -0.1250996999999643, '106 Grantham Road, London, Greater London, SW9 9EB', 231, False, -547760.0452837036, 5811156.420311325)
('57ea2b6c498ef3fc553cd9fa', 'Flux Jewellery School', 51.47354, -0.08073, 'Unit 2/F, Vanguard Court rear of 36-38, Peckham Rd, London, Greater London, SE5 8QT', 189, False, -544569.3791373353, 5811149.302467831)

In [16]:
print('List of Children School')
print('---------------------------')
for r in list(children_schools.values())[:10]:
    print(r)
print('...')
print('Total:', len(children_schools))

List of Children School
---------------------------
('5064613ce4b0768b7de1635d', 'Jessop Primary School', 51.46052672633242, -0.10053268350047895, 'Lowden Road', 199, True, -546236.1450621937, 5810001.9136700155)
('4ccf0f9772106dcbf9fbac99', 'Clapham Manor Primary School', 51.4645516601882, -0.1374341723644726, 'Belmont Road, London, Greater London', 138, True, -548684.332951342, 5810983.020392459)
('516ebed7e4b05076914ed94f', 'Shaftesbury Park School', 51.46874272033401, -0.1576608906174096, 'Ashbury Road, London, Greater London, SW11 5UW', 264, True, -549979.6148732882, 5811740.225546181)
('4baa453af964a5209a593ae3', 'Newton Preparatory School', 51.47659194774821, -0.14521479606628415, '149 Battersea Park Rd, Battersea, Greater London, SW8 4BX', 206, True, -548939.1163106125, 5812424.197605844)
('4bc5fb62bf29c9b64807f92a', 'London Early Years Foundation (LEYF nurseries)', 51.4930447248446, -0.12901316992559697, '121 Marsham St, Millbank, Greater London, SW1P 4LX', 197, True, -547439.

In [17]:
map_london = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker([latitude, longitude], popup='Trafalgar Square').add_to(map_london)
for sch in schools.values():
    lat = sch[2]; lon = sch[3]
    is_children = sch[6]
    color = 'red' if is_children else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_london)
map_london

## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of London that have a high density of schools, particularly those with many schools for young kids. We will limit our analysis to the area within ~6 km of the city center.

In the first step we collected the required **data: location and type (category) of every school within 6km from London city center** (Trafalgar Square). We also built a list of **identified schools for young children** from the Foursquare categorization.

The second step in our analysis will be the calculation and exploration of the '**density of schools**' across different areas of London. We will use **heatmaps** to identify a few promising areas close to the center with many schools to choose from *and* especially schools for young children, and focus our attention on those areas.
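
One simple way to turn the collected points into a density figure is to count the schools within a fixed radius of each candidate location. The sketch below does this in plain Python; the helper name `count_within` and all coordinates are hypothetical, and the points are assumed to be projected x/y values in metres. It only covers the counting step, not the heatmap rendering.

```python
import math

def count_within(center, points, radius):
    """Count how many points lie within `radius` metres of `center`."""
    cx, cy = center
    return sum(1 for px, py in points if math.hypot(px - cx, py - cy) <= radius)

# Hypothetical projected coordinates (metres)
candidates = [(0.0, 0.0), (600.0, 0.0), (3000.0, 0.0)]
schools_xy = [(100.0, 50.0), (450.0, 0.0), (700.0, 200.0), (5000.0, 5000.0)]

density = [count_within(c, schools_xy, radius=300) for c in candidates]
print(density)
```

The choice of radius matters: a small radius gives a sharp but noisy picture, while a large one smooths out local differences between neighbouring cells.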

In the third and final step we will focus on the most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **a school for children within a radius of 500 meters**, and we want locations with **more than three schools within a radius of 1000 meters**. We will present a map of all such locations and also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses that should be the starting point for the final 'street level' exploration and search for the optimal location by stakeholders.
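
The two requirements can be expressed as a simple filter over candidate locations. The sketch below is one possible reading of them, in plain Python; the function names, the tuple layout `(x, y, is_children)` and all coordinates are assumptions made for this example, with distances in metres in a projected coordinate system.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def meets_requirements(location, schools, child_radius=500, school_radius=1000, min_schools=3):
    """True if a children's school lies within child_radius and
    more than min_schools schools lie within school_radius."""
    has_child_school = any(
        is_child and dist(location, (x, y)) <= child_radius
        for x, y, is_child in schools
    )
    n_schools = sum(1 for x, y, _ in schools if dist(location, (x, y)) <= school_radius)
    return has_child_school and n_schools > min_schools

# Hypothetical schools: (x, y, is_children)
sample_schools = [(100, 0, True), (400, 300, False), (-600, 200, False),
                  (0, 900, False), (5000, 0, True)]
sample_locations = [(0.0, 0.0), (4800.0, 0.0), (0.0, 1500.0)]

good = [loc for loc in sample_locations if meets_requirements(loc, sample_schools)]
print(good)
```

The locations that pass the filter could then be grouped with the `KMeans` class already imported at the top of the notebook; the number of clusters would be a further choice to make at that point.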