## Problem Description

With shared workspaces on the rise, it is worth researching the best location for setting up a shared workspace in Philadelphia. The city is home to many educational institutions, and students are always coming up with ideas that could be the next big thing. One example is GoPuff, which recently received a major investment from SoftBank.

This analysis can be used either by people looking to invest in a shared workspace or by established companies like WeWork that are looking to expand into new cities.


First, an ideal shared workspace would be near educational institutions, so that student contributors can come and go easily, even at odd hours, since startups and small teams keep varied working schedules.

Second, the workspace should be adequately accessible by public transport, so bus and train stations will also be factored in.

Another factor is the availability of restaurants and other food outlets, ideally ones that stay open at odd hours in case teams want to work late and order food.

The plan is to start with these basic parameters and then see whether other factors can be added to refine the search for an ideal shared workspace in Philadelphia.

Let the search begin!

## Data

The data will be Foursquare location data for the city of Philadelphia, used to measure proximity to educational institutions, public transport hubs, and restaurants and other eateries. We will also use the packages imported in the cell below: geopy for address and location lookup, scikit-learn for clustering, matplotlib for color handling, and folium for map rendering.

In [5]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [6]:
import pandas as pd
import numpy as np
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, assumes pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library



Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [7]:
pip install pgeocode

Note: you may need to restart the kernel to use updated packages.


## Geolocator to get location Coordinates for Philadelphia

In [8]:

address = 'Philadelphia, US'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Philadelphia are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Philadelphia are 39.9527237, -75.1635262.
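
geopy's `geocode` returns `None` when nothing matches the address, in which case reading `location.latitude` as above would raise an `AttributeError`. A minimal sketch of a guard follows. `coordinates_for` is a hypothetical helper name, and `fake_geocode` is a stand-in used only so the example runs without a network call.

```python
from types import SimpleNamespace

def coordinates_for(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing matched.

    `geocode` is any callable shaped like geopy's `geolocator.geocode`.
    """
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude

# Stand-in geocoder (assumption: replaces the real Nominatim call for illustration).
def fake_geocode(address):
    known = {
        'Philadelphia, US': SimpleNamespace(latitude=39.9527237, longitude=-75.1635262),
    }
    return known.get(address)

print(coordinates_for(fake_geocode, 'Philadelphia, US'))
```

With the real geocoder, the call would be `coordinates_for(geolocator.geocode, address)`.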


## Pgeocode to get latitudes and longitudes for Philadelphia postal codes, and to ease distance calculations between coordinate pairs

In [121]:
import pgeocode
nomi = pgeocode.Nominatim('us')
zips = ["19102","19103","19104","19106","19107","19109","19111","19112","19114","19115","19116","19118","19119","19120","19121","19122","19123","19124","19125","19126","19127","19128","19129","19130","19131","19132","19133","19134","19135","19136","19137","19138","19139","19140","19141","19142","19143","19144","19145","19146","19147","19148","19149","19150","19151","19152","19153","19154"]
phillyData = nomi.query_postal_code(zips)
phillyData

Unnamed: 0,postal_code,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,19102,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9489,-75.1661,4.0
1,19103,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9513,-75.1741,4.0
2,19104,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9597,-75.2024,4.0
3,19106,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9474,-75.1473,4.0
4,19107,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9487,-75.1593,4.0
5,19109,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9496,-75.1637,4.0
6,19111,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,40.0596,-75.0818,4.0
7,19112,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.8893,-75.1782,4.0
8,19114,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,40.0634,-74.999,4.0
9,19115,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,40.0903,-75.041,4.0
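
The heading above mentions distance calculations, although none are run in this notebook. pgeocode provides a `GeoDistance` class for postal-code distances; as a dependency-free sketch of the same idea, the haversine formula below computes the great-circle distance between two coordinate pairs. `haversine_km` is a hypothetical helper name, and the mean Earth radius of 6371 km is an assumption of the sketch.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Coordinates of postal codes 19102 and 19104, taken from the table above
d = haversine_km(39.9489, -75.1661, 39.9597, -75.2024)
print(round(d, 2), "km")
```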


## Reverse locator to get Neighborhood names, since Pgeocode is missing those

In [122]:
locator = Nominatim(user_agent="myGeocoder")
coordinates = "40.0568,-75.1379"
location = locator.reverse(coordinates)
location.raw

{'place_id': 265010142,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
 'osm_type': 'way',
 'osm_id': 12204094,
 'lat': '40.056403857142854',
 'lon': '-75.13793804081634',
 'display_name': '1255, 68th Avenue, Pittville, Philadelphia, Philadelphia County, Pennsylvania, 19126, United States of America',
 'address': {'house_number': '1255',
  'road': '68th Avenue',
  'neighbourhood': 'Pittville',
  'city': 'Philadelphia',
  'county': 'Philadelphia County',
  'state': 'Pennsylvania',
  'postcode': '19126',
  'country': 'United States of America',
  'country_code': 'us'},
 'boundingbox': ['40.056353857143',
  '40.056453857143',
  '-75.137988040816',
  '-75.137888040816']}
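
The raw response is a nested dictionary, and not every address carries a `neighbourhood` key. A minimal sketch of reading it with a fallback to `suburb`, the same lookup order used in the data preparation step, is shown below. `area_name` is a hypothetical helper, and `sample` is a trimmed copy of the response above.

```python
def area_name(raw):
    """Return the neighbourhood from a Nominatim raw response, else the suburb, else ''."""
    address = raw.get('address', {})
    return address.get('neighbourhood') or address.get('suburb') or ''

# Trimmed copy of the response shown above
sample = {
    'address': {
        'house_number': '1255',
        'road': '68th Avenue',
        'neighbourhood': 'Pittville',
        'city': 'Philadelphia',
        'postcode': '19126',
    }
}

print(area_name(sample))                               # Pittville
print(area_name({'address': {'suburb': 'Somewhere'}}))  # Somewhere
print(repr(area_name({})))                             # ''
```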

## Data Prep and Cleaning

In [123]:
#Appending Neighbourhood/Suburb Names

for index, row in phillyData.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    coords = str(lat)+','+str(lon)
    location = locator.reverse(coords)
#     print("\n")
#     print(location.raw)
    try:
        neighbourhood = location.raw['address']['neighbourhood']
    except Exception as e:
        print("Neighbourhood doesn't exist, checking for Suburb")
        try:
            neighbourhood = location.raw['address']['suburb']
        except Exception as e:
            print("No Neighbourhood, No Suburb")
            neighbourhood = ''
    phillyData.at[index,'neighbourhood'] = neighbourhood
    
phillyData

Neighbourhood doesn't exist, checking for Suburb
Neighbourhood doesn't exist, checking for Suburb
No Neighbourhood, No Suburb
Neighbourhood doesn't exist, checking for Suburb


Unnamed: 0,postal_code,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy,neighbourhood
0,19102,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9489,-75.1661,4.0,Rittenhouse Square
1,19103,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9513,-75.1741,4.0,Rittenhouse Square
2,19104,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9597,-75.2024,4.0,Powelton Village
3,19106,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9474,-75.1473,4.0,Society Hill
4,19107,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9487,-75.1593,4.0,Rittenhouse Square
5,19109,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.9496,-75.1637,4.0,Rittenhouse Square
6,19111,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,40.0596,-75.0818,4.0,Burholme
7,19112,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,39.8893,-75.1782,4.0,South Philadelphia
8,19114,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,40.0634,-74.999,4.0,Academy Garden
9,19115,US,Philadelphia,Pennsylvania,PA,Philadelphia,101.0,,,40.0903,-75.041,4.0,Pauls Run


In [124]:
#Dropping non essential columns 

phillyData = phillyData[['postal_code','county_name','latitude', 'longitude', 'neighbourhood']]

#Dropping empty neighbourhood rows

phillyData = phillyData[phillyData['neighbourhood'] != '']

phillyData

Unnamed: 0,postal_code,county_name,latitude,longitude,neighbourhood
0,19102,Philadelphia,39.9489,-75.1661,Rittenhouse Square
1,19103,Philadelphia,39.9513,-75.1741,Rittenhouse Square
2,19104,Philadelphia,39.9597,-75.2024,Powelton Village
3,19106,Philadelphia,39.9474,-75.1473,Society Hill
4,19107,Philadelphia,39.9487,-75.1593,Rittenhouse Square
5,19109,Philadelphia,39.9496,-75.1637,Rittenhouse Square
6,19111,Philadelphia,40.0596,-75.0818,Burholme
7,19112,Philadelphia,39.8893,-75.1782,South Philadelphia
8,19114,Philadelphia,40.0634,-74.999,Academy Garden
9,19115,Philadelphia,40.0903,-75.041,Pauls Run


## Finally, Foursquare API data to get the category and location of venues

In [13]:
#Obtain all categories for venue types and their IDs to fine tune our venue search to only include our selected features.

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version
getCategoriesURL = 'https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            VERSION)
resultsCategories = requests.get(getCategoriesURL)
categories = resultsCategories.json()["response"]['categories']
# print(categories)
for each in categories:
    print(each['name'],each['id'])

Arts & Entertainment 4d4b7104d754a06370d81259
College & University 4d4b7105d754a06372d81259
Event 4d4b7105d754a06373d81259
Food 4d4b7105d754a06374d81259
Nightlife Spot 4d4b7105d754a06376d81259
Outdoors & Recreation 4d4b7105d754a06377d81259
Professional & Other Places 4d4b7105d754a06375d81259
Residence 4e67e38e036454776db1fb3a
Shop & Service 4d4b7105d754a06378d81259
Travel & Transport 4d4b7105d754a06379d81259
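
The category IDs listed above are what the venue search passes as the `categoryId` query parameter. As an alternative to assembling the query string with `str.format`, the sketch below builds the same explore URL with `urllib.parse.urlencode`, which escapes each value. `explore_url` is a hypothetical helper, and `'MY_ID'` and `'MY_SECRET'` are placeholder credentials.

```python
from urllib.parse import urlencode

def explore_url(client_id, client_secret, version, lat, lng, radius, limit, category_id):
    """Build a Foursquare v2 venues/explore URL from its query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'categoryId': category_id,
    }
    # urlencode percent-escapes values, e.g. the comma in `ll` becomes %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; College & University category around postal code 19102
url = explore_url('MY_ID', 'MY_SECRET', '20180605',
                  39.9489, -75.1661, 1000, 100,
                  '4d4b7105d754a06372d81259')
print(url)
```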


In [125]:

# Top-level Foursquare category IDs, taken from the listing printed above
eligibleUniversityVenues = '4d4b7105d754a06372d81259'  # College & University

eligibleFoodVenues = '4d4b7105d754a06374d81259'  # Food

eligibleTransportVenues = '4d4b7105d754a06379d81259'  # Travel & Transport

def getNearbyVenues(names, latitudes, longitudes, radius, categoryString):
    """Return one row per venue of the given category found within `radius` metres of each point."""
    LIMIT = 100  # maximum number of venues requested per call
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT,
            categoryString)

        # make the GET request once and reuse the parsed response
        response = requests.get(url).json().get('response', {})
        try:
            results = response['groups'][0]['items']
        except (KeyError, IndexError) as e:
            # no venue groups came back: report the response and skip this location
            print(e)
            print("\n")
            print(response)
            continue

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['neighbourhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues


#Get all school, transport and food venues near every postal code

philly_venues_schools = getNearbyVenues(names=phillyData['neighbourhood'],
                                        latitudes=phillyData['latitude'],
                                        longitudes=phillyData['longitude'],
                                        radius=1000,
                                        categoryString=eligibleUniversityVenues)

philly_venues_transport = getNearbyVenues(names=phillyData['neighbourhood'],
                                          latitudes=phillyData['latitude'],
                                          longitudes=phillyData['longitude'],
                                          radius=1000,
                                          categoryString=eligibleTransportVenues)

philly_venues_food = getNearbyVenues(names=phillyData['neighbourhood'],
                                     latitudes=phillyData['latitude'],
                                     longitudes=phillyData['longitude'],
                                     radius=1000,
                                     categoryString=eligibleFoodVenues)


In [126]:
#Count the number of venues for each neighbourhood and put them into separate dataframes

schools = philly_venues_schools.groupby(['neighbourhood'])['Venue Category'].count().reset_index(name="school_count")
transports = philly_venues_transport.groupby(['neighbourhood'])['Venue Category'].count().reset_index(name="transport_count")
foods = philly_venues_food.groupby(['neighbourhood'])['Venue Category'].count().reset_index(name="food_count")

foods.head()

Unnamed: 0,neighbourhood,food_count
0,Academy Garden,10
1,Bella Vista,100
2,Bridesburg,12
3,Burholme,25
4,Cedar Park,29


In [127]:
#Merge dataframes to obtain specific counts for each type of venue

philly = schools.merge(transports,on="neighbourhood", how='inner')
philly = philly.merge(foods,on="neighbourhood", how='inner')

#Apply weights of 5, 3 and 1 to the school, transport and food counts respectively, reflecting the relative importance of each factor.

philly['school_count'] = philly['school_count'] * 5
philly['transport_count'] = philly['transport_count'] * 3
philly['food_count'] = philly['food_count'] * 1
philly['total_score'] = philly['food_count'] + philly['transport_count'] + philly['school_count']
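
As a worked example of the weighting, the sketch below recomputes the total score for one neighbourhood. `total_score` and `WEIGHTS` are hypothetical names. The raw counts for Bella Vista (8 schools, 70 transport venues, 100 food venues) are inferred by dividing the weighted columns shown in this notebook (40, 210, 100) by their weights.

```python
# Weights used in the cell above: schools 5, transport 3, food 1
WEIGHTS = {'school': 5, 'transport': 3, 'food': 1}

def total_score(school_count, transport_count, food_count):
    """Weighted sum of the raw venue counts for one neighbourhood."""
    return (WEIGHTS['school'] * school_count
            + WEIGHTS['transport'] * transport_count
            + WEIGHTS['food'] * food_count)

# Bella Vista: 5*8 + 3*70 + 1*100
print(total_score(8, 70, 100))  # 350
```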

In [133]:
philly.head()

Unnamed: 0,Cluster Labels,neighbourhood,school_count,transport_count,food_count,total_score
0,1,Academy Garden,60,3,10,73
1,3,Bella Vista,40,210,100,350
2,1,Bridesburg,25,18,12,55
3,1,Burholme,30,27,25,82
4,5,Cedar Park,5,87,29,121


In [128]:
# set number of clusters
kclusters = 6
phil = philly
philly_grouped_clustering = philly.drop(columns='neighbourhood')  # k-means needs numeric columns only

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(philly_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 3, 1, 1, 5, 1, 1, 5, 5, 5, 5, 1, 3, 1, 5, 1, 4, 1, 5, 1, 1, 1,
       1, 1, 1, 1, 1, 3, 1, 4, 0, 1, 2, 1, 5, 3, 1, 3, 5, 1, 5],
      dtype=int32)
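
Cluster labels are arbitrary integers, so a count of members per label makes the array above easier to read. The sketch below uses the labels copied from that output; `sizes` is a hypothetical variable name.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [1, 3, 1, 1, 5, 1, 1, 5, 5, 5, 5, 1, 3, 1, 5, 1, 4, 1, 5, 1, 1, 1,
          1, 1, 1, 1, 1, 3, 1, 4, 0, 1, 2, 1, 5, 3, 1, 3, 5, 1, 5]

# Number of neighbourhoods assigned to each cluster
sizes = Counter(labels)
for cluster in sorted(sizes):
    print("Cluster {}: {} neighbourhoods".format(cluster, sizes[cluster]))
```

With a live model, the same count could be taken from `Counter(kmeans.labels_.tolist())`.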

In [129]:
# add clustering labels
philly.insert(0, 'Cluster Labels', kmeans.labels_)

philly_merged = philly.merge(phillyData,on="neighbourhood", how='inner')

In [130]:
philly_merged

Unnamed: 0,Cluster Labels,neighbourhood,school_count,transport_count,food_count,total_score,postal_code,county_name,latitude,longitude
0,1,Academy Garden,60,3,10,73,19114,Philadelphia,40.0634,-74.999
1,3,Bella Vista,40,210,100,350,19147,Philadelphia,39.9362,-75.1563
2,1,Bridesburg,25,18,12,55,19137,Philadelphia,40.0008,-75.0727
3,1,Burholme,30,27,25,82,19111,Philadelphia,40.0596,-75.0818
4,5,Cedar Park,5,87,29,121,19143,Philadelphia,39.9448,-75.2288
5,1,East Falls,50,30,27,107,19129,Philadelphia,40.0118,-75.1861
6,1,Eastwick,5,36,16,57,19153,Philadelphia,39.9055,-75.2444
7,5,Fishtown,50,66,64,180,19125,Philadelphia,39.9788,-75.1262
8,5,Garden Court,50,105,40,195,19139,Philadelphia,39.9612,-75.2303
9,5,Girard Estates,15,87,59,161,19145,Philadelphia,39.9227,-75.1812


In [131]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
# one color per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, totalScore in zip(philly_merged['latitude'], philly_merged['longitude'], philly_merged['neighbourhood'], philly_merged['Cluster Labels'],philly_merged['total_score']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + ", Total Score : " + str(totalScore), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters