# Analysis of the filtered dataset

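Using `main.py`, we selected a list of candidate office locations and gathered information about the nearby places the project requires (vegan restaurants, a basketball stadium, nightlife, child daycare, pet services and an airport) in order to choose the best spot.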

In this notebook we analyze that information and decide on the best location.

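To begin with, we load several libraries and initialize the functions needed for this analysis. The helper functions come from the `functions.analysis` module, and the Foursquare credentials are read from a `.env` file.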


In [1]:
import pandas as pd
import numpy as np
import math
from bson import ObjectId
from pymongo import MongoClient
from folium import Map, Marker, Icon, FeatureGroup, LayerControl, Choropleth
from folium.plugins import HeatMap
from folium.vector_layers import Circle, Polygon
from sklearn.metrics.pairwise import haversine_distances
from math import radians
from functions.analysis import *
import re
import os
import requests
from dotenv import load_dotenv
import json

client = MongoClient()

load_dotenv()
id_4sq = os.getenv("FOURSQ_ID")
tk_4sq = os.getenv("FOURSQ_SEC")

Next, we load the dataset and check how it was parsed.

In [2]:
candidates = pd.read_csv("temporal_database/candidates_airports_def.csv", index_col=0, encoding = "ISO-8859-1")
candidates.head()

| | level_0 | index | _id | category_code | id_near | name_near | address1 | address2 | city | latitude | ... | num_night | has_daycare | name_daycare | coord_daycare | has_pet | name_pet | coord_pet | has_airport | name_airport | coord_airpott |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 5faf9c3760efce486b0d5450 | network_hosting | 5faf9c3660efce486b0d4e7c | Wetpaint | 615 2nd Avenue Suite 280 | | Seattle | 47.602416 | ... | [20] | [True] | ['Bright Horizons at Fourth and Madison'] | [-122.3336027, 47.6054777] | [True] | ['Doney Memorial Pet Clinic'] | [-122.330604, 47.601537] | [True] | ['Seattle-Tacoma International Airport (SEA) (... | [-122.302508354187, 47.44358853419229] |
| 1 | 1 | 1 | 5faf9c3860efce486b0d6666 | web | 5faf9c3660efce486b0d4e7c | Wetpaint | 615 Second Avenue | Suite 280 | Seattle | 47.602416 | ... | [20] | [True] | ['Bright Horizons at Fourth and Madison'] | [-122.3336027, 47.6054777] | [True] | ['Doney Memorial Pet Clinic'] | [-122.330604, 47.601537] | [True] | ['Seattle-Tacoma International Airport (SEA) (... | [-122.302508354187, 47.44358853419229] |
| 2 | 2 | 2 | 5faf9c3860efce486b0d7075 | software | 5faf9c3660efce486b0d4e7c | Wetpaint | 2200 Alaskan Way | Suite 130 | Seattle | 47.603873 | ... | [20] | [True] | ['Bright Horizons at Fourth and Madison'] | [-122.3336027, 47.6054777] | [True] | ['Doney Memorial Pet Clinic'] | [-122.330604, 47.601537] | [True] | ['Seattle-Tacoma International Airport (SEA) (... | [-122.302508354187, 47.44358853419229] |
| 3 | 3 | 3 | 5faf9c3860efce486b0d694a | software | 5faf9c3660efce486b0d4e7c | Wetpaint | 810 3rd Ave Ste 242 | | Seattle | 47.604266 | ... | [20] | [True] | ['Bright Horizons at Fourth and Madison'] | [-122.3336027, 47.6054777] | [True] | ['Doney Memorial Pet Clinic'] | [-122.330604, 47.601537] | [True] | ['Seattle-Tacoma International Airport (SEA) (... | [-122.302508354187, 47.44358853419229] |
| 4 | 5 | 5 | 5faf9c3760efce486b0d5150 | web | 5faf9c3660efce486b0d4e7c | Wetpaint | 811 First Avenue | Suite 480 | Seattle | 47.603327 | ... | [20] | [True] | ['Bright Horizons at Fourth and Madison'] | [-122.3336027, 47.6054777] | [True] | ['Doney Memorial Pet Clinic'] | [-122.330604, 47.601537] | [True] | ['Seattle-Tacoma International Airport (SEA) (... | [-122.302508354187, 47.44358853419229] |


Several columns were read as strings although they should hold lists, integers, floats or booleans. We fix their types with simple `apply` calls.
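As an alternative to stripping the brackets by hand, the standard-library function `ast.literal_eval` can parse these string cells directly. This is a minimal sketch, not the approach used in this notebook: `parse_cell` and `parse_lonlat` are hypothetical helpers, and we assume a missing coordinate is stored as `None` or `[None]`.

```python
import ast


def parse_cell(raw):
    """Parse a string such as "[20]", "[True]" or "['Apple Seeds']" into a Python value.

    One-element lists are unwrapped to their single element.
    """
    value = ast.literal_eval(raw)
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def parse_lonlat(raw):
    """Parse "[lon, lat]" into [lat, lon] floats (the order folium expects), or None if missing."""
    value = ast.literal_eval(raw)
    # Assumption: a missing coordinate is serialized as "None" or "[None]".
    if value is None or value == [None]:
        return None
    lon, lat = value
    return [float(lat), float(lon)]


print(parse_cell("[20]"), parse_cell("[True]"), parse_cell("['Apple Seeds']"))
print(parse_lonlat("[-122.3336027, 47.6054777]"))
```

Applied with `candidates[col].apply(parse_cell)`, this returns names without their surrounding quotes, and running it a second time on an already converted column raises an error instead of silently removing characters.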

In [3]:
candidates.columns

Index(['level_0', 'index', '_id', 'category_code', 'id_near', 'name_near',
       'address1', 'address2', 'city', 'latitude', 'longitude', 'coords',
       'has_veg', 'name_veg', 'coord_veg', 'num_veg', 'has_basket',
       'name_basket', 'coord_basket', 'has_night', 'name_night', 'coord_night',
       'num_night', 'has_daycare', 'name_daycare', 'coord_daycare', 'has_pet',
       'name_pet', 'coord_pet', 'has_airport', 'name_airport',
       'coord_airpott'],
      dtype='object')

In [4]:
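#NOTE: 'coord_pet' is listed twice, so its brackets are stripped twice. The second pass
#removes the first character of the longitude (the minus sign for western-hemisphere
#locations) and the last character of the latitude; the pet-marker code in the map cell
#below negates the longitude to compensate.
col_unfold = [ 'has_veg', 'name_veg', 'coord_veg', 'num_veg', 'has_basket',
       'name_basket', 'coord_basket', 'has_night', 'name_night', 'coord_night',
       'num_night', 'has_daycare', 'name_daycare', 'coord_daycare', 'has_pet',
       'name_pet', 'coord_pet', 'coord_pet', 'has_airport', 'name_airport',
       'coord_airpott']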
integer_cols = ['num_night','num_veg']
longlat =['latitude', 'longitude']

col_coords = [ 'coord_veg', 'coord_basket', 'coord_night', 'coord_daycare', 'coord_pet', 'coord_airpott']
bolcols = [ 'has_veg', 'has_basket','has_night','has_daycare','has_pet', 'has_airport']

#Strip the surrounding brackets: these cells hold one-element lists (e.g. "[20]")
#or coordinate pairs (e.g. "[-122.33, 47.60]") serialized as strings
for col in col_unfold:
    candidates[col]=candidates[col].apply(lambda x: x[1:-1])

#Converting to integer    
for col in integer_cols:
    candidates[col]=candidates[col].apply(int)

#Converting to float (longitude and latitude)
for col in longlat:
    candidates[col]=candidates[col].apply(float) 

#Converting to boolean
for col in bolcols:
    candidates[col]=candidates[col].apply(lambda x: x == "True")  


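#Parse a "longitude, latitude" string into [latitude, longitude] floats
#(the order folium expects); return None when the coordinate is missing
def change_coords(x):
    xc = re.split(r'(,)', x)  #the capturing group keeps the comma: [lon, ',', lat]
    if not("None" in xc) and not("one" in xc):
        return [float(xc[-1]), float(xc[0])]
    else:
        return None

for col in col_coords:
    candidates[col] = candidates[col].apply(change_coords)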
    
    

## Assigning a score to each location based on how well the conditions are met

We need a quantitative criterion to decide which location best meets the project requirements, so we score each candidate as follows:

- For the airport and the basketball stadium, we only check whether there is one within the search radius (30 km and 10 km, respectively). Each condition that is met adds 1 to the score.

- For the other requirements, we want to know not only whether a matching place exists nearby, but also how close the closest one is, since a closer place is a better option. We score the distance `d` to the closest place with a sigmoid function, `f(d) = 1 / (1 + exp(-c1 * (c2 - d)))` (see the example plot below), which is roughly 1 for nearby places, roughly 0 for distant ones, and exactly 0.5 at `d = c2`. The `c1` parameter controls how sharply the value drops from 1 to 0 around `c2`: larger `c1` values give a steeper function. We use `c1=0.25` for all of them, `c2=250` for child daycare and pet care services, and `c2=100` for vegan restaurants and nightlife spots.

![Sigmoid](https://www.researchgate.net/profile/Tali_Leibovich-Raveh/publication/325868989/figure/fig2/AS:639475206074368@1529474178211/A-Basic-sigmoid-function-with-two-parameters-c1-and-c2-as-commonly-used-for-subitizing.png)

- Finally, for vegan restaurants and nightlife spots we also care about how many of them are nearby. The closest-place score is therefore multiplied by 0.5, and we add a second metric, also capped at 0.5, that measures the variety of places: the number of places found nearby divided by the maximum number requested (10 for vegan restaurants and 20 for nightlife spots), multiplied by 0.5. Together, these two parts add at most 1 to the score and give a useful criterion to decide.
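Putting the three rules above together, the total score $S$ of a candidate can be summarized as follows. This is our reading of the description, not a transcription of `obtain_score`, whose source is not shown here; $d$ is the distance to the closest place of each type, $n$ the number of places returned by the search, and $f_{c_2}$ the sigmoid with the given $c_2$.

```latex
\begin{aligned}
f_{c_2}(d) &= \frac{1}{1 + e^{-c_1 (c_2 - d)}}, \qquad c_1 = 0.25,\\
S &= \mathbf{1}[\text{airport within } 30\ \text{km}] + \mathbf{1}[\text{stadium within } 10\ \text{km}]\\
  &\quad + f_{250}(d_{\text{daycare}}) + f_{250}(d_{\text{pet}})\\
  &\quad + \tfrac{1}{2} f_{100}(d_{\text{veg}}) + \tfrac{1}{2}\,\frac{n_{\text{veg}}}{10}
         + \tfrac{1}{2} f_{100}(d_{\text{night}}) + \tfrac{1}{2}\,\frac{n_{\text{night}}}{20}.
\end{aligned}
```

Each of the six groups of terms contributes at most 1, so $S$ is at most 6, which is consistent with the best candidates scoring just under 6 (e.g. 5.999973).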

We compute this score with `obtain_score` (imported from `functions.analysis`) and store it in a new `score` column of the DataFrame.
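Since the source of `obtain_score` is not shown here, the following is only a hypothetical re-implementation of the scoring rules for a single location, to make the arithmetic concrete. We assume distances are great-circle distances in metres and that a missing place contributes 0. The distance uses a standard-library haversine formula instead of scikit-learn's `haversine_distances` imported at the top (which expects `[latitude, longitude]` pairs in radians and returns distances on the unit sphere). `haversine_m`, `sigmoid` and `location_score` are names of our own.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two [lat, lon] points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def sigmoid(d, c1=0.25, c2=100.0):
    """1 / (1 + exp(-c1 * (c2 - d))), split so math.exp never receives a large positive argument."""
    z = c1 * (c2 - d)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def location_score(has_airport, has_stadium, d_daycare, d_pet,
                   d_veg, n_veg, d_night, n_night):
    """Hypothetical score of one location; distances in metres, None if no place was found."""
    def closeness(d, c2):
        return 0.0 if d is None else sigmoid(d, c1=0.25, c2=c2)

    # Airport and stadium: 1 point each if present within the search radius
    score = float(has_airport) + float(has_stadium)
    # Daycare and pet services: sigmoid of the distance to the closest place
    score += closeness(d_daycare, 250) + closeness(d_pet, 250)
    # Vegan and nightlife: half closeness, half variety (counts capped at the search limits)
    score += 0.5 * closeness(d_veg, 100) + 0.5 * min(n_veg, 10) / 10
    score += 0.5 * closeness(d_night, 100) + 0.5 * min(n_night, 20) / 20
    return score


print(location_score(True, True, 0, 0, 0, 10, 0, 20))
```

With every place right at the office and the maximum number of venues found, the score comes out just below 6, because the sigmoid never quite reaches 1.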

In [5]:
candidates["score"] = obtain_score(candidates)



We sort the candidates by score in descending order and, as an illustration, display the top three.

In [6]:
candidates=candidates.sort_values(by=['score'], ascending=False)
candidates[:3]

| | level_0 | index | _id | category_code | id_near | name_near | address1 | address2 | city | latitude | ... | has_daycare | name_daycare | coord_daycare | has_pet | name_pet | coord_pet | has_airport | name_airport | coord_airpott | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1176 | 4722 | 4722 | 5faf9c3860efce486b0d6620 | software | 5faf9c3760efce486b0d4fc5 | SelectMinds | 246, Fifth Avenue | Ste. 301 | New York | 40.744549 | ... | True | 'Apple Seeds' | [40.743248277718095, -73.98961352390818] | True | 'New York Dog Spa & Hotel' | [40.74370946621388, 73.99053168978655] | True | 'Teterboro Airport (TEB) (Teterboro Airport)' | [40.85568601427126, -74.06060052691976] | 5.999973 |
| 334 | 436 | 436 | 5faf9c3760efce486b0d5c09 | other | 5faf9c3660efce486b0d4e9f | Kyte | 116 Natoma St. | 3rd Floor | San Francisco | 37.787076 | ... | True | 'Kids by the Bay - Financial District' | [37.785471, -122.39803700000002] | True | 'Pawtrero Bathhouse & Feed Co.' | [37.7838, 122.3893422] | True | 'San Francisco International Airport (SFO) (Sa... | [37.6167130000997, -122.38709449768066] | 5.999427 |
| 1179 | 4726 | 4726 | 5faf9c3860efce486b0d6fec | ecommerce | 5faf9c3760efce486b0d4fc5 | SelectMinds | 244 5th Avenue | Suite 2LP | New York | 40.744618 | ... | True | 'Apple Seeds' | [40.743248277718095, -73.98961352390818] | True | 'New York Dog Spa & Hotel' | [40.74370946621388, 73.99053168978655] | True | 'Teterboro Airport (TEB) (Teterboro Airport)' | [40.85568601427126, -74.06060052691976] | 5.998199 |


Next, we plot the chosen location for our gaming company (red marker) together with the nearby places that satisfy each requirement.

In [7]:
#Storing the coordinates of the chosen candidate (the top-scored row above).
#lat and long stay one-element Series because the following cells use lat.values[0]
chosen = candidates[candidates["_id"] == "5faf9c3860efce486b0d6620"]
long = chosen["longitude"]
lat = chosen["latitude"]
row = chosen.iloc[0]

#Creating the map centered on the chosen coordinates
m = Map(location=[lat.iloc[0], long.iloc[0]], zoom_start=15)

#Creating a marker for the chosen location and placing it on the map
ic = Icon(color="red", prefix="fa", icon="gamepad")
selected = Marker(location=[lat.iloc[0], long.iloc[0]], tooltip="CHOSEN LOCATION",
                  popup="Generic Gaming Company S.L.", icon=ic)
selected.add_to(m)

#Creating a marker for the neighbouring tech company (queried once instead of three times)
near = obtain_closer_data(client, "5faf9c3760efce486b0d4fc5")[0]
ic = Icon(color="blue", prefix="fa", icon="gear")
mrk = Marker(location=[near['offices']['latitude'], near['offices']['longitude']],
             tooltip="Neighbour Successful Tech Company", popup=near['name'], icon=ic)
mrk.add_to(m)

#Creating a marker for the closest airport, basketball stadium, child daycare
#and pet service. Each coord_* cell holds [latitude, longitude] after change_coords.
markers = [("coord_airpott", "name_airport", "Airport", "gray", "plane"),
           ("coord_basket", "name_basket", "Basketball Stadium", "orange", "dribbble"),
           ("coord_daycare", "name_daycare", "Child Daycare", "pink", "user"),
           ("coord_pet", "name_pet", "Pet Services", "purple", "paw")]
for coord_col, name_col, tooltip, color, icon in markers:
    lat_t, long_t = row[coord_col]
    if coord_col == "coord_pet":
        #'coord_pet' is listed twice in col_unfold, so its brackets were stripped twice,
        #which also removed the minus sign of the longitude; negate it back
        #(only valid for negative, i.e. western-hemisphere, longitudes)
        long_t = -long_t
    ic = Icon(color=color, prefix="fa", icon=icon)
    mrk = Marker(location=[lat_t, long_t], tooltip=tooltip, popup=row[name_col], icon=ic)
    mrk.add_to(m)




#For the vegan restaurants and nightlife spots we make an additional API request to get
#the names and coordinates of all nearby places, which we did not store earlier to avoid
#creating a very large DataFrame.
url = 'https://api.foursquare.com/v2/venues/explore'

def add_venue_markers(category_id, radius, limit, tooltip, color, icon):
    """Query Foursquare for venues of one category around the chosen location
    and add one marker per venue to the map m."""
    params = dict(client_id=id_4sq, client_secret=tk_4sq, v='20201114',
                  ll=str(lat.values[0]) + "," + str(long.values[0]), limit=limit,
                  categoryId=category_id, radius=radius)
    resp = requests.get(url=url, params=params)
    resp.raise_for_status()
    for item in resp.json()["response"]["groups"][0]["items"]:
        venue = item["venue"]
        ic = Icon(color=color, prefix="fa", icon=icon)
        mrk = Marker(location=[venue["location"]["lat"], venue["location"]["lng"]],
                     tooltip=tooltip, popup=venue["name"], icon=ic)
        mrk.add_to(m)

#Nightlife spots: up to 20 within a 2000 m radius
add_venue_markers("4d4b7105d754a06376d81259", radius=2000, limit=20,
                  tooltip="Nightlife", color="black", icon="beer")

#Vegan restaurants: up to 10 within a 500 m radius
add_venue_markers("4bf58dd8d48988d1d3941735", radius=500, limit=10,
                  tooltip="Vegan Spots", color="green", icon="leaf")
#Shows the map    
m

Finally, out of curiosity, we create a second map showing the top 25% of candidates with their scores, which lets us compare the best locations around the globe.
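The "top 25%" rule keeps candidates whose score is strictly greater than the 75th percentile (`describe()["75%"]`). A small standard-library sketch of the same rule, with a hypothetical `top_quartile` helper, shows one consequence: when many candidates tie at the highest score, the percentile equals that score and the strict comparison can leave fewer than 25% of them (possibly none) on the map.

```python
import statistics


def top_quartile(scores):
    """Return the scores strictly above the 75th percentile.

    method="inclusive" interpolates linearly between order statistics, the same
    rule pandas uses for describe()["75%"] and Series.quantile(0.75).
    """
    q3 = statistics.quantiles(scores, n=4, method="inclusive")[2]
    return [s for s in scores if s > q3]


print(top_quartile([1, 2, 3, 4, 5]))  # 75th percentile is 4.0, so only 5 is kept
print(top_quartile([1, 6, 6, 6]))     # 75th percentile is 6.0, so nothing is kept
```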

In [8]:
#Creates a new map zoomed out to show the whole globe
m2 = Map(location=[lat.iloc[0], long.iloc[0]], zoom_start=2)

#Places a marker for each candidate whose score is in the top 25%
#(strictly above the 75th percentile, computed once outside the loop)
threshold = candidates["score"].quantile(0.75)
for _, cand in candidates[candidates["score"] > threshold].iterrows():
    name_t = "Score: " + str(cand["score"])
    ic = Icon(color="red", prefix="fa", icon="rebel")
    mrk = Marker(location=[cand["latitude"], cand["longitude"]], tooltip=name_t, popup=name_t, icon=ic)
    mrk.add_to(m2)

#Shows the map
m2