# Capstone Project - Final
### Weeks 4 & 5

## Table of contents
* [Introduction: Business Problem](#business_problem)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Business Problem <a name="business_problem"></a>

The goal of this project is to find a set of optimal locations for tourist accommodation (hotel, motel, inn, bed & breakfast, etc.) in Ljubljana, the capital of Slovenia.

Ljubljana has been a very popular tourist destination in recent years and hosts more tourists each year. Accommodation capacity needs to grow accordingly.

The following criteria will be considered when searching for optimal accommodation locations:
* No other accommodation in the vicinity.
* Proximity to the city center (where most of the city's attractions are).
* Walking distance to various food venues.
* Walking distance to bus stops.

This information is of interest to investors and stakeholders who are considering new accommodation in Ljubljana. Location is one of the most important variables in this decision. Other factors that affect the decision, such as real estate prices, are not considered in this project.

## Data <a name="data"></a>

The data source will be Foursquare:
* Existing accommodation locations in Ljubljana
* Food venues within walking distance of every accommodation location
* Bus stops within walking distance of every accommodation location

Locations of administrative regions (neighbourhoods/boroughs/districts) could be used as originating points for Foursquare data. However, those regions have irregular shapes and very different sizes, which can skew the data pulled from Foursquare.
To avoid this, a square grid is placed over the city of Ljubljana, centered at the most popular tourist spot, "Ljubljansko tromostovje".

Each grid unit/area will be used as a bounding box for retrieving venues from Foursquare using the search API. In addition, the explore API will be used to get venue information within a certain radius.
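The grid idea can be illustrated without a geodesic library. The following is a minimal sketch rather than the notebook's own implementation (which relies on pyproj): it assumes a local flat-Earth approximation and a mean Earth radius, and the names `offset_point` and `bounding_box_grid` are hypothetical. It returns one (SW corner, NE corner) pair per grid cell, the form a bounding-box search needs.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumption of this sketch)

def offset_point(lat, lon, north_m, east_m):
    """Shift (lat, lon) by the given metres using a local flat-Earth approximation."""
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lat + dlat, lon + dlon)

def bounding_box_grid(center_lat, center_lon, grid_size_m, grid_unit_m):
    """Return (sw, ne) corner pairs for a square grid centred on the given point."""
    cells_per_side = math.floor(grid_size_m / grid_unit_m)
    half = grid_size_m / 2
    boxes = []
    for row in range(cells_per_side):        # south to north
        for col in range(cells_per_side):    # west to east
            sw = offset_point(center_lat, center_lon,
                              -half + row * grid_unit_m,
                              -half + col * grid_unit_m)
            ne = offset_point(center_lat, center_lon,
                              -half + (row + 1) * grid_unit_m,
                              -half + (col + 1) * grid_unit_m)
            boxes.append((sw, ne))
    return boxes

# A 3000 m grid with 1000 m cells around the approximate city-center coordinates
boxes = bounding_box_grid(46.0514, 14.5061, 3000, 1000)
print(len(boxes))  # 9 cells (3 x 3)
```

Over a few kilometres the approximation error is small, but a geodesic computation is the safer choice for larger grids.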

Nominatim (via geopy) is used to geocode the city center point.

In [4]:
# imports

# uncomment to install with conda or pip
# !pip install geopy
# !conda install -c conda-forge geopy --yes 

from geopy.geocoders import Nominatim 
from geopy.extra.rate_limiter import RateLimiter

# !conda install -c conda-forge folium --yes
# !pip install folium
import folium

import requests # library to handle requests
import pandas as pd
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize)
import numpy as np
import math
import time

# !conda install -c conda-forge pyproj --yes 
import pyproj

In [20]:
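# foursquare credentials
# Read from environment variables so the secrets are not stored in the notebook.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are a convention assumed here.
import os
foursquare_client_id = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
foursquare_client_secret = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
foursquare_version = '20180605' # Foursquare API version
foursquare_limit = 100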

### Accommodation locations

In [5]:
# Get the Ljubljana tourist center point.

lj_center_address = 'Prešernov trg, Ljubljana, Slovenia'

geolocator = Nominatim(user_agent="LJ_explorer")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=5)

location = geocode(lj_center_address)
lj_center_lat = location.latitude
lj_center_lon = location.longitude
lj_center_coordinates=(lj_center_lat,lj_center_lon)

print('Coordinates of {} = {}'.format(lj_center_address, lj_center_coordinates))

Coordinates of Prešernov trg, Ljubljana, Slovenia = (46.05140755, 14.506095911950972)


In [17]:
# Helper methods for calculating various geo coordinates in WGS84 space 

geod = pyproj.Geod(ellps='WGS84')

# calculate the geo point which is distance away in the direction of the fwd_azimuth
def calculate_geo_point(start_lat, start_lon, fwd_azimuth, distance):
    end_lon, end_lat, back_azimuth = geod.fwd(start_lon,start_lat,fwd_azimuth,distance)
    return (end_lat, end_lon)

# calculates geo coordinates of corners of a square. Start_lat and start_lon assumes SW corner.
def calculate_square_corners(start_lat, start_lon, distance):
    corners = [] # clockwise - sw, nw, ne, se
    corners.append((start_lat, start_lon))
    corners.append(calculate_geo_point(start_lat, start_lon, 0, distance))
    corners.append(calculate_geo_point(start_lat, start_lon, 45, math.sqrt(2)*distance))
    corners.append(calculate_geo_point(start_lat, start_lon, 90, distance))
    return corners

# calculate the geodesic distance (in meters) between 2 geo coordinates
def calculate_distance(start_lat, start_lon, end_lat, end_lon):
    fwd_azimuth, back_azimuth, distance = geod.inv(start_lon, start_lat, end_lon, end_lat)
    return distance

In [45]:
# Define the grid - a square defined by the coordinates of its corners (sw, nw, ne, se).
# Each grid unit/area is also a square defined by the coordinates of its corners (sw, nw, ne, se).
grid_size = 3000 # meters
grid_unit = 1000 # meters 


def generate_grid(start_lat, start_lon, grid_size, grid_unit):
    # Get the grid's SW corner starting from the lj center coordinates
    grid_sw_corner = calculate_geo_point(start_lat, start_lon, 225, grid_size*math.sqrt(2)/2)

    # Calculate the rest of the grid's corners
    grid_corners = calculate_square_corners(grid_sw_corner[0], grid_sw_corner[1], grid_size)

    # generate the grid
    # start at the SW corner of the grid and make your way up 
    grid=[]

    row_origin = grid_sw_corner 
    for i in range(0, math.floor(grid_size/grid_unit)):
        col_origin = row_origin # remember the origin
        for j in range(0, math.floor(grid_size/grid_unit)):
            square = calculate_square_corners(col_origin[0], col_origin[1], grid_unit)
            grid.append(square)
            col_origin = square[3] # new SW corner is the SE corner of the previous square
    
        row_origin = calculate_geo_point(row_origin[0], row_origin[1], 0, grid_unit)
    
    return grid



In [50]:
grid = generate_grid(lj_center_lat, lj_center_lon, grid_size, grid_unit)
len(grid)

9

In [51]:
# display the grid on the map of Ljubljana together with the center point
map_lj= folium.Map(location=lj_center_coordinates, zoom_start=12)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=1).add_to(map_lj)

map_lj

In [72]:
# helper functions for pulling the data from Foursquare

# Foursquare category ids
hotel_id = "4bf58dd8d48988d1fa931735"
bus_id = "4bf58dd8d48988d1fe931735"
food_id = "4d4b7105d754a06374d81259"

radius = 250 # meters

# requires category_id, coordinates of sw corner, coordinates of ne corner
def search_venues_in_area(sw_lat, sw_lon, ne_lat, ne_lon, category_id, client_id, client_secret, version, limit):
    
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&intent=browse&sw={},{}&ne={},{}&categoryId={}&limit={}'.format(
        client_id, 
        client_secret, 
        version,
        sw_lat,
        sw_lon,
        ne_lat,
        ne_lon,
        category_id,    
        limit)
    
    try:
        r = requests.get(url)
        r.raise_for_status()
        return r.json()['response']['venues']
        
    except requests.exceptions.HTTPError as err:
        print(err.response.text)
        return None
    
    
def explore_venues_in_radius(lat, lon, category_id, radius, client_id, client_secret, version, limit):
    
    # create the API request URL for the explore endpoint, whose response
    # nests the results under response -> groups -> items
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, 
        client_secret, 
        version,
        lat,
        lon,
        category_id,
        radius,
        limit)
    
    try:
        r = requests.get(url)
        r.raise_for_status()
        return r.json()['response']['groups'][0]['items'] 
        
    except requests.exceptions.HTTPError as err:
        print(err.response.text)
        return None

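# return the name of the first category of a venue (search or explore result row)
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # explore results are flattened with a 'venue.' prefix
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']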
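The request URLs above are assembled with string formatting, which leaves escaping of parameter values to the caller. As a hedged alternative, the sketch below builds the same search query with the standard library's `urlencode`. The helper name `build_search_url` and the `MY_ID` / `MY_SECRET` placeholders are hypothetical; the parameter names are the ones already used in `search_venues_in_area`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

def build_search_url(sw, ne, category_id, client_id, client_secret, version, limit):
    """Build a bounding-box search URL; sw and ne are (lat, lon) tuples."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "intent": "browse",
        "sw": "{},{}".format(*sw),
        "ne": "{},{}".format(*ne),
        "categoryId": category_id,
        "limit": limit,
    }
    # urlencode percent-escapes the values (e.g. the comma inside sw/ne)
    return SEARCH_ENDPOINT + "?" + urlencode(params)

url = build_search_url((46.03, 14.48), (46.04, 14.49),
                       "4bf58dd8d48988d1fa931735",  # hotel category id from the notebook
                       "MY_ID", "MY_SECRET",        # placeholder credentials
                       "20180605", 100)

# Round-trip the query string to confirm the parameters survive encoding
query = parse_qs(urlsplit(url).query)
print(query["sw"], query["ne"])
```

`requests.get` also accepts a `params` dictionary and performs the same encoding, so the dictionary could be passed to it directly instead of building the URL by hand.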



In [44]:
print(explore_venues_in_radius(46.00049034,
                     14.47754566,
                     food_id,   
                     250,
                     foursquare_client_id,
                     foursquare_client_secret,
                     foursquare_version,
                     foursquare_limit
                    ))

[{'reasons': {'count': 0, 'items': [{'summary': 'This spot is popular', 'type': 'general', 'reasonName': 'globalInteractionReason'}]}, 'venue': {'id': '514f18f9e4b05d925e8996d1', 'name': 'Posestvo Trnulja', 'location': {'address': 'Črna vas 265', 'lat': 46.000529466645624, 'lng': 14.477544229647345, 'labeledLatLngs': [{'label': 'display', 'lat': 46.000529466645624, 'lng': 14.477544229647345}], 'distance': 4, 'postalCode': '1000', 'cc': 'SI', 'city': 'Ljubljana', 'state': 'Ljubljana', 'country': 'Slovenija', 'formattedAddress': ['Črna vas 265', '1000 Ljubljana', 'Slovenija']}, 'categories': [{'id': '4bf58dd8d48988d1c4941735', 'name': 'Restaurant', 'pluralName': 'Restaurants', 'shortName': 'Restaurant', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_', 'suffix': '.png'}, 'primary': True}], 'photos': {'count': 0, 'groups': []}}, 'referralId': 'e-0-514f18f9e4b05d925e8996d1-0'}]
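The explore response wraps each venue in an item with `reasons`, `venue` and `referralId` keys, so the fields of interest sit one level deeper than in search results. The sketch below shows one way to flatten such an item into a flat record. The sample is an abbreviated copy of the output above, and `flatten_explore_item` is a hypothetical helper, not part of the notebook.

```python
# Abbreviated copy of one item from the explore output shown above
sample_item = {
    "venue": {
        "id": "514f18f9e4b05d925e8996d1",
        "name": "Posestvo Trnulja",
        "location": {"lat": 46.000529466645624, "lng": 14.477544229647345, "distance": 4},
        "categories": [{"id": "4bf58dd8d48988d1c4941735", "name": "Restaurant", "primary": True}],
    },
    "referralId": "e-0-514f18f9e4b05d925e8996d1-0",
}

def flatten_explore_item(item):
    """Extract id, name, first category name and coordinates from an explore item."""
    venue = item["venue"]
    categories = venue.get("categories", [])
    return {
        "Id": venue["id"],
        "Name": venue["name"],
        "Category": categories[0]["name"] if categories else None,
        "Latitude": venue["location"]["lat"],
        "Longitude": venue["location"]["lng"],
    }

row = flatten_explore_item(sample_item)
print(row["Name"], "-", row["Category"])  # Posestvo Trnulja - Restaurant
```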


In [None]:
# Get venues from the grid for each of the given categories

def get_venues_in_grid(categories, grid):
    frames = []
    columns = ['Id','Name', 'Category', 'Latitude', 'Longitude']
    print("Processing areas: ",end="")
    
    for area in grid:
        
        for category_id in categories:
            time.sleep(1)
            print(".",end="")
            venues_in_area = search_venues_in_area(area[0][0],  # sw corner
                                                   area[0][1],
                                                   area[2][0],  # ne corner
                                                   area[2][1],
                                                   category_id,
                                                   foursquare_client_id,
                                                   foursquare_client_secret,
                                                   foursquare_version,
                                                   foursquare_limit)
    
            # Check if something went wrong; stop processing this area
            if venues_in_area is None:
                print("Search not working...")
                break
        
            # if nothing is found skip to the next category
            if len(venues_in_area) == 0:
                continue
    
            # clean up and generate a pandas frame, add to the frames list
            venues_in_area = json_normalize(venues_in_area)
            filtered_columns = ['id', 'name', 'categories', 'location.lat', 'location.lng']
            venues_in_area = venues_in_area.loc[:, filtered_columns]
            venues_in_area['categories'] = venues_in_area.apply(get_category_type, axis=1)
            venues_in_area.columns = columns
            frames.append(venues_in_area)

    # combine the per-area results into one frame (empty if nothing was found)
    if len(frames) == 0:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)



In [65]:
# Get all the accommodation locations in the grid

frames = []
print("Processing areas for hotels: ", end="")
for area in grid:
    time.sleep(1)
    print(".",end="")
    
    hotels_in_area = search_venues_in_area(area[0][0],
                                           area[0][1],
                                           area[2][0],
                                           area[2][1],
                                           hotel_id,
                                           foursquare_client_id,
                                           foursquare_client_secret,
                                           foursquare_version,
                                           foursquare_limit)
    
    # Check if something went wrong
    if hotels_in_area == None:
        print("Search not working...")
        break
        
    # if nothing is found skip to the next area
    if len(hotels_in_area) == 0:
        continue
    
    # clean up and generate a panda frame, add to the frames list
    hotels_in_area = json_normalize(hotels_in_area)
    filtered_columns = ['id', 'name', 'categories', 'location.lat', 'location.lng']
    hotels_in_area = hotels_in_area.loc[:, filtered_columns]
    hotels_in_area['categories'] = hotels_in_area.apply(get_category_type, axis=1)
    hotels_in_area.columns = ['Venue id','Accomodation name', 'Accomodation category', 'Latitude', 'Longitude']
    frames.append(hotels_in_area)






Processing areas for hotels: .........

In [66]:
# Create a pandas frame

if (len(frames) > 0):
    # combine results into one frame
    lj_hotels = pd.concat(frames, ignore_index=True)

    # drop every venue whose name occurs more than once (keep=False removes
    # all copies, not just the extra ones) and reset the index
    lj_hotels.drop_duplicates(subset ="Accomodation name", 
                     keep = False, inplace = True)
    lj_hotels.reset_index(drop=True, inplace=True)

    # add a column for Distance to city center
    lj_hotels['Distance to center'] = lj_hotels.apply(lambda x: 
                                        calculate_distance(
                                            lj_center_lat, lj_center_lon, 
                                            x["Latitude"], x["Longitude"]), axis=1)
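Because neighbouring grid cells share edges, a venue on a boundary can be returned by two searches. How `drop_duplicates` treats such repeats depends on its `keep` argument, and the difference is easy to overlook. The toy frame below uses made-up rows, not data from Foursquare, to contrast `keep=False` with `keep="first"`.

```python
import pandas as pd

# Hypothetical rows: venue "a1" was returned by two neighbouring grid cells
venues = pd.DataFrame({
    "Venue id": ["a1", "a1", "b2", "c3"],
    "Name": ["Hostel A", "Hostel A", "Hotel B", "Inn C"],
})

# keep=False drops every row that has a duplicate, so "a1" disappears entirely
dropped_all = venues.drop_duplicates(subset="Venue id", keep=False)

# keep="first" retains one copy of each duplicated venue
kept_first = venues.drop_duplicates(subset="Venue id", keep="first")

print(len(dropped_all), len(kept_first))  # 2 3
```

If the intent is to count each venue once, `keep="first"` on the venue id is likely the closer match. That choice would change the row counts reported in this notebook, so it is offered here only as an observation.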
    

In [67]:
print(len(lj_hotels))
lj_hotels.head()


117


| | Venue id | Accomodation name | Accomodation category | Latitude | Longitude | Distance to center |
|---|---|---|---|---|---|---|
| 0 | 58fc805e9ec3990cbc1e73b5 | Vila Teslova | Hostel | 46.043716 | 14.493887 | 1274.286829 |
| 1 | 518953c4498e522e3fd47ece | Student 2011 | Hostel | 46.039986 | 14.489419 | 1810.478659 |
| 2 | 4e35a8f518a82fdd6578c724 | SimbolHostel | Hostel | 46.039995 | 14.489365 | 1812.738622 |
| 3 | 5a5d2cdaf193c07e2a004520 | Isabella rooms | Hotel | 46.043865 | 14.491685 | 1395.310354 |
| 4 | 4dc531b17d8b14fb46232f79 | MartaStudio | Bed & Breakfast | 46.045287 | 14.488353 | 1532.507071 |
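The "Distance to center" values come from a WGS84 geodesic computation. As an independent sanity check, the sketch below recomputes the first row with the haversine formula on a sphere. The mean Earth radius and the function name `haversine_m` are assumptions of this sketch. A spherical model differs slightly from the ellipsoid, so agreement to within about half a percent is the most that should be expected.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371008.8):
    """Great-circle distance in metres on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

center = (46.05140755, 14.506095911950972)  # Prešernov trg, as geocoded above
vila_teslova = (46.043716, 14.493887)       # first row of the table

d = haversine_m(*center, *vila_teslova)
print("haversine: {:.1f} m, table: 1274.3 m".format(d))
```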


### Bus stops

In [73]:
frames = []
print("Processing areas for bus stops: ", end="")
for area in grid:
    time.sleep(1)
    print(".",end="")
    
    busstops_in_area = search_venues_in_area(area[0][0],
                                           area[0][1],
                                           area[2][0],
                                           area[2][1],
                                           bus_id,
                                           foursquare_client_id,
                                           foursquare_client_secret,
                                           foursquare_version,
                                           foursquare_limit)
    
    # Check if something went wrong
    if busstops_in_area == None:
        print("Search not working...")
        break
        
    # if nothing is found skip to the next area
    if len(busstops_in_area) == 0:
        continue
    
    # clean up and generate a panda frame, add to the frames list
    busstops_in_area = json_normalize(busstops_in_area)
    filtered_columns = ['id', 'name', 'location.lat', 'location.lng']
    busstops_in_area = busstops_in_area.loc[:, filtered_columns]
    busstops_in_area.columns = ['Venue id', 'Bus stop name', 'Latitude', 'Longitude']
    frames.append(busstops_in_area)







Processing areas for bus stops: .........

In [74]:
# Create a pandas frame

if (len(frames) > 0):
    # combine results into one frame
    lj_busstops = pd.concat(frames, ignore_index=True)

    # drop every venue whose name occurs more than once (keep=False removes
    # all copies, not just the extra ones) and reset the index
    lj_busstops.drop_duplicates(subset ="Bus stop name", 
                     keep = False, inplace = True)
    lj_busstops.reset_index(drop=True, inplace=True)

    # add a column for Distance to city center
    lj_busstops['Distance to center'] = lj_busstops.apply(lambda x: 
                                        calculate_distance(
                                            lj_center_lat, lj_center_lon, 
                                            x["Latitude"], x["Longitude"]), axis=1)
    

In [75]:
print(len(lj_busstops))
lj_busstops.head()

67


| | Venue id | Bus stop name | Latitude | Longitude | Distance to center |
|---|---|---|---|---|---|
| 0 | 4dea153388774880e311c63b | LPP postajališče Aškerčeva | 46.046732 | 14.498875 | 1274.286829 |
| 1 | 4f391a5fe4b08dd06b0c736e | LPP postajališče Hajdrihova | 46.045728 | 14.490274 | 1810.478659 |
| 2 | 4f36c342e4b0a67feb88924f | LPP postajališče Gerbičeva | 46.040892 | 14.489848 | 1812.738622 |
| 3 | 50815364e4b061669612eac3 | LPP postajalisce Jamova | 46.04435 | 14.487646 | 1395.310354 |
| 4 | 4ee8eb13aa1f29ac6316e83a | LPP postajališče Jadranska | 46.043651 | 14.488481 | 1532.507071 |


### Food venues

In [83]:
frames = []
print("Processing areas for food venues: ", end="")
for area in grid:
    time.sleep(1)
    print(".",end="")
    
    food_venues_in_area = search_venues_in_area(area[0][0],
                                           area[0][1],
                                           area[2][0],
                                           area[2][1],
                                           food_id,
                                           foursquare_client_id,
                                           foursquare_client_secret,
                                           foursquare_version,
                                           foursquare_limit)
    
    # Check if something went wrong
    if food_venues_in_area == None:
        print("Search not working...")
        break
        
    # if nothing is found skip to the next area
    if len(food_venues_in_area) == 0:
        continue
    
    # clean up and generate a panda frame, add to the frames list
    food_venues_in_area = json_normalize(food_venues_in_area)
    filtered_columns = ['id', 'name', 'categories', 'location.lat', 'location.lng']
    food_venues_in_area = food_venues_in_area.loc[:, filtered_columns]
    food_venues_in_area['categories'] = food_venues_in_area.apply(get_category_type, axis=1)
    food_venues_in_area.columns = ['Venue id', 'Food venue name', 'Category', 'Latitude', 'Longitude']
    frames.append(food_venues_in_area)





Processing areas for food venues: .........

In [84]:
# Create a pandas frame

if (len(frames) > 0):
    # combine results into one frame
    lj_food_venues = pd.concat(frames, ignore_index=True)

    # drop every venue whose name occurs more than once (keep=False removes
    # all copies, not just the extra ones) and reset the index
    lj_food_venues.drop_duplicates(subset ="Food venue name", 
                     keep = False, inplace = True)
    lj_food_venues.reset_index(drop=True, inplace=True)

    # add a column for Distance to city center
    lj_food_venues['Distance to center'] = lj_food_venues.apply(lambda x: 
                                        calculate_distance(
                                            lj_center_lat, lj_center_lon, 
                                            x["Latitude"], x["Longitude"]), axis=1)

In [85]:
print(len(lj_food_venues))
lj_food_venues.head()

323


| | Venue id | Food venue name | Category | Latitude | Longitude | Distance to center |
|---|---|---|---|---|---|---|
| 0 | 4cd01d266200b1f7ed3dd228 | Kavarna Largo | Café | 46.038113 | 14.499389 | 1566.264952 |
| 1 | 4c2cc1c0d1a10f4718eef964 | Hombre | Mexican Restaurant | 46.039564 | 14.487031 | 1977.501338 |
| 2 | 594d2e7010345b445d78c1b7 | Mafija | Coffee Shop | 46.042248 | 14.490469 | 1580.950366 |
| 3 | 4b8d458af964a52022f132e3 | Volta cafe | Pizza Place | 46.045641 | 14.489822 | 1413.188119 |
| 4 | 5b59a1bcc530930037cf370d | Don’t tell mama | Restaurant | 46.037939 | 14.49905 | 1593.299373 |


In [92]:
# Visualize accommodations (red), bus stops (blue) and food venues (green) on a map
# together with the grid and the center point
map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=0.5).add_to(map_lj)
    
for lat, lon, name in zip(lj_hotels['Latitude'], lj_hotels['Longitude'], lj_hotels['Accomodation name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_lj)  

for lat, lon, name in zip(lj_busstops['Latitude'], lj_busstops['Longitude'], lj_busstops['Bus stop name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_lj)  
    
for lat, lon, name in zip(lj_food_venues['Latitude'], lj_food_venues['Longitude'], lj_food_venues['Food venue name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_lj)  
map_lj

In [97]:
from folium import plugins
from folium.plugins import HeatMap

# materialize the zip iterator so the coordinates can be reused if the cell is re-run
lj_hotels_coordinates = list(zip(lj_hotels['Latitude'], lj_hotels['Longitude']))


map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
#folium.TileLayer('cartodbpositron').add_to(map_lj)
HeatMap(lj_hotels_coordinates).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=0.5).add_to(map_lj)

map_lj

In [99]:
# materialize the zip iterator so the coordinates can be reused if the cell is re-run
lj_food_venue_coordinates = list(zip(lj_food_venues['Latitude'], lj_food_venues['Longitude']))


map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
#folium.TileLayer('cartodbpositron').add_to(map_lj)
HeatMap(lj_food_venue_coordinates).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=0.5).add_to(map_lj)

map_lj
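Heat maps show density visually. A per-cell count gives the same information as numbers that can be compared across venue types. The sketch below is one possible way to do this and is not part of the original analysis: `count_points_per_cell` is a hypothetical helper and the two bounding boxes are made up for illustration. The three points are hotel coordinates from the table earlier in the notebook.

```python
def count_points_per_cell(boxes, points):
    """Count how many (lat, lon) points fall inside each (sw, ne) bounding box."""
    counts = [0] * len(boxes)
    for lat, lon in points:
        for index, ((sw_lat, sw_lon), (ne_lat, ne_lon)) in enumerate(boxes):
            # half-open intervals so a point on a shared edge is counted once
            if sw_lat <= lat < ne_lat and sw_lon <= lon < ne_lon:
                counts[index] += 1
                break
    return counts

# Two hypothetical cells stacked south to north
boxes = [((46.03, 14.48), (46.04, 14.49)),
         ((46.04, 14.48), (46.05, 14.49))]

# Student 2011, SimbolHostel and MartaStudio from the hotels table
points = [(46.039986, 14.489419), (46.039995, 14.489365), (46.045287, 14.488353)]

counts = count_points_per_cell(boxes, points)
print(counts)  # [2, 1]
```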