# Capstone Project - Final
### Weeks 4 & 5

## Table of contents
* [Introduction: Business Problem](#business_problem)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Business Problem <a name="business_problem"></a>

The city of Ljubljana, capital of Slovenia, has been a very popular tourist destination in the recent  years and it hosts more tourists each year. Also more and more tourists decide to stay in Ljubljana for more than 1 day.  Thus the accommodation offering (hotels, motels, inns, bed & breakfasts etc. ) needs to grow with the demand.  

One of the most important factors when looking for a location for tourist accommodation is location. The goal of this project is to identify parts of the city of Ljubljana  that could be candidates for location for tourist acommodation. 

The report is targeting investors/stakeholders/contractors that are interested in creating new accommodation offerings in Ljubljana. The results should be interesting for either small sized (butique) acommodations as well as higher capacity accommodation.  

There are other factors that impact the decision which are not considered in this project, like realestate prices. 

## Data <a name="data"></a>

The following assumptions will be considered when trying to find a potential location of tourist accommodation:
* No other tourist accommodation venue in vicinity. 
* Gravitating towards the city center. Most of Ljubljana's attractions are in the city center. 
* Walking distance away to existing food & drink venues. We will not distinct specifically on the type of food venue. 
* Walking distance to the public transportation grid (bus stations/stops).

The main  datasource will be Foursquare database for the following information:
* Location of existing accommodation venues (any type). 
* Location of existing food venues (any type).
* Bus station/stop locations. 

Transformations between addresses and geographic locations will be done using the Nomatim API. 

Locations of administrative regions (neighbourhoods/boroughs/districts) could be used as source points for Foursquare requests. However those regions have incorrect shapes and very different sizes which can skew the data. To avoid search gaps it was decided to place a square grid over the city of Ljubljana, spanning several kilometers and centered at the most popular tourist spot, "Ljubljansko tromostovje".
Each grid unit/area will be used as a bounding box for retrieving venues from Foursquare using the search API.


## Methodology

The goal is to finds part of the city of Ljubljana that have low accomodation density. At the same time those areas need to have close (walking distance) access to the public transportation grid and also several food venues. 

In the first steps we collect the venue information from Foursquare from an area of several kilometers. To ease the data collection we will create a square grid and each grid unit will represent the bounds for Foursquare search. 
Each venue (hotel, bus station, food) has a location and category.

Next section will explore the collected data using visualization aids.

In the final section we will use k-means clustering approach to find blobs of location that meet the predefined requirements:
* no other acommodation object in vicinity
* at least one bus station within the walking distance
* at least 5 food places wihthin the walking distance

## Preparation

In [2]:
# imports

# uncomment to install with conda or pip, below is just an example
# !pip install geopy
# !conda install -c conda-forge geopy --yes 

from geopy.geocoders import Nominatim 
from geopy.extra.rate_limiter import RateLimiter

import folium

import requests 
import pandas as pd
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import numpy as np
import math
import time

import pyproj

In [3]:
# foursquare credentials
foursquare_client_id = 'EXENEWRKIY0ZQPXVOZCD3RWEXTNNQC1113GGBXYXJVHERV0J' # your Foursquare ID
foursquare_client_secret = 'YHAQNOL1EMK0BHONYERC4J52J4ECBPAIQOKWSULTVWZZUQKM' # your Foursquare Secret
foursquare_version = '20180605' # Foursquare API version
foursquare_limit = 100

In [94]:
# Get the Ljubljana tourist center point.

lj_center_address = 'Prešernov trg, Ljubljana, Slovenia'

geolocator = Nominatim(user_agent="LJ_explorer")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=5)

location = geocode(lj_center_address)
lj_center_lat = location.latitude
lj_center_lon = location.longitude
lj_center_coordinates=(lj_center_lat,lj_center_lon)

print('Coordinates of {} = {}'.format(lj_center_address, lj_center_coordinates))

Coordinates of Prešernov trg, Ljubljana, Slovenia = (46.05140755, 14.506095911950972)


In [95]:
# Helper methods for calculating various geo coordinates in WGS84 space 

geod = pyproj.Geod(ellps='WGS84')

# calculate the geo point which is distance away in the direction of the fwd_azimuth
def calculate_geo_point(start_lat, start_lon, fwd_azimuth, distance):
    end_lon, end_lat, back_azimuth = geod.fwd(start_lon,start_lat,fwd_azimuth,distance)
    return (end_lat, end_lon)

# calculates geo coordinates of corners of a square. Start_lat and start_lon assumes SW corner.
def calculate_square_corners(start_lat, start_lon, square_size):
    corners = [] # clock wise - sw, nw, ne, sw
    corners.append((start_lat, start_lon))
    corners.append(calculate_geo_point(start_lat, start_lon, 0, square_size))
    corners.append(calculate_geo_point(start_lat, start_lon, 45, math.sqrt(2)*square_size))
    corners.append(calculate_geo_point(start_lat, start_lon, 90, square_size))
    return corners

# calculate center coordinates of a square, assumes SW corner
def calculate_square_center(start_lat, start_lon, square_size):
    return calculate_geo_point(start_lat, start_lon, 45, math.sqrt(2)*square_size/2 )

# calculate distance between 2 geo coordinates 
def calculate_distance(start_lat, start_lon, end_lat, end_lon):
    a1, a1, distance = geod.inv(start_lon, start_lat, end_lon, end_lat)
    return distance

In [104]:
# Define the grid = a square defined by coordinates of the sw corner
# Each grid unit/area is a square defined by coordinates of it's corners (sw, nw, ne, se).


def generate_grid(start_lat, start_lon, grid_size, grid_unit):
   
    
    # Calculate the rest of the grid's corners
    #grid_corners = calculate_square_corners(start_lat, start_lon, grid_size)

    # generate the grid
    # start at the SW corner of the grid and make your way up 
    grid=[]
    
    row_origin = (start_lat, start_lon) 
    for i in range(0, math.floor(grid_size/grid_unit)):
        col_origin = row_origin # remember the origin
        for j in range(0, math.floor(grid_size/grid_unit)):
            square = calculate_square_corners(col_origin[0], col_origin[1], grid_unit)
            grid.append(square)
            col_origin = square[3] # new SW corner is the SE corner of the previous square
    
        row_origin = calculate_geo_point(row_origin[0], row_origin[1], 0, grid_unit)
    
    return grid



In [105]:
grid_size = 5000 # meters
grid_unit = 1000 # meters 

# Get the grid's SW corner starting from the Ljubljana center defined by start_lat, start_lon
grid_sw_corner = calculate_geo_point(lj_center_lat, lj_center_lon, 225, grid_size*math.sqrt(2)/2)

grid = generate_grid(grid_sw_corner[0], grid_sw_corner[1], grid_size, grid_unit)
#grid = generate_grid(lj_center_lat, lj_center_lon, grid_size, grid_unit)
print("Number of grid units (areas):", len(grid))

Number of grid units (areas): 25


In [106]:
# Display the grid on the map of Ljubljana together with the center point
map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=1).add_to(map_lj)

map_lj

## Data collection

In [10]:
# helper functions for pulling the data from Foursquare

radius = 250 # meters

# requires category_id, coordinates of sw corner, coordinates of ne corner
def search_venues_in_area(sw_lat, sw_lon, ne_lat, ne_lon, category_id, client_id, client_secret, version, limit):
    
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&intent=browse&sw={},{}&ne={},{}&categoryId={}&limit={}'.format(
        client_id, 
        client_secret, 
        version,
        sw_lat,
        sw_lon,
        ne_lat,
        ne_lon,
        category_id,    
        limit)
    
    try:
        r = requests.get(url)
        r.raise_for_status()
        return r.json()['response']['venues']
        
    except requests.exceptions.HTTPError as err:
        print(err.response.text)
        return None
    
    
def search_venues_in_radius(lat, lon, category_id, radius, client_id, client_secret, version, limit):
    
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&intent=browse&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, 
        client_secret, 
        version,
        lat,
        lon,
        category_id,
        radius,
        limit)
    
    try:
        r = requests.get(url)
        r.raise_for_status()
        return r.json()['response']['groups'][0]['items'] 
        
    except requests.exceptions.HTTPError as err:
        print(err.response.text)
        return None

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']



In [11]:
# Get venues based from grid, limit by category ids

def get_venues_in_grid(categories, grid):
    frames = []
    print("Processing areas: ",end="")
    counter = 0
    for area in grid:
        
        for category_id in categories:
            time.sleep(1)
            
            venues_in_area = search_venues_in_area(area[0][0],
                                                   area[0][1],
                                                   area[2][0],
                                                   area[2][1],
                                                   category_id,
                                                   foursquare_client_id,
                                                   foursquare_client_secret,
                                                   foursquare_version,
                                                   foursquare_limit)
    
            # Check if something went wrong
            if venues_in_area == None:
                print("Search not working...")
                break
        
            # if nothing is found skip to the next area
            if len(venues_in_area) == 0:
                print(".",end="")
                continue
    
            # clean up and generate a panda frame, add to the frames list
            print("+",end="")
            venues_in_area = json_normalize(venues_in_area)
            filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
            venues_in_area = venues_in_area.loc[:, filtered_columns]
            venues_in_area['categories'] = venues_in_area.apply(get_category_type, axis=1)
            venues_in_area.columns = ['Name', 'Category', 'Latitude', 'Longitude']
            venues_in_area['Area'] = counter
            
            frames.append(venues_in_area)
            ###
        
        counter = counter + 1
        
    # combine results into one frame
    venues = pd.concat(frames, ignore_index=True)

    # remove duplicates (based on venue name) and reset the index
    venues.drop_duplicates(subset ="Name", keep = False, inplace = True)
    venues.reset_index(drop=True, inplace=True)
    
    # add a column for Distance to city center
    venues['Distance to center'] = venues.apply(lambda x: 
                                        calculate_distance(
                                            lj_center_lat, lj_center_lon, 
                                            x["Latitude"], x["Longitude"]), axis=1)
    
    return venues


### Accommodation locations

In [12]:
# Get all the accomodation locations in the grid
hotel_id = ["4bf58dd8d48988d1fa931735"]

hotels = get_venues_in_grid(hotel_id, grid)

Processing areas: ++...+++..+++++.+++++++..

In [16]:
print("The number of acoomodation locations in the grid is:", len(hotels))
hotels.tail(50)

The number of acoomodation locations in the grid is: 144


Unnamed: 0,Name,Category,Latitude,Longitude,Area,Distance to center
94,Bed And Breakfast Sincere 1830,Bed & Breakfast,46.052081,14.511585,12,431.380707
95,Holiday Inn Zuerich Airport,Hotel,46.052318,14.50322,12,244.496013
96,Djokic Apartments,Hotel,46.050192,14.505089,12,156.02855
97,Alibi M14 Hostel,Hostel,46.053416,14.506713,12,228.245805
98,Munda House,Hostel,46.047733,14.500629,12,588.080041
99,Apartment Under The Castle,Hostel,46.04901,14.511397,12,489.222452
100,Hotel Park,Hotel,46.053503,14.51457,13,695.97317
101,DIC hostel,Hostel,46.048119,14.518575,13,1032.678269
102,Hotel Meksiko,Hotel,46.05427,14.519736,13,1102.506571
103,Apartment Center 25,Vacation Rental,46.05534,14.514252,13,767.76158


### Bus stations/stops locations

In [17]:
# Get all the bus stations in the grid
bus_station_id = ["4bf58dd8d48988d1fe931735", "52f2ab2ebcbc57f1066b8b4f"]

bus_stations = get_venues_in_grid(bus_station_id, grid)


Processing areas: +.+.+.+++.+.+.+.+...+.+.+++.+...+.+++++.+.+.+++.+.

In [18]:
bus_stations.head()

Unnamed: 0,Name,Category,Latitude,Longitude,Area,Distance to center
0,Lpp Mestni Log,Bus Station,46.037838,14.482951,0,2341.832498
1,LPP postajališče Koprska,Bus Station,46.035342,14.484233,0,2460.142953
2,LPP postajališče Krimska,Bus Station,46.036959,14.490026,1,2031.352294
3,Lpp postajalisce Veliki Stradon,Bus Station,46.03719,14.504613,2,1584.426856
4,LPP postajališče Livada,Bus Station,46.034501,14.517548,3,2077.807156


### Food venues locations

In [19]:
# Get all the food venues in the grid

food_id = ["4d4b7105d754a06374d81259"]

food_venues = get_venues_in_grid(food_id, grid)



Processing areas: +++++++++.+++++++++++++++

In [20]:

food_venues.head()
food_venues.describe()

Unnamed: 0,Latitude,Longitude,Area,Distance to center
count,525.0,525.0,525.0,525.0
mean,46.054238,14.50616,13.56,1422.102558
std,0.010215,0.014248,6.205969,760.275537
min,46.029031,14.474364,0.0,78.579787
25%,46.04692,14.496592,10.0,741.165675
50%,46.053615,14.507094,13.0,1357.685572
75%,46.062863,14.515011,18.0,2045.486541
max,46.073877,14.53823,24.0,3500.635938


In [21]:

# Check numbers of each area
hotels_by_area = hotels.groupby('Area').count()
hotels_by_area

Unnamed: 0_level_0,Name,Category,Latitude,Longitude,Distance to center
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,1,1,1,1,1
1,2,2,2,2,2
5,2,2,2,2,2
6,7,7,7,7,7
7,23,23,23,23,23
10,6,6,6,6,6
11,9,9,9,9,9
12,50,50,50,50,50
13,11,11,11,11,11
14,1,1,1,1,1


In [20]:
bus_stations_by_area = bus_stations.groupby('Area').count()
bus_stations_by_area

Unnamed: 0_level_0,Name,Category,Latitude,Longitude,Distance to center
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,2,2,2,2,2
1,1,1,1,1,1
2,1,1,1,1,1
3,3,3,3,3,3
4,2,2,2,2,2
5,6,6,6,6,6
6,6,6,6,6,6
7,6,6,6,6,6
8,2,2,2,2,2
10,1,1,1,1,1


In [21]:
food_venues_by_area = food_venues.groupby('Area').count()
food_venues_by_area

Unnamed: 0_level_0,Name,Category,Latitude,Longitude,Distance to center
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,14,14,14,14,14
1,11,11,11,11,11
2,3,3,3,3,3
3,5,5,5,5,5
4,3,3,3,3,3
5,22,22,22,22,22
6,17,17,17,17,17
7,49,49,49,49,49
8,7,7,7,7,7
10,11,11,11,11,11


In [22]:
print("Total number of hotels:", len(hotels))
print("Total number of bus stations:", len(bus_stations))
print("Total number of food venues:", len(food_venues))

Total number of hotels: 144
Total number of bus stations: 119
Total number of food venues: 525


## Analysis

In [152]:
# Visualize accomodation venues, bus stations and food venues on a map

map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)
folium.TileLayer('cartodbpositron').add_to(map_lj)
for square in grid:
    folium.Rectangle(square, weight=0.5).add_to(map_lj)
    
for lat, lon, name in zip(hotels['Latitude'], hotels['Longitude'], hotels['Name']):
   
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.3,
        parse_html=False).add_to(map_lj)  

    
for lat, lon, name in zip(food_venues['Latitude'], food_venues['Longitude'], food_venues['Name']):
   
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.3,
        parse_html=False).add_to(map_lj)  
    
for lat, lon, name in zip(bus_stations['Latitude'], bus_stations['Longitude'], bus_stations['Name']):
   
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        color='yellow',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.3,
        parse_html=False).add_to(map_lj)  
map_lj

In [51]:
# Heatmap of hotel density 

from folium import plugins
from folium.plugins import HeatMap

hotels_coordinates = zip(hotels['Latitude'], hotels['Longitude'])

map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13.6)
#folium.TileLayer('cartodbpositron').add_to(map_lj)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)
HeatMap(hotels_coordinates, radius = 20).add_to(map_lj)

for lat, lon in zip(hotels['Latitude'], hotels['Longitude']):
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.3,
        parse_html=False).add_to(map_lj)  
    
for square in grid:
    folium.Rectangle(square, weight=0.7).add_to(map_lj)
    


map_lj

Here we see that the existing accommodation venues group mostly around the city center in the 500m radius. This is expected. There are also clusters of stretched along the main roads around and inbound to Ljubljana.

There are many pockets of possible locations. 

Now we have all the acommodation venues, bus stations and food venues in scope of few kilometers from hte tourist center of Ljubljana. 

## Analysis

In [None]:
walking_distance = 250 # meters

In [35]:
# For each hotel get the number of other hotels in walking distance and nearest hotel

def calculate_number_of_nearby_hotels(row):
    count = 0
    for lat, lon in zip(hotels['Latitude'], hotels['Longitude']):
        dist = calculate_distance(row["Latitude"], row["Longitude"], lat, lon)
        if dist > 0 and dist < walking_distance:
            count = count + 1
    return count
###          
    
def calculate_distance_to_nearest_hotel(row):
    min_distance = 100000
    for lat, lon in zip(hotels['Latitude'], hotels['Longitude']):
        dist = calculate_distance(row["Latitude"], row["Longitude"], lat, lon)
        
        if dist > 0 and dist < min_distance:
            min_distance = dist
    return min_distance
###

hotels['Number of nearby hotels'] = hotels.apply(lambda x: calculate_number_of_nearby_hotels(x), axis=1)
hotels['Distance to nearest hotel'] = hotels.apply(lambda x: calculate_distance_to_nearest_hotel(x), axis=1) 




In [112]:
hotels.head()

Unnamed: 0,Name,Category,Latitude,Longitude,Area,Distance to center,Number of nearby hotels,Nearest hotel,Distance to nearest hotel,Number of nearby bus stations,Distance to nearest bus station,Number of nearby food venues,Distance to nearest food venue
0,Swiss Diamond Hotel,Hotel,46.036586,14.475757,0,2868.589601,0,690.208078,690.208078,2,99.628467,2,109.168645
1,AHOTEL Hotel Ljubljana,Hotel,46.030272,14.491635,1,2602.324674,0,575.107919,575.107919,4,175.917993,1,220.112977
2,Murgle Luxury Apartments,Hotel,46.035213,14.489431,1,2214.516878,0,530.529562,530.529562,2,171.214743,5,111.930061
3,The Vault Hotel Ljubljana,Hotel,46.044758,14.476707,5,2391.698089,0,502.633889,502.633889,0,772.679326,0,453.010161
4,Sport Hotel,Hotel,46.038589,14.484196,5,2214.345151,0,429.568496,429.568496,0,443.903589,2,91.053225


In [37]:
# For each hotel get the number of bus stations in walking distance and nearest bus station

def calculate_number_of_nearby_bus_stations(row):
    count = 0
    for lat, lon in zip(bus_stations['Latitude'], hotels['Longitude']):
        dist = calculate_distance(row["Latitude"], row["Longitude"], lat, lon)
        if dist > 0 and dist < walking_distance:
            count = count + 1
    return count
###

def calculate_distance_to_nearest_bus_station(row):
    min_distance = 100000
    for lat, lon in zip(bus_stations['Latitude'], hotels['Longitude']):
        dist = calculate_distance(row["Latitude"], row["Longitude"], lat, lon)
        
        if dist > 0 and dist < min_distance:
            min_distance = dist
    return min_distance
###

hotels['Number of nearby bus stations'] = hotels.apply(lambda x: calculate_number_of_nearby_bus_stations(x), axis=1)          
hotels['Distance to nearest bus station'] = hotels.apply(lambda x: calculate_distance_to_nearest_bus_station(x), axis=1) 




In [38]:
hotels.head()

Unnamed: 0,Name,Category,Latitude,Longitude,Area,Distance to center,Number of nearby hotels,Nearest hotel,Distance to nearest hotel,Number of nearby bus stations,Distance to nearest bus station
0,Swiss Diamond Hotel,Hotel,46.036586,14.475757,0,2868.589601,0,690.208078,690.208078,2,99.628467
1,AHOTEL Hotel Ljubljana,Hotel,46.030272,14.491635,1,2602.324674,0,575.107919,575.107919,4,175.917993
2,Murgle Luxury Apartments,Hotel,46.035213,14.489431,1,2214.516878,0,530.529562,530.529562,2,171.214743
3,The Vault Hotel Ljubljana,Hotel,46.044758,14.476707,5,2391.698089,0,502.633889,502.633889,0,772.679326
4,Sport Hotel,Hotel,46.038589,14.484196,5,2214.345151,0,429.568496,429.568496,0,443.903589
5,Vila Teslova,Hostel,46.043716,14.493887,6,1274.286829,2,171.242334,171.242334,2,130.510568
6,Student 2011,Hostel,46.039986,14.489419,6,1810.478659,1,4.314432,4.314432,0,336.464163
7,SimbolHostel,Hostel,46.039995,14.489365,6,1812.738622,1,4.314432,4.314432,0,337.540977
8,Isabella rooms,Hotel,46.043865,14.491685,6,1395.310354,3,107.785414,107.785414,2,55.369751
9,Hotel Sest Pik - Hostel At Six Dots,Hostel,46.04476,14.492221,6,1303.512516,3,107.785414,107.785414,2,26.803157


In [39]:
# For each hotel get the number of food venues in walking distance and nearest food venue
def calculate_number_of_nearby_food_venues(row):
    count = 0
    for lat, lon in zip(food_venues['Latitude'], hotels['Longitude']):
        dist = calculate_distance(row["Latitude"], row["Longitude"], lat, lon)
        if dist > 0 and dist < walking_distance:
            count = count + 1
    return count
###         
    
def calculate_distance_to_nearest_food_venue(row):
    min_distance = 100000
    for lat, lon in zip(food_venues['Latitude'], hotels['Longitude']):
        dist = calculate_distance(row["Latitude"], row["Longitude"], lat, lon)
        
        if dist > 0 and dist < min_distance:
            min_distance = dist
    return min_distance
###

hotels['Number of nearby food venues'] = hotels.apply(lambda x: calculate_number_of_nearby_food_venues(x), axis=1) 
hotels['Distance to nearest food venue'] = hotels.apply(lambda x: calculate_distance_to_nearest_food_venue(x), axis=1) 



In [57]:
hotels.describe()

Unnamed: 0,Latitude,Longitude,Area,Distance to center,Number of nearby hotels,Nearest hotel,Distance to nearest hotel,Number of nearby bus stations,Distance to nearest bus station,Number of nearby food venues,Distance to nearest food venue
count,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0
mean,46.051843,14.503039,12.027778,1001.483114,6.847222,118.42538,118.42538,3.229167,223.085426,3.944444,402.201448
std,0.008258,0.010507,4.558776,743.197345,5.966479,137.868372,137.868372,2.871901,259.182267,6.940081,528.64926
min,46.030272,14.475757,0.0,104.745374,0.0,1.542595,1.542595,0.0,0.889218,0.0,6.70546
25%,46.046927,14.49591,10.0,464.668797,2.0,21.229032,21.229032,0.0,67.947508,0.0,104.054054
50%,46.051115,14.504764,12.0,718.695752,5.0,85.180597,85.180597,2.0,120.605529,1.0,208.215993
75%,46.055376,14.507487,13.0,1458.612621,11.0,172.729308,172.729308,6.0,267.314624,3.25,394.463049
max,46.07371,14.537657,22.0,2951.348727,21.0,690.208078,690.208078,9.0,1377.89533,28.0,2343.812025


In [58]:
print('Number of hotels without a bus station within the walkign distance', len(hotels[hotels['Number of nearby bus stations'] == 0]))
print('Number of hotels without a food venue within the walking distance', len(hotels[hotels['Number of nearby food venues'] == 0]))

Number of hotels without a bus station within the walkign distance 37
Number of hotels without a food venue within the walkign distance 56


We see that most hotels group together, average distance to nearest hotel is 118 meters. This makes sense especially in the city center. 
Good news is that most hotels have a bus station within the walking distance. However there are 37 hotels that do not have a bus station within the walking distance.

A hotel has on average around 4 food venues within a walking distance. More suprising is that 57 hotels do not have a food venue that is within the walking distance, and that the average distance to a food venue is about 400 meters. Max distance is about 2.3 km. 

In [110]:

#square_center = calculate_square_center(grid[0][0][0], grid[0][0][1], 50)
#subgrid = generate_grid(grid[0][0][0], grid[0][0][1], grid_unit, 50)
#subgrid = generate_grid(lj_center_lat, lj_center_lon, grid_size, grid_unit)
#print("Number of grid units (areas):", len(subgrid))

subgrid_unit = 100 # meters

# for each area create a fine grid that and calculate square centers. This will represent the potential hotel locations.
location_candidates = []
for area in grid:
    subgrid = generate_grid(area[0][0], area[0][1], grid_unit, subgrid_unit)
    for subarea in subgrid:
        subarea_center = calculate_square_center(subarea[0][0], subarea[0][1], subgrid_unit)
        location_candidates.append(subarea_center)


In [123]:
# create a Dataframe with location

#df_locations = pd.DataFrame({'Latitude':[],'Longitude':[]})
df_locations = pd.DataFrame(location_candidates, columns = ['Latitude', 'Longitude'])
#print(len(df_locations))
df_locations.head()

Unnamed: 0,Latitude,Longitude
0,46.029361,14.474452
1,46.029361,14.475743
2,46.029361,14.477035
3,46.029361,14.478326
4,46.029361,14.479618


In [124]:
df_locations['Number of nearby hotels'] = df_locations.apply(lambda x: calculate_number_of_nearby_hotels(x), axis=1)
df_locations['Distance to nearest hotel'] = df_locations.apply(lambda x: calculate_distance_to_nearest_hotel(x), axis=1) 



In [125]:
df_locations['Number of nearby bus stations'] = df_locations.apply(lambda x: calculate_number_of_nearby_bus_stations(x), axis=1)          
df_locations['Distance to nearest bus station'] = df_locations.apply(lambda x: calculate_distance_to_nearest_bus_station(x), axis=1) 



In [126]:

df_locations['Number of nearby food venues'] = df_locations.apply(lambda x: calculate_number_of_nearby_food_venues(x), axis=1) 
df_locations['Distance to nearest food venue'] = df_locations.apply(lambda x: calculate_distance_to_nearest_food_venue(x), axis=1) 




In [142]:
print(len(df_locations))
df_locations.head()

2500


Unnamed: 0,Latitude,Longitude,Number of nearby hotels,Distance to nearest hotel,Number of nearby bus stations,Distance to nearest bus station,Number of nearby food venues,Distance to nearest food venue
0,46.029361,14.474452,0,809.358304,0,887.582742,0,706.0351
1,46.029361,14.475743,0,803.029279,0,868.68484,0,615.782972
2,46.029361,14.477035,0,809.103362,0,796.077041,0,529.046813
3,46.029361,14.478326,0,827.307409,0,729.973822,0,447.874028
4,46.029361,14.479618,0,856.868669,0,672.296608,0,375.886307


In [149]:
# Filter out those location that do not have a bus station within a walking distance
# Filter out those locations that do not have a food venue within a walking distance
# Filter out those locations that have a hotel within 400m

loc1 = df_locations[df_locations['Distance to nearest bus station'] < 400]
loc2 = loc1[loc1['Distance to nearest food venue'] < 400]
loc3 = loc2[loc2['Distance to nearest hotel'] > 200]
loc3.reset_index(drop=True, inplace=True)

In [150]:

print(len(loc3))
loc3.head()

322


Unnamed: 0,Latitude,Longitude,Number of nearby hotels,Distance to nearest hotel,Number of nearby bus stations,Distance to nearest bus station,Number of nearby food venues,Distance to nearest food venue
0,46.029361,14.484784,0,539.951272,0,371.021369,0,322.607415
1,46.029361,14.486076,0,442.124251,0,277.005318,0,261.406755
2,46.030261,14.484785,0,530.358816,0,354.728863,1,239.42966
3,46.030261,14.486076,0,430.359975,0,254.775224,0,292.672916
4,46.03116,14.483493,0,638.029445,0,375.253262,1,99.389757


In [151]:
# Display the grid on the map of Ljubljana together with the center point
map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=1).add_to(map_lj)

for lat, lon in zip(loc3['Latitude'], loc3['Longitude']):
    folium.CircleMarker(
        [lat, lon],
        radius=1,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.3,
        parse_html=False).add_to(map_lj)  
    
map_lj

In [157]:
from sklearn.cluster import KMeans

number_of_clusters = 10

good_coordinates = loc3[['Latitude', 'Longitude']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_coordinates)

cluster_centers = kmeans.cluster_centers_
print(cluster_centers)



[[46.04366015 14.49804506]
 [46.05461252 14.4888708 ]
 [46.04204448 14.5131632 ]
 [46.03289958 14.48512938]
 [46.05248926 14.49474798]
 [46.03379443 14.49444134]
 [46.03768157 14.50584004]
 [46.03736585 14.47809514]
 [46.05445181 14.48252806]
 [46.04089975 14.48950411]]


In [160]:
# Display the grid on the map of Ljubljana together with the center point
map_lj= folium.Map(location=lj_center_coordinates, zoom_start=13)
folium.Marker(lj_center_coordinates, popup=lj_center_address).add_to(map_lj)

for square in grid:
    folium.Rectangle(square, weight=1).add_to(map_lj)

for lat, lon in zip(loc3['Latitude'], loc3['Longitude']):
    folium.CircleMarker(
        [lat, lon],
        radius=1,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.3,
        parse_html=False).add_to(map_lj)  

    
    

for lat, lon in cluster_centers:
    folium.Circle([lat, lon], 
                  radius=400,
                  color='green', 
                  fill=True, 
                  fill_opacity=0.25).add_to(map_lj) 
    
map_lj