### Finding the best location for a new vegetarian restaurant in the city of Helsinki, Finland

#### Introduction/Business Problem

In recent years the vegetarian diet has grown hugely in popularity in Finland, especially in the capital, Helsinki. Many new vegetarian restaurants were launched during the 2010s, in addition to a bunch of veg restaurants that have been around for years. Coming into the 2020s, the popularity of healthy and environmentally friendly restaurant options is growing faster than ever before. For this reason I have chosen to find out which location would be the best for a new vegetarian restaurant in Helsinki.

To solve the problem we can utilize Foursquare location data for the neighborhoods of Helsinki and cluster the venues into meaningful segments. Our target audience is any restaurant business owner who is about to launch a new vegetarian eatery in Helsinki in the near future. We will provide an analysis of the current locations of vegetarian restaurants and answer the question of where the competition is favorable for a new business.

To answer the main question we have to find out where most of Helsinki's current vegetarian restaurants are located. As the city is rather small, we will avoid the most popular area, given that the competition there is already too high. Instead we will use basic knowledge of the city's structure. We know that the "heart" of Helsinki is around the main railway station, and we don't want to go too far from that point when creating a new business. We will look for the optimum location for a new restaurant near that area, keeping in mind that we should avoid the most competitive area.

We will also consider as an optimum location a place that has many restaurants and other venues but does not yet have any vegetarian ones. In addition, there has recently been a huge debate about too many new shopping centers in the Helsinki area, which has led to restaurant and other business owners complaining that there are no longer enough customers for their businesses. This applies especially to the new shopping centers, so we will take that information into account when choosing the best location. Even so, we will also carefully study those locations and their surroundings.

I don't have any background in running a restaurant business, so it is likely that as the project proceeds I will come across additional information to use in finding the best location.

#### Data

In this project I will mainly utilize location information from Wikipedia and Foursquare. From Wikipedia I will get the subdivision of Helsinki, and with that I will create a dataframe including all the neighborhoods and their subareas, called quarters. In addition I will add the latitude and longitude of each location to the dataframe, which I will obtain with Geopy.

From Foursquare I will request the venues for all the locations in the dataframe. With the venues listed, I will start analysing the information by visualizing the venues on a map and creating new dataframes that capture the favorable features (many venues close together, etc.). The outcome will be my data for analysing the current restaurant locations in Helsinki. I will cluster that data into veg and non-veg restaurants, and after that I will see whether further clustering is needed for veg restaurants only.

In addition, we define the main railway station (latitude: 60.1698, longitude: 24.9382) as the "heart" of Helsinki and therefore as the base for our study.
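To make "not too far from the base" measurable, the distance from the station to any coordinate pair can be computed with the haversine formula. The sketch below is only an illustration: the function name `haversine_km` and the example point are my own assumptions, and only the station coordinates come from the text.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# Base location from the text above.
STATION = (60.1698, 24.9382)

# Example point (hypothetical, a little south-west of the station).
example = (60.1650, 24.9300)
print(round(haversine_km(*STATION, *example), 2), "km")
```

The same function could later be applied row by row to a dataframe of neighborhood coordinates to rank locations by distance from the station.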


#### First we import the libraries we need for creating the Helsinki locations dataframe.

In [0]:
#!pip install lxml
import random
import requests
import numpy as np
import pandas as pd
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
from IPython.display import HTML, Image
from pandas import json_normalize  # top-level import, available in pandas >= 1.0
#!conda install -c conda-forge folium=0.5.0 --yes
import folium


#### Here we read the table with Pandas and create the first version of the dataframe. It also contains the Swedish names of the locations. We don't need those, so we drop that column.

In [0]:
url = "https://fi.wikipedia.org/wiki/Helsingin_alueellinen_jako"
tables = pd.read_html(url)
hel_df = tables[1]

print(hel_df.columns)

hel_df.columns = hel_df.columns.droplevel()

hel_df = hel_df.drop(columns=["Ruotsiksi"])

hel_df.head()

MultiIndex([('Kaupunginosajako', 'Kaupunginosa Osa-alue'),
            ('Kaupunginosajako',             'Ruotsiksi')],
           )


Unnamed: 0,Kaupunginosa Osa-alue
0,01 Kruununhaka
1,02 Kluuvi
2,03 Kaartinkaupunki
3,04 Kamppi
4,05 Punavuori


#### Now we notice that each location starts with digits. Let's remove them. We also translate the column name.

In [0]:
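# regex=True is required in pandas >= 2.0, where str.replace treats the pattern literally by default
hel_df['Kaupunginosa Osa-alue'] = hel_df['Kaupunginosa Osa-alue'].str.replace(r'\d+', '', regex=True)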
hel_df.rename(columns={"Kaupunginosa Osa-alue":"Neighborhood"}, inplace=True)
hel_df.head(5)

Unnamed: 0,Neighborhood
0,Kruununhaka
1,Kluuvi
2,Kaartinkaupunki
3,Kamppi
4,Punavuori


#### After inspecting the locations, we decide that we need only locations that have no subareas. From locations that have subareas we need only the subareas. So we create a loop that fixes the dataframe for us. In addition, a few locations have multiple parts in their names. For those we add exception rules to the loop so that they come out correct. At the end we also remove duplicates and spaces from the location names.

In [0]:
for n in range(3):
    df_len = hel_df.shape[0]
    for u in range(df_len):
        list = hel_df.loc[u].tolist()
        list = str(list[0])
        list = list.split()
        if len(list) > 1:
            df_len += len(list)
            hel_df = hel_df.drop([u])
            list = list[1:]
            for u in list:
                if u == "Vanha" or u == "yritysalue" or u == "Malmin" or \
                u == "lentokenttä" or u == "Nordsjön" or u == "kartano" \
                or u == "Viikin" or u == "tiedepuisto":
                    continue
                if u == "Herttoniemen":
                    u = "Herttoniemen yritysalue"
                    hel_df = hel_df.append({"Neighborhood" : u} , ignore_index=True)
                if u == "Pikku":
                    u = "Pikku Huopalahti"
                    hel_df = hel_df.append({"Neighborhood" : u} , ignore_index=True)
                else:
                    hel_df = hel_df.append({"Neighborhood" : u} , ignore_index=True)       
print(hel_df.shape)
hel_df.drop_duplicates(subset ="Neighborhood", keep = False, inplace = True) 
print(hel_df.shape)

hel_df['Neighborhood'] = hel_df['Neighborhood'].str.replace(' ', '')
hel_df = hel_df.reset_index(drop=True)
hel_df.head()

(145, 1)
(143, 1)
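As an aside, the loop above appends "Herttoniemen yritysalue" once in the first `if` and again in the final `else`, and `drop_duplicates(keep=False)` then removes both copies, which would explain the two dropped rows. A more compact way to do the same split could look like the sketch below. It is only an illustration: the sample rows, the `MULTI_WORD` list and the helper `split_subareas` are assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows in the shape produced by the digit removal above:
# a neighborhood name, optionally followed by its subareas.
raw = pd.DataFrame({"Neighborhood": [
    " Kruununhaka",
    " Herttoniemi Länsi-Herttoniemi Roihuvuori Herttoniemen yritysalue Herttoniemenranta",
]})

# Multi-word names that must survive the whitespace split.
MULTI_WORD = ["Herttoniemen yritysalue", "Pikku Huopalahti"]

def split_subareas(text):
    """Return the subareas of a row, or the name itself if it has none."""
    for name in MULTI_WORD:
        text = text.replace(name, name.replace(" ", "_"))  # protect inner spaces
    parts = text.split()
    if len(parts) > 1:
        parts = parts[1:]  # drop the parent neighborhood, keep its subareas
    return [p.replace("_", " ") for p in parts]

clean = (
    raw["Neighborhood"]
    .apply(split_subareas)
    .explode()                 # one row per name
    .drop_duplicates()         # keeps the first copy of each name
    .reset_index(drop=True)
    .to_frame()
)
print(clean)
```

Using the default `keep="first"` in `drop_duplicates` means an accidental duplicate costs nothing, whereas `keep=False` removes the name entirely.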


Unnamed: 0,Neighborhood
0,Kruununhaka
1,Kluuvi
2,Kaartinkaupunki
3,Kamppi
4,Punavuori


#### Now we have a clean dataframe of the locations. Let's find the latitudes and longitudes for them. First we request the latitude and longitude with Geopy's Nominatim. Then we add the values to the existing dataframe of locations.

In [0]:
# Create the geocoder once instead of once per row.
geolocator = Nominatim(user_agent="foursquare_agent")

list_latlong = []
for u in range(hel_df.shape[0]):
    # Geocode the neighborhood name as it appears in the dataframe.
    address = str(hel_df.loc[u, 'Neighborhood'])
    location = geolocator.geocode(address)
    list_latlong.append([location.latitude, location.longitude])

print(list_latlong)

[[60.1728702, 24.9547326], [60.1707783, 24.9473293], [60.1652138, 24.9472225], [60.1685348, 24.9304942], [60.1612371, 24.9365046], [60.1561911, 24.9383747], [60.1587146, 24.949404], [60.1669752, 24.9681511], [60.1564647, 24.95526209714076], [60.1740311, 24.9223027], [60.1853441, 24.9167329], [60.1913479, 24.9026643], [60.1940859, 24.9191674], [60.1768926, 24.990193792940012], [60.1961671, 24.9567103], [60.2040009, 24.9581075], [60.2156842, 24.9527864], [60.2187652, 24.9682441], [60.2141157, 24.9791849], [60.2385504, 24.8460646], [60.245247, 24.9896936], [60.1853879, 25.0096847], [60.1914529, 25.0592919], [60.1842356, 25.0769141], [60.1580759, 25.110848], [60.1524761, 25.049372], [60.1457064, 24.9888603], [60.25960365, 25.187008478540385], [60.244024, 25.1574065], [60.2436241, 25.2015583], [60.2510434, 25.2234058], [59.999292800000006, 25.02017736950416], [60.185969, 24.9640802], [60.187506, 24.9766836], [60.180063, 24.9761179], [60.164429, 24.8379575], [60.1883906, 24.9527031], [60.191

In [0]:
lat_list =[]
long_list = []
for u in list_latlong:
    lat_list.append(u[0])
    long_list.append(u[1])

hel_df['Latitude'] = lat_list
hel_df['Longitude'] = long_list

hel_df.head()
    

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Kruununhaka,60.17287,24.954733
1,Kluuvi,60.170778,24.947329
2,Kaartinkaupunki,60.165214,24.947222
3,Kamppi,60.168535,24.930494
4,Punavuori,60.161237,24.936505
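The geocoding loop assumes every name resolves, and it queries the bare name without saying which city is meant. A slightly more defensive variant is sketched below. The helper `geocode_all` and the `", Helsinki, Finland"` suffix are my own assumptions; the commented wiring uses geopy's `RateLimiter`, which spaces out requests to the geocoding service.

```python
def geocode_all(names, geocode, suffix=", Helsinki, Finland"):
    """Return {name: (lat, lon) or None} using any geocode(query) callable."""
    coords = {}
    for name in names:
        location = geocode(name + suffix)
        if location is None:  # geocoders may return None when nothing matches
            coords[name] = None
        else:
            coords[name] = (location.latitude, location.longitude)
    return coords

# With geopy this could be wired up as follows (not run here, needs network access):
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="foursquare_agent")
#   geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
#   coords = geocode_all(hel_df["Neighborhood"], geocode)
```

Names that map to `None` can then be inspected by hand instead of stopping the whole loop with an `AttributeError`.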


#### Now we request the location data from Foursquare. To see that everything works, we first request data for just one location. We limit the results to 150 venues within a radius of 500 meters.

In [0]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # replace with your own Foursquare credentials
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # never publish real credentials in a notebook
VERSION = '20180605'

hel_df.loc[3, 'Neighborhood']

neighborhood_latitude = hel_df.loc[3, 'Latitude']
neighborhood_longitude = hel_df.loc[3, 'Longitude'] 

neighborhood_name = hel_df.loc[3, 'Neighborhood']

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))


Latitude and longitude values of Kamppi are 60.1685348, 24.9304942.


In [0]:
LIMIT = 150
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

In [0]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

nearby_venues.head()
       

85 venues were returned by Foursquare.


  


Unnamed: 0,name,categories,lat,lng
0,Cafe Rouge,Middle Eastern Restaurant,60.168711,24.933027
1,Kaffecentralen,Coffee Shop,60.16758,24.932526
2,Baana,Road,60.169973,24.928837
3,Pobre,Filipino Restaurant,60.1695,24.933484
4,The Ounce,Tea Room,60.167182,24.932993


#### As everything seems to work, next we request data for all the locations.

In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
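The list comprehension above indexes `categories[0]` directly, which raises an `IndexError` for any venue that comes back without a category. A tolerant version of that flattening step might look like this sketch. The helper name `venue_rows` and the sample items are hypothetical; the item layout simply mirrors the keys the function above reads.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hypothetical items in the same shape the function above reads.
sample_items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 60.17, "lng": 24.95},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 60.18, "lng": 24.96},
               "categories": []}},
]
rows = venue_rows("Kruununhaka", 60.17287, 24.954733, sample_items)
print(rows)
```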

#### Here we use the getNearbyVenues function that was provided in IBM's Applied Data Science course materials.

In [0]:
helsinki_venues = getNearbyVenues(names=hel_df['Neighborhood'],
                                   latitudes=hel_df['Latitude'],
                                   longitudes=hel_df['Longitude']
                                  )
helsinki_venues.head()

Kruununhaka
Kluuvi
Kaartinkaupunki
Kamppi
Punavuori
Eira
Ullanlinna
Katajanokka
Kaivopuisto
Etu-Töölö
Taka-Töölö
Meilahti
Laakso
Mustikkamaa–Korkeasaari
Vallila
Kumpula
Käpylä
Koskela
Vanhakaupunki
Konala
Pukinmäki
Kulosaari
Tammisalo
Vartiosaari
Villinki
Santahamina
Suomenlinna
Östersundom
Salmenkallio
Talosaari
Karhusaari
Aluemeri
Vilhonvuori
Kalasatama
Sompasaari
Hanasaari
Harju
Alppila
Ruskeasuo
Ruoholahti
Lapinlahti
Jätkäsaari
Hernesaari
Toukola
Arabianranta
Pirkkola
Maunula
Metsälä
Patola
Veräjämäki
Maunulanpuisto
Veräjälaakso
Munkkiniemi
Kuusisaari
Lehtisaari
Munkkivuori
Niemenmäki
Talinranta
Kannelmäki
Maununneva
Malminkartano
Hakuninmaa
Kuninkaantammi
Honkasuo
Paloheinä
Torpparinmäki
Tuomarinkartano
Haltiala
Ylä-Malmi
Ala-Malmi
Pihlajamäki
Tattariharju
Pihlajisto
Siltamäki
Tapulikaupunki
Töyrynummi
Länsi-Herttoniemi
Roihuvuori
Herttoniemenranta
Vartioharju
Puotila
Puotinharju
Myllypuro
Marjaniemi
Roihupelto
Itäkeskus
Kontula
Vesala
Mellunmäki
Kivikko
Kurkimäki
Yliskylä
Jollas


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kruununhaka,60.17287,24.954733,Cafe LOV,60.171284,24.956623,Café
1,Kruununhaka,60.17287,24.954733,Papu Cafe,60.17304,24.956453,Café
2,Kruununhaka,60.17287,24.954733,Anton & Anton,60.172348,24.956458,Organic Grocery
3,Kruununhaka,60.17287,24.954733,Korea House,60.17291,24.956436,Korean Restaurant
4,Kruununhaka,60.17287,24.954733,Coconut Street,60.173976,24.956452,Vietnamese Restaurant


#### To avoid running out of our daily Foursquare requests, we save the results to a CSV file. Then we use groupby's count function to see how many venues there are in each location.

In [0]:
###helsinki_venues.to_csv("hel_venues_foursquare.csv")
helsinki_venues = pd.read_csv("hel_venues_foursquare.csv")
hel_venues_all = helsinki_venues
helsinki_venues.groupby('Neighborhood').count()

Neighborhood,Unnamed: 0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Ala-Malmi,19,19,19,19,19,19,19
Alppikylä,15,15,15,15,15,15,15
Alppila,19,19,19,19,19,19,19
Arabianranta,16,16,16,16,16,16,16
Aurinkolahti,19,19,19,19,19,19,19
...,...,...,...,...,...,...,...
Viikinranta,16,16,16,16,16,16,16
Vilhonvuori,35,35,35,35,35,35,35
Villinki,4,4,4,4,4,4,4
Yliskylä,3,3,3,3,3,3,3
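The save-then-reload step above is done by commenting a line in and out. One way to make the caching explicit is a small helper that only calls the API when no cached file exists. This is a sketch under assumptions: `load_or_fetch` and `fake_fetch` are hypothetical names, and the demo uses a temporary directory and made-up data rather than a real Foursquare call.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(path, fetch):
    """Read a cached CSV if present; otherwise call fetch() and cache its result."""
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    df.to_csv(path, index=False)  # index=False avoids an extra "Unnamed: 0" column on reload
    return df

# Demo with a stand-in fetch function (hypothetical data, no API call).
calls = []
def fake_fetch():
    calls.append(1)
    return pd.DataFrame({"Neighborhood": ["Kamppi"], "Venue": ["Kaffecentralen"]})

cache_path = os.path.join(tempfile.mkdtemp(), "hel_venues_foursquare.csv")
first = load_or_fetch(cache_path, fake_fetch)   # calls fake_fetch and writes the file
second = load_or_fetch(cache_path, fake_fetch)  # reads the file, no second call
print(len(calls))
```

In the notebook, `fetch` could be a `lambda` wrapping the `getNearbyVenues` call.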


#### Here we use one-hot encoding to get an idea of what kinds of venue categories we have in our locations.

In [0]:
helsinki_onehot = pd.get_dummies(helsinki_venues[['Venue Category']], prefix="", prefix_sep="")

helsinki_onehot['Neighborhood'] = helsinki_venues['Neighborhood']

#fixed_columns = [helsinki_onehot.columns[-1]] + list(helsinki_onehot.columns[:-1])
#helsinki_onehot = helsinki_onehot[fixed_columns]

helsinki_onehot.head()

Unnamed: 0,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Bay,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Bistro,Blini House,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Line,...,Sports Club,Sri Lankan Restaurant,Stables,Stationery Store,Steakhouse,Summer Camp,Supermarket,Sushi Restaurant,Taco Place,Taxi Stand,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Tram Station,Tunnel,Turkish Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
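To see why one-hot encoding is useful here, it helps to look at a toy case: averaging the 0/1 columns per neighborhood gives each category's share of that neighborhood's venues. The data below is a hypothetical miniature of `helsinki_venues`, not real results.

```python
import pandas as pd

# Hypothetical miniature of helsinki_venues.
venues = pd.DataFrame({
    "Neighborhood":   ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Vegetarian / Vegan Restaurant", "Park", "Bar"],
})

# dtype=int gives plain 0/1 columns regardless of the pandas version.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot["Neighborhood"] = venues["Neighborhood"]

# The mean of a 0/1 column per neighborhood is that category's share of the venues.
freq = onehot.groupby("Neighborhood").mean()
print(freq.round(2))
```

Each row of `freq` sums to 1, so the values can be compared directly across neighborhoods with different venue counts.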


#### Now we can start to analyse the data more deeply. First we want to see where the "hot spot" of venues in Helsinki is.

In [0]:
helsinki_venues = helsinki_venues.groupby('Neighborhood').count()

maxValueIndexObj = helsinki_venues.idxmax()
 
print("The area with the highest value of venues:")
print(maxValueIndexObj.iloc[0])  # .iloc makes the positional lookup explicit


The area with the highest value of venues:
Punavuori


In [0]:
print("The number of venues in Punavuori is:")
print(helsinki_venues.loc["Punavuori"].iloc[0])  # .iloc makes the positional lookup explicit
#print(helsinki_venues)

helsinki_venues_org = helsinki_venues

The number of venues in Punavuori is:
100
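The same "busiest neighborhood" question can also be answered without overwriting `helsinki_venues` with its grouped counts. A minimal sketch using `value_counts` on hypothetical rows:

```python
import pandas as pd

# Hypothetical venue rows; only the Neighborhood column matters here.
venues = pd.DataFrame({"Neighborhood": ["Punavuori", "Kamppi", "Punavuori", "Eira", "Punavuori"]})

counts = venues["Neighborhood"].value_counts()  # sorted, largest first
print(counts.idxmax(), counts.max())
```

Because `counts` is a Series indexed by neighborhood name, `counts["Punavuori"]` gives the number for a single area directly.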


#### Now we know that the "hot spot" of venues is Punavuori. Next we look at what kinds of venues it has. All the results can be shown by changing the Pandas display option for rows, but for convenience we leave only the short version of the dataframe printout here.

In [0]:
hel_venues_all = hel_venues_all.set_index(hel_venues_all["Neighborhood"])
print(hel_venues_all.loc["Punavuori"])

              Unnamed: 0 Neighborhood  ...  Venue Longitude           Venue Category
Neighborhood                           ...                                          
Punavuori            266    Punavuori  ...        24.936966                   Bakery
Punavuori            267    Punavuori  ...        24.933676                     Park
Punavuori            268    Punavuori  ...        24.937480              Coffee Shop
Punavuori            269    Punavuori  ...        24.936152              Yoga Studio
Punavuori            270    Punavuori  ...        24.937536                 Beer Bar
...                  ...          ...  ...              ...                      ...
Punavuori            361    Punavuori  ...        24.932382        Indian Restaurant
Punavuori            362    Punavuori  ...        24.933330  Scandinavian Restaurant
Punavuori            363    Punavuori  ...        24.939076     Caucasian Restaurant
Punavuori            364    Punavuori  ...        24.929216      

#### The data reveals that there are a lot of restaurants in Punavuori, but no vegetarian ones, which we can consider good news for us. Keeping that in mind we continue the analysis. Next we look at the frequency of venue categories in each location. We are looking for locations where vegetarian restaurants are frequent enough to appear among the top 5 venue categories.

In [0]:
helsinki_grouped = helsinki_onehot.groupby('Neighborhood').mean().reset_index()


In [0]:
num_top_venues = 5

for hood in helsinki_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = helsinki_grouped[helsinki_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ala-Malmi----
                  venue  freq
0  Gym / Fitness Center  0.11
1  Himalayan Restaurant  0.11
2           Flower Shop  0.05
3  Fast Food Restaurant  0.05
4              Pharmacy  0.05


----Alppikylä----
         venue  freq
0  Supermarket  0.27
1     Bus Stop  0.27
2     Pharmacy  0.07
3        Hotel  0.07
4        Plaza  0.07


----Alppila----
                          venue  freq
0  Theme Park Ride / Attraction  0.32
1                          Park  0.11
2                           Gym  0.05
3                    Theme Park  0.05
4                 Track Stadium  0.05


----Arabianranta----
                    venue  freq
0            Tram Station  0.12
1                    Park  0.12
2  Furniture / Home Store  0.12
3           Shopping Mall  0.06
4       Recreation Center  0.06


----Aurinkolahti----
             venue  freq
0            Beach  0.11
1  Harbor / Marina  0.11
2    Grocery Store  0.11
3             Café  0.05
4         Beer Bar  0.05


----Eira----
       

#### We found out that Kaartinkaupunki and Munkkiniemi have vegetarian restaurants among their top 5 venues. With that information we can decide that Kaartinkaupunki and Munkkiniemi already have too much competition. Next we remove all venue categories other than vegetarian / vegan restaurants and look at the frequency for them only.

In [0]:
# columns= replaces the positional axis argument, which pandas >= 2.0 no longer accepts
helsinki_onehot2 = helsinki_onehot.drop(columns=helsinki_onehot.columns.difference(['Neighborhood','Vegetarian / Vegan Restaurant']))
helsinki_grouped2 = helsinki_onehot2.groupby('Neighborhood').mean().reset_index()
for hood in helsinki_grouped2['Neighborhood']:
    print("----"+hood+"----")
    temp = helsinki_grouped2[helsinki_grouped2['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Ala-Malmi----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.05


----Alppikylä----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Alppila----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Arabianranta----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Aurinkolahti----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Eira----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Etelä-Haaga----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Etu-Töölö----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Hakuninmaa----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Haltiala----
                           venue  freq
0  Vegetarian / Vegan Restaurant   0.0


----Hanasaari----
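Since most neighborhoods print a frequency of 0.0, it can be easier to list only the ones with a non-zero value. A minimal sketch follows. The frame below is a hypothetical stand-in in the shape of `helsinki_grouped2`; only the Ala-Malmi value is taken from the output above, the Harju value is made up.

```python
import pandas as pd

# Hypothetical per-neighborhood frequencies in the shape of helsinki_grouped2.
grouped2 = pd.DataFrame({
    "Neighborhood": ["Ala-Malmi", "Alppila", "Eira", "Harju"],
    "Vegetarian / Vegan Restaurant": [0.05, 0.0, 0.0, 0.03],
})

col = "Vegetarian / Vegan Restaurant"
# Keep only neighborhoods that have at least one vegetarian venue, highest share first.
veg_only = grouped2[grouped2[col] > 0].sort_values(col, ascending=False)
print(veg_only.to_string(index=False))
```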
 

#### By inspecting the frequencies we find that only a few places have vegetarian restaurants in Foursquare's database. This is possibly because the service is not widely used in Finland, where most people use Google nowadays. Still, as we are tied to Foursquare in this project, we continue. The good news is that the data reveals some facts that are well known in the Helsinki area: the vegetarian restaurant "hot spots" in Helsinki are Harju, Torkkelinmäki and Linjat, areas that are well known for veg options. In addition the data reveals four not so well known locations: Kaartinkaupunki, Ala-Malmi, Itä-Pasila and Munkkiniemi. Now we check how many venues each of these locations has, to see whether any of them is of interest as the best location.

In [0]:
print("Venues by location:")
veg_hoods = ["Harju", "Torkkelinmäki", "Linjat", "Kaartinkaupunki",
             "Ala-Malmi", "Itä-Pasila", "Munkkiniemi"]
for hood in veg_hoods:
    # .iloc[0] takes the first count column; all count columns hold the same value
    print(hood + ":", helsinki_venues_org.loc[hood].iloc[0])

Venues by location:
Harju: 59
Torkkelinmäki: 88
Linjat: 69
Kaartinkaupunki: 56
Ala-Malmi: 19
Itä-Pasila: 25
Munkkiniemi: 17


#### As we remember, the venue hot spot of Helsinki is Punavuori, which has 100 venues. Compared to that, of the locations with a high frequency of veg restaurants only Torkkelinmäki and Linjat stand out, with 88 and 69 venues. We will keep those locations in mind for further analysis. Next we look at what kinds of venues each location has, then cluster and visualize them.


In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']


columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = helsinki_grouped['Neighborhood']

for ind in np.arange(helsinki_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(helsinki_grouped.iloc[ind, :], num_top_venues)

pd.set_option('display.max_rows', 500)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ala-Malmi,Gym / Fitness Center,Himalayan Restaurant,Coffee Shop,Restaurant,Fast Food Restaurant,Beer Bar,Liquor Store,Basketball Court,Pharmacy,Thai Restaurant
1,Alppikylä,Bus Stop,Supermarket,Plaza,Convenience Store,Pharmacy,Grocery Store,Hotel,Shopping Mall,Karaoke Bar,Football Stadium
2,Alppila,Theme Park Ride / Attraction,Park,Pub,Sushi Restaurant,Track Stadium,Trail,Gym,Bar,Grocery Store,History Museum
3,Arabianranta,Tram Station,Furniture / Home Store,Park,Arts & Crafts Store,Himalayan Restaurant,Pizza Place,Plaza,Café,Art Museum,Art Gallery
4,Aurinkolahti,Beach,Harbor / Marina,Grocery Store,Beer Bar,Park,Gym / Fitness Center,Restaurant,Bus Stop,Sri Lankan Restaurant,Playground


#### Having sorted the venue information, we check once again what the venue hot spot, Punavuori, looks like.

In [0]:
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Neighborhood'] == "Punavuori"]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,Punavuori,Scandinavian Restaurant,Beer Bar,Italian Restaurant,Pizza Place,Bakery,Park,Sushi Restaurant,Coffee Shop,Restaurant,Bar


#### Here we can see that Punavuori, which has the largest number of venues (100) in Helsinki, has a very favourable top 10 list of venues when looking for an area where people go to eat and drink. From here we continue by clustering the locations in Helsinki to see whether there are any big differences between the locations near our base location, the main railway station.

In [0]:
from sklearn.cluster import KMeans

kclusters = 5
helsinki_grouped_clustering = helsinki_grouped.drop(columns='Neighborhood')  # positional axis argument is not accepted by pandas >= 2.0

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(helsinki_grouped_clustering)

kmeans.labels_[0:10] 


neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

helsinki_merged = hel_df
helsinki_merged = helsinki_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
helsinki_merged = helsinki_merged.dropna()
helsinki_merged[("Cluster Labels1")] = helsinki_merged[("Cluster Labels")].astype(int)
helsinki_merged = helsinki_merged.drop(columns="Cluster Labels")
helsinki_merged = helsinki_merged.rename(columns={"Cluster Labels1": "Cluster Labels"})
helsinki_merged.head(50)

Unnamed: 0,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Kruununhaka,60.17287,24.954733,Boat or Ferry,Grocery Store,Theater,Café,History Museum,Coffee Shop,Bar,Park,Indie Movie Theater,Vietnamese Restaurant,0
1,Kluuvi,60.170778,24.947329,Coffee Shop,Plaza,Café,Park,Pizza Place,Clothing Store,Burger Joint,Theater,Bistro,Chinese Restaurant,0
2,Kaartinkaupunki,60.165214,24.947222,Hotel,Park,Pizza Place,Scandinavian Restaurant,Café,Falafel Restaurant,French Restaurant,Coffee Shop,Bistro,Plaza,0
3,Kamppi,60.168535,24.930494,Beer Bar,Wine Bar,Gym / Fitness Center,Bar,Scandinavian Restaurant,Art Museum,Sushi Restaurant,Mexican Restaurant,Food Court,Restaurant,0
4,Punavuori,60.161237,24.936505,Scandinavian Restaurant,Beer Bar,Italian Restaurant,Pizza Place,Bakery,Park,Sushi Restaurant,Coffee Shop,Restaurant,Bar,0
5,Eira,60.156191,24.938375,Park,Ice Cream Shop,Italian Restaurant,Café,Boat or Ferry,Bakery,French Restaurant,Waterfront,Playground,Scandinavian Restaurant,0
6,Ullanlinna,60.158715,24.949404,Park,Grocery Store,Pizza Place,Scandinavian Restaurant,French Restaurant,Coffee Shop,Café,Cocktail Bar,Ice Cream Shop,Chinese Restaurant,0
7,Katajanokka,60.166975,24.968151,Park,Scandinavian Restaurant,Hotel,Boat or Ferry,Restaurant,Bar,Plaza,Himalayan Restaurant,Theme Park Ride / Attraction,Gym / Fitness Center,0
8,Kaivopuisto,60.156465,24.955262,Grocery Store,Park,Ice Cream Shop,Playground,Coffee Shop,Dessert Shop,Pharmacy,Monument / Landmark,Café,Nightclub,0
9,Etu-Töölö,60.174031,24.922303,Scandinavian Restaurant,Plaza,Park,Restaurant,Coffee Shop,Gym,Sushi Restaurant,Supermarket,Tennis Court,Gym / Fitness Center,0


#### After clustering we visualize the clusters on a map of Helsinki to get an idea of how they are distributed.

In [0]:
import matplotlib.cm as cm
import matplotlib.colors as colors

latitude = 60.169332656
longitude = 24.939746241

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(helsinki_merged['Latitude'], helsinki_merged['Longitude'], helsinki_merged['Neighborhood'], helsinki_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Conclusion

After clustering and visualizing the venue data, we find that around our base location, the main railway station, the locations are very homogeneous in terms of venues. There is more variety among the locations further away, and among those are some of the locations that we earlier found to have a high density of veg restaurants. As they are too far from our base location, we won't analyse them further. The well-known "vegetarian hot spots", which our analysis also confirmed, fell into the same cluster as all the locations around the main railway station. We already analysed those locations and found that most of them do not have enough venues to be a desirable location for us. Two of them, Torkkelinmäki and Linjat, have a high number of venues, but as they are already among the locations with a high frequency of veg restaurants, we won't consider them an optimum location.

After these findings we can conclude our study and state that one location has a high probability of being the best area for a new vegetarian restaurant. That is Punavuori, which had the highest number of venues and which is close to the main railway station (roughly one kilometre, judging by the coordinates used above). According to the Foursquare database there is not yet any vegetarian restaurant in the Punavuori area, yet the area is very popular for restaurants and other venues, as our analysis showed.