# **Capstone Project - Benefit Zones for Indian Restaurants**
## **Prakirth Govardhanam**
## **Applied Data Science Capstone by IBM/Coursera**

## Introduction/Business-Problem
In this project, I try to find possible-beneficial locations within the Neighborhoods (Districts) of Helsinki, Finland, for establishing a chain of Indian Restaurants. The conditions to fulfill in order are:

* CONDITION 1 - Distance from Popularity Centre (Assumption) in the Neighborhood (District) - for popularity
* CONDITION 2 - Absence of other Indian restaurants in the Neighborhood - to limit competition


## Data
Data sources used to determine the Neighborhoods within the city of Helsinki are provided by:

* Wikipedia_(https://en.wikipedia.org/wiki/Names_of_places_in_Finland_in_Finnish_and_in_Swedish#Municipalities)_ - for listing the Neighborhoods (Districts) of Helsinki
* The City of Helsinki(https://kartta.hel.fi/avoindata) - for geospatial Data
* Foursquare API - for popular venues, restaurants and their respective geospatial data

## Project Assumption
**_Popularity Centre_** = the centroid of the Top Venues from the Top10 Venue Categories(by frequency of occurence) in each District will be considered as the "Popularity Centre" within every District
* **Clarification:** Top-10 Venues were ideally planned to be **_filtered by Ratings of Venues_**. Unfortunately, I have a Sandbox account & Ratings of Venues at the scale I need would be possible only with Premium accounts

___

# PART 1 - Data Preparation

## PART 1.1 - Data Extraction

### Import necessary libraries

In [2]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import geocoder
from geopy.geocoders import Nominatim

### Clarification:
* Names of anything in Finland has its name in 2 languages, Finnish & Swedish
* Hence, names of Districts are also in same pattern: Finnish-name (Swedish-name)
### ***Assumption #1***
* In the current source of District labels, **"Swedish-names of Finland _could be_ confused with Swedish-names of Sweden" in the FourSquare API**.
* Hence, we will extract and work only with Finnish-names of the Neighbourhoods & Districts


In [3]:
#url with Helsinki District names
url = 'https://en.wikipedia.org/wiki/Names_of_places_in_Finland_in_Finnish_and_in_Swedish#Municipalities'

#parsing the webpage for html content
html = requests.get(url).text
soup = BeautifulSoup(html, features='html.parser')

#extract <a href> tags
atags = soup.select('a[href]')

#extract titles of <a href> tags
titles = []
for atag in atags:
    titles.append(atag.get('title'))

#slice the labels of Helsinki Districts
districts = titles[titles.index('Ala-Malmi'): titles.index('Ylä-Malmi')+1]
print(f"Total Districts listed: {len(districts)}")

Total Districts listed: 110


In [4]:
#extract coordinates from District/Neighborhood names using geopy.geocoders.Nominatim
geolocator = Nominatim(user_agent='Helsinki_districts')

#empty lists for latitude & longitude values and None values, if any
lats = []
longs = []

#looping through district names for coordinates
for name in districts:
    location = geolocator.geocode(name)
    try:
        lats.append(location.latitude)
        longs.append(location.longitude)
    except AttributeError:
        pass

In [5]:
print(f"Total values identified \n(Latitude, Longitude): {len(lats), len(longs)}")

Total values identified 
(Latitude, Longitude): (109, 109)


## PART 1.2 - Investigating Data for errors

### 1.2.1 - Districts with ***None*** values for coordinates

In [6]:
# Investigating None value in districts list, if Any
trial = []
for name in districts:
    location = geolocator.geocode(name)
    try:
        trial.append(location.latitude)
    except AttributeError as err:
        print('None value detected!')
        raise

None value detected!


AttributeError: 'NoneType' object has no attribute 'latitude'

In [7]:
#Identify District with NoneType coordinate
print(f"District with NoneType coordinate:\n{districts[len(trial)]}")

District with NoneType coordinate:
Kampinmalmi


In [8]:
#Direct verification 
geolocator.geocode('Kampinmalmi').latitude

AttributeError: 'NoneType' object has no attribute 'latitude'

### 1.2.2 - Districts with Improper coordinates (Detected *Manually*)

In [9]:
wrong_coords = ['Pasila','Töölö']
for name in wrong_coords:
    print(f"Locations as identified by geopy.geocoders API for {name}:\n{geolocator.geocode(name)}\n")

Locations as identified by geopy.geocoders API for Pasila:
Brasil

Locations as identified by geopy.geocoders API for Töölö:
Toolo, Loroum, Nord, Burkina Faso



In [10]:
#Districts, Latitudes & Longitudes with NoneType & Improper coordinates - to be removed from Lists

print(f"BEFORE Cleaning:\nTotal Districts:{len(districts)}\nTotal Latitude values:{len(lats)}\nTotal Longitude values:{len(longs)}")

loc_to_pop = ['Pasila','Töölö','Kampinmalmi']
lat_to_pop = [-10.3333333, 13.744717]
long_to_pop = [-53.2, -1.9645989]

#Remove districts without coordinates and with improper coordinates
for loc in loc_to_pop:
    districts.remove(loc)

#Remove improper coordinates    
for lat, long in zip(lat_to_pop, long_to_pop):
    lats.remove(lats[lats.index(lat)])
    longs.remove(longs[longs.index(long)])
    
print(f"\nAFTER Cleaning:\nTotal Districts:{len(districts)}\nTotal Latitude values:{len(lats)}\nTotal Longitude values:{len(longs)}")

BEFORE Cleaning:
Total Districts:110
Total Latitude values:109
Total Longitude values:109

AFTER Cleaning:
Total Districts:107
Total Latitude values:107
Total Longitude values:107


In [11]:
#Frame all extracted values in a Dataframe
districts_df = pd.DataFrame(data= zip(districts, lats, longs), columns=['District', 'Latitude', 'Longitude'])
districts_df.head()

Unnamed: 0,District,Latitude,Longitude
0,Ala-Malmi,60.249474,25.014539
1,Alppiharju,60.189728,24.94412
2,Aurinkolahti,60.201507,25.155669
3,Eira,60.156191,24.938375
4,Etelä-Haaga,60.211615,24.891092


In [12]:
districts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107 entries, 0 to 106
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   District   107 non-null    object 
 1   Latitude   107 non-null    float64
 2   Longitude  107 non-null    float64
dtypes: float64(2), object(1)
memory usage: 2.6+ KB


****

# PART 2 - Exploratory Data Analysis

1. plot map of city of Helsinki Districts using Folium
2. Use FourSquare API to:
* extract popular(top 10) venues around each District
* locate _Indian-restaurants_ present in the District
* locate **"_Popularity centres_"** by calculating the centroid of the top-10 venues from each district using clustering-methods, such as linkage, fcluster..
3. plot map of **"_Popularity centres_"** & _Indian-restuaurants_ using Folium
4. plot Districts with **"_Popularity centres_"**:
* **without Indian-restaurants**, labeled as **"_Benefit-Zones_"** *(in green)*
* **with Indian-restaurants NOT IN top-10 venues**, labeled as **_Minor Competition-Zones_** *(in blue)*
* **with Indian-restaurants IN top-10 venues**, labeled as **_Major Competition-Zones_** *(in red)*

## PART 2.1 - Plot city map of Helsinki indicating Districts

### Import necessary libraries

In [13]:
import folium

In [14]:
address = 'Helsinki, Finland'
geolocator = Nominatim(user_agent='Helsinki_district_map')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f"Coordinates of Helsinki are: {latitude}, {longitude}")

Coordinates of Helsinki are: 60.1674881, 24.9427473


In [15]:
helsinki_map = folium.Map(location=[latitude, longitude], zoom_start=6)

for dist, lat, long in zip(districts_df.District, districts_df.Latitude, districts_df.Longitude):
    label = folium.Popup('{}'.format(dist), parse_html=True)
    folium.CircleMarker([lat, long],
    radius=20,
    popup=label,
    fill=False,
    parse_html=False).add_to(helsinki_map)

helsinki_map

### 2.1.1 - Districts with Improper coordinates (Outside Helsinki, *detected using Folium map*)

In [16]:
# Verification of locations with improper coordinates' Districts
wrong_districts = ['Vanhakaupunki','Siltasaari', 'Reijola', 'Vironniemi', 'Koivusaari']

for district in wrong_districts:
    print(f"District as identified by geopy.geocoders API for {district}:\n{geolocator.geocode(district)}\n")

District as identified by geopy.geocoders API for Vanhakaupunki:
Gamla stan, Stortorget, Gamla stan, Södermalms stadsdelsområde, Stockholm, Stockholms kommun, Stockholms län, 111 29, Sverige

District as identified by geopy.geocoders API for Siltasaari:
Siltasaari, Jyränkö, Heinola, Lahden seutukunta, Päijät-Häme, Etelä-Suomen aluehallintovirasto, Manner-Suomi, Suomi

District as identified by geopy.geocoders API for Reijola:
Reijola, Joensuu, Joensuun seutukunta, Pohjois-Karjala, Itä-Suomen aluehallintovirasto, Manner-Suomi, 80330, Suomi

District as identified by geopy.geocoders API for Vironniemi:
Vironniemi, Siilinjärvi, Kuopion seutukunta, Pohjois-Savo, Itä-Suomen aluehallintovirasto, Manner-Suomi, 71870, Suomi

District as identified by geopy.geocoders API for Koivusaari:
Koivusaari, Nurmes, Pielisen Karjalan seutukunta, Pohjois-Karjala, Itä-Suomen aluehallintovirasto, Manner-Suomi, Suomi



In [18]:
#collecting indices of rows to be removed
rows_to_pop = []
for district in wrong_districts:
    index = districts_df.loc[districts_df.District == district].index.to_list()
    rows_to_pop.append(index)

indices = [j for i in rows_to_pop for j in i]
indices = sorted(indices)
print(f"Indices to be removed from the districts_df Dataframe: {indices}")

Indices to be removed from the districts_df Dataframe: [30, 84, 92, 103, 105]


In [20]:
#Drop rows in districts_df Dataframe
districts_df.drop(indices, axis=0, inplace=True)
districts_df.reset_index(drop=True, inplace=True)
print(f"Dataframe refined for venues extraction from FourSquare API:\nTotal Rows: {districts_df.shape[0]}\nTotal Columns: {districts_df.shape[1]}")

Dataframe refined for venues extraction from FourSquare API:
Total Rows: 102
Total Columns: 3


In [21]:
#Map corrected for wrong districts
helsinki_map = folium.Map(location=[latitude, longitude], zoom_start=6)

for dist, lat, long in zip(districts_df.District, districts_df.Latitude, districts_df.Longitude):
    label = folium.Popup('{}'.format(dist), parse_html=True)
    folium.CircleMarker([lat, long],
    radius=20,
    popup=label,
    fill=False,
    parse_html=False).add_to(helsinki_map)

helsinki_map

## PART 2.2 - Use **FourSquareAPI** & Extract nearby venues

### ***Assumption #2 (Important)***
* In reality, there are more Indian Restaurants than **explored Indian Restaurants using *FourSquare API***
* Since, the project is based on **"using FourSquare API for implementation of the Idea"** we will assume the following:
    * **"explored Indian Restaurants" _=_ "existing Indian Restaurants"** 


In [22]:
#Credentials
CLIENT_ID = 'CXC1D1CNWMCS54XHC3M0VLPRLBCPQQMID0OZC04Z0VYTMSAU' 
CLIENT_SECRET = 'OQRFM1BNLVMREJ3N3VJBAWGKU2ERVDEBC3Q1M2UXHBVNDBN3' 
VERSION = '20201201' 
LIMIT = 100

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """ 
    function to request and extract the venues list for each district from a .json file. Extracted values are assigned to a Dataframe.

    Args:
    names - District/Neighborhood names of City, dtype: list
    latitudes - Latitude values of the District, dtype: list
    longitudes - Longitude values of the District, dtype: list
    radius - radius around the epicentre of the District for extracting venues, default=500

    Returns:
    nearby_venues - Dataframe with name and spatial details of the respective District and Venues
    """
    
    venues_list=[]
    for name, lat, long in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = f"https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={lat},{long}&radius={radius}&limit={LIMIT}"
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            long, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [24]:
helsinki_venues = getNearbyVenues(districts_df.District, districts_df.Latitude, districts_df.Longitude)

Ala-Malmi
Alppiharju
Aurinkolahti
Eira
Etelä-Haaga
Haaga
Hakaniemi
Hakuninmaa
Haltiala
Heikinlaakso
Hermanni (Helsinki)
Herttoniemen teollisuusalue
Herttoniemenranta
Herttoniemi
Hevossalmi
Hietalahti, Helsinki
Itä-Pakila
Itä-Pasila
Itäsaaret
Jollas, Helsinki
Kaarela
Kaartinkaupunki
Kaisaniemi
Kaivopuisto
Kallahti
Kallio
Keski-Pasila
Keski-Vuosaari
Kivihaka
Kluuvi
Konala
Koskela
Kruununhaka
Kulosaari
Kumpula
Kurkimäki
Kuusisaari
Laajasalo
Laakso
Länsi-Herttoniemi
Länsi-Pakila
Länsi-Pasila
Lassila
Lauttasaari
Lehtisaari, Helsinki
Malmi, Helsinki
Marttila, Helsinki
Marjaniemi
Maunula
Maunulanpuisto
Maununneva
Meilahti
Mellunkylä
Meri-Rastila
Merihaka
Metsälä
Munkkiniemi
Munkkisaari
Munkkivuori
Mustavuori
Mustikkamaa–Korkeasaari
Myllypuro
Niemenmäki
Niinisaari
Oulunkylä
Pajamäki
Pakila
Patola, Helsinki
Pihlajamäki
Pihlajisto
Pikku Huopalahti
Pirkkola
Pitäjänmäen teollisuusalue
Pitäjänmäki
Pohjois-Haaga
Pohjois-Pasila
Puistola
Pukinmäki
Punavuori
Puotila
Puotinharju
Puroniitty
Rastila
Reima

In [25]:
print(f"Total Rows:{helsinki_venues.shape[0]}, Total Columns:{helsinki_venues.shape[1]}")
helsinki_venues.head()

Total Rows:1773, Total Columns:7


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ala-Malmi,60.249474,25.014539,Ravintola Makalu,60.250291,25.012946,Himalayan Restaurant
1,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center
2,Ala-Malmi,60.249474,25.014539,Malmin Uimahalli | Fix Liikuntakeskus,60.251131,25.0164,Pool
3,Ala-Malmi,60.249474,25.014539,Thai Thai,60.2485,25.010685,Thai Restaurant
4,Ala-Malmi,60.249474,25.014539,Alko,60.251465,25.013255,Liquor Store


In [26]:
print(f"Total unique Venue categories: {helsinki_venues['Venue Category'].nunique()}")

Total unique Venue categories: 257


In [27]:
print(f"Total Districts identified: {districts_df.District.nunique()}\nTotal Districts with Venues: {helsinki_venues.District.nunique()}")

Total Districts identified: 102
Total Districts with Venues: 100


### 2.2.1 - Identify Districts with Indian Restaurants(Red Zones) in helsinki_venues

### ***Assumption #3***
* Total Venue Category with "Himalayan Restaurant" = 11
* Total Venue Category with "Indian Restaurant" = 7 
* Hence, we will be considering **BOTH Venue Categories (Indian & Himalayan) as Indian Restaurants**

In [32]:
print(f"Total Himalayan Restaurants: {len(helsinki_venues[(helsinki_venues['Venue Category'] == 'Himalayan Restaurant')])}\nTotal Indian Restaurants: {len(helsinki_venues[(helsinki_venues['Venue Category'] == 'Indian Restaurant')])}")

Total Himalayan Restaurants: 11
Total Indian Restaurants: 7


In [28]:
helsinki_indian_venues = helsinki_venues[(helsinki_venues['Venue Category'] == 'Indian Restaurant') | (helsinki_venues['Venue Category']=='Himalayan Restaurant')]
print(f"Total number of Indian Restaurants in Helsinki Districts (with venues): {len(helsinki_indian_venues)}")
print("Districts with Indian Restaurant/s:\n")
helsinki_indian_venues

Total number of Indian Restaurants in Helsinki Districts (with venues): 18
Districts with Indian Restaurant/s:



Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ala-Malmi,60.249474,25.014539,Ravintola Makalu,60.250291,25.012946,Himalayan Restaurant
96,Etelä-Haaga,60.211615,24.891092,Roseway,60.207981,24.88694,Indian Restaurant
199,Herttoniemenranta,60.189238,25.029584,Ravintola Mantra,60.186781,25.030365,Himalayan Restaurant
231,Herttoniemi,60.195525,25.029063,Gurkha,60.19493,25.02867,Himalayan Restaurant
273,"Hietalahti, Helsinki",60.162768,24.927331,Aangan,60.163198,24.927786,Himalayan Restaurant
376,Itä-Pasila,60.198825,24.937867,Wok'n'Curry,60.203071,24.935465,Indian Restaurant
686,Keski-Pasila,60.20124,24.92966,Wok'n'Curry,60.203071,24.935465,Indian Restaurant
915,Länsi-Herttoniemi,60.209119,25.040027,Sunkosi,60.206577,25.042706,Himalayan Restaurant
953,Lassila,60.231027,24.876722,Ravintola Moksha,60.229298,24.880959,Indian Restaurant
997,"Malmi, Helsinki",60.250761,25.008574,Ravintola Makalu,60.250291,25.012946,Himalayan Restaurant


### 

In [29]:
print(f"Total unique Districts with Indian restaurants: {helsinki_indian_venues.District.nunique()}")

Total unique Districts with Indian restaurants: 18


In [33]:
#Districts with 1 Indian Restaurant - RedZone Map
helsinki_redzone_map = folium.Map(location=[latitude, longitude], zoom_start=12)

for venue, dist, lat, long in zip(helsinki_indian_venues.Venue, helsinki_indian_venues.District, helsinki_indian_venues['Venue Latitude'], helsinki_indian_venues['Venue Longitude']):
    label = folium.Popup('{},{}'.format(venue, dist), parse_html=True)
    folium.CircleMarker([lat, long],
    radius=5,
    popup=label,
    color='red',
    fill=True,
    fill_color='blue',
    fill_opacity=0.7,
    parse_html=False).add_to(helsinki_redzone_map)

helsinki_redzone_map

### DONE (22 Dec 2020):
* 99 Districts **WITH Venues**
* 19 Districts **WITH "1" INDIAN Restaurant**
* 80 Districts **WIHOUT "1" INDIAN Restaurant**
* 19 Districts - Red Zone
* 80 Districts - Green Zone/Benefit Zone

### TO-DO:
* Extract Top10 Venues from each District
* Calculate _Popularity Centre_ for each District (clustering methods)
* Plot _Popularity Centre_ in Map for each District
* Plot Red Zone & Green Zone _Popularity Centre_ in Helsinki City Map

## PART 2.3 - Extract Top 10 Venues from each District

### 2.3.1 - one-hot encode the Venue Category in helsinki_venues Dataframe

In [34]:
encoded_venues = pd.get_dummies(helsinki_venues[['Venue Category']], prefix='', prefix_sep='', dtype='int64')
encoded_venues.head()

Unnamed: 0,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [35]:
# Encoded dataframe with column added - District
encoded_venues['District'] = helsinki_venues[['District']]
fix_cols = ['District'] + list(encoded_venues.columns[encoded_venues.columns!='District'])
encoded_venues = encoded_venues[fix_cols]
encoded_venues.head()

Unnamed: 0,District,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Dealership,Auto Garage,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,Ala-Malmi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Ala-Malmi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ala-Malmi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ala-Malmi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ala-Malmi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### 2.3.2 - Rank Venue Categories per District (Top 10)

In [36]:
#Grouped dataframe statistics by District for each Venue Category
helsinki_grouped = encoded_venues.groupby('District').mean()
helsinki_grouped

Unnamed: 0_level_0,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Ala-Malmi,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
Alppiharju,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.038462,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
Aurinkolahti,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
Eira,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.031250,0.0,0.0,0.0,0.000000,0.0
Etelä-Haaga,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Torpparinmäki,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
Toukola,0.000000,0.0,0.000000,0.047619,0.047619,0.047619,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
Ullanlinna,0.000000,0.0,0.017544,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.017544,0.0,0.0,0.0,0.000000,0.0
Vallila,0.020833,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.0,0.020833,0.0,0.0,0.000000,0.0,0.0,0.0,0.020833,0.0


In [37]:
#Transposing the grouped dataframe
helsinki_grouped_T = helsinki_grouped.T
venue_cats = []

#Iterating every District to extract Top 10 venues by frequency(mean) of venues
for col in helsinki_grouped_T.columns.to_list():
    venue_freq = helsinki_grouped_T[col].nlargest(10).round(2)
    venue_cats.append(venue_freq.index.to_list())
print(f"Total arrays of Venue Categories: {len(venue_cats)}")

Total arrays of Venue Categories: 100


In [38]:
#Columns for the top10_venues
district_data = helsinki_grouped_T.columns.to_list()
columns = ['District']
for ind in range(10):
    columns.append(f"Venue Category_RANK{ind+1}")
print(columns)

['District', 'Venue Category_RANK1', 'Venue Category_RANK2', 'Venue Category_RANK3', 'Venue Category_RANK4', 'Venue Category_RANK5', 'Venue Category_RANK6', 'Venue Category_RANK7', 'Venue Category_RANK8', 'Venue Category_RANK9', 'Venue Category_RANK10']


In [40]:
#Splitting venue_cats by 10 for assigning top10 for each District
rl = [j for i in venue_cats for j in i]
rl = np.array_split(rl, len(rl)/10)
print(f"RANK1 Categories for the District {helsinki_grouped_T.columns[0]}:\n{rl[0]}")

RANK1 Categories for the District Ala-Malmi:
['Gym / Fitness Center' 'Pharmacy' 'Basketball Court' 'Beer Bar'
 'Chinese Restaurant' 'Coffee Shop' 'Cultural Center'
 'Fast Food Restaurant' 'Flea Market' 'Himalayan Restaurant']


In [46]:
#Transpose rank list 'rl' for collecting lists by rank
rl_t = np.transpose(rl)

#Initialize dictionary with: Keys as columns, Values as district_data and sub-lists of 'rl_t' 
d_keys = columns
d_vals = [district_data, rl_t[0], rl_t[1], rl_t[2], rl_t[3], rl_t[4], rl_t[5], rl_t[6], rl_t[7], rl_t[8], rl_t[9]]
data_dict = dict(zip(d_keys, d_vals))

#Initialize top10_venues_df
top10_venues = pd.DataFrame(data_dict)
top10_venues

Unnamed: 0,District,Venue Category_RANK1,Venue Category_RANK2,Venue Category_RANK3,Venue Category_RANK4,Venue Category_RANK5,Venue Category_RANK6,Venue Category_RANK7,Venue Category_RANK8,Venue Category_RANK9,Venue Category_RANK10
0,Ala-Malmi,Gym / Fitness Center,Pharmacy,Basketball Court,Beer Bar,Chinese Restaurant,Coffee Shop,Cultural Center,Fast Food Restaurant,Flea Market,Himalayan Restaurant
1,Alppiharju,Theme Park Ride / Attraction,Park,Bar,Greek Restaurant,Asian Restaurant,Beer Garden,Blini House,Bus Stop,Café,Dog Run
2,Aurinkolahti,Harbor / Marina,Beach,Park,Beer Bar,Bus Stop,Café,Grocery Store,Gym / Fitness Center,Ice Cream Shop,Playground
3,Eira,Park,Bakery,Café,French Restaurant,Ice Cream Shop,Italian Restaurant,Beach,Boat or Ferry,Coffee Roaster,Coffee Shop
4,Etelä-Haaga,Bus Stop,Café,Park,Cafeteria,Chinese Restaurant,Gas Station,Indian Restaurant,Pizza Place,Skate Park,African Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
95,Torpparinmäki,Bus Stop,Bistro,Playground,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant
96,Toukola,Park,Art Gallery,Art Museum,Arts & Crafts Store,Business Service,Café,College Cafeteria,Comic Shop,Flea Market,Furniture / Home Store
97,Ullanlinna,Park,Grocery Store,Coffee Shop,Pizza Place,Scandinavian Restaurant,Café,French Restaurant,Ice Cream Shop,Antique Shop,Bakery
98,Vallila,Bar,Park,Pizza Place,Cafeteria,Chinese Restaurant,Convenience Store,Flea Market,Gym,Hostel,Tram Station


### DONE (23 Dec 2020):
* Extracted Venue Categories in each District (from helsinki_grouped)
* Ranked Top 10 Venue Categories for each District by frequency of occurence

### TO DO:
* Create Dataframe for the Districts with Venues having:
    * Venue, Venue Latitude, Venue Longitude & Venue Category (from helsinki_venues)
    * Venue Categories (from top10_venues)
* Plot Venues in each District
* Calculate Cluster centre (centroid = *Popularity Centre*) for each District from Venue Latitude & Longitude
* Plot *Popularity Centres* on Helsinki City Map

## PART 2.4 Locate **_Popularity Centres_** for each District

### 2.4.1 - Tabulate the Popular Venues from each District

In [168]:
#Extracting Venues & their respective coordinates (from helsinki_venues) using transposed-ranked venues array (rl_t) for each District into single Dataframe

ranked_venues_list = []
# Iterate Venue Categories (from transposed-ranked Venues array) by Rank
for rank in rl_t:
    # Iterate Ranked Categories w.r.t Districts in top10_venues
    for vc, d in zip(rank, top10_venues.District):
        # Filter empty dataframes
        if not (helsinki_venues.loc[(helsinki_venues['District'] == d) & (helsinki_venues['Venue Category'] == vc)]).empty:
            # Append extracted Dataframes (from expression) to ranked_venues_list
            ranked_venues_list.append(helsinki_venues.loc[(helsinki_venues['District'] == d) & (helsinki_venues['Venue Category'] == vc)]) 
        else:
            pass

ranked_venues_df = pd.concat(ranked_venues_list) # Merge (by concatenation) all extracted Dataframes into a single Dataframe
ranked_venues_df.reset_index(inplace=True, drop=True)

In [171]:
display(ranked_venues_df)

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center
1,Ala-Malmi,60.249474,25.014539,Lady's Club,60.249060,25.009737,Gym / Fitness Center
2,Alppiharju,60.189728,24.944120,Vuoristorata,60.188544,24.941248,Theme Park Ride / Attraction
3,Alppiharju,60.189728,24.944120,Kingi,60.187859,24.941492,Theme Park Ride / Attraction
4,Alppiharju,60.189728,24.944120,Ukko,60.189061,24.940768,Theme Park Ride / Attraction
...,...,...,...,...,...,...,...
1168,Toukola,60.208836,24.972565,Iittala-Arabia-Fiskars Design Centre Store,60.208845,24.975680,Furniture / Home Store
1169,Ullanlinna,60.158715,24.949404,Gateau,60.160778,24.946334,Bakery
1170,Vallila,60.196167,24.956710,HKL Vallilan raitiovaunuvarikko,60.195344,24.962635,Tram Station
1171,Vallila,60.196167,24.956710,HSL 0269 Mäkelänrinne,60.197770,24.949184,Tram Station


#### Investigating Data (Manual)

In [169]:
# Ranked Venue Categories of a random District
top10_venues[top10_venues.District == 'Vallila']

Unnamed: 0,District,Venue Category_RANK1,Venue Category_RANK2,Venue Category_RANK3,Venue Category_RANK4,Venue Category_RANK5,Venue Category_RANK6,Venue Category_RANK7,Venue Category_RANK8,Venue Category_RANK9,Venue Category_RANK10
98,Vallila,Bar,Park,Pizza Place,Cafeteria,Chinese Restaurant,Convenience Store,Flea Market,Gym,Hostel,Tram Station


In [170]:
# Verifying Venue Categories of the same random District in the Dataframe built from extracted data
ranked_venues_df[ranked_venues_df.District == 'Vallila']

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
278,Vallila,60.196167,24.95671,Ravintola Pikku-Vallila,60.192021,24.958999,Bar
279,Vallila,60.196167,24.95671,Backas Bar,60.192544,24.960758,Bar
280,Vallila,60.196167,24.95671,Hermannin Kukko,60.195874,24.963032,Bar
448,Vallila,60.196167,24.95671,Vallilanlaakso,60.199592,24.956837,Park
449,Vallila,60.196167,24.95671,Hollolanpuisto,60.195728,24.954875,Park
450,Vallila,60.196167,24.95671,Hartolan puistikko,60.19364,24.960984,Park
581,Vallila,60.196167,24.95671,Sturenkadun Pizzeria,60.195119,24.958194,Pizza Place
582,Vallila,60.196167,24.95671,Marco Polo,60.193917,24.958328,Pizza Place
583,Vallila,60.196167,24.95671,Power Pizza,60.195564,24.963224,Pizza Place
696,Vallila,60.196167,24.95671,Elmstreet,60.19382,24.95165,Cafeteria


### 2.4.2 - Visualize Popular Venues in Helsinki City Map with Rank and Venue Category

In [172]:
#Copy of 'Venue Category' values as 'Venue Category_RANK' to assign Rank Number
ranked_venues_df['Venue Category_RANK'] = ranked_venues_df['Venue Category'] 
ranked_venues_df.head()

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Category_RANK
0,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center,Gym / Fitness Center
1,Ala-Malmi,60.249474,25.014539,Lady's Club,60.24906,25.009737,Gym / Fitness Center,Gym / Fitness Center
2,Alppiharju,60.189728,24.94412,Vuoristorata,60.188544,24.941248,Theme Park Ride / Attraction,Theme Park Ride / Attraction
3,Alppiharju,60.189728,24.94412,Kingi,60.187859,24.941492,Theme Park Ride / Attraction,Theme Park Ride / Attraction
4,Alppiharju,60.189728,24.94412,Ukko,60.189061,24.940768,Theme Park Ride / Attraction,Theme Park Ride / Attraction


In [194]:
if (top10_venues.District.unique() == ranked_venues_df.District.unique()).any() is False:
    print('Ooops!')
else:
    print('All good!')
#list(ranked_venues_df[ranked_venues_df.District == dist].iloc[:, 6])

All good!


In [186]:
# Extract values specific to 'Venue Category' with 'District' as filter from both dataframes (from ranked_venues_df & top10_venues)

ranks = range(1, 11) # Intialize Rank Numbers
venue_dists = top10_venues.District.to_list() # List Districts with Venues

for dist in venue_dists:
    vc_r = top10_venues[top10_venues.District == dist].values[0][1:] # Venue Categories from 'Venue Category_RANK' (top10_venues)
    vc = list(ranked_venues_df[ranked_venues_df.District == dist].iloc[:, 6]) # Venue Categories from 'Venue Category' (ranked_venues_df)

    # Assign Rank Number based on 'Venue Category_RANK' (from top10_venues)
    for r, rank in zip(vc_r, ranks): # Iterate Venue Categories and their respective Rank Numbers
        for v in vc:
            if v == r: # Equate 'Venue Category' with 'Venue Category_RANK' values and assign Rank Number
                ranked_venues_df['Venue Category_RANK'].replace(to_replace=v, value=rank, inplace=True) # Replace values in 'Venue Category_RANK' by Rank Number 

ranked_venues_df.head()

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Category_RANK
0,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center,1
1,Ala-Malmi,60.249474,25.014539,Lady's Club,60.24906,25.009737,Gym / Fitness Center,1
2,Alppiharju,60.189728,24.94412,Vuoristorata,60.188544,24.941248,Theme Park Ride / Attraction,1
3,Alppiharju,60.189728,24.94412,Kingi,60.187859,24.941492,Theme Park Ride / Attraction,1
4,Alppiharju,60.189728,24.94412,Ukko,60.189061,24.940768,Theme Park Ride / Attraction,1


#### Investigating assigned Rank Numbers

In [187]:
print(f"Total Null values in Venue Category_RANK: {ranked_venues_df['Venue Category_RANK'].isnull().sum()}")
print(f"List of Assigned Rank Numbers:\n{sorted(ranked_venues_df['Venue Category_RANK'].unique())}")

Total Null values in Venue Category_RANK: 0
List of Assigned Rank Numbers:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [199]:
top10_venues[top10_venues['District']=='Vallila']

Unnamed: 0,District,Venue Category_RANK1,Venue Category_RANK2,Venue Category_RANK3,Venue Category_RANK4,Venue Category_RANK5,Venue Category_RANK6,Venue Category_RANK7,Venue Category_RANK8,Venue Category_RANK9,Venue Category_RANK10
98,Vallila,Bar,Park,Pizza Place,Cafeteria,Chinese Restaurant,Convenience Store,Flea Market,Gym,Hostel,Tram Station


In [200]:
 ranked_venues_df[ranked_venues_df['District']=='Vallila']

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Category_RANK
278,Vallila,60.196167,24.95671,Ravintola Pikku-Vallila,60.192021,24.958999,Bar,3
279,Vallila,60.196167,24.95671,Backas Bar,60.192544,24.960758,Bar,3
280,Vallila,60.196167,24.95671,Hermannin Kukko,60.195874,24.963032,Bar,3
448,Vallila,60.196167,24.95671,Vallilanlaakso,60.199592,24.956837,Park,2
449,Vallila,60.196167,24.95671,Hollolanpuisto,60.195728,24.954875,Park,2
450,Vallila,60.196167,24.95671,Hartolan puistikko,60.19364,24.960984,Park,2
581,Vallila,60.196167,24.95671,Sturenkadun Pizzeria,60.195119,24.958194,Pizza Place,8
582,Vallila,60.196167,24.95671,Marco Polo,60.193917,24.958328,Pizza Place,8
583,Vallila,60.196167,24.95671,Power Pizza,60.195564,24.963224,Pizza Place,8
696,Vallila,60.196167,24.95671,Elmstreet,60.19382,24.95165,Cafeteria,4


In [175]:
# Districts with Popular Venues (Venues from Top10 Venue Categories for each District)
from folium import plugins

helsinki_popven_map = folium.Map(location=[latitude, longitude], zoom_start=12) # Initiate Helsinki City Map
pop_venues = plugins.MarkerCluster().add_to(helsinki_popven_map) # Initiate Cluster segmentation plugin

for dist, venue_cat, rank, lat, long in zip(ranked_venues_df['District'], ranked_venues_df['Venue Category'], ranked_venues_df['Venue Category_RANK'],
ranked_venues_df['Venue Latitude'], ranked_venues_df['Venue Longitude']):
    label = folium.Popup(f"{dist}, {venue_cat}, Rank-{rank}", parse_html=True)
    folium.Marker(location=[lat, long],
    icon=None,
    popup=label).add_to(pop_venues)

helsinki_popven_map

# TRIAL CODE - IN Progress

In [None]:
trial_pop = top10_venues.copy()

In [108]:
#Extracting Venues & their respective coordinates (from helsinki_venues) using transposed-ranked venues array (rl_t) for each District into single Dataframe

ranked_venues_list = []
# Iterate Venue Categories (from transposed-ranked Venues array) by Rank
for rank in rl_t:
    # Iterate Ranked Categories w.r.t Districts in top10_venues
    for vc, d in zip(rank, trial_pop.District):
        # Filter empty dataframes
        if not (helsinki_venues.loc[(helsinki_venues['District'] == d) & (helsinki_venues['Venue Category'] == vc)]).empty:
            # Append extracted Dataframes (from expression) to list
            ranked_venues_list.append(helsinki_venues.loc[(helsinki_venues['District'] == d) & (helsinki_venues['Venue Category'] == vc)]) 
        else:
            pass

ranked_venues_df = pd.concat(ranked_venues_list) # Merge (by concatenation) all extracted Dataframes into a single Dataframe
ranked_venues_df

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center
6,Ala-Malmi,60.249474,25.014539,Lady's Club,60.249060,25.009737,Gym / Fitness Center
20,Alppiharju,60.189728,24.944120,Vuoristorata,60.188544,24.941248,Theme Park Ride / Attraction
25,Alppiharju,60.189728,24.944120,Kingi,60.187859,24.941492,Theme Park Ride / Attraction
32,Alppiharju,60.189728,24.944120,Ukko,60.189061,24.940768,Theme Park Ride / Attraction
...,...,...,...,...,...,...,...
1638,Toukola,60.208836,24.972565,Iittala-Arabia-Fiskars Design Centre Store,60.208845,24.975680,Furniture / Home Store
1728,Ullanlinna,60.158715,24.949404,Gateau,60.160778,24.946334,Bakery
1695,Vallila,60.196167,24.956710,HKL Vallilan raitiovaunuvarikko,60.195344,24.962635,Tram Station
1701,Vallila,60.196167,24.956710,HSL 0269 Mäkelänrinne,60.197770,24.949184,Tram Station


In [50]:
#original data
trial_pop[trial_pop.District == 'Vallila']

Unnamed: 0,District,Venue Category_RANK1,Venue Category_RANK2,Venue Category_RANK3,Venue Category_RANK4,Venue Category_RANK5,Venue Category_RANK6,Venue Category_RANK7,Venue Category_RANK8,Venue Category_RANK9,Venue Category_RANK10
98,Vallila,Bar,Park,Pizza Place,Cafeteria,Chinese Restaurant,Convenience Store,Flea Market,Gym,Hostel,Tram Station


In [51]:
#extracted data
ranked_venues_df[ranked_venues_df.District == 'Vallila']

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1676,Vallila,60.196167,24.95671,Ravintola Pikku-Vallila,60.192021,24.958999,Bar
1681,Vallila,60.196167,24.95671,Backas Bar,60.192544,24.960758,Bar
1694,Vallila,60.196167,24.95671,Hermannin Kukko,60.195874,24.963032,Bar
1667,Vallila,60.196167,24.95671,Vallilanlaakso,60.199592,24.956837,Park
1688,Vallila,60.196167,24.95671,Hollolanpuisto,60.195728,24.954875,Park
1698,Vallila,60.196167,24.95671,Hartolan puistikko,60.19364,24.960984,Park
1686,Vallila,60.196167,24.95671,Sturenkadun Pizzeria,60.195119,24.958194,Pizza Place
1687,Vallila,60.196167,24.95671,Marco Polo,60.193917,24.958328,Pizza Place
1697,Vallila,60.196167,24.95671,Power Pizza,60.195564,24.963224,Pizza Place
1677,Vallila,60.196167,24.95671,Elmstreet,60.19382,24.95165,Cafeteria


In [147]:
t2_df = ranked_venues_df.copy().reset_index(drop=True)

#Copy of Venue Category values as Venue Category_RANK to assign Rank Number
t2_df['Venue Category_RANK'] = t2_df['Venue Category'] 
t2_df

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Category_RANK
0,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center,Gym / Fitness Center
1,Ala-Malmi,60.249474,25.014539,Lady's Club,60.249060,25.009737,Gym / Fitness Center,Gym / Fitness Center
2,Alppiharju,60.189728,24.944120,Vuoristorata,60.188544,24.941248,Theme Park Ride / Attraction,Theme Park Ride / Attraction
3,Alppiharju,60.189728,24.944120,Kingi,60.187859,24.941492,Theme Park Ride / Attraction,Theme Park Ride / Attraction
4,Alppiharju,60.189728,24.944120,Ukko,60.189061,24.940768,Theme Park Ride / Attraction,Theme Park Ride / Attraction
...,...,...,...,...,...,...,...,...
1168,Toukola,60.208836,24.972565,Iittala-Arabia-Fiskars Design Centre Store,60.208845,24.975680,Furniture / Home Store,Furniture / Home Store
1169,Ullanlinna,60.158715,24.949404,Gateau,60.160778,24.946334,Bakery,Bakery
1170,Vallila,60.196167,24.956710,HKL Vallilan raitiovaunuvarikko,60.195344,24.962635,Tram Station,Tram Station
1171,Vallila,60.196167,24.956710,HSL 0269 Mäkelänrinne,60.197770,24.949184,Tram Station,Tram Station


In [148]:
# Extract values specific to 'Venue Category' with 'District' as filter from both dataframes (from ranked_venues_df & top10_venues)

ranks = range(1, 11) # Intialize Rank Numbers
venue_dists = trial_pop.District.to_list() # List Districts with Venues

for dist in venue_dists:
    vc_r = trial_pop[trial_pop.District == dist].values[0][1:] # Venue Categories from 'Venue Category_RANK' (top10_venues)
    vc = t2_df[t2_df.District == dist].iloc[:, 6] # Venue Categories from 'Venue Category' (ranked_venues_df)

    # Assign Rank Number based on 'Venue Category_RANK' (from top10_venues)
    for r, rank in zip(vc_r, ranks): # Iterate Venue Categories and their respective Rank Numbers
        for v in vc:
            if v == r: # Equate 'Venue Category' with 'Venue Category_RANK' values and assign Rank Number
                t2_df['Venue Category_RANK'].replace(to_replace=v, value=rank, inplace=True) # Replace values in 'Venue Category_RANK' by Rank Number 

t2_df

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Category_RANK
0,Ala-Malmi,60.249474,25.014539,Fitness24Seven,60.251597,25.013711,Gym / Fitness Center,1
1,Ala-Malmi,60.249474,25.014539,Lady's Club,60.249060,25.009737,Gym / Fitness Center,1
2,Alppiharju,60.189728,24.944120,Vuoristorata,60.188544,24.941248,Theme Park Ride / Attraction,1
3,Alppiharju,60.189728,24.944120,Kingi,60.187859,24.941492,Theme Park Ride / Attraction,1
4,Alppiharju,60.189728,24.944120,Ukko,60.189061,24.940768,Theme Park Ride / Attraction,1
...,...,...,...,...,...,...,...,...
1168,Toukola,60.208836,24.972565,Iittala-Arabia-Fiskars Design Centre Store,60.208845,24.975680,Furniture / Home Store,7
1169,Ullanlinna,60.158715,24.949404,Gateau,60.160778,24.946334,Bakery,2
1170,Vallila,60.196167,24.956710,HKL Vallilan raitiovaunuvarikko,60.195344,24.962635,Tram Station,5
1171,Vallila,60.196167,24.956710,HSL 0269 Mäkelänrinne,60.197770,24.949184,Tram Station,5


In [157]:
print(sorted(t2_df['Venue Category_RANK'].unique()))

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [164]:
#TESTING Hierarchical Indexing (t3_df) - Indexing probelm for Mapping
t3_df = t2_df.set_index(['District','Venue Category','Venue Category_RANK'])


# Districts with Popular Venues - with Indian Restaurants(RedZone Map)
from folium import plugins

helsinki_popven_map = folium.Map(location=[latitude, longitude], zoom_start=12) # Initiate Helsinki City Map
pop_venues = plugins.MarkerCluster().add_to(helsinki_popven_map) # Initiate Cluster segmentation plugin

for rank, venue_cat, lat, long in zip(t2_df['Venue Category_RANK'], t2_df['Venue Category'], t2_df['Venue Latitude'], t2_df['Venue Longitude']):
    label = folium.Popup(f"Rank-{rank},{venue_cat}", parse_html=True)
    folium.Marker(location=[lat, long],
    icon=None,
    popup=label).add_to(pop_venues)

helsinki_popven_map

### 2.4.3 - Calculate _Popularity Centres_ (Centroids) for each District by clustering methods

### 2.4.4 - Visualize _Popularity Centres_ in Helsinki City Map