--------
--------
# *Profitable Locations for chain of Indian Restaurants*
### Developed by **Prakirth Govardhanam**
### Capstone Project - **Applied Data Science Capstone**
--------
--------

## Project Assumption
**_Popularity Centre_** = the centroid (mean position) of the most popular venues from the Top10 Venue Categories (by frequency of occurrence) in each District. This point is treated as the "Popularity Centre" of that District.
* **Clarification:** Ideally, the popular venues from the Top10 Venue Categories would have been **_filtered by venue ratings_**. Unfortunately, I have a Sandbox account, and venue ratings at the scale I need are available only with Premium accounts.
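The Popularity Centre defined above is the arithmetic mean of the venue coordinates in each District. A minimal sketch with pandas, assuming a Dataframe shaped like the `ranked_venues_df` built in PART 2.4 (columns `District`, `Venue Latitude`, `Venue Longitude`); the helper `popularity_centres` and the sample rows are hypothetical. Averaging latitude and longitude directly is a reasonable approximation here because each District's venues are fetched within a 500 m radius.

```python
import pandas as pd

def popularity_centres(venues: pd.DataFrame) -> pd.DataFrame:
    """Return the mean (centroid) venue position for each District."""
    return (
        venues.groupby("District")[["Venue Latitude", "Venue Longitude"]]
        .mean()
        .rename(columns={"Venue Latitude": "Centre Latitude",
                         "Venue Longitude": "Centre Longitude"})
        .reset_index()
    )

# Hypothetical sample rows in the same shape as ranked_venues_df
sample = pd.DataFrame({
    "District": ["A", "A", "B"],
    "Venue Latitude": [60.0, 60.2, 60.1],
    "Venue Longitude": [24.0, 24.4, 25.0],
})
centres = popularity_centres(sample)
print(centres)
```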

## Introduction/Business-Problem
In this project, I try to find potentially beneficial locations within the Neighborhoods (Districts) of Helsinki, Finland, for establishing a chain of Indian restaurants. The primary conditions for the project are as follows:
*	Locate the Popularity Centre (see Project Assumption) in each District – to attract attention
*	Identify Districts with:
    *	Absence of Indian restaurants – for a profitable business
    *	Presence of Indian restaurants – for moderate competition that helps evolve the business model
    *	Presence of Indian restaurants among the Popular venues – to circumvent extreme competition


## Data
The following data sources are used to determine the Neighborhoods (Districts) of Helsinki and the venues within them:

* _Wikipedia_ (https://en.wikipedia.org/wiki/Names_of_places_in_Finland_in_Finnish_and_in_Swedish#Municipalities) - for labels of the Districts of Helsinki
* _geopy_ (Python package) – for coordinates of the Districts of Helsinki
* Foursquare API - for popular venues, restaurants and their respective geospatial data

___
___

# PART 1 - Data Preparation

## PART 1.1 - Data Extraction

### Import necessary libraries

In [2]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import geocoder
from geopy.geocoders import Nominatim

______

### Clarification #1
* Most place names in Finland exist in two languages, Finnish & Swedish
* Hence, the names of Districts follow the same pattern: Finnish-name (Swedish-name)
### ***Assumption #1***
* With the current source of District labels, the Swedish names of Districts in Finland _could be_ confused with the Swedish names of Districts in Sweden by the FourSquare API.
* Hence, we will extract and work only with the Finnish names of the Districts


______

In [3]:
#url with Helsinki District names
url = 'https://en.wikipedia.org/wiki/Names_of_places_in_Finland_in_Finnish_and_in_Swedish#Municipalities'

#parsing the webpage for html content
html = requests.get(url).text
soup = BeautifulSoup(html, features='html.parser')

#extract <a href> tags
atags = soup.select('a[href]')

#extract titles of <a href> tags
titles = []
for atag in atags:
    titles.append(atag.get('title'))

#slice the labels of Helsinki Districts
districts = titles[titles.index('Ala-Malmi'): titles.index('Ylä-Malmi')+1]
print(f"Total Districts listed: {len(districts)}")

Total Districts listed: 110


In [4]:
#extract coordinates from District/Neighborhood names using geopy.geocoders.Nominatim
geolocator = Nominatim(user_agent='Helsinki_districts')

#empty lists for latitude & longitude values (Districts that return None are skipped, so these lists can end up shorter than districts)
lats = []
longs = []

#looping through district names for coordinates
for name in districts:
    location = geolocator.geocode(name)
    try:
        lats.append(location.latitude)
        longs.append(location.longitude)
    except AttributeError:
        pass
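Because failed lookups are skipped in the loop above, `lats` and `longs` can end up shorter than `districts`, and every coordinate after the first failure shifts onto the wrong District until the failed name is removed from `districts` (as done in PART 1.2). A sketch that keeps each name together with its coordinates and records the failures instead; `geocode_all` is a hypothetical helper that accepts any object with a geopy-style `geocode(query)` method, so it can be tried offline with the stand-in geocoder below.

```python
from collections import namedtuple

def geocode_all(geolocator, names):
    """Geocode each name; keep (name, lat, lon) together and collect failures."""
    found, failed = [], []
    for name in names:
        location = geolocator.geocode(name)
        if location is None:
            failed.append(name)  # no match: remember it instead of skipping silently
        else:
            found.append((name, location.latitude, location.longitude))
    return found, failed

# Offline stand-in for geopy's Nominatim, for illustration only
Point = namedtuple("Point", "latitude longitude")

class FakeGeocoder:
    def __init__(self, table):
        self.table = table

    def geocode(self, query):
        return self.table.get(query)

fake = FakeGeocoder({"Ala-Malmi": Point(60.25, 25.01), "Eira": Point(60.16, 24.94)})
found, failed = geocode_all(fake, ["Ala-Malmi", "Kampinmalmi", "Eira"])
print(found)   # [('Ala-Malmi', 60.25, 25.01), ('Eira', 60.16, 24.94)]
print(failed)  # ['Kampinmalmi']
```

With the real geolocator, `found` could be passed straight to `pd.DataFrame(found, columns=['District', 'Latitude', 'Longitude'])`, so names and coordinates can never drift apart.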

In [5]:
print(f"Total values identified \n(Latitude, Longitude): {len(lats), len(longs)}")

Total values identified 
(Latitude, Longitude): (109, 109)


## PART 1.2 - Investigating Data

### 1.2.1 - Districts with ***None*** values for coordinates

In [6]:
# Re-run the geocoding and stop at the first District that returns None, if any
trial = []
for name in districts:
    location = geolocator.geocode(name)
    try:
        trial.append(location.latitude)
    except AttributeError as err:
        print('None value detected!')
        raise

None value detected!


AttributeError: 'NoneType' object has no attribute 'latitude'

In [7]:
#Identify District with NoneType coordinate
print(f"District with NoneType coordinate:\n{districts[len(trial)]}")

District with NoneType coordinate:
Kampinmalmi


In [8]:
#Direct verification 
geolocator.geocode('Kampinmalmi').latitude

AttributeError: 'NoneType' object has no attribute 'latitude'

### 1.2.2 - Districts with Improper coordinates (Detected *Manually*)

In [9]:
wrong_coords = ['Pasila','Töölö']
for name in wrong_coords:
    print(f"Locations as identified by geopy.geocoders API for {name}:\n{geolocator.geocode(name)}\n")

Locations as identified by geopy.geocoders API for Pasila:
Brasil

Locations as identified by geopy.geocoders API for Töölö:
Toolo, Loroum, Nord, Burkina Faso
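Both mismatches above come from ambiguous, unqualified place names. One possible mitigation, not used in this notebook, is to add the city and country to each query and to restrict Nominatim to Finland through the `country_codes` keyword of geopy's `Nominatim.geocode`. A minimal sketch; `qualified_query` and `geocode_in_helsinki` are hypothetical helpers, and the recording stand-in only captures the arguments it receives so the example runs offline. A qualified query may still return a wrong match, so a visual check on the map remains useful.

```python
def qualified_query(name, city="Helsinki", country="Finland"):
    """Append city and country unless the name already mentions the city."""
    if city.lower() in name.lower():
        return f"{name}, {country}"
    return f"{name}, {city}, {country}"

def geocode_in_helsinki(geolocator, name):
    # country_codes takes ISO 3166-1 alpha-2 codes in geopy's Nominatim.geocode
    return geolocator.geocode(qualified_query(name), country_codes="fi")

class RecordingGeocoder:
    """Offline stand-in that records the calls it receives."""
    def __init__(self):
        self.calls = []

    def geocode(self, query, **kwargs):
        self.calls.append((query, kwargs))
        return None

rec = RecordingGeocoder()
geocode_in_helsinki(rec, "Pasila")
geocode_in_helsinki(rec, "Hietalahti, Helsinki")
print(rec.calls)
```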



In [10]:
#Districts, Latitudes & Longitudes with NoneType & Improper coordinates - to be removed from Lists

print(f"BEFORE Cleaning:\nTotal Districts:{len(districts)}\nTotal Latitude values:{len(lats)}\nTotal Longitude values:{len(longs)}")

loc_to_pop = ['Pasila','Töölö','Kampinmalmi']
lat_to_pop = [-10.3333333, 13.744717]
long_to_pop = [-53.2, -1.9645989]

#Remove districts without coordinates and with improper coordinates
for loc in loc_to_pop:
    districts.remove(loc)

#Remove improper coordinates (matched by exact value)
for lat, long in zip(lat_to_pop, long_to_pop):
    lats.remove(lat)
    longs.remove(long)
    
print(f"\nAFTER Cleaning:\nTotal Districts:{len(districts)}\nTotal Latitude values:{len(lats)}\nTotal Longitude values:{len(longs)}")

BEFORE Cleaning:
Total Districts:110
Total Latitude values:109
Total Longitude values:109

AFTER Cleaning:
Total Districts:107
Total Latitude values:107
Total Longitude values:107
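Removing entries by exact coordinate values works for this run, but it depends on geopy returning identical floats, and it cannot catch wrong matches that have not been spotted manually (several more surface on the map in PART 2.1). A sketch of an alternative: keep only rows that fall inside an approximate bounding box around Helsinki. The box limits are rough values assumed for illustration, not official municipal boundaries, and `within_bbox` is a hypothetical helper; the sample coordinates are taken from the outputs above.

```python
import pandas as pd

# Approximate box around Helsinki (assumed values for illustration only)
HELSINKI_BBOX = {"lat_min": 60.05, "lat_max": 60.35, "lon_min": 24.75, "lon_max": 25.30}

def within_bbox(df, bbox=HELSINKI_BBOX, lat_col="Latitude", lon_col="Longitude"):
    """Return only the rows whose coordinates fall inside the bounding box."""
    mask = (df[lat_col].between(bbox["lat_min"], bbox["lat_max"])
            & df[lon_col].between(bbox["lon_min"], bbox["lon_max"]))
    return df[mask].reset_index(drop=True)

# Two Helsinki Districts plus the Pasila and Töölö mismatches found above
sample = pd.DataFrame({
    "District": ["Ala-Malmi", "Eira", "Pasila", "Töölö"],
    "Latitude": [60.249474, 60.156191, -10.3333333, 13.744717],
    "Longitude": [25.014539, 24.938375, -53.2, -1.9645989],
})
result = within_bbox(sample)
print(result)
```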


In [11]:
#Frame all extracted values in a Dataframe
districts_df = pd.DataFrame(data= zip(districts, lats, longs), columns=['District', 'Latitude', 'Longitude'])
districts_df.head()

|   | District | Latitude | Longitude |
|---|---|---|---|
| 0 | Ala-Malmi | 60.249474 | 25.014539 |
| 1 | Alppiharju | 60.189728 | 24.94412 |
| 2 | Aurinkolahti | 60.201507 | 25.155669 |
| 3 | Eira | 60.156191 | 24.938375 |
| 4 | Etelä-Haaga | 60.211615 | 24.891092 |


In [12]:
districts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107 entries, 0 to 106
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   District   107 non-null    object 
 1   Latitude   107 non-null    float64
 2   Longitude  107 non-null    float64
dtypes: float64(2), object(1)
memory usage: 2.6+ KB


****

# PART 2 - Exploratory Data Analysis

2.1. City map of Helsinki Districts using Folium

2.2. Extract Nearby Venues:
* Venues across Districts in Helsinki using FourSquare API
* Locate _Indian-restaurants_ present in the District

2.3. Rank Venue Categories:
* Extract Top10 Venue Categories from each District 
* Extract Venues based on the Top10 Venue Categories from each District
* Locate the **"Popularity Centres"** by calculating the centroid of the popular venues (from the Top10 Venue Categories) in each District

2.4. Map of Districts with **"_Popularity centres_"**:
* **WITH Indian-restaurants IN top-10 venues**, labeled as **_Major-Competition Zones_** *(in red)*
* **WITH Indian-restaurants NOT IN top-10 venues**, labeled as **_Minor-Competition Zones_** *(in blue)*
* **WITHOUT Indian-restaurants**, labeled as **"_Benefit-Zones_"** *(in green)*
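The three zone labels above amount to a simple rule per District. A minimal sketch, assuming two sets derived later in the notebook: the Districts with at least one Indian restaurant, and the Districts where an Indian restaurant is among the top-10 venues. The function `classify_zone` is hypothetical, and the District sets in the example are illustrative only.

```python
def classify_zone(district, with_indian, indian_in_top10):
    """Map a District to one of the three zone labels."""
    if district in indian_in_top10:
        return "Major-Competition Zone"  # red
    if district in with_indian:
        return "Minor-Competition Zone"  # blue
    return "Benefit-Zone"                # green

# Illustrative sets, not results from the notebook
with_indian = {"Ala-Malmi", "Etelä-Haaga", "Kallahti"}
indian_in_top10 = {"Ala-Malmi", "Etelä-Haaga"}
zones = {d: classify_zone(d, with_indian, indian_in_top10)
         for d in ["Ala-Malmi", "Kallahti", "Eira"]}
print(zones)
```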

## PART 2.1 - Plot city map of Helsinki indicating Districts

### Import necessary libraries

In [13]:
import folium

In [14]:
address = 'Helsinki, Finland'
geolocator = Nominatim(user_agent='Helsinki_district_map')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f"Coordinates of Helsinki are: {latitude}, {longitude}")

Coordinates of Helsinki are: 60.1674881, 24.9427473


In [15]:
helsinki_map = folium.Map(location=[latitude, longitude], zoom_start=6)

for dist, lat, long in zip(districts_df.District, districts_df.Latitude, districts_df.Longitude):
    label = folium.Popup('{}'.format(dist), parse_html=True)
    folium.CircleMarker([lat, long],
    radius=20,
    popup=label,
    fill=False,
    parse_html=False).add_to(helsinki_map)

helsinki_map

### 2.1.1 - Districts with Improper coordinates (Outside Helsinki, *detected using Folium map*)

In [16]:
# Verify the Districts with improper coordinates
wrong_districts = ['Vanhakaupunki','Siltasaari', 'Reijola', 'Vironniemi', 'Koivusaari']

for district in wrong_districts:
    print(f"District as identified by geopy.geocoders API for {district}:\n{geolocator.geocode(district)}\n")

District as identified by geopy.geocoders API for Vanhakaupunki:
Gamla stan, Stortorget, Gamla stan, Södermalms stadsdelsområde, Stockholm, Stockholms kommun, Stockholms län, 111 29, Sverige

District as identified by geopy.geocoders API for Siltasaari:
Siltasaari, Jyränkö, Heinola, Lahden seutukunta, Päijät-Häme, Etelä-Suomen aluehallintovirasto, Manner-Suomi, Suomi

District as identified by geopy.geocoders API for Reijola:
Reijola, Joensuu, Joensuun seutukunta, Pohjois-Karjala, Itä-Suomen aluehallintovirasto, Manner-Suomi, 80330, Suomi

District as identified by geopy.geocoders API for Vironniemi:
Vironniemi, Siilinjärvi, Kuopion seutukunta, Pohjois-Savo, Itä-Suomen aluehallintovirasto, Manner-Suomi, 71870, Suomi

District as identified by geopy.geocoders API for Koivusaari:
Koivusaari, Nurmes, Pielisen Karjalan seutukunta, Pohjois-Karjala, Itä-Suomen aluehallintovirasto, Manner-Suomi, Suomi



In [17]:
#collect indices of the rows to be removed
indices = sorted(districts_df.index[districts_df.District.isin(wrong_districts)].to_list())
print(f"Indices to be removed from the districts_df Dataframe: {indices}")

Indices to be removed from the districts_df Dataframe: [30, 84, 92, 103, 105]


In [18]:
#Drop rows in districts_df Dataframe
districts_df.drop(indices, axis=0, inplace=True)
districts_df.reset_index(drop=True, inplace=True)
print(f"Dataframe refined for venues extraction from FourSquare API:\nTotal Rows: {districts_df.shape[0]}\nTotal Columns: {districts_df.shape[1]}")

Dataframe refined for venues extraction from FourSquare API:
Total Rows: 102
Total Columns: 3


In [19]:
#Map corrected for wrong districts
helsinki_map = folium.Map(location=[latitude, longitude], zoom_start=6)

for dist, lat, long in zip(districts_df.District, districts_df.Latitude, districts_df.Longitude):
    label = folium.Popup('{}'.format(dist), parse_html=True)
    folium.CircleMarker([lat, long],
    radius=20,
    popup=label,
    fill=False,
    parse_html=False).add_to(helsinki_map)

helsinki_map

## PART 2.2 - Use **FourSquareAPI** & Extract nearby venues

__________

### ***Assumption #2 (Important)***
* In reality, Helsinki has more Indian Restaurants than the ones **explored using the *FourSquare API***
* Since the project is based on **"using the FourSquare API to implement the idea"**, we will assume the following:
    * **"explored Indian Restaurants" _=_ "existing Indian Restaurants"** 


_______

In [20]:
#Credentials (read from environment variables instead of hard-coding secrets; the variable names are placeholders)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20201201' 
LIMIT = 100

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """ 
    Request the venues around each District from the FourSquare explore endpoint and collect them in a Dataframe.

    Args:
    names - District/Neighborhood names of the City, dtype: list-like
    latitudes - Latitude values of the Districts, dtype: list-like
    longitudes - Longitude values of the Districts, dtype: list-like
    radius - radius in metres around the centre of the District for extracting venues, default=500

    Returns:
    nearby_venues - Dataframe with the name and spatial details of each District and its Venues
    """
    
    venues_list=[]
    for name, lat, long in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = f"https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={lat},{long}&radius={radius}&limit={LIMIT}"
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            long, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Clarification #2
The cell below may be interrupted by an unexpected structure in the JSON response from the API, which the code does not handle. If that happens, re-run the cell until it completes successfully.
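One way to keep the loop running when a response lacks the expected keys is to parse it defensively and skip the affected District or venue instead of raising. A sketch under that assumption; `explore_rows` is a hypothetical helper that takes an already-decoded payload (the result of `requests.get(url).json()`) and follows the same key path as `getNearbyVenues`: `response → groups[0] → items → venue`. Both sample payloads are made up for illustration.

```python
def explore_rows(name, lat, lng, payload):
    """Turn one explore response into venue rows; return [] if keys are missing."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []  # e.g. an error payload without 'groups'
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues with incomplete data
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

ok = {"response": {"groups": [{"items": [
    {"venue": {"name": "Thai Thai", "location": {"lat": 60.2485, "lng": 25.010685},
               "categories": [{"name": "Thai Restaurant"}]}},
    {"venue": {"name": "No category", "location": {"lat": 60.0, "lng": 25.0},
               "categories": []}}]}]}}
error = {"meta": {"code": 429}, "response": {}}  # hypothetical error payload

good_rows = explore_rows("Ala-Malmi", 60.249474, 25.014539, ok)
bad_rows = explore_rows("Ala-Malmi", 60.249474, 25.014539, error)
print(good_rows)
print(bad_rows)  # []
```

Inside `getNearbyVenues`, `venues_list.append(explore_rows(name, lat, long, requests.get(url).json()))` could replace the list comprehension.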

In [23]:
helsinki_venues = getNearbyVenues(districts_df.District, districts_df.Latitude, districts_df.Longitude)

Ala-Malmi
Alppiharju
Aurinkolahti
Eira
Etelä-Haaga
Haaga
Hakaniemi
Hakuninmaa
Haltiala
Heikinlaakso
Hermanni (Helsinki)
Herttoniemen teollisuusalue
Herttoniemenranta
Herttoniemi
Hevossalmi
Hietalahti, Helsinki
Itä-Pakila
Itä-Pasila
Itäsaaret
Jollas, Helsinki
Kaarela
Kaartinkaupunki
Kaisaniemi
Kaivopuisto
Kallahti
Kallio
Keski-Pasila
Keski-Vuosaari
Kivihaka
Kluuvi
Konala
Koskela
Kruununhaka
Kulosaari
Kumpula
Kurkimäki
Kuusisaari
Laajasalo
Laakso
Länsi-Herttoniemi
Länsi-Pakila
Länsi-Pasila
Lassila
Lauttasaari
Lehtisaari, Helsinki
Malmi, Helsinki
Marttila, Helsinki
Marjaniemi
Maunula
Maunulanpuisto
Maununneva
Meilahti
Mellunkylä
Meri-Rastila
Merihaka
Metsälä
Munkkiniemi
Munkkisaari
Munkkivuori
Mustavuori
Mustikkamaa–Korkeasaari
Myllypuro
Niemenmäki
Niinisaari
Oulunkylä
Pajamäki
Pakila
Patola, Helsinki
Pihlajamäki
Pihlajisto
Pikku Huopalahti
Pirkkola
Pitäjänmäen teollisuusalue
Pitäjänmäki
Pohjois-Haaga
Pohjois-Pasila
Puistola
Pukinmäki
Punavuori
Puotila
Puotinharju
Puroniitty
Rastila
Reima

### Clarification #3 (IMPORTANT)
* Venues data acquired from FourSquare API through _getNearbyVenues_() differs from day-to-day even though VERSION ('20201201') is a fixed value. 
* Probable reason could be due to satellite movement and communication differences. 
* Reproduciblity cannot be expected in the Top10 Venue Categories on different days of execution. Hence, the values such as the following might differ:
    * Total Districts with venues
    * Total venues
    * Total Venue Categories
    * Total Indian venues
#### However, the code relies on the analysis and not on the values. Hence, hassle-free.
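Because the explore results change between runs, one way to make a run repeatable is to store the raw responses the first time and reuse them afterwards. A minimal sketch with the standard library; `cached_json`, the cache directory and the one-file-per-key naming are hypothetical choices, and `fetch` stands for whatever function performs the actual API request (District names are assumed to be valid file names).

```python
import json
import os
import tempfile

def cached_json(key, fetch, cache_dir):
    """Return the cached JSON for `key`, calling fetch() only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{key}.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    data = fetch()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False)
    return data

calls = []

def fake_fetch():
    calls.append(1)
    return {"response": {"groups": []}}

cache_dir = tempfile.mkdtemp()
first = cached_json("Ala-Malmi", fake_fetch, cache_dir)
second = cached_json("Ala-Malmi", fake_fetch, cache_dir)
print(len(calls))  # 1: the second call is served from the cache
```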


In [24]:
print(f"Total Rows:{helsinki_venues.shape[0]}, Total Columns:{helsinki_venues.shape[1]}")
helsinki_venues.head()

Total Rows:1771, Total Columns:7


|   | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ala-Malmi | 60.249474 | 25.014539 | Ravintola Makalu | 60.250291 | 25.012946 | Himalayan Restaurant |
| 1 | Ala-Malmi | 60.249474 | 25.014539 | Fitness24Seven | 60.251597 | 25.013711 | Gym / Fitness Center |
| 2 | Ala-Malmi | 60.249474 | 25.014539 | Malmin Uimahalli \| Fix Liikuntakeskus | 60.251131 | 25.0164 | Pool |
| 3 | Ala-Malmi | 60.249474 | 25.014539 | Thai Thai | 60.2485 | 25.010685 | Thai Restaurant |
| 4 | Ala-Malmi | 60.249474 | 25.014539 | Alko | 60.251465 | 25.013255 | Liquor Store |


In [25]:
print(f"Total unique Venue categories: {helsinki_venues['Venue Category'].nunique()}")

Total unique Venue categories: 256


In [26]:
print(f"Total Districts identified: {districts_df.District.nunique()}\nTotal Districts with Venues: {helsinki_venues.District.nunique()}")

Total Districts identified: 102
Total Districts with Venues: 100


### 2.2.1 - Identify Districts with Indian Restaurants (Red Zones) in helsinki_venues

__________

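### ***Assumption #3***
* Total venues in the Venue Category "Himalayan Restaurant" = 12
* Total venues in the Venue Category "Indian Restaurant" = 10 
* The two categories are of similar size and serve overlapping cuisines, so they compete for the same customers. Hence, we will be considering **BOTH Venue Categories (Indian & Himalayan) as Indian Restaurants**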

_______________

In [27]:
print(f"Total Himalayan Restaurants: {len(helsinki_venues[(helsinki_venues['Venue Category'] == 'Himalayan Restaurant')])}\nTotal Indian Restaurants: {len(helsinki_venues[(helsinki_venues['Venue Category'] == 'Indian Restaurant')])}")

Total Himalayan Restaurants: 12
Total Indian Restaurants: 10


In [28]:
helsinki_indian_venues = helsinki_venues[helsinki_venues['Venue Category'].isin(['Indian Restaurant', 'Himalayan Restaurant'])].copy()
helsinki_indian_venues.reset_index(drop=True, inplace=True)

print(f"Total number of Indian Restaurants in Helsinki Districts (with venues): {len(helsinki_indian_venues)}")
print("Districts with Indian Restaurant/s:\n")
helsinki_indian_venues

Total number of Indian Restaurants in Helsinki Districts (with venues): 22
Districts with Indian Restaurant/s:



|   | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ala-Malmi | 60.249474 | 25.014539 | Ravintola Makalu | 60.250291 | 25.012946 | Himalayan Restaurant |
| 92 | Etelä-Haaga | 60.211615 | 24.891092 | Roseway | 60.207981 | 24.88694 | Indian Restaurant |
| 198 | Herttoniemenranta | 60.189238 | 25.029584 | Ravintola Mantra | 60.186781 | 25.030365 | Himalayan Restaurant |
| 231 | Herttoniemi | 60.195525 | 25.029063 | Gurkha | 60.19493 | 25.02867 | Himalayan Restaurant |
| 265 | Hietalahti, Helsinki | 60.162768 | 24.927331 | Aangan | 60.163198 | 24.927786 | Himalayan Restaurant |
| 364 | Itä-Pasila | 60.198825 | 24.937867 | Deli Rasoi | 60.198652 | 24.931503 | Indian Restaurant |
| 571 | Kallahti | 60.200809 | 25.138395 | Ravintola New Light | 60.205113 | 25.135951 | Indian Restaurant |
| 678 | Keski-Pasila | 60.20124 | 24.92966 | Deli Rasoi | 60.198652 | 24.931503 | Indian Restaurant |
| 786 | Konala | 60.23855 | 24.846065 | Ravintola FLAVORS | 60.241798 | 24.851997 | Indian Restaurant |
| 860 | Kruununhaka | 60.17287 | 24.954733 | Nepali Chulo | 60.172394 | 24.958488 | Himalayan Restaurant |



In [29]:
print(f"Total unique Districts with Indian restaurants: {helsinki_indian_venues.District.nunique()}")

Total unique Districts with Indian restaurants: 22
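Because the 500 m search circles of neighbouring Districts can overlap, the same restaurant can be returned for more than one District: in the output above, Deli Rasoi is listed under both Itä-Pasila and Keski-Pasila with identical coordinates. The 22 rows therefore count District/restaurant pairs rather than distinct restaurants. A sketch that counts distinct restaurants by name and coordinates; the sample rows are copied from the table above.

```python
import pandas as pd

rows = pd.DataFrame({
    "District": ["Itä-Pasila", "Keski-Pasila", "Konala"],
    "Venue": ["Deli Rasoi", "Deli Rasoi", "Ravintola FLAVORS"],
    "Venue Latitude": [60.198652, 60.198652, 60.241798],
    "Venue Longitude": [24.931503, 24.931503, 24.851997],
})

# Identify a restaurant by its name and position, not by the District that found it
distinct = rows.drop_duplicates(subset=["Venue", "Venue Latitude", "Venue Longitude"])
print(len(rows), len(distinct))  # 3 2
```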


In [30]:
#Districts with 1 Indian Restaurant - RedZone Map
helsinki_redzone_map = folium.Map(location=[latitude, longitude], zoom_start=12)

for venue, dist, lat, long in zip(helsinki_indian_venues.Venue, helsinki_indian_venues.District, helsinki_indian_venues['Venue Latitude'], helsinki_indian_venues['Venue Longitude']):
    label = folium.Popup('{},{}'.format(venue, dist), parse_html=True)
    folium.CircleMarker([lat, long],
    radius=3,
    popup=label,
    color='red',
    fill=True,
    fill_color='blue',
    fill_opacity=0.7,
    parse_html=False).add_to(helsinki_redzone_map)

helsinki_redzone_map

______________

### Intermediate-SUMMARY I:
* 100 Districts **WITH Venues**
* 22 Districts **WITH one INDIAN Restaurant each** - **Red Zone**
* 78 Districts **WITHOUT any INDIAN Restaurant** - **Green Zone/Benefit Zone**

_________
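The Benefit-Zone Districts are the set difference between all Districts with venues and the Districts with an Indian restaurant. A sketch with the two Dataframes reduced to their `District` columns; the sample names come from earlier outputs but the selection is arbitrary. In the notebook the same expression would read `set(helsinki_venues.District) - set(helsinki_indian_venues.District)`.

```python
import pandas as pd

venues = pd.DataFrame({"District": ["Ala-Malmi", "Ala-Malmi", "Eira", "Vallila", "Konala"]})
indian = pd.DataFrame({"District": ["Ala-Malmi", "Konala"]})

# Districts that have venues but no Indian (or Himalayan) restaurant
benefit_zone = sorted(set(venues["District"]) - set(indian["District"]))
print(benefit_zone)  # ['Eira', 'Vallila']
```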

## PART 2.3 - Extract Top 10 Venue Categories from each District

### 2.3.1 - one-hot encode the Venue Category in helsinki_venues Dataframe

In [31]:
encoded_venues = pd.get_dummies(helsinki_venues[['Venue Category']], prefix='', prefix_sep='', dtype='int64')
encoded_venues.head()

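|   | African Restaurant | American Restaurant | Antique Shop | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Auto Dealership | Auto Workshop | Automotive Shop | ... | Venezuelan Restaurant | Video Store | Vietnamese Restaurant | Warehouse Store | Waterfront | Wine Bar | Wine Shop | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |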


In [32]:
# Encoded dataframe with column added - District
encoded_venues['District'] = helsinki_venues['District']
fix_cols = ['District'] + list(encoded_venues.columns[encoded_venues.columns!='District'])
encoded_venues = encoded_venues[fix_cols]
encoded_venues.head()

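|   | District | African Restaurant | American Restaurant | Antique Shop | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Auto Dealership | Auto Workshop | ... | Venezuelan Restaurant | Video Store | Vietnamese Restaurant | Warehouse Store | Waterfront | Wine Bar | Wine Shop | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ala-Malmi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Ala-Malmi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Ala-Malmi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Ala-Malmi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Ala-Malmi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |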


### 2.3.2 - Rank Venue Categories per District (Top 10)

In [33]:
#Mean of the one-hot columns per District = relative frequency of each Venue Category
helsinki_grouped = encoded_venues.groupby('District').mean()
helsinki_grouped

| District | African Restaurant | American Restaurant | Antique Shop | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Auto Dealership | Auto Workshop | Automotive Shop | ... | Venezuelan Restaurant | Video Store | Vietnamese Restaurant | Warehouse Store | Waterfront | Wine Bar | Wine Shop | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ala-Malmi | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Alppiharju | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.038462 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Aurinkolahti | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Eira | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.032258 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Etelä-Haaga | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Torpparinmäki | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Toukola | 0.000000 | 0.0 | 0.000000 | 0.045455 | 0.045455 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Ullanlinna | 0.000000 | 0.0 | 0.017857 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.017857 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 |
| Vallila | 0.020408 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | ... | 0.020408 | 0.0 | 0.020408 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.020408 | 0.020408 | 0.0 |
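Averaging the one-hot columns per District gives the share of each Venue Category among that District's venues. The same table can be obtained in one step with `pd.crosstab(..., normalize='index')`. A small check on hypothetical rows that both routes agree:

```python
import pandas as pd

venues = pd.DataFrame({
    "District": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Café", "Park", "Bar"],
})

# Route used in the notebook: one-hot encode, then average per District
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype="int64")
onehot.insert(0, "District", venues["District"])
via_mean = onehot.groupby("District").mean()

# Direct route: row-normalised contingency table
via_crosstab = pd.crosstab(venues["District"], venues["Venue Category"], normalize="index")

print(via_crosstab)
```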


In [34]:
#Transposing the grouped dataframe
helsinki_grouped_T = helsinki_grouped.T
venue_cats = []

#Iterate over every District to extract its Top 10 Venue Categories by frequency (mean)
for col in helsinki_grouped_T.columns.to_list():
    venue_freq = helsinki_grouped_T[col].nlargest(10).round(2)
    venue_cats.append(venue_freq.index.to_list())
print(f"Total arrays of Venue Categories: {len(venue_cats)}")

Total arrays of Venue Categories: 100


In [35]:
#Columns for the top10_venues
district_data = helsinki_grouped_T.columns.to_list()
columns = ['District']
for ind in range(10):
    columns.append(f"Venue Category_RANK{ind+1}")
print(columns)

['District', 'Venue Category_RANK1', 'Venue Category_RANK2', 'Venue Category_RANK3', 'Venue Category_RANK4', 'Venue Category_RANK5', 'Venue Category_RANK6', 'Venue Category_RANK7', 'Venue Category_RANK8', 'Venue Category_RANK9', 'Venue Category_RANK10']


In [36]:
#Stack the Top10 lists into a 2-D array (one row per District, one column per rank)
rl = np.array(venue_cats)
print(f"Top10 Venue Categories for the District {helsinki_grouped_T.columns[0]}:\n{rl[0]}")

Top10 Venue Categories for the District Ala-Malmi:
['Bus Stop' 'Gym / Fitness Center' 'Basketball Court' 'Beer Bar'
 'Chinese Restaurant' 'Coffee Shop' 'Cultural Center'
 'Fast Food Restaurant' 'Grocery Store' 'Himalayan Restaurant']


In [37]:
#Transpose rank list 'rl' for collecting lists by rank
rl_t = np.transpose(rl)

#Initialize dictionary with: Keys as columns, Values as district_data and sub-lists of 'rl_t' 
d_keys = columns
d_vals = [district_data] + list(rl_t)
data_dict = dict(zip(d_keys, d_vals))

#Build the top10_venues Dataframe
top10_venues = pd.DataFrame(data_dict)
top10_venues

|   | District | Venue Category_RANK1 | Venue Category_RANK2 | Venue Category_RANK3 | Venue Category_RANK4 | Venue Category_RANK5 | Venue Category_RANK6 | Venue Category_RANK7 | Venue Category_RANK8 | Venue Category_RANK9 | Venue Category_RANK10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ala-Malmi | Bus Stop | Gym / Fitness Center | Basketball Court | Beer Bar | Chinese Restaurant | Coffee Shop | Cultural Center | Fast Food Restaurant | Grocery Store | Himalayan Restaurant |
| 1 | Alppiharju | Theme Park Ride / Attraction | Park | Bar | Greek Restaurant | Asian Restaurant | Beer Garden | Blini House | Bus Stop | Café | Dog Run |
| 2 | Aurinkolahti | Harbor / Marina | Beach | Beer Bar | Café | Grocery Store | Gym / Fitness Center | Playground | Restaurant | Salon / Barbershop | Sri Lankan Restaurant |
| 3 | Eira | Park | Bakery | Café | French Restaurant | Ice Cream Shop | Italian Restaurant | Beach | Boat or Ferry | Coffee Roaster | Coffee Shop |
| 4 | Etelä-Haaga | Chinese Restaurant | Park | Bus Stop | Café | College Gym | Dog Run | Indian Restaurant | Pizza Place | Playground | Skate Park |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | Torpparinmäki | Bus Stop | Bistro | Playground | African Restaurant | American Restaurant | Antique Shop | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant |
| 96 | Toukola | Cafeteria | Furniture / Home Store | Art Gallery | Art Museum | Business Service | Café | College Cafeteria | Comic Shop | Convenience Store | Flea Market |
| 97 | Ullanlinna | Park | Grocery Store | Coffee Shop | Pizza Place | Scandinavian Restaurant | French Restaurant | Ice Cream Shop | Antique Shop | Beer Garden | Bistro |
| 98 | Vallila | Pizza Place | Bar | Park | Cafeteria | Chinese Restaurant | Hostel | Restaurant | Tram Station | African Restaurant | Baseball Field |
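`nlargest(10)` always returns ten categories, even for a District with fewer than ten distinct categories; ties (including zero frequencies) are broken by column order, which is why Torpparinmäki above lists African Restaurant, American Restaurant, Antique Shop and so on although its frequencies for them are 0. The filtering in PART 2.4 drops these because they match no venues, but they still appear in `top10_venues`. A sketch that ignores zero-frequency categories and pads the row with `None` instead; `top_categories` is a hypothetical helper and the sample frequencies are made up.

```python
import pandas as pd

def top_categories(freq: pd.Series, n: int = 10) -> list:
    """Top-n categories by frequency, ignoring zeros and padding with None."""
    nonzero = freq[freq > 0].nlargest(n)
    names = nonzero.index.to_list()
    return names + [None] * (n - len(names))

# Hypothetical District with only three categories present
freq = pd.Series({"African Restaurant": 0.0, "Bistro": 0.25, "Bus Stop": 0.5,
                  "Park": 0.0, "Playground": 0.25})
top5 = top_categories(freq, n=5)
print(top5)  # ['Bus Stop', 'Bistro', 'Playground', None, None]
```

In the loop of `In [34]`, `venue_cats.append(top_categories(helsinki_grouped_T[col]))` would be the corresponding change; a `None` slot simply matches no venues later on.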


_______

### Intermediate-SUMMARY II:
* Extracted Venue Categories in each District (from helsinki_grouped)
* Ranked Top 10 Venue Categories for each District by frequency of occurrence

_________

## PART 2.4 - Locate **_Popularity Centres_** for each District

### 2.4.1 - Extract Popular Venues using Top10 Venue Categories from each District

In [38]:
#Extracting Venues & their respective coordinates (from helsinki_venues) using transposed-ranked venues array (rl_t) for each District into single Dataframe

ranked_venues_list = []
# Iterate Venue Categories (from the transposed ranked-categories array) by Rank
for rank in rl_t:
    # Pair each ranked Category with its District in top10_venues
    for vc, d in zip(rank, top10_venues.District):
        subset = helsinki_venues.loc[(helsinki_venues['District'] == d) & (helsinki_venues['Venue Category'] == vc)]
        # Keep only non-empty matches (zero-frequency categories return no rows)
        if not subset.empty:
            ranked_venues_list.append(subset)

ranked_venues_df = pd.concat(ranked_venues_list) # Merge (by concatenation) all extracted Dataframes into a single Dataframe
ranked_venues_df.reset_index(inplace=True, drop=True)
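The nested loop filters `helsinki_venues` once per District and rank. The same selection can be expressed as a reshape plus an inner merge: melt `top10_venues` into (District, rank, category) rows and join them with the venues on District and Venue Category. A sketch on hypothetical rows; apart from row order, it selects the same venues as the loop.

```python
import pandas as pd

top10 = pd.DataFrame({
    "District": ["A", "B"],
    "Venue Category_RANK1": ["Park", "Bar"],
    "Venue Category_RANK2": ["Café", "African Restaurant"],  # zero-frequency slot in B
})
venues = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Venue": ["P1", "C1", "G1", "B1", "B2"],
    "Venue Category": ["Park", "Café", "Gym", "Bar", "Bar"],
})

# One row per (District, rank, category)
long_top10 = top10.melt(id_vars="District", var_name="Rank", value_name="Venue Category")
# Inner merge keeps only venues whose category is ranked for their own District
ranked = venues.merge(long_top10, on=["District", "Venue Category"], how="inner")
print(ranked[["District", "Venue", "Venue Category", "Rank"]])
```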

In [39]:
display(ranked_venues_df)

|   | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3441 Malmin asema | 60.250027 | 25.011935 | Bus Stop |
| 1 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3457 Paavolantie | 60.248805 | 25.012826 | Bus Stop |
| 2 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3442 Malmin asema | 60.250256 | 25.012413 | Bus Stop |
| 3 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3238 Malmin asematie | 60.249190 | 25.007996 | Bus Stop |
| 4 | Alppiharju | 60.189728 | 24.944120 | Vuoristorata | 60.188544 | 24.941248 | Theme Park Ride / Attraction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1173 | Sörnäinen | 60.183885 | 24.964409 | Savate Club | 60.187703 | 24.962191 | Gym / Fitness Center |
| 1174 | Toukola | 60.208836 | 24.972565 | Vekarakirppis | 60.208033 | 24.967421 | Flea Market |
| 1175 | Ullanlinna | 60.158715 | 24.949404 | Pontus | 60.158792 | 24.946282 | Bistro |
| 1176 | Vallila | 60.196167 | 24.956710 | Vallilan kenttä | 60.197689 | 24.954966 | Baseball Field |


#### Investigating Data (Manual)

In [40]:
# Ranked Venue Categories of a random District
top10_venues[top10_venues.District == 'Vallila'] 

|   | District | Venue Category_RANK1 | Venue Category_RANK2 | Venue Category_RANK3 | Venue Category_RANK4 | Venue Category_RANK5 | Venue Category_RANK6 | Venue Category_RANK7 | Venue Category_RANK8 | Venue Category_RANK9 | Venue Category_RANK10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 98 | Vallila | Pizza Place | Bar | Park | Cafeteria | Chinese Restaurant | Hostel | Restaurant | Tram Station | African Restaurant | Baseball Field |


In [41]:
# Verifying Venue Categories of the same random District in the Dataframe built from extracted data
ranked_venues_df[ranked_venues_df.District == 'Vallila']

|   | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 301 | Vallila | 60.196167 | 24.95671 | Sturenkadun Pizzeria | 60.195119 | 24.958194 | Pizza Place |
| 302 | Vallila | 60.196167 | 24.95671 | Marco Polo | 60.193917 | 24.958328 | Pizza Place |
| 303 | Vallila | 60.196167 | 24.95671 | Inter Pizza-Kebab | 60.195797 | 24.962835 | Pizza Place |
| 304 | Vallila | 60.196167 | 24.95671 | Power Pizza | 60.195564 | 24.963224 | Pizza Place |
| 468 | Vallila | 60.196167 | 24.95671 | Ravintola Pikku-Vallila | 60.192021 | 24.958999 | Bar |
| 469 | Vallila | 60.196167 | 24.95671 | Backas Bar | 60.192544 | 24.960758 | Bar |
| 470 | Vallila | 60.196167 | 24.95671 | Hermannin Kukko | 60.195874 | 24.963032 | Bar |
| 601 | Vallila | 60.196167 | 24.95671 | Vallilanlaakso | 60.199592 | 24.956837 | Park |
| 602 | Vallila | 60.196167 | 24.95671 | Paavalinpuisto | 60.196537 | 24.959368 | Park |
| 603 | Vallila | 60.196167 | 24.95671 | Keuruunpuisto | 60.191789 | 24.957465 | Park |


### 2.4.2 - Visualize Popular Venues in Helsinki City Map

*A cluster map of Helsinki Districts showing the concentration of popular venue categories across the Districts.*
* Zoom in and click on the pop-ups for details about the District & Venue

In [87]:
# Districts with Popular Venues (Venues from Top10 Venue Categories for each District)
from folium import plugins

helsinki_popven_map = folium.Map(location=[latitude, longitude], zoom_start=11) # Initiate Helsinki City Map
pop_venues = plugins.MarkerCluster().add_to(helsinki_popven_map) # Initiate Cluster segmentation plugin

for dist, venue_cat, lat, long in zip(ranked_venues_df['District'], ranked_venues_df['Venue Category'], 
                                    ranked_venues_df['Venue Latitude'], ranked_venues_df['Venue Longitude']):

    label = folium.Popup(f"{dist}, {venue_cat}", parse_html=True)
    folium.Marker(location=[lat, long],
    icon=None,
    popup=label).add_to(pop_venues)

helsinki_popven_map

__________

### Intermediate-SUMMARY III:
* Joined Venues (ranked_venues_df) from helsinki_venues based on Top10 Venue Categories as ranked in top10_venues
* Visualized Popular Venues across Helsinki as a Cluster Map

______________

### 2.4.3 - Identify Zones & Calculate _Popularity Centres_ (Centroids)

#### Filter Indian Restaurants (using helsinki_indian_venues) from ranked_venues_df

In [88]:
ranked_venues_df # Details of Venue categories from top10_venues

|   | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3441 Malmin asema | 60.250027 | 25.011935 | Bus Stop |
| 1 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3457 Paavolantie | 60.248805 | 25.012826 | Bus Stop |
| 2 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3442 Malmin asema | 60.250256 | 25.012413 | Bus Stop |
| 3 | Ala-Malmi | 60.249474 | 25.014539 | HSL 3238 Malmin asematie | 60.249190 | 25.007996 | Bus Stop |
| 4 | Alppiharju | 60.189728 | 24.944120 | Vuoristorata | 60.188544 | 24.941248 | Theme Park Ride / Attraction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1173 | Sörnäinen | 60.183885 | 24.964409 | Savate Club | 60.187703 | 24.962191 | Gym / Fitness Center |
| 1174 | Toukola | 60.208836 | 24.972565 | Vekarakirppis | 60.208033 | 24.967421 | Flea Market |
| 1175 | Ullanlinna | 60.158715 | 24.949404 | Pontus | 60.158792 | 24.946282 | Bistro |
| 1176 | Vallila | 60.196167 | 24.956710 | Vallilan kenttä | 60.197689 | 24.954966 | Baseball Field |


In [107]:
display(helsinki_indian_venues) # Districts with Indian Restaurant

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ala-Malmi,60.249474,25.014539,Ravintola Makalu,60.250291,25.012946,Himalayan Restaurant
1,Etelä-Haaga,60.211615,24.891092,Roseway,60.207981,24.88694,Indian Restaurant
2,Herttoniemenranta,60.189238,25.029584,Ravintola Mantra,60.186781,25.030365,Himalayan Restaurant
3,Herttoniemi,60.195525,25.029063,Gurkha,60.19493,25.02867,Himalayan Restaurant
4,"Hietalahti, Helsinki",60.162768,24.927331,Aangan,60.163198,24.927786,Himalayan Restaurant
5,Itä-Pasila,60.198825,24.937867,Deli Rasoi,60.198652,24.931503,Indian Restaurant
6,Kallahti,60.200809,25.138395,Ravintola New Light,60.205113,25.135951,Indian Restaurant
7,Keski-Pasila,60.20124,24.92966,Deli Rasoi,60.198652,24.931503,Indian Restaurant
8,Konala,60.23855,24.846065,Ravintola FLAVORS,60.241798,24.851997,Indian Restaurant
9,Kruununhaka,60.17287,24.954733,Nepali Chulo,60.172394,24.958488,Himalayan Restaurant


In [208]:
# RED Zones - Details of Districts WITH Indian Restaurants in Top10 Venues
red_zone = ranked_venues_df.loc[ranked_venues_df.Venue.isin(helsinki_indian_venues.Venue)] # Unique Venues, Repeated Districts
red_zone.reset_index(drop=True, inplace=True)

# BLUE Zones - Details of Districts WITH Indian Restaurants *NOT* in Top10 Venues
blue_zone = helsinki_indian_venues.loc[~helsinki_indian_venues.District.isin(red_zone.District)] # Unique Districts, Repeated Venues
blue_zone.reset_index(drop=True, inplace=True)

# GREEN Zones - Details of Districts WITHOUT Indian Restaurants
green_zone = ranked_venues_df.loc[~ranked_venues_df.District.isin(helsinki_indian_venues.District)]
green_zone.reset_index(drop=True, inplace=True)
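The three filters above can also be expressed as a single District-to-Zone label. The sketch below uses invented toy rows in the shape of ranked_venues_df and helsinki_indian_venues. Unlike the cell above, it matches Indian restaurants on (District, Venue) pairs rather than on venue names alone, a stricter choice that cannot confuse two different venues sharing a name; assuming both tables come from the same Foursquare results, the name-only match should usually give the same Zones.

```python
import pandas as pd

# Hypothetical toy data in the shape of ranked_venues_df and helsinki_indian_venues
ranked_venues_df = pd.DataFrame({
    'District': ['A', 'A', 'B', 'B', 'C'],
    'Venue': ['Curry House', 'Bus Stop 1', 'Park 1', 'Café 1', 'Bar 1'],
})
helsinki_indian_venues = pd.DataFrame({
    'District': ['A', 'B'],
    'Venue': ['Curry House', 'Tandoor'],
})

# RED: an Indian restaurant is itself one of the District's popular (Top10) venues
pairs = ranked_venues_df.merge(helsinki_indian_venues, on=['District', 'Venue'], how='inner')
red = set(pairs['District'])
# BLUE: the District has an Indian restaurant, but not among its popular venues
blue = set(helsinki_indian_venues['District']) - red
# GREEN: the District has popular venues but no Indian restaurant at all
green = set(ranked_venues_df['District']) - red - blue

all_districts = sorted(set(ranked_venues_df['District']) | set(helsinki_indian_venues['District']))
zones = pd.Series({d: 'RED' if d in red else 'BLUE' if d in blue else 'GREEN'
                   for d in all_districts}, name='Zone')
print(zones)
```

Because every District receives exactly one label, the RED and BLUE counts always add up to the number of Districts with Indian restaurants, matching the 14 + 8 = 22 split printed below.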

In [210]:
print(f"Districts WITH Indian Restaurants: {helsinki_indian_venues.District.nunique()}\n"
      f"Districts with Indian Restaurants in Top10 venues: {red_zone.District.nunique()}\n"
      f"Districts with Indian Restaurants NOT in Top10 venues: {blue_zone.District.nunique()}\n"
      f"Districts WITHOUT Indian Restaurant: {green_zone.District.nunique()}")

Districts WITH Indian Restaurants: 22
Districts with Indian Restaurants in Top10 venues: 14
Districts with Indian Restaurants NOT in Top10 venues: 8
Districts WITHOUT Indian Restaurant: 78


#### Investigating how the Zones rank in top10_venues

In [182]:
# Ranking of red_zone Districts (in Top10)
top10_venues.loc[top10_venues.District.isin(red_zone.District)]

Unnamed: 0,District,Venue Category_RANK1,Venue Category_RANK2,Venue Category_RANK3,Venue Category_RANK4,Venue Category_RANK5,Venue Category_RANK6,Venue Category_RANK7,Venue Category_RANK8,Venue Category_RANK9,Venue Category_RANK10
0,Ala-Malmi,Bus Stop,Gym / Fitness Center,Basketball Court,Beer Bar,Chinese Restaurant,Coffee Shop,Cultural Center,Fast Food Restaurant,Grocery Store,Himalayan Restaurant
4,Etelä-Haaga,Chinese Restaurant,Park,Bus Stop,Café,College Gym,Dog Run,Indian Restaurant,Pizza Place,Playground,Skate Park
12,Herttoniemenranta,Gym / Fitness Center,Clothing Store,Harbor / Marina,Buffet,Bus Stop,Daycare,Greek Restaurant,Grocery Store,Himalayan Restaurant,Hotel
23,Kallahti,Beach,Café,Gym,Indian Restaurant,Park,Pizza Place,Scandinavian Restaurant,Tennis Court,African Restaurant,American Restaurant
29,Konala,Bus Stop,Automotive Shop,Pizza Place,Supermarket,Auto Workshop,Café,Chinese Restaurant,Dog Run,Indian Restaurant,Laundromat
38,Lassila,Bus Stop,Platform,Cafeteria,Chinese Restaurant,Indian Restaurant,Karaoke Bar,Pharmacy,Pub,Skating Rink,Soccer Field
41,Länsi-Herttoniemi,Bus Stop,Himalayan Restaurant,Pizza Place,Recreation Center,Scenic Lookout,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum
51,Mellunkylä,Bus Stop,Chinese Restaurant,Discount Store,Falafel Restaurant,Fast Food Restaurant,Flea Market,Gym / Fitness Center,Himalayan Restaurant,Hockey Field,Liquor Store
54,Metsälä,Bus Stop,Cafeteria,Dog Run,Grocery Store,Gym / Fitness Center,Indian Restaurant,Liquor Store,Martial Arts School,Platform,African Restaurant
55,Munkkiniemi,Café,Art Museum,Bistro,Cafeteria,Comfort Food Restaurant,Convenience Store,Disc Golf,Gastropub,Historic Site,Indian Restaurant
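Instead of scanning these rows by eye, the best rank at which an Indian-type category appears can be looked up programmatically. A sketch on a made-up three-rank excerpt in the layout of top10_venues; treating Himalayan Restaurant as Indian-type is an assumption that follows the venues listed in helsinki_indian_venues above.

```python
import pandas as pd

# Hypothetical excerpt in the wide format of top10_venues (only three ranks shown)
top10_venues = pd.DataFrame({
    'District': ['A', 'B', 'C'],
    'Venue Category_RANK1': ['Bus Stop', 'Café', 'Park'],
    'Venue Category_RANK2': ['Gym', 'Indian Restaurant', 'Bar'],
    'Venue Category_RANK3': ['Himalayan Restaurant', 'Park', 'Café'],
})
# Categories treated as "Indian" in this sketch
indian_categories = {'Indian Restaurant', 'Himalayan Restaurant'}

# One row per (District, Rank, Category), with the rank as an integer
ranks_long = top10_venues.melt(id_vars='District', var_name='Rank', value_name='Category')
ranks_long['Rank'] = ranks_long['Rank'].str.replace('Venue Category_RANK', '', regex=False).astype(int)

# Best (lowest) rank of an Indian-type category per District; Districts without one are absent
indian_rank = ranks_long[ranks_long['Category'].isin(indian_categories)].groupby('District')['Rank'].min()
print(indian_rank)
```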


In [183]:
# Ranking of blue_zone Districts (NOT in Top10)
top10_venues.loc[top10_venues.District.isin(blue_zone.District)]

Unnamed: 0,District,Venue Category_RANK1,Venue Category_RANK2,Venue Category_RANK3,Venue Category_RANK4,Venue Category_RANK5,Venue Category_RANK6,Venue Category_RANK7,Venue Category_RANK8,Venue Category_RANK9,Venue Category_RANK10
13,Herttoniemi,Bus Stop,Gym / Fitness Center,Supermarket,Clothing Store,Pharmacy,Sandwich Place,Buffet,Bus Station,Chinese Restaurant,Convenience Store
15,"Hietalahti, Helsinki",Restaurant,Scandinavian Restaurant,Hotel,Sandwich Place,Seafood Restaurant,Beer Bar,Café,Italian Restaurant,Japanese Restaurant,Pizza Place
17,Itä-Pasila,Café,Platform,Supermarket,Thai Restaurant,Beer Bar,Bridge,Chocolate Shop,Climbing Gym,Coffee Shop,Comedy Club
25,Keski-Pasila,Restaurant,Beer Bar,Bus Stop,Gym / Fitness Center,Platform,Plaza,Shopping Mall,Supermarket,Bridge,Café
31,Kruununhaka,Boat or Ferry,Café,Bar,Grocery Store,History Museum,Theater,Beer Bar,Chinese Restaurant,Indie Movie Theater,Pizza Place
44,"Malmi, Helsinki",Bus Stop,Gym / Fitness Center,Bar,Beer Bar,Café,Chinese Restaurant,Coffee Shop,Cultural Center,Diner,Fast Food Restaurant
77,Punavuori,Scandinavian Restaurant,Coffee Shop,Bakery,Beer Bar,Café,Italian Restaurant,Park,Pizza Place,Restaurant,American Restaurant
96,Toukola,Cafeteria,Furniture / Home Store,Art Gallery,Art Museum,Business Service,Café,College Cafeteria,Comic Shop,Convenience Store,Flea Market


#### _Popularity Centre_ for each District

#### _The centroid of a cluster of points is their mean position; in our case, each District's popular venues form the cluster._
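Written out, the Popularity Centre of District $d$, with $n_d$ popular venues at latitudes $\varphi_i$ and longitudes $\lambda_i$, is the arithmetic mean of their coordinates:

$$
\bar{\varphi}_d = \frac{1}{n_d}\sum_{i=1}^{n_d} \varphi_i, \qquad
\bar{\lambda}_d = \frac{1}{n_d}\sum_{i=1}^{n_d} \lambda_i
$$

Averaging degrees directly treats the coordinates as planar; over the extent of a single District this is a reasonable approximation.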

In [215]:
# Popularity Centres for all Districts in ranked_venues_df
pop_centers = ranked_venues_df.groupby('District')[['Venue Latitude', 'Venue Longitude']].mean() # list selection; a tuple of columns fails in recent pandas
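As a quick worked check of the centroid formula, the sketch below averages only the three Vallila parks from the venue listing at the top of this section. They are a subset of Vallila's popular venues, so the result is illustrative and not Vallila's actual Popularity Centre. Several columns are selected with a list, which current pandas versions require after `groupby`.

```python
import pandas as pd

# The three Vallila parks from the venue listing above (a subset of Vallila's venues)
parks = pd.DataFrame({
    'District': ['Vallila', 'Vallila', 'Vallila'],
    'Venue': ['Vallilanlaakso', 'Paavalinpuisto', 'Keuruunpuisto'],
    'Venue Latitude': [60.199592, 60.196537, 60.191789],
    'Venue Longitude': [24.956837, 24.959368, 24.957465],
})

# Mean latitude and longitude per District, i.e. the centroid of these points
centre = parks.groupby('District')[['Venue Latitude', 'Venue Longitude']].mean()
print(centre.round(6))  # Vallila: about 60.195973, 24.95789
```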

In [213]:
# Popularity Centre for Red Zones
red_pop = pop_centers.loc[pop_centers.index.isin(red_zone.District)]

In [212]:
# Popularity Centre for Blue Zones
blue_pop = pop_centers.loc[pop_centers.index.isin(blue_zone.District)]

In [214]:
# Popularity Centre for Green Zones
green_pop = pop_centers.loc[pop_centers.index.isin(green_zone.District)]

### 2.4.4 - Visualize _Popularity Centres_ in Helsinki City Map

In [217]:
helsinki_popcenters_map = folium.Map(location=[latitude, longitude], zoom_start=11) # Initiate Helsinki City Map

# Popularity Centres for Red, Blue and Green Zones, each drawn in its Zone's color
for zone_pop, zone_color in [(red_pop, 'red'), (blue_pop, 'blue'), (green_pop, 'green')]:
    for dist, lat, long in zip(zone_pop.index, zone_pop['Venue Latitude'], zone_pop['Venue Longitude']):
        label = folium.Popup(dist, parse_html=True)
        folium.CircleMarker([lat, long], radius=5, popup=label, color=zone_color,
                            fill=True, fill_color=zone_color, fill_opacity=0.5).add_to(helsinki_popcenters_map)

helsinki_popcenters_map

#### OVERVIEW:
#### Levels of Risk for establishing a chain of Indian Restaurants across Helsinki:
* RED Zone - **_Major-Competition Zone_** - Indian Restaurant **in Top10**
* BLUE Zone - **_Minor-Competition Zone_** - Indian Restaurant **NOT in Top10, BUT in District**
* GREEN Zone - **"_Benefit-Zone_"** - Indian Restaurant **NOT in Top10, NOT in District** _(Based on data from FourSquare API)_

______________
______________

## Declaration
***All the analyses and assumptions are based on the data provided by the FourSquare API. Hence, I conclusively declare that these analyses can only be as accurate as the data extracted from the FourSquare API.***


_______________
_______________