# Battle of the Cities

------

This notebook contains the code used to find a suitable city in which to open a bookstore.

### The Business Dilemma:

An enthusiastic entrepreneur wants to open a bookstore in a populated city in Finland. Bookstores are vanishing as readers move to online books (ebooks), but some people still prefer reading from a printed book, and this entrepreneur wants to bring back that experience. This project aims to find an appropriate city for the bookstore. It can be used by anyone who wishes to open a bookstore or another retail business in Finland.

### Data

The project needs the names of the cities in Finland with their latitudes and longitudes, plus the name, id, latitude and longitude of the venues in and around those cities. The city data is taken from [Simple maps, Finland Cities Database](https://simplemaps.com/data/fi-cities), where it is available as a csv file. The venue data is provided by the Foursquare API.

#### Packages that need to be installed for working with the data

+ Geopy
+ Geocoder
+ Folium

Libraries needed for this project

+ numpy
+ pandas
+ random
+ requests
+ matplotlib
+ KMeans (from scikit-learn)
+ folium
+ json_normalize (from pandas)
+ Nominatim (from geopy)

Installing necessary packages

In [1]:
!pip install geopy



In [2]:
! pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [3]:
!pip install folium

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


Importing the necessary libraries required for the data

In [4]:
import pandas as pd
import numpy as np
import random # library for random number generation
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON data into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

from geopy.geocoders import Nominatim # for getting latitude and longitude

print('Libraries imported.')

Libraries imported.


#### Main Coding

Downloading the data

In [6]:
# data downloaded from website into a fi.csv file
!wget -q -O 'fi.csv' https://simplemaps.com/static/data/country-cities/fi/fi.csv
print('Data Downloaded')

Data Downloaded


Converting csv file into pandas dataframe 

In [7]:
fincity_df= pd.read_csv('fi.csv')
print('Size of the dataframe downloaded is:',fincity_df.shape)
fincity_df

Size of the dataframe downloaded is: (323, 9)


Unnamed: 0,city,lat,lng,country,iso2,admin_name,capital,population,population_proper
0,Helsinki,60.1756,24.9342,Finland,FI,Uusimaa,primary,642045.0,642045.0
1,Espoo,60.2100,24.6600,Finland,FI,Uusimaa,minor,269802.0,269802.0
2,Tampere,61.4981,23.7608,Finland,FI,Pirkanmaa,admin,225118.0,225118.0
3,Vantaa,60.3000,25.0333,Finland,FI,Uusimaa,minor,214605.0,214605.0
4,Oulu,65.0142,25.4719,Finland,FI,Pohjois-Pohjanmaa,admin,200526.0,200526.0
...,...,...,...,...,...,...,...,...,...
318,Åva,60.4500,21.0833,Finland,FI,Åland,minor,,
319,Maaninka,63.1500,27.3000,Finland,FI,Pohjois-Savo,minor,,
320,Tammela,60.8000,23.7667,Finland,FI,Kanta-Häme,minor,,
321,Tohmajärvi,62.1833,30.3833,Finland,FI,Pohjois-Karjala,minor,,


Data pre-processing

###### Data pre-processing extracts the required data from the dataframe and puts it in the format needed for the later steps.


Rows with NaN in the population column are dropped, since a good location should be a populated one.

In [8]:
# removing rows whose population is NaN
fincity_df.dropna(subset=["population"], axis=0, inplace=True) 
fincity_df.tail(5) # checking dataframe after drop

Unnamed: 0,city,lat,lng,country,iso2,admin_name,capital,population,population_proper
106,Salla,66.8333,28.6667,Finland,FI,Lappi,minor,3727.0,3727.0
107,Pello,66.775,23.9667,Finland,FI,Lappi,minor,3623.0,3623.0
108,Kaavi,62.975,28.4833,Finland,FI,Pohjois-Savo,minor,3194.0,3194.0
109,Muonio,67.95,23.6833,Finland,FI,Lappi,minor,2358.0,2358.0
110,Kaskinen,62.3847,21.2222,Finland,FI,Pohjanmaa,minor,1285.0,1285.0


Some columns are not needed: country, iso2 and population_proper. These columns are dropped.

In [9]:
fincity_df = fincity_df.drop(columns =['country', 'iso2','population_proper'])
fincity_df.head(5)     

Unnamed: 0,city,lat,lng,admin_name,capital,population
0,Helsinki,60.1756,24.9342,Uusimaa,primary,642045.0
1,Espoo,60.21,24.66,Uusimaa,minor,269802.0
2,Tampere,61.4981,23.7608,Pirkanmaa,admin,225118.0
3,Vantaa,60.3,25.0333,Uusimaa,minor,214605.0
4,Oulu,65.0142,25.4719,Pohjois-Pohjanmaa,admin,200526.0


In [10]:
fincity_df.shape 

(111, 6)

In [11]:
#renaming column names for better understanding
fincity_df.rename(columns = {'lat':'Latitude','lng':'Longitude','admin_name':'Region(Finnish)'},inplace=True) #renaming column names
fincity_df.head() 

Unnamed: 0,city,Latitude,Longitude,Region(Finnish),capital,population
0,Helsinki,60.1756,24.9342,Uusimaa,primary,642045.0
1,Espoo,60.21,24.66,Uusimaa,minor,269802.0
2,Tampere,61.4981,23.7608,Pirkanmaa,admin,225118.0
3,Vantaa,60.3,25.0333,Uusimaa,minor,214605.0
4,Oulu,65.0142,25.4719,Pohjois-Pohjanmaa,admin,200526.0


In [12]:
# removing cities with a population below 10000
# a boolean mask avoids dropping rows inside a loop and does not depend on the index labels
fincity_df = fincity_df[fincity_df['population'] >= 10000]
fincity_df.shape                                  # checking size of dataframe after drop

(75, 6)

In [15]:
# Showing the capital of Finland (rows where capital is 'primary')
tempdf = fincity_df
tempdf = tempdf.groupby('capital')
tempdf.get_group('primary')

Unnamed: 0,city,Latitude,Longitude,Region(Finnish),capital,population
0,Helsinki,60.1756,24.9342,Uusimaa,primary,642045.0


In [16]:
# Showing capitals of the respective regions of Finland
tempdf.get_group('admin')

Unnamed: 0,city,Latitude,Longitude,Region(Finnish),capital,population
2,Tampere,61.4981,23.7608,Pirkanmaa,admin,225118.0
4,Oulu,65.0142,25.4719,Pohjois-Pohjanmaa,admin,200526.0
5,Turku,60.4517,22.27,Varsinais-Suomi,admin,187604.0
6,Jyväskylä,62.2333,25.7333,Keski-Suomi,admin,137368.0
7,Lahti,60.9833,25.6556,Päijät-Häme,admin,118119.0
8,Kuopio,62.8925,27.6783,Pohjois-Savo,admin,112117.0
9,Kouvola,60.8681,26.7042,Kymenlaakso,admin,85855.0
10,Pori,61.4847,21.7972,Satakunta,admin,85363.0
11,Joensuu,62.6,29.7639,Pohjois-Karjala,admin,75514.0
12,Lappeenranta,61.0583,28.1861,Etelä-Karjala,admin,72875.0


The main dataframe is now ready.      
##### Plotting the cities on the map of Finland using folium.

In [18]:
# Getting Geographic location of Finland that is latitude and longitude
address = 'Finland'

geolocator = Nominatim(user_agent="fin_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Finland are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Finland are 63.2467777, 25.9209164.
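
`geolocator.geocode` returns `None` when the service finds no match, and in that case reading `location.latitude` fails. A minimal sketch of a guard follows. The helper name `coordinates_or_default` and the idea of a fallback coordinate are assumptions made for illustration; the fallback value is the one printed above.

```python
def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found.

    geocode is any callable shaped like geolocator.geocode. The helper name
    and the fallback argument are illustrative assumptions.
    """
    location = geocode(address)
    if location is None:
        return default
    return location.latitude, location.longitude


# Fallback taken from the coordinates printed above for Finland.
FINLAND_CENTER = (63.2467777, 25.9209164)

# Usage with the notebook's geolocator (needs network access):
# latitude, longitude = coordinates_or_default(geolocator.geocode, 'Finland', FINLAND_CENTER)
```

Passing the geocoding function as an argument keeps the helper easy to try out without calling the real service.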


In [19]:
# create map of Finland using latitude and longitude values
finmap = folium.Map(location=[latitude, longitude], zoom_start=6)

# add markers to map
for lat, lng, city in zip(fincity_df['Latitude'], fincity_df['Longitude'], fincity_df['city']):
    label = '{}'.format(city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='white',
        fill=True,
        fill_color='orange',
        fill_opacity=0.7,
        parse_html=False).add_to(finmap)  
    
finmap

The above map shows cities in Finland.

Now moving on to exploring the cities further.

For exploring and finding the different cafes, book shops and other venues in the above cities, Foursquare API will be used.   
Foursquare API requires user credentials for the use of the API.      
The cell below holds the credentials; the real values are hidden and have been replaced with placeholders.

In [20]:
# @hidden_cell
CLIENT_ID = '#' #  Foursquare ID
CLIENT_SECRET = '#' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Credentials Accepted')

Credentials Accepted


The following function is created to get the venues in and around all the cities.

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    fin_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    fin_venues.columns = ['city', 
                  'Lat', 
                  'Long', 
                  'Venue',
                  'Venue Id',       
                  'Venue Lat', 
                  'Venue Long', 
                  'Venue Category']
    
    return fin_venues
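
The function above formats the query string by hand and assumes every venue has at least one category. The sketch below shows one possible way to separate the two concerns: encoding the query parameters with the standard library and reading the response defensively. The helper names `build_explore_url` and `extract_venues` are hypothetical, and the response layout is assumed to be the one the function above already relies on.

```python
from urllib.parse import urlencode

# Endpoint used by the function above.
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with encoded query parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)


def extract_venues(payload, city, lat, lng):
    """Turn one decoded JSON response into rows, skipping incomplete entries."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories', [])
        location = venue.get('location', {})
        if not categories or 'lat' not in location or 'lng' not in location:
            continue  # skip venues without a category or coordinates
        rows.append((city, lat, lng, venue.get('name'), venue.get('id'),
                     location['lat'], location['lng'], categories[0]['name']))
    return rows
```

With these helpers, the request line could become `extract_venues(requests.get(url).json(), name, lat, lng)`, so a venue without a category is skipped rather than raising an `IndexError`.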

Here, the above function is called and the returned dataframe is put into a new dataframe called fin_venues.

In [22]:
fin_venues = getNearbyVenues(names=fincity_df['city'],
                                   latitudes=fincity_df['Latitude'],
                                   longitudes=fincity_df['Longitude']
                                  )

Helsinki
Espoo
Tampere
Vantaa
Oulu
Turku
Jyväskylä
Lahti
Kuopio
Kouvola
Pori
Joensuu
Lappeenranta
Hämeenlinna
Vaasa
Rovaniemi
Seinäjoki
Mikkeli
Kotka
Salo
Porvoo
Kokkola
Lohja
Hyvinkää
Nurmijärvi
Järvenpää
Rauma
Kajaani
Savonlinna
Kerava
Nokia
Ylöjärvi
Kaarina
Riihimäki
Imatra
Sastamala
Raahe
Raisio
Iisalmi
Tornio
Kemi
Kurikka
Jämsä
Valkeakoski
Varkaus
Hamina
Äänekoski
Heinola
Jakobstad
Naantali
Pieksämäki
Forssa
Toijala
Kauhava
Loimaa
Orimattila
Kuusamo
Uusikaupunki
Pargas
Lovisa
Ylivieska
Lapua
Kauhajoki
Ulvila
Kalajoki
Alavus
Lieksa
Kankaanpää
Mariehamn
Nivala
Kitee
Paimio
Huittinen
Keuruu
Alajärvi


In [23]:
# Displaying the new dataframe and its size
print('Size of the new venues dataframe ',fin_venues.shape)
fin_venues.head(5)

Size of the new venues dataframe  (3391, 8)


Unnamed: 0,city,Lat,Long,Venue,Venue Id,Venue Lat,Venue Long,Venue Category
0,Helsinki,60.1756,24.9342,Arkadia Oy International Bookshop,4bc08b95461576b0d6417a32,60.173369,24.92933,Bookstore
1,Helsinki,60.1756,24.9342,Taidehalli,4adcdb23f964a520dc6021e3,60.172127,24.931014,Art Gallery
2,Helsinki,60.1756,24.9342,Sinisen huvilan kahvila,4be302eb63609c74cfd51bff,60.181305,24.937043,Café
3,Helsinki,60.1756,24.9342,Cafe Rouge,5555afa9498efb7ce749253c,60.168711,24.933027,Middle Eastern Restaurant
4,Helsinki,60.1756,24.9342,Buongiorno Cafe & Restaurant,51289893e4b0386981d9e120,60.175304,24.919294,Café


In [24]:
# Checking number of venues returned for each city.
fin_venues.groupby('city').count()

Unnamed: 0_level_0,Lat,Long,Venue,Venue Id,Venue Lat,Venue Long,Venue Category
city,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Alajärvi,5,5,5,5,5,5,5
Alavus,6,6,6,6,6,6,6
Espoo,100,100,100,100,100,100,100
Forssa,30,30,30,30,30,30,30
Hamina,24,24,24,24,24,24,24
...,...,...,...,...,...,...,...
Vantaa,100,100,100,100,100,100,100
Varkaus,15,15,15,15,15,15,15
Ylivieska,19,19,19,19,19,19,19
Ylöjärvi,66,66,66,66,66,66,66


In [25]:
# Finding out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(fin_venues['Venue Category'].unique())))

There are 297 unique categories.


Analyzing Each City's Venues

In [26]:
# one-hot encoding of the venue categories
fin_onehot = pd.get_dummies(fin_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
fin_onehot['city'] = fin_venues['city'] 

# move city column to the first column
fixed_columns = [fin_onehot.columns[-1]] + list(fin_onehot.columns[:-1])
fin_onehot = fin_onehot[fixed_columns]

fin_onehot.head(5)

Unnamed: 0,city,ATM,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Apres Ski Bar,Aquarium,...,Vietnamese Restaurant,Warehouse Store,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo
0,Helsinki,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Helsinki,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Helsinki,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Helsinki,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Helsinki,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
# grouping rows by city and taking the mean of each one-hot column, which gives the frequency of each category per city
fin_group= fin_onehot.groupby('city').mean().reset_index()
fin_group.head(5)

Unnamed: 0,city,ATM,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Apres Ski Bar,Aquarium,...,Vietnamese Restaurant,Warehouse Store,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo
0,Alajärvi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alavus,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Espoo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0
3,Forssa,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Hamina,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
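
Taking the mean of one-hot columns per city is the same as computing, for each city, the share of its venues that fall into each category. A plain-Python sketch of that idea, using a toy input that is not taken from the Foursquare results; the function name `category_frequencies` is an assumption for illustration.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each city to {category: share of that city's venues}.

    venues is an iterable of (city, category) pairs.
    """
    counts = defaultdict(Counter)
    for city, category in venues:
        counts[city][category] += 1
    frequencies = {}
    for city, counter in counts.items():
        total = sum(counter.values())
        frequencies[city] = {cat: n / total for cat, n in counter.items()}
    return frequencies


# Toy input, invented for this example.
toy = [('A', 'Café'), ('A', 'Café'), ('A', 'Bookstore'), ('A', 'Park'),
       ('B', 'Supermarket'), ('B', 'Café')]
freq = category_frequencies(toy)
print(freq['A']['Café'])  # 0.5
```

Unlike the dataframe version, this sketch only stores categories that occur in a city; the grouped dataframe also holds an explicit 0.0 for every absent category, which is what the clustering step needs.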


Creating a function that sorts a city's venue categories in descending order of frequency and returns the most common ones

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues] 

Creating a dataframe with top 10 venues for each city.

In [29]:
num_top_venues = 10 # top 10 venues 

indicators = ['st', 'nd', 'rd']

# creating  columns according to number of top venues
columns = ['city']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# creating a new dataframe having city names and its top 10 venues
f_venues = pd.DataFrame(columns=columns)
f_venues['city'] = fin_group['city']

for ind in np.arange(fin_group.shape[0]):
    f_venues.iloc[ind, 1:] = return_most_common_venues(fin_group.iloc[ind, :], num_top_venues)

f_venues.head()  # top 10 venues dataframe

Unnamed: 0,city,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alajärvi,Supermarket,Baseball Field,Gas Station,Flower Shop,Filipino Restaurant,Fireworks Store,Fish Market,Fishing Spot,Fishing Store,Flea Market
1,Alavus,Sandwich Place,Airport,Department Store,Supermarket,Train Station,Burger Joint,Flower Shop,Filipino Restaurant,Fireworks Store,Fish Market
2,Espoo,Café,Gym / Fitness Center,Beach,Golf Course,Coffee Shop,Gym,Himalayan Restaurant,Pizza Place,Park,Juice Bar
3,Forssa,Supermarket,Plaza,Fast Food Restaurant,Pizza Place,Bar,Lake,Brewery,Soccer Field,National Park,Nightclub
4,Hamina,Grocery Store,Supermarket,Fish Market,Hotel,Campground,Café,Fast Food Restaurant,Bar,Bakery,Ski Area
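
The column names above rely on a three-element suffix list plus an exception for everything else, which works for the top 10 but would produce labels such as "21th" if `num_top_venues` were raised above 20. A small sketch of a general ordinal helper; the name `ordinal` is an assumption and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Same column layout as the cell above, built without try/except.
columns = ['city'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
```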


In [30]:
# checking new top 10 venues dataframe size
f_venues.shape

(75, 11)

***Use of KMeans Clustering***         
        
Using KMeans to group cities with similar venue profiles and find suitable cities. Here 4 clusters are used.

In [32]:
# set number of clusters
k = 4

# dropping the city column to keep only the category frequency columns
fin_clust = fin_group.drop(columns='city')

# run k-means clustering
# fit kmean model with fin_clust dataframe
kmean = KMeans(n_clusters=k, random_state=0).fit(fin_clust)

# check cluster labels generated for each row in the dataframe
kmean.labels_[0:10] 

array([3, 0, 2, 0, 0, 0, 2, 0, 2, 2], dtype=int32)
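
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the basic k-means iteration (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points. This is an illustration only, not scikit-learn's implementation, which adds k-means++ initialization and other refinements. The function names and the toy points are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

In the notebook each "point" is one city's row of category frequencies, so cities end up in the same cluster when their venue mixes are similar.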

In [33]:
# add clustering labels
f_venues.insert(0, 'Cluster Labels', kmean.labels_)

# copying original city dataframe to a new dataframe
findf = fincity_df.copy()

# merge city dataframe with city venues to add latitude/longitude for each city
findf = findf.join(f_venues.set_index('city'), on='city')

findf.head() # check new dataframe and new cluster label column

Unnamed: 0,city,Latitude,Longitude,Region(Finnish),capital,population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Helsinki,60.1756,24.9342,Uusimaa,primary,642045.0,2,Café,Scandinavian Restaurant,Coffee Shop,Hotel,Pizza Place,Park,Bakery,Indie Movie Theater,Theater,French Restaurant
1,Espoo,60.21,24.66,Uusimaa,minor,269802.0,2,Café,Gym / Fitness Center,Beach,Golf Course,Coffee Shop,Gym,Himalayan Restaurant,Pizza Place,Park,Juice Bar
2,Tampere,61.4981,23.7608,Pirkanmaa,admin,225118.0,2,Café,Gym / Fitness Center,Park,Scenic Lookout,Gastropub,Restaurant,Kebab Restaurant,Bistro,Sauna / Steam Room,Pizza Place
3,Vantaa,60.3,25.0333,Uusimaa,minor,214605.0,2,Recreation Center,Gym / Fitness Center,Café,Pizza Place,Hotel,Airport Lounge,Coffee Shop,Sushi Restaurant,Sporting Goods Shop,Thai Restaurant
4,Oulu,65.0142,25.4719,Pohjois-Pohjanmaa,admin,200526.0,2,Café,Pizza Place,Supermarket,Restaurant,Gym / Fitness Center,Fast Food Restaurant,Indian Restaurant,Pub,Pool Hall,Chinese Restaurant


Finally creating a cluster map       

#### Plotting cities of Finland on the map of Finland in clusters

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=6)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(findf['Latitude'], findf['Longitude'], findf['city'], findf['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
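
The cell above indexes the palette with `cluster-1`, so label 0 wraps around to the last color. That works, but a direct label-to-color mapping may be easier to read. A sketch using only the standard library; the function name `cluster_colors` and the chosen saturation and brightness values are assumptions.

```python
import colorsys


def cluster_colors(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    hex_colors = []
    for i in range(k):
        # Spread the hues evenly around the color wheel.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        hex_colors.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return hex_colors


palette = cluster_colors(4)
# palette[cluster] can then be passed as color= and fill_color= to folium.CircleMarker
```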

### Results of Clusters

Cluster 1 (label 0): Leisure and Shopping

In [35]:
findf.loc[findf['Cluster Labels'] == 0,findf.columns[[0] + list(range(5, findf.shape[1]))]]

Unnamed: 0,city,population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Kouvola,85855.0,0,Supermarket,Pizza Place,Fast Food Restaurant,Coffee Shop,Sandwich Place,Café,Grocery Store,Gym / Fitness Center,Chinese Restaurant,Pharmacy
11,Joensuu,75514.0,0,Supermarket,Café,Gym / Fitness Center,Bar,Fast Food Restaurant,Chinese Restaurant,Scandinavian Restaurant,Sandwich Place,Hotel,Coffee Shop
17,Mikkeli,54665.0,0,Supermarket,Restaurant,Café,Fast Food Restaurant,Flea Market,Coffee Shop,Hotel,Movie Theater,Shopping Mall,Grocery Store
26,Rauma,39809.0,0,Supermarket,Café,Restaurant,Fast Food Restaurant,Grocery Store,Pizza Place,Hockey Arena,Turkish Restaurant,Kebab Restaurant,Steakhouse
28,Savonlinna,35523.0,0,Supermarket,Hotel,Pizza Place,Ski Area,Brewery,Train Station,Scandinavian Restaurant,Seafood Restaurant,Resort,Discount Store
35,Sastamala,25220.0,0,Café,Supermarket,Pizza Place,Sandwich Place,Grocery Store,Bakery,Train Station,Chinese Restaurant,Gas Station,Fish Market
36,Raahe,25165.0,0,Fast Food Restaurant,Supermarket,Grocery Store,Flea Market,Bakery,Theater,Café,Sandwich Place,Business Service,Chinese Restaurant
38,Iisalmi,21945.0,0,Supermarket,Bar,Fast Food Restaurant,Golf Course,Train Station,Gastropub,Café,Grocery Store,Italian Restaurant,Flea Market
39,Tornio,21928.0,0,Supermarket,Grocery Store,Smoke Shop,Chinese Restaurant,Hockey Arena,Shopping Mall,Fast Food Restaurant,Clothing Store,Bar,Hotel
40,Kemi,21758.0,0,Supermarket,Grocery Store,Hotel,Event Space,Castle,Beer Bar,Gas Station,Ski Area,Boat or Ferry,Train Station


Cluster 2 (label 1): Supermarkets and Transportation Services

In [36]:
findf.loc[findf['Cluster Labels'] == 1,findf.columns[[0] + list(range(5, findf.shape[1]))]]

Unnamed: 0,city,population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,Kauhava,16784.0,1,Supermarket,Hotel,Airport,Train Station,Discount Store,Flea Market,Fast Food Restaurant,Filipino Restaurant,Fireworks Store,Fish Market
66,Lieksa,11772.0,1,Supermarket,Pizza Place,Discount Store,Train Station,Bakery,Food Service,Food Court,Food Truck,Food & Drink Shop,Food
69,Nivala,10876.0,1,Supermarket,Pizza Place,Train Station,Turkish Restaurant,Burger Joint,Fishing Store,Fast Food Restaurant,Filipino Restaurant,Fireworks Store,Fish Market


Cluster 3 (label 2): Cafés, Bars, Pubs and Restaurants

In [37]:
findf.loc[findf['Cluster Labels'] == 2,findf.columns[[0] + list(range(5, findf.shape[1]))]]

Unnamed: 0,city,population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Helsinki,642045.0,2,Café,Scandinavian Restaurant,Coffee Shop,Hotel,Pizza Place,Park,Bakery,Indie Movie Theater,Theater,French Restaurant
1,Espoo,269802.0,2,Café,Gym / Fitness Center,Beach,Golf Course,Coffee Shop,Gym,Himalayan Restaurant,Pizza Place,Park,Juice Bar
2,Tampere,225118.0,2,Café,Gym / Fitness Center,Park,Scenic Lookout,Gastropub,Restaurant,Kebab Restaurant,Bistro,Sauna / Steam Room,Pizza Place
3,Vantaa,214605.0,2,Recreation Center,Gym / Fitness Center,Café,Pizza Place,Hotel,Airport Lounge,Coffee Shop,Sushi Restaurant,Sporting Goods Shop,Thai Restaurant
4,Oulu,200526.0,2,Café,Pizza Place,Supermarket,Restaurant,Gym / Fitness Center,Fast Food Restaurant,Indian Restaurant,Pub,Pool Hall,Chinese Restaurant
5,Turku,187604.0,2,Café,Gym / Fitness Center,Park,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Gym,Restaurant,Pizza Place,Bar,Beer Bar
6,Jyväskylä,137368.0,2,Supermarket,Café,Scandinavian Restaurant,Park,Gym,Grocery Store,Coffee Shop,Gym / Fitness Center,General Entertainment,Music Venue
7,Lahti,118119.0,2,Café,Supermarket,Restaurant,Ski Area,Bar,Burger Joint,Pizza Place,Beach,Gym,Gym / Fitness Center
8,Kuopio,112117.0,2,Supermarket,Grocery Store,Café,Bar,Fast Food Restaurant,Scandinavian Restaurant,Pizza Place,Pub,Hotel,Italian Restaurant
10,Pori,85363.0,2,Supermarket,Café,Pizza Place,Grocery Store,Gym / Fitness Center,Fast Food Restaurant,Mexican Restaurant,Park,Bar,Shopping Mall


Cluster 4 (label 3): Markets

In [38]:
findf.loc[findf['Cluster Labels'] == 3,findf.columns[[0] + list(range(5, findf.shape[1]))]]

Unnamed: 0,city,population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
74,Alajärvi,10006.0,3,Supermarket,Baseball Field,Gas Station,Flower Shop,Filipino Restaurant,Fireworks Store,Fish Market,Fishing Spot,Fishing Store,Flea Market


Resulting suitable cities

In [39]:
#list of cities suitable for opening a bookstore
suit_fincity = findf.loc[findf['Cluster Labels'] == 0,findf.columns[[0] + list(range(5, findf.shape[1]))]]
suit_list = suit_fincity['city'].values.tolist()
suit_list

['Kouvola',
 'Joensuu',
 'Mikkeli',
 'Rauma',
 'Savonlinna',
 'Sastamala',
 'Raahe',
 'Iisalmi',
 'Tornio',
 'Kemi',
 'Kurikka',
 'Jämsä',
 'Valkeakoski',
 'Varkaus',
 'Hamina',
 'Äänekoski',
 'Heinola',
 'Jakobstad',
 'Pieksämäki',
 'Forssa',
 'Loimaa',
 'Orimattila',
 'Kuusamo',
 'Uusikaupunki',
 'Lovisa',
 'Ylivieska',
 'Lapua',
 'Kauhajoki',
 'Kalajoki',
 'Alavus',
 'Kankaanpää',
 'Kitee',
 'Paimio',
 'Huittinen',
 'Keuruu']

### Conclusion

Results from the KMeans clustering show 4 clusters:
+ Cluster 1: Shows venues fit for family leisure like parks, theatres, bookstores, hotels, etc.
+ Cluster 2: Shows supermarkets and transport services.
+ Cluster 3: Shows cafés, bars, pubs, restaurants and other high-end leisure venues.
+ Cluster 4: Shows markets such as supermarkets and fish markets.


From the above analysis and the resulting clusters, the cities in cluster 1 appear more suitable for opening a bookstore, as these cities are fairly populated and suit the group of people that the owner plans on targeting.
Hence, *Cities in Cluster 1 are Suitable* for opening a bookstore. These cities, as listed in the output of the previous cell, are:    
*Kouvola, Joensuu, Mikkeli, Rauma, Savonlinna, Sastamala, Raahe, Iisalmi, Tornio, Kemi, Kurikka, Jämsä, Valkeakoski, Varkaus, Hamina, Äänekoski, Heinola, Jakobstad, Pieksämäki, Forssa, Loimaa, Orimattila, Kuusamo, Uusikaupunki, Lovisa, Ylivieska, Lapua, Kauhajoki, Kalajoki, Alavus, Kankaanpää, Kitee, Paimio, Huittinen, Keuruu*
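
Since the argument leans on population, one possible way to narrow the list further is to rank the cluster's cities by population. A small sketch using five of the population figures shown in the Cluster 1 table above; the helper name `top_by_population` is an assumption, and the cutoff of three cities is arbitrary.

```python
def top_by_population(cities, n):
    """Return the names of the n most populous cities, largest first."""
    ranked = sorted(cities, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# (city, population) pairs copied from the Cluster 1 table above.
cluster_one = [('Rauma', 39809), ('Kouvola', 85855), ('Savonlinna', 35523),
               ('Joensuu', 75514), ('Mikkeli', 54665)]
print(top_by_population(cluster_one, 3))  # ['Kouvola', 'Joensuu', 'Mikkeli']
```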
 

--------