### Exploring Potential New Hotel Venues in Warsaw Districts

### 1.	Introduction

#### a.	Background

This report is for investors planning to open a new hotel in Warsaw. It suggests which parts of the city are the most promising locations for a new hotel in a highly visited city that already has many good hotels.

Warsaw is a popular tourist destination, currently ranked 74th among the most visited cities worldwide, which makes it one of the best places to start a new hotel business.
In 2018, Warsaw received over 2.8 million visitors, and its tourist areas offer hotels significant opportunities. We will go through the benefits and pitfalls of opening a new hotel in a highly visited city where many hotels already operate.
Warsaw is made up of 18 districts, but we will concentrate on the districts where the busiest venues of Warsaw are found, in order to target the tourists visiting the city. With that in mind, I will try to identify the top 3 districts for opening a brand-new hotel.

#### b.	Business Problem

This report focuses on where to open a new hotel in a city like Warsaw once the decision to invest has been made. Imagine that an investment company, Mariette, wants to open a new luxury hotel: its first and most important decision will be the location of the new hotel.



#### c.	Interest

- On what basis can Mariette decide on its new hotel's location?
- Which key points should be considered when selecting the site, for example, where are the most visited venues in the city?
- If there are already well-rated luxury hotels in an area, is it risky to open a new one close to them?


### 2.	Data Preparation

#### a.  Scraping the Warsaw Districts Table from Wikipedia

I first scrape the table on the Wikipedia page "Districts of Warsaw" to create a dataframe. For this, I use the requests and BeautifulSoup4 libraries together with pandas to build a dataframe containing the names of the 18 districts of Warsaw along with their population and area.
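The cleaning applied to the scraped table below (splitting the unit off the `Area` column and dropping the "Total" summary row) can be sketched offline on a small hand-made frame. The `Area` strings here are an assumed format (number, non-breaking space, `km2`, then an imperial value) and the "Total" figure is a placeholder; the real Wikipedia cells may differ. Filtering the summary row by its label rather than by position (`df.drop([18])`) keeps working if the table gains or loses rows.

```python
import pandas as pd

# Hypothetical rows mimicking the scraped table; the 'Area' format is ASSUMED.
raw = pd.DataFrame({
    "District": ["Mokotów", "Ochota", "Total"],
    "Area": ["35.4\xa0km2 (13.7\xa0sq\xa0mi)",
             "9.7\xa0km2 (3.7\xa0sq\xa0mi)",
             "517.2\xa0km2 (199.7\xa0sq\xa0mi)"],  # placeholder total row
})

# Keep only the text before the unit and convert it to a number (km²).
raw["Size"] = raw["Area"].str.split("\xa0km2", n=1).str[0].astype(float)

# Drop the summary row by its label instead of by its position.
clean = (raw[raw["District"] != "Total"]
         .drop(columns="Area")
         .reset_index(drop=True))
print(clean)
```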


In [1]:
import requests  # HTTP requests
from bs4 import BeautifulSoup as bs  # HTML parsing (the 'lxml' parser used below must be installed)
import pandas as pd
import numpy as np


In [2]:
url='https://en.wikipedia.org/wiki/Districts_of_Warsaw'

In [3]:
req = requests.get(url)

In [5]:
soup = bs(req.text,'lxml')
Districts_Warsaw = soup.find('table',{'class':'wikitable'})


In [6]:
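from io import StringIO
# wrapping the HTML in StringIO avoids the literal-HTML deprecation warning in recent pandas
dfs = pd.read_html(StringIO(str(Districts_Warsaw)))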

In [7]:
df = dfs[0]

In [8]:
df['Size'] = df['Area'].str.split('\xa0km2 ', n = 1, expand=True)[0]

In [9]:
df.drop(['Area'], axis=1, inplace = True)

In [10]:
# remove the 'Total' summary row (index 18)
df.drop([18], inplace = True)

In [11]:
df

| | District | Population | Size |
|---|---|---|---|
| 0 | Mokotów | 220682 | 35.4 |
| 1 | Praga Południe | 178665 | 22.4 |
| 2 | Ursynów | 145938 | 48.6 |
| 3 | Wola | 137519 | 19.26 |
| 4 | Bielany | 132683 | 32.3 |
| 5 | Targówek | 123278 | 24.37 |
| 6 | Śródmieście | 122646 | 15.57 |
| 7 | Bemowo | 115873 | 24.95 |
| 8 | Białołęka | 96588 | 73.04 |
| 9 | Ochota | 84990 | 9.7 |


#### Calculating Latitude and Longitude per District

In [12]:
from geopy.geocoders import Nominatim

In [13]:
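# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="foursquare_agent")

def locateDistrict(address):
    """Return [latitude, longitude] for an address, e.g. 'Włochy, Warszawa'."""
    location = geolocator.geocode(address)
    if location is None:
        raise ValueError('Could not geocode: ' + address)
    return [location.latitude, location.longitude]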

In [14]:
rows = []
for addr in df['District']:
    address = addr + ', Warszawa'
    out = locateDistrict(address)
    rows.append([addr, out[0], out[1]])
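The loop above sends one geocoding request per district back to back. Nominatim's public usage policy asks for at most one request per second, and a failed lookup returns `None`. A minimal sketch of a throttled loop that records misses instead of crashing is shown below; `geocode_districts` is a hypothetical helper, and the geocoder is passed in as a plain function so the example runs offline with a fake lookup table. With geopy itself, `geopy.extra.rate_limiter.RateLimiter` offers similar throttling.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Coords = Tuple[float, float]

def geocode_districts(
    districts: Sequence[str],
    geocode: Callable[[str], Optional[Coords]],
    city: str = "Warszawa",
    min_delay_seconds: float = 1.0,
) -> Tuple[List[list], List[str]]:
    """Geocode each district, pausing between calls and collecting misses.

    `geocode` is any function returning (lat, lon) or None (hypothetical interface).
    """
    rows, missing = [], []
    for i, name in enumerate(districts):
        if i > 0:
            time.sleep(min_delay_seconds)  # respect the 1 request/second policy
        coords = geocode(f"{name}, {city}")
        if coords is None:
            missing.append(name)  # report the miss instead of raising
        else:
            rows.append([name, coords[0], coords[1]])
    return rows, missing

# Offline usage: a dict lookup stands in for a real geocoder.
fake = {"Mokotów, Warszawa": (52.193987, 21.045781),
        "Ochota, Warszawa": (52.212225, 20.97263)}
rows, missing = geocode_districts(["Mokotów", "Ochota", "Nowhere"], fake.get,
                                  min_delay_seconds=0)
print(rows, missing)
```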

In [15]:
df_loc = pd.DataFrame(columns=['District', 'Latitude', 'Longitude'], data=rows)

In [16]:
df_waw = df.merge(df_loc, on='District')

In [17]:
df_waw

| | District | Population | Size | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Mokotów | 220682 | 35.4 | 52.193987 | 21.045781 |
| 1 | Praga Południe | 178665 | 22.4 | 52.237396 | 21.071258 |
| 2 | Ursynów | 145938 | 48.6 | 52.141039 | 21.032321 |
| 3 | Wola | 137519 | 19.26 | 52.236238 | 20.954781 |
| 4 | Bielany | 132683 | 32.3 | 52.294652 | 20.92998 |
| 5 | Targówek | 123278 | 24.37 | 52.275192 | 21.058085 |
| 6 | Śródmieście | 122646 | 15.57 | 52.23281 | 21.019067 |
| 7 | Bemowo | 115873 | 24.95 | 52.238974 | 20.913288 |
| 8 | Białołęka | 96588 | 73.04 | 52.319665 | 21.021177 |
| 9 | Ochota | 84990 | 9.7 | 52.212225 | 20.97263 |


In [18]:
df_waw.shape

(18, 5)

##### All 18 districts remain after merging the Wikipedia dataframe with the coordinates from geopy.
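Because `df.merge(df_loc, on='District')` is an inner merge, a district without coordinates would silently disappear and only the row count would reveal it. A minimal sketch of a more explicit check uses pandas' `validate` and `indicator` options on a left merge; the toy frames below are illustrative, with Wola's coordinates deliberately left out to show how a miss is reported.

```python
import pandas as pd

districts = pd.DataFrame({"District": ["Mokotów", "Ochota", "Wola"],
                          "Population": [220682, 84990, 137519]})
coords = pd.DataFrame({"District": ["Mokotów", "Ochota"],  # Wola omitted on purpose
                       "Latitude": [52.193987, 52.212225],
                       "Longitude": [21.045781, 20.97263]})

# validate= raises MergeError if a district appears twice on either side;
# indicator= adds a '_merge' column saying which side each row came from.
checked = districts.merge(coords, on="District", how="left",
                          validate="one_to_one", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "District"].tolist()
print(unmatched)  # districts that received no coordinates
```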

In [19]:
locateDistrict('Warszawa')

[52.2337172, 21.07141112883227]

#### Clustering Warsaw's districts

In [20]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means clustering
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Mapping Warsaw

In [21]:
# map centre coordinates for Warsaw

latitude = 52.2517942
longitude= 21.2292763

In [22]:

map_warsaw = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each district centre
for lat, lng, district in zip(df_waw['Latitude'], df_waw['Longitude'], df_waw['District']):
    popup = folium.Popup(district, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_warsaw)

map_warsaw

#### Foursquare API to explore and segment

##### My Foursquare Credentials

In [2]:
CLIENT_ID = '' # my Foursquare ID
CLIENT_SECRET = '' # my Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET))  # never print the secret itself

Your credentials:
CLIENT_ID: 
CLIENT_SECRET: 
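Hard-coding credentials in a notebook makes them easy to leak when it is shared. A minimal sketch of reading them from environment variables and printing only a masked form follows; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the `mask` helper are hypothetical choices, not part of the Foursquare API.

```python
import os

# Hypothetical variable names; set them in the shell before starting Jupyter.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```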


In [24]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API per request
radius = 2000 # search radius in metres

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each point and collect the venues."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters for the explore endpoint (requests builds the query string)
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get('https://api.foursquare.com/v2/venues/explore',
                               params=params).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        for v in results:
            categories = v['venue'].get('categories') or [{'name': None}]  # some venues have no category
            venues_list.append((
                name,
                lat,
                lng,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                categories[0]['name']))

    nearby_venues = pd.DataFrame(venues_list, columns=['District',
                  'District Latitude',
                  'District Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category'])

    return nearby_venues
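The function reads each venue from `response → groups[0] → items` and takes the venue's name, location and first category. That flattening step can be tested without network access by feeding it a hand-built dictionary with the same keys. The sketch below assumes only the response shape implied by the keys the function reads; `flatten_explore_items` is a hypothetical helper, and the second venue is invented to show how an uncategorised venue is handled.

```python
from typing import Any, Dict, List

def flatten_explore_items(district: str, lat: float, lng: float,
                          payload: Dict[str, Any]) -> List[list]:
    """Turn one explore response into rows like those getNearbyVenues builds."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": None}]
        rows.append([district, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]])
    return rows

# Hand-built payload (ASSUMED shape); the second venue is hypothetical.
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "4fun.tv",
               "location": {"lat": 52.196089, "lng": 21.046074},
               "categories": [{"name": "Arcade"}]}},
    {"venue": {"name": "Unlabelled spot",
               "location": {"lat": 52.19, "lng": 21.04},
               "categories": []}},
]}]}}
rows = flatten_explore_items("Mokotów", 52.193987, 21.045781, payload)
print(rows)
```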

#### Running getNearbyVenues on each district and storing the result in a new dataframe called Warsaw_venues

In [27]:
# radius is not passed here, so getNearbyVenues uses its default of 500 m
Warsaw_venues = getNearbyVenues(names=df_waw['District'],
                                latitudes=df_waw['Latitude'],
                                longitudes=df_waw['Longitude'])

Mokotów
Praga Południe
Ursynów
Wola
Bielany
Targówek
Śródmieście
Bemowo
Białołęka
Ochota
Wawer
Praga Północ
Ursus
Żoliborz
Włochy
Wilanów
Rembertów
Wesoła


In [28]:
print(Warsaw_venues.shape)
Warsaw_venues.head(15)

(236, 7)


| | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Mokotów | 52.193987 | 21.045781 | 4fun.tv | 52.196089 | 21.046074 | Arcade |
| 1 | Mokotów | 52.193987 | 21.045781 | Sikorskiego | 52.192557 | 21.047922 | Skate Park |
| 2 | Mokotów | 52.193987 | 21.045781 | Tor stegny 02 | 52.191261 | 21.046389 | Bus Station |
| 3 | Mokotów | 52.193987 | 21.045781 | tor stegny | 52.190475 | 21.046121 | Racetrack |
| 4 | Mokotów | 52.193987 | 21.045781 | Restauracja Giovanni Sport | 52.190349 | 21.045084 | Diner |
| 5 | Mokotów | 52.193987 | 21.045781 | Skrzyżowanie Sobieskiego/Sikorskiego | 52.190243 | 21.046816 | Intersection |
| 6 | Mokotów | 52.193987 | 21.045781 | Bernardyńska Woda | 52.192845 | 21.052259 | Lake |
| 7 | Praga Południe | 52.237396 | 21.071258 | OSP Saska Kępa | 52.236835 | 21.065269 | Café |
| 8 | Praga Południe | 52.237396 | 21.071258 | 158 | 52.235407 | 21.070821 | Bus Line |
| 9 | Praga Południe | 52.237396 | 21.071258 | Międzynarodowa | 52.237355 | 21.065832 | Road |
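To see which search radius actually produced these venues, one can measure how far each venue lies from its district centre. A minimal sketch using the haversine great-circle formula (assuming a spherical Earth of radius 6,371 km) is shown below; applied to the Mokotów rows above, the venues appear to lie within roughly 500 m, which is consistent with the function's default radius rather than the 2000 m value defined earlier.

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = 6_371_000) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

# Mokotów district centre and the '4fun.tv' venue from the table above.
d = haversine_m(52.193987, 21.045781, 52.196089, 21.046074)
print(round(d))
```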


##### Number of venues per District

In [29]:
Warsaw_venues.groupby('District').count()

| District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bemowo | 12 | 12 | 12 | 12 | 12 | 12 |
| Bielany | 7 | 7 | 7 | 7 | 7 | 7 |
| Mokotów | 7 | 7 | 7 | 7 | 7 | 7 |
| Ochota | 21 | 21 | 21 | 21 | 21 | 21 |
| Praga Południe | 4 | 4 | 4 | 4 | 4 | 4 |
| Praga Północ | 7 | 7 | 7 | 7 | 7 | 7 |
| Rembertów | 3 | 3 | 3 | 3 | 3 | 3 |
| Targówek | 5 | 5 | 5 | 5 | 5 | 5 |
| Ursus | 5 | 5 | 5 | 5 | 5 | 5 |
| Ursynów | 7 | 7 | 7 | 7 | 7 | 7 |
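`groupby('District').count()` only lists districts that returned at least one venue, so a district with no results (such as Białołęka here) is easy to overlook. A minimal sketch that counts venues per district and reindexes against the full district list, so that missing districts show up with a count of 0, follows; the venue rows are illustrative (the Rembertów venue name is hypothetical).

```python
import pandas as pd

all_districts = ["Białołęka", "Mokotów", "Rembertów"]
venues = pd.DataFrame({
    "District": ["Mokotów", "Mokotów", "Rembertów"],
    "Venue": ["4fun.tv", "Sikorskiego", "Some café"],  # last name is hypothetical
})

# size() counts rows per district; reindex adds districts with no venues as 0.
counts = (venues.groupby("District").size()
          .reindex(all_districts, fill_value=0))
no_data = counts[counts == 0].index.tolist()
print(counts)
print("No venues returned for:", no_data)
```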


### No venue data was returned for Białołęka, so it is excluded from the analysis

In [31]:
df_waw.drop(index=8, inplace=True)

In [33]:
df_waw.reset_index()  # display only: the result is not assigned, so df_waw keeps its original index

| | index | District | Population | Size | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 0 | Mokotów | 220682 | 35.4 | 52.193987 | 21.045781 |
| 1 | 1 | Praga Południe | 178665 | 22.4 | 52.237396 | 21.071258 |
| 2 | 2 | Ursynów | 145938 | 48.6 | 52.141039 | 21.032321 |
| 3 | 3 | Wola | 137519 | 19.26 | 52.236238 | 20.954781 |
| 4 | 4 | Bielany | 132683 | 32.3 | 52.294652 | 20.92998 |
| 5 | 5 | Targówek | 123278 | 24.37 | 52.275192 | 21.058085 |
| 6 | 6 | Śródmieście | 122646 | 15.57 | 52.23281 | 21.019067 |
| 7 | 7 | Bemowo | 115873 | 24.95 | 52.238974 | 20.913288 |
| 8 | 9 | Ochota | 84990 | 9.7 | 52.212225 | 20.97263 |
| 9 | 10 | Wawer | 69896 | 79.71 | 52.220358 | 21.137083 |


##### Number of unique categories

In [34]:
print('The number of unique categories is {}.'.format(len(Warsaw_venues['Venue Category'].unique())))

The number of unique categories is 103.


## Analyzing Districts

In [35]:
# one hot encoding (dtype=int keeps 0/1 values instead of booleans in pandas >= 2.0)
Warsaw_onehot = pd.get_dummies(Warsaw_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add district column back to dataframe
Warsaw_onehot['District'] = Warsaw_venues['District']

# move district column to the first column
cols = list(Warsaw_onehot.columns.values)
cols.pop(cols.index('District'))
Warsaw_onehot = Warsaw_onehot[['District'] + cols]

Warsaw_onehot.head(20)

Unnamed: 0,District,Accessories Store,Amphitheater,Arcade,Art Museum,Asian Restaurant,Athletics & Sports,Automotive Shop,Baby Store,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Business Service,Cable Car,Café,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Convenience Store,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Gastropub,Gay Bar,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Hawaiian Restaurant,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Light Rail Station,Lounge,Mediterranean Restaurant,Metro Station,Miscellaneous Shop,Modern European Restaurant,Motorcycle Shop,Music Store,Nightclub,Noodle House,Other Nightlife,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Polish Restaurant,Racetrack,Road,Rock Club,Sandwich Place,Shopping Mall,Skate Park,Skating Rink,Sporting Goods Shop,Supermarket,Sushi Restaurant,Tennis Court,Thai Restaurant,Theme Park Ride / Attraction,Train,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Mokotów,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Praga Południe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Praga Południe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Praga Południe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [36]:
Warsaw_onehot.shape

(236, 104)

#### Grouping rows by district and taking the mean frequency of occurrence of each category

In [37]:
Warsaw_grouped = Warsaw_onehot.groupby('District').mean().reset_index()
Warsaw_grouped

Unnamed: 0,District,Accessories Store,Amphitheater,Arcade,Art Museum,Asian Restaurant,Athletics & Sports,Automotive Shop,Baby Store,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Business Service,Cable Car,Café,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Convenience Store,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Gastropub,Gay Bar,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Hawaiian Restaurant,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Light Rail Station,Lounge,Mediterranean Restaurant,Metro Station,Miscellaneous Shop,Modern European Restaurant,Motorcycle Shop,Music Store,Nightclub,Noodle House,Other Nightlife,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Polish Restaurant,Racetrack,Road,Rock Club,Sandwich Place,Shopping Mall,Skate Park,Skating Rink,Sporting Goods Shop,Supermarket,Sushi Restaurant,Tennis Court,Thai Restaurant,Theme Park Ride / Attraction,Train,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Bemowo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bielany,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Mokotów,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Ochota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.047619
4,Praga Południe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Praga Północ,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Rembertów,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Targówek,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Ursus,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0
9,Ursynów,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [38]:
Warsaw_grouped.shape

(17, 104)
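The one-hot encoding followed by `groupby('District').mean()` turns each district into a vector of category shares: the mean of a 0/1 indicator column is the fraction of that district's venues in the category, so each row sums to 1. A minimal sketch on toy data (the venue list is illustrative, not taken from the Foursquare results) shows the computation.

```python
import pandas as pd

venues = pd.DataFrame({
    "District": ["Mokotów", "Mokotów", "Mokotów", "Ochota"],
    "Venue Category": ["Arcade", "Bus Station", "Arcade", "Hotel"],  # toy data
})

# One 0/1 column per category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "District", venues["District"])

# Mean of each indicator column = share of that category among the district's venues.
freq = onehot.groupby("District").mean()
print(freq)
```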

#### Printing districts along with the top 5 most common venues

In [39]:
num_top_venues = 5

for hood in Warsaw_grouped['District']:
    print("----"+hood+"----")
    temp = Warsaw_grouped[Warsaw_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bemowo----
               venue  freq
0  Food & Drink Shop  0.17
1     Sandwich Place  0.08
2        Supermarket  0.08
3        Coffee Shop  0.08
4  Electronics Store  0.08


----Bielany----
            venue  freq
0  Clothing Store  0.14
1   Shopping Mall  0.14
2     Bus Station  0.14
3        Bus Line  0.14
4    Burger Joint  0.14


----Mokotów----
          venue  freq
0    Skate Park  0.14
1        Arcade  0.14
2  Intersection  0.14
3     Racetrack  0.14
4   Bus Station  0.14


----Ochota----
                   venue  freq
0               Pharmacy  0.10
1                  Hotel  0.10
2  Vietnamese Restaurant  0.05
3           Skating Rink  0.05
4               Gym Pool  0.05


----Praga Południe----
               venue  freq
0               Café  0.25
1          Racetrack  0.25
2           Bus Line  0.25
3               Road  0.25
4  Accessories Store  0.00


----Praga Północ----
                venue  freq
0   Convenience Store  0.14
1        Amphitheater  0.14
2             

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
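`return_most_common_venues` always returns `num_top_venues` categories, even when a district has fewer distinct categories than that. The remaining slots are then filled with zero-frequency categories in arbitrary order, which is why, for example, Mokotów (7 venues) shows Coffee Shop, Comedy Club and Convenience Store as its 8th to 10th venues in the top-10 table below. A minimal sketch of a variant that keeps only non-zero shares and breaks ties by column order is given here; `top_categories` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def top_categories(freq_row: pd.Series, n: int) -> list:
    """Return up to n categories with a non-zero share, most frequent first.

    Zero-share categories are dropped so they cannot pad the ranking;
    mergesort is stable, so tied categories keep their column order.
    """
    nonzero = freq_row[freq_row > 0]
    return nonzero.sort_values(ascending=False, kind="mergesort").index[:n].tolist()

# Praga Południe's non-zero shares from the grouped table, plus two zero columns.
row = pd.Series({"Accessories Store": 0.0, "Bus Line": 0.25, "Café": 0.25,
                 "Racetrack": 0.25, "Road": 0.25, "Comedy Club": 0.0})
print(top_categories(row, 5))
```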

## Top 10 venues in each district

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
districts_venues_sorted = pd.DataFrame(columns=columns)
districts_venues_sorted['District'] = Warsaw_grouped['District']

for ind in np.arange(Warsaw_grouped.shape[0]):
    districts_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Warsaw_grouped.iloc[ind, :], num_top_venues)

districts_venues_sorted

| | District | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bemowo | Food & Drink Shop | Italian Restaurant | Donut Shop | Coffee Shop | Bus Station | Sandwich Place | Japanese Restaurant | Sporting Goods Shop | Supermarket | Café |
| 1 | Bielany | Clothing Store | Shopping Mall | Bus Station | Bus Line | Ice Cream Shop | Burger Joint | Metro Station | Coffee Shop | Comedy Club | Convenience Store |
| 2 | Mokotów | Skate Park | Diner | Arcade | Bus Station | Racetrack | Intersection | Lake | Coffee Shop | Comedy Club | Convenience Store |
| 3 | Ochota | Hotel | Pharmacy | Yoga Studio | Electronics Store | Dessert Shop | Italian Restaurant | Diner | Park | Basketball Court | Skating Rink |
| 4 | Praga Południe | Café | Bus Line | Racetrack | Road | Chocolate Shop | Clothing Store | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store |
| 5 | Praga Północ | Bike Rental / Bike Share | Amphitheater | Comedy Club | Convenience Store | Bus Station | Plaza | Light Rail Station | Diner | Coffee Shop | Cupcake Shop |
| 6 | Rembertów | Café | Pizza Place | Discount Store | Gay Bar | Dim Sum Restaurant | Cocktail Bar | Grocery Store | Coffee Shop | Comedy Club | Convenience Store |
| 7 | Targówek | Bus Station | Gym / Fitness Center | Pet Store | Cupcake Shop | Diner | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store | Deli / Bodega |
| 8 | Ursus | Park | Train Station | Train | Hotel | Supermarket | Yoga Studio | Dim Sum Restaurant | Clothing Store | Cocktail Bar | Coffee Shop |
| 9 | Ursynów | Italian Restaurant | Clothing Store | Food Court | Food & Drink Shop | Theme Park Ride / Attraction | Supermarket | Sporting Goods Shop | Dim Sum Restaurant | Cocktail Bar | Coffee Shop |


## Clustering districts

##### New dataframe for clusters and top 10 venues for each district


In [42]:
# set number of clusters
kclusters = 5

Warsaw_grouped_clustering = Warsaw_grouped.drop(columns='District')

# run k-means clustering (n_init=10 keeps the pre-1.4 scikit-learn default explicit)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(Warsaw_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 3, 3, 1, 3, 3, 2, 3, 1, 1, 1, 1, 0, 1, 4, 1, 1])

### Merge cluster labels

In [43]:
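Warsaw_merged = df_waw.copy()

# add clustering labels
# caution: kmeans.labels_ follows the alphabetical row order of Warsaw_grouped,
# while df_waw is ordered by population, so this assignment pairs labels by position, not by district name
Warsaw_merged['Cluster Labels'] = kmeans.labels_

# join the top 10 venues of each district onto the district data
Warsaw_merged = Warsaw_merged.join(districts_venues_sorted.set_index('District'), on='District')

Warsaw_merged.head() # check the last columns!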

| | District | Population | Size | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mokotów | 220682 | 35.4 | 52.193987 | 21.045781 | 1 | Skate Park | Diner | Arcade | Bus Station | Racetrack | Intersection | Lake | Coffee Shop | Comedy Club | Convenience Store |
| 1 | Praga Południe | 178665 | 22.4 | 52.237396 | 21.071258 | 3 | Café | Bus Line | Racetrack | Road | Chocolate Shop | Clothing Store | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store |
| 2 | Ursynów | 145938 | 48.6 | 52.141039 | 21.032321 | 3 | Italian Restaurant | Clothing Store | Food Court | Food & Drink Shop | Theme Park Ride / Attraction | Supermarket | Sporting Goods Shop | Dim Sum Restaurant | Cocktail Bar | Coffee Shop |
| 3 | Wola | 137519 | 19.26 | 52.236238 | 20.954781 | 1 | Bus Station | Grocery Store | Café | Skating Rink | Flea Market | Falafel Restaurant | Italian Restaurant | Motorcycle Shop | Music Store | Park |
| 4 | Bielany | 132683 | 32.3 | 52.294652 | 20.92998 | 3 | Clothing Store | Shopping Mall | Bus Station | Bus Line | Ice Cream Shop | Burger Joint | Metro Station | Coffee Shop | Comedy Club | Convenience Store |
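`Warsaw_grouped` comes from `groupby('District')`, which sorts districts alphabetically, so `kmeans.labels_` is in alphabetical order, while `df_waw` is ordered by population (compare the two tables above). The assignment `Warsaw_merged['Cluster Labels'] = kmeans.labels_` therefore pairs each label with whichever district sits at that position. A minimal sketch of matching labels by district name instead is shown below; the three districts and their labels are toy values, not the notebook's actual clustering.

```python
import numpy as np
import pandas as pd

# Rows used for clustering, in alphabetical order (as groupby produces them).
grouped_districts = pd.Series(["Bemowo", "Mokotów", "Ochota"])
labels = np.array([1, 3, 1])  # stand-in for kmeans.labels_, same row order

# Main table in a different order (e.g. by population).
df_waw_toy = pd.DataFrame({"District": ["Mokotów", "Bemowo", "Ochota"]})

# Index the labels by district name, then look each district up by name.
label_by_district = pd.Series(labels, index=grouped_districts)
df_waw_toy["Cluster Labels"] = df_waw_toy["District"].map(label_by_district)
print(df_waw_toy)
```

Assigning `labels` positionally here would have given Mokotów the label 1 and Bemowo the label 3; mapping by name keeps each label with the district it was computed for.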


#### Clusters visualization of Warsaw's Districts

In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(Warsaw_merged['Latitude'], Warsaw_merged['Longitude'], Warsaw_merged['District'], Warsaw_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

# Resulting clusters

#### Cluster 0 - Most common venues: Café, Coffee Shop and Beer Bar

In [45]:
Warsaw_merged.loc[Warsaw_merged['Cluster Labels'] == 0, Warsaw_merged.columns[[0] + list(range(5, Warsaw_merged.shape[1]))]]


| | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Żoliborz | 0 | Café | Coffee Shop | Beer Bar | Park | Burger Joint | Italian Restaurant | Cocktail Bar | Deli / Bodega | Bus Station | Kebab Restaurant |


#### Cluster 1 - Most common venues: Park, Pizza Place, Bike Rental, Train Station

In [46]:
Warsaw_merged.loc[Warsaw_merged['Cluster Labels'] == 1, Warsaw_merged.columns[[0] + list(range(5, Warsaw_merged.shape[1]))]]


| | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mokotów | 1 | Skate Park | Diner | Arcade | Bus Station | Racetrack | Intersection | Lake | Coffee Shop | Comedy Club | Convenience Store |
| 3 | Wola | 1 | Bus Station | Grocery Store | Café | Skating Rink | Flea Market | Falafel Restaurant | Italian Restaurant | Motorcycle Shop | Music Store | Park |
| 9 | Ochota | 1 | Hotel | Pharmacy | Yoga Studio | Electronics Store | Dessert Shop | Italian Restaurant | Diner | Park | Basketball Court | Skating Rink |
| 10 | Wawer | 1 | Gun Range | Athletics & Sports | Food & Drink Shop | Falafel Restaurant | Diner | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store | Cupcake Shop |
| 11 | Praga Północ | 1 | Bike Rental / Bike Share | Amphitheater | Comedy Club | Convenience Store | Bus Station | Plaza | Light Rail Station | Diner | Coffee Shop | Cupcake Shop |
| 12 | Ursus | 1 | Park | Train Station | Train | Hotel | Supermarket | Yoga Studio | Dim Sum Restaurant | Clothing Store | Cocktail Bar | Coffee Shop |
| 14 | Włochy | 1 | Accessories Store | Tram Station | Hotel | Bed & Breakfast | Gastropub | Diner | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store |
| 16 | Rembertów | 1 | Café | Pizza Place | Discount Store | Gay Bar | Dim Sum Restaurant | Cocktail Bar | Grocery Store | Coffee Shop | Comedy Club | Convenience Store |
| 17 | Wesoła | 1 | Train Station | Pizza Place | Plaza | Tennis Court | Yoga Studio | Dim Sum Restaurant | Clothing Store | Cocktail Bar | Coffee Shop | Comedy Club |


#### Cluster 2 - Most common venues: Coffee Shop, Beer Bar, Cocktail Bar (No Hotels in Top 10 venues)

In [47]:
Warsaw_merged.loc[Warsaw_merged['Cluster Labels'] == 2, Warsaw_merged.columns[[0] + list(range(5, Warsaw_merged.shape[1]))]]


| | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Śródmieście | 2 | Coffee Shop | Café | Beer Bar | Cocktail Bar | Sushi Restaurant | Bakery | Nightclub | Italian Restaurant | Ice Cream Shop | Mediterranean Restaurant |


#### Cluster 3 - Most common venues: Italian Restaurant, Food Shop, Shopping Mall, Gym (No Hotels in Top 10 venues)

In [48]:
Warsaw_merged.loc[Warsaw_merged['Cluster Labels'] == 3, Warsaw_merged.columns[[0] + list(range(5, Warsaw_merged.shape[1]))]]


| | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Praga Południe | 3 | Café | Bus Line | Racetrack | Road | Chocolate Shop | Clothing Store | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store |
| 2 | Ursynów | 3 | Italian Restaurant | Clothing Store | Food Court | Food & Drink Shop | Theme Park Ride / Attraction | Supermarket | Sporting Goods Shop | Dim Sum Restaurant | Cocktail Bar | Coffee Shop |
| 4 | Bielany | 3 | Clothing Store | Shopping Mall | Bus Station | Bus Line | Ice Cream Shop | Burger Joint | Metro Station | Coffee Shop | Comedy Club | Convenience Store |
| 5 | Targówek | 3 | Bus Station | Gym / Fitness Center | Pet Store | Cupcake Shop | Diner | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store | Deli / Bodega |
| 7 | Bemowo | 3 | Food & Drink Shop | Italian Restaurant | Donut Shop | Coffee Shop | Bus Station | Sandwich Place | Japanese Restaurant | Sporting Goods Shop | Supermarket | Café |


#### Cluster 4 - Most common venues: Baby Store, Yoga Studio, Discount Store (No Hotels in Top 10 venues)

In [49]:
Warsaw_merged.loc[Warsaw_merged['Cluster Labels'] == 4, Warsaw_merged.columns[[0] + list(range(5, Warsaw_merged.shape[1]))]]


| | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | Wilanów | 4 | Baby Store | Yoga Studio | Discount Store | Cocktail Bar | Coffee Shop | Comedy Club | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store |


# Results

#### The following are observations about the 5 clusters above:
#### 1. Hotels appear among the top 10 venues only in Cluster 1 (Ochota, Ursus and Włochy). This contrasts with local observation and suggests that most of the hotel data is missing from the analysed dataset.
#### 2. Entertainment and shopping venues are popular in Cluster 1
#### 3. Italian Restaurants, Food Shops, Shopping Malls and Gyms are popular in Cluster 3
#### 4. The most common venues in Cluster 2 are Coffee Shop, Beer Bar and Cocktail Bar
#### 5. Wilanów and Śródmieście districts have different characteristics from the other Warsaw districts


### Discussion and Conclusion

#### The above analysis shows that choosing a hotel location based only on venue data from foursquare.com could be risky.
The data about hotels in Warsaw appears to be incomplete.

#### In conclusion, this project would have produced better results with more data, such as actual land prices within each area, public transportation access, and broader venue exploration through Foursquare (free API calls return a limited number of venues).

#### A more detailed analysis based on transportation access and the location of business offices is recommended.
