# Another Convenience Store in Taipei???

### Capstone Project - The Battle of the Neighborhoods (Week Two)

#### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we will work with companies that run convenience stores to find the optimum district (or districts) in which to open a convenience store in Taipei. We will focus on districts that have increasing populations (based on Foursquare data), do not have convenience stores in their top ten most popular locations, and have high population density.

## Data <a name="data"></a>

First, I will import the necessary libraries to be used in the notebook.

In [398]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import pylab as pl
from sklearn.decomposition import PCA

print('Libraries imported.')

Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


I could not find a table with current data online, so I will construct my own data frame using information from the Taipei City government. Luckily, there are only 12 districts, so it shouldn't take too long. I pulled data for population, population density (person/sq. km), % population change (since 2008), and area (sq. km). Latitude and longitude were found by doing a Google search. The link to the pdf file is in the next cell.

[Taipei City District Population Data](https://www-ws.gov.taipei/Download.ashx?u=LzAwMS9VcGxvYWQvMzY4L3JlbGZpbGUvMC82NDcyMy8xYTlmNWNkYS02MGVhLTRjZDUtYjIzOS02MzdjMzgyZDY3NTUucGRm&n=5Li76KiI6JmVLTEwODA3LeiHuuWMl%2bW4gumHjeimgee1seioiOmAn%2bWgsSjoi7HmlofniYgpLeabtOato%2bihqDE3LnBkZg%3d%3d&icon=..pdf)

In [400]:
# initialize list of lists 
taipei_data = [['Beitou', 254138, 4473, ((((254138 - 249752)/249752)*100)), 56.82, 25.1152, 121.5150], 
               ['Daan', 308722, 27173, ((((308722 - 313848)/313848)*100)), 11.36, 25.0262, 121.5427], 
               ['Datong', 127086, 22368, ((((127086 - 124653)/124653)*100)), 5.68, 25.0627, 121.5113], 
               ['Nangang', 120897, 5535, ((((120897 - 113672)/113672)*100)), 21.84, 25.0312, 121.6112], 
               ['Neihu', 286834, 9083, ((((286834 - 266808)/266808)*100)), 31.58, 25.0689, 121.5909], 
               ['Shilin', 285017 , 4570, ((((285017 - 286065)/286065)*100)), 62.37, 25.0950, 121.5246], 
               ['Songshan', 205219, 22096, ((((210097 - 205219)/210097)*100)), 9.29, 25.0542, 121.5639], 
               ['Wanhua', 188225, 21263, ((((188225 - 190361)/190361)*100)), 8.85, 25.0263, 121.4970], 
               ['Wenshan', 273040, 8665, ((((273040 - 261719)/261719)*100)), 31.51, 24.9929, 121.5713], 
               ['Xinyi', 221606, 19773, ((((221606 - 227770)/227770)*100)), 11.21, 25.0348, 121.5677], 
               ['Zhongshan', 228285, 16685, ((((228285 - 218841)/218841)*100)), 13.68, 25.0792, 121.5427], 
               ['Zhongzheng', 158583, 20847, ((((158583 - 159337)/159337)*100)), 7.61, 25.0421, 121.5199]] 
  
# Create the pandas DataFrame 
df_TaipeiCity = pd.DataFrame(taipei_data, columns = ['District', 'Population', 'Population Density', 
                                                     '% Population Change', 'Area (sq. km)', 'Latitude', 'Longitude']) 
  
# print dataframe. 
df_TaipeiCity

Unnamed: 0,District,Population,Population Density,% Population Change,Area (sq. km),Latitude,Longitude
0,Beitou,254138,4473,1.756142,56.82,25.1152,121.515
1,Daan,308722,27173,-1.633275,11.36,25.0262,121.5427
2,Datong,127086,22368,1.951818,5.68,25.0627,121.5113
3,Nangang,120897,5535,6.356007,21.84,25.0312,121.6112
4,Neihu,286834,9083,7.505772,31.58,25.0689,121.5909
5,Shilin,285017,4570,-0.36635,62.37,25.095,121.5246
6,Songshan,205219,22096,2.321785,9.29,25.0542,121.5639
7,Wanhua,188225,21263,-1.122079,8.85,25.0263,121.497
8,Wenshan,273040,8665,4.325632,31.51,24.9929,121.5713
9,Xinyi,221606,19773,-2.706239,11.21,25.0348,121.5677


Excellent! Now let's visualize the map of Taipei with its 12 districts.

In [401]:
# create map of Taipei using latitude and longitude values
latitude = 25.0330
longitude = 121.5654
map_taipei = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_TaipeiCity['Latitude'], df_TaipeiCity['Longitude'], df_TaipeiCity['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_taipei)  
    
map_taipei

Now, I will access the Foursquare data by using my client ID and client secret code.

In [402]:
CLIENT_ID = 'NFEVMB43ZT5VKKEEGOMY20KHXAKJCXQ043YSLAUXUN1BKTDH' # your Foursquare ID
CLIENT_SECRET = '5FDLQVWRNSMOMLNGOOYZSVBLWF2MESOCWQIUQWHGO3VMWX2K' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: NFEVMB43ZT5VKKEEGOMY20KHXAKJCXQ043YSLAUXUN1BKTDH
CLIENT_SECRET:5FDLQVWRNSMOMLNGOOYZSVBLWF2MESOCWQIUQWHGO3VMWX2K


Next, we will define variables for latitude and longitude of each individual district. This one gives us the result for the district, Beitou.

In [403]:
district_latitude = df_TaipeiCity.loc[0, 'Latitude'] # district latitude value
district_longitude = df_TaipeiCity.loc[0, 'Longitude'] # district longitude value

district_name = df_TaipeiCity.loc[0, 'District'] # district name

print('Latitude and longitude values of {} are {}, {}.'.format(district_name, 
                                                               district_latitude, 
                                                               district_longitude))

Latitude and longitude values of Beitou are 25.1152, 121.515.


Let's get the top 150 results for the district of Beitou within a 2000 m radius.

In [404]:
LIMIT = 150
radius = 2000
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    district_latitude, 
    district_longitude,
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=NFEVMB43ZT5VKKEEGOMY20KHXAKJCXQ043YSLAUXUN1BKTDH&client_secret=5FDLQVWRNSMOMLNGOOYZSVBLWF2MESOCWQIUQWHGO3VMWX2K&v=20180605&ll=25.1152,121.515&radius=2000&limit=150'

Now, we can view the results of the Foursquare data. 

In [405]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d596f9022be120031fc7023'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Běitóu Qū',
  'headerFullLocation': 'Běitóu Qū, Taipei',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 71,
  'suggestedBounds': {'ne': {'lat': 25.133200018000018,
    'lng': 121.53484238363649},
   'sw': {'lat': 25.097199981999985, 'lng': 121.49515761636351}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c1c77828b3aa59364fe985f',
       'name': '水龜伯古早味',
       'location': {'address': '石牌路二段75巷8號',
        'lat': 25.116794235600377,
        'lng': 121.51591778209348,
        'labeledLatLngs': [{'label': 'display',
          'lat': 25.11679423560037

Next, we will use a function to extract the category of the venue.

In [406]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now, we will define the venues variable, filter the coumns and categories, and clean the columns. I decide to pull data for the names, categories, latitude, longitude, and distance from this dataset.

In [407]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories','venue.location.lat', 'venue.location.lng' , 'venue.location.distance']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng,distance
0,水龜伯古早味,Dessert Shop,25.116794,121.515918,200
1,石牌夜市 Shipai Nightmarket,Night Market,25.116622,121.516702,233
2,東方泰國小館,Thai Restaurant,25.11467,121.515385,70
3,蕭記大餛飩,Chinese Restaurant,25.116001,121.517358,253
4,台北市北投運動中心 Taipei Beitou Sports Center,Athletics & Sports,25.116769,121.509748,557
5,露特西亞 Lutetia,Café,25.114342,121.527354,1248
6,慶熹宮韓國料理,Korean Restaurant,25.115802,121.518145,323
7,宋江餡餅粥,Chinese Restaurant,25.118556,121.526267,1195
8,一品山西刀削麵之家,Chinese Restaurant,25.118909,121.528256,1398
9,瓦城泰國料理 Thai Town Cuisine,Thai Restaurant,25.118716,121.52357,948


Now, we will create a function to repeat the same process for all the districts in Taipei. I settled on a radius of 2000m because it will cover all of the small districts, but limit the less populous parts of the small districts.

In [408]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000, LIMIT=150):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['location']['distance']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'Distance']
    
    return(nearby_venues)

Now, we will create a new dataframe.

In [409]:
taipei_venues = getNearbyVenues(df_TaipeiCity['District'],
                                   latitudes=df_TaipeiCity['Latitude'],
                                   longitudes=df_TaipeiCity['Longitude']
                                  )

Beitou
Daan
Datong
Nangang
Neihu
Shilin
Songshan
Wanhua
Wenshan
Xinyi
Zhongshan
Zhongzheng


Here is the size and a preview of the new dataframe.

In [410]:
print(taipei_venues.shape)
taipei_venues.head(10)

(1011, 8)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Distance
0,Beitou,25.1152,121.515,水龜伯古早味,25.116794,121.515918,Dessert Shop,200
1,Beitou,25.1152,121.515,石牌夜市 Shipai Nightmarket,25.116622,121.516702,Night Market,233
2,Beitou,25.1152,121.515,東方泰國小館,25.11467,121.515385,Thai Restaurant,70
3,Beitou,25.1152,121.515,蕭記大餛飩,25.116001,121.517358,Chinese Restaurant,253
4,Beitou,25.1152,121.515,台北市北投運動中心 Taipei Beitou Sports Center,25.116769,121.509748,Athletics & Sports,557
5,Beitou,25.1152,121.515,露特西亞 Lutetia,25.114342,121.527354,Café,1248
6,Beitou,25.1152,121.515,慶熹宮韓國料理,25.115802,121.518145,Korean Restaurant,323
7,Beitou,25.1152,121.515,宋江餡餅粥,25.118556,121.526267,Chinese Restaurant,1195
8,Beitou,25.1152,121.515,一品山西刀削麵之家,25.118909,121.528256,Chinese Restaurant,1398
9,Beitou,25.1152,121.515,瓦城泰國料理 Thai Town Cuisine,25.118716,121.52357,Thai Restaurant,948


Now, we will check how many results were returned for each district.

In [411]:
taipei_venues.groupby('District').count()

Unnamed: 0_level_0,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Distance
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Beitou,71,71,71,71,71,71,71
Daan,100,100,100,100,100,100,100
Datong,100,100,100,100,100,100,100
Nangang,7,7,7,7,7,7,7
Neihu,100,100,100,100,100,100,100
Shilin,80,80,80,80,80,80,80
Songshan,100,100,100,100,100,100,100
Wanhua,81,81,81,81,81,81,81
Wenshan,72,72,72,72,72,72,72
Xinyi,100,100,100,100,100,100,100


Now, we will check to see how many unique categories there are in the venues.

In [412]:
print('There are {} unique categories.'.format(len(taipei_venues['Venue Category'].unique())))

There are 170 unique categories.


## Methodology <a name="methodology"></a>

In this project, we will work to detect districts in Taipei that have a low density of convenience stores and a high population density. If possible, these areas will also have an increasing population. The search radius for each district will be 2000m around the district center.

Above, we collected the necessary data from the Taipei City government and from Foursquare. This data was organized into a dataframe.

Next, we will create clusters within the districts using k-means clustering. We hope to identify a cluster with an increasing population, high population density, and a district with convenience stores outside of their top ten most popular venues.

## Analysis <a name="analysis"></a>

Let's use one hot encoding to make the data more readable for machine learning. Then, we'll preview the dataframe and see what size it is.

In [413]:
taipei_onehot = pd.get_dummies(taipei_venues[['Venue Category']], prefix="", prefix_sep="")

taipei_onehot['District'] = taipei_venues['District'] 

fixed_columns = [taipei_onehot.columns[-1]] + list(taipei_onehot.columns[:-1])
taipei_onehot = taipei_onehot[fixed_columns]

print(taipei_onehot.shape)
taipei_onehot.head()

(1011, 171)


Unnamed: 0,District,Airport,Airport Service,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Beijing Restaurant,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Bus Station,Cable Car,Cafeteria,Café,Campground,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comfort Food Restaurant,Concert Hall,Convenience Store,Cultural Center,Cupcake Shop,Department Store,Dessert Shop,Donburi Restaurant,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Exhibit,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gay Bar,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Historic Site,History Museum,Hong Kong Restaurant,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lake,Laser Tag,Lounge,Malay Restaurant,Market,Massage Studio,Mexican Restaurant,Modern European Restaurant,Mongolian Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,New American Restaurant,Night Market,Nightclub,Noodle House,Other Great Outdoors,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Pub,Public Art,Ramen Restaurant,Record Shop,Rest Area,Restaurant,River,Rock Club,Roof Deck,Salad Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shanxi Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stadium,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Yoga Studio,Zoo,Zoo Exhibit
0,Beitou,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Beitou,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Beitou,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3,Beitou,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Beitou,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, we'll group the data by district to make it easier to read.

In [414]:
taipei_grouped = taipei_onehot.groupby('District').mean().reset_index()
taipei_grouped

Unnamed: 0,District,Airport,Airport Service,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Beijing Restaurant,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Bus Station,Cable Car,Cafeteria,Café,Campground,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comfort Food Restaurant,Concert Hall,Convenience Store,Cultural Center,Cupcake Shop,Department Store,Dessert Shop,Donburi Restaurant,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Exhibit,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gay Bar,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Historic Site,History Museum,Hong Kong Restaurant,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lake,Laser Tag,Lounge,Malay Restaurant,Market,Massage Studio,Mexican Restaurant,Modern European Restaurant,Mongolian Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,New American Restaurant,Night Market,Nightclub,Noodle House,Other Great Outdoors,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Pub,Public Art,Ramen Restaurant,Record Shop,Rest Area,Restaurant,River,Rock Club,Roof Deck,Salad Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shanxi Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stadium,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Yoga Studio,Zoo,Zoo Exhibit
0,Beitou,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.056338,0.0,0.0,0.014085,0.084507,0.0,0.0,0.0,0.070423,0.0,0.0,0.0,0.112676,0.0,0.0,0.028169,0.014085,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.014085,0.0,0.014085,0.0,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.028169,0.0,0.028169,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.0,0.0,0.0,0.028169,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.014085,0.0,0.0,0.042254,0.0,0.0,0.014085,0.0,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0
1,Daan,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.01,0.05,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.02,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
2,Datong,0.0,0.0,0.0,0.0,0.01,0.01,0.05,0.0,0.0,0.02,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.13,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.11,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Nangang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Neihu,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.01,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.07,0.0,0.01,0.0,0.23,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.07,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.04,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
5,Shilin,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.075,0.0,0.0,0.0125,0.025,0.0125,0.0,0.0,0.0125,0.0,0.0,0.0,0.0375,0.0,0.0,0.025,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0125,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0125,0.025,0.0,0.025,0.0,0.0375,0.0,0.0,0.0125,0.0,0.0375,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.025,0.0125,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.025,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.025,0.0,0.0,0.0375,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0
6,Songshan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.06,0.01,0.07,0.0,0.02,0.01,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.08,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0
7,Wanhua,0.0,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.012346,0.0,0.0,0.037037,0.0,0.0,0.0,0.049383,0.012346,0.0,0.0,0.037037,0.012346,0.0,0.0,0.024691,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.024691,0.0,0.0,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.024691,0.012346,0.0,0.0,0.024691,0.037037,0.0,0.0,0.0,0.024691,0.012346,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.061728,0.0,0.0,0.08642,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.012346,0.098765,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Wenshan,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.013889,0.069444,0.027778,0.013889,0.083333,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.069444,0.0,0.0,0.0,0.194444,0.0,0.013889,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.097222,0.027778,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.055556
9,Xinyi,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.05,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.05,0.01,0.02,0.01,0.0,0.01,0.01,0.0,0.01,0.04,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0


Let's take a look at the five most common venues for each district.

In [415]:
num_top_venues = 5

for hood in taipei_grouped['District']:
    print("----"+hood+"----")
    temp = taipei_grouped[taipei_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Beitou----
                venue  freq
0   Convenience Store  0.11
1  Chinese Restaurant  0.08
2         Coffee Shop  0.07
3                Café  0.06
4     Thai Restaurant  0.04


----Daan----
                venue  freq
0                Café  0.16
1               Hotel  0.06
2              Bakery  0.05
3   Hotpot Restaurant  0.05
4  Chinese Restaurant  0.05


----Datong----
                  venue  freq
0                 Hotel  0.13
1  Taiwanese Restaurant  0.11
2          Dessert Shop  0.09
3   Japanese Restaurant  0.05
4      Asian Restaurant  0.05


----Nangang----
                  venue  freq
0     Convenience Store  0.29
1  Gym / Fitness Center  0.14
2           Supermarket  0.14
3                  Café  0.14
4                  Park  0.14


----Neihu----
                 venue  freq
0    Convenience Store  0.23
1  Japanese Restaurant  0.07
2          Coffee Shop  0.07
3                 Café  0.07
4          Supermarket  0.04


----Shilin----
                 venue  freq
0  

Now, let's organize that information into a pandas dataframe.

First, let's write a function to sort the venues in descending order.

In [416]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Here is the new dataframe.

In [417]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
districts_venues_sorted = pd.DataFrame(columns=columns)
districts_venues_sorted['District'] = taipei_grouped['District']

for ind in np.arange(taipei_grouped.shape[0]):
    districts_venues_sorted.iloc[ind, 1:] = return_most_common_venues(taipei_grouped.iloc[ind, :], num_top_venues)

districts_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beitou,Convenience Store,Chinese Restaurant,Coffee Shop,Café,Thai Restaurant,Hotpot Restaurant,Fried Chicken Joint,Ice Cream Shop,Market,Breakfast Spot
1,Daan,Café,Hotel,Bakery,Chinese Restaurant,Hotpot Restaurant,Japanese Restaurant,Coffee Shop,Massage Studio,Noodle House,Dumpling Restaurant
2,Datong,Hotel,Taiwanese Restaurant,Dessert Shop,Japanese Restaurant,Asian Restaurant,Coffee Shop,Hotpot Restaurant,Chinese Restaurant,Beer Bar,Café
3,Nangang,Convenience Store,Café,Supermarket,Gym / Fitness Center,Park,Market,Zoo Exhibit,Food & Drink Shop,Flower Shop,Flea Market
4,Neihu,Convenience Store,Japanese Restaurant,Coffee Shop,Café,Supermarket,Bakery,Fast Food Restaurant,Chinese Restaurant,Asian Restaurant,Dumpling Restaurant
5,Shilin,Café,Breakfast Spot,Taiwanese Restaurant,Japanese Restaurant,Ice Cream Shop,Convenience Store,Snack Place,Dumpling Restaurant,Chinese Restaurant,Dessert Shop
6,Songshan,Café,Noodle House,Hotpot Restaurant,Hotel,Dumpling Restaurant,Japanese Restaurant,Bookstore,Chinese Restaurant,Stadium,Comfort Food Restaurant
7,Wanhua,Taiwanese Restaurant,Park,Café,Noodle House,Coffee Shop,Japanese Restaurant,Convenience Store,Chinese Restaurant,Hotel,Dessert Shop
8,Wenshan,Convenience Store,Exhibit,Café,Bus Station,Coffee Shop,Zoo Exhibit,Zoo,Breakfast Spot,Cable Car,Japanese Restaurant
9,Xinyi,Department Store,Chinese Restaurant,Hotel,Coffee Shop,Japanese Restaurant,Café,Gym / Fitness Center,Plaza,Cocktail Bar,Dessert Shop


Now, we want to use k-means clustering to see if we can cluster our districts so that we can find a district (or districts) that would be optimal places to open a convenience store(s).

In [418]:
# set number of clusters
kclusters = 3

taipei_grouped_clustering = taipei_grouped.drop('District', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(taipei_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 2, 0, 1, 1, 1, 0, 1], dtype=int32)

Three clusters does not meaningfully cluster the data as the third cluster only has one district.

Let's try with four clusters.

In [419]:
# set number of clusters
kclusters = 4

taipei_grouped_clustering = taipei_grouped.drop('District', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(taipei_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 2, 0, 3, 1, 2, 2, 3, 1], dtype=int32)

Four clusters looks more meaningful, but there is still only one value in one of the clusters.

In [420]:
# set number of clusters
kclusters = 5

taipei_grouped_clustering = taipei_grouped.drop('District', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(taipei_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 2, 4, 0, 1, 3, 2, 0, 1], dtype=int32)

Five clusters splits the data up into more distinct groups. 

Now, let's merge the data from the Taiepi City government, the data from Foursquare regarding venues, and the clustering data. 

In [421]:
# add clustering labels
#districts_venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)

taipei_merged = df_TaipeiCity

# merge taipei_grouped with taipei_data to add latitude/longitude for each neighborhood
taipei_merged = taipei_merged.join(districts_venues_sorted.set_index('District'), on='District')

taipei_merged.head() # check the last columns!

Unnamed: 0,District,Population,Population Density,% Population Change,Area (sq. km),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beitou,254138,4473,1.756142,56.82,25.1152,121.515,Convenience Store,Chinese Restaurant,Coffee Shop,Café,Thai Restaurant,Hotpot Restaurant,Fried Chicken Joint,Ice Cream Shop,Market,Breakfast Spot
1,Daan,308722,27173,-1.633275,11.36,25.0262,121.5427,Café,Hotel,Bakery,Chinese Restaurant,Hotpot Restaurant,Japanese Restaurant,Coffee Shop,Massage Studio,Noodle House,Dumpling Restaurant
2,Datong,127086,22368,1.951818,5.68,25.0627,121.5113,Hotel,Taiwanese Restaurant,Dessert Shop,Japanese Restaurant,Asian Restaurant,Coffee Shop,Hotpot Restaurant,Chinese Restaurant,Beer Bar,Café
3,Nangang,120897,5535,6.356007,21.84,25.0312,121.6112,Convenience Store,Café,Supermarket,Gym / Fitness Center,Park,Market,Zoo Exhibit,Food & Drink Shop,Flower Shop,Flea Market
4,Neihu,286834,9083,7.505772,31.58,25.0689,121.5909,Convenience Store,Japanese Restaurant,Coffee Shop,Café,Supermarket,Bakery,Fast Food Restaurant,Chinese Restaurant,Asian Restaurant,Dumpling Restaurant


Now, let's visualize what the clusters look like on a map of Taipei.

In [422]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(taipei_merged['Latitude'], taipei_merged['Longitude'], taipei_merged['District'], kmeans.labels_):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-2],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.95).add_to(map_clusters)
       
map_clusters

Now, we'll visualize what the five clusters look like.

### Cluster 0

In [423]:
taipei_merged.loc[kmeans.labels_ == 0, taipei_merged.columns[[1] + list(range(0, taipei_merged.shape[1]))]]


Unnamed: 0,Population,District,Population.1,Population Density,% Population Change,Area (sq. km),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,286834,Neihu,286834,9083,7.505772,31.58,25.0689,121.5909,Convenience Store,Japanese Restaurant,Coffee Shop,Café,Supermarket,Bakery,Fast Food Restaurant,Chinese Restaurant,Asian Restaurant,Dumpling Restaurant
8,273040,Wenshan,273040,8665,4.325632,31.51,24.9929,121.5713,Convenience Store,Exhibit,Café,Bus Station,Coffee Shop,Zoo Exhibit,Zoo,Breakfast Spot,Cable Car,Japanese Restaurant


### Cluster 1

In [424]:
taipei_merged.loc[kmeans.labels_ == 1, taipei_merged.columns[[1] + list(range(0, taipei_merged.shape[1]))]]

Unnamed: 0,Population,District,Population.1,Population Density,% Population Change,Area (sq. km),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,254138,Beitou,254138,4473,1.756142,56.82,25.1152,121.515,Convenience Store,Chinese Restaurant,Coffee Shop,Café,Thai Restaurant,Hotpot Restaurant,Fried Chicken Joint,Ice Cream Shop,Market,Breakfast Spot
5,285017,Shilin,285017,4570,-0.36635,62.37,25.095,121.5246,Café,Breakfast Spot,Taiwanese Restaurant,Japanese Restaurant,Ice Cream Shop,Convenience Store,Snack Place,Dumpling Restaurant,Chinese Restaurant,Dessert Shop
9,221606,Xinyi,221606,19773,-2.706239,11.21,25.0348,121.5677,Department Store,Chinese Restaurant,Hotel,Coffee Shop,Japanese Restaurant,Café,Gym / Fitness Center,Plaza,Cocktail Bar,Dessert Shop
10,228285,Zhongshan,228285,16685,4.315462,13.68,25.0792,121.5427,Hotel,Convenience Store,Café,Japanese Restaurant,Park,Coffee Shop,Chinese Restaurant,Dessert Shop,Fast Food Restaurant,Hotpot Restaurant


### Cluster 2

In [425]:
taipei_merged.loc[kmeans.labels_ == 2, taipei_merged.columns[[1] + list(range(0, taipei_merged.shape[1]))]]

Unnamed: 0,Population,District,Population.1,Population Density,% Population Change,Area (sq. km),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,127086,Datong,127086,22368,1.951818,5.68,25.0627,121.5113,Hotel,Taiwanese Restaurant,Dessert Shop,Japanese Restaurant,Asian Restaurant,Coffee Shop,Hotpot Restaurant,Chinese Restaurant,Beer Bar,Café
7,188225,Wanhua,188225,21263,-1.122079,8.85,25.0263,121.497,Taiwanese Restaurant,Park,Café,Noodle House,Coffee Shop,Japanese Restaurant,Convenience Store,Chinese Restaurant,Hotel,Dessert Shop


### Cluster 3

In [426]:
taipei_merged.loc[kmeans.labels_ == 3, taipei_merged.columns[[1] + list(range(0, taipei_merged.shape[1]))]]

Unnamed: 0,Population,District,Population.1,Population Density,% Population Change,Area (sq. km),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,308722,Daan,308722,27173,-1.633275,11.36,25.0262,121.5427,Café,Hotel,Bakery,Chinese Restaurant,Hotpot Restaurant,Japanese Restaurant,Coffee Shop,Massage Studio,Noodle House,Dumpling Restaurant
6,205219,Songshan,205219,22096,2.321785,9.29,25.0542,121.5639,Café,Noodle House,Hotpot Restaurant,Hotel,Dumpling Restaurant,Japanese Restaurant,Bookstore,Chinese Restaurant,Stadium,Comfort Food Restaurant
11,158583,Zhongzheng,158583,20847,-0.473211,7.61,25.0421,121.5199,Hotel,Café,Noodle House,Coffee Shop,Taiwanese Restaurant,Bakery,Japanese Restaurant,Chinese Restaurant,Hostel,Monument / Landmark


### Cluster 4

In [427]:
taipei_merged.loc[kmeans.labels_ == 4, taipei_merged.columns[[1] + list(range(0, taipei_merged.shape[1]))]]

Unnamed: 0,Population,District,Population.1,Population Density,% Population Change,Area (sq. km),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,120897,Nangang,120897,5535,6.356007,21.84,25.0312,121.6112,Convenience Store,Café,Supermarket,Gym / Fitness Center,Park,Market,Zoo Exhibit,Food & Drink Shop,Flower Shop,Flea Market


## Results and Discussion <a name="results"></a>

A visual analysis of the clusters shows that cluster 3 has three districts with high population density. In addition, these districts do not have convenience stores in the top ten most common venues.

It would be my advice to target these areas for opening a new convenience store. Songshan District would be my top recommendation since it has an increasing population. My second recommendation would be Zhongzheng District and my third recommendation would be Daan District. Although these two districts have decreasing populations, they still share many characteristics with Songshan District. 

Xinyi District would also be worth looking at as a possibility, although it didn't fit into the cluster that the other three districts were in.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify districts in Taipei that would be good places to open a convenience store. By using population data from the Taipei City government, location data from Foursquare, and machine learning, we were able to find districts that would be ideal for opening a convenience store. This data will help to narrow down the area in which stakeholders will need to search in order to find a place to open a convenience store. 