# <center>How to choose a location for a new coffee shop in Taipei</center>

### <center>2019/05/05</center>

<center>By Hanniel, Shih</center>

# Introduction

## Background

Taipei has a lot of coffee shops, but not all of them are doing well.<br>

There are many factors to consider when planning to open a new coffee shop; this project focuses on location.

A coffee shop's location is closely tied to its business, especially in Taipei, where most people rely on public transportation to get around the city.<br>

## Business problem

What is the best location for a coffee shop?

Which types of nearby venues should one look for or avoid?

## Target of this report

People who want to open a coffee shop in Taipei but have not yet decided on a location.

# Data

In this project, we will primarily use the Foursquare location dataset.<br>
Below is an example of the data that we will use.

In [2]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180628' # Foursquare API version

In [526]:
import json, requests, pandas as pd
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id=CLIENT_ID,
  client_secret=CLIENT_SECRET,
  v=VERSION,
  ll='25.0833,121.517',
  query='coffee',
  limit=5
)
resp = requests.get(url=url, params=params)
data = resp.json()
# Each item in the first group wraps one recommended venue
results = data['response']['groups'][0]['items']
ids = [item['venue']['id'] for item in results]


In [527]:
info = []
for VENUE_ID in ids:
    # Venue details endpoint: one API call per venue
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
        VENUE_ID, CLIENT_ID, CLIENT_SECRET, VERSION
    )
    results = requests.get(url).json()['response']['venue']
    info.append([results['name'], results['likes']['count'], results['rating'],
                 results['location']['lat'], results['location']['lng']])
df = pd.DataFrame(info, columns=['Name', 'Likes', 'rating', 'latitude', 'longitude'])
df

Unnamed: 0,Name,Likes,rating,latitude,longitude
0,咖啡弄 Coffee Alley,14,7.7,25.088665,121.52648
1,STARBUCKS (Kulun Chengde) (星巴克（庫倫承德門市）),20,7.8,25.072044,121.518828
2,松鶴廳 Lobby Cafe,8,7.2,25.078453,121.52622
3,榕 RON Cafe + Bar,24,7.3,25.091518,121.526787
4,星巴克 Starbucks,4,6.7,25.081778,121.523048


# Methodology

Let's create a function that returns coffee shops near a given location.

In [30]:
def get_coffees(lal,num):
    """Return a DataFrame of up to `num` coffee venues near `lal` ('lat,lng')."""
    import time, requests, pandas as pd, numpy as np
    url = 'https://api.foursquare.com/v2/venues/explore'

    params = dict(
      client_id=CLIENT_ID,
      client_secret=CLIENT_SECRET,
      v=VERSION,
      ll=lal,
      query='coffee',
      limit=num
    )
    resp = requests.get(url=url, params=params)
    results = resp.json()['response']['groups'][0]['items']
    ids = [item['venue']['id'] for item in results]
    info = []
    for VENUE_ID in ids:
        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            VENUE_ID, CLIENT_ID, CLIENT_SECRET, VERSION
        )
        # Retry a few times, pausing between attempts, instead of looping
        # forever and burning through the daily API quota
        for attempt in range(5):
            resp = requests.get(url)
            if resp.status_code == requests.codes.ok:
                break
            time.sleep(1)
        resp.raise_for_status()
        venue = resp.json()['response']['venue']
        # Some venues have no rating; record them as NaN
        rating = venue.get('rating', np.nan)
        info.append([venue['name'], venue['likes']['count'], rating,
                     venue['location']['lat'], venue['location']['lng']])
    return pd.DataFrame(info, columns=['Name','Likes','rating','latitude','longitude'])
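
Every retry of a venue request is another API call, so it may be worth capping the number of attempts and waiting longer after each failure. The following is a minimal sketch of that idea, not part of the original notebook: `fetch_with_retry`, `max_attempts`, `base_delay`, `FakeResponse` and `make_flaky_get` are hypothetical names, and `get` stands in for `requests.get` so that the example runs offline.

```python
import time


def fetch_with_retry(get, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call get(url) until it returns status 200 or the attempts run out.

    `get` is any callable returning an object with a `status_code`
    attribute (for example `requests.get`). Hypothetical helper.
    """
    for attempt in range(max_attempts):
        resp = get(url)
        if resp.status_code == 200:
            return resp
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError('no successful response after {} attempts'.format(max_attempts))


class FakeResponse:
    """Stand-in for requests.Response, used only for this offline demo."""
    def __init__(self, status_code):
        self.status_code = status_code


def make_flaky_get(codes):
    """Return a fake `get` that yields the given status codes in order."""
    it = iter(codes)
    return lambda url: FakeResponse(next(it))


# Two failures followed by a success; sleep is stubbed out so the demo is instant.
delays = []
resp = fetch_with_retry(make_flaky_get([500, 429, 200]), 'https://example.invalid',
                        sleep=delays.append)
print(resp.status_code, delays)
```

With a real session, one would pass `requests.get` as `get` and leave `sleep` at its default.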

Because the Foursquare API allows only 500 calls per day, it is not possible to retrieve data for every coffee shop in Taipei.

Instead, we first pick the two districts in Taipei with the best overall score.

Get 10 coffee shops in every district:

In [None]:
df_Shilin=get_coffees('25.0833,121.517',10)

In [None]:
df_Beitou=get_coffees('25.1167,121.5',10)

In [24]:
df_Neihu=get_coffees('25.04,121.35',10)

In [27]:
df_Wenshan=get_coffees('24.989722,121.572222',10)

In [32]:
df_Nangang=get_coffees('25.053315,121.607409',10)

In [33]:
df_Zhongshan=get_coffees('25.068889,121.533056',10)

In [35]:
df_Daan=get_coffees('25.026389,121.534444',10)

In [36]:
df_Xinyi=get_coffees('25.035833,121.568333',10)

In [37]:
df_Songshan=get_coffees('25.059788,121.55727',10)

In [38]:
df_Wanhua=get_coffees('25.0333,121.483',10)

In [39]:
df_Zhongzheng=get_coffees('25.031667,121.516389',10)

In [40]:
df_Datong=get_coffees('25.059722,121.514167',10)

Calculate the score for each district.

The score is defined as (number of likes) + 10*(rating), computed here from each district's mean number of likes and mean rating.
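
As a quick check of the formula, here is a small worked example. The helper name `popularity_score` is hypothetical; the (likes, rating) pairs are taken from the Songshan District table later in this report.

```python
def popularity_score(likes, rating):
    """Score used in this report: likes + 10 * rating."""
    return likes + 10 * rating


# (likes, rating) pairs copied from the Songshan District table
examples = {
    'Fujin Tree 353 Cafe by Simple Kaffa': (68, 8.1),
    'Leisure Cafe': (9, 8.0),
}
for name, (likes, rating) in examples.items():
    print(name, round(popularity_score(likes, rating), 1))
```

Multiplying the rating by 10 puts it on roughly the same scale as the like counts seen in this data, so a shop with few likes still gets most of its score from its rating.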

In [74]:
districts=[df_Shilin,df_Beitou,df_Neihu,df_Wenshan,df_Nangang,df_Zhongshan,df_Daan,df_Xinyi,df_Songshan,df_Wanhua,df_Zhongzheng,df_Datong]
score=[]
for d in districts:
    # Mean likes plus ten times mean rating; mean() skips missing ratings
    score.append(d['Likes'].mean()+10*d['rating'].mean())
for s in score:
    print(s)

82.2
87.29999999999998
74.67777777777778
88.1
75.77777777777777
76.3
114.7
103.0
118.9
82.6
97.9
102.49999999999999


Songshan District has the best overall customer satisfaction, followed by Da'an District.<br>
We will use coffee shops in these two districts to answer our question.

(Coffee shops with no rating are assigned the average rating of their district.)
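
The fill rule can be illustrated on a toy example. The shop names and numbers below are hypothetical; only the `Likes`, `rating` and `score` column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy district: the second shop has no rating.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Likes': [10, 20, 30],
    'rating': [8.0, np.nan, 7.0],
})

# Series.mean() skips NaN by default, so the district mean is (8.0 + 7.0) / 2 = 7.5.
district_mean = toy['rating'].mean()
toy['score'] = toy['Likes'] + 10 * toy['rating'].fillna(district_mean)
print(toy)
```

Shop B ends up with a score of 20 + 10 * 7.5 = 95, which is why some scores in the tables of this report are not whole numbers.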

### Songshan District

In [75]:
df=get_coffees('25.059788,121.55727',200)

In [93]:
df['score']=df['Likes']+10*df['rating'].fillna(df['rating'].mean())
df

Unnamed: 0,Name,Likes,rating,latitude,longitude,score
0,Fujin Tree 353 Cafe by Simple Kaffa,68,8.1,25.060595,121.557873,149.0
1,All Day Roasting Company,78,8.2,25.056569,121.560347,160.0
2,Coffee Essential (民生工寓),79,8.5,25.057592,121.551793,164.0
3,左先生咖啡 Dousun Cafe,28,8.2,25.056908,121.563851,110.0
4,楽楽咖啡,47,8.2,25.056944,121.564356,129.0
5,STARBUCKS (San Min) (星巴克),24,7.6,25.058858,121.562946,100.0
6,Woolloomooloo,48,7.4,25.059823,121.552403,122.0
7,六丁目cafe,19,7.2,25.058308,121.560942,91.0
8,Leisure Cafe,9,8.0,25.053539,121.552638,89.0
9,朋廚烘培坊 Bonjour,7,6.8,25.060423,121.560447,75.0


### Da'an District

In [96]:
df_1=get_coffees('25.026389,121.534444',200)

In [98]:
df_1['score']=df_1['Likes']+10*df_1['rating'].fillna(df_1['rating'].mean())
df_1

Unnamed: 0,Name,Likes,rating,latitude,longitude,score
0,青田七六,46,8.3,25.028049,121.532596,129.000000
1,Wistaria Tea House (紫藤廬),70,8.0,25.024553,121.534507,150.000000
2,貳月咖啡,10,8.1,25.027019,121.532101,91.000000
3,515 cafe & books,8,8.0,25.028967,121.531372,88.000000
4,咖啡黑潮 Cafe Kuroshio,22,7.8,25.027908,121.530545,100.000000
5,2J Cafe,16,8.5,25.031081,121.538382,101.000000
6,葛樂蒂咖啡館 Galette,8,7.8,25.022247,121.533913,86.000000
7,AGCT Apartment,14,8.1,25.021378,121.533101,95.000000
8,Yaboo Cafe (鴉埠咖啡),73,8.2,25.030502,121.530520,155.000000
9,Libero Coffee & Bar (咖啡小自由),73,7.9,25.030094,121.530538,152.000000


Now, let's explore the area around each coffee shop.

Here we create a function that returns the category of every venue within a radius of 200 meters.

In [168]:
def explore_area(lat,lon):
    """Return a DataFrame with the primary category of each venue within 200 m."""
    import time, requests, pandas as pd
    url = 'https://api.foursquare.com/v2/venues/search'

    params = dict(
      client_id=CLIENT_ID,
      client_secret=CLIENT_SECRET,
      v=VERSION,
      ll=str(lat)+','+str(lon),
      radius=200,
      intent='browse'
    )
    # Retry a few times, pausing between attempts, then give up with an error
    for attempt in range(5):
        resp = requests.get(url=url, params=params)
        if resp.status_code == requests.codes.ok:
            break
        time.sleep(1)
    resp.raise_for_status()
    venues_list = resp.json()['response']['venues']
    cat_list = []
    for venue in venues_list:
        # Skip venues at distance 0 (typically the coffee shop itself)
        # and venues without any category
        if venue['location']['distance']!=0 and venue['categories']!=[]:
            cat_list.append(venue['categories'][0]['name'])
    return pd.DataFrame(data={'category':cat_list})
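
To get a feel for how small a 200-meter radius is, the distance between two shops can be estimated with the haversine formula. This is only a sketch for intuition: `haversine_m` is a hypothetical helper, the Earth is treated as a sphere of radius 6,371 km, and the Foursquare API may compute distances differently.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Coordinates of two Songshan coffee shops from the table above
fujin_tree = (25.060595, 121.557873)
all_day = (25.056569, 121.560347)

d = haversine_m(*fujin_tree, *all_day)
print(round(d), d <= 200)
```

These two shops turn out to be several hundred meters apart, so neither falls inside the other's 200-meter search area.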

### Songshan District

In [173]:
frames=[]
for name,lat,lon in zip(df['Name'],df['latitude'],df['longitude']):
    df_temp=explore_area(lat,lon)
    df_temp.insert(0,'Name',name)
    frames.append(df_temp)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
df_nearby=pd.concat(frames,ignore_index=True,sort=False)
df_nearby

Unnamed: 0,Name,category
0,Fujin Tree 353 Cafe by Simple Kaffa,Clothing Store
1,Fujin Tree 353 Cafe by Simple Kaffa,Park
2,Fujin Tree 353 Cafe by Simple Kaffa,Baby Store
3,Fujin Tree 353 Cafe by Simple Kaffa,Breakfast Spot
4,Fujin Tree 353 Cafe by Simple Kaffa,Café
5,Fujin Tree 353 Cafe by Simple Kaffa,Nature Preserve
6,Fujin Tree 353 Cafe by Simple Kaffa,Furniture / Home Store
7,Fujin Tree 353 Cafe by Simple Kaffa,Kids Store
8,Fujin Tree 353 Cafe by Simple Kaffa,Park
9,Fujin Tree 353 Cafe by Simple Kaffa,Café


### Da'an District

In [174]:
frames=[]
for name,lat,lon in zip(df_1['Name'],df_1['latitude'],df_1['longitude']):
    df_temp=explore_area(lat,lon)
    df_temp.insert(0,'Name',name)
    frames.append(df_temp)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
df_1_nearby=pd.concat(frames,ignore_index=True,sort=False)
df_1_nearby

Unnamed: 0,Name,category
0,青田七六,Park
1,青田七六,Bistro
2,青田七六,Historic Site
3,青田七六,Confucian Temple
4,青田七六,Used Bookstore
5,青田七六,Church
6,青田七六,Art Gallery
7,青田七六,Mosque
8,青田七六,Camera Store
9,青田七六,Tea Room


Merge the two dataframes above and convert them into a one-hot encoded table of nearby venue categories.

In [181]:
df_around=pd.concat([df_nearby,df_1_nearby],ignore_index=True)
df_around_oh=pd.get_dummies(df_around[['category']],prefix='',prefix_sep='')
df_around_oh.insert(0,'Name',df_around['Name'])
# Summing the indicator columns per shop gives a count of each nearby category
df_grouped=df_around_oh.groupby('Name').sum().reset_index()
df_grouped

Unnamed: 0,Name,Accessories Store,Acupuncturist,Advertising Agency,Airport,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baggage Claim,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Building,...,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Student Center,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tailor Shop,Taiwanese Restaurant,Taxi Stand,Tea Room,Tech Startup,Temple,Thai Restaurant,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Travel Lounge,Tree,Turkish Restaurant,University,Used Bookstore,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Village,Wedding Hall,Whisky Bar,Winery,Women's Store,Yoga Studio
0,2J Cafe,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,2,0,1,0,0,...,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"3,CO Cafe",0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,2,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,50嵐 師大店,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,5,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,515 cafe & books,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,A. Place Cafe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0
5,AGCT Apartment,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,1,0,3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
6,Afterhours Café,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,All Day Roasting Company,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Angle Café (Angle cafe 自家烘焙咖啡館),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
9,BEANS & BEATS,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
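
To make the one-hot step concrete, here is a toy version with two made-up shops. The data is hypothetical; only the `Name` and `category` column layout follows `df_nearby`.

```python
import pandas as pd

# Hypothetical toy input in the same (Name, category) layout as df_nearby
toy = pd.DataFrame({
    'Name': ['Shop A', 'Shop A', 'Shop A', 'Shop B'],
    'category': ['Park', 'Café', 'Park', 'Bakery'],
})

# One indicator column per category, then sum the indicators per shop
one_hot = pd.get_dummies(toy['category'], dtype=int)
counts = one_hot.groupby(toy['Name']).sum()
print(counts)
```

Each row of `counts` is one shop and each cell is the number of nearby venues in that category. For this kind of table, `pd.crosstab(toy['Name'], toy['category'])` should give the same counts in a single call.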


Put the original information back in, and we have our final dataset.

In [182]:
df_final=pd.concat([df,df_1],ignore_index=True).merge(df_grouped,on='Name')
df_final

Unnamed: 0,Name,Likes,rating,latitude,longitude,score,Accessories Store,Acupuncturist,Advertising Agency,Airport,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baggage Claim,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Bookstore,Boutique,...,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Student Center,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tailor Shop,Taiwanese Restaurant,Taxi Stand,Tea Room,Tech Startup,Temple,Thai Restaurant,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Travel Lounge,Tree,Turkish Restaurant,University,Used Bookstore,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Village,Wedding Hall,Whisky Bar,Winery,Women's Store,Yoga Studio
0,Fujin Tree 353 Cafe by Simple Kaffa,68,8.1,25.060595,121.557873,149.000000,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,All Day Roasting Company,78,8.2,25.056569,121.560347,160.000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Coffee Essential (民生工寓),79,8.5,25.057592,121.551793,164.000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0
3,左先生咖啡 Dousun Cafe,28,8.2,25.056908,121.563851,110.000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,楽楽咖啡,47,8.2,25.056944,121.564356,129.000000,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,STARBUCKS (San Min) (星巴克),24,7.6,25.058858,121.562946,100.000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Woolloomooloo,48,7.4,25.059823,121.552403,122.000000,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,六丁目cafe,19,7.2,25.058308,121.560942,91.000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Leisure Cafe,9,8.0,25.053539,121.552638,89.000000,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,朋廚烘培坊 Bonjour,7,6.8,25.060423,121.560447,75.000000,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Since fitting a regression on such a small dataset is not a good idea, we will use K-means clustering to see if we can find any patterns.

In [447]:
X=df_final.drop(['Name','Likes','rating','score'],axis=1)

In [449]:
from sklearn.cluster import KMeans
kmeans=KMeans(init='k-means++',n_clusters=5,n_init=12).fit(X)
df_final.insert(1,'Class',kmeans.labels_)
df_final.head()

Unnamed: 0,Name,Class,Likes,rating,latitude,longitude,score,Accessories Store,Acupuncturist,Advertising Agency,Airport,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baggage Claim,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Bookstore,...,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Student Center,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tailor Shop,Taiwanese Restaurant,Taxi Stand,Tea Room,Tech Startup,Temple,Thai Restaurant,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Travel Lounge,Tree,Turkish Restaurant,University,Used Bookstore,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Village,Wedding Hall,Whisky Bar,Winery,Women's Store,Yoga Studio
0,Fujin Tree 353 Cafe by Simple Kaffa,3,68,8.1,25.060595,121.557873,149.0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,All Day Roasting Company,4,78,8.2,25.056569,121.560347,160.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Coffee Essential (民生工寓),4,79,8.5,25.057592,121.551793,164.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0
3,左先生咖啡 Dousun Cafe,4,28,8.2,25.056908,121.563851,110.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,楽楽咖啡,4,47,8.2,25.056944,121.564356,129.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
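
For readers unfamiliar with K-means, the sketch below shows the basic idea (Lloyd's algorithm) with NumPy only. It is an illustration under simplifying assumptions, not what scikit-learn runs internally: `lloyd_kmeans` is a hypothetical function, it starts from randomly chosen rows rather than `k-means++`, and it does a single run rather than several restarts. The toy data is made up.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of X chosen at random (simpler than k-means++)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Hypothetical toy data: rows are shops, columns are counts of two venue categories
X = np.array([[5, 0], [4, 1], [5, 1],   # mostly category 0 nearby
              [0, 4], [1, 5], [0, 5]])  # mostly category 1 nearby
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only the grouping does. This is also why the class numbers in this report may change when the notebook is rerun, unless `random_state` is fixed in `KMeans`.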


To make our dataframe easier to read, we convert the table to show the five most common types of venues near each coffee shop.

In [498]:
def return_most_common_venues(row, num_top_venues):
    # Sort categories by count, highest first, and keep the top names
    row_categories_sorted = row.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [501]:
import numpy as np

indicators = ['st', 'nd', 'rd']
num_top_venues=5
# create columns according to number of top venues
columns = ['Name','Class','Score']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and beyond all use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
df_final_venues_sorted = pd.DataFrame(columns=columns)
df_final_venues_sorted['Name'] = df_final['Name']
df_final_venues_sorted['Class'] = df_final['Class']
df_final_venues_sorted['Score'] = df_final['score']

for ind in np.arange(df_final.shape[0]):
    # Columns 0-6 are Name, Class, Likes, rating, latitude, longitude and score;
    # the category counts start at column 7
    df_final_venues_sorted.iloc[ind, 3:] = return_most_common_venues(df_final.iloc[ind, 7:], num_top_venues)
df_final_venues_sorted

Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Fujin Tree 353 Cafe by Simple Kaffa,3,149.000000,Café,Office,Park,Dumpling Restaurant,Clothing Store
1,All Day Roasting Company,4,160.000000,Taiwanese Restaurant,Café,Chinese Restaurant,Residential Building (Apartment / Condo),Bakery
2,Coffee Essential (民生工寓),4,164.000000,Dumpling Restaurant,Chinese Restaurant,Japanese Restaurant,Coffee Shop,Bubble Tea Shop
3,左先生咖啡 Dousun Cafe,4,110.000000,Taiwanese Restaurant,Coffee Shop,Japanese Restaurant,Bakery,Szechuan Restaurant
4,楽楽咖啡,4,129.000000,Convenience Store,Noodle House,Japanese Restaurant,Hong Kong Restaurant,Business Service
5,STARBUCKS (San Min) (星巴克),4,100.000000,Convenience Store,Community Center,Coffee Shop,Library,Building
6,Woolloomooloo,3,122.000000,Art Gallery,Building,Café,Taiwanese Restaurant,Nightclub
7,六丁目cafe,4,91.000000,Taiwanese Restaurant,Bakery,Convenience Store,Salon / Barbershop,Residential Building (Apartment / Condo)
8,Leisure Cafe,4,89.000000,Chinese Restaurant,Japanese Restaurant,Park,Taiwanese Restaurant,Campground
9,朋廚烘培坊 Bonjour,3,75.000000,Café,Salon / Barbershop,Coffee Shop,Italian Restaurant,Park
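
The "most common venue" ranking can also be illustrated with the standard library. The sketch below counts the ten nearby categories shown earlier for Fujin Tree 353 Cafe by Simple Kaffa in the sample output of `df_nearby`; it uses `collections.Counter` instead of the pandas sort in the notebook.

```python
from collections import Counter

# First ten nearby categories shown for "Fujin Tree 353 Cafe by Simple Kaffa"
nearby = ['Clothing Store', 'Park', 'Baby Store', 'Breakfast Spot', 'Café',
          'Nature Preserve', 'Furniture / Home Store', 'Kids Store', 'Park', 'Café']

counts = Counter(nearby)
top = counts.most_common(5)   # list of (category, count), highest count first
for category, n in top:
    print(category, n)
```

Only "Park" and "Café" appear twice here; the other six categories are tied at one. Which tied categories make it into the top five therefore depends on tie-breaking, and the same caveat applies to the lower-ranked columns in the table above.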


# Results

Let's look at each class and its average score.

Class 0

In [504]:
print(df_final_venues_sorted[df_final_venues_sorted['Class']==0]['Score'].mean())
df_final_venues_sorted[df_final_venues_sorted['Class']==0]

77.53494623655912


Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
76,天下佈武 日月潭紅茶 Sun Moon Lake Black Tea,0,103.0,Bubble Tea Shop,Salon / Barbershop,Food Truck,College Administrative Building,Park
79,petit doux 微小甜,0,85.0,Bubble Tea Shop,Dumpling Restaurant,Convenience Store,Snack Place,Pharmacy
87,回留茶藝素食,0,82.0,Bubble Tea Shop,Tea Room,Cosmetics Shop,Café,Playground
96,STARBUCKS (星巴克),0,83.0,Bubble Tea Shop,Convenience Store,Chinese Restaurant,Doctor's Office,Japanese Restaurant
102,避世所 Bistro O,0,73.0,Bubble Tea Shop,Italian Restaurant,Bakery,Noodle House,Japanese Restaurant
106,Together,0,71.0,Bakery,Bubble Tea Shop,Convenience Store,Dumpling Restaurant,Clothing Store
119,天曉得,0,72.903226,Dumpling Restaurant,Convenience Store,Café,Bubble Tea Shop,Salon / Barbershop
120,小南風 Minami Zephyr,0,70.903226,Bubble Tea Shop,Convenience Store,Park,College Academic Building,Bus Station
121,米倉咖啡,0,72.903226,Café,Bubble Tea Shop,Accessories Store,Soup Place,Night Market
122,Rose House 古典玫瑰園師大店,0,71.903226,Café,Salon / Barbershop,Bubble Tea Shop,Dumpling Restaurant,Convenience Store


Class 1

In [505]:
print(df_final_venues_sorted[df_final_venues_sorted['Class']==1]['Score'].mean())
df_final_venues_sorted[df_final_venues_sorted['Class']==1]

73.38934081346423


Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
41,丹堤咖啡 Dante Coffee,1,70.521739,Café,Mobile Phone Shop,Coffee Shop,Convenience Store,College Academic Building
42,丹堤咖啡 Dante Coffee,1,69.521739,Café,Mobile Phone Shop,Coffee Shop,Convenience Store,College Academic Building
43,丹堤咖啡 Dante Coffee,1,70.903226,Café,Mobile Phone Shop,Coffee Shop,Convenience Store,College Academic Building
104,星巴克 Starbucks,1,84.0,Park,Convenience Store,College Academic Building,Coffee Shop,Italian Restaurant
105,星巴克 Starbucks,1,72.0,Park,Convenience Store,College Academic Building,Coffee Shop,Italian Restaurant


Class 2

In [506]:
print(df_final_venues_sorted[df_final_venues_sorted['Class']==2]['Score'].mean())
df_final_venues_sorted[df_final_venues_sorted['Class']==2]

96.38461538461539


Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
56,515 cafe & books,2,88.0,Taiwanese Restaurant,Coffee Shop,Café,Park,Dessert Shop
61,Yaboo Cafe (鴉埠咖啡),2,155.0,Taiwanese Restaurant,Café,Ice Cream Shop,Salon / Barbershop,Coffee Shop
62,Libero Coffee & Bar (咖啡小自由),2,152.0,Taiwanese Restaurant,Park,Dentist's Office,Ice Cream Shop,Leather Goods Store
67,誰的書房 who's café,2,87.0,Taiwanese Restaurant,Parking,Art Gallery,Restaurant,Tea Room
70,永康階 The Green Steps,2,101.0,Bubble Tea Shop,Café,Bakery,Coffee Shop,Taiwanese Restaurant
73,Xiaomijo (小米酒咖啡館),2,98.0,Coffee Shop,Café,Boutique,Ice Cream Shop,Gift Shop
74,串門子茶館 Stop By Tea House,2,86.0,Coffee Shop,Taiwanese Restaurant,Convenience Store,Dessert Shop,Ice Cream Shop
80,Youmou to Ohana Coffee (羊毛與花 coffee),2,81.0,Café,Ice Cream Shop,Taiwanese Restaurant,Bubble Tea Shop,Coffee Shop
82,烘培者咖啡 Roaster Family Coffee,2,90.0,Taiwanese Restaurant,Café,Ice Cream Shop,Salon / Barbershop,Chinese Restaurant
88,好多咖啡 Forgood,2,97.0,Taiwanese Restaurant,Park,Coffee Shop,Salon / Barbershop,Café


Class 3

In [507]:
print(df_final_venues_sorted[df_final_venues_sorted['Class']==3]['Score'].mean())
df_final_venues_sorted[df_final_venues_sorted['Class']==3]

84.20848956693479


Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Fujin Tree 353 Cafe by Simple Kaffa,3,149.0,Café,Office,Park,Dumpling Restaurant,Clothing Store
6,Woolloomooloo,3,122.0,Art Gallery,Building,Café,Taiwanese Restaurant,Nightclub
9,朋廚烘培坊 Bonjour,3,75.0,Café,Salon / Barbershop,Coffee Shop,Italian Restaurant,Park
14,BEANS & BEATS,3,67.0,Café,Park,Clothing Store,Playground,Pop-Up Shop
15,Kyushu Pancake (九州鬆餅咖啡店),3,76.0,Café,Park,Salon / Barbershop,Noodle House,Music Store
16,Café Mode (木馬),3,64.0,Park,Furniture / Home Store,Playground,Clothing Store,Café
17,De 'A,3,65.0,Café,Park,Clothing Store,Taiwanese Restaurant,Parking
21,Pausa,3,69.521739,Bus Stop,Café,Clothing Store,Nature Preserve,Building
23,"3,CO Cafe",3,71.521739,Park,Café,Furniture / Home Store,Playground,Clothing Store
24,Cafe Ballet-Baby Kitchen,3,70.521739,Park,Café,Playground,Furniture / Home Store,Bakery


Class 4

In [508]:
print(df_final_venues_sorted[df_final_venues_sorted['Class']==4]['Score'].mean())
df_final_venues_sorted[df_final_venues_sorted['Class']==4]

81.78142380422697


Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,All Day Roasting Company,4,160.0,Taiwanese Restaurant,Café,Chinese Restaurant,Residential Building (Apartment / Condo),Bakery
2,Coffee Essential (民生工寓),4,164.0,Dumpling Restaurant,Chinese Restaurant,Japanese Restaurant,Coffee Shop,Bubble Tea Shop
3,左先生咖啡 Dousun Cafe,4,110.0,Taiwanese Restaurant,Coffee Shop,Japanese Restaurant,Bakery,Szechuan Restaurant
4,楽楽咖啡,4,129.0,Convenience Store,Noodle House,Japanese Restaurant,Hong Kong Restaurant,Business Service
5,STARBUCKS (San Min) (星巴克),4,100.0,Convenience Store,Community Center,Coffee Shop,Library,Building
7,六丁目cafe,4,91.0,Taiwanese Restaurant,Bakery,Convenience Store,Salon / Barbershop,Residential Building (Apartment / Condo)
8,Leisure Cafe,4,89.0,Chinese Restaurant,Japanese Restaurant,Park,Taiwanese Restaurant,Campground
10,小春日和 動物雜貨• 珈琲,4,74.0,Café,Playground,Bus Line,Laundry Service,Beer Garden
11,Wilbeck Café (威爾貝克手工烘咖啡（南京店）),4,85.0,Asian Restaurant,Dumpling Restaurant,Breakfast Spot,Chinese Restaurant,Medical Center
12,春水堂 光南店,4,99.0,Bakery,Taiwanese Restaurant,Convenience Store,Office,Dumpling Restaurant
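
The five per-class cells above could also be condensed into a single `groupby`. A minimal sketch on made-up numbers (the toy frame is hypothetical; only the `Class` and `Score` column names follow `df_final_venues_sorted`):

```python
import pandas as pd

# Hypothetical toy frame with the same 'Class' and 'Score' columns
toy = pd.DataFrame({
    'Class': [0, 0, 1, 2, 2, 2],
    'Score': [80.0, 70.0, 72.0, 150.0, 90.0, 96.0],
})

# Mean score and number of shops per class, best class first
summary = (toy.groupby('Class')['Score']
              .agg(['mean', 'count'])
              .sort_values('mean', ascending=False))
print(summary)
```

Showing the count next to the mean helps judge how much weight each average deserves. In this report, for example, Class 1 contains only five rows.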


## Visualization

### 1. Distribution of clusters

In [461]:
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

In [482]:
map_clusters = folium.Map(location=[25.040667,121.556667], zoom_start=14)
# One color per cluster (5 clusters)
colors_array = cm.rainbow(np.linspace(0, 1, 5))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(df_final['latitude'], df_final['longitude'], df_final['Name'], df_final['Class']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # cluster-1 maps class 0 to the last color; every class still gets its own color
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

### 2. Distribution of clusters and scores

The color of the outer ring represents the score of the coffee shop, and the color inside represents the class label.

In [490]:
map_clusters = folium.Map(location=[25.040667,121.556667], zoom_start=14)
# Palette for the class labels (5 colors), used as the fill color
colors_array = cm.rainbow(np.linspace(0, 1, 5))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Palette for the scores (130 colors), used as the ring color
colors_array = cm.rainbow(np.linspace(0, 1, 130))
rainbow1 = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster ,score in zip(df_final['latitude'], df_final['longitude'], df_final['Name'],df_final['Class'], df_final['score']):
    label = folium.Popup(str(poi) + ' Score ' + str(round(score))+' Class '+str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow1[round(score-40)],  # outer ring: score, offset by 40
        fill=True,
        fill_color=rainbow[cluster-1],    # fill: class label
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
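
Indexing the palette with `round(score-40)` works only while every score stays within the palette's range. A hedged sketch of a safer lookup follows; `score_to_index` is a hypothetical helper that is not part of the notebook, and it clamps out-of-range scores to the first or last color.

```python
def score_to_index(score, low, n_colors):
    """Map a score to a valid palette index in [0, n_colors - 1].

    `low` is the score that should map to the first color. Scores outside
    the expected range are clamped instead of raising IndexError or
    wrapping around through a negative index. Hypothetical helper.
    """
    idx = int(round(score - low))
    return max(0, min(n_colors - 1, idx))


palette_size = 130  # same palette length as rainbow1 above
for s in (164.0, 64.0, 30.0, 500.0):
    print(s, score_to_index(s, low=40, n_colors=palette_size))
```

In the map code, this would be used as `rainbow1[score_to_index(score, 40, len(rainbow1))]`.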

Class 2 has a noticeably higher average score.<br>
Let's inspect Class 2 more closely.

In [509]:
df_final_venues_sorted[df_final_venues_sorted['Class']==2]

Unnamed: 0,Name,Class,Score,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
56,515 cafe & books,2,88.0,Taiwanese Restaurant,Coffee Shop,Café,Park,Dessert Shop
61,Yaboo Cafe (鴉埠咖啡),2,155.0,Taiwanese Restaurant,Café,Ice Cream Shop,Salon / Barbershop,Coffee Shop
62,Libero Coffee & Bar (咖啡小自由),2,152.0,Taiwanese Restaurant,Park,Dentist's Office,Ice Cream Shop,Leather Goods Store
67,誰的書房 who's café,2,87.0,Taiwanese Restaurant,Parking,Art Gallery,Restaurant,Tea Room
70,永康階 The Green Steps,2,101.0,Bubble Tea Shop,Café,Bakery,Coffee Shop,Taiwanese Restaurant
73,Xiaomijo (小米酒咖啡館),2,98.0,Coffee Shop,Café,Boutique,Ice Cream Shop,Gift Shop
74,串門子茶館 Stop By Tea House,2,86.0,Coffee Shop,Taiwanese Restaurant,Convenience Store,Dessert Shop,Ice Cream Shop
80,Youmou to Ohana Coffee (羊毛與花 coffee),2,81.0,Café,Ice Cream Shop,Taiwanese Restaurant,Bubble Tea Shop,Coffee Shop
82,烘培者咖啡 Roaster Family Coffee,2,90.0,Taiwanese Restaurant,Café,Ice Cream Shop,Salon / Barbershop,Chinese Restaurant
88,好多咖啡 Forgood,2,97.0,Taiwanese Restaurant,Park,Coffee Shop,Salon / Barbershop,Café


In [525]:
df_class2=df_final[df_final['Class']==2]
map_clusters = folium.Map(location=[25.029567,121.530767], zoom_start=17)
# 90-color palette indexed by (score - 70)
colors_array = cm.rainbow(np.linspace(0, 1, 90))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, score in zip(df_class2['latitude'], df_class2['longitude'], df_class2['Name'], df_class2['score']):
    label = folium.Popup(str(poi) + ' Score ' + str(score), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[round(score-70)],
        fill=True,
        fill_color=rainbow[round(score-70)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

# Discussion

Looking at the best cluster, Class 2, we find that its coffee shops are generally surrounded by Taiwanese restaurants and other coffee shops.

They are also all located in Da'an District.

Looking at the two worst clusters, Class 0 and Class 1, we find that their coffee shops are generally surrounded by other coffee shops, bubble tea shops and convenience stores.

It seems that being surrounded by many other shops that sell drinks creates too much competition.

Also note that while Class 2 shops have other coffee shops nearby, coffee shops are not the main type of venue around them.

Based on these observations, we recommend opening a new coffee shop in Da'an District, with Taiwanese restaurants nearby and at a moderate distance from other coffee shops.

# Conclusion

While many factors are still not considered in this project because of the limits of the Foursquare API, we can still draw some interesting conclusions from our observations.

1. It is not recommended to have your coffee shop surrounded by many kinds of shops that sell drinks. Competition of that intensity does not appear to benefit your business.

2. It is recommended to have other coffee shops near yours, but not to open where coffee shops are the main type of business in the area.

3. Generally, it is recommended to open your coffee shop in either Songshan District or Da'an District.

4. Chain coffee shops perform adequately and consistently, but not outstandingly.

<center>This is a demonstration of using data science techniques to analyze coffee shop performance based on location.</center>

<center>I do not in any way claim that this report represents the real world.</center>

<center>Anyone may use this report for their own purposes, but I am not responsible for any action taken based on this report.</center>
