# Boba tea shops in Denver Metro area

## 1.Introduction/Business Problem

Boba tea shops are becoming more popular these years. Boba tea originated in Taiwan in the early 1980s. "Boba", "pearl", or "bubble" refers to chewy tapioca balls that are usually served with tea or milk tea. Both come with a variety of flavors. Boba tea shops are unique and hard to find, unlike coffee shops that you can see at almost every corner of the city. Boba tea tends to be very popular in asian communities, so a city like Denver with a large asian population will be a good place to invest in a boba tea shop.

Denver is one of the fastest growing cities in the USA. More people are moving to Denver every year, the realty industry is growing, the airline industry is also growing. So if someone is looking to open a Boba tea shop the Denver Metropolitan Area, where would it be the best area to open it? 

This study will be helpful for not only an entrepreneur who wants to invest in a boba tea shop, but also an individual who is interested in boba tea shop business.

## 2.Data

Firstly, I have to find data set that include postal code, city, latitude and longtitude and extract only cities in Denver Metropolitan Area only.

    - The postal code or zip code data will be found on https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=CO.

    - For the list of cities in in Denver Metropolitan Area, we will scrape it from https://schoolchoiceforkids.org/denver-metro-area-cities-school-districts/.
    

Next, I pull the venue data from Foursquare API trying to get a list of boba tea shops that exist in Denver Metropolitan Area. We will use Latitue and Longtitude of Denver as a center of the map and searching for boba tea shops in radius 40 miles or about 65 kilometers. From this part, we can explore the shops and plot it on the map to see how they locate trough out the areas.

Here is the features we will get

        -venueid
        -venuename
        -latitude
        -longitude
        -city
        -postalcode
        -category_primaryid
        -category_primary


Not only venue data, we also can go into venue detail for each boba tea shop to see how they perform via Foursquare API. We pull venue details to see rating, price, and also what are their busy time to determine the best shop hours. We will also use Clustering for this data to see the differences.

Features for boba tea shop detail as following

        -rating
        -price
        -hours
        -hours_popular
        -description
        -popullarity_score

### Import Colorado zip code data from CSV

In [1]:
import numpy as np
import pandas as pd
print('Libraries imported.')

Libraries imported.


In [9]:
url = 'https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&timezone=America/Denver&lang=en&use_labels_for_header=true&csv_separator=%3B'
usa = pd.read_csv(url, sep = ';') #seperate each column with ;
print ('data loaded!')

data loaded!


In [10]:
usa.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,66025,Eudora,KS,38.917032,-95.06455,-6,1,"38.917032,-95.06455"
1,74565,Savanna,OK,34.831398,-95.83967,-6,1,"34.831398,-95.83967"
2,75631,Beckville,TX,32.237924,-94.46427,-6,1,"32.237924,-94.46427"
3,92067,Rancho Santa Fe,CA,33.016492,-117.20264,-8,1,"33.016492,-117.20264"
4,92119,San Diego,CA,32.80225,-117.02431,-8,1,"32.80225,-117.02431"


In [15]:
colorado = usa[usa['State']=="CO"] #filter only colorado
colorado.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
42,81003,Pueblo,CO,38.281052,-104.62567,-7,1,"38.281052,-104.62567"
200,80452,Idaho Springs,CO,39.737369,-105.56054,-7,1,"39.737369,-105.56054"
260,81013,Pueblo,CO,38.128626,-104.552299,-7,1,"38.128626,-104.552299"
261,81029,Campo,CO,37.136682,-102.5296,-7,1,"37.136682,-102.5296"
342,81244,Rockvale,CO,38.353064,-105.18642,-7,1,"38.353064,-105.18642"


In [204]:
colorado.shape #cities in colorado

(680, 8)

### Now, scrape a list of Denver Metropolitian Area from the website

In [45]:
pip install html-table-parser-python3 #install package to use BeautifulSoup

Note: you may need to restart the kernel to use updated packages.


In [77]:
from urllib.request import Request, urlopen

#scrape website
req = Request('https://schoolchoiceforkids.org/denver-metro-area-cities-school-districts/', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
webpage

b'<!DOCTYPE html>\n\t\t<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">\n\t\t\t\n\t\t\t\n\t\t\t<head>\n\n\t\t\t\t<!\xe2\x80\x93 Global site tag (gtag.js) \xe2\x80\x93 Google Analytics \xe2\x80\x93>\n<script async src=\xe2\x80\x9dhttps://www.googletagmanager.com/gtag/js?id=UA-6231251-12\xe2\x80\xb3></script>\n<script>\nwindow.dataLayer = window.dataLayer || [];\nfunction gtag(){dataLayer.push(arguments);}\ngtag(\xe2\x80\x98js\xe2\x80\x99, new Date());\n\ngtag(\xe2\x80\x98config\xe2\x80\x99, \xe2\x80\x98UA-6231251-12\xe2\x80\x99);\n</script>\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<title>Denver Metro Area Cities and School Districts - School Choice for Kids</title>\n\n<link rel="stylesheet" href="https://schoolchoiceforkids.org/wp-content/plugins/sitepress-multilingual-cms/res/css/language-selector.css?v=3.4.1" type="text/css" media="all" /> \n\t\t\t\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n\t\t\t\t<meta name="generator" content="WordPress 5.5.3" /

In [205]:
#get table html
from bs4 import BeautifulSoup  
soup = BeautifulSoup(webpage, 'html.parser')  
results = soup.find_all('table')
results

[<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="width: 305.65pt; margin-left: 5.4pt; border-collapse: collapse;" width="408">
 <tbody>
 <tr style="height: 12.75pt;">
 <td style="border-style: double solid solid double; border-color: windowtext; border-width: 1.5pt 1pt 1pt 1.5pt; padding: 0in 5.4pt; width: 133.65pt; height: 12.75pt;" valign="bottom" width="178">
 <p class="MsoNormal"><strong><span style="font-size: 10pt; font-family: Arial;">CITY</span></strong></p>
 </td>
 <td style="border-style: double double solid none; border-color: windowtext windowtext windowtext -moz-use-text-color; border-width: 1.5pt 1.5pt 1pt medium; padding: 0in 5.4pt; width: 172pt; height: 12.75pt;" valign="bottom" width="229">
 <p class="MsoNormal"><strong><span style="font-size: 10pt; font-family: Arial;">SCHOOL DISTRICT</span></strong></p>
 </td>
 </tr>
 <tr style="height: 12.75pt;">
 <td style="border-style: none solid solid double; border-color: -moz-use-text-color wind

In [69]:
#scrape table into 2 colums
A=[]
B=[]

for row in soup.find_all('tr'):
    cells=row.find_all('td')
    if len(cells)==2:
        A.append(cells[0].text.strip())
        B.append(cells[1].text.strip())

In [75]:
#get dataframe denmetro
denmetro=pd.DataFrame(A,columns=['CITY'])
denmetro['SCHOOL DISTRICT']=B


denmetro.head()

Unnamed: 0,CITY,SCHOOL DISTRICT
0,CITY,SCHOOL DISTRICT
1,ARVADA,JEFFERSON COUNTY R-1
2,ARVADA,WESTMINSTER 50
3,AURORA,ADAMS-ARAPAHOE 28J
4,AURORA,CHARTER SCHOOL INSTITUTE


In [76]:
denmetro.drop(denmetro.index[0], inplace=True) #get rid of the first row
denmetro.head()

Unnamed: 0,CITY,SCHOOL DISTRICT
1,ARVADA,JEFFERSON COUNTY R-1
2,ARVADA,WESTMINSTER 50
3,AURORA,ADAMS-ARAPAHOE 28J
4,AURORA,CHARTER SCHOOL INSTITUTE
5,AURORA,CHERRY CREEK 5


In [82]:
denmetro.drop('SCHOOL DISTRICT', axis=1, inplace=True) #get rid of the school district column
denmetro

Unnamed: 0,CITY
1,ARVADA
2,ARVADA
3,AURORA
4,AURORA
5,AURORA
...,...
71,WATKINS
72,WESTMINSTER
73,WESTMINSTER
74,WESTMINSTER


In [176]:
denmetro.drop_duplicates(inplace=True) #keep only unique cities
denmetro.reset_index(inplace=True)
denmetro

Unnamed: 0,CITY
0,ARVADA
1,AURORA
2,BAILEY
3,BLACK HAWK
4,BOULDER
5,BRIGHTON
6,BROOMFIELD
7,CASTLE ROCK
8,CENTENNIAL
9,CHERRY HILLS VILLAGE


In [163]:
print('Number of city in Denver Metropolitian Area: ', denmetro.shape[0])

Number of city in Denver Metropolitian Area:  45


In [179]:
#make the city names all lower case
denmetro['CITY'] = denmetro['CITY'].str.lower()
denmetro.head()

Unnamed: 0,CITY
0,arvada
1,aurora
2,bailey
3,black hawk
4,boulder


In [181]:
#make denmetro['CITY'] first letter capitalized like colorado['City'] data frame
denmetro['CITY'] = denmetro['CITY'].str.title()
denmetro.head()

Unnamed: 0,CITY
0,Arvada
1,Aurora
2,Bailey
3,Black Hawk
4,Boulder


### Now, filter colorado data set with just cities in Denver Metropolitian Area

In [187]:
denmetro_latlng = colorado[colorado['City'].isin(list(denmetro['CITY']))]
denmetro_latlng.head()

  if __name__ == '__main__':


Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
200,80452,Idaho Springs,CO,39.737369,-105.56054,-7,1,"39.737369,-105.56054"
578,80201,Denver,CO,39.726303,-104.856808,-7,1,"39.726303,-104.856808"
594,80109,Castle Rock,CO,39.380857,-104.89947,-7,1,"39.380857,-104.89947"
739,80162,Littleton,CO,39.522014,-105.223945,-7,1,"39.522014,-105.223945"
862,80014,Aurora,CO,39.665637,-104.83421,-7,1,"39.665637,-104.83421"


In [189]:
denmetro_latlng.reset_index(drop=True, inplace=True)
denmetro_latlng.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,80452,Idaho Springs,CO,39.737369,-105.56054,-7,1,"39.737369,-105.56054"
1,80201,Denver,CO,39.726303,-104.856808,-7,1,"39.726303,-104.856808"
2,80109,Castle Rock,CO,39.380857,-104.89947,-7,1,"39.380857,-104.89947"
3,80162,Littleton,CO,39.522014,-105.223945,-7,1,"39.522014,-105.223945"
4,80014,Aurora,CO,39.665637,-104.83421,-7,1,"39.665637,-104.83421"


In [209]:
denmetro_latlng.drop(['State','Timezone','Daylight savings time flag'], axis=1) #keep only neccesary columns

Unnamed: 0,Zip,City,Latitude,Longitude,geopoint
0,80452,Idaho Springs,39.737369,-105.560540,"39.737369,-105.56054"
1,80201,Denver,39.726303,-104.856808,"39.726303,-104.856808"
2,80109,Castle Rock,39.380857,-104.899470,"39.380857,-104.89947"
3,80162,Littleton,39.522014,-105.223945,"39.522014,-105.223945"
4,80014,Aurora,39.665637,-104.834210,"39.665637,-104.83421"
...,...,...,...,...,...
181,80020,Broomfield,39.934040,-105.054540,"39.93404,-105.05454"
182,80257,Denver,39.738752,-104.408349,"39.738752,-104.408349"
183,80044,Aurora,39.738752,-104.408349,"39.738752,-104.408349"
184,80205,Denver,39.758986,-104.966780,"39.758986,-104.96678"


In [191]:
print('Number of Zip code in Denver Metropolitian Area: ', denmetro_latlng.shape[0])

Number of Zip code in Denver Metropolitian Area:  186


### Denver and Metropolitian map

In [22]:
pip install folium

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
[K     |████████████████████████████████| 93 kB 3.9 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Note: you may need to restart the kernel to use updated packages.


In [24]:
import folium 
from geopy.geocoders import Nominatim 

import matplotlib.cm as cm
import matplotlib.colors as colors
print('imported')

imported


In [193]:
#get Denver latitude and longitude
address = 'Denver'

geolocator = Nominatim(user_agent="denver_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Denver are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Denver are 39.7392364, -104.9848623.


In [192]:
# create map of Denver using latitude and longitude values
map_den = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(denmetro_latlng['Latitude'], denmetro_latlng['Longitude'], denmetro_latlng['City']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_den)  
    
map_den

# Boba tea shops in Denver Metropolitan Area from Foursquare

In [194]:
CLIENT_ID = 'D1H4FH12E30Y1JILNCVUWKCE4K1VECBQQS3NXJWVG22EVQ2N' 
CLIENT_SECRET = 'S2BSNE2WKOUGLJH2RACEXC15KTV4JNMZ0JUDHM3JYJ3PPAXA' 
VERSION = '20180605' 
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: D1H4FH12E30Y1JILNCVUWKCE4K1VECBQQS3NXJWVG22EVQ2N
CLIENT_SECRET:S2BSNE2WKOUGLJH2RACEXC15KTV4JNMZ0JUDHM3JYJ3PPAXA


### We are looking for only Boba tea shops which Foursqare calls them 'Bubble Tea Shop'.
Category ID can be found from https://developer.foursquare.com/docs/build-with-foursquare/categories/.

In [202]:
LIMIT = 100
radius = 65000
categoryId = '52e81612bcbc57f1066b7a0c'
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius,
    categoryId,
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=D1H4FH12E30Y1JILNCVUWKCE4K1VECBQQS3NXJWVG22EVQ2N&client_secret=S2BSNE2WKOUGLJH2RACEXC15KTV4JNMZ0JUDHM3JYJ3PPAXA&v=20180605&ll=39.7392364,-104.9848623&radius=65000&categoryId=52e81612bcbc57f1066b7a0c&limit=100'

In [203]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fcb1355b1c42f01051cc210'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Denver',
  'headerFullLocation': 'Denver',
  'headerLocationGranularity': 'city',
  'query': 'bubble tea',
  'totalResults': 33,
  'suggestedBounds': {'ne': {'lat': 40.32423698500059,
    'lng': -104.22551611028805},
   'sw': {'lat': 39.15423581499942, 'lng': -105.74420848971195}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '59593fd4029a55239dca639d',
       'name': 'Kung Fu Tea',
       'location': {'address': '6365 E Hampden Ave #102',
        'lat': 39.6539828,
        'lng': -104.9159009,
        'labeledLatLngs': [{'label': 'display',
          'lat': 39.6539828