# The Starbucks Exploration

In [12]:
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
import folium

## Part 1 - Introduction

### Context

Thanks to Coursera and IBM, we are certified in Python and machine learning, and it did not take long for coffee company Starbucks to hire us as data scientists!  

Starbucks has kept growing since its creation and continues to open more stores across the world.  

In [13]:
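# Download the Starbucks Wikipedia page and parse its HTML
url = "https://en.wikipedia.org/wiki/Starbucks#Locations"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')

# Collect every <table> on the page; index 1 is the yearly financial figures.
# The index depends on the page layout and may shift if the article is edited.
table = soup.find_all('table')
df = pd.read_html(str(table))[1]
df.tail(7)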

Unnamed: 0,Year,Revenue in mil. US$,Net income in mil. US$,Total Assets in mil. US$,Average Price per Share in US$,Employees
8,2013,14867,8,11517,33.71,182000
9,2014,16448,2068,10753,37.78,191000
10,2015,19163,2757,12416,53.25,238000
11,2016,21316,2818,14313,56.59,254000
12,2017,22387,2885,14366,57.27,277000
13,2018,24720,4518,24156,57.5,291000
14,2019,26509,3599,19220,81.44,346000


According to the same Wikipedia page, as of May 2020 Starbucks operates over 30,000 locations across 6 continents and 79 countries.

### Business Problem

Our mission is to keep this global expansion going by opening a new store, but the location has to be carefully chosen to guarantee success.

We will tackle this problem by studying the current store locations, then choose a large, highly populated city where Starbucks does not yet have a strong presence.

We will then look for a more precise location within that city. To do so, we will select several successful Starbucks stores, use the Foursquare API to characterize their neighborhoods, and look for a similar neighborhood in our target city that has no store yet!

## Part 2 - Data

### Store Locations

The first dataset that we will need can be found for free on [Kaggle](https://www.kaggle.com/starbucks/store-locations/data).  

It contains store locations and although it is not quite up to date, it will do a good job of getting us some rows to play with!

In [14]:
stores = pd.read_csv("starbucks_location.csv")
stores.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude
0,Starbucks,47370-257954,"Meritxell, 96",Licensed,"Av. Meritxell, 96",Andorra la Vella,7,AD,AD500,376818720.0,GMT+1:00 Europe/Andorra,1.53,42.51
1,Starbucks,22331-212325,Ajman Drive Thru,Licensed,"1 Street 69, Al Jarf",Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.42
2,Starbucks,47089-256771,Dana Mall,Licensed,Sheikh Khalifa Bin Zayed St.,Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.39
3,Starbucks,22126-218024,Twofour 54,Licensed,Al Salam Street,Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.38,24.48
4,Starbucks,17127-178586,Al Ain Tower,Licensed,"Khaldiya Area, Abu Dhabi Island",Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.54,24.51
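
A first look at where Starbucks is already present can be obtained by counting stores per country and per city. The sketch below is only an illustration: it rebuilds the five rows shown above as a small `sample` frame so that it runs on its own, and the helper name `stores_per_group` is hypothetical. In the notebook the same function would be applied to `stores`.

```python
import pandas as pd

# Toy sample mirroring the five rows displayed above (only the columns we need).
sample = pd.DataFrame({
    "Store Name": ["Meritxell, 96", "Ajman Drive Thru", "Dana Mall",
                   "Twofour 54", "Al Ain Tower"],
    "City": ["Andorra la Vella", "Ajman", "Ajman", "Abu Dhabi", "Abu Dhabi"],
    "Country": ["AD", "AE", "AE", "AE", "AE"],
})

def stores_per_group(df, by):
    """Count stores per group (hypothetical helper), largest groups first."""
    return (
        df.groupby(by)
          .size()
          .sort_values(ascending=False)
          .rename("store_count")
    )

per_country = stores_per_group(sample, "Country")
per_city = stores_per_group(sample, ["Country", "City"])
print(per_country)
print(per_city)
```

Note that the `Country` column holds two-letter codes rather than full country names, which matters when joining with other datasets.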


### Country Populations

We will then need some population data to find out where Starbucks is not yet heavily present.

In [15]:
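# Download the Worldometers population-by-country page and parse its HTML
url = "https://www.worldometers.info/world-population/population-by-country/"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')

# The first <table> on the page holds the population figures
table = soup.find_all('table')
df = pd.read_html(str(table))[0]
df.head(10)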

Unnamed: 0,#,Country (or dependency),Population (2020),Yearly Change,Net Change,Density (P/Km²),Land Area (Km²),Migrants (net),Fert. Rate,Med. Age,Urban Pop %,World Share
0,1,China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
1,2,India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
2,3,United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
3,4,Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
4,5,Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
5,6,Brazil,212559417,0.72 %,1509890,25,8358140,21200.0,1.7,33,88 %,2.73 %
6,7,Nigeria,206139589,2.58 %,5175990,226,910770,-60000.0,5.4,18,52 %,2.64 %
7,8,Bangladesh,164689383,1.01 %,1643222,1265,130170,-369501.0,2.1,28,39 %,2.11 %
8,9,Russia,145934462,0.04 %,62206,9,16376870,182456.0,1.8,40,74 %,1.87 %
9,10,Mexico,128932753,1.06 %,1357224,66,1943950,-60000.0,2.1,29,84 %,1.65 %
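
One way to measure how heavily Starbucks is present in a country is the number of stores per million inhabitants. The sketch below is a minimal illustration under stated assumptions: the populations are copied from the table above, but the store counts are made-up placeholders (not real figures), and the `code_to_name` mapping and the `stores_per_million` helper are hypothetical. A mapping of this kind is needed because the store dataset identifies countries by two-letter codes while the population table uses full names.

```python
import pandas as pd

# Populations copied from the Worldometers table above.
population = pd.DataFrame({
    "Country (or dependency)": ["China", "India", "United States"],
    "Population (2020)": [1439323776, 1380004385, 331002651],
})

# PLACEHOLDER store counts for illustration only; these are not real figures.
placeholder_counts = pd.DataFrame({
    "Country": ["CN", "IN", "US"],
    "store_count": [2000, 100, 13000],
})

# Hand-written mapping from two-letter codes to Worldometers names (assumption).
code_to_name = {"CN": "China", "IN": "India", "US": "United States"}

def stores_per_million(store_counts, population, code_to_name):
    """Join store counts with populations and rank countries by store density."""
    counts = store_counts.assign(
        country_name=store_counts["Country"].map(code_to_name)
    )
    merged = counts.merge(
        population,
        left_on="country_name",
        right_on="Country (or dependency)",
        how="inner",
    )
    merged["stores_per_million"] = (
        merged["store_count"] / merged["Population (2020)"] * 1e6
    )
    columns = ["country_name", "store_count", "stores_per_million"]
    # Lowest density first: these are the countries with the most room to grow.
    return merged.sort_values("stores_per_million")[columns].reset_index(drop=True)

density = stores_per_million(placeholder_counts, population, code_to_name)
print(density)
```

Countries at the top of the resulting frame have the fewest stores relative to their population and are natural candidates for a closer look.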


The [Worldometers website](https://www.worldometers.info/world-population) also provides city population data, so we will be able to narrow our search down to a specific large city in the target country.

### Finding Successful Stores and Characterizing the Neighborhood

The Foursquare API will be very useful for finding which Starbucks stores have the most reviews, and hence are likely to be top spots in their respective cities!
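
As a rough sketch of what a Foursquare query could look like, the snippet below only builds a request URL and does not send it. It assumes the legacy v2 `venues/explore` endpoint with its `client_id`, `client_secret`, `v`, `ll`, `query`, `radius` and `limit` parameters; newer versions of the API use different endpoints and authentication. The helper name `build_explore_url`, the coordinates and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 endpoint; newer API versions differ.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      query="Starbucks", version="20200501",
                      radius=500, limit=50):
    """Build (but do not send) a venue search URL around a coordinate."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",            # latitude,longitude
        "query": query,
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

url = build_explore_url(40.7, -74.0, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()`, and the nested response flattened with `json_normalize`.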

We will then use the Foursquare API again to characterize the surroundings and try to find a similar neighborhood in our target city.
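
One possible way to compare neighborhoods, sketched here as an assumption rather than the method finally used, is to describe each neighborhood by the share of venues in each category and then rank candidates by cosine similarity to the neighborhood of a reference store. The venue listings, neighborhood names and helper functions below are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical venue listings, as they might look after flattening API results.
venues = pd.DataFrame({
    "neighborhood": ["ref_store", "ref_store", "ref_store",
                     "candidate_a", "candidate_a", "candidate_a",
                     "candidate_b", "candidate_b"],
    "category": ["Office", "Bakery", "Metro Station",
                 "Office", "Metro Station", "Gym",
                 "Park", "Museum"],
})

def category_profile(venues):
    """Return one row per neighborhood with the share of each venue category."""
    counts = pd.crosstab(venues["neighborhood"], venues["category"])
    return counts.div(counts.sum(axis=1), axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two category-share vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

profiles = category_profile(venues)
reference = profiles.loc["ref_store"].to_numpy()

# Score every candidate neighborhood against the reference store's surroundings.
scores = {
    name: cosine_similarity(reference, row.to_numpy())
    for name, row in profiles.drop(index="ref_store").iterrows()
}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy data `candidate_a` shares two of its three categories with the reference neighborhood, while `candidate_b` shares none, so `candidate_a` ranks first.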

If need be, we will use Wikipedia to gather neighborhood coordinates, as we did in the Toronto Clustering Exercise.