# Introduction

You are a Starbucks big data analyst ([that’s a real job!](https://www.forbes.com/sites/bernardmarr/2018/05/28/starbucks-using-big-data-analytics-and-artificial-intelligence-to-boost-performance/#130c7d765cdc)) looking for the next store to convert into a [Starbucks Reserve Roastery](https://www.businessinsider.com/starbucks-reserve-roastery-compared-regular-starbucks-2018-12#also-on-the-first-floor-was-the-main-coffee-bar-five-hourglass-like-units-hold-the-freshly-roasted-coffee-beans-that-are-used-in-each-order-the-selection-rotates-seasonally-5). These roasteries are much larger than a typical Starbucks store and offer several additional features, including various food and wine options and upscale lounge areas. You'll investigate the demographics of various counties in the state of California to identify potentially suitable locations.


In [10]:
import math
import pandas as pd
import geopandas as gpd
from geopy.geocoders import Nominatim

import folium
from folium import Marker
from folium.plugins import MarkerCluster

In [4]:
root_path="/home/pliu/data_set/kaggle/geospatial/L04"

In [5]:
# Load and preview Starbucks locations in California
starbucks = pd.read_csv(f"{root_path}/starbucks_locations.csv")
starbucks.head()

Unnamed: 0,Store Number,Store Name,Address,City,Longitude,Latitude
0,10429-100710,Palmdale & Hwy 395,14136 US Hwy 395 Adelanto CA,Adelanto,-117.4,34.51
1,635-352,Kanan & Thousand Oaks,5827 Kanan Road Agoura CA,Agoura,-118.76,34.16
2,74510-27669,Vons-Agoura Hills #2001,5671 Kanan Rd. Agoura Hills CA,Agoura Hills,-118.76,34.15
3,29839-255026,Target Anaheim T-0677,8148 E SANTA ANA CANYON ROAD AHAHEIM CA,AHAHEIM,-117.75,33.87
4,23463-230284,Safeway - Alameda 3281,2600 5th Street Alameda CA,Alameda,-122.28,37.79


# Q1 Clean the data
Most of the stores have known (latitude, longitude) locations, but the coordinates of every store in the city of Berkeley are missing. You need to impute the Latitude and Longitude columns for these Starbucks stores.

In [7]:
# How many rows in each column have missing values?
print(starbucks.isnull().sum())



Store Number    0
Store Name      0
Address         0
City            0
Longitude       5
Latitude        5
dtype: int64
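
The counts above say how many values are missing, but not which rows they belong to. A minimal sketch of selecting those rows directly by their missing values, rather than by city name, is shown here on a small toy frame. The `toy` DataFrame is illustrative and is not the real file; only the Adelanto values are copied from the preview above.

```python
import pandas as pd

# Toy stand-in for the starbucks DataFrame (illustrative, not the real data)
toy = pd.DataFrame({
    "City": ["Adelanto", "Berkeley", "Berkeley"],
    "Longitude": [-117.4, None, None],
    "Latitude": [34.51, None, None],
})

# Keep the rows where either coordinate is missing
missing = toy[toy[["Latitude", "Longitude"]].isnull().any(axis=1)]
print(missing["City"].tolist())
```

Filtering on the missing values themselves keeps working if stores outside Berkeley also turn out to lack coordinates.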


In [14]:
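# View the Berkeley rows, which are the ones with missing locations
# Note: this cell (14) was run after the geocoding cell (13), so the output below already shows the filled-in coordinates
rows_with_missing = starbucks[starbucks["City"]=="Berkeley"]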
rows_with_missing

Unnamed: 0,Store Number,Store Name,Address,City,Longitude,Latitude
153,5406-945,2224 Shattuck - Berkeley,2224 Shattuck Avenue Berkeley CA,Berkeley,-122.26823,37.868839
154,570-512,Solano Ave,1799 Solano Avenue Berkeley CA,Berkeley,-122.280014,37.891477
155,17877-164526,Safeway - Berkeley #691,1444 Shattuck Place Berkeley CA,Berkeley,-122.269869,37.881177
156,19864-202264,Telegraph & Ashby,3001 Telegraph Avenue Berkeley CA,Berkeley,-122.259526,37.855799
157,9217-9253,2128 Oxford St.,2128 Oxford Street Berkeley CA,Berkeley,-122.266095,37.870253


In [11]:
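# Create a geocoder backed by OpenStreetMap's Nominatim service, then look up one address as a test
geolocator = Nominatim(user_agent="kaggle_learn")
location = geolocator.geocode("2224 Shattuck Avenue Berkeley CA")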

In [12]:
print(location.point)
print(location.address)

37 52m 7.8222s N, 122 16m 5.62872s W
Starbucks, 2224, Shattuck Avenue, Downtown Berkeley, Berkeley, Alameda County, CAL Fire Northern Region, California, 94701, United States
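
The point above is printed in degrees, minutes and seconds, while the Latitude and Longitude columns hold decimal degrees. geopy exposes the decimal values directly as `location.latitude` and `location.longitude`, so no manual conversion is needed in practice. The sketch below only illustrates the arithmetic; `dms_to_decimal` is a hypothetical helper, not part of geopy.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Hypothetical helper for illustration; southern and western
    hemispheres are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

# Values taken from the printed point: 37 52m 7.8222s N, 122 16m 5.62872s W
lat = dms_to_decimal(37, 52, 7.8222, "N")
lon = dms_to_decimal(122, 16, 5.62872, "W")
print(lat, lon)
```

The results agree, up to rounding, with the decimal coordinates listed for the 2224 Shattuck store in the Berkeley table.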


In [13]:
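# Take an address string and return its latitude and longitude as a Series
def my_geocoder(address):
    point = geolocator.geocode(address).point
    return pd.Series({'Latitude': point.latitude, 'Longitude': point.longitude})

# Geocode every Berkeley address; the result keeps the original row index
berkeley_locations = rows_with_missing.apply(lambda x: my_geocoder(x['Address']), axis=1)
# Write the new coordinates back into the matching rows of starbucks
starbucks.update(berkeley_locations)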
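
`geolocator.geocode` returns `None` when it cannot find an address, in which case reading `.point` raises an error. A minimal sketch of a more defensive variant follows, assuming the geocoding function is passed in as an argument. `safe_geocode` and `fake_geocode` are hypothetical names, and the fake geocoder stands in for Nominatim so the example runs offline.

```python
from types import SimpleNamespace

import pandas as pd

def safe_geocode(address, geocode):
    """Return Latitude/Longitude for an address, or NaN for both if the lookup fails."""
    try:
        location = geocode(address)
    except Exception:
        location = None
    if location is None:
        return pd.Series({"Latitude": float("nan"), "Longitude": float("nan")})
    return pd.Series({"Latitude": location.latitude, "Longitude": location.longitude})

# Hypothetical stand-in for geolocator.geocode: knows one address, returns None otherwise
def fake_geocode(address):
    known = {
        "2224 Shattuck Avenue Berkeley CA": SimpleNamespace(
            latitude=37.868839, longitude=-122.26823
        )
    }
    return known.get(address)

# Toy frame with missing coordinates (illustrative, not the real data)
toy = pd.DataFrame(
    {
        "Address": ["2224 Shattuck Avenue Berkeley CA", "no such address"],
        "Latitude": [float("nan"), float("nan")],
        "Longitude": [float("nan"), float("nan")],
    },
    index=[153, 154],
)

found = toy.apply(lambda r: safe_geocode(r["Address"], fake_geocode), axis=1)
toy.update(found)  # update() aligns on the index and skips NaN values
print(toy)
```

Because `DataFrame.update` only copies non-missing values, a failed lookup leaves the original row untouched instead of stopping the whole run.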

# Q2 View Berkeley locations

Let's take a look at the locations you just found.  Visualize the (latitude, longitude) locations in Berkeley in the OpenStreetMap style.

In [24]:
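# Create a base map centered on Berkeley (folium's default tiles are OpenStreetMap)
m_2 = folium.Map(location=[37.88,-122.26], zoom_start=12)

# Add one marker per Berkeley store, with the store name as its popup
for idx, row in starbucks[starbucks["City"]=="Berkeley"].iterrows():
    Marker([row['Latitude'], row['Longitude']], popup=row['Store Name']).add_to(m_2)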

m_2
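
The map center `[37.88,-122.26]` above is hard-coded. One alternative is to compute it from the data, sketched here under the assumption that the mean of the store coordinates makes a reasonable center. The coordinate pairs are copied from the Berkeley rows shown earlier; `berkeley_points` and `center` are illustrative names.

```python
from statistics import mean

# (latitude, longitude) pairs copied from the Berkeley rows shown earlier
berkeley_points = [
    (37.868839, -122.26823),
    (37.891477, -122.280014),
    (37.881177, -122.269869),
    (37.855799, -122.259526),
    (37.870253, -122.266095),
]

# Average the latitudes and longitudes separately to get a map center
center = [
    mean(lat for lat, _ in berkeley_points),
    mean(lon for _, lon in berkeley_points),
]
print(center)
```

The resulting list could be passed as the `location` argument of `folium.Map` in place of the hard-coded pair.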