# Coursera Capstone: Opening a High-End Bakery in Cambridge/Somerville


## Introduction


### Project Goal 
Opening a bakery is one of my dream business opportunities. I live in the Cambridge/Somerville area (right across the river from Boston, Massachusetts), and these neighborhoods have become much more attractive over the last ten years. Ten years ago it was nearly impossible to find quality bread; many high-end bakeries have opened since, and competition is now strong.

The purpose of this project is to find the best location for opening a bakery. It should be near an area that already has some bakeries, but not so close that it competes with them directly. It should also be within a reasonable distance of the subway system (the MBTA "T"). A map is used to summarize and display the results.

### Strategy
- Use the Foursquare API to locate bakeries in the area of interest. Filter the data to remove low-rated businesses and franchises that are not "high-end".
- Use a geocoder (geopy's Nominatim) to fetch the locations of the subway stations.
- Divide the area of interest into squares of 200 ft by 200 ft.
- For each square, compute a score ranging from 0 to 100 that represents how suitable the location is for a new bakery. The formula for the score is to be found via trial and error, but it will be a function of the distance to the subway stations and to other bakeries.
- Display the results on a map.
- Select the 3 best locations.
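
The grid, scoring, and selection steps above can be sketched as follows. This is only an illustration of the idea, not the final formula: the function names (`haversine_m`, `make_grid`, `score`, `best_cells`), the 400 m and 150 m distance scales, and the shape of the score are all assumptions made for the example. The score is highest for a cell that sits next to a station and roughly one "bakery scale" away from the nearest bakery, and it drops to 0 on top of an existing bakery.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
CELL_M = 200 * 0.3048       # 200 ft expressed in meters (60.96 m)

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def make_grid(south_west, north_east, cell_m=CELL_M):
    """Return the centers of square cells covering a lat/lon bounding box."""
    (lat0, lon0), (lat1, lon1) = south_west, north_east
    dlat = math.degrees(cell_m / EARTH_RADIUS_M)             # cell height in degrees
    dlon = dlat / math.cos(math.radians((lat0 + lat1) / 2))  # cell width in degrees
    n_rows = max(1, math.ceil((lat1 - lat0) / dlat))
    n_cols = max(1, math.ceil((lon1 - lon0) / dlon))
    return [(lat0 + (i + 0.5) * dlat, lon0 + (j + 0.5) * dlon)
            for i in range(n_rows) for j in range(n_cols)]

def score(cell, stations, bakeries, station_scale_m=400, bakery_scale_m=150):
    """Hypothetical 0-100 score for one cell (both lists must be non-empty)."""
    d_station = min(haversine_m(cell, s) for s in stations)
    d_bakery = min(haversine_m(cell, b) for b in bakeries)
    transit = math.exp(-d_station / station_scale_m)  # 1 at a station, decays with distance
    x = d_bakery / bakery_scale_m
    cluster = x * math.exp(1 - x)  # 0 on top of a bakery, peaks at 1 when x == 1
    return 100 * transit * cluster

def best_cells(grid, stations, bakeries, k=3):
    """Return the k highest-scoring cell centers."""
    return sorted(grid, key=lambda c: score(c, stations, bakeries), reverse=True)[:k]
```

Because the formula is meant to be tuned by trial and error, the two scale parameters are exposed as keyword arguments so that different values can be compared on the map.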

## Imports

In [1]:
import numpy as np
import pandas as pd
import requests
import re
from bs4 import BeautifulSoup
#!pip install geocoder
import geocoder
#!pip install geopy
import geopy
import folium

## Data Collection

### Subway stations

In [2]:
master_radius = 400 # meters

subway_stations = [
    'harvard square cambridge ma',
    'alewife cambridge ma',
    'davis square somerville ma',
    'porter square cambridge ma',
    'kendall MIT cambridge ma',
    'central square cambridge ma']

center_of_map = (42.382524, -71.103405)

color_list = ['red', 'blue', 'gray', 'darkred', 'lightred', 'orange', 'beige', 'green', 'darkgreen', 'lightgreen', 'darkblue',
    'lightblue', 'purple', 'darkpurple', 'pink', 'cadetblue', 'lightgray', 'black']

def get_locations(list_of_addresses):
    # Returns a list of coordinate tuples [(latitude, longitude), (..., ...)]
    locations = []
    locator = geopy.geocoders.Nominatim(user_agent='project')
    for address in list_of_addresses:
        print(address)
        position = locator.geocode(address) # None if the address is not found
        if position is None:
            raise ValueError('Could not geocode: ' + address)
        locations.append((position.latitude, position.longitude))
    return locations

subway_locations = get_locations(subway_stations)

harvard square cambridge ma
alewife cambridge ma
davis square somerville ma
porter square cambridge ma
kendall MIT cambridge ma
central square cambridge ma
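
The public Nominatim service expects clients to keep their request rate low (its usage policy mentions at most one request per second), and `geocode` returns `None` when an address cannot be resolved. A minimal sketch of a more defensive loop is shown below. The helper name `geocode_all` and the one-second default delay are assumptions for this example; the `geocode` argument can be any callable with the same behavior as a geopy locator's `geocode` method.

```python
import time

def geocode_all(addresses, geocode, delay_s=1.0):
    """Geocode each address, pausing between calls and collecting failures.

    `geocode` takes an address string and returns an object with
    `.latitude` and `.longitude` attributes, or None if nothing is found.
    """
    found, missing = {}, []
    for i, address in enumerate(addresses):
        if i > 0:
            time.sleep(delay_s)  # wait between consecutive requests
        position = geocode(address)
        if position is None:
            missing.append(address)  # keep track of unresolved addresses
        else:
            found[address] = (position.latitude, position.longitude)
    return found, missing
```

With a Nominatim locator such as the one created in `get_locations`, a call might look like `found, missing = geocode_all(subway_stations, locator.geocode)`. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that serves a similar purpose.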


In [12]:
def create_map(list_of_addresses, locations, radius, m=None, color='red'):
    if m is None:
        m = folium.Map(center_of_map, zoom_start=13)
    # Draw one circle (radius in meters) per location, with the address as popup
    for address, (lat, long) in zip(list_of_addresses, locations):
        folium.Circle((lat, long), radius=radius, popup=address, color=color,
                      fill=True, fill_color='white', fill_opacity=0.4, parse_html=False).add_to(m)
    return m

m = folium.Map(center_of_map, zoom_start=13)
first_map = create_map(subway_stations, subway_locations, master_radius, m)
first_map

### Existing bakeries

In [4]:
# Foursquare Credentials
# Client ID and secret are read from local text files for convenience and privacy.

with open('client.txt') as f:
    CLIENT_ID = f.readline().strip() # your Foursquare ID
with open('pass.txt') as f:
    CLIENT_SECRET = f.readline().strip() # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

In [5]:
def get_bakeries(list_of_addresses, list_of_locations):
    # Search Foursquare around each station and collect the venues
    # returned into a single DataFrame (one row per venue).
    url = ('https://api.foursquare.com/v2/venues/search?categoryId={}&client_id={}'
           '&client_secret={}&v={}&ll={},{}&radius={}&LIMIT={}')
    rows = []
    for (lat, long), address in zip(list_of_locations, list_of_addresses):
        results = requests.get(url.format(
            "4bf58dd8d48988d16a941735", CLIENT_ID, CLIENT_SECRET, VERSION,
            lat, long, 2*master_radius, LIMIT)).json()

        for venue in results['response']['venues']:
            rows.append({
                'id': venue['id'],
                'name': venue['name'],
                'coords': (venue['location']['lat'], venue['location']['lng']),
                'distance': venue['location']['distance'], # meters from the query point
                'nearby': address,                         # station used for the query
                'category': venue['categories'][0]['name'],
            })

    return pd.DataFrame(rows, columns=['id', 'name', 'coords', 'distance', 'nearby', 'category'])

df = get_bakeries(subway_stations, subway_locations)


In [6]:
df

index,id,name,coords,distance,nearby,category
0,53e51fda498e3e1c6aad9965,Mike's Pastry Harvard Square,"(42.37292864360548, -71.11883409902867)",61,harvard square cambridge ma,Dessert Shop
1,5bc8e3276bdee6002c1b6cc3,Swissbäkers,"(42.372349, -71.11863)",127,harvard square cambridge ma,Bakery
2,509b35fae4b0a491884ebae8,Insomnia Cookies,"(42.371952809161066, -71.1181750529613)",180,harvard square cambridge ma,Bakery
3,57f3cfdc498ede68f909b6cc,Tatte Bakery & Cafe,"(42.372794996708876, -71.11696614549051)",179,harvard square cambridge ma,Café
4,5818960c38fa5b61b65ab4b3,Flour Bakery + Cafe,"(42.3731171074856, -71.12234866100246)",282,harvard square cambridge ma,Bakery
5,5820bf08936f0238d304f010,Flour Bakery + Cafe,"(42.373104, -71.122481)",293,harvard square cambridge ma,Bakery
6,520b8640498ed9e96c84506c,Violette Bakers,"(42.36942083928121, -71.11129885306013)",773,harvard square cambridge ma,Bakery
7,5890e625ea1c0d43693f2db5,Sweet Cupcakes,"(42.3731959, -71.120002)",92,harvard square cambridge ma,Bakery
8,49f9f896f964a5209f6d1fe3,Petsi Pies,"(42.368948603337465, -71.11353746694681)",671,harvard square cambridge ma,Coffee Shop
9,51517a70e4b0fe2badd03caa,Panera Bread,"(42.3905074, -71.1405043)",620,alewife cambridge ma,Bakery
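
The raw results still contain chains and what look like duplicate listings (for example, Flour Bakery + Cafe appears twice under different ids, about ten meters apart). One possible filtering pass is sketched below under several assumptions: `CHAINS` is a hypothetical, hand-maintained list of names to exclude, the 50 m merge distance is arbitrary, and venues are plain dicts such as those produced by `df.to_dict('records')`.

```python
import math

CHAINS = {"panera bread", "insomnia cookies"}  # hypothetical exclusion list
MERGE_DISTANCE_M = 50  # assumed: same-name venues closer than this are one shop

def distance_m(p, q):
    """Approximate distance in meters between two nearby (lat, lon) points."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dy = (q[0] - p[0]) * 111_320                       # meters per degree of latitude
    dx = (q[1] - p[1]) * 111_320 * math.cos(mean_lat)  # longitude shrinks with cos(latitude)
    return math.hypot(dx, dy)

def filter_venues(venues):
    """Drop listed chains and merge same-name venues that are very close together."""
    kept = []
    for venue in venues:
        name = venue["name"].strip().lower()
        if name in CHAINS:
            continue
        duplicate = any(
            name == other["name"].strip().lower()
            and distance_m(venue["coords"], other["coords"]) < MERGE_DISTANCE_M
            for other in kept
        )
        if not duplicate:
            kept.append(venue)  # the first listing of each shop is the one kept
    return kept
```

The result can be turned back into a DataFrame with `pd.DataFrame(filter_venues(df.to_dict('records')))`.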


In [21]:
all_map = create_map(df['name'], df['coords'], 150, m=None, color='blue')
all_map

In [8]:
# Import Cambridge data
# Source: https://data.cambridgema.gov/Planning/Envision-Cambridge-Mobile-Engagement-Feedback/sx7w-cut7/data

cd0 = pd.read_csv('Envision_Cambridge_Mobile_Engagement_Feedback.csv')
cd0.head(3)

index,Date,Phase,Type,Location of Engagement,Location of Topic,Latitude of Topic,Longitude of Topic,Tag,Comment,cambridge_zipcodes,Police Neighborhood Regions,Police Response Districts,cambridge_neighborhoods,cambridge_cdd_zoning
0,06/01/2016,Listening,Favorite,Cambridge Learning Center,"(42.3911275, -71.1283529)\n(42.3911275, -71.12...",42.391128,-71.128353,,,1009.0,6.0,93.0,13.0,12.0
1,06/01/2016,Listening,Favorite,Cambridge Learning Center,"(42.3772195, -71.1169481)\n(42.3772195, -71.11...",42.37722,-71.116948,Institution,Harvard University,1637.0,9.0,33.0,9.0,228.0
2,06/01/2016,Listening,Favorite,Cambridge Learning Center,"(42.3771878, -71.1164761)\n(42.3771878, -71.11...",42.377188,-71.116476,Institution,Harvard University,1637.0,9.0,33.0,9.0,228.0


In [9]:
# Keep only the rows whose Type is 'Favorite'
cd = cd0.loc[cd0['Type'] == 'Favorite']
cd.head(3)

index,Date,Phase,Type,Location of Engagement,Location of Topic,Latitude of Topic,Longitude of Topic,Tag,Comment,cambridge_zipcodes,Police Neighborhood Regions,Police Response Districts,cambridge_neighborhoods,cambridge_cdd_zoning
0,06/01/2016,Listening,Favorite,Cambridge Learning Center,"(42.3911275, -71.1283529)\n(42.3911275, -71.12...",42.391128,-71.128353,,,1009.0,6.0,93.0,13.0,12.0
1,06/01/2016,Listening,Favorite,Cambridge Learning Center,"(42.3772195, -71.1169481)\n(42.3772195, -71.11...",42.37722,-71.116948,Institution,Harvard University,1637.0,9.0,33.0,9.0,228.0
2,06/01/2016,Listening,Favorite,Cambridge Learning Center,"(42.3771878, -71.1164761)\n(42.3771878, -71.11...",42.377188,-71.116476,Institution,Harvard University,1637.0,9.0,33.0,9.0,228.0


In [19]:
def create_map_alpha(list_of_addresses, locations, radius, m=None, color='red'):
    if m is None:
        m = folium.Map(center_of_map, zoom_start=13, tiles='stamentoner')
    # Draw faint circles so that areas with many overlapping points appear darker
    for address, (lat, long) in zip(list_of_addresses, locations):
        folium.Circle((lat, long), radius=radius, popup=address, color=None,
                      fill=True, fill_color=color, fill_opacity=0.04, parse_html=False).add_to(m)
    return m

# Overlay the 'Favorite' locations in green on top of the bakery map
final = create_map_alpha(cd.loc[:,'Location of Engagement'],
                         zip(cd.loc[:,'Latitude of Topic'], cd.loc[:,'Longitude of Topic']),
                         200, m=all_map, color='green')
final.save('final_map.html')

def embed_map(m):
    from IPython.display import IFrame

    m.save('index.html')
    return IFrame('index.html', width='100%', height='750px')

embed_map(final)