# Location Options for Cafe Cycle Mobile Fleet
### Applied Data Science Capstone by IBM/Coursera

This notebook partially fulfills the requirements for the [Applied Data Science Capstone Course](https://www.coursera.org/learn/applied-data-science-capstone) and the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science). It addresses the “**Capstone Project - The Battle of Neighborhoods**” requirement to provide the analysis results on which the recommendations are based.

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)


## Introduction: Business Problem <a name="introduction"></a>

With the onset of the COVID-19 pandemic, cycling has increased among both families and individuals as people shift from indoor and group activities to outdoor, socially distanced ones. This growth in the number of cyclists is likely to increase demand for specialty beverages (coffee) served in an outdoor seating setting.

Café Cycle is a custom beverage provider that caters to the cycling community in Pittsburgh, Pennsylvania. While it has a main coffee shop in a fixed location in Pittsburgh, it recently acquired three new coffee trucks that allow the company to deliver the same products and services, along with outdoor seating for comfort and safe social interaction.

Café Cycle would like to identify possible operating locations for its small fleet of trucks. Since it already has brand awareness within the Pittsburgh cycling community, it is looking for locations where cyclists are more likely to be present. It would also like to know which competitors in each area might detract from its sales.

## Data <a name="data"></a>

The Foursquare Developers' Places API allows applications to query Foursquare's global database of venue and user content associated with business locations. We use this API to pull the locations that the Pittsburgh cycling community is more likely to frequent, and to group these locations into possible vicinities in which to place the trucks. We also identify the competitors in each vicinity that might affect the trucks' success.
Both the possible locations and the competitors will be displayed on a map using the visualization library Folium. For the competitors, we will also provide a list with the name and address, location, distance to the group center, and overall rating of the business. We focus on any business within a 20-mile radius of the center of Pittsburgh (where the three rivers join). The venue categories that we filter on are listed below.

Cycling Community Data: Venues associated with cycling

- Outdoors & Recreation --> Bike Trail
- Shop & Service --> Bike Shop
- Travel & Transport --> Bike Rental / Bike Share

Competitor Data: Venues associated with sit-down coffee shops

- Food --> Café
- Food --> Coffee Shop
- Shop & Service --> Food & Drink Shop --> Coffee Roaster

## Methodology <a name="methodology"></a>

This analysis uses the following approach to develop its recommendations:
1. Determine the area to assess. In this case, it is any location within a 20-mile radius of the point where the three rivers of Pittsburgh join (Point State Park Fountain). Use this point as the center of the Folium map.
2. Retrieve all biking-related venues from Foursquare and add them to a dataframe. The key fields are the venue name, location, and category.
3. Create a Folium map of the center point with the 20-mile radius and add all potential location centers. Add all cycling-related venue locations. This map view is the Location Considerations View.
4. Using k-means clustering on the geographic coordinates of the venues, identify the centers of 10 potential location areas. Create a dataframe with this list and the center locations.
5. For each potential location, identify all competitor locations within a three-mile radius.
6. Create a separate Folium map for each location that shows the center point, all cycling-related locations, and all competitors in the radius.
7. Create a rating list for each location using user-defined criteria based on the number of cycling-related venues and the number of competitors. Include a Google Maps link to the center location to assist in the site survey.

In [33]:
# Load Initial Libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (the pandas.io.json import path is deprecated since pandas 1.0)

# Libraries for Haversine Distance
import math

#import k-means from clustering stage
from sklearn.cluster import KMeans

# Import geopy library. Use pip for faster processing
# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
# from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
!pip install geopy
from geopy.geocoders import Nominatim

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import folium library. Use pip for faster processing
# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
# import folium # map rendering library
!pip install folium
import folium
from folium.plugins import BeautifyIcon
from folium.features import DivIcon
from folium import FeatureGroup, LayerControl, Map, Marker

print('Initial Libraries imported.')

Initial Libraries imported.


## Analysis <a name="analysis"></a>

## 1. Determine Area to Assess
In this case, it is any location within a 20-mile radius of the point where the three rivers of Pittsburgh join ([Point State Park Fountain](https://goo.gl/maps/uXDevGiQSZTCb6ss9)). This will be the center of the Location Considerations View Folium map.

In [34]:
# Point State Park Fountain Latitude and Longitude
pgh_latitude = 40.442502   # pgh is abbrev for Pittsburgh
pgh_longitude = -80.012685
print('The geographical coordinates of Point State Park Fountain, Pittsburgh, PA are {}, {}.'.format(pgh_latitude, pgh_longitude))

The geographical coordinates of Point State Park Fountain, Pittsburgh, PA are 40.442502, -80.012685.


## 2. Retrieve all Biking Related Venues and Add to Dataframe
Retrieve all biking-related venues from Foursquare and add them to a dataframe. The key fields are the venue name, location, and category.

In [69]:
# CLIENT_ID = 'Deleted' # your Foursquare ID
CLIENT_ID = '*************************' # your Foursquare ID
# CLIENT_SECRET = 'Deleted' # your Foursquare Secret
CLIENT_SECRET = '**************************' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: *************************
CLIENT_SECRET:**************************


In [36]:
print('The geographical coordinates of Point State Park Fountain, Pittsburgh, PA are {}, {}.'.format(pgh_latitude, pgh_longitude))

# Variables that limit searches to specific venue categories. For reference https://developer.foursquare.com/docs/resources/categories
cycle_categoryIds = '56aa371be4b08b9a8d57355e,4bf58dd8d48988d115951735,4e4c9077bd41f78e849722f9' # Bike Trail, Bike Shop, Bike Rental / Bike Share.
coffee_categoryIds = '4bf58dd8d48988d1e0931735,5e18993feee47d000759b256' # Cafe/Coffee Shop, Coffee Roaster
combined_categoryIds = cycle_categoryIds+','+coffee_categoryIds

# Miles to meters conversion: approximately 1609 meters in a mile
mi_meter_conv = 1609

# Create the GET request URL. Name your URL url.
LIMIT = 200 # number of venues requested from the Foursquare API (overrides the earlier value of 30)
radius = 20 * mi_meter_conv # search radius in meters. In this case, it is a 20-mile radius

# Create the URL. This uses the venues/search endpoint.
url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    pgh_latitude,
    pgh_longitude,
    radius,
    LIMIT,
    cycle_categoryIds)
#url # display URL

The geographical coordinates of Point State Park Fountain, Pittsburgh, PA are 40.442502, -80.012685.


In [37]:
cycle_results = requests.get(url).json()
# cycle_results

In [38]:
# Clean the json and structure it into a pandas dataframe.
venues = cycle_results['response']['venues']
    
cycle_venues_df = json_normalize(venues) # flatten JSON

In [39]:
# function that extracts the category of the venue
def get_category_type(row):
    # The category list is stored under 'categories' or, for nested responses, 'venues.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venues.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
filtered_columns = ['name', 'categories','location.lat','location.lng','location.distance','id']
pgh_cycle_df = cycle_venues_df.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# filter the category for each row
pgh_cycle_df['categories'] = pgh_cycle_df.apply(get_category_type, axis=1)

# clean column names by keeping only last term
pgh_cycle_df.columns = [column.split('.')[-1] for column in pgh_cycle_df.columns]

# pgh_cycle_df
# Foursquare has a limit of 50 returns for their Search function. Luckily, there are only 48 cycling venue locations in Pgh area.
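
The cells above reduce each nested venue record to a flat row with the name, first category, and coordinates. A minimal pure-Python sketch of that flattening follows. The two sample records are made up and only mimic the fields this notebook reads; `flatten_venue` is a hypothetical helper, not part of the notebook.

```python
# Made-up venue records in the shape this notebook reads from the search response.
sample_venues = [
    {'name': 'Example Bike Shop',
     'categories': [{'name': 'Bike Shop'}],
     'location': {'lat': 40.45, 'lng': -79.93, 'distance': 7200}},
    {'name': 'Example Trailhead',
     'categories': [],  # some venues may carry no category
     'location': {'lat': 40.41, 'lng': -80.20, 'distance': 16500}},
]


def flatten_venue(venue):
    """Return one flat row per venue, keeping only the first category name."""
    categories = venue.get('categories') or []
    location = venue.get('location', {})
    return {
        'name': venue.get('name'),
        'categories': categories[0]['name'] if categories else None,
        'lat': location.get('lat'),
        'lng': location.get('lng'),
        'distance': location.get('distance'),
    }


rows = [flatten_venue(v) for v in sample_venues]
for row in rows:
    print(row)
```

A venue with an empty category list yields `None`, which matches the behavior of `get_category_type` above.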

## 3. Create Location Considerations View Map
Create a Folium map of the center point with the 20-mile radius and add all potential location centers. Add all cycling-related venue locations. This map view is the ***Location Considerations View***.

In [40]:
# Point State Park Fountain Latitude and Longitude
print('The geographical coordinates of Point State Park Fountain, Pittsburgh, PA are {}, {}.'.format(pgh_latitude, pgh_longitude))

The geographical coordinates of Point State Park Fountain, Pittsburgh, PA are 40.442502, -80.012685.


In [41]:
# Choose color of circle marker based on category type
def get_cat_color(cat):
    default_color = 'blue'
    if cat in ('Bike Shop', 'Sporting Goods Shop', 'Fishing Store'):  # Biking related store
        return 'yellow'
    elif cat in ('Bike Trail', 'Park', 'Tour Provider'):  # Bike riding activities
        return 'green'
    elif cat == 'Bike Rental / Bike Share':  # Bike rental facilities (== avoids a substring test against a plain string)
        return 'purple'
    else:
        return default_color

In [42]:
# Create the Location Consideration view, identifying the region used for analyzing the potential locations to operate.

pgh_map = folium.Map(location=[pgh_latitude, pgh_longitude], zoom_start=10)

#### add a red circle to represent the 20-mile zone around Point State Park
folium.Circle(
    [pgh_latitude, pgh_longitude],
    radius=20*mi_meter_conv,       # 20*mi_meter_conv
    color='red',
    popup='Zone of Consideration',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.4
).add_to(pgh_map)

#### add all venues from pgh_cycle_df as circle markers colored by category
for lat, lng, label in zip(pgh_cycle_df.lat, pgh_cycle_df.lng, pgh_cycle_df.categories):
    cat_color = get_cat_color(label)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color= cat_color,
        popup=label,
        fill = True,
        fill_color= cat_color,
        fill_opacity=0.6
    ).add_to(pgh_map)

#### add legend to map
# folium.Element(legend_html).add_to(pgh_map)

pgh_map

## 4. Determine the Top 10 Location Centers to Consider
Now that we can see the overall area of consideration around Pittsburgh and the biking-related venues within it, we identify potential vicinities in which to locate the coffee trucks. To do this, we use k-means clustering to find 10 possible locations (10 is an arbitrary number chosen for consideration) that are closest to the biking venues. The centers of these clusters are used for further analysis.

The clustering runs on the latitude and longitude of each venue. The resulting cluster labels and center locations are stored in a dataframe.

In [44]:
# Analyze the Biking related locations to identify the cluster center for 10 locations.
# set number of clusters
kclusters = 10

pgh_grouped_clustering = pgh_cycle_df.drop(['name', 'categories', 'distance', 'id'], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(pgh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([5, 0, 0, 9, 2, 0, 5, 5, 2, 0], dtype=int32)

In [45]:
# add clustering labels
pgh_cycle_df.insert(0, 'Cluster Labels', kmeans.labels_)

pgh_merged_sorted = pgh_cycle_df.sort_values('Cluster Labels')

# pgh_merged_sorted.head()
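
Each cluster center used in the rest of the analysis is the mean latitude and longitude of the venues that share a label. A minimal pure-Python sketch of that averaging follows, using made-up coordinates; `cluster_centers` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict
from statistics import mean

# Made-up (label, lat, lng) rows standing in for the clustered venues.
labeled_points = [
    (0, 40.45, -79.93), (0, 40.47, -79.91),
    (1, 40.41, -80.21), (1, 40.43, -80.19), (1, 40.42, -80.20),
]


def cluster_centers(points):
    """Average the coordinates of the points that share a cluster label."""
    by_label = defaultdict(list)
    for label, lat, lng in points:
        by_label[label].append((lat, lng))
    return {
        label: (mean(lat for lat, _ in coords), mean(lng for _, lng in coords))
        for label, coords in sorted(by_label.items())
    }


centers = cluster_centers(labeled_points)
for label, (lat, lng) in centers.items():
    print(label, round(lat, 6), round(lng, 6))
```

One caveat worth keeping in mind: clustering on raw latitude and longitude treats a degree of longitude the same as a degree of latitude. Near 40° N a degree of longitude covers roughly three-quarters of the ground distance of a degree of latitude, so an analysis that needs accurate distances may want to project the coordinates first.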

In [46]:
# Calculate the mean latitude and longitude of each cluster.
# Selecting the numeric columns explicitly avoids errors on the text columns in newer pandas versions.
cluster_ctr_df = pgh_merged_sorted.groupby('Cluster Labels')[['lat', 'lng']].mean()

cluster_ctr_df = cluster_ctr_df.reset_index()    # Restore Cluster Labels as a regular column so it can be referenced.

cluster_ctr_df.head(10)

| | Cluster Labels | lat | lng |
|---|---|---|---|
| 0 | 0 | 40.455327 | -79.92834 |
| 1 | 1 | 40.413767 | -80.206269 |
| 2 | 2 | 40.287226 | -79.808273 |
| 3 | 3 | 40.590821 | -79.679906 |
| 4 | 4 | 40.60671 | -80.038295 |
| 5 | 5 | 40.429284 | -80.012229 |
| 6 | 6 | 40.135228 | -80.131575 |
| 7 | 7 | 40.551125 | -80.193529 |
| 8 | 8 | 40.287698 | -80.142206 |
| 9 | 9 | 40.41881 | -79.752237 |


In [47]:
# visualize the resulting clusters

# create map
map_clusters = folium.Map(location=[pgh_latitude, pgh_longitude], zoom_start=10)

# create feature groups so the display can be filtered.
feature_group1 = FeatureGroup(name='Venue Locations')
feature_group2 = FeatureGroup(name='Possible Location Areas')

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pgh_merged_sorted['lat'], pgh_merged_sorted['lng'], pgh_merged_sorted['name'], pgh_merged_sorted['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

    # add all venue locations to the feature group 1
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(feature_group1)

# add circle for the 2 mile radius around each mean of the center cluster
markers_colors = []
for lat, lon, cluster in zip(cluster_ctr_df['lat'], cluster_ctr_df['lng'], cluster_ctr_df['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.Circle(
        [lat, lon],
        radius=2*mi_meter_conv,       # 2*mi_meter_conv
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.3).add_to(feature_group2)
    folium.Marker(
        [lat, lon], 
        icon=DivIcon(
            icon_size=(150,36),
            icon_anchor=(7,20),
            html='<div style="font-size: 18pt; color : black">' + str(cluster) + '</div>',
            )
        ).add_to(feature_group2)

feature_group1.add_to(map_clusters)
feature_group2.add_to(map_clusters)
LayerControl().add_to(map_clusters)

map_clusters

## 5. Determine Competitor Locations
For each potential location, identify all Competitor locations within a three-mile radius of the center consideration areas. These venue locations will be captured in a dataframe for later reference and analysis.
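
The notebook relies on the API's `radius` parameter to apply the three-mile limit, and it imports `math` for a Haversine distance. A local check of the distance between a center and a venue could look like the minimal sketch below. The Earth radius constant and both helper names are assumptions for illustration; the coordinates are the Area 5 and Area 6 centers and the Point State Park Fountain from this notebook.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(center, venue, radius_miles=3.0):
    """True when the venue lies within radius_miles of the center."""
    return haversine_miles(center[0], center[1], venue[0], venue[1]) <= radius_miles


area_center = (40.429284, -80.012229)  # Area 5 center
fountain = (40.442502, -80.012685)     # Point State Park Fountain
far_center = (40.135228, -80.131575)   # Area 6 center

print(within_radius(area_center, fountain))    # under a mile apart
print(within_radius(area_center, far_center))  # well outside three miles
```

The same function could also supply the distance-to-center value mentioned in the Data section.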

In [48]:
# Function to build list using Foursquare Explore
def getCoffeeVenues2(cluster, latitudes, longitudes, radius=4827):
    # print('The cluster number is {}, and latitudes, longitudes are {}, {}.'.format(cluster, latitudes, longitudes))
    
    # Step 1: Build URL
    coffee_cat = '4bf58dd8d48988d16d941735,5e18993feee47d000759b256'
    
    # create URL
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        latitudes, 
        longitudes,
        coffee_cat,
        radius, 
        LIMIT)
    # print(url)     # Used for debugging
    
    # Step 2: make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # Step 3: return only relevant information for each nearby venue
    venues_list1=[]
    venues_list1.append([(
        cluster, 
        latitudes, 
        longitudes, 
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])

    # Step 4: Create dataframe of data and add Column headings
    coffee_venues = pd.DataFrame([item for venue_list1 in venues_list1 for item in venue_list1])
    
    # If No data returned, do not try to change column heading name (will cause error)
    if coffee_venues.shape[0] != 0:

        coffee_venues.columns = ['Cluster Labels',
                                 'Cluster Latitude',
                                 'Cluster Longitude', 
                                 'Venue', 
                                 'Venue Latitude', 
                                 'Venue Longitude',
                                 'Venue Category']

        # Add the returned venues to the overall list of venues. This is to address limitations in number of returns.
        #coffee_venues_all = coffee_venues_all.append(coffee_venues)
        # print(coffee_venues['Cluster Labels'])    # Used for debugging
    
    
    return coffee_venues


In [49]:
# Create empty dataframe to hold all coffee venue locations by cluster
coffee_venues_all = pd.DataFrame({'Cluster Labels': [],
                         'Cluster Latitude': [],
                         'Cluster Longitude': [], 
                         'Venue': [], 
                         'Venue Latitude': [], 
                         'Venue Longitude': [],
                         'Venue Category': []})

# Call getCoffeeVenues2 for each of the 10 cluster centers and add the results to the coffee_venues_all dataframe.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
for cluster, lat, lng in zip(cluster_ctr_df['Cluster Labels'], cluster_ctr_df['lat'], cluster_ctr_df['lng']):

    coffee_by_cluster = getCoffeeVenues2(cluster, lat, lng)
    coffee_venues_all = pd.concat([coffee_venues_all, coffee_by_cluster])

# Combining with the empty seed dataframe can leave Cluster Labels as a float. This changes it to an int.
coffee_venues_all['Cluster Labels'] = coffee_venues_all['Cluster Labels'].fillna(0.0).astype(int)


In [50]:
# There are 10 clusters. The shape shows 243 coffee venues in total across all clusters.
# Foursquare returned at most 100 venues per query, so no cluster contributes more than 100 rows.
coffee_venues_all.shape

(243, 7)
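
Because each query returns at most 100 venues, a cluster that reports exactly 100 coffee shops may in fact have more. A minimal sketch of how capped counts could be marked as lower bounds follows. The cap value reflects what was observed in this notebook, the three counts are taken from the summary table at the end, and `describe_count` is a hypothetical helper.

```python
API_CAP = 100  # maximum venues returned per query, as observed in this notebook

# Coffee venue counts for three areas, taken from the summary table in this notebook.
coffee_counts = {'Area 5': 100, 'Area 0': 100, 'Area 4': 14}


def describe_count(count, cap=API_CAP):
    """Render a count, marking values that reach the cap as lower bounds."""
    return '{}+'.format(count) if count >= cap else str(count)


report = {area: describe_count(n) for area, n in coffee_counts.items()}
print(report)
```

Flagging capped values this way keeps a reader from treating 100 as an exact competitor count.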

### Rebuild the Display with the Coffee Shops Added
This view provides a quick look at which consideration regions have the greatest number of competitors, but it presents too much information to be of value when assessing individual areas. We will address this issue later.

In [51]:
# Rebuilds the Cluster Map with the additional feature group of coffee shops.

# create map
map_clusters_coffee = folium.Map(location=[pgh_latitude, pgh_longitude], zoom_start=10)

# create feature groups so the display can be filtered.
feature_group11 = FeatureGroup(name='Venue Locations')
feature_group21 = FeatureGroup(name='Possible Location Areas')
feature_group31 = FeatureGroup(name='Coffee Shops')

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# add all venues from coffee_venues_all as light gray icon markers
for lat, lng, label in zip(coffee_venues_all['Venue Latitude'], coffee_venues_all['Venue Longitude'], coffee_venues_all['Venue']):
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color='lightgray', icon='cutlery', prefix='fa')
    ).add_to(feature_group31)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pgh_merged_sorted['lat'], pgh_merged_sorted['lng'], pgh_merged_sorted['name'], pgh_merged_sorted['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

    # add all venue locations to the feature group 1
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(feature_group11)

# add circle for the 2 mile radius around each mean of the center cluster
markers_colors = []
for lat, lon, cluster in zip(cluster_ctr_df['lat'], cluster_ctr_df['lng'], cluster_ctr_df['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.Circle(
        [lat, lon],
        radius=2*mi_meter_conv,       # 2*mi_meter_conv
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.3).add_to(feature_group21)
    folium.Marker(
        [lat, lon], 
        icon=DivIcon(
            icon_size=(150,36),
            icon_anchor=(7,20),
            html='<div style="font-size: 18pt; color : black">' + str(cluster) + '</div>',
            )
        ).add_to(feature_group21)
    
    
feature_group11.add_to(map_clusters_coffee)
feature_group21.add_to(map_clusters_coffee)
feature_group31.add_to(map_clusters_coffee)
LayerControl().add_to(map_clusters_coffee)

map_clusters_coffee

## Results and Discussion <a name="results"></a>

### 6. Create a separate Folium map for each location that shows the center point, all cycling related locations and all competitors in the radius
The purpose of our analysis was to capture the data needed to assess each potential area separately. Once a map reconnaissance of an area is completed, staff will move to the actual ground to perform a site survey. These maps identify potential areas in which to operate the coffee trucks, allowing easier and faster on-the-ground assessments.

There are 48 Pittsburgh-area venues retrieved from Foursquare that either identify their business as cycling related or are locations whose primary use is as a biking trailhead. Assuming that cycling-related venues are more likely to be established where cycling is popular, or to attract more cyclists to the area, these points provide locations from which Café Cycle can identify target areas for its coffee trucks.

The number of clusters (10) was chosen arbitrarily to create enough options to consider, but not so many that the staff could not conduct site visits. Of the 10 areas, one (Area 6) was far enough from Pittsburgh, and in a rural enough setting, that it contained only one cycling venue. This area, identified as one of the least favorable locations, also had no identified coffee shops within its three-mile radius.

As expected, coffee shops were prevalent in large numbers in the two most central locations (Areas 5 and 0). Due to retrieval limitations with Foursquare, the competitor results were limited to a maximum of 100 returns per area. This limitation would need to be addressed in an analysis used for making a financial decision. Areas 2 and 4 had more than three cycling venues and either a small or moderate number of competitor coffee shops. Since these locations lie outside the downtown Pittsburgh region, where population and traffic are less dense and cycling activities are more likely to occur, they may prove to be the optimum locations to set up the coffee trucks.

In [52]:
# Rebuilds the Cluster Map with the additional feature group of coffee shops.
def build_cluster_map(clust_num):
    
    clust_num_str = str(clust_num)

    coffee_venues_clust = coffee_venues_all[coffee_venues_all['Cluster Labels']==clust_num]

    clus_lat = cluster_ctr_df.loc[clust_num,'lat']
    clus_lng = cluster_ctr_df.loc[clust_num,'lng']
    
    title_html = '''
             <h3 align="center" style="font-size:20px"><b>Location Option '''+clust_num_str+'''</b></h3>
             '''

    # create map
    map_clusters_coffee_0 = folium.Map(location=[clus_lat, clus_lng], zoom_start=13)

    # create feature groups so the display can be filtered.
    feature_group11 = FeatureGroup(name='Venue Locations')
    feature_group21 = FeatureGroup(name='Possible Location Areas')
    feature_group31 = FeatureGroup(name='Coffee Shops')

    # set color scheme for the clusters
    x = np.arange(kclusters)
    ys = [i + x + (i*x)**2 for i in range(kclusters)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]
    
    # Add Map Title
    map_clusters_coffee_0.get_root().html.add_child(folium.Element(title_html))
    
    
    # add markers to the map
    markers_colors = []
    for lat, lon, poi, cluster in zip(pgh_merged_sorted['lat'], pgh_merged_sorted['lng'], pgh_merged_sorted['name'], pgh_merged_sorted['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

        # add all venue locations to the feature group 1
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7).add_to(feature_group11)

    # add circle for the 2 mile radius around each mean of the center cluster
    markers_colors = []
    for lat, lon, cluster in zip(cluster_ctr_df['lat'], cluster_ctr_df['lng'], cluster_ctr_df['Cluster Labels']):
        label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
        folium.Circle(
            [lat, lon],
            radius=2*mi_meter_conv,       # 2*mi_meter_conv
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.3).add_to(feature_group21)
        folium.Marker(
            [lat, lon], 
            icon=DivIcon(
                icon_size=(150,36),
                icon_anchor=(7,20),
                html='<div style="font-size: 18pt; color : black">' + str(cluster) + '</div>',
                )
            ).add_to(feature_group21)


    # add this cluster's venues from coffee_venues_clust as light gray icon markers
    for lat, lng, label in zip(coffee_venues_clust['Venue Latitude'], coffee_venues_clust['Venue Longitude'], coffee_venues_clust['Venue']):
        folium.Marker(
            [lat, lng],
            popup=label,
            icon=folium.Icon(color='lightgray', icon='cutlery', prefix='fa')
        ).add_to(feature_group31)


    feature_group11.add_to(map_clusters_coffee_0)
    feature_group21.add_to(map_clusters_coffee_0)
    feature_group31.add_to(map_clusters_coffee_0)
    LayerControl().add_to(map_clusters_coffee_0)

    return map_clusters_coffee_0


### Now to Examine Each Cluster Area Separately

In [53]:
location_option = build_cluster_map(0)
location_option

In [54]:
location_option = build_cluster_map(1)
location_option

In [55]:
location_option = build_cluster_map(2)
location_option

In [56]:
location_option = build_cluster_map(3)
location_option

In [57]:
location_option = build_cluster_map(4)
location_option

In [58]:
location_option = build_cluster_map(5)
location_option

In [59]:
location_option = build_cluster_map(6)
location_option

In [60]:
location_option = build_cluster_map(7)
location_option

In [61]:
location_option = build_cluster_map(8)
location_option

In [62]:
location_option = build_cluster_map(9)
location_option

## 7. Create Location Option Priority List
Create a rating list for each location using user-defined criteria based on the number of cycling-related venues and the number of competitors. Include a Google Maps link to the center point.

We will create a dataframe that lists each cluster region along with the following additional values
- Number of Cycling Activities
- Number of Cafes
- Overall Score (10 * Num of Cycling Activities - Number of Cafes)
- Rank order by Overall Score
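
The scoring rule in the list above can be sketched in a few lines. The counts below are hypothetical and chosen only to show the arithmetic; `area_score` is an illustrative helper, not the notebook's implementation.

```python
def area_score(cycle_venues, coffee_venues, cycle_weight=10):
    """Overall score: 10 points per cycling venue, minus 1 point per cafe."""
    return cycle_weight * cycle_venues - coffee_venues


# Hypothetical (cycling venues, cafes) counts for three candidate areas.
candidates = {'Area A': (8, 12), 'Area B': (3, 0), 'Area C': (5, 40)}

scores = {name: area_score(*counts) for name, counts in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, scores[name])
```

With these counts, Area A scores 68, Area B scores 30, and Area C scores 10, so a large number of nearby cafes can pull an otherwise busy cycling area down the list.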

In [63]:
# Copy the cluster dataframe so the original is not modified
location_option_assess = cluster_ctr_df.copy()

In [64]:
# Get count of Cycling Locations within a cluster center
count_cycle_df = pgh_merged_sorted.groupby(['Cluster Labels']).count()

# Get count of coffee shop Locations within a 3 mile radius of the cluster center.
count_coffee_df = coffee_venues_all.groupby(['Cluster Labels']).count()
count_coffee_df = count_coffee_df.drop(['Cluster Latitude', 'Cluster Longitude', 'Venue Category', 'Venue Latitude', 'Venue Longitude'], axis=1)
count_coffee_df = count_coffee_df.rename(columns={'Venue': 'Coffee Venues Count'})
count_coffee_df['Coffee Venues Count'] = count_coffee_df['Coffee Venues Count'].astype(int)

# Merge columns to primary dataframe
location_option_assess['Cycle Venues Count'] = count_cycle_df['id'].values  # Same number of rows on both dataframes.
location_option_assess = pd.merge(location_option_assess, count_coffee_df, on='Cluster Labels', how='left')
location_option_assess = location_option_assess.fillna(0)
location_option_assess['Coffee Venues Count'] = location_option_assess['Coffee Venues Count'].astype(int)

# Add Area Name
location_option_assess['Area Name']= location_option_assess.agg(lambda x: f"Area {x['Cluster Labels'].astype(int)}", axis=1)


In [65]:
# Total Score
# We use a simplified approach for computing an overall score. Since we are
# targeting cyclists as our predominant customer base, each cycling venue within
# the cluster receives 10 points. For every coffee shop within the 3-mile
# radius, we subtract a point. The sum of these points is the overall score.

location_option_assess = location_option_assess.assign(Area_Score = location_option_assess['Cycle Venues Count']*10 - location_option_assess['Coffee Venues Count'])
# Sort by score, highest first; ties are broken by cluster label
location_option_assess = location_option_assess.sort_values(by=['Area_Score', 'Cluster Labels'], ascending=[False, True])


In [66]:
# Add Google Maps Location Format like "https://www.google.com/maps/search/?api=1&query=40.287226,-79.808273"
location_option_assess['GoogleMapLink']= ['https://www.google.com/maps/search/?api=1&query=' +str(la) + ',' 
                                          + str(lo) for la, lo in zip(location_option_assess['lat'], location_option_assess['lng'])]


In [67]:
# Format table for presentation
formatted_options_table = location_option_assess[['Area Name', 'Cycle Venues Count', 'Coffee Venues Count','Area_Score', 'GoogleMapLink']]


In [68]:
# Location Summary Assessment and Priority List
formatted_options_table.head(10)

| | Area Name | Cycle Venues Count | Coffee Venues Count | Area_Score | GoogleMapLink |
|---|---|---|---|---|---|
| 2 | Area 2 | 6 | 1 | 59 | https://www.google.com/maps/search/?api=1&quer... |
| 5 | Area 5 | 14 | 100 | 40 | https://www.google.com/maps/search/?api=1&quer... |
| 4 | Area 4 | 5 | 14 | 36 | https://www.google.com/maps/search/?api=1&quer... |
| 1 | Area 1 | 3 | 3 | 27 | https://www.google.com/maps/search/?api=1&quer... |
| 3 | Area 3 | 2 | 3 | 17 | https://www.google.com/maps/search/?api=1&quer... |
| 8 | Area 8 | 2 | 4 | 16 | https://www.google.com/maps/search/?api=1&quer... |
| 7 | Area 7 | 2 | 7 | 13 | https://www.google.com/maps/search/?api=1&quer... |
| 0 | Area 0 | 11 | 100 | 10 | https://www.google.com/maps/search/?api=1&quer... |
| 6 | Area 6 | 1 | 0 | 10 | https://www.google.com/maps/search/?api=1&quer... |
| 9 | Area 9 | 2 | 11 | 9 | https://www.google.com/maps/search/?api=1&quer... |


## Conclusion <a name="conclusion"></a>

The analysis presented in this study does not address all factors necessary for Café Cycle to decide on a location. Other factors such as population density, affluence, and age of the population are also indicators of potential demand. In addition, the availability of adequate space to locate a mobile truck and the cost for use of these locations are important considerations when making a final decision. 

Missing from the data used in our analysis are other venues that would attract cyclists. For example, public parks with cycling trails and access points along the Rails to Trails network, where riders can enter at trailheads, are not listed. Further efforts to map these locations and input them into the model would provide a more reliable indicator of demand.

Lastly, Café Cycle's leadership will need to physically assess each area to make the right choice. The advantage of this new mode of delivering Café Cycle's products to its expanded customer base is that if the expected revenues are not fully realized after operating at a location for a period of time, the trucks can easily relocate to a new location.