# Adding Pickup Locations, Drones, and Using BART


## Context

Adding more pickup locations may help grow the customer base and increase how often customers purchase meals. Opening these locations would require renting or purchasing property and possibly renovating space.

Because the business would be committing to longer-term leases or purchases, potentially with costly renovations, we need to choose locations that are future-proof, meaning they are likely to remain good sites for the life of the lease.

Locations near BART stations would be good choices because riders could easily pick up meals at or near the stations they travel through on the way to or from work.

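Drones could extend the reach of each pickup location: a drone based at a pickup location could deliver meals to customers within a 1.5 mile radius, so a single location could serve both commuters passing through the station and residents living nearby.

Public transit also matters on the supply side: meals will be prepared at the Berkeley store and carried to each pickup location via BART, so travel time from the store to a candidate station affects which locations are practical.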


## Methodology

We will examine each station's betweenness centrality and the population living within 1.5 miles of it (the delivery range of a drone). Betweenness centrality counts how many shortest paths between other stations pass through a given station, so it serves as a proxy for how many trips route through that station. Stations with dense surrounding populations and high betweenness centrality are good candidates for pickup locations: a pickup location there could reach customers through drone deliveries based at the station, and also commuters who enter, exit, or pass through it.
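A brute-force sketch of the betweenness idea, assuming a small undirected toy graph stored as an adjacency dictionary (the node names are hypothetical, not BART stations). For every ordered pair of other nodes (s, t) it adds the fraction of shortest s-t paths that pass through each node. It is meant for intuition only; the analysis itself uses Neo4j GDS's `gds.betweenness.stream`, which may count pairs or normalize scores differently.

```python
from collections import deque
from itertools import permutations

def shortest_path_counts(graph, source):
    """BFS from source: return (distance, number of shortest paths) for each reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                count[neighbor] += count[node]
    return dist, count

def betweenness(graph):
    """Sum, over ordered pairs (s, t), the fraction of shortest s-t paths passing through each node."""
    info = {node: shortest_path_counts(graph, node) for node in graph}
    scores = {node: 0.0 for node in graph}
    for s, t in permutations(graph, 2):
        dist_s, count_s = info[s]
        if t not in dist_s:
            continue  # t is unreachable from s
        for v in graph:
            if v in (s, t):
                continue
            dist_v, count_v = info[v]
            # v lies on a shortest s-t path exactly when d(s, v) + d(v, t) == d(s, t)
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                scores[v] += count_s[v] * count_v[t] / count_s[t]
    return scores

# Hypothetical toy network: three spokes meet at a hub, and one short branch continues past it
hub_graph = {
    "A": ["Hub"], "B": ["Hub"], "C": ["Hub"],
    "Hub": ["A", "B", "C", "D"],
    "D": ["Hub", "E"], "E": ["D"],
}
print(betweenness(hub_graph))
```

In the toy graph, `Hub` scores highest because every route between its spokes must pass through it. That is the same property that makes a transfer station attractive for a pickup location.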

We will start by identifying stations that have (1) a denser surrounding population and (2) higher betweenness centrality than the Downtown Berkeley station. Next, we will refine the list to minimize travel time between each candidate and the Ashby station, since meals will be prepared at the Berkeley store and carried to the pickup locations via BART. Finally, we will estimate the potential market we could capture with the additional 1.5 mile reach of a drone delivery option, with the drones based at each pickup location. We will select 3 BART stations from this final list at which to establish pickup locations.
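The three steps can be sketched as one small filtering function. This is a hypothetical illustration: the `Station` record, the `max_minutes` travel-time cutoff (the text only says travel time should be minimized), and the sample values are assumptions made for the sketch, not data from this analysis. Ranking by surrounding population stands in for the market estimate in the last step.

```python
from dataclasses import dataclass

@dataclass
class Station:
    name: str
    population: float          # population within the 1.5 mile drone range
    betweenness: float         # betweenness centrality score
    minutes_from_store: float  # BART travel time from the store's station (assumed known)

def select_pickup_stations(stations, benchmark_name, max_minutes, k=3):
    """Apply the three methodology steps and return up to k station names."""
    benchmark = next(s for s in stations if s.name == benchmark_name)
    # Step 1: beat the benchmark station on both population and betweenness
    candidates = [s for s in stations
                  if s.population > benchmark.population
                  and s.betweenness > benchmark.betweenness]
    # Step 2: drop candidates that take too long to reach from the store
    candidates = [s for s in candidates if s.minutes_from_store <= max_minutes]
    # Step 3: prefer the stations with the most people inside drone range
    candidates.sort(key=lambda s: s.population, reverse=True)
    return [s.name for s in candidates[:k]]

# Hypothetical sample values, for illustration only
sample = [
    Station("Bench", 100, 10, 0),
    Station("S1", 200, 20, 10),
    Station("S2", 300, 30, 50),
    Station("S3", 150, 5, 10),
    Station("S4", 250, 15, 20),
    Station("S5", 120, 12, 25),
    Station("S6", 400, 40, 5),
]
print(select_pickup_stations(sample, "Bench", max_minutes=30))  # ['S6', 'S4', 'S1']
```

`S2` is dropped for travel time and `S3` for low betweenness, even though both have more people nearby than the benchmark.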

# Included Modules and Packages

In [2]:
import neo4j

import csv
import json

import math
import numpy as np
import pandas as pd

import psycopg2
from geographiclib.geodesic import Geodesic

import warnings
warnings.filterwarnings("ignore")

import gmaps
import gmaps.geojson_geometries

# Supporting Code

In [3]:
# Connect to Neo4j

driver = neo4j.GraphDatabase.driver(uri="neo4j://neo4j:7687", auth=("neo4j","w205"))
session = driver.session(database="neo4j")

# Connect to PostgreSQL

connection = psycopg2.connect(
    user = "postgres",
    password = "ucb",
    host = "postgres",
    port = "5432",
    database = "postgres"
)

cursor = connection.cursor()

In [4]:
# Function to run a select query and return the rows in a pandas dataframe.
# pd.read_sql_query converts Postgres NUMERIC values (returned as Decimal) to float64,
# and integer columns that contain NULLs also come back as float64.
# Any float column whose non-null values are all whole numbers is converted
# to the nullable Int64 dtype.

def my_select_query_pandas(query, rollback_before_flag, rollback_after_flag):
    "function to run a select query and return rows in a pandas dataframe"
    
    if rollback_before_flag:
        connection.rollback()
    
    df = pd.read_sql_query(query, connection)
    
    if rollback_after_flag:
        connection.rollback()
    
    # fix the float columns that really should be integers
    
    for column in df:
        if df[column].dtype == "float64":
            values = df[column].dropna()
            if (values == np.floor(values)).all():
                df[column] = df[column].astype("Int64")
    
    return df

In [5]:
def my_calculate_box(point, miles):
    "Given a (latitude, longitude) point and miles, return the points that distance due west, east, north, and south as left, right, top, bottom"
    
    geod = Geodesic.WGS84

    kilometers = miles * 1.60934
    meters = kilometers * 1000

    g = geod.Direct(point[0], point[1], 270, meters)
    left = (g['lat2'], g['lon2'])

    g = geod.Direct(point[0], point[1], 90, meters)
    right = (g['lat2'], g['lon2'])

    g = geod.Direct(point[0], point[1], 0, meters)
    top = (g['lat2'], g['lon2'])

    g = geod.Direct(point[0], point[1], 180, meters)
    bottom = (g['lat2'], g['lon2'])
    
    return(left, right, top, bottom)

In [6]:
def my_station_get_zip_rows(station, miles):
    "Given a station, return (zip, population) rows for zip codes within a square box extending miles in each direction"
    
    connection.rollback()
    
    # Look up the station's coordinates (a parameterized query handles quoting)
    cursor.execute("select latitude, longitude from stations where station = %s", (station,))
    row = cursor.fetchone()
    if row is None:
        raise ValueError("station not found: " + station)
    point = (row[0], row[1])
    
    # The square box approximates the circular drone range
    (left, right, top, bottom) = my_calculate_box(point, miles)
    
    query = """
    select zip, population
    from zip_codes
    where latitude between %s and %s
      and longitude between %s and %s
    order by 1
    """
    cursor.execute(query, (bottom[0], top[0], left[1], right[1]))
    rows = cursor.fetchall()
    
    # Fetch the rows first, then end the read-only transaction
    connection.rollback()
    
    return rows


def my_station_get_zips(station, miles):
    "Given a station, return the total population of the zip codes within miles of it"
    
    return float(sum(population for _, population in my_station_get_zip_rows(station, miles)))

In [7]:
def my_station_get_zip_list(station, miles):
    "Given a station, return the list of zip codes within miles of it"
    
    return [zip_code for zip_code, _ in my_station_get_zip_rows(station, miles)]

In [8]:
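def cleanse_stations(df):
    """Strip line colors and depart / arrive designations from the "name" column, leaving plain station names"""
    
    words = ["blue", "green", "orange", "red", "yellow", "gray", "depart", "arrive"]
    regex_pattern = r'\b(?:{})\b'.format('|'.join(words))
    # regex=True is required: since pandas 2.0, str.replace treats the pattern literally by default
    df["name"] = (df["name"]
                  .str.replace(regex_pattern, '', regex=True)
                  .str.replace(r'\s+', ' ', regex=True)
                  .str.strip())
    return df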

In [9]:
def my_neo4j_run_query_pandas(query, **kwargs):
    "run a query and return the results in a pandas dataframe"
    
    result = session.run(query, **kwargs)
    
    df = pd.DataFrame([r.values() for r in result], columns=result.keys())
    
    return df

In [10]:
## Import Google Maps API key (strip any trailing newline from the file)
with open('../gmap_api_key.txt', 'r') as f:
    my_api_key = f.read().strip()

gmaps.configure(api_key=my_api_key)

# Generate Data Frame for Analysis

In [10]:
rollback_before_flag = True
rollback_after_flag = True

query = """

select station,
        latitude,
        longitude
from stations
order by station

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

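##### Add the population within 1.5 miles of each station (the delivery range of a drone). The query approximates this radius with a square box around the station and sums the populations of the zip codes whose coordinates fall inside it.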

In [11]:
df["pop_1_5"] = [my_station_get_zips(station, 1.5) for station in df["station"]]
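The square box used above reaches about 1.5 × √2 ≈ 2.12 miles at its corners, so it can include zip codes outside the true 1.5 mile drone range. A minimal sketch of a tighter filter follows, assuming each zip code is represented by a single (latitude, longitude) point, as in the `zip_codes` table. It applies the haversine great-circle distance to rows the box query has already returned. `zips_within_radius` is a hypothetical helper, not part of the notebook.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (about 6371 km)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def zips_within_radius(center, zip_points, miles):
    """Return the zip codes whose point lies within `miles` of center.

    zip_points is a list of (zip, latitude, longitude) tuples, for example
    the rows already selected by the bounding-box query.
    """
    lat0, lon0 = center
    return [zip_code for zip_code, lat, lon in zip_points
            if haversine_miles(lat0, lon0, lat, lon) <= miles]
```

Running the box query first and this filter second keeps the database doing the cheap coarse cut, with the exact distance check applied only to the handful of rows that survive.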

##### Add degree centrality, which counts each station's connections in the projected graph. High degree centrality indicates that the station connects with many others.

In [12]:
# Degree centrality for the connected graph

query = """

CALL gds.degree.stream('ds_graph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score as degree
ORDER BY degree DESC, name

"""

deg_df = my_neo4j_run_query_pandas(query)

# Remove the line and depart / arrive designations

deg_df = cleanse_stations(deg_df)

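# Keep the entry for each station with the maximum degree centrality

deg_df["name"] = deg_df["name"].str.strip()
deg_df = deg_df.groupby("name")["degree"].max()
deg_df = deg_df.to_frame()

# Add degree centrality to df, matching on station name rather than row position
# (Postgres "order by" uses the database collation, which can sort names differently from pandas)

df.set_index("station", inplace=True)
df["degree_centrality"] = deg_df["degree"]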
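Degree values depend on which relationship directions are counted. The sketch below uses a hypothetical edge list in which each track segment is stored as two opposite relationships. It shows how counting only outgoing relationships, versus both directions, changes a station's degree. The function and its flags are illustrative assumptions, not part of GDS.

```python
from collections import Counter

def degree_counts(edges, count_outgoing=True, count_incoming=True):
    """Count relationships per node in a list of directed (source, target) edges."""
    degree = Counter()
    for source, target in edges:
        if count_outgoing:
            degree[source] += 1
        if count_incoming:
            degree[target] += 1
    return degree

# Hypothetical stations A - B - C with each segment modeled in both directions
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
print(degree_counts(edges, count_incoming=False))  # outgoing only
print(degree_counts(edges))                        # incoming + outgoing
```

Either way, `B` ranks above its neighbors. Comparisons between stations are therefore unaffected, but absolute degree values are only meaningful once the projection's orientation is known.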

##### Add betweenness centrality, which measures how many shortest paths between other stations pass through a node (station). High betweenness centrality indicates that many shortest routes pass through that station.

In [13]:
# Betweenness centrality

query = """

CALL gds.betweenness.stream('ds_graph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score as betweenness
ORDER BY betweenness DESC

"""

bet_df = my_neo4j_run_query_pandas(query)

# Remove the line and depart / arrive designations

bet_df = cleanse_stations(bet_df)

# Keep the entry for each station with the maximum betweenness centrality

bet_df["name"] = bet_df["name"].str.strip()
bet_df = bet_df.groupby("name")["betweenness"].max()
bet_df = bet_df.to_frame()

# Add betweenness centrality to df, matching on station name rather than row position

df["bet_centrality"] = bet_df["betweenness"]

##### Add PageRank for each station, which measures the influence of that station in the graph. High PageRank indicates an influential station in the BART map.

In [14]:
# PageRank for each station

query = """

CALL gds.pageRank.stream('ds_graph',
                         { maxIterations: $max_iterations,
                           dampingFactor: $damping_factor}
                         )
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score as page_rank
ORDER BY page_rank DESC, name ASC

"""

max_iterations = 20
damping_factor = 0.05  # note: the GDS default is 0.85; a factor this low makes the scores nearly uniform

pr_df = my_neo4j_run_query_pandas(query, max_iterations=max_iterations, damping_factor=damping_factor)

# Remove the line and depart / arrive designations

pr_df = cleanse_stations(pr_df)

# Keep the entry for each station with the maximum page rank

pr_df["name"] = pr_df["name"].str.strip()
pr_df = pr_df.groupby("name")["page_rank"].max()
pr_df = pr_df.to_frame()

# Add PageRank to df, matching on station name rather than row position

df["page_rank"] = pr_df["page_rank"]
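The PageRank scores in the candidate-station table all sit close to 1.0. That is consistent with the unnormalized update PR(v) = (1 − d) + d · Σ PR(u) / outdegree(u) combined with the damping factor d = 0.05 used above. The power-iteration sketch below assumes that formula and a hypothetical three-station line. It is a simplified illustration, not GDS's implementation. For example, nodes with no outgoing links simply lose their share here.

```python
def page_rank(graph, damping=0.85, iterations=20):
    """Unnormalized PageRank by power iteration.

    graph maps each node to the list of nodes it links to. Each round computes
    PR(v) = (1 - d) + d * sum(PR(u) / outdegree(u)) over the nodes u linking to v.
    """
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {node: 1.0 - damping for node in graph}
        for node, targets in graph.items():
            if targets:
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
        scores = new_scores
    return scores

line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}  # hypothetical three-station line
print(page_rank(line, damping=0.85))
print(page_rank(line, damping=0.05))
```

With d = 0.85 the middle station clearly outranks the ends. With d = 0.05 every score stays within a few hundredths of 1.0, which limits how much PageRank can separate stations in this analysis.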

##### Impute population values for Antioch, Milpitas, OAK, and Pittsburg using the population of each station's corresponding zip code

In [15]:
rollback_before_flag = True
rollback_after_flag = True

query = """

select *
from zip_codes

"""

temp = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

In [16]:
# Using the zip_codes table, find the population for each of the four corresponding zip codes

antioch_station_zip = "94509"
milpitas_station_zip = "95035"
OAK_station_zip = "94621"
pittsburg_station_zip = "94565"

antioch_pop = int(temp.loc[temp["zip"] == antioch_station_zip, "population"].iloc[0])
milpitas_pop = int(temp.loc[temp["zip"] == milpitas_station_zip, "population"].iloc[0])
OAK_pop = int(temp.loc[temp["zip"] == OAK_station_zip, "population"].iloc[0])
pittsburg_pop = int(temp.loc[temp["zip"] == pittsburg_station_zip, "population"].iloc[0])

In [17]:
# Assign the population values back to the data frame

df.loc[df.index=="Antioch", "pop_1_5"] = antioch_pop
df.loc[df.index=="Milpitas", "pop_1_5"] = milpitas_pop
df.loc[df.index=="OAK", "pop_1_5"] = OAK_pop
df.loc[df.index=="Pittsburg", "pop_1_5"] = pittsburg_pop

# Analysis

## Identify which stations look like good candidates for a pickup location

##### Start by finding which stations have higher population (1.5 mile radius) and betweenness centrality values than the Downtown Berkeley station

In [18]:
# Create benchmark values for Downtown Berkeley

berk_pop_1_5 = df.loc["Downtown Berkeley", "pop_1_5"]
berk_bet_cent = df.loc["Downtown Berkeley", "bet_centrality"]

In [19]:
# Identify the candidate stations

df[(df["pop_1_5"] > berk_pop_1_5) & 
   (df["bet_centrality"] > berk_bet_cent)]

| station | latitude | longitude | pop_1_5 | degree_centrality | bet_centrality | page_rank |
|---|---|---|---|---|---|---|
| 16th Street Mission | 37.764847 | -122.420042 | 263312.0 | 6.0 | 3010.550494 | 1.003696 |
| 24th Street Mission | 37.752 | -122.4187 | 197241.0 | 6.0 | 2829.403538 | 1.003696 |
| Ashby | 37.853068 | -122.269957 | 124290.0 | 4.0 | 2460.860672 | 1.009097 |
| Balboa Park | 37.721667 | -122.4475 | 106589.0 | 6.0 | 2437.338289 | 1.005317 |
| Civic Center | 37.779861 | -122.413498 | 242275.0 | 6.0 | 3180.147417 | 1.003696 |
| Daly City | 37.706224 | -122.468934 | 166169.0 | 6.0 | 2242.220328 | 1.011201 |
| Embarcadero | 37.793056 | -122.397222 | 170877.0 | 6.0 | 3648.987775 | 1.00371 |
| Glen Park | 37.733118 | -122.433808 | 253123.0 | 6.0 | 2637.248955 | 1.003709 |
| Montgomery Street | 37.789355 | -122.401942 | 178168.0 | 6.0 | 3492.402727 | 1.003696 |
| Powell Street | 37.784 | -122.408 | 207857.0 | 6.0 | 3339.4838 | 1.003696 |


## Refine the candidate stations based on accessibility to the Downtown Berkeley station.

##### Travel time between Downtown Berkeley and the new pickup locations should be minimized, as food will be prepared at the Berkeley store location and carried to the new pickup locations via public transit.
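A minimal sketch of how candidates could be ranked by travel time. It assumes a hypothetical graph whose relationships carry travel minutes between adjacent stations, since the structure of `ds_graph` is not shown here. The sketch runs Dijkstra's algorithm from the store's station and ignores waiting and transfer times. The station names and minutes in the usage line are placeholders.

```python
import heapq

def travel_times(graph, source):
    """Dijkstra's algorithm: minimum total minutes from source to each reachable station.

    graph maps a station to a list of (neighbor, minutes) pairs.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        minutes, station = heapq.heappop(heap)
        if minutes > best[station]:
            continue  # stale entry; a shorter route was already found
        for neighbor, edge_minutes in graph.get(station, []):
            candidate = minutes + edge_minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return best

def rank_by_travel_time(graph, source, candidates):
    """Order candidate stations from closest to farthest; unreachable ones are dropped."""
    times = travel_times(graph, source)
    reachable = [c for c in candidates if c in times]
    return sorted(reachable, key=times.get)

# Placeholder network and minutes, for illustration only
toy = {"Store": [("X", 5), ("Y", 15)], "X": [("Y", 5)], "Y": [("Z", 3)]}
print(rank_by_travel_time(toy, "Store", ["Z", "Y", "X", "W"]))  # ['X', 'Y', 'Z']
```

Note that `Y` is reached in 10 minutes through `X` rather than 15 minutes directly. This is why a weighted shortest path is used here instead of hop counts.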

## Select 3 stations that offer the largest potential new customer market

##### A drone delivery option will enable us to capture additional customers within a 1.5 mile radius of pickup locations
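Nearby stations' 1.5 mile drone ranges can overlap. Civic Center, Powell Street, and Montgomery Street, for instance, lie well under a mile apart according to the coordinates in the table above. Adding up each station's surrounding population separately would therefore double count customers. A sketch of one way to pick 3 stations is shown below. It assumes each station's drone range is represented by a set of zip codes (the kind of list `my_station_get_zip_list` returns) and greedily picks the station that adds the most not-yet-covered population. This greedy heuristic is not guaranteed to be optimal, and the zip codes and populations in the example are hypothetical.

```python
def greedy_pick(station_zips, zip_population, k=3, already_covered=()):
    """Greedily pick up to k stations that add the most not-yet-covered population.

    station_zips: station -> set of zip codes within drone range
    zip_population: zip code -> population
    already_covered: zip codes already served (for example, by the existing store)
    """
    covered = set(already_covered)
    chosen = []
    for _ in range(k):
        best_station, best_gain = None, 0
        for station, zips in station_zips.items():
            if station in chosen:
                continue
            gain = sum(zip_population[z] for z in zips - covered)
            if gain > best_gain:
                best_station, best_gain = station, gain
        if best_station is None:
            break  # no remaining station adds new population
        chosen.append(best_station)
        covered |= station_zips[best_station]
    return chosen

# Hypothetical zip codes and populations, for illustration only
zip_population = {"z1": 100, "z2": 80, "z3": 50, "z4": 40, "z5": 30}
station_zips = {"P": {"z1", "z2"}, "Q": {"z1", "z3"}, "R": {"z4", "z5"}}
print(greedy_pick(station_zips, zip_population, k=2))  # ['P', 'R']
```

`Q` has more people in range than `R` (150 versus 70), but most of them are already covered by `P`, so `R` adds more new customers.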

In [11]:
rollback_before_flag = True
rollback_after_flag = True

# Get population by zip code
query = """
select zip, population
from zip_codes
where state = 'CA'
"""

zip_codes = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

# Calculate ranges of populations in zip codes
zip_code_quantiles = zip_codes['population'].quantile([0.2,0.4,0.6,0.8])
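The quintile coloring in the map cell below can be expressed compactly. The sketch computes the four cut points with `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation as pandas' default `quantile`. It then picks a color by counting how many cut points lie strictly below a population. That reproduces the chained `>` comparisons: a population equal to a cut point falls into the lower bucket. The helper names are hypothetical, and the RGB values are copied from the map cell.

```python
import bisect
import statistics

# Fill colors from the lowest to the highest population quintile
QUINTILE_COLORS = [(255, 243, 59), (253, 199, 12), (243, 144, 63), (237, 104, 60), (233, 62, 58)]

def quintile_cut_points(populations):
    """Return the 20th, 40th, 60th, and 80th percentiles using linear interpolation."""
    return statistics.quantiles(populations, n=5, method="inclusive")

def quintile_color(population, cut_points):
    """The number of cut points strictly below the population is the quintile index."""
    return QUINTILE_COLORS[bisect.bisect_left(cut_points, population)]

print(quintile_cut_points([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))  # [28.0, 46.0, 64.0, 82.0]
```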

In [12]:
# Get geojson data for California
f = open('../data/geojson_data/ca_california_geojson.json')
ca_customer_zip_geojson = json.load(f)
f.close()

In [14]:
berkeley_store = (37.8555, -122.2604)

fig = gmaps.figure(center=berkeley_store, zoom_level=12)

# Determine the correct color for each zip based on population quintile.
# The populations are in the zip_codes table in the DB, so look up each zip
# in the geojson file (property ZCTA5CE10) in a zip -> population dictionary.
population_by_zip = zip_codes.set_index('zip')['population'].to_dict()

def population_color(feature):
    "Return the fill color for a geojson zip feature"
    population = population_by_zip.get(feature['properties']['ZCTA5CE10'])
    if population is None:
        return (220,220,220)  # zip not in the zip_codes table
    if population > zip_code_quantiles.iloc[3]:
        return (233,62,58)
    if population > zip_code_quantiles.iloc[2]:
        return (237,104,60)
    if population > zip_code_quantiles.iloc[1]:
        return (243,144,63)
    if population > zip_code_quantiles.iloc[0]:
        return (253,199,12)
    return (255,243,59)

colors = [population_color(feature) for feature in ca_customer_zip_geojson['features']]

In [15]:
geojson_layer = gmaps.geojson_layer(ca_customer_zip_geojson, fill_color=colors)

fig.add_layer(geojson_layer)

fig


In [47]:
civic_center = (37.779861, -122.413498)
sixteenth_st = (37.764847, -122.420042)
glen_park = (37.733118, -122.433808)
lake_merritt = (37.797773, -122.266588)
fruitvale = (37.774800, -122.224100)
san_leandro = (37.721764, -122.160684)
hayward = (37.669700, -122.087000)

drone_range_meters = 2414  # 1.5 miles in meters

# Locations to draw, with the label shown in each marker's info box
locations = {
    'Berkeley AGM Store': berkeley_store,
    'Civic Center': civic_center,
    'Glen Park': glen_park,
    'Lake Merritt': lake_merritt,
    'Fruitvale': fruitvale,
    'San Leandro': san_leandro,
    'Hayward': hayward,
}

fig = gmaps.figure(center=fruitvale, zoom_level=11)

# One drone-range circle per location
drawing = gmaps.drawing_layer(features=[
    gmaps.Circle(
        radius=drone_range_meters,
        center=point,
        stroke_color='red', fill_color=(255, 0, 132)
    )
    for point in locations.values()
], mode='DISABLED')
fig.add_layer(drawing)

marker_layer = gmaps.marker_layer(list(locations.values()), info_box_content=list(locations.keys()))
fig.add_layer(marker_layer)
fig
