# Shortest Path Demo

By Sander Aarts, 2020. Edited by Jody Zhu.

In this part of the lab, you will be introduced to the basics of using Jupyter notebooks and solve a larger instance of the pizza delivery problem that would otherwise be extremely time-consuming to compute by hand. You may skip ahead if you know how Jupyter notebooks work.  

As the course progresses, you will see more labs in the Jupyter notebook format, which is made up of cells, or individual segments. There are two main types of cells: code and markdown. Code cells have a <font color='blue'>$\text{In[ ]:}$</font> next to them and are for code (we will use Python 3). Markdown cells contain normal-looking text and are where instructions and answers are usually written.  

To run a cell, make sure the cell is selected and then click the Run button at the top. Alternatively, use the keyboard shortcut *ctrl + enter*. The dark square (stop) button next to the Run button interrupts the cell as it runs. Running a code cell only runs the code in that cell, but variables from previously run cells are saved. For example, if you run a cell with `x = 1` before another cell with `y = x + 2`, then `y` is 3. To clear everything or restart the kernel, you can click the refresh arrow button. The fast forward button also restarts the kernel and then re-runs the entire notebook.  

To type in a markdown cell, double-click it until the background turns grey, which means it is in editing mode. Running the cell, with either the Run button or the keyboard shortcut, renders the formatted text. You will occasionally be asked to write your answers in a markdown cell and to upload a saved notebook for credit.  

Do not worry if you do not know any Python. We will either walk you through what you need to code step by step or provide the code for parts that are less crucial for this class.

## Part 0: Load the necessary packages

This lab should run by simply downloading the complete lab folder and opening the Jupyter notebook. If the packages below have not been installed, an error will appear (esp. if on a personal computer).
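
Before running the import cell, one way to see which packages are missing is to ask Python's import machinery directly. The sketch below uses the standard-library `importlib.util.find_spec`; the helper name `missing_packages` is an illustrative assumption and not part of the lab files, and the package list simply mirrors the third-party imports in the cell that follows.

```python
import importlib.util

# Third-party packages imported by the lab (math and itertools ship with Python).
REQUIRED = ["pandas", "numpy", "networkx", "bokeh"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Anything reported as missing can usually be installed with `pip install <name>` or through whichever Python distribution you are using.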

In [1]:
import pandas as pd
import numpy as np
import math
import itertools
import networkx as nx
from bokeh.io import output_notebook
output_notebook()

## Part 1: Making it Big - Delivering Pizza in NYC

Your pizza delivery service has enjoyed much success and opens up a new shop based at Cornell Tech. You are told that the central location on Roosevelt Island makes it possible to deliver pizzas anywhere in the city within 30 minutes. Your task is to find the best driving routes and to decide whether the 30-minute guarantee is realistic.

Here you will use the actual NYC road network. In this network, a node represents any intersection; edges are road segments that connect intersections. Most streets in New York City are included. Approximate travel times are estimated from millions of Yellow Cab travel times.  

Begin by loading the data files $\texttt{nyc_nodes.csv}$ and $\texttt{nyc_links.csv}$ from the $\texttt{data}$ folder. (Data originally from: https://lab-work.github.io/data/). The data is kept in pandas dataframes. To view the data as tables, run the cells below.

In [2]:
# load nodes (pd.read_csv already returns a DataFrame)
dfn = pd.read_csv('data/nyc_nodes.csv')
# load edges
dfl = pd.read_csv('data/nyc_links.csv')

print('Loaded %d nodes and %d edges.' % (dfn.shape[0], dfl.shape[0]))

Loaded 20056 nodes and 44252 edges.


Use $\texttt{dfl.head()}$ to inspect the link data. Note that some streets have multiple edges because a street can consist of several road segments. Also included are two delay columns, both in seconds: one for NYC at 8pm, another at 5pm.

In [3]:
dfl.head()

|   | start | end | street_name | delay8pm | delay5pm |
|---|---|---|---|---|---|
| 0 | 42445950 | 596775946 | East59thStreet | 36.797001 | 63.37646 |
| 1 | 42811333 | 42811336 | 27thAvenue | 9.845895 | 10.703889 |
| 2 | 42811333 | 42811330 | 27thAvenue | 10.547779 | 16.756385 |
| 3 | 42811330 | 42811333 | 27thAvenue | 10.260058 | 9.621185 |
| 4 | 42445947 | 596775941 | East58thStreet | 37.502009 | 61.279778 |
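
To see how a street breaks into several edges, it can help to count edges per street name. The sketch below re-types the five rows shown by `dfl.head()` into a small dataframe (named `sample` here purely for illustration) so that it runs without the data file; on the real data the same `groupby` call could be applied to `dfl`. The seconds-to-minutes conversion assumes the delay columns are in seconds, which matches how the rest of the lab divides by 60.

```python
import pandas as pd

# The five rows shown by dfl.head(), re-typed so this runs without the data file.
sample = pd.DataFrame({
    "start": [42445950, 42811333, 42811333, 42811330, 42445947],
    "end": [596775946, 42811336, 42811330, 42811333, 596775941],
    "street_name": ["East59thStreet", "27thAvenue", "27thAvenue",
                    "27thAvenue", "East58thStreet"],
    "delay8pm": [36.797001, 9.845895, 10.547779, 10.260058, 37.502009],
    "delay5pm": [63.37646, 10.703889, 16.756385, 9.621185, 61.279778],
})

# Number of edges (road segments) recorded for each street name.
segments = sample.groupby("street_name").size()
print(segments)

# Delay of each segment at 8pm, converted from seconds to minutes.
sample["delay8pm_min"] = sample["delay8pm"] / 60
print(sample[["street_name", "delay8pm_min"]])
```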


We will restrict our focus to a handful of nodes that we treat as Points of Interest ($\texttt{PoIs}$). Our goal is to decide if pizza can be delivered to these locations in a timely fashion.

In [4]:
# define points of interest (poi)
poi = list((1241986499, 42446461, 42439861,
            103864622, 42428391, 599270647,
            42466966, 42487873))
origin = poi[0] # Roosevelt Island

# define results dataframe
results = pd.DataFrame({'node_id':poi})

Run the cell below to plot the road network and $\texttt{PoIs}$. All nodes except the Points of Interest ($\texttt{PoIs}$) have been made invisible to keep clutter at a minimum. Do you recognize them?

In [5]:
from graph_tools import plotNetwork
plotNetwork(dfn, dfl, title="NYC road network", targets=poi, on_map=True)

Next, load the data into a networkx model and solve for the shortest paths. NetworkX is a Python library for working with graphs and graph algorithms. Here we use one of its built-in shortest path solvers, but later in the course, we will write our own.

Recall that each edge (see dfl) is defined by its 'start' node and its 'end' node. In the next cell, we load the data as a graph by specifying (1) that our data sits in dfl, (2) that edges start at nodes in the 'start' column, (3) that edges end at nodes in the 'end' column, and (4) which column or columns hold the edge costs. The call has the following form:

$$\texttt{G = nx.from_pandas_edgelist(<dataframe of edges>, <start col name>, <end col name>, <cost col name>)} $$

Most pizza is delivered around 8pm, so use delay8pm as $\texttt{costs}$. Explore the travel times to the various Points of Interest. Is a 30-minute guarantee reasonable?

In [6]:
# load networkx model from edge dataset
G = nx.from_pandas_edgelist(dfl, 'start', 'end', ['delay8pm', 'delay5pm'])
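
To get a feel for what `nx.from_pandas_edgelist` builds, the sketch below applies it to a tiny made-up edge list (the node names and costs are assumptions for illustration, not lab data). It also shows one detail worth knowing: by default the function returns an undirected graph, so two rows describing the same pair of nodes in opposite directions become a single edge, while passing `create_using=nx.DiGraph` keeps each direction separate. The lab cell above uses the default, and this sketch does not change that.

```python
import pandas as pd
import networkx as nx

# Toy edge list (made up): A -> B costs 5, B -> A costs 9, B -> C costs 2.
edges = pd.DataFrame({
    "start": ["A", "B", "B"],
    "end": ["B", "A", "C"],
    "cost": [5.0, 9.0, 2.0],
})

# Default behaviour: an undirected graph, so A-B and B-A share one edge.
G_undirected = nx.from_pandas_edgelist(edges, "start", "end", ["cost"])
print(G_undirected.number_of_edges())

# With create_using=nx.DiGraph, each direction is stored as its own edge.
G_directed = nx.from_pandas_edgelist(
    edges, "start", "end", ["cost"], create_using=nx.DiGraph
)
print(G_directed.number_of_edges())
print(G_directed["A"]["B"]["cost"], G_directed["B"]["A"]["cost"])
```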

In [7]:
# set delay variable to be 8pm delays
delay = 'delay8pm'

In [8]:
# solve shortest paths; out is a (distances, paths) pair of dictionaries
out = nx.single_source_dijkstra(G, origin, weight=delay)
# record travel times to the PoIs, converted from seconds to minutes
results[delay] = results['node_id'].map(out[0]) / 60

In [9]:
# inspect the output
results

|   | node_id | delay8pm |
|---|---|---|
| 0 | 1241986499 | 0.0 |
| 1 | 42446461 | 15.344263 |
| 2 | 42439861 | 24.244267 |
| 3 | 103864622 | 52.810546 |
| 4 | 42428391 | 28.011097 |
| 5 | 599270647 | 12.847955 |
| 6 | 42466966 | 24.325668 |
| 7 | 42487873 | 30.9402 |
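
The built-in solver used above returns a pair: a dictionary of shortest distances and a dictionary of the corresponding paths, which is why `out[0]` is mapped onto the PoIs. As a rough sketch of what such a solver does internally, here is a small heap-based version of Dijkstra's algorithm on a made-up toy network. The function `dijkstra`, the node names, and the costs are illustrative assumptions, not the lab's code or data.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, paths) from source for a dict-of-dicts graph.

    graph[u][v] is the non-negative cost of the edge u -> v.
    """
    dist = {source: 0.0}
    paths = {source: [source]}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry; u was already finalized
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                paths[v] = paths[u] + [v]
                heapq.heappush(heap, (nd, v))
    return dist, paths


# Toy network with made-up costs in seconds.
toy = {
    "shop": {"bridge": 120, "tunnel": 300},
    "bridge": {"midtown": 180},
    "tunnel": {"midtown": 60, "downtown": 240},
    "midtown": {"downtown": 200},
    "downtown": {},
}

dist, paths = dijkstra(toy, "shop")
for node in toy:
    print(node, round(dist[node] / 60, 2), paths[node])
```

As in the lab cell, the distances come back in the units of the edge costs (seconds here) and are divided by 60 to report minutes.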


Next, plot the shortest path tree. Because there are so many nodes, we only plot the shortest paths to the $\texttt{PoIs}$. What do you notice about the paths? Are there edges (roads / bridges / driveways) that the shortest paths seem to rely on heavily?

In [10]:
from graph_tools import plotShortestPathTree
plotShortestPathTree(dfn, dfl, out, poi)

As you may have noticed, all deliveries to the west of Roosevelt Island take the Queensboro Bridge. Aside from Hoboken, it seems feasible to deliver to all $\texttt{PoIs}$ in close to 30 minutes. But what if there is a traffic jam on the Queensboro Bridge? Add 10 minutes to the costs of edges using the Queensboro Bridge and re-solve the model. Print the resulting table and shortest path tree.

In [11]:
# get all edges with 'QueensboroBridge' in their name
queensboro = dfl['street_name'].str.contains('QueensboroBridge')
# define a new cost variable 'qb_cost'
dfl['qb_cost'] = dfl['delay8pm']
# Change the cost of all QueensboroBridge-related edges
dfl.loc[queensboro, 'qb_cost'] = dfl['delay8pm'] + 600 # add 10 minutes
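
The pattern in the cell above (build a boolean mask, copy the cost column, then adjust only the masked rows with `.loc`) can be tried on a tiny made-up dataframe first. In the sketch below the street names and delays are invented for illustration. Passing `na=False` to `str.contains` is an extra precaution assumed here, so that rows without a street name would count as non-matches.

```python
import pandas as pd

# Made-up edges; only names containing 'QueensboroBridge' should be penalized.
toy = pd.DataFrame({
    "street_name": ["QueensboroBridge", "2ndAvenue", "QueensboroBridgeUpperLevel"],
    "delay8pm": [90.0, 30.0, 120.0],
})

# Boolean mask: True for every row whose street name contains the pattern.
is_bridge = toy["street_name"].str.contains("QueensboroBridge", na=False)

# Start from the 8pm delays, then add the jam only on the masked rows.
toy["qb_cost"] = toy["delay8pm"]
toy.loc[is_bridge, "qb_cost"] += 600  # 10 minutes = 600 seconds

print(toy)
```

Because the penalty is added to every matching edge, a route that crosses several bridge segments picks up the 10 minutes more than once.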

In [12]:
# load networkx model from edge dataset
G = nx.from_pandas_edgelist(dfl, 'start', 'end', ['delay8pm', 'delay5pm', 'qb_cost'])

# set delay variable to be the 8pm delays with the Queensboro Bridge jam
delay = 'qb_cost'
# solve shortest paths
out = nx.single_source_dijkstra(G, origin, weight=delay)
# record travel times to the PoIs, converted from seconds to minutes
results[delay] = results['node_id'].map(out[0]) / 60

# inspect the output
results

|   | node_id | delay8pm | qb_cost |
|---|---|---|---|
| 0 | 1241986499 | 0.0 | 0.0 |
| 1 | 42446461 | 15.344263 | 43.607207 |
| 2 | 42439861 | 24.244267 | 50.564301 |
| 3 | 103864622 | 52.810546 | 79.130581 |
| 4 | 42428391 | 28.011097 | 30.360873 |
| 5 | 599270647 | 12.847955 | 12.847955 |
| 6 | 42466966 | 24.325668 | 24.325668 |
| 7 | 42487873 | 30.9402 | 30.9402 |
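
One way to read the table above is to compare each travel time against the 30-minute mark and to look at how much the jam adds per destination. The sketch below copies the values from the table into plain Python so that it runs on its own; the names `GUARANTEE_MIN`, `late_normal`, and `late_jam` are illustrative choices, not part of the lab code.

```python
GUARANTEE_MIN = 30  # delivery guarantee discussed in the lab, in minutes

# (node_id, minutes at 8pm, minutes with the Queensboro Bridge jam),
# copied from the results table above.
rows = [
    (1241986499, 0.0, 0.0),
    (42446461, 15.344263, 43.607207),
    (42439861, 24.244267, 50.564301),
    (103864622, 52.810546, 79.130581),
    (42428391, 28.011097, 30.360873),
    (599270647, 12.847955, 12.847955),
    (42466966, 24.325668, 24.325668),
    (42487873, 30.9402, 30.9402),
]

# Destinations that miss the guarantee in each scenario.
late_normal = [node for node, base, jam in rows if base > GUARANTEE_MIN]
late_jam = [node for node, base, jam in rows if jam > GUARANTEE_MIN]
print("Late at 8pm:", late_normal)
print("Late with jam:", late_jam)

# Extra minutes caused by the jam for each destination.
for node, base, jam in rows:
    print(node, round(jam - base, 2))
```

Destinations whose extra time is zero are the ones whose shortest path never used the bridge in the first place.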


In [13]:
# plot new paths
plotShortestPathTree(dfn, dfl, out, poi)

You have found the shortest paths. Make sure to answer the questions on the lab sheet before turning it in.