# Project 3, Part 4: Verify the graph database in Neo4j for the BART system

University of California, Berkeley

Master of Information and Data Science (MIDS) program

w205 - Fundamentals of Data Engineering

Students in the group:
* Aris Chalini
* Jack Galvin
* Matt Lauritzen

Year: 2022

Semester: Spring

Section: 09


# Included Modules and Packages

Code cell containing your includes for modules and packages

Some starter code is provided

You may change the starter code as needed

You may add as much code and/or as many code cells as you need

In [15]:
import neo4j

import csv

import math
import numpy as np
import pandas as pd

import psycopg2

# Supporting code

Code cells containing any supporting code, such as connecting to the database, any functions, etc.  

Remember you can freely use any code from the labs. You do not need to cite code from the labs.

Some starter code is provided

You may change the starter code as needed

You may add as much code and/or as many code cells as you need

In [16]:
driver = neo4j.GraphDatabase.driver(uri="neo4j://neo4j:7687", auth=("neo4j","w205"))

In [17]:
session = driver.session(database="neo4j")

In [18]:
def my_neo4j_shortest_path(from_station, to_station):
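    """Given a from station and a to station, run and print the shortest path."""
    
    # Drop any existing in-memory graph projection; 'false' means do not fail if it is missing
    query = "CALL gds.graph.drop('ds_graph', false)"
    session.run(query)

    # Project the Station nodes and LINK relationships, keeping the 'weight' property
    query = "CALL gds.graph.create('ds_graph', 'Station', 'LINK', {relationshipProperties: 'weight'})"
    session.run(query)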

    query = """

    MATCH (source:Station {name: $source}), (target:Station {name: $target})
    CALL gds.shortestPath.dijkstra.stream(
        'ds_graph', 
        { sourceNode: source, 
          targetNode: target, 
          relationshipWeightProperty: 'weight'
        }
    )
    YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
    RETURN
        gds.util.asNode(sourceNode).name AS from,
        gds.util.asNode(targetNode).name AS to,
        totalCost,
        [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS nodes,
        costs
    ORDER BY index

    """

    result = session.run(query, source=from_station, target=to_station)
    
    for r in result:
        
        total_cost = int(r['totalCost'])
        
        print("\n--------------------------------")
        print("   Total Cost: ", total_cost)
        print("   Minutes: ", round(total_cost / 60.0,1))
        print("--------------------------------")
        
        nodes = r['nodes']
        costs = r['costs']
        
        # Print each node with its per-hop cost and its cumulative cost
        previous = 0
        
        for n, cost in zip(nodes, costs):
            
            cumulative = int(cost)
            print(n + ", " + str(cumulative - previous) + ", " + str(cumulative))
            
            previous = cumulative
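
The `costs` list yielded by the Dijkstra stream is cumulative, so the function prints each hop as the difference between consecutive entries and derives minutes by dividing the total by 60. The following is a minimal standalone sketch of that arithmetic, not part of the starter code. It uses the node names and cumulative costs from the Embarcadero to Civic Center result in 3.4.5; the helper name `hop_costs` is hypothetical, and the costs are assumed to arrive as floats, which is why the function casts them with `int()`.

```python
def hop_costs(cumulative):
    """Convert a list of cumulative costs (seconds) into per-hop costs."""
    hops = []
    previous = 0
    for cost in cumulative:
        current = int(cost)
        hops.append(current - previous)
        previous = current
    return hops


# Values copied from the 3.4.5 output; float type is an assumption
nodes = [
    "depart Embarcadero",
    "blue Embarcadero",
    "blue Montgomery Street",
    "blue Powell Street",
    "blue Civic Center",
    "arrive Civic Center",
]
costs = [0.0, 0.0, 60.0, 180.0, 240.0, 240.0]

hops = hop_costs(costs)

# Same "name, hop cost, cumulative cost" layout the function prints
for name, hop, total in zip(nodes, hops, costs):
    print(name + ", " + str(hop) + ", " + str(int(total)))

# The last cumulative cost is the total cost in seconds
total_minutes = round(int(costs[-1]) / 60.0, 1)
print("Minutes:", total_minutes)
```

Because the per-hop values are differences of a running total, they always sum back to the final cumulative cost, which gives a quick sanity check on any printed path.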
    

# 3.4.1 Verify the shortest path between Dublin and Antioch

Use the function my_neo4j_shortest_path() between 'depart Dublin' and 'arrive Antioch' to verify that the shortest path matches the following output

```
--------------------------------
   Total Cost:  5813
   Minutes:  96.9
--------------------------------
depart Dublin, 0, 0
blue Dublin, 0, 0
blue West Dublin, 180, 180
blue Castro Valley, 600, 780
blue Bay Fair, 240, 1020
blue San Leandro, 240, 1260
blue Coliseum, 240, 1500
orange Coliseum, 54, 1554
orange Fruitvale, 240, 1794
orange Lake Merritt, 300, 2094
orange 12th Street, 180, 2274
orange 19th Street, 120, 2394
orange MacArthur, 180, 2574
yellow MacArthur, 59, 2633
yellow Rockridge, 240, 2873
yellow Orinda, 300, 3173
yellow Lafayette, 300, 3473
yellow Walnut Creek, 300, 3773
yellow Pleasant Hill, 120, 3893
yellow Concord, 360, 4253
yellow North Concord, 180, 4433
yellow Pittsburg, 360, 4793
yellow Pittsburg Center, 600, 5393
yellow Antioch, 420, 5813
arrive Antioch, 0, 5813
```

In [19]:
my_neo4j_shortest_path("depart Dublin", "arrive Antioch")


--------------------------------
   Total Cost:  5813
   Minutes:  96.9
--------------------------------
depart Dublin, 0, 0
blue Dublin, 0, 0
blue West Dublin, 180, 180
blue Castro Valley, 600, 780
blue Bay Fair, 240, 1020
blue San Leandro, 240, 1260
blue Coliseum, 240, 1500
orange Coliseum, 54, 1554
orange Fruitvale, 240, 1794
orange Lake Merritt, 300, 2094
orange 12th Street, 180, 2274
orange 19th Street, 120, 2394
orange MacArthur, 180, 2574
yellow MacArthur, 59, 2633
yellow Rockridge, 240, 2873
yellow Orinda, 300, 3173
yellow Lafayette, 300, 3473
yellow Walnut Creek, 300, 3773
yellow Pleasant Hill, 120, 3893
yellow Concord, 360, 4253
yellow North Concord, 180, 4433
yellow Pittsburg, 360, 4793
yellow Pittsburg Center, 600, 5393
yellow Antioch, 420, 5813
arrive Antioch, 0, 5813
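
Comparing two long listings by eye is error-prone, so the check could also be automated. The following is a minimal sketch under stated assumptions: `capture_output` is a hypothetical helper, and `fake_shortest_path` is an abbreviated stand-in (not real query output) used only so the example runs without a Neo4j connection. In the notebook, `my_neo4j_shortest_path` would be passed instead, with the full expected text.

```python
import contextlib
import io


def capture_output(func, *args):
    """Run func(*args) and return what it printed as a list of non-empty lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return [line.rstrip() for line in buffer.getvalue().splitlines() if line.strip()]


def fake_shortest_path(from_station, to_station):
    """Hypothetical stand-in that mimics the print layout of my_neo4j_shortest_path."""
    print("\n--------------------------------")
    print("   Total Cost: ", 240)
    print("   Minutes: ", 4.0)
    print("--------------------------------")
    print(from_station + ", 0, 0")
    print(to_station + ", 0, 240")


# Abbreviated expected text matching the stand-in, not a real path
expected = """
--------------------------------
   Total Cost:  240
   Minutes:  4.0
--------------------------------
depart Embarcadero, 0, 0
arrive Civic Center, 0, 240
"""

expected_lines = [line.rstrip() for line in expected.splitlines() if line.strip()]
actual_lines = capture_output(fake_shortest_path, "depart Embarcadero", "arrive Civic Center")

matches = actual_lines == expected_lines
print("Output matches expected:", matches)
```

Blank lines and trailing whitespace are dropped on both sides so that only the content of each printed row is compared.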


# 3.4.2 Verify the shortest path between SFO airport and OAK airport

Use the function my_neo4j_shortest_path() between 'depart SFO' and 'arrive OAK' to verify that the shortest path matches the following output

```
--------------------------------
   Total Cost:  3882
   Minutes:  64.7
--------------------------------
depart SFO, 0, 0
yellow SFO, 0, 0
yellow San Bruno, 240, 240
yellow South San Francisco, 240, 480
yellow Colma, 180, 660
yellow Daly City, 240, 900
yellow Balboa Park, 240, 1140
green Balboa Park, 48, 1188 (or blue, they have the same cost)
green Glen Park, 120, 1308
green 24th Street Mission, 180, 1488
green 16th Street Mission, 120, 1608
green Civic Center, 180, 1788
green Powell Street, 60, 1848
green Montgomery Street, 120, 1968
green Embarcadero, 60, 2028
green West Oakland, 420, 2448
green Lake Merritt, 360, 2808
green Fruitvale, 300, 3108
green Coliseum, 240, 3348
gray Coliseum, 54, 3402
gray OAK, 480, 3882
arrive OAK, 0, 3882
```

In [20]:
my_neo4j_shortest_path("depart SFO", "arrive OAK")


--------------------------------
   Total Cost:  3882
   Minutes:  64.7
--------------------------------
depart SFO, 0, 0
yellow SFO, 0, 0
yellow San Bruno, 240, 240
yellow South San Francisco, 240, 480
yellow Colma, 180, 660
yellow Daly City, 240, 900
yellow Balboa Park, 240, 1140
green Balboa Park, 48, 1188
green Glen Park, 120, 1308
green 24th Street Mission, 180, 1488
green 16th Street Mission, 120, 1608
green Civic Center, 180, 1788
green Powell Street, 60, 1848
green Montgomery Street, 120, 1968
green Embarcadero, 60, 2028
green West Oakland, 420, 2448
green Lake Merritt, 360, 2808
green Fruitvale, 300, 3108
green Coliseum, 240, 3348
gray Coliseum, 54, 3402
gray OAK, 480, 3882
arrive OAK, 0, 3882


# 3.4.3 Verify the shortest path between Downtown Berkeley and Castro Valley

Use the function my_neo4j_shortest_path() between 'depart Downtown Berkeley' and 'arrive Castro Valley' to verify that the shortest path matches the following output

```
--------------------------------
   Total Cost:  2214
   Minutes:  36.9
--------------------------------
depart Downtown Berkeley, 0, 0
orange Downtown Berkeley, 0, 0
orange Ashby, 180, 180
orange MacArthur, 240, 420
orange 19th Street, 180, 600
orange 12th Street, 120, 720
orange Lake Merritt, 180, 900
orange Fruitvale, 300, 1200
orange Coliseum, 240, 1440
blue Coliseum, 54, 1494
blue San Leandro, 240, 1734
blue Bay Fair, 240, 1974
blue Castro Valley, 240, 2214
arrive Castro Valley, 0, 2214
```

In [21]:
my_neo4j_shortest_path("depart Downtown Berkeley", "arrive Castro Valley")


--------------------------------
   Total Cost:  2214
   Minutes:  36.9
--------------------------------
depart Downtown Berkeley, 0, 0
orange Downtown Berkeley, 0, 0
orange Ashby, 180, 180
orange MacArthur, 240, 420
orange 19th Street, 180, 600
orange 12th Street, 120, 720
orange Lake Merritt, 180, 900
orange Fruitvale, 300, 1200
orange Coliseum, 240, 1440
blue Coliseum, 54, 1494
blue San Leandro, 240, 1734
blue Bay Fair, 240, 1974
blue Castro Valley, 240, 2214
arrive Castro Valley, 0, 2214


# 3.4.4 Verify the shortest path between San Bruno and San Leandro

Use the function my_neo4j_shortest_path() between 'depart San Bruno' and 'arrive San Leandro' to verify that the shortest path matches the following output

```
--------------------------------
   Total Cost:  3348
   Minutes:  55.8
--------------------------------
depart San Bruno, 0, 0
red San Bruno, 0, 0
red South San Francisco, 240, 240
red Colma, 180, 420
red Daly City, 240, 660
red Balboa Park, 240, 900
blue Balboa Park, 48, 948 (or green, they have the same cost)
blue Glen Park, 120, 1068
blue 24th Street Mission, 180, 1248
blue 16th Street Mission, 120, 1368
blue Civic Center, 180, 1548
blue Powell Street, 60, 1608
blue Montgomery Street, 120, 1728
blue Embarcadero, 60, 1788
blue West Oakland, 420, 2208
blue Lake Merritt, 360, 2568
blue Fruitvale, 300, 2868
blue Coliseum, 240, 3108
blue San Leandro, 240, 3348
arrive San Leandro, 0, 3348
```

In [23]:
my_neo4j_shortest_path("depart San Bruno", "arrive San Leandro")


--------------------------------
   Total Cost:  3348
   Minutes:  55.8
--------------------------------
depart San Bruno, 0, 0
red San Bruno, 0, 0
red South San Francisco, 240, 240
red Colma, 180, 420
red Daly City, 240, 660
red Balboa Park, 240, 900
green Balboa Park, 48, 948
green Glen Park, 120, 1068
green 24th Street Mission, 180, 1248
green 16th Street Mission, 120, 1368
green Civic Center, 180, 1548
green Powell Street, 60, 1608
green Montgomery Street, 120, 1728
green Embarcadero, 60, 1788
green West Oakland, 420, 2208
green Lake Merritt, 360, 2568
green Fruitvale, 300, 2868
green Coliseum, 240, 3108
green San Leandro, 240, 3348
arrive San Leandro, 0, 3348
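
The result above takes the green line where the expected output shows blue, which the expected output allows because both lines have the same cost. A strict text comparison would flag this as a mismatch. The following is a rough sketch of a comparison that ignores the leading word of each row (the line color, `depart`, or `arrive`) and checks only station names and costs. The function names are hypothetical, and the sketch assumes annotations such as "(or green, they have the same cost)" have been removed from the expected rows first.

```python
def strip_line_prefix(row):
    """Parse 'prefix Station Name, hop, total' into (station, hop, total), dropping the prefix."""
    name, hop, total = [part.strip() for part in row.rsplit(",", 2)]
    # The first word is the line color (or 'depart' / 'arrive'); the rest is the station
    station = name.split(" ", 1)[1]
    return station, int(hop), int(total)


def same_path_ignoring_line(expected_rows, actual_rows):
    """True when both listings visit the same stations with the same costs."""
    expected_parsed = [strip_line_prefix(row) for row in expected_rows]
    actual_parsed = [strip_line_prefix(row) for row in actual_rows]
    return expected_parsed == actual_parsed


# Three rows excerpted from the expected and actual listings for 3.4.4
expected_rows = [
    "red Balboa Park, 240, 900",
    "blue Balboa Park, 48, 948",
    "blue Glen Park, 120, 1068",
]
actual_rows = [
    "red Balboa Park, 240, 900",
    "green Balboa Park, 48, 948",
    "green Glen Park, 120, 1068",
]

print(same_path_ignoring_line(expected_rows, actual_rows))
```

This check is deliberately loose: it would also treat a path on a different line with identical stations and costs as equal, so it suits only the tie cases the expected output explicitly permits.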


# 3.4.5 Verify the shortest path between Embarcadero and Civic Center

Use the function my_neo4j_shortest_path() between 'depart Embarcadero' and 'arrive Civic Center' to verify that the shortest path matches the following output

```
--------------------------------
   Total Cost:  240
   Minutes:  4.0
--------------------------------
depart Embarcadero, 0, 0
yellow Embarcadero, 0, 0 (or red or blue or green, they all have the same cost)
yellow Montgomery Street, 60, 60
yellow Powell Street, 120, 180
yellow Civic Center, 60, 240
arrive Civic Center, 0, 240
```

In [24]:
my_neo4j_shortest_path("depart Embarcadero", "arrive Civic Center")


--------------------------------
   Total Cost:  240
   Minutes:  4.0
--------------------------------
depart Embarcadero, 0, 0
blue Embarcadero, 0, 0
blue Montgomery Street, 60, 60
blue Powell Street, 120, 180
blue Civic Center, 60, 240
arrive Civic Center, 0, 240
