# Create a Graph of the London Transport Network using LOAD CSV
In this notebook, we will load some data about the London public transportation network into a Neo4j Graph Data Science instance in order to experiment further with some of the features of Neo4j Graph Database and Graph Data Science. 

The data used in this example has been cleaned up and simplified from the original raw data files downloaded from Transport for London. In a later section we will experiment with creating graphs from the raw data set itself, but for now this is a simple example to help us get things started quickly. 

## Setup
First, let's confirm that the Python environment is running and is the version we expect.

In [None]:
import sys
sys.version

Next we need to install some libraries.

In [None]:
%pip install --user graphdatascience
%pip install --user neo4j
%pip install --user IProgress
%pip install --user tqdm

Now restart the kernel so that the Python environment can import the new packages.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

## Establish Neo4j Connection

In [None]:
# username is neo4j by default
NEO4J_USERNAME = 'neo4j'

# You will need to change these to match your credentials
NEO4J_URI = 'neo4j+s://a2a3a4e4.databases.neo4j.io'
NEO4J_PASSWORD = '7RbblizpZDpB_4INFovS75lSbHkAOcOJlG7KvZWyx84'
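
If you would rather not keep credentials in the notebook itself, one option is to read them from environment variables. The following is a minimal sketch, not part of the original lab: the helper `load_neo4j_settings` and the variable names `NEO4J_URI`, `NEO4J_USERNAME` and `NEO4J_PASSWORD` are assumptions chosen for illustration, so adjust them to your own setup.

```python
import os


def load_neo4j_settings(env=None):
    """Return (uri, username, password) read from environment variables.

    The variable names are hypothetical; nothing in the lab requires them.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI")
    password = env.get("NEO4J_PASSWORD")
    # neo4j is the default username, so fall back to it when unset.
    username = env.get("NEO4J_USERNAME", "neo4j")

    required = (("NEO4J_URI", uri), ("NEO4J_PASSWORD", password))
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return uri, username, password
```

With those variables set before the notebook starts, `NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD = load_neo4j_settings()` could stand in for the assignments above.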

In [None]:
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    NEO4J_URI,
    auth=(NEO4J_USERNAME, NEO4J_PASSWORD),
    aura_ds=True
)
gds.set_database('neo4j')
gds.run_cypher('RETURN gds.version()')

## Loading Data from CSV Files 

Let's start by loading up the station nodes from a CSV file stored in a Google Cloud Storage bucket.

In [None]:
gds.run_cypher('''
LOAD CSV WITH HEADERS FROM 'https://storage.googleapis.com/leerazo-demos/london_transport/datasets/London_stations.csv' AS row
MERGE (s:Station {latitude:toFloat(row.Latitude), longitude:toFloat(row.Longitude), name:row.Station, zone:row.Zone})
RETURN count(s) as stations
''')
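
To make the row-to-node mapping concrete, here is a rough pure-Python sketch of what the query above does with each CSV row. The two sample rows are made-up placeholders, not real station data; only the column names (`Station`, `Latitude`, `Longitude`, `Zone`) come from the query. `LOAD CSV` reads every field as a string, which is why the query converts the coordinates with `toFloat` while `zone` stays a string.

```python
import csv
import io

# Made-up placeholder rows; only the header names come from the query above.
SAMPLE_CSV = """Station,Latitude,Longitude,Zone
Station A,51.50,-0.10,1
Station B,51.52,-0.12,2
"""


def station_properties(row):
    """Build the property map that the MERGE clause uses for one CSV row."""
    return {
        "latitude": float(row["Latitude"]),    # mirrors toFloat(row.Latitude)
        "longitude": float(row["Longitude"]),  # mirrors toFloat(row.Longitude)
        "name": row["Station"],
        "zone": row["Zone"],                   # stays a string, as in the query
    }


stations = [
    station_properties(row)
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]
for station in stations:
    print(station)
```

One difference worth keeping in mind: Python's `float` raises an error on a malformed value, whereas Cypher's `toFloat` returns `null`.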

Next, let's connect the stations and type each connection according to the transit line it represents.

In [None]:
gds.run_cypher('''
LOAD CSV WITH HEADERS FROM 'https://storage.googleapis.com/leerazo-demos/london_transport/datasets/London_tube_lines.csv' as row
MATCH (a:Station), (b:Station) WHERE a.name = row.From_Station AND b.name = row.To_Station
CALL apoc.create.relationship(a, toUpper(row.Tube_Line), {}, b)
YIELD rel as rel1
CALL apoc.create.relationship(b, toUpper(row.Tube_Line), {}, a)
YIELD rel as rel2
RETURN count(rel1) + count(rel2) AS relationships;
''')
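
The query above creates two relationships per CSV row, one in each direction, and uses the upper-cased line name as the relationship type. The sketch below mimics that logic in plain Python so the expected count is easy to check by hand. The rows are made-up placeholders; only the column names (`Tube_Line`, `From_Station`, `To_Station`) come from the query.

```python
# Made-up placeholder rows; only the column names come from the query above.
sample_rows = [
    {"Tube_Line": "Line one", "From_Station": "Station A", "To_Station": "Station B"},
    {"Tube_Line": "Line two", "From_Station": "Station B", "To_Station": "Station C"},
]


def relationships_for(row):
    """Return the two directed (start, type, end) triples created for one row."""
    rel_type = row["Tube_Line"].upper()  # mirrors toUpper(row.Tube_Line)
    a, b = row["From_Station"], row["To_Station"]
    return [(a, rel_type, b), (b, rel_type, a)]


edges = [edge for row in sample_rows for edge in relationships_for(row)]
for edge in edges:
    print(edge)
print(len(edges), "relationships from", len(sample_rows), "rows")
```

Because `apoc.create.relationship` creates a new relationship on every call, running the loading cell a second time without clearing the graph first would duplicate every connection.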

And now the graph is complete! 


![Alt text](01-graph_complete-1.png)

You can delete the entire graph using the cell below and run this notebook again, or move on to the next lab to try another data loading method.

In [None]:
gds.run_cypher('''
MATCH (n) DETACH DELETE n
''')