## Summary notes

Initialise and populate an undirected weighted graph using NetworkX.
The source data is a CSV file listing the road network in Europe as an *edge list*. (Nodes are cities, and the edges represent roads connecting cities.)

We first import the data into a Pandas `DataFrame`.
You don't have to use a `DataFrame` to hold the source data, but the source data contain special characters, so we leave handling them to Pandas.
Next, we export the `DataFrame` as a list of dictionaries, one per row.
We then initialise an empty graph and populate it from that list using generator expressions.
We close the note by showing how to access the nodes, neighbours, and edges of the graph.

Whilst we could populate the graph during initialisation, we found it added unneeded complexity.
Finally, note that |*edges*| ≠ |*edge list*|: the source data has 1250 rows, but the graph ends up with only 1198 edges, because the NetworkX `Graph` class does not permit parallel edges between two nodes.
(In other words, the source list has more than one road connecting some pairs of cities.)
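
To illustrate the point about parallel edges, here is a minimal sketch on a made-up edge list. The city names `A`, `B`, `C` and the road numbers `X1` to `X3` are hypothetical and not taken from the source data. It compares `Graph` with `MultiGraph`, the NetworkX class that does keep parallel edges.

```python
import networkx as nx

# Hypothetical toy edge list: two distinct roads join the same pair of cities.
toy_edges = [
    ('A', 'B', {'weight': 10, 'road_number': 'X1'}),
    ('A', 'B', {'weight': 12, 'road_number': 'X2'}),
    ('B', 'C', {'weight': 7, 'road_number': 'X3'}),
]

# Graph keeps at most one edge per pair of nodes; a repeated pair
# updates the data of the existing edge instead of adding a new one.
simple = nx.Graph()
simple.add_edges_from(toy_edges)

# MultiGraph stores every row as its own edge.
multi = nx.MultiGraph()
multi.add_edges_from(toy_edges)

print(len(toy_edges), simple.number_of_edges(), multi.number_of_edges())
print(simple['A']['B'])  # data of the last A-B row that was added
```

With `Graph`, only the data of the last duplicate survives, so if every road matters, `MultiGraph` may be the better fit.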

## Dependencies

In [1]:
import pandas as pd
import networkx as nx

## Global constants

This is the URL to the source data.

In [2]:
EROADS_URL = ('https://raw.githubusercontent.com/ljk233/laughingrook-datasets'
              + '/main/graphs/eroads_edge_list.csv')

Short aliases for the column titles in the source data.
We feel they improve the readability of the code when populating the graph, but they are optional, and you could instead use the column titles directly.

In [3]:
U = 'origin_reference_place'
V = 'destination_reference_place'
UCO = 'origin_country_code'
VCO = 'destination_country_code'
W = 'distance'
RN = 'road_number'
WC = 'watercrossing'

## Main

### Import the data

In [4]:
eroads = pd.read_csv(EROADS_URL)
eroads.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 7 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   road_number                  1250 non-null   object
 1   origin_country_code          1250 non-null   object
 2   origin_reference_place       1250 non-null   object
 3   destination_country_code     1250 non-null   object
 4   destination_reference_place  1250 non-null   object
 5   distance                     1250 non-null   int64 
 6   watercrossing                1250 non-null   bool  
dtypes: bool(1), int64(1), object(5)
memory usage: 59.9+ KB


### Export the data to a list of dictionaries

Each entry in the list is a dictionary representing a single row, where the keys are the column titles.

In [5]:
edges = eroads.to_dict(orient='records')
edges[0]

{'road_number': 'E01',
 'origin_country_code': 'GB',
 'origin_reference_place': 'Larne',
 'destination_country_code': 'GB',
 'destination_reference_place': 'Belfast',
 'distance': 36,
 'watercrossing': False}
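
As a small sketch of what `orient='records'` does, the example below builds a made-up two-row `DataFrame` (the values are hypothetical) and compares the result with the default orientation of `to_dict()`.

```python
import pandas as pd

# Hypothetical two-row frame that reuses three of the source column titles.
toy = pd.DataFrame({
    'origin_reference_place': ['A', 'B'],
    'destination_reference_place': ['B', 'C'],
    'distance': [10, 7],
})

# orient='records' gives a list with one dictionary per row.
records = toy.to_dict(orient='records')

# The default orientation gives a dictionary keyed by column title,
# where each value maps the row index to the cell value.
by_column = toy.to_dict()

print(type(records).__name__, len(records))
print(records[0])
print(by_column['distance'])
```

The row-wise form is convenient here because each row describes exactly one edge.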

### Initialise the graph

In [6]:
g = nx.Graph()
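
For comparison, this is a rough sketch of populating a graph during initialisation by passing an edge list to the constructor. The toy data is hypothetical. Note that the nodes are created implicitly and carry no attributes, so node data such as the country would still need a separate step.

```python
import networkx as nx

# Hypothetical toy edge list of (u, v, data) tuples.
toy_edges = [
    ('A', 'B', {'weight': 10}),
    ('B', 'C', {'weight': 7}),
]

# Passing the edge list to the constructor adds the edges straight away.
h = nx.Graph(toy_edges)

print(h.number_of_nodes(), h.number_of_edges())
print(h.nodes['A'])  # empty dictionary: no node attributes were supplied
```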

### Populate the graph

Add the nodes.
We add both the origin and the destination cities in the *edge list* to ensure we populate all the cities.
(There's a chance that a city appears only as a destination in the data.)
The dictionary in each tuple holds the data describing a node.

In [7]:
g.add_nodes_from((e[U], {'country': e[UCO]}) for e in edges)
g.add_nodes_from((e[V], {'country': e[VCO]}) for e in edges)
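
The sketch below shows why both passes are needed, using two made-up rows in which city `C` appears only as a destination. The shortened keys (`origin`, `origin_cc`, `dest`, `dest_cc`) are hypothetical stand-ins for the real column titles.

```python
import networkx as nx

# Hypothetical rows: 'C' never appears as an origin.
toy_rows = [
    {'origin': 'A', 'origin_cc': 'GB', 'dest': 'B', 'dest_cc': 'GB'},
    {'origin': 'B', 'origin_cc': 'GB', 'dest': 'C', 'dest_cc': 'IRL'},
]

h = nx.Graph()

# First pass: origin cities only.
h.add_nodes_from((r['origin'], {'country': r['origin_cc']}) for r in toy_rows)
after_origins = sorted(h.nodes)

# Second pass: destination cities. Re-adding 'B' does not duplicate it.
h.add_nodes_from((r['dest'], {'country': r['dest_cc']}) for r in toy_rows)
after_both = sorted(h.nodes)

print(after_origins, after_both)
print(h.nodes['C'])
```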

Add the edges.
Given this is an undirected graph, there is no need to add the reverse edges, *v* → *u*.
The dictionary in each tuple holds the data describing an edge.

In [8]:
g.add_edges_from((e[U], e[V], {'weight': e[W], RN: e[RN], WC: e[WC]})
                 for e in edges)
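
A quick sketch on a hypothetical one-edge graph confirms that a single call covers both directions, and that adding the reverse pair does not create a second edge.

```python
import networkx as nx

h = nx.Graph()
h.add_edges_from([('A', 'B', {'weight': 10})])

# The edge is visible from either endpoint.
forward = h.has_edge('A', 'B')
reverse = h.has_edge('B', 'A')
reverse_weight = h['B']['A']['weight']

# Adding v -> u explicitly leaves the edge count unchanged.
h.add_edge('B', 'A', weight=10)

print(forward, reverse, reverse_weight, h.number_of_edges())
```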

### Inspect the graph

Get a description of the graph.

In [9]:
print(g)

Graph with 894 nodes and 1198 edges


Get a selection of the nodes.

In [10]:
[n for n in g][:5]

['Larne', 'Belfast', 'Dublin', 'Wexford', 'Rosslare']

Output a more descriptive list of nodes by calling the `nodes()` method with `data=True`, which pairs each node with its data dictionary.

In [11]:
[n for n in g.nodes(data=True)][:5]

[('Larne', {'country': 'GB'}),
 ('Belfast', {'country': 'GB'}),
 ('Dublin', {'country': 'IRL'}),
 ('Wexford', {'country': 'IRL'}),
 ('Rosslare', {'country': 'IRL'})]

View the neighbours of the Roma node.

In [12]:
[neighbor for neighbor in g['Roma']]

['Arezzo', 'Grosseto', 'Pescara', 'San Cesareo']

We can get a more descriptive view of a node's neighbours by printing `g['Roma']` directly, which maps each neighbour to the data of the connecting edge.

In [13]:
print(g['Roma'])

{'Arezzo': {'weight': 219, 'road_number': 'E35', 'watercrossing': False}, 'Grosseto': {'weight': 182, 'road_number': 'E80', 'watercrossing': False}, 'Pescara': {'weight': 209, 'road_number': 'E80', 'watercrossing': False}, 'San Cesareo': {'weight': 36, 'road_number': 'E821', 'watercrossing': False}}
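
Because the adjacency view behaves like a nested dictionary, individual edge attributes can be looked up by chaining keys. The sketch below rebuilds three of the Roma edges shown above on a small stand-alone graph (named `h` here for illustration) and reads the weights back in two equivalent ways.

```python
import networkx as nx

# Stand-alone graph holding three of the Roma edges from the output above.
h = nx.Graph()
h.add_edges_from([
    ('Roma', 'Arezzo', {'weight': 219, 'road_number': 'E35'}),
    ('Roma', 'Grosseto', {'weight': 182, 'road_number': 'E80'}),
    ('Roma', 'San Cesareo', {'weight': 36, 'road_number': 'E821'}),
])

# Chained lookup: node -> neighbour -> attribute.
arezzo_weight = h['Roma']['Arezzo']['weight']

# The same data is reachable through the edge view.
same_weight = h.edges['Roma', 'Arezzo']['weight']

# Neighbours of Roma ordered by the weight of the connecting edge.
nearest_first = sorted(h['Roma'], key=lambda nbr: h['Roma'][nbr]['weight'])

print(arezzo_weight, same_weight)
print(nearest_first)
```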


Finally, we can list the edges incident to the Roma node, together with their data.

In [14]:
[e for e in g.edges('Roma', data=True)]

[('Roma',
  'Arezzo',
  {'weight': 219, 'road_number': 'E35', 'watercrossing': False}),
 ('Roma',
  'Grosseto',
  {'weight': 182, 'road_number': 'E80', 'watercrossing': False}),
 ('Roma',
  'Pescara',
  {'weight': 209, 'road_number': 'E80', 'watercrossing': False}),
 ('Roma',
  'San Cesareo',
  {'weight': 36, 'road_number': 'E821', 'watercrossing': False})]