## Summary notes

Initialise and populate an undirected weighted graph using NetworkX.

The source data is a CSV file listing the road network in Europe as an *edge list*.[^1]

We first imported the data into a Pandas `DataFrame`.[^2]
Next, we exported the `DataFrame` as a collection of dictionaries.[^3]
We initialised an empty graph,[^4] and populated it.
We closed the notebook by showing how to access the nodes, neighbours, and edges of the graph.

Whilst we could populate the graph during initialisation, we found it added unneeded complexity.
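
To illustrate the trade-off, here is a minimal sketch of populating a graph at initialisation by passing a list of edge tuples to the `Graph` constructor. The `sample_edges` and `countries` values are made up for the example and are not taken from the dataset. Because an edge list carries no node attributes, the country codes still need a separate pass, which is roughly where the extra complexity comes from.

```python
import networkx as nx

# Illustrative (u, v, attributes) tuples; not the real dataset.
sample_edges = [
    ('Larne', 'Belfast', {'weight': 36}),
    ('Belfast', 'Dublin', {'weight': 100}),
]

# Passing an edge list to the constructor adds the nodes and edges at once.
g_init = nx.Graph(sample_edges)

# Node attributes are not part of an edge list, so they need a second step.
countries = {'Larne': 'GB', 'Belfast': 'GB', 'Dublin': 'IRL'}
nx.set_node_attributes(g_init, countries, name='country')

print(g_init.nodes(data=True))
```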

The graph has fewer edges (1198) than the *edge list* has rows (1250), so |*edges*| ≠ |*edge list*|, because NetworkX's `Graph` class does not permit parallel edges between two nodes.[^5]
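
The effect of parallel edges can be sketched on a tiny, made-up edge list. The node names and attribute values below are hypothetical. `Graph` keeps one edge per pair of nodes, whereas `MultiGraph` keeps every row.

```python
import networkx as nx

# Two illustrative roads join A and B; the data is made up.
rows = [
    ('A', 'B', {'weight': 10, 'road_number': 'E01'}),
    ('A', 'B', {'weight': 12, 'road_number': 'E02'}),
    ('B', 'C', {'weight': 7, 'road_number': 'E03'}),
]

simple = nx.Graph()
simple.add_edges_from(rows)

multi = nx.MultiGraph()
multi.add_edges_from(rows)

# Rows in the edge list, edges in the Graph, edges in the MultiGraph.
print(len(rows), simple.number_of_edges(), multi.number_of_edges())  # 3 2 3

# In the Graph, the later A-B row overwrites the attributes of the earlier one.
print(simple['A']['B'])
```

If every road needs to be retained, `MultiGraph` is one option, at the cost of an extra key level when accessing edges.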

## Dependencies

In [1]:
from dataclasses import dataclass
import pandas as pd
import networkx as nx

### Classes

In [2]:
@dataclass(frozen=True)
class PandasERoad:
    """A dataclass to help convert the road network data to a graph.

    Stores the remote URL and maps the column titles to common graph
    terminology.
    """

    url: str = ('https://raw.githubusercontent.com/ljk233'
                + '/laughingrook-datasets/main/graphs/eroads_edge_list.csv')
    u: str = 'origin_reference_place'
    v: str = 'destination_reference_place'
    uco: str = 'origin_country_code'
    vco: str = 'destination_country_code'
    w: str = 'distance'
    rn: str = 'road_number'
    wc: str = 'watercrossing'

## Functions

In [4]:
def edge_to_tuple(edge: dict, er: PandasERoad) -> tuple:
    """Return the edge as a (u, v, attributes) tuple, the form
    accepted by `Graph.add_edges_from`.
    """
    return (
        edge[er.u],
        edge[er.v],
        {'weight': edge[er.w], er.rn: edge[er.rn], er.wc: edge[er.wc]}
    )

## Main

In [5]:
er = PandasERoad()

### Import the data

In [6]:
eroads = pd.read_csv(er.url)
eroads.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 7 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   road_number                  1250 non-null   object
 1   origin_country_code          1250 non-null   object
 2   origin_reference_place       1250 non-null   object
 3   destination_country_code     1250 non-null   object
 4   destination_reference_place  1250 non-null   object
 5   distance                     1250 non-null   int64 
 6   watercrossing                1250 non-null   bool  
dtypes: bool(1), int64(1), object(5)
memory usage: 59.9+ KB


### Export the data to a list of dictionaries

Calling `to_dict(orient='records')` returns a list.
Each entry in the list is a dictionary representing a single row, where the keys are the column titles.

In [7]:
edges = eroads.to_dict(orient='records')
edges[0]

{'road_number': 'E01',
 'origin_country_code': 'GB',
 'origin_reference_place': 'Larne',
 'destination_country_code': 'GB',
 'destination_reference_place': 'Belfast',
 'distance': 36,
 'watercrossing': False}

### Initialise and populate the graph

In [8]:
g = nx.Graph()

Add the nodes.
We add both the origin and destination places from the *edge list* so that every city is included with its country code, since a city may appear only as a destination in the data.

In [9]:
g.add_nodes_from((e[er.u], {'country': e[er.uco]}) for e in edges)
g.add_nodes_from((e[er.v], {'country': e[er.vco]}) for e in edges)
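
As a minimal sketch of why both passes matter, consider two made-up rows in which one place appears only as a destination. The keys `u`, `uco`, `v` and `vco` are shorthand for this example, not the real column titles.

```python
import networkx as nx

# Illustrative rows; 'Rosslare' appears only as a destination.
rows = [
    {'u': 'Dublin', 'uco': 'IRL', 'v': 'Wexford', 'vco': 'IRL'},
    {'u': 'Wexford', 'uco': 'IRL', 'v': 'Rosslare', 'vco': 'IRL'},
]

g_nodes = nx.Graph()
g_nodes.add_nodes_from((r['u'], {'country': r['uco']}) for r in rows)
missing = 'Rosslare' not in g_nodes  # True after the origin pass alone
g_nodes.add_nodes_from((r['v'], {'country': r['vco']}) for r in rows)

# Adding edges also creates missing nodes, but without any attributes.
g_edges_only = nx.Graph()
g_edges_only.add_edges_from((r['u'], r['v']) for r in rows)

print(missing, g_nodes.nodes['Rosslare'], g_edges_only.nodes['Rosslare'])
```

So the explicit node passes are mainly about attaching the `country` attribute, since `add_edges_from` would otherwise create bare nodes.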

Add the edges.
Given this is an undirected graph, there is no need to add the reverse edges *V* → *U*.

Example output from `edge_to_tuple`:

In [10]:
edge_to_tuple(edges[0], er)

('Larne',
 'Belfast',
 {'weight': 36, 'road_number': 'E01', 'watercrossing': False})

Populate the edges.

In [11]:
g.add_edges_from(edge_to_tuple(edge, er) for edge in edges)
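
The claim that reverse edges are unnecessary in an undirected graph can be checked with a small sketch. It uses a single edge and contrasts `Graph` with `DiGraph`; the variable names are chosen only for this example.

```python
import networkx as nx

g_sym = nx.Graph()
g_sym.add_edge('Larne', 'Belfast', weight=36)

# The edge is reachable from either endpoint and is counted once.
reverse_exists = g_sym.has_edge('Belfast', 'Larne')
same_data = g_sym['Larne']['Belfast'] == g_sym['Belfast']['Larne']
edge_count = g_sym.number_of_edges()

# In a directed graph the reverse edge would have to be added explicitly.
d = nx.DiGraph()
d.add_edge('Larne', 'Belfast', weight=36)
directed_reverse_exists = d.has_edge('Belfast', 'Larne')

print(reverse_exists, same_data, edge_count, directed_reverse_exists)
```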

### Inspect the graph

Get a description of the graph.

In [12]:
print(g)

Graph with 894 nodes and 1198 edges


Get a selection of the nodes.

In [13]:
[n for n in g][:5]

['Larne', 'Belfast', 'Dublin', 'Wexford', 'Rosslare']

Output a more descriptive list of nodes, including their attributes, by calling `nodes(data=True)`.

In [14]:
[n for n in g.nodes(data=True)][:5]

[('Larne', {'country': 'GB'}),
 ('Belfast', {'country': 'GB'}),
 ('Dublin', {'country': 'IRL'}),
 ('Wexford', {'country': 'IRL'}),
 ('Rosslare', {'country': 'IRL'})]

View the neighbours of the Roma node.

In [15]:
[neighbor for neighbor in g['Roma']]

['Arezzo', 'Grosseto', 'Pescara', 'San Cesareo']

We can get a more descriptive output of a node's neighbours, including the edge attributes, by indexing the graph directly rather than iterating over it in a list comprehension.

In [16]:
g['Roma']

AtlasView({'Arezzo': {'weight': 219, 'road_number': 'E35', 'watercrossing': False}, 'Grosseto': {'weight': 182, 'road_number': 'E80', 'watercrossing': False}, 'Pescara': {'weight': 209, 'road_number': 'E80', 'watercrossing': False}, 'San Cesareo': {'weight': 36, 'road_number': 'E821', 'watercrossing': False}})

Finally, we can output the edges incident to the Roma node.

In [17]:
[e for e in g.edges('Roma', data=True)]

[('Roma',
  'Arezzo',
  {'weight': 219, 'road_number': 'E35', 'watercrossing': False}),
 ('Roma',
  'Grosseto',
  {'weight': 182, 'road_number': 'E80', 'watercrossing': False}),
 ('Roma',
  'Pescara',
  {'weight': 209, 'road_number': 'E80', 'watercrossing': False}),
 ('Roma',
  'San Cesareo',
  {'weight': 36, 'road_number': 'E821', 'watercrossing': False})]
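
Beyond listing edges, individual attributes can be read directly. The sketch below rebuilds only the four Roma edges shown above in a separate graph named `g_roma` (an assumption made so the example is self-contained) and reads weights in a few equivalent ways.

```python
import networkx as nx

# Rebuild just the Roma edges from the output above.
g_roma = nx.Graph()
g_roma.add_edges_from([
    ('Roma', 'Arezzo', {'weight': 219, 'road_number': 'E35'}),
    ('Roma', 'Grosseto', {'weight': 182, 'road_number': 'E80'}),
    ('Roma', 'Pescara', {'weight': 209, 'road_number': 'E80'}),
    ('Roma', 'San Cesareo', {'weight': 36, 'road_number': 'E821'}),
])

# Two equivalent ways to read a single edge attribute.
w_adj = g_roma['Roma']['Arezzo']['weight']
w_edge = g_roma.edges['Roma', 'Arezzo']['weight']

# Number of incident roads, and the sum of their distances.
n_roads = g_roma.degree('Roma')
total = g_roma.degree('Roma', weight='weight')

# The neighbour joined by the shortest road.
nearest = min(g_roma['Roma'], key=lambda n: g_roma['Roma'][n]['weight'])

print(w_adj, w_edge, n_roads, total, nearest)
```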

[^1]: Nodes are cities, and the edges represent roads connecting cities.
[^2]: You don't have to use a `DataFrame` to hold the source data, but there are special characters in the source data, so we outsourced dealing with them to Pandas.
[^3]: See [pandas.DataFrame.to_dict](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html)
[^4]: See [networkx.Graph](https://networkx.org/documentation/stable/reference/classes/graph.html#graph-undirected-graphs-with-self-loops)
[^5]: In other words, the source list has multiple roads connecting some cities.

In [18]:
%load_ext watermark
%watermark --iversions

networkx: 2.8.6
pandas  : 1.4.3
sys     : 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]

