# Graph Creation

In this notebook, we read the transport network data from CSV files, build networkx graphs from it, and store them. The data for each city is provided as a zip archive. Of the many files it contains, the two we need are `network_nodes.csv` and `network_combined.csv`. The former contains information about the nodes (stops): the id, latitude, longitude and name. The latter contains information about the edges (routes): the from node, the to node, the straight-line distance, the average duration between the stops, the number of times a public transport vehicle passes through that stop in an hour, and the route type. The route type is one of the following eight types: `tram, subway, rail, bus, ferry, cablecar, gondola and funicular`. Because the same pair of stops can be connected by separate edges for different modes of transport, the whole transport network is conceptualised as a MultiDiGraph. Because every edge runs from one stop to another, the graph is directed.
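To make the choice of graph type concrete, here is a minimal sketch with two made-up stops (ids 1 and 2) that are linked by both a bus and a tram. The stop ids and attribute values are illustrative assumptions, not data from any city; the route type codes follow the order of the list above (0 for tram, 3 for bus).

```python
import networkx as nx

# Two hypothetical edges between the same pair of stops, one per mode.
edges = [
    (1, 2, {"route_type": 3, "d": 250}),  # bus
    (1, 2, {"route_type": 0, "d": 250}),  # tram
]

multi = nx.MultiDiGraph()
simple = nx.DiGraph()

for source, target, attrs in edges:
    # A MultiDiGraph stores every call as a separate parallel edge.
    multi.add_edge(source, target, **attrs)
    # A plain DiGraph has one edge per (from, to) pair, so the second
    # call only updates the attributes of the existing edge.
    simple.add_edge(source, target, **attrs)

print(multi.number_of_edges())   # 2
print(simple.number_of_edges())  # 1
print(multi.has_edge(2, 1))      # False, because edges are directed
```

This is why a plain DiGraph is only suitable once the edges have been restricted to a single mode of transport.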

While inspecting the edge data, we found some self-loops, and we could not find any information explaining them on the transport website of the city. Hence we remove every edge whose from and to ids are the same.
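As a small illustration of this filtering step, the sketch below flags and drops self-loops in a toy edge table. The column names `from_stop_I` and `to_stop_I` are those of `network_combined.csv`; the values are made up. A boolean mask is used here so that the number of removed rows can be reported, which is an addition for illustration only.

```python
import pandas as pd

# Toy edge table; the values are made up for illustration.
edges_df = pd.DataFrame(
    {
        "from_stop_I": [1, 2, 3, 3],
        "to_stop_I": [2, 2, 3, 1],
        "route_type": [3, 3, 0, 0],
    }
)

# Rows where the from and to stop ids are equal are self-loops.
is_self_loop = edges_df["from_stop_I"] == edges_df["to_stop_I"]
print(f"Removing {is_self_loop.sum()} self-loops")

# Keep only the edges that connect two different stops.
clean_df = edges_df[~is_self_loop].reset_index(drop=True)
print(clean_df)
```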

For each city, we create a dictionary of graphs with the mode of transport as the key and the networkx graph as the value. If a mode of transport does not exist in the city, its value is set to None. Apart from the individual modes, the dictionary holds one graph of the full network under the key `full`. This dictionary is stored as a pickle file on the drive. Alongside it, we store the `network_nodes.csv` and `network_combined.csv` files of each city.
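Code that consumes these files has to cope with the None entries. The following sketch shows one way to do so, assuming a stand-in dictionary with the same layout; the file name `example_city.gpickle` and the tiny graphs are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import networkx as nx

# Stand-in for one city's dictionary: a bus graph, a full graph, no tram network.
bus_graph = nx.DiGraph()
bus_graph.add_edge(1, 2, route_type=3)
full_graph = nx.MultiDiGraph()
full_graph.add_edge(1, 2, route_type=3)
city_graphs = {"tram": None, "bus": bus_graph, "full": full_graph}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written to a temporary folder.
    graph_path = Path(tmp_dir) / "example_city.gpickle"
    with open(graph_path, "wb") as f:
        pickle.dump(city_graphs, f, pickle.HIGHEST_PROTOCOL)

    # Read the dictionary back, as a downstream notebook would.
    with open(graph_path, "rb") as f:
        loaded = pickle.load(f)

# Skip the modes of transport that the city does not have.
available = {mode: g for mode, g in loaded.items() if g is not None}
for mode, g in available.items():
    print(mode, g.number_of_nodes(), g.number_of_edges())
```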



For each city, the notebook does the following:

1. Create a dictionary to hold the graphs for the different route types in the city, with every value initially set to None.
2. Extract the city name from the zip file name, and read the node and edge data from the CSV files in the zip file.
3. Remove the self-loops from the edge data.
4. Create a MultiDiGraph for the full transport network of the city.
5. Get the unique route types in the edge data, and create a DiGraph for each of them.
6. Store the dictionary of graphs as a pickle file named after the city.

In [1]:
import glob
import pickle
import pathlib
import pandas as pd
import networkx as nx

from zipfile import ZipFile

from enum import Enum

In [2]:
# Set paths for input data and output graphs
rel_data_folder_path = pathlib.Path("./../../data")
transport_data_path = rel_data_folder_path.joinpath('transport_data')
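city_network_graphs = rel_data_folder_path.joinpath('network_graphs').joinpath('graphs')
city_network_bones = rel_data_folder_path.joinpath('network_graphs').joinpath('nodes-edges')

# Make sure the output folder for the pickled graphs exists before writing to it
city_network_graphs.mkdir(parents=True, exist_ok=True)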

# Get list of zip files with transportation data
city_zips = list(transport_data_path.glob('*.zip'))

# Define enum for route types
class RouteType(Enum):
    tram, subway, rail, bus, ferry, cablecar, gondola, funicular = range(8)

In [3]:
# Loop over each zip folder for the city
for city_data_path in city_zips:
    
    # Create dictionary to store graph representations of different route types and the full network
    city_graphs = {RouteType(idx).name: None for idx in range(8)}
    city_graphs["full"] = None
    
    # Get the city name from the zip file name (without the ".zip" suffix)
    city_name = city_data_path.stem

    # Open the archive in a context manager so that it is closed after reading
    with ZipFile(city_data_path) as city_zf:

        # Read the node information and save a copy of the CSV file
        city_nodes_df = pd.read_csv(city_zf.open(city_name + '/network_nodes.csv'), sep=";")
        city_zf.extract(city_name + '/network_nodes.csv', path=city_network_bones)

        # Read the edge information, drop self-loops (same from and to stop),
        # and save a copy of the original CSV file
        city_network_df = pd.read_csv(city_zf.open(city_name + '/network_combined.csv'), sep=";").query("from_stop_I != to_stop_I")
        city_zf.extract(city_name + '/network_combined.csv', path=city_network_bones)

    # Construct graph for full network
    full_city_graph = nx.MultiDiGraph()

    # Add nodes to the graph
    for _, row in city_nodes_df.iterrows():
        node_id = row['stop_I']
        full_city_graph.add_node(node_id, **row[1:].to_dict())

    # Add edges to the graph
    for _, row in city_network_df.iterrows():
        source = row['from_stop_I']
        target = row['to_stop_I']
        edge_data = row[2:].to_dict()
        full_city_graph.add_edge(source, target, **edge_data)

    city_graphs["full"] = full_city_graph

    # Construct graphs for different route types
    rte_types = city_network_df["route_type"].unique()

    for rte_type in rte_types:
        rte_network_df = city_network_df[city_network_df["route_type"] == rte_type]

        rte_type_graph = nx.DiGraph()

        # Add edges to the graph
        for _, row in rte_network_df.iterrows():
            source = row['from_stop_I']
            target = row['to_stop_I']
            edge_data = row[2:].to_dict()
            rte_type_graph.add_edge(source, target, **edge_data)

        city_graphs[RouteType(rte_type).name] = rte_type_graph

    # Save graphs for city to file
    with open(city_network_graphs.joinpath(city_name + '.gpickle'), 'wb') as f:
        pickle.dump(city_graphs, f, pickle.HIGHEST_PROTOCOL)
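
The row-by-row loops above can also be expressed with networkx's `from_pandas_edgelist`. The sketch below is an alternative, not the method used in this notebook, and it runs on a made-up edge table. With `edge_attr=True`, every column other than the source and target becomes an edge attribute.

```python
import networkx as nx
import pandas as pd

# Toy edge table with column names from network_combined.csv (values are made up).
edges_df = pd.DataFrame(
    {
        "from_stop_I": [1, 1, 2],
        "to_stop_I": [2, 2, 3],
        "d": [250, 250, 400],
        "route_type": [3, 0, 3],
    }
)

# Full network: parallel edges are preserved in a MultiDiGraph.
full_graph = nx.from_pandas_edgelist(
    edges_df,
    source="from_stop_I",
    target="to_stop_I",
    edge_attr=True,
    create_using=nx.MultiDiGraph,
)

# One DiGraph per route type code found in the table.
graphs_by_type = {
    int(rte_type): nx.from_pandas_edgelist(
        group,
        source="from_stop_I",
        target="to_stop_I",
        edge_attr=True,
        create_using=nx.DiGraph,
    )
    for rte_type, group in edges_df.groupby("route_type")
}

print(full_graph.number_of_edges())         # 3
print(sorted(graphs_by_type))               # [0, 3]
print(graphs_by_type[3].number_of_edges())  # 2
```

Unlike the loops in the notebook, this sketch does not attach the stop attributes from `network_nodes.csv` to the nodes; those would still need to be added separately.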