The dataset used to create the network can be found at [this](https://www.mimit.gov.it/index.php/it/open-data/elenco-dataset/carburanti-prezzi-praticati-e-anagrafica-degli-impianti) link.

In this case the data refer to the second quarter of 2025, i.e. from 1 April to 30 June 2025.
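
The quarter covers 30 + 31 + 30 = 91 days, so 91 daily price files are expected. Here is a small sketch that lists them. The helper `daily_price_files` is hypothetical, and the `prezzo_alle_8-YYYYMMDD.csv` naming is taken from the file names that the loading code in this notebook opens.

```python
import datetime

def daily_price_files(start, end, prefix="prezzo_alle_8-"):
    """Hypothetical helper: expected daily price file names from start to end (inclusive)."""
    names = []
    day = start
    while day <= end:
        names.append(f"{prefix}{day.strftime('%Y%m%d')}.csv")
        day += datetime.timedelta(days=1)
    return names

files = daily_price_files(datetime.date(2025, 4, 1), datetime.date(2025, 6, 30))
print(len(files))            # 91
print(files[0], files[-1])   # prezzo_alle_8-20250401.csv prezzo_alle_8-20250630.csv
```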

---

## Reading and cleaning the dataset

Creating the data structure that contains all the attributes of the gas stations.

**stations** = dict[station_id] -> (dict with station attribute)

In [5]:
import csv

stations = {}

with open("..//dataset//anagrafica_impianti_attivi.csv", 'r', encoding="utf-8") as data:

    for line in csv.DictReader(data, delimiter=";"):
        stations[line["idImpianto"]] = line

Creating the data structure that contains, for each day, every price registered on that day.

**prices** = dict[date] -> dict[index] -> (dict with the attributes of a single price)

>On the same day we can have several prices for the same station, because the same station can sell different types of fuel with different prices, like gasoline and diesel, and the same fuel can have both a self-service and a served price.
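
The following sketch makes the shape of **prices** concrete by parsing a tiny, made-up daily file. The column names are the ones the notebook reads (`idImpianto`, `descCarburante`, `prezzo`, `isSelf`), but the real files contain more columns. The first line is assumed to be a one-line preamble before the header, which is the line the notebook skips before creating the `DictReader`.

```python
import csv
import io

# Made-up content of one daily file (assumption: one preamble line, then a ';'-separated header)
sample_file = io.StringIO(
    "preamble line\n"
    "idImpianto;descCarburante;prezzo;isSelf\n"
    "3456;Benzina;1.998;1\n"
    "3456;Gasolio;1.879;1\n"
    "3456;Benzina;2.149;0\n"
)

next(sample_file)  # skip the preamble line
prices_actual_day = {index: row for index, row in enumerate(csv.DictReader(sample_file, delimiter=";"))}

# The same station appears three times on the same day:
# two fuels in self-service (isSelf = 1) and gasoline in served mode (isSelf = 0)
for index, row in prices_actual_day.items():
    print(index, row["idImpianto"], row["descCarburante"], row["prezzo"], row["isSelf"])
```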

In [6]:
import datetime

prices = {}

# Start day for the dataset that contains the prices
start_date = datetime.datetime(2025, 4, 1)
actual_date = start_date

# Last day for the dataset that contains the prices
end_date = datetime.datetime(2025, 6, 30)


while actual_date <= end_date:

    actual_date_str = actual_date.strftime("%Y%m%d")
    prices_actual_day = {}

    filename = "..//dataset//fuel_prices//prezzo_alle_8-" + actual_date_str + ".csv"
    with open(filename, 'r', encoding="utf-8") as data:

        # Skip the first line of the file, which comes before the CSV header
        next(data)

        # Store every price row of the day under an incremental index
        for index, line in enumerate(csv.DictReader(data, delimiter=';')):
            prices_actual_day[index] = line

    prices[actual_date_str] = prices_actual_day

    actual_date = actual_date + datetime.timedelta(days=1)

Selecting only the stations inside the region of **Lombardia** (Lombardy) in Italy, as I don't want to analyze the data for the entire country.

In [7]:
# Acronym for the provinces of Lombardy
lombardia = ("BG","BS","CO","CR","LC","LO","MN","MI","MB","PV","SO","VA")

# Keep only the stations whose province is in Lombardy
stations_lombardia = {station_id: attributes for station_id, attributes in stations.items() if attributes["Provincia"] in lombardia}

---

## Creating the same-price temporal networks, one for every fuel we are interested in tracking

**Node** = single station with its attributes.

>Every node has several timestamps (days). Each timestamp refers to a price registered on a single day for a single type of fuel at a single station.
>For example, for station 3456 we could have at timestamp n {Benzina: 1.998, Diesel: 1.879}. *Benzina* is the Italian word for gasoline (in the dataset, diesel appears as *Gasolio*).
>In short, every node has a price history for each type of fuel.
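
Outside Raphtory, one way to picture the history of a node is a nested dictionary station -> day -> fuel -> price. The sketch below builds it from rows shaped like the ones in **prices**, keeping only self-service prices as the network code does. The function name `node_histories` and the sample values are made up for illustration.

```python
from collections import defaultdict

def node_histories(prices_by_day):
    """prices_by_day: dict[date] -> list of price rows (dicts read from the CSV).

    Returns dict[station_id] -> dict[date] -> dict[fuel] -> price, self-service only.
    """
    history = defaultdict(lambda: defaultdict(dict))
    for date, rows in prices_by_day.items():
        for row in rows:
            if row["isSelf"] == "1":
                history[row["idImpianto"]][date][row["descCarburante"]] = float(row["prezzo"])
    # Convert the nested defaultdicts to plain dicts
    return {station: dict(days) for station, days in history.items()}

sample = {
    "20250401": [
        {"idImpianto": "3456", "descCarburante": "Benzina", "prezzo": "1.998", "isSelf": "1"},
        {"idImpianto": "3456", "descCarburante": "Gasolio", "prezzo": "1.879", "isSelf": "1"},
    ],
    "20250402": [
        {"idImpianto": "3456", "descCarburante": "Benzina", "prezzo": "1.989", "isSelf": "1"},
    ],
}
history = node_histories(sample)
print(history["3456"]["20250401"])  # {'Benzina': 1.998, 'Gasolio': 1.879}
```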

**Edge** = connection between two different stations that have, on the same day, the same price for the same fuel.

>The edges are undirected: if A is connected to B because they have the same price, B is automatically connected to A.
>Every edge has at least one timestamp, representing a day on which the two stations had the same price. If two stations share a price on several days, the edge has several timestamps, each carrying the price of that day.
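
The edge rule can be sketched in plain Python without Raphtory. For one fuel, group the stations of each day by price, then connect every pair in a group and append the day and price to that pair's history. This is a simplified sketch of the logic with a hypothetical `same_price_edges` function and made-up data. It is not the notebook's Raphtory code.

```python
from collections import defaultdict
from itertools import combinations

def same_price_edges(prices_by_day, fuel_type):
    """Return dict[(id_a, id_b)] -> list of (date, price) for one fuel, self-service only."""
    edges = defaultdict(list)
    for date in sorted(prices_by_day):
        # Group the stations of this day by the price they charge for `fuel_type`
        groups = defaultdict(set)
        for row in prices_by_day[date]:
            if row["descCarburante"] == fuel_type and row["isSelf"] == "1":
                groups[row["prezzo"]].add(row["idImpianto"])
        # Every pair of stations in the same group gets an edge timestamped with this day
        for price, ids in groups.items():
            for a, b in combinations(sorted(ids, key=int), 2):
                edges[(a, b)].append((date, price))
    return dict(edges)

def row(station, fuel, price):
    return {"idImpianto": station, "descCarburante": fuel, "prezzo": price, "isSelf": "1"}

prices_by_day = {
    "20250401": [row("12", "Benzina", "1.899"), row("13", "Benzina", "1.899"), row("14", "Benzina", "1.929")],
    "20250402": [row("13", "Benzina", "1.909"), row("12", "Benzina", "1.909"),
                 row("14", "Benzina", "1.909"), row("14", "Gasolio", "1.909")],
}
edges = same_price_edges(prices_by_day, "Benzina")
print(edges[("12", "13")])  # [('20250401', '1.899'), ('20250402', '1.909')]
```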

Note that Raphtory can only manage directed networks, while our network is undirected, so I used a workaround.

When I create the edges for a given price, I have a list with the ids of the stations that share it. Every element is paired with every element that follows it: list[i] --> list[j] for every j > i, where i and j are indices in the list.

For an edge source --> destination, source and destination are the ids of two different stations. Their positions in the list are not always the same (the list is built from a set). As a result, one day could produce the edge 12 --> 13 and the next day the edge 13 --> 12. These are the same edge for us but two different edges for Raphtory. To always get the same edge on different days, I sort the list of ids numerically before creating the edges, so the source always has a smaller id than the destination.

This way I am sure to always get 12 --> 13, even if the edge repeats on several different days.

For Raphtory the edges are still directed, but we can treat them as undirected.
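
The ordering trick comes down to sorting the ids before pairing them. Here is a minimal sketch with a hypothetical `canonical_pairs` helper, showing that two different orders of the same ids produce the same directed edges:

```python
from itertools import combinations

def canonical_pairs(station_ids):
    """Return every pair of ids as (smaller, larger), comparing the ids numerically."""
    return list(combinations(sorted(station_ids, key=int), 2))

day_1 = ["13", "12", "7"]   # order as it might come out of a set on one day
day_2 = ["12", "7", "13"]   # a different order on another day

print(canonical_pairs(day_1))                            # [('7', '12'), ('7', '13'), ('12', '13')]
print(canonical_pairs(day_1) == canonical_pairs(day_2))  # True
```

Any consistent order would make the edges repeatable. Sorting with `key=int` also makes the source the numerically smaller id, whereas a plain string sort would place `"12"` before `"7"`.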

You might wonder why I don't use a library that natively handles undirected graphs. The reason is that Raphtory can handle millions of edges using relatively little memory and time, and it also offers many functions and is very versatile.

In [4]:
import raphtory as rp

lombardia_stations_gasoline = rp.Graph()
lombardia_stations_diesel = rp.Graph()

def NetworkCreation(network, fuel_type, stations_region):

    # prices = dict[date] -> dict[index] -> (dict with the attributes of a single price)
    for date in prices:

        # Timestamp used for every update of this day
        actual_date = datetime.datetime.strptime(date, "%Y%m%d")

        # price_to_station_ids = dict[price] -> set(station_id)
        price_to_station_ids = {}

        # index -> (dict with the attributes of a single price)
        for index in prices[date]:

            # Single price
            price = prices[date][index]

            # Keep the price only if the station is in the region,
            # the fuel type is the one we are looking for
            # and the price is for self-service fuel
            if price["idImpianto"] in stations_region and price["descCarburante"] == fuel_type and price["isSelf"] == '1':

                if price["prezzo"] not in price_to_station_ids:
                    price_to_station_ids[price["prezzo"]] = set()

                price_to_station_ids[price["prezzo"]].add(price["idImpianto"])

        # Scroll the ids of the stations associated with each price
        for price in price_to_station_ids:

            for station_id in price_to_station_ids[price]:

                # If the station is not in the graph yet, add it with the price of this date (timestamp)
                # and the station attributes as constant properties;
                # otherwise just add the price of this date (timestamp)
                try:
                    if not network.has_node(station_id):
                        network.add_node(actual_date, id=station_id, properties={"prezzo": price})
                        network.node(station_id).add_constant_properties(properties=stations_region[station_id])
                    else:
                        network.node(station_id).add_updates(actual_date, properties={"prezzo": price})

                except Exception as error:
                    print("Error", station_id, error)

            # Sort the ids numerically so that the source of every edge is always the smaller id
            station_ids = sorted(price_to_station_ids[price], key=int)

            # Create the edges between the stations that have the same price,
            # adding the price as a property with the date (timestamp)
            for i in range(len(station_ids)):
                for j in range(i + 1, len(station_ids)):
                    network.add_edge(actual_date, station_ids[i], station_ids[j], properties={"prezzo": price})


# In the dataset "Benzina" is gasoline and "Gasolio" is diesel
NetworkCreation(lombardia_stations_gasoline, "Benzina", stations_lombardia)
NetworkCreation(lombardia_stations_diesel, "Gasolio", stations_lombardia)

In [None]:
lombardia_stations_gasoline.save_to_file("..//network//same_price//same_price_lombardia_stations_gasoline")
lombardia_stations_diesel.save_to_file("..//network//same_price//same_price_lombardia_stations_diesel")

---

## Distance network creation

Creating the road map of the region we are interested in, so that we can calculate the driving distance between stations. This operation can take a long time. Downloading (or loading from the cache) an entire Italian region is a heavy task and will take up a lot of RAM.

The region is represented as a graph in which every intersection is a node, and every node is connected by an edge to the nearby nodes it is linked to by a road.

Since I am using OSMnx, I have to cite:

Boeing, G. (2025). Modeling and Analyzing Urban Networks and Amenities with OSMnx. Geographical Analysis, published online ahead of print. doi:10.1111/gean.70009

In [None]:
import osmnx as ox
%env NX_CUGRAPH_AUTOCONFIG=True
import networkx as nx

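ox.settings.use_cache = True
ox.settings.log_console = True
# Maximum area (in square meters) of a single Overpass query: a value this large
# avoids splitting the region into many sub-queries
ox.settings.max_query_area_size = 875000000000

lombardia_map = ox.graph_from_place('Lombardsko', network_type='drive')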


Given the latitude and longitude of every station, we can find the nearest node (intersection) on the map. We need this in the next step to compute the real driving distances between stations.
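
For intuition, the nearest-node lookup can be written as a brute-force search using the haversine (great-circle) distance. This is only a sketch, with a hypothetical `nearest_node` helper and made-up coordinates. `ox.distance.nearest_nodes` is designed to solve the same problem efficiently for many points at once, so the notebook does not need this code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_node(lat, lon, node_coords):
    """node_coords: dict[node_id] -> (lat, lon). Returns the id of the closest node."""
    return min(node_coords, key=lambda node: haversine_m(lat, lon, *node_coords[node]))

# Made-up intersections around Milan
nodes = {1: (45.4642, 9.1900), 2: (45.4780, 9.2270), 3: (45.4400, 9.1700)}
print(nearest_node(45.4650, 9.1910, nodes))  # 1
```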

In [None]:
import numpy as np

def stations_to_map_nodes(region_stations, region_map):

    stations_id = list(region_stations)

    # Extract the longitudes and latitudes of the stations
    # Convert them to numpy arrays for efficient processing
    longitudes = np.array([float(stations[single_id]['Longitudine']) for single_id in stations_id])
    latitudes = np.array([float(stations[single_id]['Latitudine']) for single_id in stations_id])

    # Get the nearest nodes (intersection) in the map for each station
    stations_nearest_nodes = ox.distance.nearest_nodes(region_map, longitudes, latitudes)

    # Create a mapping from station ID to the nearest node ID
    station_to_node = {}
    for i in range(len(stations_id)):
        station_to_node[stations_id[i]] = stations_nearest_nodes[i]
    
    return station_to_node

station_to_node = stations_to_map_nodes(stations_lombardia, lombardia_map)

Now that we have the nearest node (intersection) on the map for every station, we can use Dijkstra's algorithm to calculate the shortest path between every pair of stations.

We only save distances of up to 30 km. Stations more than 30 km apart are unlikely to tell us anything useful, and because all the distances are saved to a CSV file, this also keeps the file much lighter.
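
The following sketch shows what each Dijkstra run computes, using a small heap-based implementation on a made-up road graph whose edge weights are lengths in meters (OSMnx stores them in the `length` attribute). The optional `cutoff` works like the argument of the same name in NetworkX's `single_source_dijkstra_path_length`: it stops the search from expanding paths longer than the limit. This is one way to apply the 30 km threshold during the search instead of afterwards. Whether the cugraph backend honours `cutoff` is not verified here. The graph is directed, as road networks are (one-way streets), so the distance from A to B can differ from the distance from B to A, and some nodes may be unreachable.

```python
import heapq

def dijkstra_lengths(adjacency, source, cutoff=None):
    """Shortest-path lengths from `source` in a directed weighted graph.

    adjacency: dict[node] -> list of (neighbor, length_in_meters).
    Paths longer than `cutoff` meters are neither returned nor expanded.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, length in adjacency.get(node, []):
            new_d = d + length
            if cutoff is not None and new_d > cutoff:
                continue
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist

# Made-up directed road graph: C -> A has no road, so A is unreachable from C
road = {
    "A": [("B", 1200.0), ("C", 5000.0)],
    "B": [("C", 1500.0), ("A", 1200.0)],
    "C": [("D", 40000.0)],
    "D": [],
}
print(dijkstra_lengths(road, "A"))                # {'A': 0.0, 'B': 1200.0, 'C': 2700.0, 'D': 42700.0}
print(dijkstra_lengths(road, "A", cutoff=30000))  # {'A': 0.0, 'B': 1200.0, 'C': 2700.0}
```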

**WARNING**<br>
>To compute the distance between every pair of stations we have to run Dijkstra many times, which can take a lot of computation time. To reduce the time needed to complete the entire script, we use cuGraph, a Python library that can use an Nvidia graphics card for some tasks. In particular, it runs the algorithm (Dijkstra in this case) on the card's CUDA cores; here it is used as a NetworkX backend (`backend="cugraph"`). This library only runs on Linux or on Windows through WSL. <br>
> To get more information about the library follow [this](https://rapids.ai/) link. 


Even with this trick, running the entire script took around 180 minutes on my machine, which has the following components: <br>
**CPU: Intel Ultra 7 265k, RAM: 32GB DDR5 6400MHz, GPU: Nvidia RTX 5070FE**

In [8]:
def distances_between_stations(station_to_node, region_map,region_name):

    nx.config.warnings_to_ignore.add("cache")
    
    stations_distances = []

    index = 0
    for i in station_to_node:

        # Calculate the shortest path distances from the current station to all other stations
        # using Dijkstra's algorithm with the 'length' attribute as the weight
        # and using the cugraph backend for performance
        # This will give us the distance in meters
        distance = nx.single_source_dijkstra_path_length(region_map, station_to_node[i], weight="length", backend="cugraph")
        for j in station_to_node:

            if i == j:
                continue

            try:
                # Distance in kilometers from station i to station j
                distance_km = distance[station_to_node[j]] / 1000
            except KeyError:
                # The nearest node of station j cannot be reached from station i
                continue

            # Only include distances less than or equal to 30 km:
            # more distant stations are not relevant for the analysis
            if distance_km <= 30:
                stations_distances.append({"Source": i, "Destination": j, "Distance": distance_km})
        
        index += 1
        if index % 200 == 0:
            print("Processed", index, "stations out of", len(station_to_node))
    
    # Write the distances to a CSV file
    fieldnames = ["Source", "Destination", "Distance"]
    file_name = f"../dataset/stations_distances_{region_name}.csv"
    with open(file_name, "w", newline="") as file:

        w = csv.DictWriter(file, fieldnames)
        w.writeheader()

        w.writerows(stations_distances)
    
    return stations_distances


import os

region_name = "Lombardia"
stations_distances = []

# Check if the distances file already exists
# If it does, read the distances from the file
# If it does not, calculate the distances and save them to the file
if(os.path.exists(f"../dataset/stations_distances_{region_name}.csv")):

    with open(f"../dataset/stations_distances_{region_name}.csv", 'r', encoding="utf-8") as data:

        for line in csv.DictReader(data, delimiter=","):
            stations_distances.append({"Source": line["Source"], "Destination": line["Destination"], "Distance": float(line["Distance"])})
else:

    stations_distances = distances_between_stations(station_to_node, lombardia_map, region_name)

Creating five different networks, each with a distance limit: 1, 2, 3, 4 and 5 km.

All the networks have the same nodes, which represent the stations.
The edges are different. For example, the network with a 2 km limit has an edge connecting every pair of nodes (stations) separated by a driving distance of at most 2 km, and the same holds for the other networks with their own limits.

We obtain five networks with different connection distances. This will be useful when analysing price behaviour, to understand how far fuel prices propagate between stations.
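
The thresholding step can be sketched without NetworkX by storing each undirected edge as a `frozenset` of two ids. This mirrors how an undirected `nx.Graph` merges the pair 12, 13 and the pair 13, 12 into one edge. The function name `threshold_edges` and the distances are made up. Because driving distances come from a directed road graph, the two directions of a pair can have different lengths. With this rule, as with the notebook's loop, a pair gets an edge as soon as either direction is within the limit.

```python
def threshold_edges(distance_rows, limits_km):
    """Return dict[limit] -> set of undirected edges (frozenset of two station ids)."""
    edges = {km: set() for km in limits_km}
    for row in distance_rows:
        pair = frozenset((row["Source"], row["Destination"]))
        for km in limits_km:
            if row["Distance"] <= km:
                edges[km].add(pair)
    return edges

rows = [
    {"Source": "12", "Destination": "13", "Distance": 1.4},
    {"Source": "13", "Destination": "12", "Distance": 2.2},  # the reverse trip can be longer
    {"Source": "12", "Destination": "14", "Distance": 4.8},
]
edges = threshold_edges(rows, [1, 2, 3, 4, 5])
print({km: len(e) for km, e in edges.items()})  # {1: 0, 2: 1, 3: 1, 4: 1, 5: 2}
```

Since every edge of a smaller limit also satisfies the larger ones, the five networks are nested: each one contains all the edges of the previous one.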

In [None]:
def distance_network(stations_distances, stations_region, region_name, distances):

    distances_network = {}
    # Create a network for each distance in the distances list
    # Each network will contain the stations as nodes and the edges will be created based on the distances
    for km in distances:
        distances_network[km] = nx.Graph()
        for station_id in stations_region:
            distances_network[km].add_node(station_id, label=station_id, latitude=float(stations_region[station_id]["Latitudine"]), longitude=float(stations_region[station_id]["Longitudine"]))
    
    # Iterate through the distances and add edges to the network based on the distances
    for distance in stations_distances:

        source = distance["Source"]
        destination = distance["Destination"]
        stations_distance = distance["Distance"]

        for km in distances:
            if stations_distance <= km:
                distances_network[km].add_edge(source, destination)
    
    # Write the networks to GEXF files
    for km in distances:
        file_name = f"../network/distance/{km}_km_distance_network_{region_name}.gexf"
        nx.write_gexf(distances_network[km], file_name)
    
    return distances_network

distances_network = distance_network(stations_distances, stations_lombardia, region_name, [1,2,3,4,5])