<center>
    <h1>Verbal Explanation of Spatial Temporal GNNs for Traffic Forecasting</h1>
    <h2>Verbal Explanations on the Metr-LA Dataset</h2>
</center>

---

The verbal translation occurs through a template-based approach. This method involves substituting placeholders in textual templates with the chosen content to form coherent narratives.

The verbal translation composes a series of paragraphs from the content extracted from the graphical explanations. The first paragraph describes the predicted event and briefly summarizes its causes. Each subsequent paragraph describes one cause in detail, where a cause corresponds to one cluster of the important subgraph. The cause paragraphs are sorted by the time at which the causes occurred.
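
The template mechanism described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the template wording, the `build_explanation` and `connective` helpers, and the dictionary keys are all hypothetical.

```python
from string import Template

# Hypothetical templates; the project's actual wording may differ.
EVENT_TEMPLATE = Template(
    'A $kind was expected to occur on $street from $start to $end, '
    'with an average speed of $speed km/h.')
CAUSE_TEMPLATE = Template(
    '$connective, a contributing $kind occurred on $street '
    'from $start to $end, with an average speed of $speed km/h.')


def connective(position, total):
    """Pick an ordering word for the cause at the given position."""
    if position == 0:
        return 'Firstly'
    if position == total - 1:
        return 'Lastly'
    return 'Then'


def build_explanation(event, causes):
    """Compose one paragraph for the event and one per cause."""
    # Causes are narrated in chronological order of their start time.
    ordered = sorted(causes, key=lambda cause: cause['start'])
    paragraphs = [EVENT_TEMPLATE.substitute(event)]
    for position, cause in enumerate(ordered):
        paragraphs.append(CAUSE_TEMPLATE.substitute(
            cause, connective=connective(position, len(ordered))))
    return '\n\n'.join(paragraphs)


# Illustrative placeholder values (times as zero-padded HH:MM strings).
event = {'kind': 'severe congestion', 'street': 'Glendale Freeway',
         'start': '07:45', 'end': '08:40', 'speed': '40.05'}
causes = [
    {'kind': 'severe congestion', 'street': 'Glendale Freeway',
     'start': '07:10', 'end': '07:40', 'speed': '40.04'},
    {'kind': 'free flow', 'street': 'Glendale Freeway',
     'start': '06:45', 'end': '07:40', 'speed': '102.17'},
]
explanation = build_explanation(event, causes)
print(explanation)
```

Sorting on zero-padded `HH:MM` strings only works within a single day; a real implementation would more likely compare timestamps.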

In [25]:
import sys
import os

# Add the project root folder to the module search path.
sys.path.append('..')

In [26]:
# Settings for autoreloading.
%load_ext autoreload
%autoreload 2



In [27]:
from src.utils.seed import set_random_seed

# Set the random seed for deterministic operations.
SEED = 42
set_random_seed(SEED)

# 1 Loading the Data
This section loads the adjacency matrix, the node metadata, and the explained and clustered datasets.

In [28]:
import os

BASE_DATA_DIR = os.path.join('..', 'data', 'metr-la')

In [29]:
from src.data.data_extraction import get_adjacency_matrix

# Get the adjacency matrix
adj_matrix_structure = get_adjacency_matrix(
    os.path.join(BASE_DATA_DIR, 'raw', 'adj_mx_metr_la.pkl'))

# Get the header of the adjacency matrix, the node indices and the
# matrix itself.
_, node_ids_dict, _ = adj_matrix_structure

# Invert the mapping to get the node positions dictionary
# (position in the adjacency matrix -> node ID).
node_pos_dict = {pos: node_id for node_id, pos in node_ids_dict.items()}

In [30]:
import pickle

# Get the node street and kilometrage dictionary.
with open(os.path.join(BASE_DATA_DIR, 'structured', 'node_locations.pkl'), 'rb') as f:
    node_info = pickle.load(f)

In [31]:
import os
import numpy as np

# Get the explained data.
x_train = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'x_train.npy'))[..., :1]
y_train = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'y_train.npy'))[..., :1]
x_val = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'x_val.npy'))[..., :1]
y_val = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'y_val.npy'))[..., :1]
x_test = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'x_test.npy'))[..., :1]
y_test = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'y_test.npy'))[..., :1]

# Get the clustered data.
x_train_clusters = np.load(os.path.join(BASE_DATA_DIR, 'clustered', 'x_train.npy'))
x_val_clusters = np.load(os.path.join(BASE_DATA_DIR, 'clustered', 'x_val.npy'))
x_test_clusters = np.load(os.path.join(BASE_DATA_DIR, 'clustered', 'x_test.npy'))


# Get the time information of the explained data.
x_train_times = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'x_train_time.npy'))
y_train_times = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'y_train_time.npy'))
x_val_times = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'x_val_time.npy'))
y_val_times = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'y_val_time.npy'))
x_test_times = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'x_test_time.npy'))
y_test_times = np.load(os.path.join(BASE_DATA_DIR, 'explained', 'y_test_time.npy'))
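
The `[..., :1]` indexing used when loading the explained data keeps only the first feature while preserving the trailing axis. A small sketch of that behavior, assuming a `(samples, time steps, nodes, features)` layout; the shape below is illustrative and is not read from the dataset:

```python
import numpy as np

# Assumed layout: (samples, time steps, nodes, features).
batch = np.zeros((4, 12, 207, 2))

# `...` stands for all leading axes; `:1` keeps the first feature
# as a slice, so the trailing axis is preserved with length 1.
first_feature = batch[..., :1]
print(first_feature.shape)  # (4, 12, 207, 1)

# Integer indexing would drop the trailing axis instead.
dropped_axis = batch[..., 0]
print(dropped_axis.shape)  # (4, 12, 207)
```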

The speed values of the datasets are converted from miles per hour to km/h.

In [32]:
from src.utils.config import MPH_TO_KMH_FACTOR

x_train = x_train * MPH_TO_KMH_FACTOR
y_train = y_train * MPH_TO_KMH_FACTOR

x_val = x_val * MPH_TO_KMH_FACTOR
y_val = y_val * MPH_TO_KMH_FACTOR

x_test = x_test * MPH_TO_KMH_FACTOR
y_test = y_test * MPH_TO_KMH_FACTOR
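
As a quick check of the conversion, the sketch below applies the same element-wise multiplication to a few sample speeds. The constant `ASSUMED_MPH_TO_KMH` is an assumption based on the definition of the international mile; the project's `MPH_TO_KMH_FACTOR` may be defined with a different precision.

```python
import numpy as np

# Assumed value: one international mile is 1.609344 km. The project's
# MPH_TO_KMH_FACTOR constant may be defined with different precision.
ASSUMED_MPH_TO_KMH = 1.609344

# Illustrative speeds in miles per hour.
speeds_mph = np.array([25.0, 60.0, 65.0])

# Broadcasting multiplies every element by the scalar factor.
speeds_kmh = speeds_mph * ASSUMED_MPH_TO_KMH
print(np.round(speeds_kmh, 2))
```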

# 2 Verbal Translation
The verbal translation is performed on the training, validation, and test sets, and each result is saved to disk.

In [33]:
VERBAL_TRANSLATION_DIR = os.path.join(BASE_DATA_DIR, 'translated')

os.makedirs(VERBAL_TRANSLATION_DIR, exist_ok=True)

The RAG model is initialized with the raw geographic documents, one document per line.

In [35]:
from src.rag.rag_model import RAGModel

# Load raw documents for RAG
with open(os.path.join(BASE_DATA_DIR, 'raw', 'geo_data.txt'), 'r') as f:
    raw_lines = f.readlines()
    documents = [line.strip() for line in raw_lines]

# Initialize the RAG model
rag_model = RAGModel(documents)
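
To illustrate the retrieval step of a RAG setup, the sketch below picks the document that shares the most word tokens with a query. It is a simplified stand-in and does not reflect how `RAGModel` works internally; the `tokenize` and `retrieve` helpers and the second toy document are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))


def retrieve(query, documents):
    """Return the document sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    return max(documents,
               key=lambda doc: len(query_tokens & tokenize(doc)))


# Toy knowledge lines in a one-document-per-line layout. The second
# entry is a placeholder and carries no real traffic information.
toy_documents = [
    'Glendale Freeway (SR-2) often experiences congestion due to merging issues',
    'Example Avenue is a placeholder entry',
]
best_match = retrieve('congestion on Glendale Freeway', toy_documents)
print(best_match)
```

Token overlap is used here only because it needs no external dependencies; retrieval components more commonly rank documents by embedding similarity.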


In [36]:
from src.verbal_explanations.verbal_translation import translate_dataset

train_translated = translate_dataset(
    x_train,
    x_train_times,
    x_train_clusters,
    y_train,
    y_train_times,
    node_pos_dict,
    node_info,
    rag_model)
    

np.save(os.path.join(VERBAL_TRANSLATION_DIR, 'train.npy'), train_translated)

val_translated = translate_dataset(
    x_val,
    x_val_times,
    x_val_clusters,
    y_val,
    y_val_times,
    node_pos_dict,
    node_info,
    rag_model)

np.save(os.path.join(VERBAL_TRANSLATION_DIR, 'val.npy'), val_translated)

test_translated = translate_dataset(
    x_test,
    x_test_times,
    x_test_clusters,
    y_test,
    y_test_times,
    node_pos_dict,
    node_info,
    rag_model)

np.save(os.path.join(VERBAL_TRANSLATION_DIR, 'test.npy'), test_translated)



The following is an example of a verbal translation from the test set.

In [40]:
print(test_translated[0])

A severe congestion was expected to occur on Glendale Freeway at kms 10 and 11 on Wednesday, 04/06/2012, from 07:45 to 08:40, with an average speed of 40.05 km/h. This was induced by a congestion and a free flow.

Firstly, a contributing free flow occurred on Glendale Freeway at kms 8, 9, 10 and 11, with an average speed of 102.17 km/h, from 06:45 to 07:40. The free flow extended on Golden State Freeway at km 7.

Lastly, a contributing severe congestion happened, averaging at a speed of 40.04 km/h, on, another time, Glendale Freeway at kms 9 and 10 from 07:10 to 07:40.

Glendale Freeway (SR-2) often experiences congestion due to merging issues
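
The saved translations can be read back later with `np.load`. The sketch below is a minimal round trip under one assumption: that the translations are stored as an object array of Python strings. Whether `translate_dataset` returns an object array or a fixed-width string array is not shown in this notebook; `allow_pickle=True` is needed only in the object case.

```python
import os
import tempfile

import numpy as np

# Stand-in for the output of translate_dataset (assumed to be an
# array of strings; an object array is used here for illustration).
translations = np.array(['First explanation.', 'Second explanation.'],
                        dtype=object)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.npy')
    np.save(path, translations)
    # Object arrays are pickled on save, so loading them back
    # requires opting in with allow_pickle=True.
    reloaded = np.load(path, allow_pickle=True)

print(reloaded[0])
```

Only load pickled arrays from files you created yourself, since unpickling untrusted data can execute arbitrary code.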
