
# GNN Benchmark Results Analysis

This notebook loads the SQLite database containing all benchmark runs, displays the results in tabular form, and plots the key metrics (validation accuracy and throughput). Run the benchmarking suite first so that `results/results.db` and the `results/plots/` folder exist.
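
Before running the cells below, it can help to confirm that both outputs are in place. The following is a minimal sketch, not part of the benchmarking suite: the helper name `missing_result_paths` is hypothetical, and it only checks that the two paths named above exist, not that the database contains a `runs` table.

```python
from pathlib import Path


def missing_result_paths(root="results"):
    """Return the expected benchmark outputs that are not present under `root`.

    Hypothetical helper: it assumes the layout described above,
    i.e. `<root>/results.db` and a `<root>/plots` folder.
    """
    root = Path(root)
    expected = [root / "results.db", root / "plots"]
    return [str(path) for path in expected if not path.exists()]


missing = missing_result_paths()
if missing:
    print("Missing benchmark outputs:", ", ".join(missing))
```

An empty list means both paths exist. Anything else suggests the benchmarking suite has not been run from this working directory.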


In [None]:

import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

# Path to the results database
db_path = "results/results.db"

# Connect to the database and load the runs table into a DataFrame.
# The try/finally closes the connection even if the query fails.
conn = sqlite3.connect(db_path)
try:
    df = pd.read_sql_query("SELECT * FROM runs ORDER BY timestamp DESC", conn)
finally:
    conn.close()

# Display the first few rows of the DataFrame
df.head()



The table above shows the first few rows of the `runs` table, most recent first. The columns include:

- **id**: Auto-incremented primary key.
- **experiment_name**: Identifier of the experiment.
- **dataset**, **model**, **epochs**, **batch_size**, **lr**, **hidden_dim**, **seed**: Hyperparameters.
- **world_size**, **rank**: Distributed-run settings (number of GPUs and the index of the logging process).
- **final_train_loss**, **final_val_loss**, **final_val_acc**: Performance metrics.
- **total_train_time**, **throughput**: Timing metrics (throughput is plotted below in samples/sec).
- **timestamp**: Time at which the run was logged.
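
Because the table includes a **seed** column, several rows can share the same model and `world_size`. The sketch below shows one way to summarize a metric across those repeats before plotting. The helper name `summarize_runs` is hypothetical, and it assumes each row is an independent run. If the suite logs one row per rank, filter to a single rank first.

```python
import pandas as pd


def summarize_runs(df, metric="final_val_acc"):
    """Mean, standard deviation, and run count of `metric` per (model, world_size).

    Hypothetical helper: assumes `df` has the `model` and `world_size`
    columns listed above, plus the chosen metric column.
    """
    return (
        df.groupby(["model", "world_size"])[metric]
        .agg(["mean", "std", "count"])
        .reset_index()
    )
```

In this notebook, `summarize_runs(df)` returns one row per configuration. The `std` value is `NaN` for configurations with a single run.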


In [None]:

# Mean final validation accuracy per world_size (rows), one column per model
grouped_acc = df.groupby(['model', 'world_size'])['final_val_acc'].mean().unstack(level=0)

plt.figure(figsize=(8, 5))
for model in grouped_acc.columns:
    plt.plot(grouped_acc.index, grouped_acc[model], marker='o', label=model)

plt.xlabel("Number of GPUs (world_size)")
plt.ylabel("Average Final Validation Accuracy")
plt.title("Validation Accuracy vs. Number of GPUs")
plt.legend()
plt.grid(True)
plt.show()


In [None]:

# Mean throughput per world_size (rows), one column per model
grouped_thr = df.groupby(['model', 'world_size'])['throughput'].mean().unstack(level=0)

plt.figure(figsize=(8, 5))
for model in grouped_thr.columns:
    plt.plot(grouped_thr.index, grouped_thr[model], marker='o', label=model)

plt.xlabel("Number of GPUs (world_size)")
plt.ylabel("Average Throughput (samples/sec)")
plt.title("Throughput vs. Number of GPUs")
plt.legend()
plt.grid(True)
plt.show()
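
Raw throughput does not show how close each model comes to linear scaling. One common way to read it is scaling efficiency, E(N) = T(N) / ((N / N0) * T(N0)), where T is the mean throughput and N0 is the smallest `world_size` in the data. The sketch below computes it from the same grouping used for the plot. The helper name `scaling_efficiency` is hypothetical, and using the smallest `world_size` as the baseline is an assumption.

```python
import pandas as pd


def scaling_efficiency(df):
    """Throughput relative to ideal linear scaling, per model and world_size.

    Hypothetical helper: the baseline is the smallest world_size present.
    A model with no run at the baseline gets NaN in every row.
    """
    mean_thr = (
        df.groupby(["model", "world_size"])["throughput"]
        .mean()
        .unstack(level=0)
        .sort_index()
    )
    base_size = mean_thr.index[0]
    base_thr = mean_thr.iloc[0]  # one baseline throughput per model
    # Ideal speedup factor N / N0 for each world_size row
    scale = pd.Series(mean_thr.index.to_numpy() / base_size, index=mean_thr.index)
    # Divide each column by its baseline, then each row by its ideal speedup
    return mean_thr.div(base_thr, axis=1).div(scale, axis=0)
```

A value of 1.0 means throughput grew in proportion to the number of GPUs. Lower values mean part of the added capacity went to overhead such as communication.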



## Embedded Plot Images

The saved PNG files are located in `results/plots/`. The cell below displays each of them inline, in filename order:


In [None]:

from IPython.display import Image, display
import os

img_dir = "results/plots"
for filename in sorted(os.listdir(img_dir)):
    if filename.endswith(".png"):
        display(Image(os.path.join(img_dir, filename)))
