# sTFTP vs TFTP vs SFTP — Results Analysis

This notebook loads your benchmark CSV(s) from the repo (e.g., `results_*_cold_sftp.csv`) and produces:
- Summary tables (median / p90 latency, median overhead, median goodput)
- Comparison charts (matplotlib-only)
- An exportable `summary_metrics.csv` for your report

> **Tip:** Place this notebook in the same folder as your `results_*.csv` files.


In [None]:
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Matplotlib defaults only (no seaborn)
plt.rcParams.update({
    "figure.figsize": (8, 4.5),
    "axes.grid": True,
})


## 1) Select which CSV(s) to analyze

- By default, the cell below loads **every** file matching `results_*_cold_sftp.csv` and concatenates them; it does not pick only the latest one.
- You can also point to any pattern (e.g., `'results_*.csv'`) to combine multiple runs.


In [None]:
# Choose a glob pattern. The default matches all cold SFTP result files; every match is loaded.
pattern = "results_*_cold_sftp.csv"

files = sorted(glob.glob(pattern))
if not files:
    raise FileNotFoundError(f"No files matched pattern: {pattern}")
print("Files to load:")
for f in files:
    print(" -", f)

# Load & concat
dfs = [pd.read_csv(f) for f in files]
df = pd.concat(dfs, ignore_index=True)
print(f"Loaded {len(df)} rows from {len(files)} file(s).")
df.head()
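
The cell above concatenates every file that matches `pattern`. If only the most recent run is wanted, one option is to keep the match with the newest modification time. This is a minimal sketch: `latest_match` is a hypothetical helper (not part of the benchmark repo), and it assumes modification time is a reasonable proxy for "latest".

```python
import glob
import os


def latest_match(pattern):
    """Return the matching path with the newest modification time."""
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"No files matched pattern: {pattern}")
    # Assumption: file modification time reflects when the run finished.
    return max(matches, key=os.path.getmtime)
```

To analyze only that run, replace the `files = sorted(glob.glob(pattern))` line with `files = [latest_match(pattern)]`.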


## 2) Clean & normalize columns
- Convert `elapsed_s`, `overhead_pct`, `goodput_Mbps` to numeric
- Drop rows where `elapsed_s` is missing (failed runs)


In [None]:
df_clean = df.copy()

# Ensure types are numeric where applicable
for col in ["elapsed_s", "overhead_pct", "goodput_Mbps"]:
    df_clean[col] = pd.to_numeric(df_clean[col], errors="coerce")

# Remove failed/NA runs for latency stats
df_ok = df_clean.dropna(subset=["elapsed_s"]).copy()

print("Rows after dropping NA elapsed_s:", len(df_ok))
df_ok.head()
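
Dropping rows with a missing `elapsed_s` hides how often each protocol failed. The sketch below counts those rows per protocol before they are discarded. It assumes, as the cell above does, that a failed run shows up as a missing or non-numeric `elapsed_s`; `failure_report` is a hypothetical helper name.

```python
import pandas as pd


def failure_report(df):
    """Count total and failed runs per protocol.

    Assumes a failed run has a missing or non-numeric `elapsed_s`.
    """
    elapsed = pd.to_numeric(df["elapsed_s"], errors="coerce")
    report = (
        df.assign(failed=elapsed.isna())
        .groupby("proto")["failed"]
        .agg(total="count", failed="sum")
    )
    # Share of runs per protocol that did not produce a usable latency.
    report["failed_pct"] = 100.0 * report["failed"] / report["total"]
    return report
```

Calling `failure_report(df)` on the raw frame gives one row per protocol, which is worth reporting alongside the latency statistics.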


## 3) Summary metrics per (protocol, file)
We compute:
- `runs` — number of successful runs
- `median_latency_s`, `p90_latency_s`
- `median_overhead_pct`, `median_goodput_Mbps`


In [None]:
def summarize(grouped):
    out = grouped.agg(
        runs=("elapsed_s", "count"),
        median_latency_s=("elapsed_s", "median"),
        p90_latency_s=("elapsed_s", lambda x: np.percentile(x, 90)),
        median_overhead_pct=("overhead_pct", "median"),
        median_goodput_Mbps=("goodput_Mbps", "median"),
    ).reset_index()
    return out

summary = summarize(df_ok.groupby(["proto", "file"]))
summary.sort_values(["file", "proto"], inplace=True)
summary
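
A note on the percentile column: by default `np.percentile` interpolates linearly between neighbouring sorted values, so with only a few runs per group `p90_latency_s` is usually an interpolated figure rather than an observed latency. The sketch below shows this on made-up latencies (synthetic values for illustration, not benchmark results).

```python
import numpy as np
import pandas as pd

# Synthetic latencies for illustration only; these are not benchmark results.
toy = pd.DataFrame({
    "proto": ["tftp"] * 5,
    "file": ["1k.bin"] * 5,
    "elapsed_s": [0.10, 0.11, 0.12, 0.13, 0.50],
})

stats = toy.groupby(["proto", "file"])["elapsed_s"].agg(
    runs="count",
    median_latency_s="median",
    # With 5 samples, the 90th percentile falls between the two largest values.
    p90_latency_s=lambda x: np.percentile(x, 90),
)
print(stats)
```

Here the median is 0.12 s while the p90 lands between 0.13 s and the 0.50 s outlier, so a single slow run can dominate the p90 when the run count is small.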


### Save the summary table
This writes `summary_metrics.csv` to the current working directory (normally the notebook's folder).


In [None]:
summary_path = "summary_metrics.csv"
summary.to_csv(summary_path, index=False)
print("Saved:", summary_path)


## 4) Charts

### A) Median Latency by Protocol and File


In [None]:
lat_pivot = summary.pivot(index="file", columns="proto", values="median_latency_s").sort_index()
ax = lat_pivot.plot(kind="bar")
ax.set_title("Median Latency by Protocol and File")
ax.set_ylabel("Latency (seconds)")
ax.set_xlabel("File")
ax.legend(title="Protocol")
plt.tight_layout()
plt.show()


### B) Median On-Wire Overhead (%) by Protocol and File


In [None]:
ovh_pivot = summary.pivot(index="file", columns="proto", values="median_overhead_pct").sort_index()
ax = ovh_pivot.plot(kind="bar")
ax.set_title("Median On-Wire Overhead (%) by Protocol and File")
ax.set_ylabel("Overhead (%)")
ax.set_xlabel("File")
ax.legend(title="Protocol")
plt.tight_layout()
plt.show()


### C) Median Goodput (Mb/s) by Protocol and File


In [None]:
gp_pivot = summary.pivot(index="file", columns="proto", values="median_goodput_Mbps").sort_index()
ax = gp_pivot.plot(kind="bar")
ax.set_title("Median Goodput (Mb/s) by Protocol and File")
ax.set_ylabel("Goodput (Mb/s)")
ax.set_xlabel("File")
ax.legend(title="Protocol")
plt.tight_layout()
plt.show()


### D) Latency (log scale) trend across file sizes
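Useful to highlight handshake impact on small files. Files are ordered alphabetically by name, so the x-axis follows size order only if the file names sort that way.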


In [None]:
plt.figure()
for proto, g in df_ok.groupby("proto"):
    med = g.groupby("file")["elapsed_s"].median().sort_index()
    plt.plot(med.index, med.values, marker="o", label=proto)
plt.yscale("log")
plt.title("Latency (log scale) across file sizes")
plt.ylabel("Latency (s, log scale)")
plt.xlabel("File")
plt.legend()
plt.tight_layout()
plt.show()
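
The plot above relies on `sort_index()`, which orders files by name. A sketch of reordering by actual size follows. It assumes you supply a `sizes` mapping from file name to bytes yourself, since none of the CSV columns used in this notebook carries a size; `order_by_size` and `sizes` are hypothetical names.

```python
import pandas as pd


def order_by_size(median_by_file, sizes):
    """Reorder a file-indexed Series by ascending file size.

    `sizes` is a hypothetical dict {file name: size in bytes}.
    """
    missing = [f for f in median_by_file.index if f not in sizes]
    if missing:
        raise KeyError(f"No size given for: {missing}")
    ordered = sorted(median_by_file.index, key=lambda f: sizes[f])
    return median_by_file.loc[ordered]
```

In the plotting loop, `med = order_by_size(med, sizes)` would replace the `.sort_index()` call.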


## 5) Quick Interpretation Notes

- **Latency:** Expect TFTP < sTFTP < SFTP for small files; SFTP includes SSH handshake and TCP setup per cold transfer.
- **Overhead:** sTFTP overhead should be higher than TFTP (AEAD tag + framing). SFTP is typically highest due to SSH and TCP.
- **Goodput:** For larger files, sTFTP should approach TFTP; SFTP often lags due to per-chunk framing and, on shorter transfers, TCP behavior such as slow start.
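
These expectations can be checked against the summary table by expressing each protocol's median latency as a multiple of a baseline protocol. This is a minimal sketch, assuming the `summary` columns produced in section 3. The baseline label `"tftp"` is a guess at how the `proto` column is spelled in your CSV, so adjust it to match; `latency_vs_baseline` is a hypothetical helper.

```python
import pandas as pd


def latency_vs_baseline(summary, baseline="tftp"):
    """Median latency per file as a multiple of the baseline protocol."""
    pivot = summary.pivot(index="file", columns="proto", values="median_latency_s")
    if baseline not in pivot.columns:
        raise KeyError(f"Baseline protocol not found: {baseline!r}")
    # Divide every protocol column by the baseline column, row by row.
    return pivot.div(pivot[baseline], axis=0)
```

Ratios that shrink toward 1 as files grow would be consistent with a fixed handshake cost being amortized over a longer transfer.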
