# CSC 786 – Audit + Wazuh Alert Analysis (CSV Only)

This notebook analyzes the processed CSV produced by `scripts/run_tests.sh` **when it already includes** a `wazuh_alerts` column.

## Expected input
- `data/processed/runs_*.csv`

## Expected columns
- `case`
- `file_hits`, `net_hits`, `exec_hits`
- `wazuh_alerts`
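
As a rough illustration of the layout listed above, the sketch below parses a two-row CSV with those columns. The rows are invented placeholders, not real measurements, and `SAMPLE_CSV` and `sample` are hypothetical names; only the column names come from this notebook.

```python
import io

import pandas as pd

# Invented rows for illustration; only the column names come from this notebook.
SAMPLE_CSV = """case,file_hits,net_hits,exec_hits,wazuh_alerts
file_io_traditional,3,0,1,2
file_io_uring,0,0,1,0
"""

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The four metric columns should parse as integers so that `> 0` comparisons work.
print(sample.dtypes)
```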

The notebook will:
- Compute descriptive stats by case
- Compute detection-rate proxies (`*_hits > 0`, `wazuh_alerts > 0`)
- Compare traditional vs io_uring cases
- Save figures/tables to `results/`


## 0) Load the newest runs CSV

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

Path("results").mkdir(parents=True, exist_ok=True)


processed_dir = Path('data/processed')
# Sorted by file name: the last entry is the newest only if names embed a sortable timestamp.
csvs = sorted(processed_dir.glob('runs_*.csv'))
if not csvs:
    raise FileNotFoundError('No runs_*.csv found in data/processed/. Run scripts/run_tests.sh first.')

csv_path = csvs[-1]
print('Using CSV:', csv_path)

df = pd.read_csv(csv_path)
df.head()
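
The cell above treats the last file name in sorted order as the newest. If the file names do not sort chronologically, one alternative is to pick by modification time. The sketch below is a minimal, self-contained illustration of that idea; `newest_by_mtime` is a hypothetical helper and the file names are made up for the demo.

```python
import os
import tempfile
from pathlib import Path


def newest_by_mtime(directory, pattern='runs_*.csv'):
    """Return the most recently modified match, or None if nothing matches."""
    candidates = list(Path(directory).glob(pattern))
    return max(candidates, key=lambda p: p.stat().st_mtime) if candidates else None


# Demo in a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / 'runs_b.csv'
    newer = Path(tmp) / 'runs_a.csv'
    older.write_text('case\n')
    newer.write_text('case\n')
    os.utime(older, (1_000, 1_000))  # force an old modification time
    os.utime(newer, (2_000, 2_000))  # force a more recent modification time
    picked = newest_by_mtime(tmp).name
    by_name = sorted(Path(tmp).glob('runs_*.csv'))[-1].name

# The two strategies disagree here because the names do not sort chronologically.
print('by mtime:', picked, '| by name:', by_name)
```

Modification times can change when files are copied or checked out, so name-based ordering remains the safer choice when the run script already timestamps its output.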

## 1) Sanity checks

In [None]:
required = ['case','file_hits','net_hits','exec_hits','wazuh_alerts']
missing = [c for c in required if c not in df.columns]
if missing:
    raise ValueError(f'Missing required columns: {missing}. Ensure your CSV includes wazuh_alerts.')

print('Rows:', len(df))
print('Cases:', sorted(df['case'].unique()))
df[required].describe()

## 2) Descriptive statistics by case

In [None]:
metric_cols = ['file_hits','net_hits','exec_hits','wazuh_alerts']
summary = df.groupby('case')[metric_cols].describe()
summary

## 3) Detection-rate proxies

- Audit visibility proxy: `*_hits > 0`
- Wazuh detection proxy: `wazuh_alerts > 0`


In [None]:
det = df.copy()
det['file_detected'] = det['file_hits'] > 0
det['net_detected']  = det['net_hits'] > 0
det['exec_detected'] = det['exec_hits'] > 0
det['wazuh_detected'] = det['wazuh_alerts'] > 0

rates = det.groupby('case')[['file_detected','net_detected','exec_detected','wazuh_detected']].mean()
rates = rates.rename(columns=lambda c: c.replace('_detected','_detect_rate'))
rates
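
To make the proxy concrete: the mean of a boolean column is the fraction of runs where the condition held. The sketch below works through that on invented data (four runs, two cases); the numbers are placeholders and `toy` and `toy_rates` are hypothetical names.

```python
import pandas as pd

# Invented runs for illustration only.
toy = pd.DataFrame({
    'case': ['file_io_traditional', 'file_io_traditional',
             'file_io_uring', 'file_io_uring'],
    'file_hits': [2, 0, 0, 0],
})

# True where at least one audit hit was recorded for the run.
toy['file_detected'] = toy['file_hits'] > 0

# Averaging booleans per case gives the share of runs with a detection.
toy_rates = toy.groupby('case')['file_detected'].mean()
print(toy_rates)
```

Here one of the two traditional runs had a hit and neither io_uring run did, so the rates are 0.5 and 0.0.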

## 4) Means + paired deltas (traditional vs io_uring)

In [None]:
means = df.groupby('case')[metric_cols].mean().round(2)
means

In [None]:
def pair_delta(trad_name, uring_name):
    if trad_name in means.index and uring_name in means.index:
        return (means.loc[trad_name] - means.loc[uring_name]).to_frame(name=f'{trad_name} - {uring_name}')
    return None

pairs = [
    ('file_io_traditional', 'file_io_uring'),
    ('read_file_traditional', 'openat_uring'),
    ('net_connect_traditional', 'net_connect_uring'),
]

deltas = [d for d in (pair_delta(t,u) for t,u in pairs) if d is not None]
if deltas:
    # An expression inside an if block is not auto-displayed, so print it explicitly.
    delta_table = pd.concat(deltas, axis=1)
    print(delta_table)
else:
    print('No trad/uring pairs found (check case names).')
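
As a reading aid for the deltas, the sketch below repeats the subtraction on invented means. It assumes a variant of the helper that takes the means table as an argument (`pair_delta_from` is hypothetical, as are the numbers). A positive value means the traditional case produced more hits or alerts on average than its io_uring counterpart.

```python
import pandas as pd

# Invented per-case means, for illustration only.
toy_means = pd.DataFrame(
    {'file_hits': [4.0, 1.0], 'wazuh_alerts': [2.0, 0.0]},
    index=['file_io_traditional', 'file_io_uring'],
)


def pair_delta_from(means_table, trad_name, uring_name):
    """Traditional minus io_uring means, or None if either case is absent."""
    if trad_name in means_table.index and uring_name in means_table.index:
        return means_table.loc[trad_name] - means_table.loc[uring_name]
    return None


delta = pair_delta_from(toy_means, 'file_io_traditional', 'file_io_uring')
print(delta)
```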

## 5) Visualizations (saved to results/figures/)

In [None]:
fig_dir = Path('results/figures')
fig_dir.mkdir(parents=True, exist_ok=True)

ax = means.plot(kind='bar', rot=45)
ax.set_title('Mean metrics per case (audit hits + Wazuh alerts)')
ax.set_xlabel('Case')
ax.set_ylabel('Mean value')
plt.tight_layout()
out_path = fig_dir / 'mean_metrics_per_case.png'
plt.savefig(out_path, dpi=200)
plt.show()
print('Saved:', out_path)

In [None]:
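for col in metric_cols:
    # Pass an explicit Axes: with by=..., DataFrame.boxplot otherwise opens its own
    # figure, which leaves the one created by plt.figure() empty.
    fig, ax = plt.subplots()
    df.boxplot(column=col, by='case', rot=45, ax=ax)
    ax.set_title(f'Distribution of {col} by case')
    fig.suptitle('')  # drop the automatic "Boxplot grouped by case" title
    ax.set_xlabel('Case')
    ax.set_ylabel(col)
    fig.tight_layout()
    out_path = fig_dir / f'box_{col}_by_case.png'
    fig.savefig(out_path, dpi=200)
    plt.show()
    print('Saved:', out_path)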

## 6) Export report-ready tables

In [None]:
out_dir = Path('results')
out_dir.mkdir(exist_ok=True)

means_out = out_dir / 'means_by_case_with_wazuh.csv'
rates_out = out_dir / 'detect_rates_by_case_with_wazuh.csv'

means.to_csv(means_out)
rates.to_csv(rates_out)

print('Saved:', means_out)
print('Saved:', rates_out)

## Notes for write-up
- If `*_uring` cases show lower syscall-key hit counts and fewer `wazuh_alerts` than their traditional counterparts, that supports the conclusion that syscall-focused monitoring can lose visibility when I/O is delegated through io_uring.
- If `wazuh_alerts` is consistently 0 across all cases, document it as an out-of-the-box rule limitation and rely on audit hit deltas as the primary evidence.
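
The two conditions in these notes can be checked mechanically. The sketch below does so on invented data; `toy`, `wazuh_all_zero`, and `uring_lower` are hypothetical names, and the strict `<` comparison is one possible reading of "lower", not a threshold taken from the experiments.

```python
import pandas as pd

# Invented run-level data, for illustration only.
toy = pd.DataFrame({
    'case': ['file_io_traditional'] * 2 + ['file_io_uring'] * 2,
    'file_hits': [4, 6, 0, 1],
    'wazuh_alerts': [0, 0, 0, 0],
})

toy_means = toy.groupby('case')[['file_hits', 'wazuh_alerts']].mean()

# Second note: are Wazuh alerts zero in every run?
wazuh_all_zero = bool((toy['wazuh_alerts'] == 0).all())

# First note: is each io_uring mean strictly lower than its traditional counterpart?
uring_lower = toy_means.loc['file_io_uring'] < toy_means.loc['file_io_traditional']

print('wazuh_alerts all zero:', wazuh_all_zero)
print(uring_lower)
```

In this toy data the audit hits drop under io_uring while `wazuh_alerts` is zero everywhere, which is the situation the second note describes.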
