## Metrics (Training Results)

This notebook summarizes the training results of several models, including CNNs and the Particle Transformer (ParT). Most trainings are repeated five times with different random seeds, and the metrics are reported as mean ± standard deviation over these runs.

The signal is Higgs production via vector-boson fusion (VBF), and the background is Higgs production via gluon-gluon fusion (GGF).

In [None]:
from pathlib import Path
import pandas as pd

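# Project root (assumes this notebook lives one directory below it)
project_root = Path.cwd().parent

def print_metrics(channel: str, data_mode: str, date_time: str, information: str = ''):
    """Print mean ± std of test accuracy, test AUC and epochs for each model.

    Runs are read from output/<channel>/<data_mode>/<model>/<date_time>-rnd_seed<N>/metrics.csv
    for N = 1, 2, ... until the first missing seed.
    """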

    # Print header
    info_suffix = f" ({information})" if information else ""
    print(f"# Metrics for {channel}/{data_mode} at {date_time}{info_suffix}")

    # Define path to metrics output
    output_dir = project_root / 'output' / channel / data_mode

    # Loop through models
    for model in ['CNN_Baseline', 'CNN_EventCNN', 'ParT_Baseline', 'ParT_Light']:
        model_dir = output_dir / model
        if not model_dir.exists():
            continue

        # Collect metrics from each random seed run
        metrics = []
        rnd_seed = 1

        while True:
            metrics_file = model_dir / f'{date_time}-rnd_seed{rnd_seed}' / 'metrics.csv'
            if not metrics_file.exists():
                break

            df_tmp = pd.read_csv(metrics_file)
            metrics.append(df_tmp.tail(1))  # Use the last row (test result)
            rnd_seed += 1

        # Print summary statistics if any runs were found
        if metrics:
            df = pd.concat(metrics, ignore_index=True)
            acc_mean, acc_std = df['test_accuracy'].mean(), df['test_accuracy'].std()
            auc_mean, auc_std = df['test_auc'].mean(), df['test_auc'].std()
            epoch_mean, epoch_std = df['epoch'].mean(), df['epoch'].std()

            print(f"{model:<15} ({len(metrics)} runs): "
                  f"ACC {acc_mean:.3f} ± {acc_std:.3f} | "
                  f"AUC {auc_mean:.3f} ± {auc_std:.3f} | "
                  f"Epochs {epoch_mean:.1f} ± {epoch_std:.1f}")

    print('\n' + '-' * 80 + '\n')
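`print_metrics` only prints its summary. When the numbers are needed for tables or plots, a variant that returns them as a DataFrame can be sketched as below. `summarize_metrics`, `MODELS`, `COLUMNS` and the output column names are hypothetical; the sketch assumes the same directory layout and the same `metrics.csv` columns (`test_accuracy`, `test_auc`, `epoch`) that `print_metrics` reads.

```python
from pathlib import Path

import pandas as pd

# Model directories scanned, in the same order as print_metrics (hypothetical constant)
MODELS = ['CNN_Baseline', 'CNN_EventCNN', 'ParT_Baseline', 'ParT_Light']
COLUMNS = ['test_accuracy', 'test_auc', 'epoch']


def summarize_metrics(output_root: Path, channel: str, data_mode: str,
                      date_time: str) -> pd.DataFrame:
    """Hypothetical helper: per-model mean/std over random seeds, as a DataFrame."""
    rows = []
    for model in MODELS:
        model_dir = output_root / channel / data_mode / model
        runs = []
        rnd_seed = 1
        # Seeds 1, 2, ... are read until the first missing metrics.csv
        while True:
            metrics_file = model_dir / f'{date_time}-rnd_seed{rnd_seed}' / 'metrics.csv'
            if not metrics_file.exists():
                break
            runs.append(pd.read_csv(metrics_file).tail(1))  # last row = test result
            rnd_seed += 1
        if not runs:
            continue  # model not trained for this configuration
        df = pd.concat(runs, ignore_index=True)
        row = {'model': model, 'n_runs': len(runs)}
        for col in COLUMNS:
            row[f'{col}_mean'] = df[col].mean()
            # pandas uses the sample std (ddof=1); a single run gives NaN
            row[f'{col}_std'] = df[col].std()
        rows.append(row)
    return pd.DataFrame(rows)
```

For example, `summarize_metrics(project_root / 'output', 'diphoton', 'jet_flavor', '20250723_173318')` would return one row per model found for that configuration.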


## $H \rightarrow \gamma \gamma$

### >>> Augmentation with $\phi$-rotations

This dataset comes from the $H \rightarrow \gamma\gamma$ decay channel with an integrated luminosity of $L=3000~\text{fb}^{-1}$. The data modes below differ in the number of augmentations by uniform $\phi$-rotations; the number in the suffix (e.g. `uni5`, `uni10`, `uni15`) gives the number of augmentations.

- `jet_flavor`: the mixed training dataset is split by jet flavor, i.e., `2q0g` vs. `1q1g` + `0q2g`.
- `ex-diphoton`: trained without the diphoton information.
- `diphoton`: trained with the diphoton information.
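A uniform $\phi$-rotation exploits the azimuthal symmetry of the collision: rotating every particle in an event by the same angle about the beam axis leaves the physics unchanged. The sketch below assumes each event is stored as a NumPy array of particle azimuthal angles in radians, and that `uniN` means N rotated copies are added to the original events. Both the data layout and the names `rotate_phi`/`augment_phi` are assumptions, not the code used to produce these datasets.

```python
import numpy as np


def rotate_phi(phi: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Rotate all azimuthal angles of one event by a common uniform random angle.

    phi: particle azimuthal angles (radians) of a single event (assumed layout).
    Returns the rotated angles wrapped back into the range -pi to pi.
    """
    delta = rng.uniform(-np.pi, np.pi)  # one angle per event, shared by all particles
    return (phi + delta + np.pi) % (2 * np.pi) - np.pi


def augment_phi(phi_events: list, n_aug: int, seed: int = 0) -> list:
    """Return the original events followed by n_aug rotated copies of each (assumption)."""
    rng = np.random.default_rng(seed)
    out = list(phi_events)
    for _ in range(n_aug):
        out.extend(rotate_phi(phi, rng) for phi in phi_events)
    return out
```

Because the same `delta` is applied to every particle, relative azimuthal separations within an event are preserved, which is what makes the rotated copies physically equivalent.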

In [None]:
for data_mode in ['jet_flavor', 'jet_flavor_uni5', 'jet_flavor_uni10', 'jet_flavor_uni15']:
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250723_173318')
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250721_121840')

### >>> $p_T$ smearing

In this setup, we compare the baseline `jet_flavor` dataset with a five-times larger dataset (`jet_flavor_pt_smear`) augmented by $p_T$ smearing, in which each $p_T$ is resampled as

\begin{equation*}
    p_T' \sim \mathcal{N}\big(p_T, \sigma^2(p_T)\big) \quad \text{with} \quad \sigma(p_T) = \sqrt{0.052\, p_T^2 + 1.502\, p_T}.
\end{equation*}
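The smearing formula can be sketched with NumPy as below, assuming $p_T$ values in GeV stored in a flat array. `smear_pt` and `augment_pt_smear` are hypothetical names, and building the five-times larger dataset from five independently smeared copies is an assumption. The formula itself does not prevent negative smeared values; how those are handled is not specified here and is left out of the sketch.

```python
import numpy as np


def smear_pt(pt: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Resample each p_T from a Gaussian centred on p_T with width sigma(p_T)."""
    sigma = np.sqrt(0.052 * pt**2 + 1.502 * pt)  # resolution model from the equation above
    return rng.normal(loc=pt, scale=sigma)        # loc/scale broadcast element-wise


def augment_pt_smear(pt: np.ndarray, n_copies: int = 5, seed: int = 0) -> np.ndarray:
    """Concatenate n_copies independently smeared copies (hypothetical construction)."""
    rng = np.random.default_rng(seed)
    return np.concatenate([smear_pt(pt, rng) for _ in range(n_copies)])
```

For $p_T = 100$ the width is $\sqrt{520 + 150.2} \approx 25.9$, so the relative smearing is roughly 26% at that scale.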

In [None]:
for data_mode in ['jet_flavor']:
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250721_121840')
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250723_173318')

for data_mode in ['jet_flavor_pt_smear']:
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250726_092055')
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250726_092055')

### >>> Test $L=300~\text{fb}^{-1}$ with $\phi$-augmentation

In [None]:
for data_mode in ['jet_flavor', 'jet_flavor_uni5', 'jet_flavor_uni10', 'jet_flavor_uni15']:
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250729_154839')
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250731_015137')

### >>> Supervised with preprocessing (`cop`+`pt_norm`)

Models are trained with the true signal/background labels instead of the CWoLa setup (e.g. the split by jet flavor).

Dataset sizes: `num_train = 100000`, `num_valid = 25000`, `num_test = 25000`
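To make the difference between the two setups concrete: under CWoLa the classifier learns the mixed-sample label given by the jet-flavor category, while in the supervised setup it learns the true VBF/GGF label. The sketch below assumes per-event counts of quark- and gluon-initiated jets (`n_quark`, `n_gluon`), assigns label 1 to the `2q0g` sample (an arbitrary choice), and draws the train/validation/test split with a random permutation. All names are hypothetical.

```python
import numpy as np


def cwola_labels(n_quark: np.ndarray, n_gluon: np.ndarray) -> np.ndarray:
    """Mixed-sample label: 1 for the '2q0g' sample, 0 for '1q1g' + '0q2g'."""
    return ((n_quark == 2) & (n_gluon == 0)).astype(int)


def split_indices(n_events: int, num_train: int = 100000, num_valid: int = 25000,
                  num_test: int = 25000, seed: int = 0):
    """Randomly partition event indices into disjoint train/valid/test sets."""
    n_needed = num_train + num_valid + num_test
    if n_needed > n_events:
        raise ValueError(f'need {n_needed} events, got {n_events}')
    perm = np.random.default_rng(seed).permutation(n_events)
    train = perm[:num_train]
    valid = perm[num_train:num_train + num_valid]
    test = perm[num_train + num_valid:n_needed]
    return train, valid, test
```

In the supervised setup the label would instead be the truth flag itself, e.g. `is_vbf.astype(int)` for a hypothetical boolean array `is_vbf`.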

In [None]:
for data_mode in ['supervised']:
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250806_214603')
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250806_214603')

### >>> Test $L=100, 1800, 3000~\text{fb}^{-1}$ with preprocessing

In [None]:
for data_mode in ['jet_flavor']:
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250804_155424', information='L=100')
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250804_155424', information='L=100')
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250805_170948', information='L=1800')
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250805_170948', information='L=1800')
    print_metrics(channel='diphoton', data_mode=data_mode, date_time='20250805_220406', information='L=3000')
    print_metrics(channel='ex-diphoton', data_mode=data_mode, date_time='20250805_220406', information='L=3000')

## $H \rightarrow ZZ \rightarrow 4l$

### >>> CWoLa with different luminosities

Since the $H \rightarrow ZZ \rightarrow 4l$ dataset is too small at $L=3000~\text{fb}^{-1}$, we also test a ten-times larger luminosity, $L=30000~\text{fb}^{-1}$.

- `20250725_142111`: without the four-lepton information (`ex-zz4l`), $L=3000~\text{fb}^{-1}$
- `20250727_200515`: without the four-lepton information (`ex-zz4l`), $L=30000~\text{fb}^{-1}$
- `20250727_151539`: with the four-lepton information (`zz4l`), $L=3000~\text{fb}^{-1}$
- `20250728_121738`: with the four-lepton information (`zz4l`), $L=30000~\text{fb}^{-1}$
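For reference, at fixed cross section the expected number of events grows linearly with the integrated luminosity, so, assuming identical selection efficiencies, the $L=30000~\text{fb}^{-1}$ samples are expected to be about ten times larger than the $L=3000~\text{fb}^{-1}$ ones:

\begin{equation*}
    N_{\text{exp}} = \sigma_{\text{prod}} \, \mathcal{B} \, \epsilon \, L
    \quad \Longrightarrow \quad
    \frac{N_{\text{exp}}(L=30000~\text{fb}^{-1})}{N_{\text{exp}}(L=3000~\text{fb}^{-1})} = \frac{30000}{3000} = 10,
\end{equation*}

where $\sigma_{\text{prod}}$ is the production cross section, $\mathcal{B}$ the branching ratio of $H \rightarrow ZZ \rightarrow 4l$, and $\epsilon$ the selection efficiency.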

In [None]:
for data_mode in ['jet_flavor', 'jet_flavor_uni5', 'jet_flavor_uni10', 'jet_flavor_uni15']:
    print_metrics(channel='ex-zz4l', data_mode=data_mode, date_time='20250725_142111', information='L = 3000 fb^{-1}')
    print_metrics(channel='ex-zz4l', data_mode=data_mode, date_time='20250727_200515', information='L = 30000 fb^{-1}')
    print_metrics(channel='zz4l', data_mode=data_mode, date_time='20250727_151539', information='L = 3000 fb^{-1}')
    print_metrics(channel='zz4l', data_mode=data_mode, date_time='20250728_121738', information='L = 30000 fb^{-1}')

## $H \rightarrow \gamma\gamma$ + $H \rightarrow ZZ \rightarrow 4l$

### >>> Combining the two datasets with $L = 300~\text{fb}^{-1}$

In [None]:
for data_mode in ['jet_flavor', 'jet_flavor_uni5', 'jet_flavor_uni10', 'jet_flavor_uni15']:
    print_metrics(channel='diphoton_zz4l', data_mode=data_mode, date_time='20250802_090922', information='L = 300 fb^{-1}')
    print_metrics(channel='ex-diphoton_zz4l', data_mode=data_mode, date_time='20250802_001209', information='L = 300 fb^{-1}')