# Tutorial 04 - Model Training

This notebook shows how to train a model for each of the methods used in this study:

- Dedicated supervised
- Parameterised supervised
- Ideal weakly
- Semi-weakly (PAWS)

The easiest way is to use the paws CLI directly:

In [2]:
# dedicated supervised model training
!paws train_dedicated_supervised --help

Usage: paws train_dedicated_supervised [OPTIONS]

  Train dedicated supervised models.

Options:
  -m, --mass-point TEXT           Signal mass point to use for training in the
                                  form "m1:m2".  [required]
  --high-level / --low-level      Whether to do training with low-evel or
                                  high-level features.  [default: high-level]
  --decay-modes [qq|qqq|qq,qqq]   Which decay mode should the signal undergo
                                  (qq or qqq).Use "qq,qqq" to include both
                                  decay modes.  [default: qq,qqq]
  --variables TEXT                Select certain high-level jet features to
                                  include in the trainingby the indices they
                                  appear in the feature vector. For
                                  example,"3,5,6" means select the 4th, 6th
                                  and 7th feature from the jetfeature vector
                    

In [None]:
# example command 
!paws train_dedicated_supervised -d "datasets" -o "outputs" --mass-point 300:300 --decay-modes qq,qqq \
--variables 3,5,6 --split-index 0 --version v1
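To train one dedicated supervised model per mass point, the example command can be generated programmatically. The following is a minimal sketch, not part of paws: the helper `build_train_command` and the mass grid are hypothetical, and only the flags shown in the example command above are used.

```python
import shlex
from itertools import product


def build_train_command(m1, m2, datadir="datasets", outdir="outputs",
                        decay_modes="qq,qqq", variables="3,5,6",
                        split_index=0, version="v1"):
    """Return the argument list for one dedicated supervised training run.

    Hypothetical helper: it only assembles the flags used in the example
    command above and does not call paws itself.
    """
    return [
        "paws", "train_dedicated_supervised",
        "-d", datadir,
        "-o", outdir,
        "--mass-point", f"{m1}:{m2}",
        "--decay-modes", decay_modes,
        "--variables", variables,
        "--split-index", str(split_index),
        "--version", version,
    ]


# Hypothetical mass grid, chosen for illustration only.
masses = [100, 300, 500]
commands = [build_train_command(m1, m2) for m1, m2 in product(masses, repeat=2)]

# Print the first two commands as shell-ready strings.
for cmd in commands[:2]:
    print(shlex.join(cmd))
```

Each entry of `commands` is a plain argument list, so it can be passed to `subprocess.run(cmd, check=True)` once paws is installed and the datasets are in place.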

In [3]:
# parameterised supervised model training
!paws train_param_supervised --help

Usage: paws train_param_supervised [OPTIONS]

  Train parameterised supervised models.

Options:
  --high-level / --low-level      Whether to do training with low-evel or
                                  high-level features.  [default: high-level]
  --decay-modes [qq|qqq|qq,qqq]   Which decay mode should the signal undergo
                                  (qq or qqq).Use "qq,qqq" to include both
                                  decay modes.  [default: qq]
  --variables TEXT                Select certain high-level jet features to
                                  include in the trainingby the indices they
                                  appear in the feature vector. For
                                  example,"3,5,6" means select the 4th, 6th
                                  and 7th feature from the jetfeature vector
                                  to be used in the training.
  --noise INTEGER                 Number of noise dimension to add to the
                           

In [None]:
# example command 
!paws train_param_supervised -d "datasets" -o "outputs" --decay-modes qq --variables 3,5,6 --split-index 0 --version v1

In [4]:
# ideal weakly model training
!paws train_ideal_weakly --help

Usage: paws train_ideal_weakly [OPTIONS]

  Train ideal weakly models.

Options:
  -m, --mass-point TEXT           Signal mass point to use for training in the
                                  form "m1:m2".  [required]
  --mu FLOAT                      Signal fraction in the training and
                                  validation dataset.  [required]
  --alpha FLOAT                   Signal branching fraction in the training
                                  and validation dataset. Ignored when only
                                  one signal decay mode is considered.
  --high-level / --low-level      Whether to do training with low-evel or
                                  high-level features.  [default: high-level]
  --decay-modes [qq|qqq|qq,qqq]   Which decay mode should the signal undergo
                                  (qq or qqq).Use "qq,qqq" to include both
                                  decay modes.  [default: qq,qqq]
  --variables TEXT                Select certain hi

In [None]:
# example command 
!paws train_ideal_weakly -d "datasets" -o "outputs" --mass-point 300:300 --decay-modes qq,qqq \
--variables 3,5,6 --mu 0.01 --alpha 0.5 --split-index 0 --version v1

In [5]:
# semi-weakly (PAWS) model training
!paws train_semi_weakly --help

Usage: paws train_semi_weakly [OPTIONS]

  Train semi-weakly (PAWS) models.

Options:
  -m, --mass-point TEXT           Signal mass point to use for training in the
                                  form "m1:m2".  [required]
  --mu FLOAT                      Signal fraction in the training and
                                  validation dataset.  [required]
  --alpha FLOAT                   Signal branching fraction in the training
                                  and validation dataset. Ignored when only
                                  one signal decay mode is considered.
  --high-level / --low-level      Whether to do training with low-evel or
                                  high-level features.  [default: high-level]
  --decay-modes [qq|qqq|qq,qqq]   Which decay mode should the signal undergo
                                  (qq or qqq).Use "qq,qqq" to include both
                                  decay modes.  [default: qq,qqq]
  --variables TEXT                Select certa

In [None]:
# example command 
!paws train_semi_weakly -d "datasets" -o "outputs" --mass-point 300:300 --decay-modes qq,qqq \
--variables 3,5,6 --mu 0.01 --alpha 0.5 --split-index 0 --version v1 --fs-version v1

Alternatively, you may use the paws API:

In [1]:
from paws.components import ModelTrainer

In [8]:
help(ModelTrainer.__init__)

Help on function __init__ in module paws.components.model_trainer:

__init__(self, model_type: Union[str, paws.settings.ModelType], model_options: Optional[Dict] = None, feature_level: str = 'high_level', decay_modes: str = 'qq,qqq', cache: bool = True, variables: Optional[str] = None, noise_dimension: Optional[int] = None, seed: int = 2023, split_index: int = 0, batchsize: Optional[int] = None, cache_dataset: Optional[bool] = None, version: str = 'v1', multi_gpu: bool = True, interrupt_freq: int = 0, datadir: str = 'datasets', outdir: str = 'outputs', index_path: Optional[str] = None, verbosity: str = 'INFO')
    Initialize the ModelTrainer class.
    
    Parameters
    ----------------------------------------------------
    model_type : str or ModelType
        The type of the model to train.
    model_options : Dict, optional
        Options specific to the model type.
    feature_level : str or FeatureLevel, default "high_level"
        Features to use for the training. It can be

In [2]:
# options for various models:
from paws.components.model_trainer import MODEL_OPTIONS
MODEL_OPTIONS

{<ModelType.DEDICATED_SUPERVISED: 0>: {'required': ['mass_point'],
  'optional': []},
 <ModelType.PARAM_SUPERVISED: 1>: {'required': [],
  'optional': ['include_masses', 'exclude_masses']},
 <ModelType.IDEAL_WEAKLY: 2>: {'required': ['mass_point', 'mu', 'alpha'],
  'optional': ['num_trials']},
 <ModelType.SEMI_WEAKLY: 3>: {'required': ['mass_point', 'mu', 'alpha'],
  'optional': ['num_trials',
   'weight_clipping',
   'retrain',
   'fs_version',
   'fs_version_2']}}
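The table above can be used to check a `model_options` dictionary before constructing a trainer. The sketch below works on a local copy of the table keyed by plain strings, so that it runs without paws installed. The names `OPTIONS_TABLE` and `check_model_options` are hypothetical, and the lowercase string keys for the two weakly supervised models are assumed from the enum names.

```python
# Local copy of the MODEL_OPTIONS table printed above, keyed by plain strings.
# The string keys are an assumption based on the ModelType enum names.
OPTIONS_TABLE = {
    "dedicated_supervised": {"required": ["mass_point"], "optional": []},
    "param_supervised": {"required": [],
                         "optional": ["include_masses", "exclude_masses"]},
    "ideal_weakly": {"required": ["mass_point", "mu", "alpha"],
                     "optional": ["num_trials"]},
    "semi_weakly": {"required": ["mass_point", "mu", "alpha"],
                    "optional": ["num_trials", "weight_clipping", "retrain",
                                 "fs_version", "fs_version_2"]},
}


def check_model_options(model_type, model_options):
    """Return (missing required keys, unknown keys) for a model type."""
    spec = OPTIONS_TABLE[model_type]
    missing = [key for key in spec["required"] if key not in model_options]
    allowed = set(spec["required"]) | set(spec["optional"])
    unknown = sorted(key for key in model_options if key not in allowed)
    return missing, unknown


# A semi-weakly configuration that forgot 'alpha'.
missing, unknown = check_model_options(
    "semi_weakly", {"mass_point": [300, 300], "mu": 0.01}
)
print("missing:", missing, "unknown:", unknown)
```

Whether `ModelTrainer` performs an equivalent check itself is not shown in this notebook, so treat this as a convenience check only.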

In [None]:
# dedicated supervised model training
model_options = {
    'mass_point': [300, 300]  # signal mass point [m1, m2]; the only required option
}
datadir = "datasets"
outdir = "outputs"
model_trainer = ModelTrainer("dedicated_supervised", model_options=model_options, decay_modes='qq',
                             variables="3,5,6", version="v1", datadir=datadir, outdir=outdir)

In [None]:
model_trainer.train()

In [4]:
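# parameterised supervised model training
# no options are required for this model type;
# 'include_masses' and 'exclude_masses' are optional
model_options = {}
# site-specific paths from the original run; replace them with your own
datadir = "/pscratch/sd/c/chlcheng/projects/paws/datasets"
outdir = "/pscratch/sd/c/chlcheng/projects/paws/outputs"
model_trainer = ModelTrainer("param_supervised", model_options=model_options, decay_modes='qq',
                             variables="3,5,6", version="v1", datadir=datadir, outdir=outdir)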

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
[INFO] Created MirroredStrategy for distributed training
[INFO] Number of devices : 1
[INFO]      aliad version : 0.1.0
[INFO] tensorflow version : 2.15.0
Sat Sep 28 01:26:37 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-PCI...  On   | 00000000:C3:00.0 Off |                    0 |
| N/A   34C    P0    36W / 250W |    834MiB / 40960MiB |      0%      Default |
|                               |                      |             Di

In [None]:
model_trainer.train()
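The API cells above cover only the two supervised models. For the ideal weakly and semi-weakly models, `MODEL_OPTIONS` lists `mass_point`, `mu` and `alpha` as required. The sketch below builds such option dictionaries from the same values as the CLI examples. The helpers `parse_mass_point` and `weakly_model_options` are hypothetical, and the integer mass values follow the `[300, 300]` form used earlier.

```python
def parse_mass_point(text):
    """Convert the CLI form "m1:m2" into the list form [m1, m2]."""
    m1, m2 = text.split(":")
    return [int(m1), int(m2)]


def weakly_model_options(mass_point, mu, alpha, **extra):
    """Build a model_options dict holding the three required keys.

    Extra keyword arguments (for example fs_version) are copied as-is;
    they are not validated here.
    """
    options = {
        "mass_point": parse_mass_point(mass_point),
        "mu": float(mu),
        "alpha": float(alpha),
    }
    options.update(extra)
    return options


# Same values as the CLI examples for the two weakly supervised models.
ideal_options = weakly_model_options("300:300", mu=0.01, alpha=0.5)
semi_options = weakly_model_options("300:300", mu=0.01, alpha=0.5, fs_version="v1")
print(ideal_options)
print(semi_options)

# With paws installed, the dictionaries would presumably be passed on in the
# same way as for the supervised models (untested assumption):
# model_trainer = ModelTrainer("semi_weakly", model_options=semi_options,
#                              datadir="datasets", outdir="outputs")
```

The model type string `"semi_weakly"` in the commented call is assumed by analogy with `"dedicated_supervised"` and `"param_supervised"`; the `ModelType` enum members shown in `MODEL_OPTIONS` can be used where the string form is in doubt.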