# **<u>Helical Technical Assessment</u>**

# **<u>Abstract </u>**

- The aim of this task was to take one model from the Helical package and make it run more efficiently over a large number of perturbations (i.e. at inference time). For this technical task, we chose the Geneformer model.

- We then leveraged FP16 casting (soft quantization), automatic mixed precision (AMP) and batching to optimize runtime, GPU/CPU utilization and memory use, while checking that the results remained comparable to the baseline outputs. To quantify this, we also calculated Kolmogorov-Smirnov (K-S) statistics and p-values between the baseline results and the results of each optimized experiment.

- We concluded that batching provided the largest inference speedup (~3×), outperforming both FP16 casting and AMP. GPU utilization did not correlate directly with latency, highlighting kernel efficiency and launch amortization as dominant factors. FP16 (soft quantization) primarily reduced memory footprint, while batching dominated throughput gains.
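- The K-S comparison described above can be sketched with `scipy.stats.ks_2samp`, which this notebook already imports. This is a minimal illustration on synthetic arrays rather than the notebook's own comparison code; the helper name `compare_outputs` and the 0.05 significance level are assumptions. In practice, `baseline` would hold outputs from the vanilla run and `optimized` the outputs from one of the optimized runs.

```python
import numpy as np
from scipy.stats import ks_2samp


def compare_outputs(baseline, optimized, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test between two sets of model outputs.

    Returns (statistic, p_value, comparable), where comparable is True when
    the test does not reject, at level alpha (an assumed threshold), the
    hypothesis that both samples come from the same distribution.
    """
    result = ks_2samp(np.ravel(baseline), np.ravel(optimized))
    return result.statistic, result.pvalue, bool(result.pvalue >= alpha)


# Synthetic stand-ins for baseline and optimized outputs.
baseline = np.arange(100)
identical = baseline.copy()
shifted = baseline + 50

print(compare_outputs(baseline, identical))
print(compare_outputs(baseline, shifted))
```

- A p-value above the chosen level means the test found no evidence that the two output distributions differ; it does not prove they are identical. For discrete outputs such as predicted class labels the K-S test is conservative, so comparing continuous outputs (e.g. logits) is a closer fit to its assumptions.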

## **<u>Requirements</u>**
- Before we get started, it's pivotal that we install the following packages. Otherwise, the following steps won't work.
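- Once the install cells have run, one way to confirm which versions actually ended up in the environment is the standard-library `importlib.metadata` module. This is a minimal sketch; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` cells.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


print(installed_versions(["helical", "torch", "torchvision"]))
```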

In [8]:
!pip install --upgrade helical

Collecting helical
  Downloading helical-1.5.3-py3-none-any.whl.metadata (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 2.4 MB/s eta 0:00:00
Collecting accelerate==1.4.0 (from helical)
  Downloading accelerate-1.4.0-py3-none-any.whl.metadata (19 kB)
Collecting anndata>=0.11 (from helical)
  Downloading anndata-0.12.7-py3-none-any.whl.metadata (9.9 kB)
Collecting bitsandbytes>=0.48.2 (from helical)
  Downloading bitsandbytes-0.49.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting datasets==3.6.0 (from helical)
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting gitpython==3.1.44 (from helical)
  Downloading GitPython-3.1.44-py3-none-any.whl.metadata (13 kB)
Collecting hydra-core==1.3.2 (from helical)
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collectin

In [1]:
!pip install --upgrade torch torchvision

Collecting torch
  Downloading torch-2.9.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting torchvision
  Downloading torchvision-0.24.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch)
  Downloading nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cublas-cu12==12.8.4.1 (from torch)
  Downloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux

- Below is boilerplate code that I have taken and modularised from the Helical repo.

- The boilerplate code only covers loading and preparing the data and fine-tuning Geneformer. The experiments that follow it are my own.

- The code can be found here https://github.com/helicalAI/helical/blob/release/examples/notebooks/Cell-Type-Classification-Fine-Tuning.ipynb

# **------------------------ BOILER PLATE CODE BEGINS HERE -------------------**

# **<u>Imports</u>**
- Import the following packages.

In [1]:
from helical.utils import get_anndata_from_hf_dataset
from helical.models.geneformer import GeneformerConfig, GeneformerFineTuningModel
from helical.models.scgpt import scGPTConfig, scGPTFineTuningModel
import torch
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
from scipy.stats import ks_2samp
import matplotlib.pyplot as plt
import logging, warnings
import umap
import pandas as pd
import seaborn as sns
from datasets import load_dataset
import pynvml
import time
import psutil
from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetUtilizationRates
)


logging.getLogger().setLevel(logging.ERROR)

warnings.filterwarnings("ignore")

2026-01-01 17:58:26,351 - INFO:numexpr.utils:NumExpr defaulting to 12 threads.
2026-01-01 17:58:29,564 - INFO:datasets:PyTorch version 2.9.1 available.
2026-01-01 17:58:29,567 - INFO:datasets:Polars version 1.31.0 available.
2026-01-01 17:58:29,569 - INFO:datasets:Duckdb version 1.3.2 available.
2026-01-01 17:58:29,571 - INFO:datasets:TensorFlow version 2.19.0 available.
2026-01-01 17:58:29,573 - INFO:datasets:JAX version 0.7.2 available.


- Check if we have access to a GPU.

In [2]:
device = "cuda" if torch.cuda.is_available() else "cpu"

# **<u>Load datasets</u>**
- Here we will load the data required for fine-tuning and model inference.

In [3]:
def load_data():
  """
  Loads data for fine-tuning and inference.

  Args:
      None

  Returns:
      train_dataset: AnnData
      test_dataset: AnnData
  """

  # Load data
  ds = load_dataset("helical-ai/yolksac_human",trust_remote_code=True, download_mode="reuse_cache_if_exists")

  # Convert the pre-split train and test sets into AnnData objects
  train_dataset = get_anndata_from_hf_dataset(ds["train"])
  test_dataset = get_anndata_from_hf_dataset(ds["test"])

  # Return both Training and Testing dataset
  return train_dataset, test_dataset

# call load_data() to create the datasets
train_dataset, test_dataset = load_data()


# **<U>Prepare training labels</U>**
- Here we prepare the data for fine-tuning and model inference, focusing on converting the string labels into unique integer classes for training.
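- The encoding step can be illustrated on a toy list of strings. This sketch is not the notebook's `prep_data()`: the labels are made up, and it sorts the label set so the string-to-integer mapping is the same on every run (iterating a plain Python `set` of strings can give a different order between interpreter sessions, because string hashing is randomized). It also builds the inverse mapping, which is what turns predicted ids back into cell-type names.

```python
def encode_labels(train_labels, test_labels):
    """Map string labels to integer class ids shared by train and test."""
    # Sorting makes the mapping deterministic across Python sessions.
    label_set = sorted(set(train_labels) | set(test_labels))
    class_id_dict = {label: i for i, label in enumerate(label_set)}
    id_class_dict = {i: label for label, i in class_id_dict.items()}
    train_ids = [class_id_dict[label] for label in train_labels]
    test_ids = [class_id_dict[label] for label in test_labels]
    return train_ids, test_ids, id_class_dict


# Hypothetical cell-type labels, for illustration only.
train_ids, test_ids, id_class = encode_labels(
    ["B cell", "T cell", "B cell", "Macrophage"],
    ["T cell", "Erythroid"],
)
print(train_ids, test_ids, id_class)
```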

In [4]:
def prep_data():
  """
  Cleans the data before fine tuning and inference

  Args:
      None

  Returns:
      cell_types_train: Dataset
      cell_types_test: Dataset
  """
  cell_types_train = list(np.array(train_dataset.obs["LVL1"].tolist()))
  cell_types_test = list(np.array(test_dataset.obs["LVL1"].tolist()))

  # We convert these string labels into unique integer classes for training
  label_set = set(cell_types_train) | set(cell_types_test)
  class_id_dict = dict(zip(label_set, [i for i in range(len(label_set))]))
  id_class_dict = {v: k for k, v in class_id_dict.items()}

  for i in range(len(cell_types_train)):
      cell_types_train[i] = class_id_dict[cell_types_train[i]]

  for i in range(len(cell_types_test)):
      cell_types_test[i] = class_id_dict[cell_types_test[i]]

  # Return both training and testing data
  return cell_types_train, cell_types_test, label_set

# call prep_data()
cell_types_train, cell_types_test, label_set = prep_data()

# **<u> Fine-tuning </u>**
- Here we will focus on fine-tuning the Geneformer. The fine_tune() function below does just that for us!

In [5]:
def fine_tune(train_dataset, test_dataset):
  """
  Fine-tunes our Geneformer model.

  Args:
      train_dataset: AnnData
      test_dataset: AnnData

  Returns:
      geneformer_fine_tune: GeneformerFineTuningModel
      geneformer_test_dataset: Dataset (processed test set with the "LVL1" label column)
  """

  geneformer_config = GeneformerConfig(device=device, batch_size=10, model_name="gf-6L-10M-i2048")
  geneformer_fine_tune = GeneformerFineTuningModel(geneformer_config=geneformer_config, fine_tuning_head="classification", output_size=len(label_set))

  # Process the data so it is in the correct form for Geneformer.
  geneformer_train_dataset = geneformer_fine_tune.process_data(train_dataset)
  geneformer_test_dataset = geneformer_fine_tune.process_data(test_dataset)

  # Geneformer makes use of the Hugging Face dataset class and so we need to add the labels as a column to this dataset.
  geneformer_train_dataset = geneformer_train_dataset.add_column("LVL1", cell_types_train)
  geneformer_test_dataset = geneformer_test_dataset.add_column("LVL1", cell_types_test)

  # Fine-tune the model.
  geneformer_fine_tune.train(train_dataset=geneformer_train_dataset.shuffle(),
                             validation_dataset=geneformer_test_dataset, label="LVL1",
                             freeze_layers=0, epochs=1, optimizer_params={"lr": 1e-4},
                             lr_scheduler_params={"name":"linear", "num_warmup_steps":0,
                                                  'num_training_steps':1})

  # return the fine-tuned Geneformer and the test dataset
  return geneformer_fine_tune, geneformer_test_dataset

# Call the fine_tune() function
geneformer_fine_tune, geneformer_test_dataset = fine_tune(train_dataset, test_dataset)

gene_median_dictionary.pkl: 100%|██████████| 941k/941k [00:00<00:00, 1.96MB/s]
token_dictionary.pkl: 100%|██████████| 788k/788k [00:00<00:00, 8.30MB/s]
ensembl_mapping_dict.pkl: 100%|██████████| 3.96M/3.96M [00:00<00:00, 38.7MB/s]
config.json: 100%|██████████| 565/565 [00:00<00:00, 1.66MB/s]
training_args.bin: 100%|██████████| 2.61k/2.61k [00:00<00:00, 11.2MB/s]
pytorch_model.bin: 100%|██████████| 41.2M/41.2M [00:01<00:00, 38.0MB/s]
hsapiens_pybiomart.csv: 100%|██████████| 2.30M/2.30M [00:00<00:00, 4.01MB/s]
Fine-Tuning: epoch 1/1: 100%|██████████| 2535/2535 [04:58<00:00,  8.51it/s, loss=0.0658]
Fine-Tuning Validation: 100%|██████████| 634/634 [00:32<00:00, 19.46it/s, val_loss=0.0346]


## **----------------------- BOILER PLATE CODE ENDS HERE ---------------------------------**

# **<u>Experiment 1: Vanilla Inference</u>**
- Here we simply run a baseline inference on a set of perturbations.
- Record runtime, GPU/CPU utilization and memory use (we do the same for each of the following experiments).
- Keep the model outputs from every experiment for comparison (each experiment function below returns the predicted class labels).
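- The measurements above can be made a little more robust by timing with a monotonic clock and sampling utilization in a background thread while inference runs, instead of reading it once afterwards. This is a minimal sketch, not the notebook's method; the helper name `benchmark` and its parameters are hypothetical, and the dummy workload stands in for a real inference call.

```python
import statistics
import threading
import time


def benchmark(fn, n_samples, read_utilization=None, interval_s=0.05, sync=None):
    """Time fn() and, optionally, sample a utilization reading while it runs.

    fn: zero-argument callable that runs inference over n_samples items.
    read_utilization: optional zero-argument callable returning a percentage.
    sync: optional callable (e.g. torch.cuda.synchronize) so that queued GPU
        work has finished before the clock is read.
    """
    samples = []
    stop = threading.Event()

    def sampler():
        # Poll until the main thread signals that fn() has returned.
        while not stop.is_set():
            samples.append(read_utilization())
            stop.wait(interval_s)

    thread = None
    if read_utilization is not None:
        thread = threading.Thread(target=sampler, daemon=True)
        thread.start()

    if sync is not None:
        sync()
    t0 = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - t0

    if thread is not None:
        stop.set()
        thread.join()

    return result, {
        "total_s": elapsed,
        "per_sample_ms": elapsed * 1000 / n_samples,
        "mean_util": statistics.mean(samples) if samples else None,
    }


# Dummy workload standing in for inference over 100 items.
result, stats = benchmark(lambda: time.sleep(0.2) or "done", n_samples=100,
                          read_utilization=lambda: 50.0)
print(result, stats)
```

- In the notebook, `fn` could be `lambda: geneformer_fine_tune.get_outputs(geneformer_test_dataset)`, `sync` could be `torch.cuda.synchronize`, and `read_utilization` could wrap `nvmlDeviceGetUtilizationRates(handle).gpu`. Averaging readings taken during the run avoids relying on a single value read after inference has finished.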

In [6]:
# 1) Vanilla Inference (FP32)

def exp1(geneformer_fine_tune, geneformer_test_dataset):

    """
    Vanilla Geneformer inference - no optimization. Here
    we run geneformer_fine_tune.get_outputs(geneformer_test_dataset)
    to obtain the predictions along with the metrics.

    Args:
      geneformer_fine_tune: GeneformerFineTuningModel
      geneformer_test_dataset: Dataset

    Returns:
      y_pred: ndarray
      metrics: dict
    """

    process = psutil.Process()

    # Warm up CPU stats
    process.cpu_percent(interval=None)
    process.cpu_percent(interval=None)

    time_sum = 0
    peak_mem = 0

    geneformer_fine_tune.model.eval()
    geneformer_fine_tune.model.to("cuda")

    # Reset GPU peak memory stats
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()

    with torch.no_grad():
        t0 = time.time()
        outputs = geneformer_fine_tune.get_outputs(geneformer_test_dataset)
        torch.cuda.synchronize()
        time_sum += time.time() - t0

    # GPU peak memory
    peak_mem = torch.cuda.max_memory_allocated()
    peak_mem_mb = peak_mem / (1024 ** 2)

    cpu_usage = process.cpu_percent(interval=None)

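    # GPU utilization. NVML reports the share of its most recent sample period
    # (between 1/6 s and 1 s, depending on the device) during which a kernel was
    # running (util.gpu) or device memory was being read or written (util.memory).
    # Read here, after inference, it is a snapshot of the end of the run, and
    # util.memory measures memory activity rather than the fraction of memory in use.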
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    util = nvmlDeviceGetUtilizationRates(handle)
    gpu_util = util.gpu
    gpu_mem_util = util.memory
    nvmlShutdown()


    print(classification_report(cell_types_test, outputs.argmax(axis=1)))
    print(f"CPU usage: {cpu_usage:.2f}%")
    print(f"GPU usage: {gpu_util:.2f}%")
    print(f"GPU memory usage: {gpu_mem_util:.2f}%")
    print(f"Peak GPU memory: {peak_mem_mb:.2f} MB")
    print(f"Avg inference time: {time_sum / len(geneformer_test_dataset):.6f}s")

    metrics = {"CPU usage (%)": cpu_usage,
               "GPU usage(%)" : gpu_util,
               "GPU memory usage (%)": gpu_mem_util,
               "Peak GPU memory (MB)": peak_mem_mb,
               "Avg inference time (ms)":  (time_sum * 1000/ len(geneformer_test_dataset))
               }

    return outputs.argmax(axis=1), metrics


y_pred_exp1, metrics_exp1 = exp1(geneformer_fine_tune, geneformer_test_dataset)


Generating Outputs: 100%|██████████| 634/634 [00:30<00:00, 20.73it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 85.00%
GPU memory usage: 24.00%
Peak GPU memory: 2243.59 MB
Avg inference time: 0.004832s


# **<u>Experiment 2: FP16 (Soft Quantization)</u>**

- Here we convert the precision of the model's weights from FP32 to FP16.
- By doing this, we expect inference to be quicker and memory usage to drop.
- Let's investigate and see if this occurs...   
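- The memory side of this trade-off can be checked with plain NumPy: casting an array from FP32 to FP16 halves its storage, and for values in FP16's normal range the rounding error is at most 2^-11 relative (FP16 has an 11-bit significand). The random matrix below is a hypothetical stand-in for a weight tensor, not Geneformer's weights.

```python
import numpy as np

# Hypothetical stand-in for a weight matrix; shape and values are arbitrary.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((256, 512)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# FP16 uses 2 bytes per value instead of 4, so storage halves.
print(weights_fp32.nbytes, weights_fp16.nbytes)

# Relative rounding error, measured only on values in FP16's normal range
# (tiny values become subnormal in FP16 and lose relatively more precision).
normal = np.abs(weights_fp32) >= np.finfo(np.float16).tiny
rel_err = (np.abs(weights_fp16.astype(np.float32) - weights_fp32)[normal]
           / np.abs(weights_fp32[normal]))
print(f"max relative rounding error: {rel_err.max():.2e}")
```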

In [7]:
# 2) FP16 (soft quantization)

def exp2(geneformer_fine_tune, geneformer_test_dataset):

    """
    Geneformer inference using FP16 casting via model.half() (soft quantization). Here
    we run geneformer_fine_tune.get_outputs(geneformer_test_dataset)
    to obtain the predictions along with the metrics.

    Args:
      geneformer_fine_tune: GeneformerFineTuningModel
      geneformer_test_dataset: Dataset

    Returns:
      y_pred: ndarray
      metrics: dict
    """

    process = psutil.Process()

    # Warm up CPU stats
    process.cpu_percent(interval=None)
    process.cpu_percent(interval=None)

    time_sum = 0
    peak_mem = 0

    geneformer_fine_tune.model.eval()
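    # Here we implement FP16. Note that .half() casts the model in place, so any
    # later experiment that reuses geneformer_fine_tune keeps FP16 weights unless
    # it calls .float().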
    geneformer_fine_tune.model.half().to("cuda")

    # Reset GPU peak memory stats
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()

    with torch.no_grad():
        t0 = time.time()
        outputs = geneformer_fine_tune.get_outputs(geneformer_test_dataset)
        torch.cuda.synchronize()
        time_sum += time.time() - t0

    # GPU peak memory
    peak_mem = torch.cuda.max_memory_allocated()
    peak_mem_mb = peak_mem / (1024 ** 2)

    cpu_usage = process.cpu_percent(interval=None)

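    # GPU utilization: an NVML snapshot over its most recent sample period, taken
    # after inference; util.memory is memory activity, not memory occupancy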
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    util = nvmlDeviceGetUtilizationRates(handle)
    gpu_util = util.gpu
    gpu_mem_util = util.memory
    nvmlShutdown()

    print(classification_report(cell_types_test, outputs.argmax(axis=1)))
    print(f"CPU usage: {cpu_usage:.2f}%")
    print(f"GPU usage: {gpu_util:.2f}%")
    print(f"GPU memory usage: {gpu_mem_util:.2f}%")
    print(f"Peak GPU memory: {peak_mem_mb:.2f} MB")
    print(f"Avg inference time: {time_sum / len(geneformer_test_dataset):.6f}s")

    metrics = {"CPU usage (%)": cpu_usage,
               "GPU usage(%)" : gpu_util,
               "GPU memory usage (%)": gpu_mem_util,
               "Peak GPU memory (MB)": peak_mem_mb,
               "Avg inference time (ms)":  (time_sum * 1000/ len(geneformer_test_dataset))
               }

    return outputs.argmax(axis=1), metrics


y_pred_exp2, metrics_exp2 = exp2(geneformer_fine_tune, geneformer_test_dataset)

Generating Outputs: 100%|██████████| 634/634 [00:11<00:00, 54.97it/s]

              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 59.00%
GPU memory usage: 19.00%
Peak GPU memory: 1150.76 MB
Avg inference time: 0.001821s





# **<u>Experiment 3: Mixed Precision</u>**

- Automatic Mixed Precision (AMP) is an optimization technique that speeds up computation by using lower precision (FP16/bfloat16) where it is safe, while keeping FP32 where it is needed for numerical stability.
- As a result, we expect faster inference and lower memory usage while retaining accuracy as much as possible.
- Let's investigate and see if this occurs...   
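- The numerical-stability concern behind keeping some operations in FP32 can be seen with plain NumPy: summing many small values in an FP16 accumulator stalls once the spacing between representable totals exceeds twice the increment, while an FP32 accumulator stays close to the exact result. This is why PyTorch's autocast runs reductions such as `sum` and `softmax` in FP32; the sketch below only illustrates the rounding effect and is not AMP itself.

```python
import numpy as np


def accumulate(values, dtype):
    """Sequentially sum values using an accumulator of the given NumPy dtype."""
    acc = dtype(0)
    for v in values:
        acc = dtype(acc + dtype(v))  # round back to dtype after every addition
    return float(acc)


values = [1e-4] * 10_000  # exact sum is 1.0

fp16_sum = accumulate(values, np.float16)
fp32_sum = accumulate(values, np.float32)
print(fp16_sum, fp32_sum)
```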

In [8]:
# 3) Mixed Precision

def exp3(geneformer_fine_tune, geneformer_test_dataset):

    """
    Geneformer inference using Automatic Mixed Precision (AMP). Here
    we run geneformer_fine_tune.get_outputs(geneformer_test_dataset)
    to obtain the predictions along with the metrics.

    Args:
      geneformer_fine_tune: GeneformerFineTuningModel
      geneformer_test_dataset: Dataset

    Returns:
      y_pred: ndarray
      metrics: dict
    """

    process = psutil.Process()

    # Warm up CPU stats
    process.cpu_percent(interval=None)
    process.cpu_percent(interval=None)

    time_sum = 0
    peak_mem = 0

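    geneformer_fine_tune.model.eval()
    # Note: if exp2() has already run, its in-place .half() means the weights are
    # still FP16 here, so this measures autocast on top of FP16 weights rather
    # than AMP starting from an FP32 model.
    geneformer_fine_tune.model.to("cuda")

    # Reset GPU peak memory stats
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()

    with torch.no_grad():
        # Here we implement Automatic Mixed Precision (AMP). torch.autocast is the
        # current API; torch.cuda.amp.autocast is deprecated in recent PyTorch.
        with torch.autocast(device_type="cuda", dtype=torch.float16):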
            t0 = time.time()
            outputs = geneformer_fine_tune.get_outputs(geneformer_test_dataset)
            torch.cuda.synchronize()
            time_sum += time.time() - t0

    # GPU peak memory
    peak_mem = torch.cuda.max_memory_allocated()
    peak_mem_mb = peak_mem / (1024 ** 2)

    cpu_usage = process.cpu_percent(interval=None)

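    # GPU utilization: an NVML snapshot over its most recent sample period, taken
    # after inference; util.memory is memory activity, not memory occupancy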
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    util = nvmlDeviceGetUtilizationRates(handle)
    gpu_util = util.gpu
    gpu_mem_util = util.memory
    nvmlShutdown()

    print(classification_report(cell_types_test, outputs.argmax(axis=1)))
    print(f"CPU usage: {cpu_usage:.2f}%")
    print(f"GPU usage: {gpu_util:.2f}%")
    print(f"GPU memory usage: {gpu_mem_util:.2f}%")
    print(f"Peak GPU memory: {peak_mem_mb:.2f} MB")
    print(f"Avg inference time: {time_sum / len(geneformer_test_dataset):.6f}s")

    metrics = {"CPU usage (%)": cpu_usage,
               "GPU usage(%)" : gpu_util,
               "GPU memory usage (%)": gpu_mem_util,
               "Peak GPU memory (MB)": peak_mem_mb,
               "Avg inference time (ms)":  (time_sum * 1000/ len(geneformer_test_dataset))
               }

    return outputs.argmax(axis=1), metrics


y_pred_exp3, metrics_exp3 = exp3(geneformer_fine_tune, geneformer_test_dataset)

Generating Outputs: 100%|██████████| 634/634 [00:13<00:00, 48.26it/s]

              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 64.00%
GPU memory usage: 30.00%
Peak GPU memory: 1230.77 MB
Avg inference time: 0.002074s





# **<u>Experiment 4: Batching </u>**
- In this experiment we tune the batch size. Batching processes multiple inputs simultaneously rather than one at a time, which makes better use of parallel computation on the GPU cores and of memory bandwidth. Starting from the batch size of 10 used in the previous experiments, we sweep up to `max_num` (100 by default) and keep the fastest setting.
- Let's investigate whether this improves throughput...
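- Two pieces of arithmetic underlie this sweep. First, the number of forward passes is the number of cells divided by the batch size, rounded up; for the 6336 test cells this gives the 634, 576, 528, ... batches shown in the progress bars of the output. Second, under a simple cost model (an assumption, not a measurement), a fixed per-batch overhead such as kernel launches is shared by every item in the batch, so per-item latency falls as the batch grows but flattens out once per-item compute dominates. The overhead numbers in the example are hypothetical.

```python
import math


def num_batches(n_items, batch_size):
    """Forward passes needed to cover n_items in fixed-size batches."""
    return math.ceil(n_items / batch_size)


def per_item_latency_ms(batch_size, overhead_ms, per_item_ms):
    """Toy cost model: per-batch overhead amortized over the batch, plus
    a constant per-item compute cost."""
    return overhead_ms / batch_size + per_item_ms


# 6336 test cells, matching the support totals in the classification reports.
for b in (10, 11, 12, 13, 14):
    print(b, num_batches(6336, b))

# Hypothetical costs: 5 ms fixed overhead per batch, 1 ms compute per item.
for b in (1, 10, 100):
    print(b, per_item_latency_ms(b, overhead_ms=5.0, per_item_ms=1.0))
```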


In [12]:
def exp4(geneformer_fine_tune, geneformer_test_dataset, max_num=100):

    """
    Geneformer inference where we optimize the batch size. We first sweep
    batch sizes from 10 to max_num, keep the one with the lowest average
    inference time, and then rerun exp1() with it to obtain the predictions
    along with the metrics.

    Args:
      geneformer_fine_tune: GeneformerFineTuningModel
      geneformer_test_dataset: Dataset
      max_num: int, largest batch size to try

    Returns:
      y_pred: ndarray
      metrics: dict
    """

    # Note: exp1() does not reset the precision, and exp2() cast the model to
    # FP16 in place with .half(), so if the cells are run in order these runs
    # use FP16 weights.
    best_batch_num = 10
    best_runtime = float('inf')
    for i in range(10, max_num + 1):
        geneformer_fine_tune.config["batch_size"] = i
        y_pred, metrics = exp1(geneformer_fine_tune, geneformer_test_dataset)
        # Compare against the best runtime so far (not against the batch size)
        if metrics["Avg inference time (ms)"] < best_runtime:
            best_runtime = metrics["Avg inference time (ms)"]
            best_batch_num = i
    print(f'The optimal batch size is {best_batch_num}. Here are the benchmarks associated with this batch size...')
    geneformer_fine_tune.config["batch_size"] = best_batch_num
    y_pred, metrics = exp1(geneformer_fine_tune, geneformer_test_dataset)
    return y_pred, metrics

y_pred_exp4, metrics_exp4 = exp4(geneformer_fine_tune, geneformer_test_dataset)

Generating Outputs: 100%|██████████| 634/634 [00:11<00:00, 55.35it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 59.00%
GPU memory usage: 19.00%
Peak GPU memory: 1150.76 MB
Avg inference time: 0.001809s


Generating Outputs: 100%|██████████| 576/576 [00:11<00:00, 49.76it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 60.00%
GPU memory usage: 19.00%
Peak GPU memory: 1262.09 MB
Avg inference time: 0.001828s


Generating Outputs: 100%|██████████| 528/528 [00:11<00:00, 46.64it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 59.00%
GPU memory usage: 19.00%
Peak GPU memory: 1373.42 MB
Avg inference time: 0.001789s


Generating Outputs: 100%|██████████| 488/488 [00:11<00:00, 43.61it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 60.00%
GPU memory usage: 20.00%
Peak GPU memory: 1484.75 MB
Avg inference time: 0.001767s


Generating Outputs: 100%|██████████| 453/453 [00:11<00:00, 40.24it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 21.00%
Peak GPU memory: 1596.09 MB
Avg inference time: 0.001778s


Generating Outputs: 100%|██████████| 423/423 [00:11<00:00, 38.11it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 60.00%
GPU memory usage: 21.00%
Peak GPU memory: 1707.43 MB
Avg inference time: 0.001753s


Generating Outputs: 100%|██████████| 396/396 [00:10<00:00, 36.10it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 59.00%
GPU memory usage: 21.00%
Peak GPU memory: 1818.77 MB
Avg inference time: 0.001733s


Generating Outputs: 100%|██████████| 373/373 [00:11<00:00, 33.53it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 62.00%
GPU memory usage: 22.00%
Peak GPU memory: 1930.11 MB
Avg inference time: 0.001757s


Generating Outputs: 100%|██████████| 352/352 [00:10<00:00, 32.02it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 61.00%
GPU memory usage: 23.00%
Peak GPU memory: 2041.46 MB
Avg inference time: 0.001738s


Generating Outputs: 100%|██████████| 334/334 [00:10<00:00, 30.52it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 24.00%
Peak GPU memory: 2152.80 MB
Avg inference time: 0.001728s


Generating Outputs: 100%|██████████| 317/317 [00:10<00:00, 29.29it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 62.00%
GPU memory usage: 24.00%
Peak GPU memory: 2264.15 MB
Avg inference time: 0.001711s


Generating Outputs: 100%|██████████| 302/302 [00:10<00:00, 27.67it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 62.00%
GPU memory usage: 23.00%
Peak GPU memory: 2375.49 MB
Avg inference time: 0.001725s


Generating Outputs: 100%|██████████| 288/288 [00:10<00:00, 26.51it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 59.00%
GPU memory usage: 22.00%
Peak GPU memory: 2486.98 MB
Avg inference time: 0.001718s


Generating Outputs: 100%|██████████| 276/276 [00:10<00:00, 25.60it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 61.00%
GPU memory usage: 23.00%
Peak GPU memory: 2598.32 MB
Avg inference time: 0.001703s


Generating Outputs: 100%|██████████| 264/264 [00:10<00:00, 24.37it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 61.00%
GPU memory usage: 23.00%
Peak GPU memory: 2709.67 MB
Avg inference time: 0.001713s


Generating Outputs: 100%|██████████| 254/254 [00:10<00:00, 23.66it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 65.00%
GPU memory usage: 25.00%
Peak GPU memory: 2821.01 MB
Avg inference time: 0.001696s


Generating Outputs: 100%|██████████| 244/244 [00:10<00:00, 22.75it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 2932.35 MB
Avg inference time: 0.001695s


Generating Outputs: 100%|██████████| 235/235 [00:10<00:00, 22.08it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 59.00%
GPU memory usage: 23.00%
Peak GPU memory: 3043.70 MB
Avg inference time: 0.001683s


Generating Outputs: 100%|██████████| 227/227 [00:10<00:00, 21.23it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 63.00%
GPU memory usage: 25.00%
Peak GPU memory: 3151.54 MB
Avg inference time: 0.001688s


Generating Outputs: 100%|██████████| 219/219 [00:10<00:00, 20.42it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 61.00%
GPU memory usage: 23.00%
Peak GPU memory: 3259.14 MB
Avg inference time: 0.001695s


Generating Outputs: 100%|██████████| 212/212 [00:10<00:00, 19.87it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 62.00%
GPU memory usage: 24.00%
Peak GPU memory: 3366.49 MB
Avg inference time: 0.001685s


Generating Outputs: 100%|██████████| 205/205 [00:10<00:00, 19.26it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 3473.59 MB
Avg inference time: 0.001682s


Generating Outputs: 100%|██████████| 198/198 [00:10<00:00, 18.70it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 66.00%
GPU memory usage: 26.00%
Peak GPU memory: 3580.43 MB
Avg inference time: 0.001676s


Generating Outputs: 100%|██████████| 192/192 [00:10<00:00, 18.14it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 59.00%
GPU memory usage: 23.00%
Peak GPU memory: 3691.16 MB
Avg inference time: 0.001675s


Generating Outputs: 100%|██████████| 187/187 [00:10<00:00, 17.58it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 65.00%
GPU memory usage: 25.00%
Peak GPU memory: 3797.63 MB
Avg inference time: 0.001681s


Generating Outputs: 100%|██████████| 182/182 [00:10<00:00, 17.13it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 67.00%
GPU memory usage: 26.00%
Peak GPU memory: 3903.85 MB
Avg inference time: 0.001677s


Generating Outputs: 100%|██████████| 176/176 [00:10<00:00, 16.62it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 66.00%
GPU memory usage: 26.00%
Peak GPU memory: 4009.83 MB
Avg inference time: 0.001677s


Generating Outputs: 100%|██████████| 172/172 [00:10<00:00, 16.30it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 62.00%
GPU memory usage: 24.00%
Peak GPU memory: 4120.18 MB
Avg inference time: 0.001666s


Generating Outputs: 100%|██████████| 167/167 [00:10<00:00, 15.82it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 4225.78 MB
Avg inference time: 0.001670s


Generating Outputs: 100%|██████████| 163/163 [00:10<00:00, 15.46it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 24.00%
Peak GPU memory: 4336.00 MB
Avg inference time: 0.001667s


Generating Outputs: 100%|██████████| 159/159 [00:10<00:00, 15.10it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 24.00%
Peak GPU memory: 4441.23 MB
Avg inference time: 0.001664s


Generating Outputs: 100%|██████████| 155/155 [00:10<00:00, 14.70it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 4551.33 MB
Avg inference time: 0.001668s


Generating Outputs: 100%|██████████| 151/151 [00:10<00:00, 14.42it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 24.00%
Peak GPU memory: 4656.18 MB
Avg inference time: 0.001658s


Generating Outputs: 100%|██████████| 148/148 [00:10<00:00, 14.07it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 60.00%
GPU memory usage: 24.00%
Peak GPU memory: 4766.22 MB
Avg inference time: 0.001662s


Generating Outputs: 100%|██████████| 144/144 [00:10<00:00, 13.71it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 58.00%
GPU memory usage: 23.00%
Peak GPU memory: 4870.70 MB
Avg inference time: 0.001664s


Generating Outputs: 100%|██████████| 141/141 [00:10<00:00, 13.50it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 61.00%
GPU memory usage: 24.00%
Peak GPU memory: 4980.55 MB
Avg inference time: 0.001654s


Generating Outputs: 100%|██████████| 138/138 [00:10<00:00, 13.20it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 70.00%
GPU memory usage: 28.00%
Peak GPU memory: 5084.65 MB
Avg inference time: 0.001656s


Generating Outputs: 100%|██████████| 135/135 [00:10<00:00, 12.91it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 57.00%
GPU memory usage: 22.00%
Peak GPU memory: 5194.37 MB
Avg inference time: 0.001656s


Generating Outputs: 100%|██████████| 132/132 [00:10<00:00, 12.59it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 70.00%
GPU memory usage: 27.00%
Peak GPU memory: 5298.10 MB
Avg inference time: 0.001662s


Generating Outputs: 100%|██████████| 130/130 [00:10<00:00, 12.38it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 61.00%
GPU memory usage: 24.00%
Peak GPU memory: 5407.70 MB
Avg inference time: 0.001660s


Generating Outputs: 100%|██████████| 127/127 [00:10<00:00, 12.17it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 70.00%
GPU memory usage: 27.00%
Peak GPU memory: 5517.30 MB
Avg inference time: 0.001652s


Generating Outputs: 100%|██████████| 125/125 [00:10<00:00, 11.97it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 67.00%
GPU memory usage: 26.00%
Peak GPU memory: 5620.52 MB
Avg inference time: 0.001650s


Generating Outputs: 100%|██████████| 122/122 [00:10<00:00, 11.72it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 58.00%
GPU memory usage: 23.00%
Peak GPU memory: 5730.00 MB
Avg inference time: 0.001649s


Generating Outputs: 100%|██████████| 120/120 [00:10<00:00, 11.50it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 68.00%
GPU memory usage: 26.00%
Peak GPU memory: 5839.47 MB
Avg inference time: 0.001651s


Generating Outputs: 100%|██████████| 118/118 [00:10<00:00, 11.34it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 67.00%
GPU memory usage: 26.00%
Peak GPU memory: 5942.20 MB
Avg inference time: 0.001646s


Generating Outputs: 100%|██████████| 116/116 [00:10<00:00, 11.11it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 57.00%
GPU memory usage: 22.00%
Peak GPU memory: 6051.55 MB
Avg inference time: 0.001650s


Generating Outputs: 100%|██████████| 114/114 [00:10<00:00, 10.95it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 65.00%
GPU memory usage: 26.00%
Peak GPU memory: 6160.90 MB
Avg inference time: 0.001644s


Generating Outputs: 100%|██████████| 112/112 [00:10<00:00, 10.77it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 65.00%
GPU memory usage: 25.00%
Peak GPU memory: 6270.25 MB
Avg inference time: 0.001643s


Generating Outputs: 100%|██████████| 110/110 [00:10<00:00, 10.58it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 61.00%
GPU memory usage: 24.00%
Peak GPU memory: 6372.35 MB
Avg inference time: 0.001644s


Generating Outputs: 100%|██████████| 108/108 [00:10<00:00, 10.42it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 6481.58 MB
Avg inference time: 0.001640s


Generating Outputs: 100%|██████████| 106/106 [00:10<00:00, 10.22it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 61.00%
GPU memory usage: 24.00%
Peak GPU memory: 6590.81 MB
Avg inference time: 0.001643s


Generating Outputs: 100%|██████████| 104/104 [00:10<00:00, 10.03it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 25.00%
Peak GPU memory: 6700.03 MB
Avg inference time: 0.001644s


Generating Outputs: 100%|██████████| 103/103 [00:10<00:00,  9.91it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 25.00%
Peak GPU memory: 6801.51 MB
Avg inference time: 0.001643s


Generating Outputs: 100%|██████████| 101/101 [00:10<00:00,  9.76it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 6910.61 MB
Avg inference time: 0.001639s


Generating Outputs: 100%|██████████| 99/99 [00:10<00:00,  9.59it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 63.00%
GPU memory usage: 25.00%
Peak GPU memory: 7019.71 MB
Avg inference time: 0.001639s


Generating Outputs: 100%|██████████| 98/98 [00:10<00:00,  9.45it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 60.00%
GPU memory usage: 24.00%
Peak GPU memory: 7130.58 MB
Avg inference time: 0.001641s


Generating Outputs: 100%|██████████| 96/96 [00:10<00:00,  9.29it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 59.00%
GPU memory usage: 23.00%
Peak GPU memory: 7239.65 MB
Avg inference time: 0.001641s


Generating Outputs: 100%|██████████| 95/95 [00:10<00:00,  9.18it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 59.00%
GPU memory usage: 23.00%
Peak GPU memory: 7340.35 MB
Avg inference time: 0.001640s


Generating Outputs: 100%|██████████| 94/94 [00:10<00:00,  9.02it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 62.00%
GPU memory usage: 23.00%
Peak GPU memory: 7449.29 MB
Avg inference time: 0.001646s


Generating Outputs: 100%|██████████| 92/92 [00:10<00:00,  8.87it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 70.00%
GPU memory usage: 28.00%
Peak GPU memory: 7558.23 MB
Avg inference time: 0.001645s


Generating Outputs: 100%|██████████| 91/91 [00:10<00:00,  8.81it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 68.00%
GPU memory usage: 26.00%
Peak GPU memory: 7667.18 MB
Avg inference time: 0.001635s


Generating Outputs: 100%|██████████| 90/90 [00:10<00:00,  8.66it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 56.00%
GPU memory usage: 23.00%
Peak GPU memory: 7776.13 MB
Avg inference time: 0.001642s


Generating Outputs: 100%|██████████| 88/88 [00:10<00:00,  8.52it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 56.00%
GPU memory usage: 23.00%
Peak GPU memory: 7876.07 MB
Avg inference time: 0.001641s


Generating Outputs: 100%|██████████| 87/87 [00:10<00:00,  8.44it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 73.00%
GPU memory usage: 29.00%
Peak GPU memory: 7984.89 MB
Avg inference time: 0.001635s


Generating Outputs: 100%|██████████| 86/86 [00:10<00:00,  8.35it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 63.00%
GPU memory usage: 25.00%
Peak GPU memory: 8093.71 MB
Avg inference time: 0.001633s


Generating Outputs: 100%|██████████| 85/85 [00:10<00:00,  8.22it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 55.00%
GPU memory usage: 22.00%
Peak GPU memory: 8202.53 MB
Avg inference time: 0.001638s


Generating Outputs: 100%|██████████| 84/84 [00:10<00:00,  8.11it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 59.00%
GPU memory usage: 23.00%
Peak GPU memory: 8311.35 MB
Avg inference time: 0.001639s


Generating Outputs: 100%|██████████| 83/83 [00:10<00:00,  8.03it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 69.00%
GPU memory usage: 27.00%
Peak GPU memory: 8420.17 MB
Avg inference time: 0.001634s


Generating Outputs: 100%|██████████| 82/82 [00:10<00:00,  7.94it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 61.00%
GPU memory usage: 24.00%
Peak GPU memory: 8529.00 MB
Avg inference time: 0.001633s


Generating Outputs: 100%|██████████| 81/81 [00:10<00:00,  7.87it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 62.00%
GPU memory usage: 24.00%
Peak GPU memory: 8627.94 MB
Avg inference time: 0.001626s


Generating Outputs: 100%|██████████| 80/80 [00:10<00:00,  7.79it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 76.00%
GPU memory usage: 29.00%
Peak GPU memory: 8736.64 MB
Avg inference time: 0.001623s


Generating Outputs: 100%|██████████| 79/79 [00:10<00:00,  7.66it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 52.00%
GPU memory usage: 21.00%
Peak GPU memory: 8845.33 MB
Avg inference time: 0.001630s


Generating Outputs: 100%|██████████| 78/78 [00:10<00:00,  7.56it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 67.00%
GPU memory usage: 27.00%
Peak GPU memory: 8954.03 MB
Avg inference time: 0.001631s


Generating Outputs: 100%|██████████| 77/77 [00:10<00:00,  7.45it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 54.00%
GPU memory usage: 22.00%
Peak GPU memory: 9062.72 MB
Avg inference time: 0.001635s


Generating Outputs: 100%|██████████| 76/76 [00:10<00:00,  7.39it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 62.00%
GPU memory usage: 24.00%
Peak GPU memory: 9171.42 MB
Avg inference time: 0.001629s


Generating Outputs: 100%|██████████| 75/75 [00:10<00:00,  7.32it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 67.00%
GPU memory usage: 26.00%
Peak GPU memory: 9280.12 MB
Avg inference time: 0.001624s


Generating Outputs: 100%|██████████| 74/74 [00:10<00:00,  7.24it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 60.00%
GPU memory usage: 24.00%
Peak GPU memory: 9388.85 MB
Avg inference time: 0.001622s


Generating Outputs: 100%|██████████| 73/73 [00:10<00:00,  7.14it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 52.00%
GPU memory usage: 21.00%
Peak GPU memory: 9486.67 MB
Avg inference time: 0.001625s


Generating Outputs: 100%|██████████| 72/72 [00:10<00:00,  7.05it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 57.00%
GPU memory usage: 22.00%
Peak GPU memory: 9595.24 MB
Avg inference time: 0.001626s


Generating Outputs: 100%|██████████| 72/72 [00:10<00:00,  6.99it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 73.00%
GPU memory usage: 29.00%
Peak GPU memory: 9703.81 MB
Avg inference time: 0.001629s


Generating Outputs: 100%|██████████| 71/71 [00:10<00:00,  6.93it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 62.00%
GPU memory usage: 24.00%
Peak GPU memory: 9812.38 MB
Avg inference time: 0.001623s


Generating Outputs: 100%|██████████| 70/70 [00:10<00:00,  6.85it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 58.00%
GPU memory usage: 23.00%
Peak GPU memory: 9920.95 MB
Avg inference time: 0.001621s


Generating Outputs: 100%|██████████| 69/69 [00:10<00:00,  6.76it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 49.00%
GPU memory usage: 20.00%
Peak GPU memory: 10029.52 MB
Avg inference time: 0.001623s


Generating Outputs: 100%|██████████| 69/69 [00:10<00:00,  6.70it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 53.00%
GPU memory usage: 21.00%
Peak GPU memory: 10138.09 MB
Avg inference time: 0.001626s


Generating Outputs: 100%|██████████| 68/68 [00:10<00:00,  6.65it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.30%
GPU usage: 49.00%
GPU memory usage: 19.00%
Peak GPU memory: 10246.66 MB
Avg inference time: 0.001621s


Generating Outputs: 100%|██████████| 67/67 [00:10<00:00,  6.54it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 64.00%
GPU memory usage: 25.00%
Peak GPU memory: 10355.23 MB
Avg inference time: 0.001627s


Generating Outputs: 100%|██████████| 66/66 [00:10<00:00,  6.49it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 65.00%
GPU memory usage: 26.00%
Peak GPU memory: 10451.81 MB
Avg inference time: 0.001621s


Generating Outputs: 100%|██████████| 66/66 [00:10<00:00,  6.45it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 71.00%
GPU memory usage: 27.00%
Peak GPU memory: 10560.25 MB
Avg inference time: 0.001621s


Generating Outputs: 100%|██████████| 65/65 [00:10<00:00,  6.35it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 70.00%
GPU memory usage: 27.00%
Peak GPU memory: 10669.31 MB
Avg inference time: 0.001625s


Generating Outputs: 100%|██████████| 64/64 [00:10<00:00,  6.27it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 71.00%
GPU memory usage: 27.00%
Peak GPU memory: 10777.15 MB
Avg inference time: 0.001625s


Generating Outputs: 100%|██████████| 64/64 [00:10<00:00,  6.26it/s]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.10%
GPU usage: 56.00%
GPU memory usage: 22.00%
Peak GPU memory: 10885.59 MB
Avg inference time: 0.001619s
The optimal batch size is 100. Here are the benchmarks associated with this batch size...


Generating Outputs: 100%|██████████| 64/64 [00:10<00:00,  6.25it/s]

              precision    recall  f1-score   support

           0       0.99      1.00      0.99      3001
           1       0.97      0.98      0.98       938
           2       0.00      0.00      0.00        19
           3       1.00      1.00      1.00      2321
           4       0.36      0.47      0.41        19
           5       0.94      0.87      0.90        38

    accuracy                           0.99      6336
   macro avg       0.71      0.72      0.71      6336
weighted avg       0.99      0.99      0.99      6336

CPU usage: 100.20%
GPU usage: 65.00%
GPU memory usage: 26.00%
Peak GPU memory: 10885.59 MB
Avg inference time: 0.001622s
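The log reports only the chosen batch size, not the rule used to pick it. Assuming the sweep keeps the batch size with the lowest average inference time, the selection step can be sketched as below. The optional `memory_budget_mb` cap is our addition, not part of the original sweep. It shows how the ~10.9 GB peak at batch size 100 could be traded for a slightly slower but lighter configuration. The helper `select_batch_size` and the toy numbers are hypothetical.

```python
def select_batch_size(results, memory_budget_mb=None):
    """Return the batch size with the lowest average latency.

    results: mapping of batch_size -> (avg_latency_s, peak_memory_mb).
    memory_budget_mb: optional cap; batch sizes whose peak exceeds it are skipped.
    """
    candidates = {
        batch_size: (latency, peak)
        for batch_size, (latency, peak) in results.items()
        if memory_budget_mb is None or peak <= memory_budget_mb
    }
    if not candidates:
        raise ValueError("no batch size fits within the memory budget")
    # Sort key: latency first; on a latency tie, prefer the smaller (lighter) batch
    return min(candidates, key=lambda bs: (candidates[bs][0], bs))


# Illustrative numbers only, not the measurements logged above
toy_results = {
    32: (0.00210, 4000.0),
    64: (0.00170, 7000.0),
    100: (0.00162, 10900.0),
}

print(select_batch_size(toy_results))                          # 100
print(select_batch_size(toy_results, memory_budget_mb=8000.0))  # 64
```

Without a budget the fastest configuration wins; with an 8 GB budget the sketch falls back to the fastest batch size that fits.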





# **<U>K-S Test</U>**
- Now that we have the outputs of all our experiments, we compare the predictions of each optimized experiment with those of our vanilla inference (Experiment 1) using a K-S test.
- A two-sample Kolmogorov–Smirnov (K-S) test compares two distributions by measuring the maximum distance between their empirical cumulative distribution functions (CDFs) and flags a difference if that gap is statistically unlikely under the null hypothesis that both samples come from the same distribution.
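As a minimal sketch of the definition above (not SciPy's implementation), the statistic D = max over x of |F1(x) - F2(x)| can be computed directly from the two empirical CDFs. The helper name `ks_statistic` and the toy label arrays are hypothetical.

```python
import numpy as np


def ks_statistic(sample_1, sample_2):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(sample_1, dtype=float))
    b = np.sort(np.asarray(sample_2, dtype=float))
    # Both ECDFs are step functions that only jump at observed values,
    # so evaluating them at every observed value finds the maximum gap.
    grid = np.concatenate([a, b])
    # side="right" counts observations <= x, i.e. the ECDF F(x) = P(X <= x)
    cdf_1 = np.searchsorted(a, grid, side="right") / a.size
    cdf_2 = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_1 - cdf_2)))


# Toy predicted labels (hypothetical, not from the experiments)
baseline = [0, 0, 1, 3, 3]
identical = [0, 0, 1, 3, 3]
shifted = [0, 1, 1, 3, 3]

print(ks_statistic(baseline, identical))  # 0.0
print(ks_statistic(baseline, shifted))    # 0.2
```

In the shifted sample, one label 0 became a 1, so the CDFs differ by 1/5 at x = 0; that gap is the statistic.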

In [13]:
from scipy.stats import ks_2samp


def ks_test_two_sample(sample_1, sample_2):
    """
    Perform a two-sample Kolmogorov–Smirnov test.

    Args:
        sample_1: First 1D array-like sample.
        sample_2: Second 1D array-like sample.

    Returns:
        Tuple of (ks_statistic, p_value).
    """
    statistic, p_value = ks_2samp(sample_1, sample_2)
    return statistic, p_value


In [14]:
stat12, p_value12 = ks_test_two_sample(y_pred_exp1, y_pred_exp2)
stat13, p_value13 = ks_test_two_sample(y_pred_exp1, y_pred_exp3)
stat14, p_value14 = ks_test_two_sample(y_pred_exp1, y_pred_exp4)

print(f'Experiment 1 & experiment 2 -> k-s test: {stat12}, p-value: {p_value12}')
print(f'Experiment 1 & experiment 3 -> k-s test: {stat13}, p-value: {p_value13}')
print(f'Experiment 1 & experiment 4 -> k-s test: {stat14}, p-value: {p_value14}')

Experiment 1 & experiment 2 -> k-s test: 0.0, p-value: 1.0
Experiment 1 & experiment 3 -> k-s test: 0.0, p-value: 1.0
Experiment 1 & experiment 4 -> k-s test: 0.0, p-value: 1.0
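Because every K-S comparison here is computed on predicted class labels, a statistic of 0.0 shows that the label histograms match. It would also be 0.0 if the same labels were assigned to different cells. A direct element-wise comparison closes that gap. The sketch below uses a hypothetical helper (`prediction_agreement` is not part of helical or SciPy) and assumes the prediction arrays are aligned sample by sample.

```python
import numpy as np


def prediction_agreement(y_pred_baseline, y_pred_candidate):
    """Fraction of samples whose predicted label matches the baseline exactly."""
    baseline = np.asarray(y_pred_baseline)
    candidate = np.asarray(y_pred_candidate)
    if baseline.shape != candidate.shape:
        raise ValueError("prediction arrays must be aligned and have the same shape")
    return float(np.mean(baseline == candidate))


# Toy example: identical label histograms (so the K-S statistic is 0.0),
# but the first two predictions are swapped.
baseline = np.array([0, 1, 3, 3])
swapped = np.array([1, 0, 3, 3])

print(prediction_agreement(baseline, baseline))  # 1.0
print(prediction_agreement(baseline, swapped))   # 0.5
```

Applied to this notebook, a value of 1.0 from `prediction_agreement(y_pred_exp1, y_pred_exp2)` would confirm that FP16 inference reproduces every baseline prediction, not just the overall label distribution.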


# <U>**Results**</U>

In [25]:
import pandas as pd

# One row of benchmark metrics per experiment, labelled for the summary table
experiments = {
    'Experiment 1 (Vanilla)': metrics_exp1,
    'Experiment 2 (FP16)': metrics_exp2,
    'Experiment 3 (Mixed Precision)': metrics_exp3,
    'Experiment 4 (Batching)': metrics_exp4,
}
benchmarks = pd.DataFrame(list(experiments.values()))
benchmarks.insert(0, 'Experiments', list(experiments.keys()))

# Accuracy is 0.99 in every classification report
benchmarks['Accuracy'] = 0.99
# K-S results against the vanilla baseline; Experiment 1 vs itself is trivially 0.0 / 1.0
benchmarks['K-S Statistic'] = [0.0, stat12, stat13, stat14]
benchmarks['P-Value'] = [1.0, p_value12, p_value13, p_value14]
benchmarks

| Experiments | CPU usage (%) | GPU usage (%) | GPU memory usage (%) | Peak GPU memory (MB) | Avg inference time (ms) | Accuracy | K-S Statistic | P-Value |
|---|---|---|---|---|---|---|---|---|
| Experiment 1 (Vanilla) | 100.2 | 85 | 24 | 2243.59375 | 4.831873 | 0.99 | 0.0 | 1.0 |
| Experiment 2 (FP16) | 100.1 | 59 | 19 | 1150.761719 | 1.821086 | 0.99 | 0.0 | 1.0 |
| Experiment 3 (Mixed Precision) | 100.3 | 64 | 30 | 1230.77002 | 2.074472 | 0.99 | 0.0 | 1.0 |
| Experiment 4 (Batching) | 100.2 | 65 | 26 | 10885.59082 | 1.62172 | 0.99 | 0.0 | 1.0 |


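- The K-S statistic is 0.0 (p-value 1.0) for every comparison, so the distribution of predicted labels matches the baseline in every experiment, and accuracy stays at 0.99 throughout: none of our optimization strategies degraded predictive performance. Note that a K-S test on class labels compares label distributions rather than individual predictions, so on its own it does not prove sample-by-sample agreement.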
- CPU usage was ~100% in all experiments, which suggests that `get_outputs()` spends much of its time on CPU-side work such as batching, padding and dataset slicing.
- Latency is lowest once the batch size is tuned: Experiment 4 has an average inference time of 1.62 ms versus 4.83 ms for the vanilla baseline (Experiment 1), a ~3× speedup.
- Higher GPU utilization does not necessarily mean faster inference. Experiment 1 had the highest GPU utilization (85%) but was the slowest, while Experiment 2 had the lowest (59%) and was ~2.65× faster than Experiment 1.
- The main cause for concern is peak GPU memory: Experiment 4 peaked at ~10.9 GB (10,885.59 MB), nearly 5× the vanilla baseline (2,243.59 MB), because a larger batch size produces larger padded tensors.
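The speedup and memory figures quoted above follow directly from the summary table. This small sketch recomputes them from the table's values (average inference time in ms, peak GPU memory in MB), using Experiment 1 as the baseline.

```python
# Values copied from the benchmark table above
avg_inference_ms = {
    "Experiment 1 (Vanilla)": 4.831873,
    "Experiment 2 (FP16)": 1.821086,
    "Experiment 3 (Mixed Precision)": 2.074472,
    "Experiment 4 (Batching)": 1.62172,
}
peak_gpu_memory_mb = {
    "Experiment 1 (Vanilla)": 2243.59375,
    "Experiment 2 (FP16)": 1150.761719,
    "Experiment 3 (Mixed Precision)": 1230.77002,
    "Experiment 4 (Batching)": 10885.59082,
}

baseline = "Experiment 1 (Vanilla)"
# Speedup > 1 means faster than the baseline
speedup = {name: avg_inference_ms[baseline] / ms for name, ms in avg_inference_ms.items()}
# Memory ratio > 1 means a higher peak than the baseline
memory_ratio = {name: mb / peak_gpu_memory_mb[baseline] for name, mb in peak_gpu_memory_mb.items()}

for name in avg_inference_ms:
    print(f"{name}: {speedup[name]:.2f}x speedup, {memory_ratio[name]:.2f}x peak memory")
```

The two ratios make the trade-off explicit: batching is the fastest configuration but also the only one whose peak memory rises above the baseline.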

# **<u>Discussion</u>**

- The aim of this challenge was to make the Geneformer model from the helical package run more efficiently over a large number of perturbations (i.e. at inference time).
- Experiment 1 (Vanilla) was our baseline. We recorded the runtime, GPU/CPU utilization, and memory use. We then saved the model outputs for comparison.
- We then leveraged FP16 casting (soft quantization), mixed precision and batching to optimize runtime, GPU/CPU utilization and memory use, and checked that the results remained comparable to the baseline outputs.
- We reported several metrics to show that our optimizations worked while keeping the outputs correct, and summarized the results in a short table.
- Due to our limited time frame and the operational overhead involved, we were unable to explore strategies such as ONNX, TensorRT and distributed inference. Exploring these strategies would be a natural next step.

# **<U>Conclusion</U>**
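- Batching provided the largest inference speedup (~3×), outperforming both FP16 casting (~2.65×) and AMP (~2.33×). GPU utilization did not correlate directly with latency, which suggests that kernel efficiency and the amortization of kernel-launch overhead across larger batches matter more than raw utilization. FP16 (soft quantization) also roughly halved peak GPU memory (2,243.59 MB to 1,150.76 MB), whereas batching dominated throughput gains at the cost of a much higher memory peak (~10.9 GB).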