# Bayesian Optimization for Hyperparameter Tuning

This notebook demonstrates Bayesian optimization with Optuna to tune the hyperparameters of a character-level language model. The optimization process is logged, and the results are visualized.

The hyperparameters being tuned include:
- learning rate
- weight decay
- learning rate schedule

In [1]:
import optuna
optuna.logging.set_verbosity(optuna.logging.INFO)
import optuna.visualization as vis
import kaleido
import plotly.io as pio
import time
import sys
import os
from pathlib import Path

cwd = Path.cwd()

project_root = cwd.parents[0]
sys.path.append(str(project_root))

from objective_fn import build_objective
from bo_utils import initialize_training_log
from callbacks import create_safety_callback



This means that static image generation (e.g. `fig.write_image()`) will not work.

Please upgrade Plotly to version 6.1.1 or greater, or downgrade Kaleido to version 0.2.1.

  from .kaleido import Kaleido


In [2]:
from google.colab import drive
drive.mount('/content/drive')

# Create the backup directory on Google Drive
DRIVE_BACKUP_DIR = "/content/drive/MyDrive/character_llm_bo_backups"
os.makedirs(DRIVE_BACKUP_DIR, exist_ok=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Load the Experiment Setup

## Load data & set the relevant directory/output paths

The same text8 dataset is used: 100M characters of text from Wikipedia articles, containing only lowercase letters and spaces. It is already pre-split into 90M characters for training and 10M characters for testing.

- Here, we set the necessary directory paths for data and configuration files.
- We also define the maximum search time `max_search_hours` for the optimization process.
- We define how often to download intermediate results during the optimization process.
- Finally, we initialize the training log if it does not already exist.
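
The search budget is given in hours but is compared against wall-clock time in seconds. As a minimal sketch of how such a budget check could work (the helper `seconds_remaining` is hypothetical and is not part of this project's modules; how `build_objective` actually uses the limit is not shown here):

```python
import time


def seconds_remaining(start_time, time_limit, now=None):
    """Return the seconds left in the budget, never below zero.

    Hypothetical helper for illustration only.
    """
    if now is None:
        now = time.time()
    return max(0.0, time_limit - (now - start_time))


max_search_hours = 10
time_limit = max_search_hours * 60 * 60  # 36,000 seconds

# Deterministic example: 9.5 hours after the start, half an hour is left.
start = 1_000_000.0
print(seconds_remaining(start, time_limit, now=start + 9.5 * 3600))  # 1800.0
```

A check like this lets a long-running trial stop cleanly once the remaining time reaches zero, rather than relying only on the `timeout` passed to the optimizer.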

In [3]:
# data_dir = "./../../data/text8_train.txt" # Set the data directory path for local setup
data_dir = "./text8_train.txt" # Set the data directory path for Google Colab
config_path = "./config.json" # Set the configuration file path
output_file = "./tuning_results.log"

max_search_hours = 10 # Set the maximum search time in hours
download_every_n_trials = 1 # Set how often to download intermediate results

In [4]:
# Read in training text file
with open(data_dir, 'r', encoding='utf-8') as f:
    train_text = f.read()
print(f"Training text loaded. Length: {len(train_text) :,} characters.")

Training text loaded. Length: 90,000,000 characters.


In [5]:
if not os.path.exists(output_file):
    initialize_training_log()

[initialize_training_log] Initialized tuning results log file.


# Optuna Bayesian Optimization Setup

We build the objective function from the training text and configuration file, then create an Optuna study that minimizes the objective and uses a median pruner to stop unpromising trials early. The study is stored in a SQLite database and reloaded if it already exists. The best hyperparameters are logged during the optimization process and will be visualized later.
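
To make the pruning idea concrete, here is a simplified sketch of the median rule in plain Python. It is not Optuna's implementation: the function `should_prune` and the loss values are made up for illustration, and the real `MedianPruner` has further details (for example a startup-trial threshold) that are omitted here.

```python
from statistics import median


def should_prune(current_value, step, previous_curves, n_warmup_steps=200):
    """Simplified median rule for a minimization study (illustrative only).

    previous_curves holds one dict per earlier trial, mapping a step
    number to the intermediate value reported at that step.
    """
    if step < n_warmup_steps:
        return False  # never prune during the warm-up phase
    peers = [curve[step] for curve in previous_curves if step in curve]
    if not peers:
        return False  # nothing to compare against yet
    # Prune when the current trial is worse (higher loss) than the median.
    return current_value > median(peers)


# Made-up intermediate validation losses from three earlier trials.
previous = [
    {200: 1.30, 400: 1.20},
    {200: 1.25, 400: 1.15},
    {200: 1.40, 400: 1.35},
]

print(should_prune(1.50, 100, previous))  # False: still in warm-up
print(should_prune(1.50, 200, previous))  # True: 1.50 > median 1.30
print(should_prune(1.10, 400, previous))  # False: 1.10 <= median 1.20
```

With `n_warmup_steps=200`, as in this notebook's study, no trial can be pruned before step 200, which protects trials whose loss is still dropping quickly at the start of training.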

In [6]:
GLOBAL_START_TIME = time.time()
GLOBAL_TIME_LIMIT = max_search_hours * 60 * 60

# Build the objective function for optimization
objective = build_objective(
    train_text,
    output_file,
    config_path=config_path,
    global_start_time = GLOBAL_START_TIME,
    global_time_limit = GLOBAL_TIME_LIMIT
)

# Create the Optuna study, or reload it from the SQLite database if it already exists
study = optuna.create_study(
    study_name="Character_LLM_Hyperparameter_Tuning",
    storage="sqlite:///character_llm_hyperparam_tuning.db",
    load_if_exists=True,
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=200)
)

safety_callback = create_safety_callback(
    log_path = output_file,
    db_path="character_llm_hyperparam_tuning.db",
    drive_backup_dir = DRIVE_BACKUP_DIR,
    backup_every=download_every_n_trials
)

[I 2025-11-20 11:10:58,107] A new study created in RDB with name: Character_LLM_Hyperparameter_Tuning


In [7]:
# Start the optimization process
study.optimize(
    objective,
    timeout = 60 * 60 * max_search_hours,
    n_trials = 15,
    callbacks=[safety_callback],
    show_progress_bar=True
)

if len(study.get_trials(states=[optuna.trial.TrialState.COMPLETE])) == 0:
    print("\nNo completed trials. Only pruned ones.")
else:
    print("\nBest Trial Params: ")
    print(study.best_trial.params)

  0%|          | 0/15 [00:00<?, ?it/s]



[Trial 0] Starting throughput calculation...
Benchmark completed in 25.28 seconds.
Total tokens processed: 16384000
Throughput: 648030.77 tokens/second
Estimated max steps within compute budget: 213584.0

[Trial 0] Starting training loop.
  iter_max = 213,584
lr_schedule = constant
  weight_decay = 0.01
  learning_rate = 0.0012321854067764554

    Step     1/213584  (  0.0%) | val_loss = 4.4226
    Step 10680/213584  (  5.0%) | val_loss = 1.2228
    Step 21359/213584  ( 10.0%) | val_loss = 1.1491
    Step 32038/213584  ( 15.0%) | val_loss = 1.1386
    Step 42717/213584  ( 20.0%) | val_loss = 1.0815
    Step 53396/213584  ( 25.0%) | val_loss = 1.0771
    Step 64075/213584  ( 30.0%) | val_loss = 1.0793
    Step 74754/213584  ( 35.0%) | val_loss = 1.0767
    Step 85433/213584  ( 40.0%) | val_loss = 1.0781
    Step 96112/213584  ( 45.0%) | val_loss = 1.0530
    Step 106791/213584  ( 50.0%) | val_loss = 1.0509
    Step 117470/213584  ( 55.0%) | val_loss = 1.0249
    Step 128149/213584  ( 

## Optuna Bayesian Optimization Visualizations

We visualize the optimization history, parameter importance, and hyperparameter relationships using Optuna's built-in visualization tools. These plots help us understand how different hyperparameters affect the model's performance and identify the most influential ones.

In [13]:
# Optimization history to visualize the progress over trials
fig1 = vis.plot_optimization_history(study)
fig1.show()

# Save the optimization history plot
# fig1.write_image(f"{DRIVE_BACKUP_DIR}/optimization_history.png")


In [19]:
# Optuna visualization to show slice plot of hyperparameters
fig2 = vis.plot_slice(study)
fig2.show()

# Save the slice plot
# fig2.write_image(f"{DRIVE_BACKUP_DIR}/slice_plot.png")

AttributeError: 'ColorBar' object has no attribute '_set_property'

In [15]:
# Optuna visualization to show parallel coordinate plot of hyperparameters
fig3 = vis.plot_parallel_coordinate(study)
fig3.show()

# Save the parallel coordinate plot
# fig3.write_image(f"{DRIVE_BACKUP_DIR}/parallel_coordinate.png")

AttributeError: 'Parcoords' object has no attribute '_set_property'

In [None]:
# Optuna visualization to show parameter importances
fig4 = vis.plot_param_importances(study)
fig4.show()

# Save the parameter importances plot
# fig4.write_image(f"{DRIVE_BACKUP_DIR}/param_importances.png")

In [None]:
# Optuna visualization to show contour plots for each pair of hyperparameters
fig5 = vis.plot_contour(study, params=["learning_rate", "weight_decay"])
fig6 = vis.plot_contour(study, params=["learning_rate", "lr_schedule"])
fig7 = vis.plot_contour(study, params=["weight_decay", "lr_schedule"])
for fig in (fig5, fig6, fig7):
    fig.show()

# Save the contour plots
# fig5.write_image(f"{DRIVE_BACKUP_DIR}/contour_plot.png")
# fig6.write_image(f"{DRIVE_BACKUP_DIR}/contour_plot_2.png")
# fig7.write_image(f"{DRIVE_BACKUP_DIR}/contour_plot_3.png")

In [None]:
# Empirical distribution function (EDF) plot to show cumulative distribution of objective values
fig8 = vis.plot_edf(study)
fig8.show()

# Save the EDF plot
# fig8.write_image(f"{DRIVE_BACKUP_DIR}/edf_plot.png")
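
For readers unfamiliar with the EDF, the quantity it plots can be computed by hand: for each objective value, the fraction of trials that scored at or below it. The sketch below uses made-up loss values and a hypothetical helper `empirical_cdf`; it illustrates the definition and is not how Optuna builds its figure.

```python
def empirical_cdf(values):
    """Return the sorted values and the cumulative fraction at each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Made-up final validation losses, for illustration only.
losses = [1.02, 1.10, 1.05, 1.30]

xs, fractions = empirical_cdf(losses)
for x, f in zip(xs, fractions):
    print(f"loss <= {x:.2f}: {f:.2f} of trials")
```

In a minimization study, a curve that rises steeply at low objective values means that a large share of trials reached a low loss.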