# Bayesian Optimization for Hyperparameter Tuning

This notebook uses Optuna to run Bayesian optimization over the hyperparameters of a character-level language model. Each trial is logged, and the results are visualized at the end.

The hyperparameters being tuned are:
- learning rate
- weight decay
- learning rate schedule (plus a warmup ratio when the schedule is `warmup_decay`)
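The actual search space lives in `objective_fn.py` and is not shown in this notebook. As a minimal sketch of what such a space could look like, the hypothetical helper below samples the hyperparameters listed above from a `trial` object that behaves like `optuna.trial.Trial`. The function name and all numeric ranges are assumptions made for illustration; only the parameter names and the two schedule names are taken from the notebook output.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from a hypothetical search space.

    `trial` is expected to behave like an optuna.trial.Trial.
    All ranges below are illustrative assumptions, not the project's values.
    """
    params = {
        # Learning rates are commonly searched on a log scale.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "lr_schedule": trial.suggest_categorical(
            "lr_schedule", ["cosine", "warmup_decay"]
        ),
    }
    # Conditional parameter: only sampled when the schedule has a warmup phase.
    if params["lr_schedule"] == "warmup_decay":
        params["warmup_ratio"] = trial.suggest_float("warmup_ratio", 0.0, 0.1)
    return params
```

Inside an objective function this would be called once per trial, and the returned dictionary would configure the training run.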

In [1]:
import os
import sys
import time
from pathlib import Path

import kaleido  # static image export backend used by fig.write_image
import optuna
import optuna.visualization as vis
import plotly.io as pio

optuna.logging.set_verbosity(optuna.logging.INFO)

cwd = Path.cwd()

project_root = cwd.parents[0]
sys.path.append(str(project_root))

from objective_fn import build_objective
from bo_utils import initialize_training_log
from callbacks import create_safety_callback

In [2]:
from google.colab import drive
drive.mount('/content/drive')

# Set up a backup drive and download directory
DRIVE_BACKUP_DIR = "/content/drive/MyDrive/character_llm_bo_backups"
os.makedirs(DRIVE_BACKUP_DIR, exist_ok=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Load the Experiment Setup

## Load data & set the relevant directory/output paths

We again use the text8 dataset: 100M characters of Wikipedia text containing only lowercase letters and spaces, pre-split into 90M characters for training and 10M characters for testing.
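A quick sanity check of the character set can catch a wrong or corrupted file before any compute is spent on tuning. The sketch below is an optional addition, not part of the original pipeline; `unexpected_characters` is a hypothetical helper and the sample strings are arbitrary.

```python
import string
from collections import Counter

# text8 is described as containing only lowercase letters and spaces.
ALLOWED_CHARS = set(string.ascii_lowercase + " ")

def unexpected_characters(text):
    """Count every character that falls outside a-z and space."""
    return Counter(ch for ch in text if ch not in ALLOWED_CHARS)

print(unexpected_characters("plain lowercase text"))  # empty Counter
print(unexpected_characters("Hello, world"))          # flags 'H' and ','
```

Running it on the loaded training text should return an empty `Counter` if the file matches the description above.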

- Here, we set the necessary directory paths for data and configuration files.
- We also define the maximum search time `max_search_hours` for the optimization process.
- We define how often to download intermediate results during the optimization process.
- Finally, we initialize the training log if it does not already exist.
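The search time limit is given in hours but compared against wall-clock seconds. The real check happens inside the project's objective function, which is not shown here, so the following is only a sketch of the idea. `hours_to_seconds` and `TimeBudget` are hypothetical names introduced for illustration.

```python
import time

def hours_to_seconds(hours):
    """Convert a duration in hours to seconds."""
    return hours * 60 * 60

class TimeBudget:
    """Track elapsed wall-clock time against a fixed limit (hypothetical helper)."""

    def __init__(self, limit_seconds, start_time=None, clock=time.time):
        self._clock = clock
        self.start_time = clock() if start_time is None else start_time
        self.limit_seconds = limit_seconds

    def elapsed(self):
        return self._clock() - self.start_time

    def exhausted(self):
        return self.elapsed() > self.limit_seconds

budget = TimeBudget(hours_to_seconds(0.01))
print(f"{budget.limit_seconds:.0f} s budget, exhausted: {budget.exhausted()}")
```

With `max_search_hours = 0.01` the budget is only 36 seconds, which is useful for a smoke test but too short for a real search.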

In [3]:
# data_dir = "./../../data/text8_train.txt" # Path to the training text file (local setup)
data_dir = "./text8_train.txt" # Path to the training text file (Google Colab)
config_path = "./config.json" # Path to the configuration file
output_file = "./tuning_results.log" # Path to the tuning log

max_search_hours = 0.01 # Maximum search time in hours (0.01 h = 36 s)
download_every_n_trials = 1 # Back up intermediate results every n trials

In [4]:
# Read in training text file
with open(data_dir, 'r', encoding='utf-8') as f:
    train_text = f.read()
print(f"Training text loaded. Length: {len(train_text) :,} characters.")

Training text loaded. Length: 90,000,000 characters.


In [5]:
if not os.path.exists(output_file):
    initialize_training_log()

# Optuna Bayesian Optimization Setup

We build the objective function from the training text and the configuration file, then create an Optuna study that minimizes the objective. The study is stored in a SQLite database so that an interrupted search can be resumed, and a median pruner stops unpromising trials early. A safety callback backs up the study and the log to Google Drive while the search runs. The best hyperparameters are printed after the search and visualized later.
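To make the pruning rule concrete, here is a simplified illustration of the median rule for a minimization study: a running trial is pruned when its best intermediate value so far is worse than the median of what completed trials reported at the same step. This is a sketch, not Optuna's implementation. `should_prune` is a hypothetical function, the default of 5 startup trials follows Optuna's documented default, and the 200 warmup steps mirror the value passed to `MedianPruner` in this notebook.

```python
from statistics import median

def should_prune(current_values, completed_histories, step,
                 n_startup_trials=5, n_warmup_steps=200):
    """Simplified median pruning rule for a minimization objective.

    current_values: dict mapping step -> intermediate value of the running trial.
    completed_histories: list of such dicts, one per completed trial.
    """
    # Never prune before enough trials have completed to form a baseline.
    if len(completed_histories) < n_startup_trials:
        return False
    # Never prune during the warmup phase of a trial.
    if step < n_warmup_steps:
        return False
    peers = [history[step] for history in completed_histories if step in history]
    if not peers:
        return False
    best_so_far = min(v for s, v in current_values.items() if s <= step)
    return best_so_far > median(peers)

done = [{200: 1.0}, {200: 2.0}, {200: 3.0}, {200: 4.0}, {200: 5.0}]
print(should_prune({200: 3.5}, done, step=200))  # True: 3.5 is above the median 3.0
print(should_prune({200: 2.5}, done, step=200))  # False: 2.5 is below the median 3.0
```

Note that a trial which never reaches the warmup step can only be stopped by other means, such as a time limit inside the objective.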

In [6]:
GLOBAL_START_TIME = time.time()
GLOBAL_TIME_LIMIT = max_search_hours * 60 * 60

# Build the objective function for optimization
objective = build_objective(
    train_text,
    output_file,
    config_path=config_path,
    global_start_time = GLOBAL_START_TIME,
    global_time_limit = GLOBAL_TIME_LIMIT
)

# Create the Optuna study, or resume it if it already exists in the database
study = optuna.create_study(
    study_name="Character_LLM_Hyperparameter_Tuning",
    storage="sqlite:///character_llm_hyperparam_tuning.db",
    load_if_exists=True,
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=200)
)

safety_callback = create_safety_callback(
    log_path = output_file,
    db_path="character_llm_hyperparam_tuning.db",
    drive_backup_dir = DRIVE_BACKUP_DIR,
    backup_every=download_every_n_trials
)

[I 2025-11-19 10:43:21,175] Using an existing study with name 'Character_LLM_Hyperparameter_Tuning' instead of creating a new one.


In [7]:
# Start the optimization process.
# The search stops at the timeout or after 20 trials, whichever comes first.
study.optimize(
    objective,
    timeout=GLOBAL_TIME_LIMIT,
    n_trials=20,
    callbacks=[safety_callback],
    show_progress_bar=True
)

if len(study.get_trials(states=[optuna.trial.TrialState.COMPLETE])) == 0:
    print("\nNo completed trials. Only pruned ones.")
else:
    print("\nBest Trial Params: ")
    print(study.best_trial.params)


[Trial 6] Starting throughput calculation...
Stopping benchmark at iteration 7 due to time limit.
Benchmark completed in 21.72 seconds.
Total tokens processed: 32768
Throughput: 1508.61 tokens/second
Estimated max steps within compute budget: 13.0

[Trial 6] Starting training loop.
  iter_max = 13
  lr_schedule = cosine
  weight_decay = 0.05
  learning_rate = 0.0003049100007677028

[Trial 6] Stopped early at step 0 due to global time limit.
Global time limit reached. (0.02h > 0.01h)
Pruning current trial.
[I 2025-11-19 10:44:36,395] Trial 6 pruned. 
[Drive Backup] Saving study + logs after trial 6...
[Drive Backup] Successfully saved to /content/drive/MyDrive/character_llm_bo_backups

Best Trial Params: 
{'lr_schedule': 'warmup_decay', 'learning_rate': 0.0010038199699643625, 'weight_decay': 0.0, 'warmup_ratio': 0.051440586316374784}


## Optuna Bayesian Optimization Visualizations

We visualize the optimization history, parameter importance, and hyperparameter relationships using Optuna's built-in visualization tools. These plots help us understand how different hyperparameters affect the model's performance and identify the most influential ones.
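Saving plots with `write_image` relies on a static export backend (kaleido in this notebook), which can be missing or fail in a fresh runtime. The sketch below shows one possible way to keep the results anyway by falling back to an interactive HTML file. `save_figure` is a hypothetical helper, and catching a broad `Exception` is a deliberate simplification because the error raised depends on the installed plotly version.

```python
from pathlib import Path

def save_figure(fig, path):
    """Try a static image export first; fall back to interactive HTML.

    `fig` is expected to behave like a plotly Figure.
    Returns the path that was actually written.
    """
    path = Path(path)
    try:
        fig.write_image(str(path))
        return path
    except Exception as exc:  # broad on purpose: error type varies by version
        fallback = path.with_suffix(".html")
        print(f"Static export failed ({exc}); writing {fallback} instead.")
        fig.write_html(str(fallback))
        return fallback
```

It could replace the direct `write_image` calls, for example `save_figure(fig1, f"{DRIVE_BACKUP_DIR}/optimization_history.png")`.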

In [8]:
# Optimization history to visualize the progress over trials
fig1 = vis.plot_optimization_history(study)
fig1.show()

# Save the optimization history plot
fig1.write_image(f"{DRIVE_BACKUP_DIR}/optimization_history.png")


In [21]:
# Optuna visualization to show slice plot of hyperparameters
fig2 = vis.plot_slice(study)
fig2.show()

# Save the slice plot
fig2.write_image(f"{DRIVE_BACKUP_DIR}/slice_plot.png")

In [20]:
# Optuna visualization to show parallel coordinate plot of hyperparameters
fig3 = vis.plot_parallel_coordinate(study)
fig3.show()

# Save the parallel coordinate plot
fig3.write_image(f"{DRIVE_BACKUP_DIR}/parallel_coordinate.png")

In [19]:
# Optuna visualization to show parameter importances
fig4 = vis.plot_param_importances(study)
fig4.show()

# Save the parameter importances plot
fig4.write_image(f"{DRIVE_BACKUP_DIR}/param_importances.png")

In [18]:
# Contour plots for each pair of hyperparameters
fig5 = vis.plot_contour(study, params=["learning_rate", "weight_decay"])
fig6 = vis.plot_contour(study, params=["learning_rate", "lr_schedule"])
fig7 = vis.plot_contour(study, params=["weight_decay", "lr_schedule"])
for fig in (fig5, fig6, fig7):
    fig.show()

# Save the contour plots
fig5.write_image(f"{DRIVE_BACKUP_DIR}/contour_plot.png")
fig6.write_image(f"{DRIVE_BACKUP_DIR}/contour_plot_2.png")
fig7.write_image(f"{DRIVE_BACKUP_DIR}/contour_plot_3.png")

In [17]:
# Empirical distribution function (EDF) plot: cumulative distribution of objective values
fig8 = vis.plot_edf(study)
fig8.show()

# Save the EDF plot
fig8.write_image(f"{DRIVE_BACKUP_DIR}/edf_plot.png")