# Hyperparameter Optimization Tutorial (Optuna & Ray Tune)

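Welcome to this tutorial on hyperparameter optimization (HPO). A machine learning model's performance is often very sensitive to its hyperparameters, such as the learning rate or the number of network layers. Manual tuning is slow and inefficient; HPO tools automate the search for a good hyperparameter combination.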

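This tutorial introduces two popular Python HPO frameworks in turn:

1.  **Optuna**: a modern, easy-to-use HPO framework with a Pythonic API and support for several efficient sampling and pruning algorithms.
2.  **Ray Tune**: part of the Ray ecosystem, focused on scalable, flexible HPO with support for distributed execution and advanced scheduling strategies.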

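Both tools are demonstrated on the same example: tuning the **learning rate** and **hidden layer size** of a simple neural network classifier (the code also searches over the optimizer type).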

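**Tutorial outline:**
1.  Preparation (installing libraries, shared data setup).
2.  Hyperparameter optimization with Optuna.
3.  Hyperparameter optimization with Ray Tune.
4.  Brief comparison and summary.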

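## 1. Preparation

Install the required libraries, then prepare the dataset and base model structure used for the optimization.

### 1.1 Installing the Libraries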

```bash
pip install optuna "ray[tune]" scikit-learn torch torchvision numpy pandas matplotlib seaborn
# For Optuna visualizations
pip install plotly
# Ray Tune's OptunaSearch (optional) only needs the optuna package, which the first command already installs.
```
**Note**: Ray Tune runs on Ray Core, so you may need to initialize or configure Ray with `ray.init()`.

In [None]:
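# --- Shared imports --- (the two tools are covered separately, but importing everything up front makes missing packages easy to spot)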
import optuna
import ray
from ray import tune
from ray.tune.search.optuna import OptunaSearch 
from ray.tune.schedulers import ASHAScheduler

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
import numpy as np
import time
import os 

print(f"Optuna version: {optuna.__version__}")
print(f"Ray version: {ray.__version__}")

# --- Initialize Ray (run once) ---
if not ray.is_initialized():
    try:
        # Limit resources for notebook environment if needed.
        # os.cpu_count() can return None, so fall back to 1 in that case.
        ray.init(num_cpus=min(4, os.cpu_count() or 1), ignore_reinit_error=True, log_to_driver=False)
        print("Ray initialized.")
    except Exception as e:
        print(f"Could not initialize Ray: {e}")
else:
    print("Ray already initialized.")

# --- Shared data preparation (run once) ---
print("\nPreparing synthetic dataset...")
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)
X = X.astype(np.float32)
y = y.astype(np.int64)
X_train_hpo, X_val_hpo, y_train_hpo, y_val_hpo = train_test_split(X, y, test_size=0.25, random_state=42)

X_train_tensor_hpo = torch.from_numpy(X_train_hpo)
y_train_tensor_hpo = torch.from_numpy(y_train_hpo)
X_val_tensor_hpo = torch.from_numpy(X_val_hpo)
y_val_tensor_hpo = torch.from_numpy(y_val_hpo)

train_dataset_hpo = TensorDataset(X_train_tensor_hpo, y_train_tensor_hpo)
val_dataset_hpo = TensorDataset(X_val_tensor_hpo, y_val_tensor_hpo)

print(f"Dataset prepared: X_train shape={X_train_tensor_hpo.shape}, X_val shape={X_val_tensor_hpo.shape}")
input_size_hpo = X_train_hpo.shape[1]

# --- Shared model definition (defined once) ---
class SimpleNN_HPO(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, 2) # 2 classes

    def forward(self, x):
        return self.layer2(self.relu(self.layer1(x)))
print("SimpleNN_HPO model defined.")

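## 2. Hyperparameter Optimization with Optuna

Optuna is built around an `objective` function that you define. The function receives a `trial` object, uses it to suggest hyperparameter values, trains and evaluates the model, and returns the metric to optimize.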

In [None]:
# --- Optuna: tool-specific imports (already imported at the top; repeated here for reference) ---
# import optuna
# import torch
# ... (other dependencies)

print("\n--- Optuna Example --- ")
device_optuna = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# --- Optuna: define the objective function ---
def objective_optuna(trial):
    # Suggest hyperparameters
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    hidden_size = trial.suggest_int("hidden_size", 32, 128, step=32)
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "SGD"])
    batch_size = 64  # fixed batch size for this example

    # Create the model and optimizer
    model = SimpleNN_HPO(input_size_hpo, hidden_size).to(device_optuna)
    if optimizer_name == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        
    criterion = nn.CrossEntropyLoss()
    train_loader_optuna = DataLoader(train_dataset_hpo, batch_size=batch_size)
    val_loader_optuna = DataLoader(val_dataset_hpo, batch_size=batch_size)
    
    # Train the model (only a few epochs to keep the demo fast)
    n_epochs_optuna = 4
    for epoch in range(n_epochs_optuna):
        model.train()
        for batch_X, batch_y in train_loader_optuna:
            batch_X, batch_y = batch_X.to(device_optuna), batch_y.to(device_optuna)
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            
        # Evaluate and report the intermediate result (used for pruning)
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for batch_X_val, batch_y_val in val_loader_optuna:
                batch_X_val, batch_y_val = batch_X_val.to(device_optuna), batch_y_val.to(device_optuna)
                outputs_val = model(batch_X_val)
                _, predicted = torch.max(outputs_val.data, 1)
                total += batch_y_val.size(0)
                correct += (predicted == batch_y_val).sum().item()
        accuracy = correct / total
        trial.report(accuracy, epoch)
        
        if trial.should_prune():
            # print(f"  Optuna Trial {trial.number} pruned at epoch {epoch+1}")
            raise optuna.TrialPruned()

    # Return the final metric
    # print(f"Optuna Trial {trial.number} finished. Accuracy: {accuracy:.4f}, Params: {trial.params}")
    return accuracy

# --- Optuna: create the study and run the optimization ---
study_optuna = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner(n_startup_trials=2, n_warmup_steps=1))
n_trials_optuna = 15  # number of trials to run
print(f"Starting Optuna optimization for {n_trials_optuna} trials...")
study_optuna.optimize(objective_optuna, n_trials=n_trials_optuna, timeout=120, show_progress_bar=True)
print("Optuna optimization finished.")

# --- Optuna: inspect the results ---
print("\n--- Optuna Results ---")
print(f"Number of finished trials: {len(study_optuna.trials)}")
try:
    best_trial_optuna = study_optuna.best_trial
    print(f"Best trial number: {best_trial_optuna.number}")
    print(f"  Value (Best Accuracy): {best_trial_optuna.value:.4f}")
    print("  Best Parameters:")
    for key, value in best_trial_optuna.params.items():
        print(f"    {key}: {value}")
except ValueError:
    print("No completed trials found for Optuna.")

# --- Optuna: visualization ---
print("\nAttempting Optuna visualizations...")
try:
    import plotly
    if study_optuna.trials:  # Check if there are trials before plotting
        fig1 = optuna.visualization.plot_optimization_history(study_optuna)
        fig2 = optuna.visualization.plot_param_importances(study_optuna)
        fig1.show()
        fig2.show()
    else:
        print("No trials to visualize for Optuna.")
except ImportError:
    print("Plotly not installed. Skipping Optuna visualizations.")
except Exception as e:
    print(f"Error during Optuna visualization: {e}")
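
The study above uses `MedianPruner(n_startup_trials=2, n_warmup_steps=1)`. The decision rule behind it can be sketched in plain Python. This is a simplified illustration, not Optuna's implementation: `should_prune_median` and the `finished` data are hypothetical, the metric is assumed to be maximized, and the sketch compares only the current value rather than the trial's best value so far.

```python
from statistics import median


def should_prune_median(history, current_value, step,
                        n_startup_trials=2, n_warmup_steps=1):
    """Return True if a running trial looks worse than the median of earlier trials.

    history: one list of intermediate accuracies per finished trial (higher is better).
    """
    if len(history) < n_startup_trials:
        return False  # not enough finished trials to form a baseline
    if step < n_warmup_steps:
        return False  # still in the warm-up phase, never prune
    peers = [values[step] for values in history if len(values) > step]
    if not peers:
        return False  # no earlier trial reached this step
    return current_value < median(peers)


# Made-up intermediate accuracies from three finished trials.
finished = [[0.60, 0.70, 0.80], [0.50, 0.65, 0.75], [0.55, 0.72, 0.78]]

print(should_prune_median(finished, 0.60, step=1))  # True: below the step-1 median of 0.70
print(should_prune_median(finished, 0.71, step=1))  # False: above the median
print(should_prune_median(finished, 0.10, step=0))  # False: step 0 is inside the warm-up
```

In the tutorial's objective, `trial.report(accuracy, epoch)` supplies the intermediate values and `trial.should_prune()` asks the pruner for this kind of decision.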

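## 3. Hyperparameter Optimization with Ray Tune

Ray Tune manages the optimization with four pieces: a trainable function or class (`Trainable`), a search space, a search algorithm, and a scheduler. It is particularly well suited to workloads that need parallel or distributed execution.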
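
To give a feel for what a scheduler contributes, here is a plain-Python sketch of synchronous successive halving, the idea that ASHA builds on. It is a simplification under stated assumptions: ASHA itself promotes trials asynchronously, and `successive_halving`, `score_at`, `toy_score`, and the candidate learning rates are hypothetical names and data made up for illustration.

```python
import math


def successive_halving(configs, score_at, max_t=8, grace_period=1, reduction_factor=2):
    """Keep the top 1/reduction_factor of the candidates at each rung (maximizing).

    score_at(config, t) is a callback returning the metric after t epochs.
    Returns the surviving configs and a list of (epochs, candidates) per rung.
    """
    survivors = list(configs)
    t = grace_period
    rung_sizes = []
    while True:
        rung_sizes.append((t, len(survivors)))
        if t >= max_t or len(survivors) == 1:
            break
        ranked = sorted(survivors, key=lambda c: score_at(c, t), reverse=True)
        survivors = ranked[: max(1, len(ranked) // reduction_factor)]
        t = min(t * reduction_factor, max_t)  # the next rung trains longer
    return survivors, rung_sizes


def toy_score(lr, t):
    # Made-up learning curve: best near lr = 1e-2, improving with more epochs.
    closeness = 1.0 - abs(math.log10(lr) + 2.0) / 3.0
    return closeness * t / (t + 1)


candidate_lrs = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 2e-1, 5e-1]
best, rungs = successive_halving(candidate_lrs, toy_score)
print(best, rungs)
```

With eight candidates and a reduction factor of 2, the rungs shrink from 8 to 4 to 2 to 1 candidates while the training budget doubles, so most of the compute goes to the configurations that looked promising early.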

In [None]:
# --- Ray Tune: tool-specific imports (already imported at the top; repeated here for reference) ---
# import ray
# from ray import tune
# from ray.tune.search.optuna import OptunaSearch
# from ray.tune.schedulers import ASHAScheduler
# import torch
# ... (other dependencies)

print("\n--- Ray Tune Example --- ")
device_ray = torch.device("cuda" if torch.cuda.is_available() else "cpu")

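# --- Ray Tune: define the trainable function ---
# Note: a trainable should normally load its data inside the function so it can run on remote workers.
# For simplicity this example still references the module-level datasets; adjust this for a real distributed run.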
def trainable_ray(config):
    lr = config["lr"]
    hidden_size = int(config["hidden_size"])
    optimizer_name = config["optimizer"]
    batch_size = config.get("batch_size", 64) # Allow default if not in config
    
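    # Resolve the device inside the trial: a Ray worker without an allocated GPU
    # may not see CUDA even when the driver process does.
    device_ray = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Build the DataLoaders inside the function so each trial creates its own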
    train_loader_tune = DataLoader(train_dataset_hpo, batch_size=batch_size)
    val_loader_tune = DataLoader(val_dataset_hpo, batch_size=batch_size)
    
    model = SimpleNN_HPO(input_size_hpo, hidden_size).to(device_ray)
    if optimizer_name == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    n_epochs_ray = 10 # More epochs for scheduler
    for epoch in range(n_epochs_ray):
        model.train()
        for batch_X, batch_y in train_loader_tune:
            batch_X, batch_y = batch_X.to(device_ray), batch_y.to(device_ray)
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

        # Evaluate and report to Ray Tune
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for batch_X_val, batch_y_val in val_loader_tune:
                batch_X_val, batch_y_val = batch_X_val.to(device_ray), batch_y_val.to(device_ray)
                outputs_val = model(batch_X_val)
                _, predicted = torch.max(outputs_val.data, 1)
                total += batch_y_val.size(0)
                correct += (predicted == batch_y_val).sum().item()
        accuracy = correct / total
        
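        # Report metrics to Ray Tune as a dict. The keyword form tune.report(mean_accuracy=...)
        # is a legacy API; some newer Ray releases expose this same dict-based call as tune.report(...).
        ray.train.report({"mean_accuracy": accuracy, "epoch": epoch})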

# --- Ray Tune: define the search space ---
search_space_ray = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "hidden_size": tune.qrandint(32, 128, q=32),
    "optimizer": tune.choice(["Adam", "SGD"]),
    # "batch_size": tune.choice([64, 128]) # Can also tune batch size
}

# --- Ray Tune: configure the search algorithm and scheduler ---
# Use Optuna as the search algorithm (example)
optuna_search = OptunaSearch(metric="mean_accuracy", mode="max")

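# Use the ASHA scheduler. metric and mode are omitted here because TuneConfig sets them;
# Ray Tune raises an error when both the scheduler and TuneConfig specify them.
asha_scheduler = ASHAScheduler(
    max_t=10, grace_period=1, reduction_factor=2
)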

# --- Ray Tune: run the Tuner ---
num_samples_ray = 10 # Number of trials
print(f"\nStarting Ray Tune optimization for {num_samples_ray} samples...")
if ray.is_initialized():
    try:
        tuner = tune.Tuner(
            trainable_ray,
            param_space=search_space_ray,
            tune_config=tune.TuneConfig(
                search_alg=optuna_search,
                scheduler=asha_scheduler,
                num_samples=num_samples_ray,
                metric="mean_accuracy",
                mode="max"
            ),
            run_config=ray.train.RunConfig(
                name="hpo_ray_tune_demo",
                verbose=1,
                # Stop criteria can be added here, e.g.:
                # stop={"training_iteration": n_epochs_ray}
            )
        )
        results_ray = tuner.fit()
        print("Ray Tune optimization finished.")

        # --- Ray Tune: analyze the results ---
        print("\n--- Ray Tune Results ---")
        best_result_ray = results_ray.get_best_result(metric="mean_accuracy", mode="max")
        if best_result_ray:
            print("Best trial config:")
            print(best_result_ray.config)
            print(f"Best trial final mean_accuracy: {best_result_ray.metrics['mean_accuracy']:.4f}")
        else:
            print("No best result found for Ray Tune.")
            
    except Exception as e:
        print(f"An error occurred during Ray Tune execution: {e}")
else:
    print("Ray is not initialized. Skipping Ray Tune execution.")

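## 4. Comparison and Selection

| Feature                  | Optuna                                            | Ray Tune                                                |
|--------------------------|---------------------------------------------------|---------------------------------------------------------|
| **Ease of use**          | Very high, Pythonic API                           | More complex, more configuration options                |
| **Core concepts**        | Study, Trial, Objective                           | Trainable, Search Space, Scheduler, Search Alg          |
| **Search algorithms**    | Several built in (TPE, CMA-ES, Random)            | Pluggable (HyperOpt, Optuna, BayesOpt, etc.)            |
| **Pruning**              | Several built-in pruners, well integrated         | Via schedulers (ASHA, PBT, etc.)                        |
| **Parallel/distributed** | Limited (manual setup or an RDB backend)          | Core strength, built on Ray                             |
| **Scalability**          | Good                                              | Very high                                               |
| **Dependencies**         | Lightweight                                       | Requires Ray Core                                       |
| **Visualization**        | Built-in Plotly visualizations                    | Relies on TensorBoard or other tools                    |
| **Main strengths**       | Ease of use, quick start, strong sampling/pruning | Scalability, distributed execution, advanced scheduling |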

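**Recommendations**:
*   For **single-machine experiments, rapid prototyping, and cases where ease of use comes first**, **Optuna** is usually an excellent choice.
*   For workloads that need **large-scale parallelism, distributed training, complex scheduling strategies, or integration with other Ray components**, **Ray Tune** is the more powerful option.
*   The two can be combined: use Optuna as the search algorithm inside Ray Tune.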

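## Summary

Hyperparameter optimization is a key step in improving machine learning model performance. Optuna and Ray Tune are two powerful, popular Python HPO frameworks.

*   **Optuna** is known for its ease of use and efficient sampling and pruning algorithms, which makes it a good fit for getting started quickly and for single-machine experiments.
*   **Ray Tune** stands out for scalability, distributed execution, and advanced scheduling, which suits large-scale or complex optimization tasks.

Learning an HPO tool frees you from tedious manual tuning and lets you optimize models more systematically.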