## What is experiment tracking?

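If you're only running a handful of models (as we've done so far), it's fine to track their results in printouts and a few dictionaries. However, as the number of experiments you run starts to grow, this naive way of tracking can quickly get out of hand.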

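## Different ways to track machine learning experiments
- Python dictionaries, CSV files, printouts
- TensorBoard: supported in PyTorch via torch.utils.tensorboard, widely recognized and used, and scales easily, but the user experience is not as polished as some alternatives
- Weights & Biases Experiment Tracking: great user experience, but requires an additional resource outside of PyTorch
- MLflow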
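For a sense of what the first option looks like in practice, here is a minimal standard-library sketch: each run's settings and final metrics go into a dictionary, which is appended as one row of a CSV file. The helpers log_experiment() and read_experiments() are hypothetical, and the example values are the final test accuracies of experiments 1 and 2 later in this notebook.

```python
import csv
import tempfile
from pathlib import Path


def log_experiment(csv_path: Path, row: dict) -> None:
    """Append one experiment's settings and results to a CSV file (hypothetical helper).

    The header row is written only when the file is first created.
    """
    is_new_file = not csv_path.exists()
    with open(csv_path, "a", newline="") as f:
        csv_writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if is_new_file:
            csv_writer.writeheader()
        csv_writer.writerow(row)


def read_experiments(csv_path: Path) -> list:
    """Read all logged experiments back as dicts (CSV values come back as strings)."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "experiments.csv"
        # Final test accuracies of experiments 1 and 2 from this notebook
        log_experiment(path, {"model": "effnetb0", "data": "data_10_percent", "epochs": 5, "test_acc": 0.8352})
        log_experiment(path, {"model": "effnetb2", "data": "data_10_percent", "epochs": 5, "test_acc": 0.9176})
        for experiment in read_experiments(path):
            print(experiment)
```

This is enough for a few runs, but comparing per-epoch curves across many runs by hand gets tedious quickly, which is the gap the dedicated tools fill.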

## Getting set up

In [None]:
import torch

# Global configuration: one random seed shared by all experiments
class CFG:
    SEED = 42

# Setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
# For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+
def version_tuple(version: str) -> tuple:
    """Turn a version string like "2.2.2+cu121" into a comparable (major, minor) tuple."""
    return tuple(int(part) for part in version.split("+")[0].split(".")[:2])

try:
    import torch
    import torchvision
    # Compare (major, minor) tuples so that e.g. torch 2.2 is correctly treated as newer than 1.12
    assert version_tuple(torch.__version__) >= (1, 12), "torch version should be 1.12+"
    assert version_tuple(torchvision.__version__) >= (0, 13), "torchvision version should be 0.13+"
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")
except (ImportError, AssertionError):
    print(f"[INFO] torch/torchvision versions not as required, installing nightly versions.")
    !pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
    import torch
    import torchvision
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")

[INFO] torch/torchvision versions not as required, installing nightly versions.
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
torch version: 2.2.2+cu121
torchvision version: 0.17.2+cu121


In [None]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except ImportError:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

## Get data

In [None]:
import os
import zipfile

from pathlib import Path

import requests

def download_data(source: str,
                  destination: str,
                  remove_source: bool = True) -> Path:
    """Downloads a zipped dataset from source and unzips to destination.

    Args:
        source (str): A link to a zipped file containing data.
        destination (str): A target directory to unzip data to.
        remove_source (bool): Whether to remove the source after downloading and extracting.

    Returns:
        pathlib.Path to downloaded data.

    Example usage:
        download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                      destination="pizza_steak_sushi")
    """
    # Setup path to data folder
    data_path = Path("data/")
    image_path = data_path / destination

    # If the image folder doesn't exist, download it and prepare it...
    if image_path.is_dir():
        print(f"[INFO] {image_path} directory exists, skipping download.")
    else:
        print(f"[INFO] Did not find {image_path} directory, creating one...")
        image_path.mkdir(parents=True, exist_ok=True)

        # Download pizza, steak, sushi data
        target_file = Path(source).name
        print(f"[INFO] Downloading {target_file} from {source}...")
        request = requests.get(source)
        request.raise_for_status()  # stop on a failed download instead of saving an error page as a .zip
        with open(data_path / target_file, "wb") as f:
            f.write(request.content)

        # Unzip pizza, steak, sushi data
        with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
            print(f"[INFO] Unzipping {target_file} data...")
            zip_ref.extractall(image_path)

        # Remove .zip file
        if remove_source:
            os.remove(data_path / target_file)

    return image_path

image_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination="pizza_steak_sushi")
image_path

[INFO] data/pizza_steak_sushi directory exists, skipping download.


PosixPath('data/pizza_steak_sushi')

## Create Datasets and DataLoaders

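We'll use the transforms created automatically by torchvision.models.MODEL_WEIGHTS.DEFAULT.transforms() (for example, torchvision.models.EfficientNet_B0_Weights.DEFAULT.transforms()).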

In [None]:
# Setup directories
train_dir = image_path / "train"
test_dir = image_path / "test"

# Setup pretrained weights (plenty of these are available in torchvision.models)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

# Get transforms from weights (these are the transforms that were used to obtain the weights)
automatic_transforms = weights.transforms()
print(f"Automatically created transforms: {automatic_transforms}")

# Create DataLoaders
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=automatic_transforms, # use the automatically created transforms
    batch_size=32
)

train_dataloader, test_dataloader, class_names

Automatically created transforms: ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)


(<torch.utils.data.dataloader.DataLoader at 0x7c2b6acead70>,
 <torch.utils.data.dataloader.DataLoader at 0x7c2b6acebaf0>,
 ['pizza', 'steak', 'sushi'])

## Get a pretrained model, freeze the base layers and change the classifier head

In [None]:
# New in torchvision 0.13, "DEFAULT" means "best available weights"
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

# Setup the model with the pretrained weights and send it to the target device
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

model

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [None]:
# Freeze all base layers by setting requires_grad attribute to False
for param in model.features.parameters():
    param.requires_grad = False

torch.manual_seed(CFG.SEED)
torch.cuda.manual_seed(CFG.SEED)

# Change the classification layer
model.classifier = torch.nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280,
              out_features=len(class_names),
              bias=True).to(device))

# Base layers frozen and classifier head changed; let's get a summary with torchinfo.summary()
from torchinfo import summary

summary(model,
        input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape" (batch_size, color_channels, height, width)
        verbose=0,
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"]
)

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

## Create a loss function and optimizer

In [None]:
# Define loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

## Train the model and track results

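We can use PyTorch's torch.utils.tensorboard.SummaryWriter() class to save various parts of our model's training progress to file.

By default, the SummaryWriter() class saves various information about our model to files in the directory given by its log_dir parameter.

The default log_dir is runs/CURRENT_DATETIME_HOSTNAME, where HOSTNAME is the name of your computer.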
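To make that default concrete, here is a small standard-library approximation of how the directory name is formed. It assumes the format used in PyTorch's SummaryWriter source (the current time formatted as %b%d_%H-%M-%S, an underscore, the hostname, then an optional comment suffix); treat it as a sketch and check the source of your installed version for the exact behaviour. default_log_dir() is a hypothetical helper, not part of PyTorch.

```python
import os
import socket
from datetime import datetime
from typing import Optional


def default_log_dir(comment: str = "", now: Optional[datetime] = None) -> str:
    """Approximate SummaryWriter's default: runs/CURRENT_DATETIME_HOSTNAME + comment (hypothetical helper)."""
    now = now or datetime.now()
    current_time = now.strftime("%b%d_%H-%M-%S")  # e.g. "Apr23_14-05-09" (month name depends on locale)
    return os.path.join("runs", current_time + "_" + socket.gethostname() + comment)


if __name__ == "__main__":
    # The timestamp gives each new run its own directory under runs/
    print(default_log_dir())
```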

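The output of SummaryWriter() is saved in TensorBoard format.

TensorBoard comes from the TensorFlow ecosystem (it is installed as its own tensorboard package) and is a great way to visualize different parts of your model.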

In [None]:
from torch.utils.tensorboard import SummaryWriter

# Create a writer with all default settings
writer = SummaryWriter()

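To use the writer, we could write a new training loop, or we could adapt the existing train() function we created in notebook 05.

We'll take the train() function from engine.py and adapt it to use the writer.

We'll give our train() function the ability to log the model's training and test loss and accuracy values. We can do this with writer.add_scalars(main_tag, tag_scalar_dict), where: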


*   main_tag (string) - the name of the scalars being tracked (e.g. "Accuracy")
*   tag_scalar_dict (dict) - a dictionary of the values being tracked (e.g. {"train_loss": 0.3454})
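The relationship between the two arguments can be sketched in plain Python. ScalarLog below is a hypothetical, in-memory stand-in that does not write any TensorBoard files: every call files each value in tag_scalar_dict under main_tag at the given global_step, which is the grouping that lets TensorBoard draw train_loss and test_loss as two lines on one "Loss" chart. The numbers are the first two epochs of the training run later in this notebook.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ScalarLog:
    """Hypothetical in-memory imitation of the add_scalars(main_tag, tag_scalar_dict, global_step) pattern."""

    def __init__(self) -> None:
        # main_tag -> tag -> list of (global_step, value) pairs
        self.data: Dict[str, Dict[str, List[Tuple[int, float]]]] = defaultdict(lambda: defaultdict(list))

    def add_scalars(self, main_tag: str, tag_scalar_dict: Dict[str, float], global_step: int) -> None:
        # Each key in tag_scalar_dict becomes its own series under the shared main_tag
        for tag, value in tag_scalar_dict.items():
            self.data[main_tag][tag].append((global_step, value))


log = ScalarLog()
# Loss values from epochs 1 and 2 of the first training run in this notebook (global_step starts at 0)
log.add_scalars(main_tag="Loss", tag_scalar_dict={"train_loss": 1.0966, "test_loss": 0.8843}, global_step=0)
log.add_scalars(main_tag="Loss", tag_scalar_dict={"train_loss": 0.9204, "test_loss": 0.8134}, global_step=1)

for tag, points in log.data["Loss"].items():
    print(tag, points)
```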

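Once we've finished tracking values, we'll call writer.close() to flush any pending events to disk and tell the writer to stop looking for values to track.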



In [None]:
from typing import Dict, List
from tqdm.auto import tqdm

from going_modular.going_modular.engine import train_step, test_step

# Import the train() function from:
# https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/engine.py
# Note: this version logs to the global `writer` created in the previous cell
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through the train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for
      each epoch.
      In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]}
      For example, if training for epochs=2:
                   {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Experiment tracking ###
        # Add loss results to SummaryWriter
        writer.add_scalars(main_tag="Loss",
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss},
                           global_step=epoch)

        # Add accuracy results to SummaryWriter
        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)

    # Track the PyTorch model architecture once (it doesn't change between epochs)
    writer.add_graph(model=model,
                     # Pass in an example input
                     input_to_model=torch.randn(32, 3, 224, 224).to(device))

    # Close the SummaryWriter
    writer.close()

    ### End new ###

    # Return the filled results at the end of the epochs
    return results


In [None]:
# Train model
# Note: Not using engine.train() since the original script isn't updated to use writer
torch.manual_seed(CFG.SEED)
torch.cuda.manual_seed(CFG.SEED)

results = train(model=model,
                train_dataloader=train_dataloader,
                test_dataloader=test_dataloader,
                optimizer=optimizer,
                loss_fn=loss_fn,
                epochs=5,
                device=device)

results

  0%|          | 0/5 [00:00<?, ?it/s]



Epoch: 1 | train_loss: 1.0966 | train_acc: 0.3867 | test_loss: 0.8843 | test_acc: 0.6828
Epoch: 2 | train_loss: 0.9204 | train_acc: 0.6445 | test_loss: 0.8134 | test_acc: 0.7746
Epoch: 3 | train_loss: 0.7602 | train_acc: 0.8750 | test_loss: 0.6562 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.6868 | train_acc: 0.7695 | test_loss: 0.6201 | test_acc: 0.8759
Epoch: 5 | train_loss: 0.6467 | train_acc: 0.7969 | test_loss: 0.6054 | test_acc: 0.8665


{'train_loss': [1.0965989083051682,
  0.9203867316246033,
  0.7602227106690407,
  0.6868080832064152,
  0.6466987729072571],
 'train_acc': [0.38671875, 0.64453125, 0.875, 0.76953125, 0.796875],
 'test_loss': [0.8842612306276957,
  0.813412606716156,
  0.6561941703160604,
  0.620058536529541,
  0.6053728858629862],
 'test_acc': [0.6827651515151515,
  0.774621212121212,
  0.8863636363636364,
  0.8759469696969697,
  0.8664772727272728]}

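## View our model's results in TensorBoard

There are several ways to view TensorBoard:


*   VS Code (notebooks or Python scripts): press SHIFT + CMD + P (CTRL + SHIFT + P on Windows/Linux) to open the Command Palette and search for the command "Python: Launch TensorBoard".
*   Jupyter and Colab notebooks: make sure TensorBoard is installed, load it with %load_ext tensorboard, then run %tensorboard --logdir DIR_WITH_LOGS.

You could previously upload experiments to tensorboard.dev to share them publicly, but that service has been shut down (as of January 2024), so it is no longer an option.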

In [None]:
# IPython magic commands

# Load the TensorBoard extension
%load_ext tensorboard
# Start TensorBoard: this launches a TensorBoard server that reads the log files in the runs directory
%tensorboard --logdir runs

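## Create a helper function to build SummaryWriter() instances

Ideally, each experiment should have its own log directory. For example, say we'd like to track:

*   The experiment date/timestamp
*   The experiment name
*   The model name
*   Extra - is there anything else that should be tracked?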





In [None]:
def create_writer(experiment_name: str,
                  model_name: str,
                  extra: str = None) -> SummaryWriter:
    """Creates a torch.utils.tensorboard.writer.SummaryWriter() instance saving to a specific log_dir.

    log_dir is a combination of runs/timestamp/experiment_name/model_name/extra.

    Where timestamp is the current date in YYYY-MM-DD format.

    Args:
        experiment_name (str): Name of the experiment.
        model_name (str): Name of the model.
        extra (str, optional): Anything extra to add to the directory. Defaults to None.

    Returns:
        torch.utils.tensorboard.writer.SummaryWriter: Instance of a writer saving to log_dir.

    Example usage:
        # Create a writer saving to "runs/2022-06-04/data_10_percent/effnetb2/5_epochs/"
        writer = create_writer(experiment_name="data_10_percent",
                               model_name="effnetb2",
                               extra="5_epochs")
        # The above is the same as:
        writer = SummaryWriter(log_dir="runs/2022-06-04/data_10_percent/effnetb2/5_epochs/")
    """
    from datetime import datetime
    import os

    # Get timestamp of current date (all experiments on the same day are stored in the same folder)
    timestamp = datetime.now().strftime("%Y-%m-%d") # returns current date in YYYY-MM-DD format

    if extra:
        # Create log directory path
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name, extra)
    else:
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name)

    print(f"[INFO] Created SummaryWriter, saving to: {log_dir}...")
    return SummaryWriter(log_dir=log_dir)


In [None]:
# Create an example writer
example_writer = create_writer(experiment_name="data_10_percent",
                               model_name="effnetb0",
                               extra="5_epochs")

[INFO] Created SummaryWriter, saving to: runs/2024-04-23/data_10_percent/effnetb0/5_epochs...


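### Update the train() function to include a writer parameter

Say we're running a series of experiments, calling train() multiple times for several different models. It would be good if each experiment used a different writer: one writer per experiment = one log directory per experiment.

To adjust the train() function, we'll add a writer parameter to it, then add some code to check whether there's a writer and, if so, track our information there.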

In [None]:
from typing import Dict, List, Optional
from tqdm.auto import tqdm

# Add writer parameter to train()
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,
          writer: Optional[SummaryWriter] = None # new parameter to take in a writer
          ) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through the train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Stores metrics to the specified writer's log_dir if a writer is present.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").
      writer: A SummaryWriter() instance to log model results to (optional).

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for
      each epoch.
      In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]}
      For example, if training for epochs=2:
                   {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        # Training step
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        # Testing step
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Use the writer parameter to track experiments ###
        # See if there's a writer; if so, log to it
        if writer is not None:
            # Add results to SummaryWriter
            writer.add_scalars(main_tag="Loss",
                               tag_scalar_dict={"train_loss": train_loss,
                                                "test_loss": test_loss},
                               global_step=epoch)
            writer.add_scalars(main_tag="Accuracy",
                               tag_scalar_dict={"train_acc": train_acc,
                                                "test_acc": test_acc},
                               global_step=epoch)

    # Close the writer once all epochs are done (it must be called: writer.close())
    if writer is not None:
        writer.close()

    ### End new ###

    # Return the filled results at the end of the epochs
    return results


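## Setting up a series of modelling experiments

Each of these hyperparameters can be the starting point for a different experiment:

- Change the number of epochs.
- Change the number of layers/hidden units.
- Change the amount of data.
- Change the learning rate.
- Try different kinds of data augmentation.
- Choose a different model architecture.

Let's try the following combinations:


- Different amounts of data (10% vs. 20% of pizza, steak, sushi)
- Different models (torchvision.models.efficientnet_b0 vs. torchvision.
- Different training times (5 epochs vs. 10 epochs)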
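Each combination of these choices is a separate run, so the number of experiments is the product of the number of options on each axis. The plain-Python sketch below (the search_space dictionary and the helpers count_experiments() and grid() are illustrative names) counts and lists the grid, and shows how one hypothetical extra axis, two learning rates, would double it.

```python
from itertools import product
from math import prod

# The three axes chosen above
search_space = {
    "data": ["data_10_percent", "data_20_percent"],
    "model": ["effnetb0", "effnetb2"],
    "epochs": [5, 10],
}


def count_experiments(space: dict) -> int:
    """A full grid has (options on axis 1) * (options on axis 2) * ... runs."""
    return prod(len(options) for options in space.values())


def grid(space: dict) -> list:
    """List every combination as a dict, e.g. {"data": ..., "model": ..., "epochs": ...}."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*space.values())]


print(count_experiments(search_space))  # 2 * 2 * 2 combinations
for config in grid(search_space):
    print(config)

# Hypothetical extra axis: trying two learning rates doubles the number of runs
bigger_space = {**search_space, "lr": [0.001, 0.0001]}
print(count_experiments(bigger_space))
```

Because the count grows multiplicatively, it usually pays to start with a small grid like this one and add axes only once the tracking setup works.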


### Download different datasets

In [None]:
# Pizza, steak, sushi 10% training data
data_10_percent_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                                     destination="pizza_steak_sushi")

# Pizza, steak, sushi 20% training data
data_20_percent_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip",
                                     destination="pizza_steak_sushi_20_percent")

[INFO] data/pizza_steak_sushi directory exists, skipping download.
[INFO] data/pizza_steak_sushi_20_percent directory exists, skipping download.


In [None]:
# Setup training directory paths
train_dir_10_percent = data_10_percent_path / "train"
train_dir_20_percent = data_20_percent_path / "train"

# Setup testing directory paths (note: use the same test dataset for both to compare the results)
test_dir = data_10_percent_path / "test"

# Check the directories
print(f"Training directory 10%: {train_dir_10_percent}")
print(f"Training directory 20%: {train_dir_20_percent}")
print(f"Testing directory: {test_dir}")

Training directory 10%: data/pizza_steak_sushi/train
Training directory 20%: data/pizza_steak_sushi_20_percent/train
Testing directory: data/pizza_steak_sushi/test


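### Transform Datasets and create DataLoaders

In [None]:
# Setup pretrained weights (plenty of these are available in torchvision.models)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

# Get transforms from weights (these are the transforms that were used to obtain the weights)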
automatic_transforms = weights.transforms()

In [None]:
BATCH_SIZE = 32

# Create 10% training and test DataLoaders
train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_10_percent,
    test_dir=test_dir,
    transform=automatic_transforms,
    batch_size=BATCH_SIZE
)

# Create 20% training and test data DataLoaders
train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_20_percent,
    test_dir=test_dir,
    transform=automatic_transforms,
    batch_size=BATCH_SIZE
)

# Find the number of samples/batches per dataloader (using the same test_dataloader for both experiments)
print(f"Number of batches of size {BATCH_SIZE} in 10 percent training data: {len(train_dataloader_10_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in 20 percent training data: {len(train_dataloader_20_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in testing data: {len(test_dataloader)} (all experiments will use the same test set)")
print(f"Number of classes: {len(class_names)}, class names: {class_names}")

Number of batches of size 32 in 10 percent training data: 8
Number of batches of size 32 in 20 percent training data: 15
Number of batches of size 32 in testing data: 3 (all experiments will use the same test set)
Number of classes: 3, class names: ['pizza', 'steak', 'sushi']
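The batch counts follow from ceiling division, since a DataLoader with the default drop_last=False keeps the final, smaller batch. In this sketch the image counts per split (225, 450 and 75) are assumptions about these datasets rather than values printed in this notebook, chosen to be consistent with the 8 and 15 training batches reported above; num_batches() is a hypothetical helper.

```python
import math

BATCH_SIZE = 32

# Assumed number of images per split (not printed in this notebook)
num_images = {
    "train_10_percent": 225,
    "train_20_percent": 450,
    "test": 75,
}


def num_batches(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """How many batches a DataLoader-style loader yields for n_samples items."""
    if drop_last:
        return n_samples // batch_size  # the incomplete final batch is discarded
    return math.ceil(n_samples / batch_size)  # the incomplete final batch is kept


for split, n in num_images.items():
    print(f"{split}: {n} images -> {num_batches(n, BATCH_SIZE)} batches")
```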


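### Create feature extractor models

- torchvision.models.efficientnet_b0() pretrained backbone + custom classifier head (EffNetB0 for short).
- torchvision.models.efficientnet_b2() pretrained backbone + custom classifier head (EffNetB2 for short).


In the previous chapter we saw that the in_features parameter of EffNetB0's classifier head is 1280 (the backbone turns an input image into a feature vector of size 1280). Since EffNetB2 has a different number of layers and parameters, we need to adjust this value accordingly.

To find the input shape EffNetB2's final layer needs, let's:

- Create an instance of torchvision.models.efficientnet_b2() with pretrained weights (weights=torchvision.models.EfficientNet_B2_Weights.DEFAULT; the older pretrained=True argument is deprecated).
- Look at the various input and output shapes by running torchinfo.summary().
- Find the number of in_features, either by printing EffNetB2's classifier or by checking its weight in state_dict(): an nn.Linear layer stores its weight with shape [out_features, in_features], so in_features is the second dimension.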

In [None]:
import torchvision
from torchinfo import summary

# 1. Create an instance of EffNetB2 with pretrained weights
effnetb2_weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
effnetb2 = torchvision.models.efficientnet_b2(weights=effnetb2_weights)

# # 2. Get a summary of standard EffNetB2 from torchvision.models (uncomment for full output)
# summary(model=effnetb2,
#         input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
#         # col_names=["input_size"], # uncomment for smaller output
#         col_names=["input_size", "output_size", "num_params", "trainable"],
#         col_width=20,
#         row_settings=["var_names"]
# )

# 3. Get the number of in_features of the EfficientNetB2 classifier layer
effnetb2.classifier

Sequential(
  (0): Dropout(p=0.3, inplace=True)
  (1): Linear(in_features=1408, out_features=1000, bias=True)
)

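Now that we know the number of in_features the EffNetB2 model needs, let's create a couple of helper functions to set up EffNetB0 and EffNetB2 feature extractor models.

We want these functions to:

- Get the base model from torchvision.models
- Freeze the base layers in the model (set requires_grad=False)
- Set the random seeds (we don't strictly need to do this, but since we're running a series of experiments and initializing a new layer with random weights, we want the randomness to be similar for each experiment)
- Change the classifier head (to suit our problem)
- Give the model a name (e.g. "effnetb0" for EffNetB0)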
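Why re-seed inside each helper? Setting the seed right before the new classifier head is built means every call starts from the same random state, so each experiment's new layer gets the same initial weights. The effect can be illustrated with Python's own random module as a stand-in for torch.manual_seed(); init_head_weights() is a hypothetical function standing in for a layer's random initialization.

```python
import random
from typing import List, Optional

SEED = 42


def init_head_weights(n: int, seed: Optional[int] = None) -> List[float]:
    """Hypothetical stand-in for randomly initializing a new layer's n weights."""
    if seed is not None:
        random.seed(seed)  # plays the role of torch.manual_seed(seed)
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Re-seeding before creating each "layer" gives identical starting weights
weights_a = init_head_weights(3, seed=SEED)
weights_b = init_head_weights(3, seed=SEED)
print(weights_a == weights_b)

# Without re-seeding, the generator simply continues and the weights differ
weights_c = init_head_weights(3)
print(weights_a == weights_c)
```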


In [None]:
import torchvision
from torch import nn

# Get the number of output features (one for each class: pizza, steak, sushi)
OUT_FEATURES = len(class_names)

# Create an EffNetB0 feature extractor
def create_effnetb0():
    # 1. Get the base model with pretrained weights and send it to the target device
    weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
    model = torchvision.models.efficientnet_b0(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    torch.manual_seed(CFG.SEED)
    torch.cuda.manual_seed(CFG.SEED)

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2),
        nn.Linear(in_features=1280, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb0"
    print(f"[INFO] Created new {model.name} model.")
    return model

# Create an EffNetB2 feature extractor
def create_effnetb2():
    # 1. Get the base model with pretrained weights and send it to the target device
    weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
    model = torchvision.models.efficientnet_b2(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    torch.manual_seed(CFG.SEED)
    torch.cuda.manual_seed(CFG.SEED)

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3),
        nn.Linear(in_features=1408, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb2"
    print(f"[INFO] Created new {model.name} model.")
    return model


In [None]:
effnetb0 = create_effnetb0()

# Get an output summary of the layers in our EffNetB0 feature extractor model (uncomment to view full output)
# summary(model=effnetb0,
#         input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
#         # col_names=["input_size"], # uncomment for smaller output
#         col_names=["input_size", "output_size", "num_params", "trainable"],
#         col_width=20,
#         row_settings=["var_names"]
# )

[INFO] Created new effnetb0 model.


In [None]:
effnetb2 = create_effnetb2()

# Get an output summary of the layers in our EffNetB2 feature extractor model (uncomment to view full output)
# summary(model=effnetb2,
#         input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
#         # col_names=["input_size"], # uncomment for smaller output
#         col_names=["input_size", "output_size", "num_params", "trainable"],
#         col_width=20,
#         row_settings=["var_names"]
# )

[INFO] Created new effnetb2 model.


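### Create experiments and set up the training code

We'll start by creating two lists and a dictionary:


- A list of the numbers of epochs we'd like to test ([5, 10])
- A list of the models we'd like to test (["effnetb0", "effnetb2"])
- A dictionary of the different training DataLoaders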

In [None]:
# 1. Create epochs list
num_epochs = [5, 10]

# 2. Create models list (need to create a new model for each experiment)
models = ["effnetb0", "effnetb2"]

# 3. Create dataloaders dictionary for various dataloaders
train_dataloaders = {"data_10_percent": train_dataloader_10_percent,
                     "data_20_percent": train_dataloader_20_percent}

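Now we can write the experiment loop. It will:

- Set the random seeds (so our experiment results are reproducible; in practice, you might run the same experiment across ~3 different seeds and average the results).
- Keep track of the experiment numbers (this is mostly for pretty printouts).
- Loop through each of the different training DataLoaders in the train_dataloaders dictionary.
- Loop through the list of epoch numbers.
- Loop through the list of different model names.
- Create an information printout for the currently running experiment (so we know what's happening).
- Check which model is the target model and create a new EffNetB0 or EffNetB2 instance (we create a new model instance for each experiment so all models start from the same point).
- Create a new loss function (torch.nn.CrossEntropyLoss()) and optimizer (torch.optim.Adam(params=model.parameters(), lr=0.001)) for each new experiment.
- Train the model with the modified train() function, passing the appropriate details to the writer parameter.
- Save the trained model to a file with an appropriate name using save_model() from utils.py.
- We can also use the %%time magic to see how long all of our experiments take in total within a single Jupyter/Google Colab cell.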
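Before launching the (slow) loop, the experiment plan can be checked without training anything. This plain-Python sketch walks the same three nested loops (DataLoader, then epochs, then model) and builds the log directory that create_writer() would use and the file name passed to save_model(). It only constructs strings, and it uses a fixed date (2024-04-23, the date in this notebook's logs) where create_writer() would use today's date.

```python
import os

# Same search space as the train_dataloaders / num_epochs / models objects above
dataloader_names = ["data_10_percent", "data_20_percent"]
num_epochs = [5, 10]
model_names = ["effnetb0", "effnetb2"]
timestamp = "2024-04-23"  # fixed here; create_writer() uses today's date

plan = []
experiment_number = 0
for dataloader_name in dataloader_names:
    for epochs in num_epochs:
        for model_name in model_names:
            experiment_number += 1
            # Mirrors create_writer(experiment_name=dataloader_name, model_name=model_name, extra=f"{epochs}_epochs")
            log_dir = os.path.join("runs", timestamp, dataloader_name, model_name, f"{epochs}_epochs")
            # Mirrors the save_filepath built in the training loop
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            plan.append((experiment_number, log_dir, save_filepath))

for number, log_dir, save_filepath in plan:
    print(f"{number}: {log_dir} -> models/{save_filepath}")
```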

In [None]:
%%time
from going_modular.going_modular.utils import save_model

# 1. Set the random seeds
torch.manual_seed(CFG.SEED)
torch.cuda.manual_seed(CFG.SEED)

# 2. Keep track of experiment numbers
experiment_number = 0

# 3. Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():

    # 4. Loop through each number of epochs
    for epochs in num_epochs:

        # 5. Loop through each model name and create a new model based on the name
        for model_name in models:

            # 6. Create information printouts
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")

            # 7. Select the model
            if model_name == "effnetb0":
                model = create_effnetb0()  # create a new model each time (important because we want each experiment to start from scratch)
            else:
                model = create_effnetb2()  # create a new model each time (important because we want each experiment to start from scratch)

            # 8. Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

            # 9. Train the target model with the target DataLoader and track the experiment
            train(model=model,
                  train_dataloader=train_dataloader,
                  test_dataloader=test_dataloader,
                  optimizer=optimizer,
                  loss_fn=loss_fn,
                  epochs=epochs,
                  device=device,
                  writer=create_writer(experiment_name=dataloader_name,
                                       model_name=model_name,
                                       extra=f"{epochs}_epochs"))

            # 10. Save the model to file so we can get back the best model
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            save_model(model=model,
                       target_dir="models",
                       model_name=save_filepath)
            print("-"*50 + "\n")

[INFO] Experiment number: 1
[INFO] Model: effnetb0
[INFO] DataLoader: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created new effnetb0 model.
[INFO] Created SummaryWriter, saving to: runs/2024-04-23/data_10_percent/effnetb0/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0552 | train_acc: 0.4805 | test_loss: 0.8709 | test_acc: 0.6818
Epoch: 2 | train_loss: 0.8930 | train_acc: 0.6133 | test_loss: 0.8002 | test_acc: 0.7235
Epoch: 3 | train_loss: 0.8017 | train_acc: 0.7148 | test_loss: 0.6766 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.7204 | train_acc: 0.7383 | test_loss: 0.6337 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.6434 | train_acc: 0.7695 | test_loss: 0.6318 | test_acc: 0.8352
[INFO] Saving model to: models/07_effnetb0_data_10_percent_5_epochs.pth
--------------------------------------------------

[INFO] Experiment number: 2
[INFO] Model: effnetb2
[INFO] DataLoader: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created new effnetb2 model.
[INFO] Created SummaryWriter, saving to: runs/2024-04-23/data_10_percent/effnetb2/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0882 | train_acc: 0.3945 | test_loss: 0.9509 | test_acc: 0.6903
Epoch: 2 | train_loss: 0.8747 | train_acc: 0.7891 | test_loss: 0.8623 | test_acc: 0.7642
Epoch: 3 | train_loss: 0.8157 | train_acc: 0.7109 | test_loss: 0.7661 | test_acc: 0.8258
Epoch: 4 | train_loss: 0.6984 | train_acc: 0.7734 | test_loss: 0.7038 | test_acc: 0.8968
Epoch: 5 | train_loss: 0.6116 | train_acc: 0.9062 | test_loss: 0.6557 | test_acc: 0.9176
[INFO] Saving model to: models/07_effnetb2_data_10_percent_5_epochs.pth
--------------------------------------------------

[INFO] Experiment number: 3
[INFO] Model: effnetb0
[INFO] DataLoader: data_10_percent
[INFO] Number of epochs: 10
[INFO] Created new effnetb0 model.
[INFO] Created SummaryWriter, saving to: runs/2024-04-23/data_10_percent/effnetb0/10_epochs...


Epoch: 1 | train_loss: 1.0552 | train_acc: 0.4805 | test_loss: 0.8709 | test_acc: 0.6818
Epoch: 2 | train_loss: 0.8930 | train_acc: 0.6133 | test_loss: 0.8002 | test_acc: 0.7235
Epoch: 3 | train_loss: 0.8017 | train_acc: 0.7148 | test_loss: 0.6766 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.7204 | train_acc: 0.7383 | test_loss: 0.6337 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.6434 | train_acc: 0.7695 | test_loss: 0.6318 | test_acc: 0.8352
Epoch: 6 | train_loss: 0.5703 | train_acc: 0.7969 | test_loss: 0.5895 | test_acc: 0.8352
Epoch: 7 | train_loss: 0.5910 | train_acc: 0.7969 | test_loss: 0.5566 | test_acc: 0.8561
Epoch: 8 | train_loss: 0.5358 | train_acc: 0.7969 | test_loss: 0.4820 | test_acc: 0.9167
Epoch: 9 | train_loss: 0.4827 | train_acc: 0.8047 | test_loss: 0.4733 | test_acc: 0.8561
Epoch: 10 | train_loss: 0.4918 | train_acc: 0.8359 | test_loss: 0.5293 | test_acc: 0.7945
[INFO] Saving model to: models/07_effnetb0_data_10_percent_10_epochs.pth
------------------------------------

Epoch: 1 | train_loss: 1.0882 | train_acc: 0.3945 | test_loss: 0.9509 | test_acc: 0.6903
Epoch: 2 | train_loss: 0.8747 | train_acc: 0.7891 | test_loss: 0.8623 | test_acc: 0.7642
Epoch: 3 | train_loss: 0.8157 | train_acc: 0.7109 | test_loss: 0.7661 | test_acc: 0.8258
Epoch: 4 | train_loss: 0.6984 | train_acc: 0.7734 | test_loss: 0.7038 | test_acc: 0.8968
Epoch: 5 | train_loss: 0.6116 | train_acc: 0.9062 | test_loss: 0.6557 | test_acc: 0.9176
Epoch: 6 | train_loss: 0.6365 | train_acc: 0.7969 | test_loss: 0.6029 | test_acc: 0.9176
Epoch: 7 | train_loss: 0.5557 | train_acc: 0.7930 | test_loss: 0.5743 | test_acc: 0.9479
Epoch: 8 | train_loss: 0.4838 | train_acc: 0.9570 | test_loss: 0.5499 | test_acc: 0.8968
Epoch: 9 | train_loss: 0.4911 | train_acc: 0.9453 | test_loss: 0.5256 | test_acc: 0.9271
Epoch: 10 | train_loss: 0.4338 | train_acc: 0.9414 | test_loss: 0.5120 | test_acc: 0.9176
[INFO] Saving model to: models/07_effnetb2_data_10_percent_10_epochs.pth
------------------------------------

Epoch: 1 | train_loss: 0.9627 | train_acc: 0.5500 | test_loss: 0.6460 | test_acc: 0.9271
Epoch: 2 | train_loss: 0.7259 | train_acc: 0.7729 | test_loss: 0.5569 | test_acc: 0.9280
Epoch: 3 | train_loss: 0.5433 | train_acc: 0.8667 | test_loss: 0.4535 | test_acc: 0.9271
Epoch: 4 | train_loss: 0.4873 | train_acc: 0.8500 | test_loss: 0.4362 | test_acc: 0.9072
Epoch: 5 | train_loss: 0.4414 | train_acc: 0.8646 | test_loss: 0.3697 | test_acc: 0.9271
[INFO] Saving model to: models/07_effnetb0_data_20_percent_5_epochs.pth
--------------------------------------------------

[INFO] Experiment number: 6
[INFO] Model: effnetb2
[INFO] DataLoader: data_20_percent
[INFO] Number of epochs: 5
[INFO] Created a new effnetb2 model.
[INFO] Created SummaryWriter, saving to: runs/2024-04-23/data_20_percent/effnetb2/5_epochs...


Epoch: 1 | train_loss: 1.0002 | train_acc: 0.5042 | test_loss: 0.7862 | test_acc: 0.8759
Epoch: 2 | train_loss: 0.7888 | train_acc: 0.7417 | test_loss: 0.6610 | test_acc: 0.8968
Epoch: 3 | train_loss: 0.6287 | train_acc: 0.8104 | test_loss: 0.5663 | test_acc: 0.9479
Epoch: 4 | train_loss: 0.5229 | train_acc: 0.8646 | test_loss: 0.5065 | test_acc: 0.8968
Epoch: 5 | train_loss: 0.4839 | train_acc: 0.8646 | test_loss: 0.4699 | test_acc: 0.9375
[INFO] Saving model to: models/07_effnetb2_data_20_percent_5_epochs.pth
--------------------------------------------------

[INFO] Experiment number: 7
[INFO] Model: effnetb0
[INFO] DataLoader: data_20_percent
[INFO] Number of epochs: 10
[INFO] Created a new effnetb0 model.
[INFO] Created SummaryWriter, saving to: runs/2024-04-23/data_20_percent/effnetb0/10_epochs...


Epoch: 1 | train_loss: 0.9627 | train_acc: 0.5500 | test_loss: 0.6460 | test_acc: 0.9271
Epoch: 2 | train_loss: 0.7259 | train_acc: 0.7729 | test_loss: 0.5569 | test_acc: 0.9280
Epoch: 3 | train_loss: 0.5433 | train_acc: 0.8667 | test_loss: 0.4535 | test_acc: 0.9271
Epoch: 4 | train_loss: 0.4873 | train_acc: 0.8500 | test_loss: 0.4362 | test_acc: 0.9072
Epoch: 5 | train_loss: 0.4414 | train_acc: 0.8646 | test_loss: 0.3697 | test_acc: 0.9271
Epoch: 6 | train_loss: 0.3738 | train_acc: 0.9187 | test_loss: 0.3561 | test_acc: 0.8873
Epoch: 7 | train_loss: 0.3456 | train_acc: 0.9083 | test_loss: 0.3273 | test_acc: 0.8873
Epoch: 8 | train_loss: 0.3540 | train_acc: 0.9229 | test_loss: 0.3004 | test_acc: 0.9479
Epoch: 9 | train_loss: 0.3856 | train_acc: 0.8896 | test_loss: 0.2954 | test_acc: 0.9081
Epoch: 10 | train_loss: 0.3496 | train_acc: 0.8812 | test_loss: 0.2988 | test_acc: 0.9176
[INFO] Saving model to: models/07_effnetb0_data_20_percent_10_epochs.pth
------------------------------------

Epoch: 1 | train_loss: 1.0002 | train_acc: 0.5042 | test_loss: 0.7862 | test_acc: 0.8759
Epoch: 2 | train_loss: 0.7888 | train_acc: 0.7417 | test_loss: 0.6610 | test_acc: 0.8968
Epoch: 3 | train_loss: 0.6287 | train_acc: 0.8104 | test_loss: 0.5663 | test_acc: 0.9479
Epoch: 4 | train_loss: 0.5229 | train_acc: 0.8646 | test_loss: 0.5065 | test_acc: 0.8968
Epoch: 5 | train_loss: 0.4839 | train_acc: 0.8646 | test_loss: 0.4699 | test_acc: 0.9375
Epoch: 6 | train_loss: 0.4399 | train_acc: 0.8854 | test_loss: 0.4384 | test_acc: 0.9271
Epoch: 7 | train_loss: 0.4455 | train_acc: 0.8396 | test_loss: 0.4039 | test_acc: 0.9583
Epoch: 8 | train_loss: 0.3840 | train_acc: 0.8667 | test_loss: 0.3774 | test_acc: 0.9271
Epoch: 9 | train_loss: 0.3324 | train_acc: 0.9229 | test_loss: 0.4100 | test_acc: 0.9375
Epoch: 10 | train_loss: 0.3773 | train_acc: 0.9021 | test_loss: 0.3571 | test_acc: 0.9583
[INFO] Saving model to: models/07_effnetb2_data_20_percent_10_epochs.pth
------------------------------------
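The SummaryWriter logs above are meant to be compared in TensorBoard. Reading eight runs' worth of epoch lines by eye is error-prone, so here is a plain-text fallback. This is a minimal sketch that assumes the printed output has been copied into a string. The names `summarize_log`, `RunSummary`, and `SAMPLE_LOG` are hypothetical helpers and are not part of `going_modular`. The sketch groups each run's epoch lines by the `Saving model to` line that closes the run, then reports the final and best test accuracy for each run.

```python
import re
from dataclasses import dataclass

# Matches the per-epoch lines in the output above, e.g.
# "Epoch: 5 | train_loss: 0.6116 | train_acc: 0.9062 | test_loss: 0.6557 | test_acc: 0.9176"
EPOCH_RE = re.compile(
    r"Epoch: (\d+) \| train_loss: ([\d.]+) \| train_acc: ([\d.]+) \| "
    r"test_loss: ([\d.]+) \| test_acc: ([\d.]+)"
)
# Matches the line printed when a run's model is saved; it marks the end of a run.
SAVE_RE = re.compile(r"Saving model to: (\S+\.pth)")


@dataclass
class RunSummary:  # hypothetical helper type
    model_path: str        # path from the "Saving model to" line
    num_epochs: int        # number of epoch lines seen for this run
    final_test_acc: float  # test_acc of the last epoch
    best_test_acc: float   # highest test_acc over all epochs
    best_epoch: int        # first epoch that reached best_test_acc


def summarize_log(log_text: str) -> list[RunSummary]:
    """Split a training log into runs and summarize each run's test accuracy.

    A run is every epoch line seen since the previous "Saving model to" line,
    so runs whose "[INFO]" header lines are missing are still picked up.
    """
    runs = []
    epochs = []  # (epoch_number, test_acc) pairs for the run in progress
    for line in log_text.splitlines():
        epoch_match = EPOCH_RE.search(line)
        if epoch_match:
            epochs.append((int(epoch_match.group(1)), float(epoch_match.group(5))))
            continue
        save_match = SAVE_RE.search(line)
        if save_match and epochs:
            # max() returns the first maximum, i.e. the earliest epoch with the best accuracy
            best_epoch, best_acc = max(epochs, key=lambda e: e[1])
            runs.append(RunSummary(
                model_path=save_match.group(1),
                num_epochs=len(epochs),
                final_test_acc=epochs[-1][1],
                best_test_acc=best_acc,
                best_epoch=best_epoch,
            ))
            epochs = []
    return runs


# Excerpt of experiments 1 and 2 from the output above.
SAMPLE_LOG = """\
Epoch: 1 | train_loss: 1.0552 | train_acc: 0.4805 | test_loss: 0.8709 | test_acc: 0.6818
Epoch: 2 | train_loss: 0.8930 | train_acc: 0.6133 | test_loss: 0.8002 | test_acc: 0.7235
Epoch: 3 | train_loss: 0.8017 | train_acc: 0.7148 | test_loss: 0.6766 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.7204 | train_acc: 0.7383 | test_loss: 0.6337 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.6434 | train_acc: 0.7695 | test_loss: 0.6318 | test_acc: 0.8352
[INFO] Saving model to: models/07_effnetb0_data_10_percent_5_epochs.pth
Epoch: 1 | train_loss: 1.0882 | train_acc: 0.3945 | test_loss: 0.9509 | test_acc: 0.6903
Epoch: 2 | train_loss: 0.8747 | train_acc: 0.7891 | test_loss: 0.8623 | test_acc: 0.7642
Epoch: 3 | train_loss: 0.8157 | train_acc: 0.7109 | test_loss: 0.7661 | test_acc: 0.8258
Epoch: 4 | train_loss: 0.6984 | train_acc: 0.7734 | test_loss: 0.7038 | test_acc: 0.8968
Epoch: 5 | train_loss: 0.6116 | train_acc: 0.9062 | test_loss: 0.6557 | test_acc: 0.9176
[INFO] Saving model to: models/07_effnetb2_data_10_percent_5_epochs.pth
"""

if __name__ == "__main__":
    # Rank runs by their final-epoch test accuracy, highest first
    ranked = sorted(summarize_log(SAMPLE_LOG), key=lambda r: r.final_test_acc, reverse=True)
    for run in ranked:
        print(f"{run.model_path}: final test_acc={run.final_test_acc:.4f}, "
              f"best={run.best_test_acc:.4f} (epoch {run.best_epoch})")
```

Ranking by final accuracy and ranking by best accuracy can give different answers. In this excerpt, effnetb0 peaks at epoch 3 (0.8864) and then drops to 0.8352 by epoch 5. This is one reason to keep the full per-epoch curves in TensorBoard rather than relying only on the last line of each run.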