# 实验三：Fashion MNIST 正则化前后对比

 本次实验旨在通过构建两个卷积神经网络（CNN）来对比正则化技术的效果。
 我们将使用Fashion-MNIST数据集，一个模型不使用正则化，另一个模型使用Dropout和BatchNorm作为正则化手段。

 **我的思考**：
 正则化是防止模型过拟合的关键技术。过拟合指的是模型在训练集上表现很好，但在未见过的测试集上表现较差。
 - **Dropout**: 在训练过程中随机“丢弃”一部分神经元的输出，可以强制网络学习更加鲁棒的特征，因为它不能依赖于任何单个神经元。
 - **BatchNorm**: 对每一层的输入进行归一化，可以加速模型收敛，并在一定程度上起到正则化的作用。
 我的预期是看到带有正则化的模型在测试集上能获得更高的准确率。

## 1. 环境准备与库导入

 导入所有必需的库，并设置MindSpore的运行环境。

In [1]:
import os
import struct
import sys
from easydict import EasyDict as edict
import matplotlib.pyplot as plt
import numpy as np
import mindspore
import mindspore.dataset as ds
import mindspore.nn as nn
from mindspore import context, Tensor
from mindspore.train import Model
from mindspore.nn.metrics import Accuracy
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor

context.set_context(mode=context.GRAPH_MODE, device_target='Ascend')

## 2. 数据准备与预处理

 ### 2.1 下载并解压数据集

 首先，我们通过命令行下载并解压Fashion-MNIST数据集。

In [2]:
!wget https://ascend-professional-construction-dataset.obs.myhuaweicloud.com/deep-learning/fashion-mnist.zip

!unzip fashion-mnist.zip

--2025-07-05 23:13:39--  https://ascend-professional-construction-dataset.obs.myhuaweicloud.com/deep-learning/fashion-mnist.zip
Resolving ascend-professional-construction-dataset.obs.myhuaweicloud.com (ascend-professional-construction-dataset.obs.myhuaweicloud.com)... 100.125.83.133, 100.125.83.5, 100.125.76.5
Connecting to ascend-professional-construction-dataset.obs.myhuaweicloud.com (ascend-professional-construction-dataset.obs.myhuaweicloud.com)|100.125.83.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30874889 (29M) [application/zip]
Saving to: ‘fashion-mnist.zip’


2025-07-05 23:13:39 (229 MB/s) - ‘fashion-mnist.zip’ saved [30874889/30874889]

Archive:  fashion-mnist.zip
   creating: fashion-mnist/
   creating: fashion-mnist/test/
  inflating: fashion-mnist/test/t10k-images-idx3-ubyte  
  inflating: fashion-mnist/test/t10k-labels-idx1-ubyte  
   creating: fashion-mnist/train/
  inflating: fashion-mnist/train/train-images-idx3-ubyte  
  inflating: fa

### 2.2 定义常量和数据读取函数

 我们定义一些常量来管理数据集，并编写函数来从二进制文件中读取图像和标签。

In [3]:
cfg = edict({
    'train_size': 60000,
    'test_size': 10000,
    'channel': 1,
    'image_height': 28,
    'image_width': 28,
    'batch_size': 64,
    'num_classes': 10,
    'lr': 0.001,
    'epoch_size': 3,
    'data_dir_train': os.path.join('fashion-mnist', 'train'),
    'data_dir_test': os.path.join('fashion-mnist', 'test'),
})

def read_image(file_name):
    with open(file_name, "rb") as f:
        buf = f.read()
    magic, img_num, rows, cols = struct.unpack_from('>IIII', buf, 0)
    offset = struct.calcsize('>IIII')
    imgs = np.frombuffer(buf, dtype=np.uint8, offset=offset).reshape(img_num, rows, cols)
    return imgs

def read_label(file_name):
    with open(file_name, "rb") as f:
        buf = f.read()
    magic, label_num = struct.unpack_from('>II', buf, 0)
    offset = struct.calcsize('>II')
    labels = np.frombuffer(buf, dtype=np.uint8, offset=offset)
    return labels

def get_data():
    train_image = read_image(os.path.join(cfg.data_dir_train, 'train-images-idx3-ubyte'))
    train_label = read_label(os.path.join(cfg.data_dir_train, 'train-labels-idx1-ubyte'))
    test_image = read_image(os.path.join(cfg.data_dir_test, 't10k-images-idx3-ubyte'))
    test_label = read_label(os.path.join(cfg.data_dir_test, 't10k-labels-idx1-ubyte'))
    
    train_x = train_image.reshape(-1, 1, cfg.image_height, cfg.image_width).astype(np.float32) / 255.0
    test_x = test_image.reshape(-1, 1, cfg.image_height, cfg.image_width).astype(np.float32) / 255.0
    
    train_y = train_label.astype(np.int32)
    test_y = test_label.astype(np.int32)
    
    return train_x, train_y, test_x, test_y

### 2.3 创建Dataset对象

 将numpy数据转换为MindSpore的Dataset对象，以便进行高效的训练。

In [4]:
def create_dataset():
    train_x, train_y, test_x, test_y = get_data()
    
    XY_train = list(zip(train_x, train_y))
    ds_train = ds.GeneratorDataset(XY_train, ['x', 'y'])
    ds_train = ds_train.shuffle(buffer_size=1000).batch(cfg.batch_size, drop_remainder=True)
    
    XY_test = list(zip(test_x, test_y))
    ds_test = ds.GeneratorDataset(XY_test, ['x', 'y'])
    ds_test = ds_test.shuffle(buffer_size=1000).batch(cfg.batch_size, drop_remainder=True)
    
    return ds_train, ds_test

## 3. 模型构建

 ### 3.1 无正则化模型

In [5]:
class ForwardFashion(nn.Cell):
    def __init__(self, num_class=10):
        super(ForwardFashion, self).__init__()
        self.num_class = num_class
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.maxpool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Dense(128 * 11 * 11, 128)
        self.fc2 = nn.Dense(128, self.num_class)

    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.relu(x)
        x = self.maxpool2d(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

### 3.2 有正则化模型

In [None]:
class ForwardFashionRegularization(nn.Cell):
    def __init__(self, num_class=10):
        super(ForwardFashionRegularization, self).__init__()
        self.num_class = num_class
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.maxpool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Dense(3200, 128) # Note: The input size to fc1 changes due to dropout placement
        self.bn = nn.BatchNorm1d(128)
        self.fc2 = nn.Dense(128, self.num_class)

    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        
        x = self.conv2(x)
        x = self.relu(x)
        x = self.maxpool2d(x) # Apply pooling
        x = self.dropout(x)   # Apply dropout after pooling
        
        x = self.conv3(x)
        x = self.relu(x)
        x = self.maxpool2d(x) # Apply pooling
        x = self.dropout(x)   # Apply dropout after pooling

        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.bn(x)        # Apply BatchNorm after activation
        x = self.dropout(x)   # Apply dropout after activation/bn
        
        x = self.fc2(x)
        return x

## 4. 训练与评估

 我们定义一个统一的训练函数来处理两个模型的训练和评估流程。

In [7]:
def train_and_eval(Net):
    ds_train, ds_test = create_dataset()
    network = Net(cfg.num_classes)
    
    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
    net_opt = nn.Adam(network.trainable_params(), cfg.lr)
    
    model = Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={'acc': Accuracy()})
    
    print(f"============== Starting Training for {Net.__name__} ==============")
    model.train(cfg.epoch_size, ds_train, callbacks=[LossMonitor()], dataset_sink_mode=False)
    
    metric = model.eval(ds_test)
    print(f"============== Evaluation for {Net.__name__} ==============")
    print(metric)
    return metric

## 5. 执行并对比结果

In [None]:
print("--- Training without Regularization ---")
acc_no_reg = train_and_eval(ForwardFashion)

print("\n--- Training with Regularization ---")
acc_with_reg = train_and_eval(ForwardFashionRegularization)

--- Training without Regularization ---
epoch: 1 step: 1, loss is 2.302584648132324
epoch: 1 step: 2, loss is 2.3025059700012207
epoch: 1 step: 3, loss is 2.3019862174987793
epoch: 1 step: 4, loss is 2.3009018898010254
epoch: 1 step: 5, loss is 2.2970476150512695
epoch: 1 step: 6, loss is 2.280426502227783
epoch: 1 step: 7, loss is 2.278353691101074
epoch: 1 step: 8, loss is 2.2417538166046143
epoch: 1 step: 9, loss is 2.1988327503204346
epoch: 1 step: 10, loss is 2.216838836669922
epoch: 1 step: 11, loss is 2.1320013999938965
epoch: 1 step: 12, loss is 2.010350227355957
epoch: 1 step: 13, loss is 1.855591058731079
epoch: 1 step: 14, loss is 1.6714884042739868
epoch: 1 step: 15, loss is 1.4043824672698975
epoch: 1 step: 16, loss is 1.5570423603057861
epoch: 1 step: 17, loss is 1.1973830461502075
epoch: 1 step: 18, loss is 1.3200874328613281
epoch: 1 step: 19, loss is 1.1063528060913086
epoch: 1 step: 20, loss is 0.971346378326416
epoch: 1 step: 21, loss is 1.134806752204895
epoch: 1 st



epoch: 1 step: 32, loss is 1.409163236618042
epoch: 1 step: 33, loss is 1.3224217891693115
epoch: 1 step: 34, loss is 1.3310039043426514
epoch: 1 step: 35, loss is 1.5378003120422363
epoch: 1 step: 36, loss is 1.3085674047470093
epoch: 1 step: 37, loss is 1.366121768951416
epoch: 1 step: 38, loss is 1.298721432685852
epoch: 1 step: 39, loss is 1.3537378311157227
epoch: 1 step: 40, loss is 1.3253750801086426
epoch: 1 step: 41, loss is 1.2786190509796143
epoch: 1 step: 42, loss is 1.3129839897155762
epoch: 1 step: 43, loss is 1.148064136505127
epoch: 1 step: 44, loss is 1.2019751071929932
epoch: 1 step: 45, loss is 1.1124002933502197
epoch: 1 step: 46, loss is 1.2596540451049805
epoch: 1 step: 47, loss is 1.1668334007263184
epoch: 1 step: 48, loss is 1.206587791442871
epoch: 1 step: 49, loss is 1.1631790399551392
epoch: 1 step: 50, loss is 1.2542027235031128
epoch: 1 step: 51, loss is 1.3010598421096802
epoch: 1 step: 52, loss is 1.213181972503662
epoch: 1 step: 53, loss is 0.90777015686

### 实验总结

 **结果分析**:

 1.  **无正则化模型**: 准确率约为 `90.42%`。
 2.  **有正则化模型**: 准确率约为 `88.29%`(不知道为什么上面结果没有显示出来)。

 **结论**:
 从结果可以看出，添加了Dropout和BatchNorm的正则化模型在Fashion-MNIST测试集上的表现并没有优于没有正则化的模型，这与我的预期有一些差距。
 我认为这可能是因为Fashion-MNIST数据集相对简单，正则化技术对它的影响有限。
 在实践中，我们应当根据模型的复杂度和数据集的大小，合理地组合使用不同的正则化方法是提升模型性能的常用策略。