# ResNet-34 Residual Network Implementation

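ResNet (Residual Network) is one of the most influential architectures in the history of deep learning. It addresses the degradation problem of very deep networks.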

## Core Idea: Residual Learning

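In plain (non-residual) deep networks, adding layers beyond a certain depth can make even the *training* error go up. This is the degradation problem, and it is not caused by overfitting.
ResNet adds **skip connections**, so the stacked layers learn a residual mapping instead of the desired underlying mapping $H(x)$ directly:

$$F(x) = H(x) - x$$

The block's output is then $y = F(x) + x$.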
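The toy sketch below illustrates the formula above. It is a fully-connected NumPy version rather than a convolutional one, and the helper name `residual_forward` is hypothetical. If the residual branch $F$ outputs zero, the block reduces to the identity. Because of the final ReLU, this only holds exactly for non-negative inputs. This is the intuition the ResNet paper gives for why extra residual layers need not hurt: driving $F$ toward zero is easy.

```python
import numpy as np

def residual_forward(x, W1, W2):
    """Toy fully-connected residual block: y = relu(F(x) + x), with F(x) = W2 @ relu(W1 @ x)."""
    relu = lambda v: np.maximum(v, 0.0)
    f = W2 @ relu(W1 @ x)  # residual branch F(x)
    return relu(f + x)     # add the shortcut x, then apply the final ReLU

x = np.array([0.5, 1.0, 2.0])
zeros = np.zeros((3, 3))
y = residual_forward(x, zeros, zeros)
print(y)  # [0.5 1.  2. ]
```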

This tutorial covers:
- How the residual block works, and how to implement it
- Building the complete ResNet-34 architecture
- Training and validating on Fashion MNIST

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

# Set random seeds
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

print(f"TensorFlow version: {tf.__version__}")
print(f"Available GPUs: {tf.config.list_physical_devices('GPU')}")

## Part 1: Implementing the Residual Block

### 1.1 The Basic Block

ResNet-34 is built from basic residual blocks, structured as follows:
```
Input ────────────────────────────┐
  ↓                               │
Conv3x3 → BN → ReLU               │
  ↓                               │
Conv3x3 → BN                      │
  ↓                               │
 Add ←────────────────────────────┘
  ↓
 ReLU → Output
```

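When a block changes the spatial size (stride > 1) or the number of channels, $x$ and $F(x)$ no longer have the same shape. In that case the skip connection must apply a projection (a 1×1 convolution followed by BN) before the addition.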
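The NumPy sketch below makes the projection concrete. It shows a bias-free 1×1 convolution with stride 2 on NHWC input; BN is omitted, and the helper name `projection_shortcut` is hypothetical. With a 1×1 kernel, `padding='same'` adds no actual padding. The convolution therefore samples every second pixel and applies the same channel-mixing matrix at each position. That is how it matches both the halved resolution and the new channel count of the main path.

```python
import numpy as np

def projection_shortcut(x, w, stride=2):
    """Hypothetical helper: bias-free 1x1 convolution with the given stride on NHWC input.

    x: (N, H, W, C_in), w: (C_in, C_out). BatchNorm is omitted in this sketch.
    """
    # A 1x1 kernel with stride s only looks at every s-th pixel ...
    sampled = x[:, ::stride, ::stride, :]
    # ... and mixes channels with the same (C_in, C_out) matrix at every position
    return sampled @ w

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 32, 32, 64)).astype(np.float32)
w = rng.standard_normal((64, 128)).astype(np.float32)
skip = projection_shortcut(x, w)
print(skip.shape)  # (2, 16, 16, 128)
```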

In [None]:
class ResidualUnit(keras.layers.Layer):
    """
    Residual unit

    Implements ResNet's basic residual block: two 3x3 convolutions plus a skip
    connection. When the input and output shapes differ (strides > 1, or the
    input channel count differs from `filters`), the skip connection is
    projected with a 1x1 convolution followed by BN.

    Parameters
    ----------
    filters : int
        Number of filters in each convolution
    strides : int, default=1
        Stride of the first convolution; strides > 1 downsamples
    activation : str or callable, default='relu'
        Activation function

    Notes
    -----
    - Applies BN before the activation (conv -> BN -> ReLU), as in the original ResNet paper
    - Convolutions have no bias (use_bias=False) because BN's learned offset (beta) plays that role
    """

    def __init__(self, filters, strides=1, activation='relu', **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.strides = strides
        self.activation_fn = keras.activations.get(activation)

        # Main path: two 3x3 convolutions, each followed by BN
        self.main_layers = [
            keras.layers.Conv2D(
                filters, kernel_size=3, strides=strides,
                padding='same', use_bias=False,
                kernel_initializer='he_normal'
            ),
            keras.layers.BatchNormalization(),
            keras.layers.Activation(self.activation_fn),
            keras.layers.Conv2D(
                filters, kernel_size=3, strides=1,
                padding='same', use_bias=False,
                kernel_initializer='he_normal'
            ),
            keras.layers.BatchNormalization()
        ]

        # Skip connection: identity by default; build() adds a projection if needed
        self.skip_layers = []

    def build(self, input_shape):
        # The input channel count is only known here, so decide on the projection now
        if self.strides > 1 or input_shape[-1] != self.filters:
            self.skip_layers = [
                keras.layers.Conv2D(
                    self.filters, kernel_size=1, strides=self.strides,
                    padding='same', use_bias=False,
                    kernel_initializer='he_normal'
                ),
                keras.layers.BatchNormalization()
            ]
        super().build(input_shape)

    def call(self, inputs, training=None):
        # Main path forward pass (BN needs the training flag)
        z = inputs
        for layer in self.main_layers:
            if isinstance(layer, keras.layers.BatchNormalization):
                z = layer(z, training=training)
            else:
                z = layer(z)

        # Skip connection (identity, or 1x1 conv + BN projection)
        skip_z = inputs
        for layer in self.skip_layers:
            if isinstance(layer, keras.layers.BatchNormalization):
                skip_z = layer(skip_z, training=training)
            else:
                skip_z = layer(skip_z)

        # Add the residual and apply the final activation
        return self.activation_fn(z + skip_z)

    def get_config(self):
        config = super().get_config()
        config.update({
            'filters': self.filters,
            'strides': self.strides,
            'activation': keras.activations.serialize(self.activation_fn)
        })
        return config

In [None]:
# Test the residual unit
test_input = np.random.randn(2, 32, 32, 64).astype(np.float32)

# Residual block that keeps the shape
res_unit_same = ResidualUnit(filters=64, strides=1)
output_same = res_unit_same(test_input)
print(f"Input shape: {test_input.shape}")
print(f"strides=1 output shape: {output_same.shape}")

# Downsampling residual block
res_unit_down = ResidualUnit(filters=128, strides=2)
output_down = res_unit_down(test_input)
print(f"strides=2 output shape: {output_down.shape}")

## Part 2: Building ResNet-34

### 2.1 ResNet-34 Architecture

| Stage | Output size (224×224 input) | Configuration |
|------|---------|------------|
| Conv1 | 112×112 | 7×7, 64, stride 2 |
| Pool | 56×56 | 3×3 max pool, stride 2 |
| Conv2_x | 56×56 | [3×3, 64; 3×3, 64] × 3 blocks |
| Conv3_x | 28×28 | [3×3, 128; 3×3, 128] × 4 blocks |
| Conv4_x | 14×14 | [3×3, 256; 3×3, 256] × 6 blocks |
| Conv5_x | 7×7 | [3×3, 512; 3×3, 512] × 3 blocks |
| GAP | 1×1 | Global average pooling |
| FC | 1000 | Fully connected (1000-way softmax) |
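A little arithmetic confirms the output sizes and the "34" in the name. This sketch assumes `'same'` padding everywhere, as in the Keras code below, so each stride-2 layer maps a side length $n$ to $\lceil n/2 \rceil$. The layer count follows the usual convention of counting only weighted layers on the main path: conv1, two 3×3 convolutions per block, and the FC layer. The 1×1 projection shortcuts are not counted.

```python
import math

def same_out(n, stride):
    # Side length after a 'same'-padded conv/pool with the given stride
    return math.ceil(n / stride)

blocks = [3, 4, 6, 3]          # residual blocks per stage (conv2_x .. conv5_x)
filters = [64, 128, 256, 512]  # filters per stage

size = same_out(224, 2)   # conv1: 7x7, stride 2 -> 112
size = same_out(size, 2)  # 3x3 max pool, stride 2 -> 56
stage_sizes = {}
for i, f in enumerate(filters):
    if i > 0:
        size = same_out(size, 2)  # first block of conv3_x..conv5_x uses stride 2
    stage_sizes[f"conv{i + 2}_x"] = size

weighted_layers = 1 + 2 * sum(blocks) + 1  # conv1 + 2 convs per block + FC
print(stage_sizes)      # {'conv2_x': 56, 'conv3_x': 28, 'conv4_x': 14, 'conv5_x': 7}
print(weighted_layers)  # 34
```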

In [None]:
def build_resnet34(input_shape=(224, 224, 3), num_classes=1000):
    """
    Build a ResNet-34 model

    Parameters
    ----------
    input_shape : tuple
        Input image shape (height, width, channels)
    num_classes : int
        Number of output classes

    Returns
    -------
    keras.Model
        ResNet-34 model
    """
    model = keras.Sequential(name='ResNet34')
    model.add(keras.Input(shape=input_shape))

    # Stage 1: stem convolution + max pooling
    model.add(keras.layers.Conv2D(
        64, kernel_size=7, strides=2, padding='same',
        use_bias=False, kernel_initializer='he_normal',
        name='conv1'
    ))
    model.add(keras.layers.BatchNormalization(name='bn1'))
    model.add(keras.layers.Activation('relu', name='relu1'))
    model.add(keras.layers.MaxPool2D(
        pool_size=3, strides=2, padding='same', name='pool1'
    ))

    # ResNet-34 blocks per stage: [3, 4, 6, 3]
    # Filters per stage: [64, 128, 256, 512]
    block_config = [
        (64, 3),   # conv2_x: 3 blocks, 64 filters
        (128, 4),  # conv3_x: 4 blocks, 128 filters
        (256, 6),  # conv4_x: 6 blocks, 256 filters
        (512, 3)   # conv5_x: 3 blocks, 512 filters
    ]

    prev_filters = 64
    for stage_idx, (filters, num_blocks) in enumerate(block_config):
        for block_idx in range(num_blocks):
            # The first block of a stage downsamples (stride 2) when the filter count grows
            if block_idx == 0 and filters != prev_filters:
                strides = 2
            else:
                strides = 1

            model.add(ResidualUnit(
                filters, strides=strides,
                name=f'conv{stage_idx+2}_block{block_idx+1}'
            ))
        prev_filters = filters

    # Classification head
    model.add(keras.layers.GlobalAveragePooling2D(name='avg_pool'))
    model.add(keras.layers.Dense(
        num_classes, activation='softmax',
        kernel_initializer='he_normal',
        name='predictions'
    ))

    return model

In [None]:
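# Build the full ResNet-34
resnet34 = build_resnet34(input_shape=(224, 224, 3), num_classes=1000)

# Print basic statistics (call resnet34.summary() for the full layer table)
print("ResNet-34 model:")
# Each ResidualUnit counts as a single Keras layer, so this is not 34
print(f"Number of Keras layers: {len(resnet34.layers)}")
print(f"Total parameters: {resnet34.count_params():,}")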
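`count_params()` includes BatchNorm's non-trainable moving mean and moving variance. The number it reports is therefore slightly higher than the commonly quoted 21.8M, which counts only trainable weights. The hypothetical helper `resnet_basic_param_count` reproduces both numbers analytically. It assumes the architecture built above: bias-free convolutions, four values per BN channel, and a 1×1 projection + BN only in the first block of conv3_x to conv5_x.

```python
def resnet_basic_param_count(blocks=(3, 4, 6, 3), widths=(64, 128, 256, 512),
                             in_channels=3, num_classes=1000, trainable_only=False):
    """Hypothetical helper: parameter count of a basic-block ResNet as built above."""
    per_bn_channel = 2 if trainable_only else 4  # gamma, beta (+ moving mean, moving variance)
    bn = lambda c: per_bn_channel * c
    total = 7 * 7 * in_channels * widths[0] + bn(widths[0])  # conv1 (no bias) + bn1
    prev = widths[0]
    for num_blocks, w in zip(blocks, widths):
        for i in range(num_blocks):
            c_in = prev if i == 0 else w
            total += 3 * 3 * c_in * w + bn(w)  # first 3x3 conv + BN
            total += 3 * 3 * w * w + bn(w)     # second 3x3 conv + BN
            if c_in != w:
                total += c_in * w + bn(w)      # 1x1 projection shortcut + BN
        prev = w
    total += prev * num_classes + num_classes  # Dense kernel + bias
    return total

print(resnet_basic_param_count())                     # 21814696
print(resnet_basic_param_count(trainable_only=True))  # 21797672
print(resnet_basic_param_count(blocks=(2, 2, 2, 2)))  # 11699112 (ResNet-18 layout)
```

The first value should match `resnet34.count_params()`. The trainable-only value corresponds to the ≈21.8M figure usually cited for ResNet-34.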

## Part 3: A Smaller ResNet for Fashion MNIST

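Fashion MNIST images are small (28×28). ResNet-34's stem alone (a stride-2 7×7 convolution followed by a stride-2 max pool) would shrink them to 7×7 before the first residual stage. We therefore build a smaller ResNet without that initial downsampling.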
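The sketch below traces the side length through both downsampling schedules. It assumes `'same'` padding, so each stride-2 layer maps $n$ to $\lceil n/2 \rceil$; the helper name `trace_sizes` is hypothetical. With ResNet-34's schedule, a 28×28 image is reduced to 1×1 by conv5_x. The smaller network below instead ends at 7×7.

```python
import math

def trace_sizes(size, strides):
    """Hypothetical helper: side length after each stride, assuming 'same' padding."""
    sizes = [size]
    for s in strides:
        size = math.ceil(size / s)
        sizes.append(size)
    return sizes

# ResNet-34: conv1 (/2), max pool (/2), then the first block of conv3_x, conv4_x, conv5_x (/2 each)
print(trace_sizes(28, [2, 2, 2, 2, 2]))  # [28, 14, 7, 4, 2, 1]
# Mini ResNet below: stride-1 stem, then stride 2 only at the 64- and 128-filter stages
print(trace_sizes(28, [1, 2, 2]))        # [28, 28, 14, 7]
```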

In [None]:
def build_mini_resnet(input_shape=(28, 28, 1), num_classes=10):
    """
    Build a smaller ResNet for small images

    Tailored to 28×28 inputs: the stride-2 7×7 stem and the max pool are
    replaced by a stride-1 3×3 convolution, so there is no early downsampling
    """
    model = keras.Sequential(name='MiniResNet')
    model.add(keras.Input(shape=input_shape))

    # Stem convolution (no downsampling)
    model.add(keras.layers.Conv2D(
        32, kernel_size=3, padding='same',
        use_bias=False, kernel_initializer='he_normal'
    ))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Activation('relu'))

    # Reduced block configuration
    block_config = [
        (32, 2),   # 2 blocks, 32 filters, 28×28
        (64, 2),   # 2 blocks, 64 filters, 14×14
        (128, 2),  # 2 blocks, 128 filters, 7×7
    ]

    prev_filters = 32
    for stage_idx, (filters, num_blocks) in enumerate(block_config):
        for block_idx in range(num_blocks):
            # Downsample in the first block of a stage when the filter count grows
            if block_idx == 0 and filters != prev_filters:
                strides = 2
            else:
                strides = 1

            model.add(ResidualUnit(filters, strides=strides))
        prev_filters = filters

    # Classification head
    model.add(keras.layers.GlobalAveragePooling2D())
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Dense(
        num_classes, activation='softmax',
        kernel_initializer='he_normal'
    ))

    return model

# Build the small ResNet
mini_resnet = build_mini_resnet()
mini_resnet.summary()

## Part 4: Training and Evaluation

In [None]:
# Load Fashion MNIST
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Preprocess: add a channel axis and scale pixels to [0, 1]
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Hold out the last 5,000 training images as a validation set
X_val, y_val = X_train[-5000:], y_train[-5000:]
X_train, y_train = X_train[:-5000], y_train[:-5000]

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"Test set: {X_test.shape}")

In [None]:
# Compile the model
mini_resnet.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Callbacks: stop early, and halve the learning rate when val_loss plateaus
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=5, restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6
    )
]

# Training parameters
EPOCHS = 10
BATCH_SIZE = 64

print("Starting training...")
history = mini_resnet.fit(
    X_train, y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(X_val, y_val),
    callbacks=callbacks,
    verbose=1
)

In [None]:
# Plot the training history
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Loss curves
axes[0].plot(history.history['loss'], label='Training loss')
axes[0].plot(history.history['val_loss'], label='Validation loss')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].set_title('Loss')
axes[0].legend()
axes[0].grid(True)

# Accuracy curves
axes[1].plot(history.history['accuracy'], label='Training accuracy')
axes[1].plot(history.history['val_accuracy'], label='Validation accuracy')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Accuracy')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

In [None]:
# Evaluate on the test set
test_loss, test_accuracy = mini_resnet.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

## Summary

### Key Points

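1. **Residual connections** address the degradation problem of deep networks
2. **Identity shortcuts** let gradients flow directly back to earlier layers through the skip connections
3. **BatchNorm** speeds up training and stabilizes optimization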
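One line of calculus makes point 2 precise. Ignore the ReLU applied after the addition; the pre-activation ResNet-V2 design removes it, so there the identity path is exact. A block then computes $y = x + F(x)$, and the chain rule gives:

$$\frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial y}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial F}{\partial x}\right) = \frac{\partial \mathcal{L}}{\partial y} + \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F}{\partial x}$$

The first term passes the upstream gradient back unchanged. Even when $\partial F / \partial x$ is small, the gradient reaching earlier layers is therefore not forced to shrink as it crosses the block.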

### ResNet Variants

| Model | Layers | Blocks per stage | Block type | Parameters (1000 classes) |
|-----|------|-----------|--------|--------|
| ResNet-18 | 18 | [2,2,2,2] | Basic | 11.7M |
| ResNet-34 | 34 | [3,4,6,3] | Basic | 21.8M |
| ResNet-50 | 50 | [3,4,6,3] | Bottleneck | 25.6M |
| ResNet-101 | 101 | [3,4,23,3] | Bottleneck | 44.5M |
| ResNet-152 | 152 | [3,8,36,3] | Bottleneck | 60.2M |
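ResNet-34 and ResNet-50 share the [3,4,6,3] configuration but not the block type. ResNet-50/101/152 replace the two 3×3 convolutions with a bottleneck block: a 1×1 reduce, a 3×3, and a 1×1 expand. The arithmetic sketch below counts weights only, ignoring BN and biases, and its helper names are hypothetical. It reproduces the design point from the original paper. A bottleneck block with 256 input/output channels and 64 inner channels has roughly as many weights as a basic block on 64 channels, far fewer than a basic block on 256 channels.

```python
def basic_block_weights(c):
    """Two 3x3 convolutions, c -> c -> c (weights only, no bias/BN)."""
    return 2 * (3 * 3 * c * c)

def bottleneck_block_weights(c_out, c_mid):
    """1x1 reduce (c_out -> c_mid), 3x3 (c_mid -> c_mid), 1x1 expand (c_mid -> c_out)."""
    return c_out * c_mid + 3 * 3 * c_mid * c_mid + c_mid * c_out

print(basic_block_weights(64))            # 73728
print(bottleneck_block_weights(256, 64))  # 69632
print(basic_block_weights(256))           # 1179648
```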

### Further Reading

- ResNet-V2 (pre-activation residual blocks)
- ResNeXt (grouped convolutions)
- SE-ResNet (squeeze-and-excitation channel attention)
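For the first item, here is a minimal sketch of a pre-activation (ResNet-V2 style) basic block, in the same Keras style as `ResidualUnit` above. It moves BN and ReLU in front of each convolution and drops the activation after the addition, so the shortcut path stays a pure identity. The class name `PreActResidualUnit` is hypothetical. Applying the 1×1 projection directly to the raw input is a simplifying assumption; reference implementations differ in that detail.

```python
import numpy as np
from tensorflow import keras

class PreActResidualUnit(keras.layers.Layer):
    """Hypothetical pre-activation basic block:
    BN -> ReLU -> Conv3x3 -> BN -> ReLU -> Conv3x3, plus shortcut, no activation after the add."""

    def __init__(self, filters, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.strides = strides
        self.bn1 = keras.layers.BatchNormalization()
        self.conv1 = keras.layers.Conv2D(filters, 3, strides=strides, padding='same',
                                         use_bias=False, kernel_initializer='he_normal')
        self.bn2 = keras.layers.BatchNormalization()
        self.conv2 = keras.layers.Conv2D(filters, 3, padding='same',
                                         use_bias=False, kernel_initializer='he_normal')
        self.projection = None  # created in build() only if the shape changes

    def build(self, input_shape):
        # Assumption: project the raw input (no BN) when downsampling or changing channels
        if self.strides > 1 or input_shape[-1] != self.filters:
            self.projection = keras.layers.Conv2D(self.filters, 1, strides=self.strides,
                                                  padding='same', use_bias=False)
        super().build(input_shape)

    def call(self, inputs, training=None):
        z = keras.activations.relu(self.bn1(inputs, training=training))
        z = self.conv1(z)
        z = keras.activations.relu(self.bn2(z, training=training))
        z = self.conv2(z)
        shortcut = inputs if self.projection is None else self.projection(inputs)
        return z + shortcut  # no ReLU here: the identity path stays untouched

x = np.random.randn(2, 32, 32, 64).astype(np.float32)
print(PreActResidualUnit(64)(x).shape)              # (2, 32, 32, 64)
print(PreActResidualUnit(128, strides=2)(x).shape)  # (2, 16, 16, 128)
```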