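Short-answer questions:
1. If your training set has millions of features, which linear regression training algorithm can you use?

2. If the features in your training set have very different scales, which algorithms might be affected, and how badly? What should you do about it?

3. Can gradient descent get stuck in a local minimum when training a logistic regression model?

4. Do all gradient descent algorithms lead to the same model, provided you let them run long enough?

5. Suppose you use batch gradient descent and plot the validation error at every epoch. If you notice that the validation error keeps going up, what is likely happening? How can you fix it?

6. Is it a good idea to stop mini-batch gradient descent immediately when the validation error goes up?

7. Which gradient descent algorithm (among those we discussed) will reach the vicinity of the optimal solution the fastest? Which one will actually converge? How can you make the others converge as well?

8. Suppose you are using polynomial regression. After plotting the learning curves, you notice a large gap between the training error and the validation error. What is happening? What are three ways to solve this?

9. Suppose you are using ridge regression and you notice that the training error and the validation error are almost equal and fairly high. Would you say the model suffers from high bias or high variance? Should you increase the regularization hyperparameter α or reduce it?

10. Why would you want to use: a. ridge regression instead of plain linear regression (i.e., without any regularization)? b. lasso instead of ridge regression? c. elastic net instead of lasso regression?

11. Suppose you want to classify pictures as outdoor/indoor and daytime/nighttime. Should you implement two logistic regression classifiers or one softmax regression classifier?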

Programming exercise:

In [183]:
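# TODO programming exercise: without sklearn, using NumPy only, implement batch gradient descent
#  with early stopping for softmax regression, and apply it to a classification task
#  such as the iris dataset (load_iris). Two features are enough: "petal width (cm)", "petal length (cm)".
#  Important: apart from loading the data, everything uses NumPy (including splitting off the test and validation sets); no sklearn.

#  Notes:
#  1. Implement L2 regularization.
#  2. Apart from loading the data, use NumPy only, including the training/validation split, the softmax prediction, and the loss computation.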
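
For reference, one common way to write the objective this exercise asks for is sketched below. It assumes that the L2 penalty applies to the weights only (not the bias) and is not divided by m. Other conventions, such as a factor of 1/2 on the penalty, are equally common. The symbols are introduced here for illustration: m is the number of training instances, K the number of classes, Y the one-hot label matrix, and λ the regularization strength.

```latex
\begin{aligned}
Z &= XW + b, \qquad
\hat{p}_{ik} = \frac{\exp(z_{ik})}{\sum_{c=1}^{K} \exp(z_{ic})} \\
J(W, b) &= -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_{ik} \log \hat{p}_{ik}
          + \lambda \sum_{j,k} W_{jk}^{2} \\
\nabla_{W} J &= \frac{1}{m} X^{\top} (\hat{P} - Y) + 2 \lambda W \\
\nabla_{b} J &= \frac{1}{m} \sum_{i=1}^{m} (\hat{p}_{i} - y_{i})
\end{aligned}
```

Batch gradient descent then repeats W ← W − η ∇_W J and b ← b − η ∇_b J over the full training set, where η is the learning rate, and early stopping keeps the parameters with the lowest validation loss.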

In [28]:
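import numpy as np
from sklearn.datasets import load_iris  # used only to load the data

# 1. Load the data, keeping only petal width and petal length
iris = load_iris(as_frame=True)
X = iris.data[["petal width (cm)", "petal length (cm)"]].values
y = iris["target"].values  # integer class labels as a NumPy array

# 2. One-hot encode the labels
num_classes = 3
y_one_hot = np.eye(num_classes)[y]

# 3. Standardize the features (zero mean, unit variance)
# Note: the statistics are computed on the full dataset, before the split.
X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_normalized = (X - X_mean) / X_std

# 4. Split into training and validation sets
def shuffle_and_split_data(data, test_ratio):
    # Reseeding on every call yields the same permutation each time, which
    # keeps the rows of X and y aligned even though they are split separately.
    np.random.seed(42)
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data[train_indices], data[test_indices]

X_train, X_val = shuffle_and_split_data(X_normalized, 0.2)
y_train, y_val = shuffle_and_split_data(y_one_hot, 0.2)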
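
The exercise statement also mentions holding out a test set, whereas the split above produces only a training set and a validation set. A minimal sketch of a three-way split with NumPy only might look like the following. `train_val_test_split` is a hypothetical helper, the 60/20/20 ratios are an assumption, and the demo arrays are synthetic stand-ins rather than the iris data. Shuffling the indices once and reusing them for both arrays keeps features and labels aligned without relying on reseeding.

```python
import numpy as np

def train_val_test_split(X, y, val_ratio=0.2, test_ratio=0.2, seed=42):
    """Hypothetical helper: shuffle the row indices once, then cut them into three parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    n_val = int(len(X) * val_ratio)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Synthetic stand-in data with the same shapes as the two-feature iris subset
X_demo = np.arange(300, dtype=float).reshape(150, 2)
y_demo = np.eye(3)[np.repeat(np.arange(3), 50)]

(train_X, train_y), (val_X, val_y), (test_X, test_y) = train_val_test_split(X_demo, y_demo)
print(train_X.shape, val_X.shape, test_X.shape)
```

With a split like this, early stopping would monitor the validation set only, and the test set would be used once at the end to estimate the generalization error.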

In [37]:
# 5. Softmax function
def softmax(Z):
    # Subtract the row-wise maximum before exponentiating for numerical stability
    exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)

# 6. Loss function: cross-entropy plus an L2 penalty on the weights (bias excluded)
def compute_loss(X, y_true, W, b, reg_lambda):
    m = X.shape[0]
    Z = X @ W + b
    y_pred = softmax(Z)
    epsilon = 1e-12  # guards against log(0)
    cross_entropy = -np.sum(y_true * np.log(y_pred + epsilon)) / m
    l2_penalty = reg_lambda * np.sum(W ** 2)
    return cross_entropy + l2_penalty

# 7. Gradients of the regularized loss with respect to W and b
def compute_gradients(X, y_true, y_pred, W, reg_lambda):
    m = X.shape[0]
    dW = (X.T @ (y_pred - y_true)) / m + 2 * reg_lambda * W
    db = np.sum(y_pred - y_true, axis=0, keepdims=True) / m
    return dW, db

# 8. Model initialization
input_dim = X_train.shape[1]
W = np.random.randn(input_dim, num_classes) * 0.01
b = np.zeros((1, num_classes))

# 9. Training hyperparameters
learning_rate = 0.1
reg_lambda = 0.01
n_epochs = 10000
patience = 100      # epochs to wait without improvement before stopping
min_delta = 1e-6    # smallest drop in validation loss that counts as an improvement
best_loss = float('inf')
best_W, best_b = W.copy(), b.copy()  # defined up front so they always exist after the loop
no_improvement_count = 0

# 10. Training loop (batch gradient descent with early stopping)
for epoch in range(n_epochs):
    # Forward pass and one gradient step on the full training set
    Z_train = X_train @ W + b
    y_train_pred = softmax(Z_train)
    dW, db = compute_gradients(X_train, y_train, y_train_pred, W, reg_lambda)
    W -= learning_rate * dW
    b -= learning_rate * db

    # Losses after the update (both include the L2 penalty)
    train_loss = compute_loss(X_train, y_train, W, b, reg_lambda)
    val_loss = compute_loss(X_val, y_val, W, b, reg_lambda)

    # Keep a copy of the best parameters seen so far
    if val_loss < best_loss - min_delta:
        best_loss = val_loss
        best_W = W.copy()
        best_b = b.copy()
        no_improvement_count = 0
    else:
        no_improvement_count += 1

    if no_improvement_count >= patience:
        print(f"Early stopping at epoch {epoch}, best validation loss: {best_loss}")
        break

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

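# 11. Accuracy evaluation, using the best parameters found during training
def accuracy(X, y, W, b):
    Z = X @ W + b
    y_pred = np.argmax(softmax(Z), axis=1)
    y_true = np.argmax(y, axis=1)  # recover class indices from the one-hot labels
    return np.mean(y_pred == y_true)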

train_acc = accuracy(X_train, y_train, best_W, best_b)
val_acc = accuracy(X_val, y_val, best_W, best_b)

print(f"Train Accuracy: {train_acc:.4f}")
print(f"Validation Accuracy: {val_acc:.4f}")

Epoch 0, Train Loss: 1.0455, Val Loss: 1.0413
Epoch 100, Train Loss: 0.4423, Val Loss: 0.3989
Epoch 200, Train Loss: 0.3923, Val Loss: 0.3477
Epoch 300, Train Loss: 0.3760, Val Loss: 0.3301
Epoch 400, Train Loss: 0.3697, Val Loss: 0.3229
Epoch 500, Train Loss: 0.3670, Val Loss: 0.3196
Epoch 600, Train Loss: 0.3658, Val Loss: 0.3180
Epoch 700, Train Loss: 0.3653, Val Loss: 0.3172
Epoch 800, Train Loss: 0.3650, Val Loss: 0.3167
Epoch 900, Train Loss: 0.3649, Val Loss: 0.3165
Epoch 1000, Train Loss: 0.3648, Val Loss: 0.3163
Epoch 1100, Train Loss: 0.3648, Val Loss: 0.3162
Epoch 1200, Train Loss: 0.3648, Val Loss: 0.3161
Epoch 1300, Train Loss: 0.3648, Val Loss: 0.3161
Epoch 1400, Train Loss: 0.3648, Val Loss: 0.3161
Epoch 1500, Train Loss: 0.3648, Val Loss: 0.3160
Epoch 1600, Train Loss: 0.3648, Val Loss: 0.3160
Epoch 1700, Train Loss: 0.3648, Val Loss: 0.3160
Epoch 1800, Train Loss: 0.3648, Val Loss: 0.3160
Epoch 1900, Train Loss: 0.3648, Val Loss: 0.3160
Epoch 2000, Train Loss: 0.3648, 