# Logistic Regression Lab

## Preparation
### Environment Setup

Make sure the dependencies below are installed, then run the following code to import and verify them. If it succeeds, a new window opens showing a blank figure.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, List

# display the plot in a separate window
%matplotlib tk

np.random.seed(12)

# create a figure and axis
plt.ion()
fig = plt.figure(figsize=(12, 5))

### Dataset Preparation

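You will use the following two-dimensional dataset to train a logistic classifier and observe how the linear decision boundary changes as training progresses.

The dataset contains two features and one label, where the label $y \in \{-1, 1\}$.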
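The $\{-1, 1\}$ label convention matters later, when predictions are compared to labels with `==`. As a small illustration (the arrays below are made-up values, not taken from the lab dataset), labels stored as $\{0, 1\}$ can be mapped to $\{-1, 1\}$, and linear scores can be thresholded into the same convention:

```python
import numpy as np

# Hypothetical labels in {0, 1}; the lab dataset already uses {-1, 1}
y01 = np.array([0, 1, 1, 0])
y_pm = 2 * y01 - 1  # maps 0 -> -1 and 1 -> 1

# Hypothetical linear scores w.x for four samples
scores = np.array([-0.3, 2.1, 0.4, -1.7])
pred = np.where(scores >= 0, 1, -1)  # threshold at 0, output in {-1, 1}

acc = np.mean(pred == y_pm) * 100
print("labels:", y_pm, "predictions:", pred, "accuracy:", acc)
```

Comparing predictions in $\{-1, 1\}$ against labels in $\{0, 1\}$ would silently count every negative sample as wrong, so it is worth checking the convention before computing accuracy.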

Run the code below to load the dataset and visualize it.

In [2]:
from data_generator import gen_2D_dataset

x_train, y_train = gen_2D_dataset(100, 100, noise=0)
x_test, y_test = gen_2D_dataset(50, 50, noise=0.7)


In [3]:
from vis_util import visualize_2D_dataset, visualize_2D_border

visualize_2D_dataset(x_train, y_train)
visualize_2D_dataset(x_test, y_test)

## Logistic Regression (10 pts)

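In this part, you will study logistic regression, complete the related code, and train the model.

Before running the code in this part, make sure you have filled in the missing code in `logistic.py`.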
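As a reference point while completing the file, one common formulation for labels $y_i \in \{-1, 1\}$ is sketched below. The exact loss used in `logistic.py` is not shown in this notebook, so the averaging over $n$ samples, the $\frac{\lambda}{2}$ scaling of the penalty, and the omission of a bias term are all assumptions. Here $\lambda$ is assumed to correspond to the `reg` argument and $\eta$ to `lr`.

$$
\begin{aligned}
L(w) &= \frac{1}{n}\sum_{i=1}^{n} \log\left(1 + e^{-y_i w^\top x_i}\right) + \frac{\lambda}{2}||w||^2 \\
\nabla L(w) &= -\frac{1}{n}\sum_{i=1}^{n} \frac{y_i x_i}{1 + e^{y_i w^\top x_i}} + \lambda w \\
w &\leftarrow w - \eta \nabla L(w)
\end{aligned}
$$

With $\lambda = 0$ the penalty term and its gradient $\lambda w$ vanish, which is the unregularized case trained first.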

Once that is done, run the code below. A figure will show how $||w||$, the loss, and the decision boundary change during training.

In [4]:
from logistic import LogisticRegression

# create a LogisticRegression object 
LR = LogisticRegression()

# fit the model to the training data without regularization (reg = 0)
LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=0)


iter: 0, loss: 0.6807266354549091, w_module: 1.863605190916175
iter: 10, loss: 0.6047988783795631, w_module: 1.8463015287291755
iter: 20, loss: 0.5660832091387538, w_module: 1.8229347764809727
iter: 30, loss: 0.5309314279730247, w_module: 1.8003667428545462
iter: 40, loss: 0.49873990315881916, w_module: 1.7800162646162347
iter: 50, loss: 0.4692664495478232, w_module: 1.7620409406227315
iter: 60, loss: 0.4422816596337917, w_module: 1.7463734583242816
iter: 70, loss: 0.4175668558824793, w_module: 1.732898794089459
iter: 80, loss: 0.3949164282672946, w_module: 1.7214872221452302
iter: 90, loss: 0.3741392267477511, w_module: 1.712003083354751
iter: 100, loss: 0.35505915516223857, w_module: 1.7043089950428967
iter: 110, loss: 0.33751516416979, w_module: 1.698268763393051
iter: 120, loss: 0.32136081353797324, w_module: 1.6937495784510823
iter: 130, loss: 0.30646354486334976, w_module: 1.6906236420774396
iter: 140, loss: 0.29270377665622754, w_module: 1.6887693160021564
iter: 150, loss: 0.279

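Running the code above, you will find that without regularization, $||w||$ keeps growing as the number of training iterations increases. (The log excerpt above is truncated at iteration 150, and in that early stretch $||w||$ is still decreasing from about 1.86 at iteration 0, so the growth is not visible in the excerpt.)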
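One way to see why the norm tends to grow: when a weight vector already separates the training data, scaling it up makes every margin $y_i w^\top x_i$ larger, so the unregularized logistic loss keeps decreasing. The sketch below illustrates this on a tiny made-up dataset. The helper `logistic_loss` and the data are illustrative assumptions, not part of the lab code.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss for labels in {-1, 1}, without regularization."""
    margins = y * (X @ w)
    # logaddexp(0, -m) evaluates log(1 + exp(-m)) in a numerically stable way
    return np.mean(np.logaddexp(0.0, -margins))

# Toy, linearly separable data (illustrative only, not the lab dataset)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])  # separates the toy data: every margin is positive
scales = [1, 2, 5, 10]
losses = [logistic_loss(c * w, X, y) for c in scales]

for c, loss in zip(scales, losses):
    print(f"scale: {c}, ||w||: {np.linalg.norm(c * w):.3f}, loss: {loss:.6f}")
```

The loss shrinks at each larger scale, so gradient descent has no reason to stop increasing $||w||$ unless a penalty on the norm is added.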

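After training, you can use the trained classifier to make predictions. Write code to compute the prediction accuracy on the training set and the test set.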

In [6]:
# Compute the accuracy of logistic regression (LR) on the training and test sets. LR is already trained if you have run the code above.

# training accuracy

# TODO: compute the y_pred using LR.predict() function
_, train_pred = LR.predict(x_train)

# TODO: compute the accuracy
train_acc = np.mean(train_pred == y_train) * 100


print("Train accuracy: {}".format(train_acc))


# TODO: test accuracy, proceed similarly as above
_, test_pred = LR.predict(x_test)
test_acc = np.mean(test_pred == y_test) * 100


print("Test accuracy: {}".format(test_acc))


Train accuracy: 100.0
Test accuracy: 98.0


In [11]:
# create a LogisticRegression object and train it when using regularization
from logistic import LogisticRegression

LR = LogisticRegression()
LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=0.1)
# LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=1)



iter: 0, loss: 1.141181588625334, w_module: 0.3904283290747596
iter: 10, loss: 0.8595507393239986, w_module: 0.07406965014180235
iter: 20, loss: 0.7879284649495942, w_module: 0.06578661868177142
iter: 30, loss: 0.7326919494437688, w_module: 0.11440695528050666
iter: 40, loss: 0.6820895786646239, w_module: 0.17315036076248214
iter: 50, loss: 0.6357649872581045, w_module: 0.23222013767348068
iter: 60, loss: 0.5934601510018419, w_module: 0.2899487967230823
iter: 70, loss: 0.5548934089119313, w_module: 0.34592273927558675
iter: 80, loss: 0.5197714261467372, w_module: 0.4000002565589039
iter: 90, loss: 0.4878003944221882, w_module: 0.45213706287942257
iter: 100, loss: 0.4586946072129149, w_module: 0.5023380305862867
iter: 110, loss: 0.43218247975386254, w_module: 0.5506373613810752
iter: 120, loss: 0.40801035161278965, w_module: 0.5970879354275748
iter: 130, loss: 0.3859445066923714, w_module: 0.6417545853153225
iter: 140, loss: 0.36577185014318214, w_module: 0.6847094569326769
iter: 150, l


In [7]:
# TODO: Compute the accuracy of the regularized logistic regression (LR) on the training and test sets. LR is already trained if you have run the code above.
_, train_pred = LR.predict(x_train)
train_acc = np.mean(train_pred == y_train) * 100

print("Train accuracy: {}".format(train_acc))

_, test_pred = LR.predict(x_test)
test_acc = np.mean(test_pred == y_test) * 100

print("Test accuracy: {}".format(test_acc))

Train accuracy: 100.0
Test accuracy: 97.0


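After running the regularized code above, observe how $||w||$ changes and discuss the practical significance of regularization. (Write your answer below.)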

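$||w||$ is smaller than in the unregularized run, which shows that regularization can limit model complexity and thereby help prevent overfitting.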
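The effect can also be checked outside the lab code with a minimal sketch. It assumes gradient descent on the mean logistic loss plus a $\frac{\text{reg}}{2}||w||^2$ penalty, zero initialization, and no bias term. `train_logreg` is a hypothetical helper written for this illustration, not the lab's `LogisticRegression`, and the data are synthetic.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, n_iter=1000, reg=0.0):
    """Gradient descent on mean logistic loss + (reg / 2) * ||w||^2, labels in {-1, 1}."""
    w = np.zeros(X.shape[1])  # zero init is an assumption of this sketch
    for _ in range(n_iter):
        margins = y * (X @ w)
        # Per-sample factor of the gradient: d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        coeff = -y / (1.0 + np.exp(margins))
        grad = X.T @ coeff / len(y) + reg * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs (illustrative stand-in, not the lab dataset)
X = np.vstack([
    rng.normal(2.0, 0.5, size=(50, 2)),
    rng.normal(-2.0, 0.5, size=(50, 2)),
])
y = np.concatenate([np.ones(50), -np.ones(50)])

w_plain = train_logreg(X, y, reg=0.0)
w_reg = train_logreg(X, y, reg=0.1)

print("||w|| without regularization:", np.linalg.norm(w_plain))
print("||w|| with reg=0.1:", np.linalg.norm(w_reg))
```

Under these assumptions the penalty adds a `reg * w` term to the gradient that pulls the weights toward zero at every step, so the regularized run ends with the smaller norm.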