# Logistic Regression

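**What you will learn:**  
- Building the Logistic Regression algorithm from scratch:
    - Simple data preprocessing
    - Building the network structure
    - Weight initialization
    - Computing the loss function and its gradients
    - Optimization (gradient descent)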

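## Required libraries

- numpy: a widely used scientific computing library.
- h5py: used to read and manipulate HDF5 files.
- matplotlib: used for data visualization.
- PIL and scipy: used here to test the model on your own images.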

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage

In [2]:
from depends.lr_utils import load_dataset

%matplotlib inline

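## 1. Data preprocessing

**Dataset:**  

The dataset consists of labeled images:
- label = 1 means cat
- label = 0 means noncat

train_set_x_orig, train_set_y, test_set_x_orig, and test_set_y hold the original training images, the training labels, the original test images, and the test labels, respectively.

The original images are processed further below.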

In [3]:
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

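To simplify the computation, each image is flattened into a one-dimensional vector.  

**This uses:**  


X.reshape(X.shape[0], -1) 

For train_set_x_orig, train_set_x_orig.shape[0] is the number of samples, and -1 tells NumPy to infer the remaining dimension as {total number of elements / train_set_x_orig.shape[0]}.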
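As a small illustration of this reshape, here is a sketch on a made-up batch of 2 tiny "images" of shape (2, 2, 3) rather than the real dataset. The array `X` and its contents are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical toy batch: 2 images, each 2x2 pixels with 3 channels
X = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Keep the sample axis and let NumPy infer the rest (2 * 2 * 3 = 12)
X_flat = X.reshape(X.shape[0], -1)

# Transpose so that each column holds one flattened image
X_cols = X_flat.T

print(X_flat.shape)  # (2, 12)
print(X_cols.shape)  # (12, 2)
```

After the transpose, column `i` of `X_cols` contains all pixel values of sample `i`, which is the layout the rest of this notebook expects.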

In [4]:
print("Original training set shape:", train_set_x_orig.shape)
print("Original test set shape:", test_set_x_orig.shape)

Original training set shape: (209, 64, 64, 3)
Original test set shape: (50, 64, 64, 3)


In [5]:
train_set_x_orig = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_orig = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
print("Flattened training set shape:", train_set_x_orig.shape)
print("Flattened test set shape:", test_set_x_orig.shape)

Flattened training set shape: (12288, 209)
Flattened test set shape: (12288, 50)


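**Normalization:**

Before training a model, the data is usually centered and scaled according to:
<br><br>
<font size = 6>
$\frac{x-\mu}{\sigma}$
</font>

where $\mu$ is the sample mean and $\sigma$ is the sample standard deviation.

Here we simply rescale the image pixel values by dividing each one by 255, the maximum pixel value.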

In [6]:
train_set_x = train_set_x_orig/255
test_set_x = test_set_x_orig/255
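To make the difference between the two approaches concrete, the following sketch applies both the simple division by 255 and the standardization formula to a small made-up matrix. The data, the per-feature (row-wise) statistics, and the `eps` guard are assumptions for illustration and are not part of the notebook's pipeline.

```python
import numpy as np

# Hypothetical toy data: 3 features (rows) x 4 samples (columns)
X = np.array([[0.0, 64.0, 128.0, 255.0],
              [10.0, 20.0, 30.0, 40.0],
              [5.0, 5.0, 5.0, 5.0]])

# Simple scaling, as used in this notebook: map pixel values into [0, 1]
X_scaled = X / 255

# Standardization: subtract each feature's mean, divide by its standard deviation
mu = X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)
eps = 1e-8  # assumption: small constant to avoid dividing by zero for constant features
X_std = (X - mu) / (sigma + eps)

print(X_scaled.min(), X_scaled.max())  # 0.0 1.0
print(X_std.mean(axis=1))              # approximately 0 for every feature
```

Division by 255 only needs the known pixel range, whereas standardization depends on statistics estimated from the data, typically from the training set.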

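## 2. Building the network

We build the simplest possible neural network, one with no hidden layer. Its basic structure is shown below:
<img src = "source/simple_nn.jpg"></img>

The column vector on the left represents one flattened sample.  
Each small circle is an input-layer neuron that receives one pixel value of the sample.  
Each input neuron is linearly connected to the output neuron.  
The sigmoid function is applied to the output as the activation.  
The loss is computed with log loss.  
Training minimizes the cost function.  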

**In mathematical terms:**  
For each sample <font size = 4>$x^{(i)}$:  

$z^{(i)} = w^T x^{(i)} + b$  
$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})$  
$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})$ </font>  

The overall cost function is:
<font size = 4>$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})$</font>   
Our goal is to minimize this cost function.
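As a worked example of the loss and cost formulas, the sketch below evaluates them for three made-up predictions. The values in `A` and `Y` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical predicted probabilities a^(i) and true labels y^(i) for 3 samples
A = np.array([[0.9, 0.2, 0.6]])
Y = np.array([[1, 0, 1]])

# Per-sample log loss: -y*log(a) - (1-y)*log(1-a)
losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# The cost J is the mean of the per-sample losses
J = losses.mean()

print(losses)  # roughly [[0.105 0.223 0.511]]
print(J)       # roughly 0.28
```

The confident correct prediction (0.9 for a cat) contributes the smallest loss, while the hesitant one (0.6 for a cat) contributes the largest.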

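### We build the network in the following steps
1. Define the network components (sigmoid function, log loss, and so on)  
2. Initialize the network weights  
3. Train iteratively:
- Compute the loss (forward propagation)
- Compute the gradients (backward propagation)
- Update the weights (gradient descent)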

### 1. Defining the sigmoid function

<font size = 4>$Sigmoid(x) = \frac{1}{1 + e^{-x}}$</font>

In [7]:
def Sigmoid(x):
    result = 1 / (1 + np.exp(-x))
    return result
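For very negative inputs, `np.exp(-x)` in the definition above can overflow and emit a runtime warning, even though the result still rounds to 0. The following is a minimal sketch of one common workaround, which branches on the sign of the input. The name `stable_sigmoid` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def Sigmoid(x):
    # Direct definition, as in the notebook
    return 1 / (1 + np.exp(-x))

def stable_sigmoid(x):
    # Hypothetical variant that never exponentiates a large positive number
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so the direct formula is safe
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1 + exp_x)
    return out

x = np.array([-5.0, 0.0, 5.0])
print(np.allclose(Sigmoid(x), stable_sigmoid(x)))  # True

extreme = stable_sigmoid(np.array([-1000.0, 1000.0]))
print(extreme)  # [0. 1.]
```

For the pixel-scaled inputs and zero-initialized weights used here, the direct definition is normally sufficient; the variant matters mainly when `z` can become very large in magnitude.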

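### 2. Weight initialization
Here w is initialized with np.zeros((dim, 1)), where dim is the number of input features.  
b is initialized directly to 0.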

In [8]:
def init_params(dims):
    w = np.zeros((dims, 1))
    b = 0
    return w, b

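### 3. Forward and backward propagation
- Forward propagation computes the cost.
- Backward propagation computes the partial derivatives (gradients):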

<font size = 5>
$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$
<br><br>
$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$
</font>

In [9]:
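def propagate(w, b, X, Y):
    m = X.shape[1]  # number of samples (one per column)
    # Forward pass: predicted probabilities and mean log loss
    A = Sigmoid(np.dot(w.T, X) + b)
    cost = -(1.0 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    cost = np.squeeze(cost)
    # Backward pass: gradients of the cost with respect to w and b
    dw = (1.0 / m) * np.dot(X, (A - Y).T)
    db = (1.0 / m) * np.sum(A - Y)
    return cost, dw, db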
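One way to gain confidence in the gradient formulas is to compare them against numerical derivatives. The sketch below does this with central finite differences on small random data. The data, the seed, the step size `eps`, and the tolerances are all assumptions chosen for illustration, and `propagate` is repeated so that the block runs on its own.

```python
import numpy as np

def Sigmoid(x):
    return 1 / (1 + np.exp(-x))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = Sigmoid(np.dot(w.T, X) + b)
    cost = -(1.0 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = (1.0 / m) * np.dot(X, (A - Y).T)
    db = (1.0 / m) * np.sum(A - Y)
    return cost, dw, db

# Hypothetical toy problem: 3 features, 5 samples
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
w = rng.normal(size=(3, 1)) * 0.1
b = 0.1

_, dw, db = propagate(w, b, X, Y)

# Central differences: (J(theta + eps) - J(theta - eps)) / (2 * eps)
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    dw_num[j, 0] = (propagate(w_plus, b, X, Y)[0]
                    - propagate(w_minus, b, X, Y)[0]) / (2 * eps)
db_num = (propagate(w, b + eps, X, Y)[0]
          - propagate(w, b - eps, X, Y)[0]) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6))  # True
print(np.isclose(db, db_num, atol=1e-6))   # True
```

If the analytic and numerical gradients disagreed by much more than the tolerance, that would point to a mistake in either the cost or the gradient expressions.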

### 4. Iterative training
For a parameter $\theta$, the update rule is $ \theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.
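To see the update rule in isolation, here is a sketch that applies it to a one-dimensional toy function $f(\theta) = \theta^2$, whose gradient is $2\theta$. The function, starting point, and learning rate are assumptions for illustration and are unrelated to the cat dataset.

```python
# Hypothetical toy objective f(theta) = theta**2 with gradient 2 * theta
theta = 4.0
alpha = 0.1  # learning rate

history = [theta]
for _ in range(3):
    d_theta = 2 * theta                # gradient at the current point
    theta = theta - alpha * d_theta    # the update rule from the text
    history.append(theta)

print(history)  # approximately [4.0, 3.2, 2.56, 2.048]
```

Each step multiplies `theta` by 0.8, so it moves steadily toward the minimum at 0. A larger `alpha` would take bigger steps, and for this function any `alpha` above 1 would make the iterates diverge.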

In [10]:
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    for i in range(num_iterations):
        cost, dw, db = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    return w, b

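### 5. Using the trained w and b for prediction

The final output is a probability. A sample is classified as cat if the probability is greater than 0.5, and as noncat otherwise.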

In [11]:
def predict(w, b, X):
    # Predicted probability of "cat" for every sample (one per column)
    A = Sigmoid(np.dot(w.T, X) + b)
    # Threshold at 0.5: 1.0 for cat, 0.0 for noncat
    Y = (A > 0.5).astype(float)
    return Y
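The thresholding step can be checked by hand on a tiny example. The parameters `w`, `b` and the inputs `X` below are made up so that the three samples give z = 2, -2, and 0.

```python
import numpy as np

def Sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical parameters and inputs: 2 features, 3 samples (one per column)
w = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0]])

A = Sigmoid(np.dot(w.T, X) + b)   # z = [2, -2, 0]
Y_pred = (A > 0.5).astype(float)  # strict inequality: exactly 0.5 maps to 0

print(Y_pred)  # [[1. 0. 0.]]
```

Note that the third sample has a probability of exactly 0.5 and is therefore labeled noncat, because the comparison is strict.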

### 6. Putting the components together

In [12]:
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.005, print_cost = False):
    # One weight per input feature
    dim = X_train.shape[0]
    w, b = init_params(dim)
    # Learn w and b by gradient descent on the training set
    _w, _b = optimize(w, b, X_train, Y_train, num_iterations = num_iterations, learning_rate = learning_rate, print_cost = print_cost)
    pred_train = predict(_w, _b, X_train)
    pred_test = predict(_w, _b, X_test)
    # With 0/1 labels, mean absolute error equals the fraction of misclassified samples
    accuracy_train = 1 - np.mean(np.abs(pred_train - Y_train))
    accuracy_test = 1 - np.mean(np.abs(pred_test - Y_test))
    print("accuracy in train set = ", accuracy_train)
    print("accuracy in test set = ", accuracy_test)
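The accuracy expression `1 - np.mean(np.abs(pred - Y))` relies on both arrays containing only 0s and 1s. The sketch below, using made-up predictions and labels, shows that it matches the more direct "fraction of equal entries" computation.

```python
import numpy as np

# Hypothetical predictions and labels for 4 samples
pred = np.array([[1.0, 0.0, 1.0, 1.0]])
Y = np.array([[1, 0, 0, 1]])

# |pred - Y| is 1 for a wrong prediction and 0 for a correct one
acc_abs = 1 - np.mean(np.abs(pred - Y))

# Equivalent: fraction of positions where prediction and label agree
acc_eq = np.mean(pred == Y)

print(acc_abs, acc_eq)  # 0.75 0.75
```

Three of the four predictions are correct, so both expressions give 0.75.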

In [13]:
model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
accuracy in train set =  0.99043062201
accuracy in test set =  0.7
