# Logistic Regression
## The sigmoid function
$$\sigma (z) = \frac 1{1+e^{-z}}$$
### Input to the sigmoid function
$$z = w_0x_0 + w_1x_1 + \cdots + w_nx_n = W^TX$$
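
As a small illustration of the two formulas above, the sketch below computes $z$ as a dot product and passes it through the sigmoid. The weight and feature values are made up for the example; only the convention that $x_0 = 1.0$ plays the role of the intercept term comes from the data-loading code in this notebook.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])   # weights w_0, w_1, w_2
x = np.array([1.0, 3.0, 1.5])    # features; x_0 = 1.0 acts as the intercept term

z = np.dot(w, x)                 # z = w_0*x_0 + w_1*x_1 + w_2*x_2
prob = 1.0 / (1.0 + np.exp(-z))  # sigma(z), a value between 0 and 1
print(z, prob)
```

Here $z = 0.5 - 3.0 + 3.0 = 0.5$, so the sigmoid output is a little above 0.5.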
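## Gradient descent algorithm
$$w:=w+\alpha \nabla f(w)$$
With the plus sign, this update climbs the gradient of $f$, so $f$ here is an objective to maximize (for logistic regression, the log-likelihood). It is equivalent to gradient descent on $-f$.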
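
To connect this update rule to the matrix code further down, it may help to write the gradient out. This is a sketch under the assumption that $f$ is the log-likelihood $\ell(w)$ of the labels $y_i \in \{0, 1\}$ under the sigmoid model; the notebook does not name $f$ explicitly. The symbols $X$ (the data matrix whose rows are the samples $x_i$), $y$ (the label vector) and $m$ (the number of samples) are introduced here for the derivation.

$$\ell(w) = \sum_{i=1}^{m} \left[ y_i \log \sigma(w^Tx_i) + (1 - y_i) \log\left(1 - \sigma(w^Tx_i)\right) \right]$$

Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the gradient simplifies to

$$\nabla \ell(w) = \sum_{i=1}^{m} \left( y_i - \sigma(w^Tx_i) \right) x_i = X^T \left( y - \sigma(Xw) \right)$$

so one update step reads

$$w := w + \alpha X^T \left( y - \sigma(Xw) \right)$$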
### Gradient descent pseudocode
```
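Initialize the regression coefficients w
Repeat R times:
    Compute the gradient over the entire dataset
    Update the coefficient vector using alpha * gradient
Return the regression coefficients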
```

In [2]:
import numpy as np

In [4]:
def load_dataset():
    """Read whitespace-separated rows of x1, x2, label from logistic-dataset.txt."""
    dataset, labels = [], []
    with open('logistic-dataset.txt') as f:
        for line in f:
            line_list = line.strip().split()
            # prepend x0 = 1.0 so that w0 acts as the intercept
            dataset.append([1.0, float(line_list[0]), float(line_list[1])])
            labels.append(int(line_list[2]))
    return dataset, labels

In [5]:
dataset, labels = load_dataset()

In [7]:
len(dataset)

100

In [8]:
len(labels)

100

In [9]:
def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))
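
For inputs that are large and negative, `np.exp(-x)` in the definition above can overflow and trigger a runtime warning. One possible workaround, shown here as a sketch and not as part of the original notebook, rewrites the sigmoid as $\exp(-\log(1 + e^{-x}))$ and uses `np.logaddexp` to evaluate the logarithm. The name `stable_sigmoid` is hypothetical.

```python
import numpy as np

def stable_sigmoid(x):
    """Hypothetical helper: sigmoid computed without overflowing np.exp."""
    # log(1 + exp(-x)) == logaddexp(0, -x), which NumPy evaluates stably
    return np.exp(-np.logaddexp(0.0, -np.asarray(x, dtype=float)))

# Extreme inputs saturate towards 0 and 1; the midpoint stays at 0.5.
print(stable_sigmoid([-1000.0, 0.0, 1000.0]))
```

For moderate inputs it should agree with `sigmoid` to floating-point precision.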

In [23]:
def gradientDescent(dataset, labels):
    # np.asmatrix is used because the np.mat alias was removed in NumPy 2.0
    data_matrix = np.asmatrix(dataset)                 # shape (m, n)
    label_matrix = np.asmatrix(labels).transpose()     # shape (m, 1)
    m, n = data_matrix.shape
    alpha = 0.001   # step size
    epochs = 500    # number of full passes over the dataset
    weights = np.ones((n, 1))
    for k in range(epochs):
        h = sigmoid(data_matrix * weights)  # matrix product: predictions for all samples
        error = label_matrix - h            # element-wise difference, shape (m, 1)
        # move along the gradient X^T (y - h)
        weights = weights + alpha * data_matrix.transpose() * error
    return weights

In [24]:
gradientDescent(dataset, labels)

matrix([[ 4.12414349],
        [ 0.48007329],
        [-0.6168482 ]])
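
The returned weights can be turned into a classifier by thresholding the sigmoid output at 0.5. The sketch below is one way to do that; the helper `classify`, its `threshold` parameter and the two sample points are hypothetical and not part of the original notebook, while the weights are rounded from the output above.

```python
import numpy as np

def classify(features, weights, threshold=0.5):
    """Hypothetical helper: return 1 if sigma(x . w) exceeds the threshold, else 0."""
    x = np.asarray(features, dtype=float)
    w = np.asarray(weights, dtype=float).ravel()  # also accepts an (n, 1) matrix
    prob = 1.0 / (1.0 + np.exp(-np.dot(x, w)))
    return int(prob > threshold)

# Weights rounded from the output above; the sample points are made up.
weights = [4.124, 0.480, -0.617]
print(classify([1.0, 0.0, 0.0], weights))   # z = 4.124, so the class is 1
print(classify([1.0, 0.0, 10.0], weights))  # z = 4.124 - 6.17 < 0, so the class is 0
```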

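## Stochastic gradient descent algorithm
Gradient descent has to traverse the entire dataset every time it updates the regression coefficients. That is acceptable for a dataset of around 100 samples, but with billions of samples and thousands of features the computational cost becomes too high.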

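One improvement is to update the regression coefficients using only one sample at a time; this method is called stochastic gradient descent.
### Stochastic gradient descent pseudocode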
```
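Initialize all regression coefficients to 1
For each sample in the dataset:
    Compute the gradient for that sample
    Update the coefficients using alpha * gradient
Return the regression coefficients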
```

In [25]:
def stochasticGradDes(data_matrix, labels, epochs=150):
    m, n = data_matrix.shape
    weights = np.ones(n)
    for j in range(epochs):
        data_index = list(range(m))  # samples not yet used in this epoch
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.0001  # step size decays as iterations accumulate
            # draw a random remaining sample and remove it, so each sample is used once per epoch
            sample = data_index.pop(np.random.randint(len(data_index)))
            h = sigmoid(np.dot(data_matrix[sample], weights))
            error = labels[sample] - h
            weights = weights + alpha * error * data_matrix[sample]
    return weights

In [26]:
stochasticGradDes(np.array(dataset), labels)

array([ 13.86193504,   1.36557302,  -1.87783856])
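
Because the samples are drawn at random, the weights above will differ from run to run. If repeatable results are wanted, one option is to draw the sample order from a seeded generator. The sketch below is a hypothetical variant, not the notebook's own function: `sgd_seeded`, its `seed` parameter and the toy dataset are assumptions made for illustration. It visits a fresh random permutation of the samples in each epoch and keeps the same decaying step size as `stochasticGradDes`.

```python
import numpy as np

def sgd_seeded(data, labels, epochs=150, seed=0):
    """Hypothetical variant: one pass over a fresh random permutation per epoch."""
    rng = np.random.default_rng(seed)   # seeded generator makes runs repeatable
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels, dtype=float)
    m, n = data.shape
    weights = np.ones(n)
    for j in range(epochs):
        for i, idx in enumerate(rng.permutation(m)):
            alpha = 4 / (1.0 + j + i) + 0.0001   # same decay as stochasticGradDes
            h = 1.0 / (1.0 + np.exp(-np.dot(data[idx], weights)))
            weights = weights + alpha * (labels[idx] - h) * data[idx]
    return weights

# Tiny made-up dataset (intercept column first), only to show the call.
toy_x = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
toy_y = [0, 0, 1, 1]
w1 = sgd_seeded(toy_x, toy_y, epochs=20, seed=0)
w2 = sgd_seeded(toy_x, toy_y, epochs=20, seed=0)
print(np.allclose(w1, w2))  # two runs with the same seed give the same weights
```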