# Linear Regression Implementation

For a simple linear model, we fit $y(x_0,x_1,\dots,x_n) = w_0x_0+w_1x_1+\dots+w_nx_n+b$

where $x_n$ is the $n$-th attribute (feature).

In matrix form:

 $X=[x_0,x_1,\dots,x_n,1], \quad \theta=[w_0,w_1,\dots,w_n,b]^T$

$y = X\theta$ 


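Cost function, with the $m$ samples stacked as the rows of $X$, $x^{(i)}$ the feature vector of the $i$-th sample, and $w=[w_0,w_1,\dots,w_n]^T$:
$$J(w_0,w_1,\dots,w_n,b) = \frac{1}{2m}\sum_{i=1}^m\left(y_i-w^Tx^{(i)}-b\right)^2 = \frac{1}{2m}(y-X\theta)^T(y-X\theta)$$

Minimizing the mean squared error: $(w^*,b^*) = \arg\min_{(w,b)}\sum_{i=1}^m\left(y_i-w^Tx^{(i)}-b\right)^2$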

Solving with gradient descent
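
As a worked step, differentiating the matrix form of the cost with respect to $\theta$ gives the gradient, and gradient descent repeatedly moves against it. The step size is written $\gamma$ here to match the `gamma` parameter used in the code below; the iteration index $t$ is introduced only for this illustration.

$$\nabla_\theta J(\theta) = \frac{1}{m}X^T(X\theta-y)$$

$$\theta^{(t+1)} = \theta^{(t)} - \gamma\,\nabla_\theta J\left(\theta^{(t)}\right) = \theta^{(t)} - \frac{\gamma}{m}X^T\left(X\theta^{(t)}-y\right)$$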

In [None]:
import numpy as np
from matplotlib import pyplot as plt
import pylab

In [35]:
# Input: number of samples, number of weights
# Output: design matrix X, targets y, and the ground-truth theta
def generate_dataset(data_num, weight_num):
    x = np.random.random((data_num, weight_num))
    X = np.c_[x, np.ones((data_num, 1))]     # append a column of ones for the bias term
    theta = np.random.random((weight_num + 1, 1))
    mu, sigma = 0, 0.1    # mean and standard deviation of the noise
    noise = np.random.normal(mu, sigma, (data_num, 1))
    y = X.dot(theta) + noise
    print("groundtruth_weight: ")
    print(theta)
    return X, y, theta

X, y, theta = generate_dataset(5, 3)
#print(X,y)

groundtruth_weight: 
[[ 0.62472515]
 [ 0.68910792]
 [ 0.27349391]
 [ 0.43104141]]


In [40]:
def compute_cost(X, y, theta):
    # J(theta) = (y - X theta)^T (y - X theta) / (2m), returned as a 1x1 array
    m = len(y)
    residual = y - X.dot(theta)
    cost = residual.T @ residual / (2 * m)   # a @ b is equivalent to a.dot(b) or np.matmul(a, b)
    return cost

#cost = compute_cost(X, y, theta)
#print(cost)

(5, 1)
[[ 0.00125174]]


In [42]:
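# gamma: step size (learning rate)
# eps: termination threshold
# max_iter: maximum number of iterations

def linear_regression(X, y, method="closed_form", gamma=0.001, eps=0.0001, max_iter=10000):
    # Find the theta that minimizes the cost.
    # method == "closed_form": use the closed-form solution; the last three parameters are ignored
    # method == "gd": solve by gradient descent

    if method == "closed_form":
        return linear_regression_by_closed_form(X, y)
    if method == "gd":
        return linear_regression_by_gd(X, y, gamma, eps, max_iter)

    raise ValueError("unknown method: %r" % (method,))

def linear_regression_by_closed_form(X, y):
    # See Eq. 3.11 in the "watermelon book" (Zhou Zhihua, Machine Learning)
    theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
    return theta

def linear_regression_by_gd(X, y, gamma=0.001, eps=0.0001, max_iter=10000):
    # Batch gradient descent.
    # Assumption: theta starts at zero and iteration stops once the
    # gradient norm falls below eps (or after max_iter steps).
    m = len(y)
    theta = np.zeros((X.shape[1], 1))
    for _ in range(max_iter):
        grad = X.T.dot(X.dot(theta) - y) / m
        if np.linalg.norm(grad) < eps:
            break
        theta = theta - gamma * grad
    return theta

w = linear_regression(X, y)
w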

array([[ 0.80457494],
       [ 0.83312121],
       [ 0.32289208],
       [ 0.2302967 ]])
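
The closed-form solution above inverts $X^TX$ explicitly, which can be numerically fragile when the columns of $X$ are nearly collinear. A minimal sketch of an alternative, assuming NumPy's least-squares solver `np.linalg.lstsq` is acceptable here; the names `X_demo`, `y_demo`, and `theta_true` are made up for this illustration and the data is noise-free so the exact weights are recoverable.

```python
import numpy as np

# Hypothetical noise-free demo data: 50 samples, 3 features plus a bias column.
rng = np.random.default_rng(0)
x = rng.random((50, 3))
X_demo = np.c_[x, np.ones((50, 1))]
theta_true = np.array([[0.5], [-1.0], [2.0], [0.3]])
y_demo = X_demo.dot(theta_true)

# Normal-equation solution, as in linear_regression_by_closed_form.
theta_inv = np.linalg.inv(X_demo.T.dot(X_demo)).dot(X_demo.T).dot(y_demo)

# Least-squares solver; it never forms the inverse of X^T X.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(np.allclose(theta_inv, theta_lstsq))
```

Both routes give the same weights on well-conditioned data; the solver route also reports the rank of $X$, which helps spot a degenerate design matrix.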

In [None]:
# Log-linear regression: fit a linear model to ln(y), which requires y > 0

def log_regression(X, y, method="gd", gamma=0.001, eps=0.001, max_iter=10000):
    ln_y = np.log(y)
    return linear_regression(X, ln_y, method, gamma, eps, max_iter)

In [None]:
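def logistic_regression(X, y, method="gd", gamma=0.001, eps=0.001, max_iter=10000):
    # Fit a linear model to the log-odds ln(y / (1 - y)); requires 0 < y < 1.
    # For hard 0/1 labels a maximum-likelihood fit is needed instead.
    logit_y = np.log(y / (1 - y))
    return linear_regression(X, logit_y, method, gamma, eps, max_iter)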
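
When the targets are hard 0/1 labels, the log-odds cannot be computed directly, and logistic regression is usually fitted by maximizing the likelihood instead. The following is only a sketch under that assumption: `sigmoid`, `logistic_regression_mle`, and the demo data are hypothetical names introduced here, and plain gradient descent on the mean log-loss is one possible optimizer, not necessarily the one intended for this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_mle(X, y, gamma=0.1, max_iter=5000):
    """Hypothetical helper: gradient descent on the mean log-loss.

    X: (m, n+1) design matrix whose last column is all ones.
    y: (m, 1) array of 0/1 labels.
    """
    m = len(y)
    theta = np.zeros((X.shape[1], 1))
    for _ in range(max_iter):
        p = sigmoid(X.dot(theta))          # predicted probabilities
        grad = X.T.dot(p - y) / m          # gradient of the mean log-loss
        theta = theta - gamma * grad
    return theta

# Hypothetical demo: one feature, label is 1 when the feature exceeds 0.5.
rng = np.random.default_rng(0)
x = rng.random((200, 1))
X_demo = np.c_[x, np.ones((200, 1))]
y_demo = (x > 0.5).astype(float)

theta_demo = logistic_regression_mle(X_demo, y_demo)
predicted = sigmoid(X_demo.dot(theta_demo)) > 0.5
accuracy = np.mean(predicted == (y_demo > 0.5))
print(accuracy)
```

The gradient has the same shape as in linear regression, with the linear prediction $X\theta$ replaced by the sigmoid of it.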

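## Optimization methods for updating the weights from the cost function
1. Batch gradient descent
2. Stochastic gradient descent (SGD)
3. Mini-batch stochastic gradient descent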
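
The three methods listed above differ only in how many samples feed each gradient step, so they can be sketched with a single function. This is a minimal illustration under stated assumptions: `minibatch_gd`, its parameters (`batch_size`, `epochs`, `seed`), and the demo data are hypothetical, and the demo targets are noise-free so all three variants can reach the same weights.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, gamma=0.5, epochs=1000, seed=0):
    """Hypothetical helper covering the three variants via batch_size:
    batch_size == len(y) -> batch gradient descent,
    batch_size == 1      -> stochastic gradient descent,
    anything in between  -> mini-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    m = len(y)
    theta = np.zeros((X.shape[1], 1))
    for _ in range(epochs):
        order = rng.permutation(m)                  # reshuffle the samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T.dot(Xb.dot(theta) - yb) / len(idx)
            theta = theta - gamma * grad
    return theta

# Hypothetical noise-free demo data: 100 samples, 2 features plus a bias column.
rng = np.random.default_rng(1)
x = rng.random((100, 2))
X_demo = np.c_[x, np.ones((100, 1))]
theta_true = np.array([[1.0], [-2.0], [0.5]])
y_demo = X_demo.dot(theta_true)

theta_batch = minibatch_gd(X_demo, y_demo, batch_size=100)
theta_sgd = minibatch_gd(X_demo, y_demo, batch_size=1)
theta_mini = minibatch_gd(X_demo, y_demo, batch_size=10)
print(theta_batch.ravel(), theta_sgd.ravel(), theta_mini.ravel())
```

With noisy targets, the stochastic variants would hover around the minimizer rather than settle on it unless the step size is decayed; the fixed `gamma` here is a simplification.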

In [4]:
def compute_gradient_descent():
    pass

In [None]:
def optimizer():
    pass

In [6]:
# Draw a scatter plot
def plot_data(x, y):
    plt.scatter(x,y)
    plt.show()

In [None]:
if __name__ == '__main__':
    li