# Solving the Normal Equations with Vector Operations
To recap, the quantities we need to solve for are

$$ a = \frac{\sum_{i=1}^{m}(x^{i} - x_{mean})(y^{i} - y_{mean})} {\sum_{i=1}^{m}(x^{i} - x_{mean})^2} $$

$$ b = y_{mean} - a \cdot x_{mean} $$

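Looking at these formulas, the sums in the numerator and the denominator of $a$ (and therefore $b$, which depends on $a$) both have the form $\sum_{i=1}^{m} w^{i} \cdot v^{i}$. This is exactly the dot product of two vectors $w$ and $v$, so each sum can be computed with a single vector operation instead of an explicit loop.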
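A sum of the form $\sum_{i=1}^{m} w^{i} \cdot v^{i}$ is the dot product of the vectors $w$ and $v$. The following minimal sketch checks this on a tiny example; the arrays `w` and `v` and their values are made up purely for illustration.

```python
import numpy as np

# Illustrative vectors (hypothetical values, not from the data set)
w = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Element-by-element accumulation of sum_i w[i] * v[i]
loop_sum = 0.0
for w_i, v_i in zip(w, v):
    loop_sum += w_i * v_i

# The same quantity as a single vector dot product
dot_sum = w.dot(v)

print(loop_sum, dot_sum)  # 32.0 32.0
```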

In [4]:
import numpy as np

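size = 100000
# Synthetic data: y = 2.3 * x + 1.1 plus standard normal noise
x = np.random.random(size=size)
y = 2.3 * x + 1.1 + np.random.normal(size=size)

x_mean = np.mean(x)
y_mean = np.mean(y)

# The numerator and the denominator of a are both dot products
a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
b = y_mean - a * x_mean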

In [5]:
a

2.302741788901535

In [6]:
b

1.0975773015754746
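As a sanity check on the dot-product formulas, one option is to compare the result with NumPy's own degree-1 least-squares fit, `np.polyfit`. This is only a sketch: the seeded generator `rng` and the names `a_ref` and `b_ref` are assumptions introduced here, so the fitted values will differ slightly from the unseeded run above.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (an assumption; the cells above are unseeded)
rng = np.random.default_rng(0)

size = 100_000
x = rng.random(size)
y = 2.3 * x + 1.1 + rng.normal(size=size)

x_mean = np.mean(x)
y_mean = np.mean(y)

# Closed-form solution using dot products
a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
b = y_mean - a * x_mean

# Reference fit: polyfit returns coefficients from highest degree to lowest
a_ref, b_ref = np.polyfit(x, y, 1)

print(np.isclose(a, a_ref), np.isclose(b, b_ref))
```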

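Comparing the two implementations shows that computing the sums with vector dot products is far more efficient. In the timing below, the faster fit takes about 13.7 ms versus 1.19 s, roughly an 87-fold speedup: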


In [7]:
from moon import linear_model

m = 1000000
big_x = np.random.random(size=m)
big_y = big_x * 2 + 3 + np.random.normal(size=m)

linearR1 = linear_model.LinearRegression1()
linearR2 = linear_model.LinearRegression2()

%timeit linearR1.fit(big_x, big_y)
%timeit linearR2.fit(big_x, big_y)

1.19 s ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
13.7 ms ± 339 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
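The source of `moon.linear_model` is not shown here, so the following is only a sketch of what a loop-based fit and a dot-product fit might look like. The function names `fit_loop` and `fit_vectorized` are hypothetical and are not the `moon` API; the demo data uses a seeded generator chosen for this example.

```python
import numpy as np


def fit_loop(x, y):
    """Hypothetical loop-based fit: accumulate both sums one element at a time."""
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    num = 0.0
    den = 0.0
    for x_i, y_i in zip(x, y):
        num += (x_i - x_mean) * (y_i - y_mean)
        den += (x_i - x_mean) ** 2
    a = num / den
    b = y_mean - a * x_mean
    return a, b


def fit_vectorized(x, y):
    """Hypothetical vectorized fit: compute both sums as dot products."""
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    dx = x - x_mean
    a = dx.dot(y - y_mean) / dx.dot(dx)
    b = y_mean - a * x_mean
    return a, b


# Small demo data set (seed and size are arbitrary choices for this sketch)
rng = np.random.default_rng(42)
x_demo = rng.random(1000)
y_demo = 2.0 * x_demo + 3.0 + rng.normal(size=1000)

a_loop, b_loop = fit_loop(x_demo, y_demo)
a_vec, b_vec = fit_vectorized(x_demo, y_demo)

# Both approaches should agree up to floating-point rounding
print(np.isclose(a_loop, a_vec), np.isclose(b_loop, b_vec))
```

Both functions evaluate the same formulas; the difference is that the loop runs in the Python interpreter one element at a time, while the dot product is executed inside NumPy's compiled code.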
