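Normalization rescales values of different magnitudes into a common range, which makes gradient descent easier. It is often loosely called standardization as well.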

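One goal of normalization is to let gradient descent advance at a comparable pace along every parameter dimension $\theta$, even when the underlying features differ by orders of magnitude.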
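The effect on gradient descent can be seen by comparing gradient components before and after scaling. This is a minimal sketch, not part of the original notebook: it assumes a linear model without intercept, a mean squared error loss, and a made-up target with coefficients 3.0 and 0.5; `mse_gradient` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Two features on very different scales.
x1 = rng.integers(1, 10, size=100)
x2 = rng.integers(1000, 5000, size=100)
X = np.c_[x1, x2].astype(float)

# Hypothetical target: the coefficients 3.0 and 0.5 are made up for illustration.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]
theta = np.zeros(2)


def mse_gradient(X, y, theta):
    """Gradient of the mean squared error of the linear model X @ theta."""
    return 2.0 / len(y) * X.T @ (X @ theta - y)


grad_raw = mse_gradient(X, y, theta)

# Same data after min-max scaling each column to [0, 1].
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
grad_scaled = mse_gradient(X_scaled, y, theta)

# How lopsided the gradient is across the two parameters.
ratio_raw = abs(grad_raw[1] / grad_raw[0])
ratio_scaled = abs(grad_scaled[1] / grad_scaled[0])
print(f"raw ratio: {ratio_raw:.1f}, scaled ratio: {ratio_scaled:.2f}")
```

On the raw data the gradient along the large-scale feature is more than a hundred times larger than along the small one, so a single learning rate cannot suit both parameters. After scaling, the two components have the same order of magnitude.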

### Min-max normalization
Also known as range standardization (min-max scaling).

Formula: $X^* = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$

In [22]:
import numpy as np
import matplotlib.pyplot as plt 

In [23]:
x1 = np.random.randint(1, 10, size=10)

In [25]:
x2 = np.random.randint(1000, 5000, size=10)
x = np.c_[x1, x2]  # stack the two 1-D arrays as columns -> shape (10, 2)
x

array([[   8, 4706],
       [   9, 2511],
       [   4, 4401],
       [   5, 3809],
       [   6, 2225],
       [   7, 1054],
       [   7, 4601],
       [   9, 4247],
       [   3, 3495],
       [   3, 2901]])

In [26]:
x.min(axis=0)  # minimum of each column

array([   3, 1054])
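The `axis` argument and the broadcasting that the formula relies on can be checked on a tiny array. The sketch below uses hand-written values that are not taken from the notebook.

```python
import numpy as np

# Small hand-written array: 3 samples (rows), 2 features (columns).
a = np.array([[1, 1000],
              [5, 3000],
              [9, 5000]])

col_min = a.min(axis=0)  # one minimum per column -> shape (2,)
row_min = a.min(axis=1)  # one minimum per row    -> shape (3,)

# Broadcasting: the (2,) vector is subtracted from every row of the (3, 2) array.
shifted = a - col_min

print(col_min.shape, row_min.shape)
print(shifted)
```

Because `axis=0` collapses the rows, the result has one entry per feature, which is the shape needed to scale each column independently.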

### Min-max normalization using the formula

In [27]:
X = (x-x.min(axis=0)) / (x.max(axis=0)-x.min(axis=0))
X.round(2)

array([[0.83, 1.  ],
       [1.  , 0.4 ],
       [0.17, 0.92],
       [0.33, 0.75],
       [0.5 , 0.32],
       [0.67, 0.  ],
       [0.67, 0.97],
       [1.  , 0.87],
       [0.  , 0.67],
       [0.  , 0.51]])
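The same computation can be wrapped in a function and sanity-checked. This is a sketch under stated assumptions: `min_max_scale` is a hypothetical helper, the input values are hand-written, and mapping a constant column to 0 (rather than dividing by zero) is a choice made here, not something the notebook specifies.

```python
import numpy as np


def min_max_scale(x):
    """Scale each column of a 2-D array to [0, 1] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_range = x.max(axis=0) - x_min
    # Assumption: map constant columns to 0 instead of dividing by zero.
    safe_range = np.where(x_range == 0, 1.0, x_range)
    return (x - x_min) / safe_range


data = np.array([[8, 4706, 7],
                 [9, 2511, 7],
                 [3, 1054, 7]])
scaled = min_max_scale(data)
print(scaled.round(2))
```

Every non-constant column now has minimum 0 and maximum 1, which is a quick way to verify that the scaling was applied per column.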

In [28]:
# Min-max normalization with sklearn
from sklearn.preprocessing import MinMaxScaler

In [29]:
x1 = np.random.randint(1, 10, size=10)
x2 = np.random.randint(1000, 5000, size=10)
x = np.c_[x1, x2]
scaler = MinMaxScaler()  # scales each column to [0, 1] by default
y = scaler.fit_transform(x)
y.round(2)

array([[0.  , 0.42],
       [0.38, 0.15],
       [0.25, 0.94],
       [0.5 , 0.72],
       [0.25, 1.  ],
       [0.  , 0.  ],
       [0.62, 0.08],
       [1.  , 0.27],
       [0.25, 0.96],
       [0.88, 0.62]])
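A short check can confirm that `MinMaxScaler` matches the manual formula and show how a fitted scaler is reused. The sketch below assumes freshly generated data with a fixed seed and two hand-written new rows; only documented scikit-learn attributes (`data_min_`, `data_max_`) and methods are used.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
train = np.c_[rng.integers(1, 10, size=10), rng.integers(1000, 5000, size=10)]

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)

# Manual formula, for comparison.
manual = (train - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))
print(np.allclose(train_scaled, manual))

# The fitted scaler remembers the column minima and maxima...
print(scaler.data_min_, scaler.data_max_)

# ...so new rows are scaled with the training statistics, not their own.
new_rows = np.array([[5, 3000], [12, 6000]])  # second row lies outside the fitted range
new_scaled = scaler.transform(new_rows)
print(new_scaled)

# inverse_transform maps scaled values back to the original units.
restored = scaler.inverse_transform(train_scaled)
```

Values outside the range seen during `fit` land outside [0, 1] with the default settings, which is worth keeping in mind when scaling test data.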

### Z-score normalization

Formula: $X^* = \frac{X - \mu}{\sigma}$

$X$: the data
$\mu$: the mean
$\sigma$: the standard deviation

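After z-score scaling the data has mean 0 and standard deviation 1. The shape of the distribution is unchanged, so the result is normally distributed only if the original data was.

The scaled values are not confined to the range 0 to 1, and some of them are negative.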
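These properties can be checked numerically. The sketch below assumes exponentially distributed sample data (chosen here because it is clearly not normal) and a hypothetical `skewness` helper.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.exponential(scale=1000.0, size=10_000)  # strongly right-skewed, not normal

z = (raw - raw.mean()) / raw.std()


def skewness(v):
    """Third standardized moment; 0 for a symmetric distribution."""
    return np.mean(((v - v.mean()) / v.std()) ** 3)


print(f"mean={z.mean():.3f}, std={z.std():.3f}, min={z.min():.3f}, max={z.max():.3f}")
print(f"skewness before={skewness(raw):.3f}, after={skewness(z):.3f}")
```

The mean and standard deviation become 0 and 1, some values are negative and some exceed 1, and the skewness is the same before and after: z-score scaling shifts and rescales the data without changing the shape of its distribution.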

In [34]:
x1 = np.random.randint(1, 100, size=(10, 2))
x2 = np.random.randint(1000, 10000, size=(10, 2))
x = np.concatenate([x1, x2])  # joins along axis 0 (rows) -> shape (20, 2)
x

array([[  94,    6],
       [  13,   26],
       [  77,   57],
       [  84,   14],
       [  20,   67],
       [  42,   39],
       [  68,    1],
       [  24,   96],
       [  23,   46],
       [  19,   55],
       [8048, 7418],
       [8685, 3303],
       [7122, 4400],
       [1725, 3955],
       [6168, 8959],
       [3067, 1181],
       [9548, 5144],
       [6413, 6290],
       [4986, 3528],
       [1704, 8630]])

### Z-score normalization using the formula

In [35]:
np.set_printoptions(suppress=True)  # print without scientific notation
X = (x - x.mean(axis=0)) / x.std(axis=0)
X

array([[-0.81995033, -0.85243668],
       [-0.84364916, -0.8460147 ],
       [-0.82492416, -0.83606064],
       [-0.82287611, -0.84986789],
       [-0.84160112, -0.83284966],
       [-0.8351644 , -0.84184042],
       [-0.82755736, -0.85404217],
       [-0.8404308 , -0.8235378 ],
       [-0.84072338, -0.83959273],
       [-0.84189369, -0.83670284],
       [ 1.50721646,  1.52754662],
       [ 1.69358876,  0.20622562],
       [ 1.23628907,  0.55847085],
       [-0.34275533,  0.41558194],
       [ 0.95716949,  2.02235966],
       [ 0.04988458, -0.47514575],
       [ 1.94608372,  0.79736826],
       [ 1.02885115,  1.16534732],
       [ 0.61134209,  0.27847282],
       [-0.34889947,  1.9167182 ]])
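The manual formula above relies on `np.std`, whose default is the population standard deviation (`ddof=0`). As far as the scikit-learn documentation states, `StandardScaler` uses the same estimator, so the two should agree. A sketch on freshly generated data with a fixed seed:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
data = np.concatenate([rng.integers(1, 100, size=(10, 2)),
                       rng.integers(1000, 10000, size=(10, 2))]).astype(float)

manual = (data - data.mean(axis=0)) / data.std(axis=0)  # np.std defaults to ddof=0
scaler = StandardScaler()
sk = scaler.fit_transform(data)

print(np.allclose(manual, sk))
print(scaler.mean_, scaler.scale_)  # per-column mean and standard deviation

# With the sample standard deviation (ddof=1) the result differs slightly.
manual_ddof1 = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
print(np.allclose(manual, manual_ddof1))
```

The difference between `ddof=0` and `ddof=1` shrinks as the number of rows grows, but it is visible on small arrays like the ones used here.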

### Z-score (zero-mean) normalization with sklearn

In [36]:
# sklearn.preprocessing: the data preprocessing module
from sklearn.preprocessing import StandardScaler

In [44]:
x1 = np.random.randint(1, 100, size= (10, 10))
x2 = np.random.randint(1000, 10000, size=(10, 10))
X = np.c_[x1, x2]
scaler = StandardScaler()  # scales each column to mean 0, standard deviation 1
y = scaler.fit_transform(X)
y

array([[ 0.05581101, -0.69232086,  0.29801825, -1.16093959, -0.62284954,
         0.40969106, -1.33241682,  1.37778839, -1.15035515, -1.34799993,
         1.31762337,  0.83734221, -0.87397039,  0.64530721, -1.0670018 ,
        -0.84349735,  1.09352083, -0.26847203,  0.47506227,  0.23899582],
       [ 1.05243617, -1.74015783,  2.08612774,  1.0832559 ,  0.74105456,
        -0.82890983,  0.07918756,  1.61533811, -0.37253494,  0.80482551,
        -0.56429945,  1.07477113, -0.96525294,  0.02373796,  0.7476778 ,
         1.4861028 ,  1.57017211,  1.1922972 ,  1.60973203, -0.97575833],
       [-1.29959921,  0.72974361, -0.26076597,  0.39273421,  1.10476232,
         0.60024505, -0.85040557, -0.52260939, -0.49534866, -0.78495327,
         0.85477101,  0.75486188,  1.15526478, -1.84389141,  1.3869533 ,
         0.02310952,  0.07659746,  0.34685314,  1.72341418, -0.22129242],
       [ 1.92946631,  0.76716636, -0.93130703, -1.2472548 , -0.16821484,
        -0.97182532,  1.62850944,  0.46718112,  