# Data Normalization

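    In machine learning, features often come in different units and value ranges, so they cannot be compared or combined directly. Normalization puts them on a common scale first.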

## z-score Normalization

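    Also known as z-score standardization.
    It rescales the data to have mean 0 and standard deviation 1. When the data contains outliers (values that differ greatly from the rest), this can reduce their influence compared with scaling by the minimum and maximum.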

![title](../img/normalization1.jpg)
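For reference, the standard z-score formula is written out here. The symbols $\mu$ (mean) and $\sigma$ (standard deviation) are the conventional notation and are assumed, not confirmed, to match the figure above. The $1/n$ form of $\sigma$ corresponds to NumPy's default `std` (`ddof=0`), which the `zscore` function in this section uses.

$$
z = \frac{x - \mu}{\sigma}, \qquad \mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
$$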

In [1]:
import numpy as np

In [2]:
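def zscore(x, axis=None, num=None):
    """Standardize x (or a single value num) using the mean and std of x along axis."""
    xmean = x.mean(axis=axis, keepdims=True)
    xstd = x.std(axis=axis, keepdims=True)  # population std (ddof=0)
    # If num is given, score that value against x's statistics instead of x itself.
    target = x if num is None else num
    return (target - xmean) / xstd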

In [3]:
a = np.array([[6, 4, 6, 6, 0], [7, 0, 9, 2, 2]])

In [4]:
zscore(a)

array([[ 0.61522733, -0.06835859,  0.61522733,  0.61522733, -1.43553045],
       [ 0.9570203 , -1.43553045,  1.64060622, -0.75194452, -0.75194452]])

In [5]:
zscore(a, num=6)

array([[0.61522733]])

In [6]:
zscore(a, axis=1)

array([[ 0.68599434, -0.17149859,  0.68599434,  0.68599434, -1.88648444],
       [ 0.88083033, -1.17444044,  1.46805055, -0.58722022, -0.58722022]])
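A quick way to sanity-check the results above is to confirm that the standardized data has mean 0 and standard deviation 1 along the chosen axis. The sketch below is only an illustration: it redefines a compact `zscore` (without the `num` option) so that it runs on its own, and the names `z_rows`, `row_means`, and `row_stds` are introduced here for the example.

```python
import numpy as np

def zscore(x, axis=None):
    # Same computation as the notebook's zscore, without the num option.
    xmean = x.mean(axis=axis, keepdims=True)
    xstd = x.std(axis=axis, keepdims=True)
    return (x - xmean) / xstd

a = np.array([[6, 4, 6, 6, 0], [7, 0, 9, 2, 2]])

# Standardize each row independently, as in zscore(a, axis=1).
z_rows = zscore(a, axis=1)

# After standardization each row should be centered on 0 with unit spread.
row_means = z_rows.mean(axis=1)
row_stds = z_rows.std(axis=1)

print(np.allclose(row_means, 0.0))
print(np.allclose(row_stds, 1.0))
```

Both checks use `np.allclose` rather than exact equality because floating-point rounding leaves the row means very close to, but not exactly, 0.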

## Min-max Normalization

    Rescales all values into the range 0 to 1.

![title](../img/normalization2.jpg)
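For reference, the standard min-max formula is written out here. The notation $x_{\min}$, $x_{\max}$ is conventional and is assumed, not confirmed, to match the figure above.

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

For example, with $x_{\min} = 0$ and $x_{\max} = 9$, the value $x = 6$ maps to $x' = 6/9 \approx 0.667$.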

In [7]:
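def min_max(x, axis=None, num=None):
    """Rescale x (or a single value num) using the min and max of x along axis."""
    xmin = x.min(axis=axis, keepdims=True)
    xmax = x.max(axis=axis, keepdims=True)
    # If num is given, rescale that value against x's range instead of x itself.
    target = x if num is None else num
    return (target - xmin) / (xmax - xmin)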
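One case the function above does not handle: if every value along the axis is identical, `xmax - xmin` is 0 and the division produces `nan` (with a NumPy runtime warning). A minimal sketch of one possible guard follows. The name `safe_min_max` and the choice to map constant data to 0 are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def safe_min_max(x, axis=None):
    # Hypothetical variant of min_max: constant slices map to 0 instead of nan.
    x = np.asarray(x, dtype=float)
    xmin = x.min(axis=axis, keepdims=True)
    span = x.max(axis=axis, keepdims=True) - xmin
    # Replace a zero span with 1 so the division is defined; the numerator
    # is 0 for those slices anyway, so their result is 0.
    span = np.where(span == 0, 1.0, span)
    return (x - xmin) / span

c = np.array([[5, 5, 5], [1, 2, 3]])  # the first row is constant
scaled = safe_min_max(c, axis=1)
print(scaled)
```

Mapping constant data to 0 is only one convention; depending on the application, 0.5 or leaving the values unchanged might be more appropriate.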

In [8]:
b = np.array([[6, 4, 6, 6, 0], [7, 0, 9, 2, 2]])

In [9]:
min_max(b)

array([[0.66666667, 0.44444444, 0.66666667, 0.66666667, 0.        ],
       [0.77777778, 0.        , 1.        , 0.22222222, 0.22222222]])

In [10]:
min_max(b, num=6)

array([[0.66666667]])

In [11]:
min_max(b, axis=1)

array([[1.        , 0.66666667, 1.        , 1.        , 0.        ],
       [0.77777778, 0.        , 1.        , 0.22222222, 0.22222222]])
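To illustrate the earlier remark about outliers, the sketch below applies both methods to a small made-up sample with one extreme value. The data and variable names are assumptions for illustration only. With min-max scaling the outlier defines the top of the range, so the remaining values are squeezed close to 0. The z-score result is also affected by the outlier, through the mean and standard deviation, but it is not forced into a fixed interval.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is the made-up outlier

# Min-max: the outlier becomes 1 and everything else lands near 0.
min_max_scaled = (data - data.min()) / (data.max() - data.min())

# z-score: values are expressed in standard deviations from the mean.
z_scaled = (data - data.mean()) / data.std()

print(min_max_scaled)
print(z_scaled)
```

Neither method removes the outlier's effect entirely; if outliers dominate the data, clipping or a robust statistic such as the median may be worth considering before scaling.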