## Data normalization
### Standardization
- Z = (x - μ) / σ
    - μ: mean
    - σ: standard deviation
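
As a small hand-worked illustration of the formula (the three values below are made up for the example, and σ is taken as the population standard deviation, which is what `np.std` computes by default):

$$
x = (2,\ 4,\ 6), \qquad
\mu = \frac{2 + 4 + 6}{3} = 4, \qquad
\sigma = \sqrt{\frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3}} = \sqrt{\frac{8}{3}} \approx 1.633
$$

$$
Z = \frac{x - \mu}{\sigma} \approx (-1.225,\ 0,\ 1.225)
$$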

In [1]:
import numpy as np

First, let's check how the `axis` argument behaves.

In [16]:
a = np.arange(6).reshape(3, 2)
print("a = ")
print(a)

print()
print("axis = None → ", end = "")
print(a.sum())
print("axis = 0 → ", end = "")
print(a.sum(axis = 0))
print("axis = 1 → ", end = "")
print(a.sum(axis = 1))

a = 
[[0 1]
 [2 3]
 [4 5]]

axis = None → 15
axis = 0 → [6 9]
axis = 1 → [1 5 9]


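- None: sums every element
- 0: sums vertically, down the rows (one total per column)
- 1: sums horizontally, across the columns (one total per row)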

Now let's write a function that standardizes an array.

In [2]:
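def zscore(x, axis=None):
    # keepdims=True keeps the reduced axis so mu and sigma broadcast against x
    mu = x.mean(axis=axis, keepdims=True)
    sigma = x.std(axis=axis, keepdims=True)
    # Note: a constant input gives sigma = 0 and therefore a division by zero
    return (x - mu) / sigma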

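- keepdims: controls whether the reduced axes are kept. With `True` they remain as size-1 dimensions, so the result still broadcasts against the input.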
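
The effect of `keepdims` is easiest to see from the result shapes. The following is a minimal sketch on a made-up 2×3 array (the variable names are only for illustration), showing why the row-wise case needs `keepdims=True` for the subtraction to broadcast:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)  # shape (2, 3)

# Without keepdims, the reduced axis disappears from the result.
mean_dropped = x.mean(axis=1)               # shape (2,)
# With keepdims=True, it stays as a size-1 axis.
mean_kept = x.mean(axis=1, keepdims=True)   # shape (2, 1)

print(mean_dropped.shape, mean_kept.shape)

# (2, 3) - (2, 1) broadcasts row by row, which is what zscore relies on.
centered = x - mean_kept
print(centered)

# (2, 3) - (2,) does not broadcast: the trailing sizes 3 and 2 differ.
try:
    x - mean_dropped
except ValueError as err:
    print("broadcast error:", err)
```

With `axis=None` or `axis=0` the subtraction would happen to work even without `keepdims`, so the flag mainly matters for reductions along the last axis.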

In [32]:
a = np.random.randint(10, size = (2,5))
print("data = ")
print(a, end = "\n\n")

b = zscore(a)
c = zscore(a, axis = 1)

print(b, end = "\n\n")
print(c, end = "\n\n")

data = 
[[1 4 6 3 8]
 [6 6 0 5 4]]

[[-1.41878082 -0.12898007  0.73088709 -0.55891365  1.59075425]
 [ 0.73088709  0.73088709 -1.8487144   0.30095351 -0.12898007]]

[[-1.40693001 -0.16552118  0.66208471 -0.57932412  1.4896906 ]
 [ 0.80822386  0.80822386 -1.88585567  0.3592106  -0.08980265]]



In [38]:
bmu = b.mean()
bsigma = b.std()

cmu = c.mean(axis = 1)
csigma = c.std(axis = 1)

print("mean of b = " + str(bmu))
print("std of b = " + str(bsigma), end = "\n\n")
print("mean of c = " + str(cmu))
print("std of c = " + str(csigma), end = "\n\n")

mean of b = 1.0824674490095276e-16
std of b = 1.0

mean of c = [-2.22044605e-16 -5.82867088e-17]
std of c = [1. 1.]



The mean is 0 (up to floating-point rounding error) and the standard deviation is 1, as expected.
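
The cells above only try `axis = None` and `axis = 1`. As a sketch of the remaining case, `axis = 0` standardizes each column independently, which is the usual layout when rows are samples and columns are features. The data below is hypothetical, and `zscore` is repeated only so that the snippet runs on its own:

```python
import numpy as np

def zscore(x, axis=None):
    # Same definition as in the notebook, repeated to keep this self-contained.
    mu = x.mean(axis=axis, keepdims=True)
    sigma = x.std(axis=axis, keepdims=True)
    return (x - mu) / sigma

# Hypothetical data: 4 samples (rows) x 2 features (columns) on different scales.
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0],
                 [4.0, 400.0]])

d = zscore(data, axis=0)  # standardize each column separately
print(d)
print(d.mean(axis=0))  # close to 0 for each column
print(d.std(axis=0))   # close to 1 for each column
```

Because the second column is just the first one multiplied by 100, both columns end up with the same standardized values, which shows that the scale difference has been removed.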

### min-max normalization
This method rescales the values into the range [0, 1] using (x - min) / (max - min).

In [40]:
def min_max(x, axis=None):
    # Named x_min / x_max to avoid shadowing the built-in min() and max()
    x_min = x.min(axis=axis, keepdims=True)
    x_max = x.max(axis=axis, keepdims=True)
    return (x - x_min) / (x_max - x_min)

In [41]:
a = np.random.randint(10, size = (2,5))
print("data = ")
print(a, end = "\n\n")
print(min_max(a), end = "\n\n")
print(min_max(a, 1), end = "\n\n")

data = 
[[1 6 5 8 8]
 [6 9 2 3 6]]

[[0.    0.625 0.5   0.875 0.875]
 [0.625 1.    0.125 0.25  0.625]]

[[0.         0.71428571 0.57142857 1.         1.        ]
 [0.57142857 1.         0.         0.14285714 0.57142857]]
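
One edge case worth noting: if all values along the chosen axis are equal, then max - min is 0 and the formula divides by zero, producing NaN. The following is one possible way to handle that, not part of the original notebook. The name `min_max_safe` and the choice to map a constant slice to 0 are assumptions made for this sketch:

```python
import numpy as np

def min_max_safe(x, axis=None):
    # Hypothetical variant of min_max; the zero-range policy is an assumption.
    x_min = x.min(axis=axis, keepdims=True)
    x_max = x.max(axis=axis, keepdims=True)
    value_range = x_max - x_min
    # Where the range is 0, divide by 1 instead so the result is 0 rather than NaN.
    safe_range = np.where(value_range == 0, 1, value_range)
    return (x - x_min) / safe_range

a = np.array([[1, 6, 5, 8, 8],
              [3, 3, 3, 3, 3]])  # the second row is constant
r = min_max_safe(a, axis=1)
print(r)
```

The first row is scaled exactly as before, while the constant row becomes all zeros instead of NaN.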



### Computing the norm → vector normalization
A vector can be normalized by dividing each element by its norm.

So let's look at how to compute the norm.

In [42]:
import numpy.linalg as LA

In [62]:
a = np.random.randint(5, size = 2)
norm = LA.norm(a)

print("a = " + str(a))
print("|a| = " + str(norm))

a = [2 4]
|a| = 4.47213595499958
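
To complete the step described at the top of this section, the vector can then be divided by its norm. This is a minimal sketch that reuses the vector `[2 4]` from the output above; the 2-D array `m` is made up to show the row-wise case, which uses the `axis` and `keepdims` arguments of `LA.norm`:

```python
import numpy as np
import numpy.linalg as LA

a = np.array([2, 4])       # the vector from the output above
unit = a / LA.norm(a)      # each element divided by the L2 norm
print(unit)
print(LA.norm(unit))       # the normalized vector has norm 1 (up to rounding)

# Row-wise normalization of a 2-D array: one norm per row.
m = np.array([[3.0, 4.0],
              [0.0, 2.0]])
row_norms = LA.norm(m, axis=1, keepdims=True)  # shape (2, 1)
m_unit = m / row_norms
print(m_unit)
```

Note that a zero vector has norm 0, so it cannot be normalized this way without special handling.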
