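# Machine Learning Fundamentals

### Matrices

Main uses: solving systems of linear equations, reducing the degree of equations, and applying linear transformations.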

In [1]:
import numpy as np

In [19]:
a = np.zeros((4,1,1))
a

array([[[0.]],

       [[0.]],

       [[0.]],

       [[0.]]])

In [20]:
print(type(a))
print(a.shape)

<class 'numpy.ndarray'>
(4, 1, 1)


## np.linalg

In [26]:
A = np.mat([[1,2,3],[4,5,6],[7,8,9]])

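- Determinant (the exact value for this `A` is 0; the tiny nonzero result below is floating-point round-off)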

In [29]:
np.linalg.det(A)

6.66133814775094e-16

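- Matrix inverse (`A` is singular, so the enormous entries below are round-off artifacts, not a true inverse)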

In [31]:
np.linalg.inv(A)

matrix([[-4.50359963e+15,  9.00719925e+15, -4.50359963e+15],
        [ 9.00719925e+15, -1.80143985e+16,  9.00719925e+15],
        [-4.50359963e+15,  9.00719925e+15, -4.50359963e+15]])

- Matrix transpose

In [32]:
A.T

matrix([[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]])

- Symmetric matrix: the product $AA^T$ is always symmetric

In [33]:
A*A.T

matrix([[ 14,  32,  50],
        [ 32,  77, 122],
        [ 50, 122, 194]])
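
As a quick check that the product of a matrix and its transpose is symmetric, the sketch below compares the product with its own transpose. It assumes a plain `np.array` with the `@` operator rather than `np.mat` (NumPy no longer recommends the `matrix` class); the names `M` and `S` are illustrative.

```python
import numpy as np

# Same entries as A above, stored as a plain ndarray (illustrative name M).
M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# M @ M.T is symmetric because (M M^T)^T = M M^T.
S = M @ M.T
is_symmetric = np.array_equal(S, S.T)
print(S)
print(is_symmetric)
```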

- Matrix rank

In [36]:
np.linalg.matrix_rank(A)

2

- Solving a linear system $Ax = b$ (the result below is unreliable because `A` is singular)

In [42]:
b=np.mat([2,3,6])
np.linalg.solve(A,b.T)

matrix([[-9.00719925e+15],
        [ 1.80143985e+16],
        [-9.00719925e+15]])
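
Because this matrix has rank 2 and a determinant that is zero up to round-off, the very large values returned by `inv` and `solve` are not meaningful. A minimal sketch of how one might detect the problem and fall back to a least-squares solution, assuming plain `ndarray` inputs (the names `M`, `rhs`, and `x_ls` are illustrative):

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
rhs = np.array([2.0, 3.0, 6.0])

# A rank below the number of columns means M has no inverse.
rank = np.linalg.matrix_rank(M)
# The condition number is very large for a (nearly) singular matrix.
cond = np.linalg.cond(M)
print(rank, cond)

# lstsq returns the minimum-norm x that minimizes ||M x - rhs||_2,
# which is well defined even when M is singular.
x_ls, residuals, rank_ls, singular_values = np.linalg.lstsq(M, rhs, rcond=None)
print(x_ls)
print(np.linalg.norm(M @ x_ls - rhs))
```

Checking the rank or condition number before calling `solve` is one simple way to avoid trusting a result like the one above.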

- Norms

In [46]:
a = np.array([1,2,3])

In [51]:
# this works on an ndarray but not on a plain Python list
np.sqrt(sum(pow(a,2)))

3.7416573867739413

In [54]:
# L2 norm
np.linalg.norm(a)

3.7416573867739413
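
`np.linalg.norm` also takes an `ord` argument that selects other vector norms. A short sketch for the same vector:

```python
import numpy as np

a = np.array([1, 2, 3])

l1 = np.linalg.norm(a, ord=1)         # sum of absolute values
l2 = np.linalg.norm(a)                # default for vectors: Euclidean (L2) norm
linf = np.linalg.norm(a, ord=np.inf)  # largest absolute value
print(l1, l2, linf)
```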

## Distances
Minkowski distance
$$d_{1,2} = \sqrt[p]{\sum_{i=1}^{n}{|x_{1i}-x_{2i}|^p}}$$

Euclidean distance: the L2 norm ($p=2$)  
```python
v1 = np.mat([1,2,3])
v2 = np.mat([2,3,4])
np.sqrt((v1-v2)*(v1-v2).T)
```

Manhattan distance: the L1 norm ($p=1$)  
```python
v1 = np.mat([1,2,3])
v2 = np.mat([2,3,4])
np.sum(abs(v1-v2))
```

Chebyshev distance: the $L_\infty$ norm, $\max(|x_1-x_2|,|y_1-y_2|)$  
```python
v1 = np.mat([1,2,0])
v2 = np.mat([2,3,4])
np.max(abs(v1-v2))
```

In [57]:
v1 = np.mat([1,2,0])
v2 = np.mat([2,3,4])
np.max(abs(v1-v2))

4
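
The Manhattan, Euclidean, and Chebyshev distances are the Minkowski distance with $p=1$, $p=2$, and $p\to\infty$ respectively. The sketch below illustrates this with a single helper; `minkowski` is a hypothetical function written for this example and it assumes 1-D inputs of equal length.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p between two 1-D sequences (illustrative helper)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        # Limit p -> infinity: the largest coordinate difference.
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

v1 = [1, 2, 3]
v2 = [2, 3, 4]
d_manhattan = minkowski(v1, v2, 1)       # L1
d_euclidean = minkowski(v1, v2, 2)       # L2
d_chebyshev = minkowski(v1, v2, np.inf)  # L-infinity
print(d_manhattan, d_euclidean, d_chebyshev)
```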

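- Cosine of the angle: measures the difference in direction between two vectors, and so the difference between two data samples. $\cos\theta = \frac{A \cdot B}{|A||B|}$  
```python
import numpy as np

v1 = np.mat([1,2,0])
v2 = np.mat([2,3,4])
# dot product divided by the product of the two L2 norms
(v1*v2.T)/(np.linalg.norm(v1)*np.linalg.norm(v2))
```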
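
The cosine of the angle between two vectors is their dot product divided by the product of their L2 norms. A minimal sketch with plain 1-D arrays follows; `cosine_similarity` is a hypothetical helper written for this example, and it assumes neither vector is all zeros.

```python
import numpy as np

def cosine_similarity(x, y):
    """cos(theta) = (x . y) / (|x| |y|); illustrative helper for 1-D inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

cos_12 = cosine_similarity([1, 2, 0], [2, 3, 4])
cos_same = cosine_similarity([1, 2, 0], [2, 4, 0])  # parallel vectors
print(cos_12, cos_same)
```

Parallel vectors give a value of 1 (up to round-off) regardless of their lengths, which is why this measure captures direction rather than magnitude.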

- Jaccard similarity coefficient: the size of the intersection relative to the size of the union, $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$

In [66]:
# relationship between rows; note that pdist returns the Jaccard distance (a dissimilarity)
import scipy.spatial.distance as dist
matv = np.mat([[1,2,3,4],[2,4,3,8]])
dist.pdist(matv,'jaccard')

array([0.75])
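
For comparison, the set-based coefficient from the formula can be computed directly with Python sets. This is a minimal sketch and `jaccard_similarity` is a hypothetical helper. The `pdist` result above appears to compare the two rows position by position (three of the four positions differ, giving 0.75), so it is not simply one minus the set-based value.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two iterables treated as sets (illustrative helper)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)

j = jaccard_similarity([1, 2, 3, 4], [2, 4, 3, 8])
print(j)  # intersection {2, 3, 4}, union {1, 2, 3, 4, 8}
```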

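- Relationship between columns: the correlation coefficient (the covariance normalized by both standard deviations), $\rho_{xy} = \frac{E[(x-E(x))(y-E(y))]}{\sigma_x\sigma_y}$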

In [77]:
v1 = np.mat([1,2,0])
v2 = np.mat([1,3,1])
mean1 = np.mean(v1)
mean2 = np.mean(v2)

In [78]:
p = np.mean(np.multiply(v1-mean1,v2-mean2)) / (np.std(v1) * np.std(v2))
p

0.8660254037844386

In [79]:
np.corrcoef(v1,v2)

array([[1.       , 0.8660254],
       [0.8660254, 1.       ]])
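
The same coefficient can be obtained from the covariance, which makes the link between the two quantities explicit. A short sketch, assuming plain 1-D arrays `x` and `y` holding the same values as `v1` and `v2`:

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0])
y = np.array([1.0, 3.0, 1.0])

# Population covariance (ddof=0), matching np.std's default normalization.
cov_xy = np.cov(x, y, ddof=0)[0, 1]
# Dividing by both standard deviations gives the correlation coefficient.
rho = cov_xy / (np.std(x) * np.std(y))
print(cov_xy, rho)
```

Unlike the covariance, the correlation coefficient does not depend on the scale of the data, which makes values comparable across features.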