# Classification and Regression Algorithms


## Classification Algorithms

### P1: Distance

scikit-learn provides the following distance metrics, where $x_i$ and $y_i$ denote the $i$-th components of the vectors $\vec x$ and $\vec y$:

- Metrics for real-valued vector spaces
    - Euclidean distance: $\sqrt{\sum_i (x_i-y_i)^2}$
    - Manhattan distance: $\sum_i |x_i-y_i|$
    - Chebyshev distance: $\max_i |x_i-y_i|$
    - Minkowski distance: $\left(\sum_i |x_i-y_i|^{p}\right)^{\frac{1}{p}}$
    - Weighted Minkowski distance (WMinkowski): $\left(\sum_i |w_i(x_i-y_i)|^{p}\right)^{\frac{1}{p}}$
    - Standardized Euclidean distance (SEuclidean): $\sqrt{\sum_i \frac{(x_i-y_i)^2}{V_i}}$, where $V_i$ is the variance of component $i$
    - Mahalanobis distance: $\sqrt{(\vec x-\vec y)^{\top}V^{-1}(\vec x-\vec y)}$, where $V$ is the covariance matrix
- Metrics for two-dimensional vector spaces
    - Haversine distance
- Metrics for integer-valued vector spaces
    - Hamming distance
    - Canberra distance
    - Bray-Curtis distance
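
As a rough check on the real-valued formulas above, here is a minimal pure-Python sketch. The function names `minkowski`, `chebyshev`, and `seuclidean`, the sample vectors, and the sample variances are illustrative assumptions and not part of scikit-learn's API.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p), for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance max_i |x_i - y_i| (the limit of Minkowski as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))


def seuclidean(x, y, V):
    """Standardized Euclidean distance; V holds one variance per component."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, V)))


# Hypothetical sample vectors, chosen only for illustration
a, b = [0, 1, 2], [3, 4, 5]

manhattan = minkowski(a, b, 1)   # p = 1 reduces to the Manhattan distance
euclidean = minkowski(a, b, 2)   # p = 2 reduces to the Euclidean distance
largest = chebyshev(a, b)
scaled = seuclidean(a, b, [1.0, 4.0, 9.0])  # made-up variances

print(manhattan, euclidean, largest, scaled)
```

With these inputs the Manhattan distance is 9, the Euclidean distance is $\sqrt{27} \approx 5.196$, the Chebyshev distance is 3, and the standardized Euclidean distance is $\sqrt{9 + 2.25 + 1} = 3.5$. Raising `p` in `minkowski` moves the result toward the Chebyshev value.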

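```python
import numpy as np
# In scikit-learn 1.0 and later, DistanceMetric is also exposed as
# sklearn.metrics.DistanceMetric.
from sklearn.neighbors import DistanceMetric

# Two 3-dimensional vectors, stored as the rows of X
X = np.arange(6).reshape((2, 3))
print(f'vector a is {X[0, :]}')
print(f'vector b is {X[1, :]}')

# Compute the pairwise distance matrix of the rows of X under each metric
dist_name = ['euclidean', 'manhattan', 'chebyshev']
for d in dist_name:
    dist = DistanceMetric.get_metric(d)
    print(f'distance name is {d}, distance value is \n {dist.pairwise(X)}')
```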

Output:

```text
vector a is [0 1 2]
vector b is [3 4 5]
distance name is euclidean, distance value is
 [[0.         5.19615242]
 [5.19615242 0.        ]]
distance name is manhattan, distance value is
 [[0. 9.]
 [9. 0.]]
distance name is chebyshev, distance value is
 [[0. 3.]
 [3. 0.]]
```


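Each printed matrix holds the pairwise distances between the rows of `X`. Its entries read as follows:

- [0, 0]: distance from vector a to vector a
- [0, 1]: distance from vector a to vector b
- [1, 0]: distance from vector b to vector a
- [1, 1]: distance from vector b to vector b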
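
To make the layout of these matrices concrete, the following sketch rebuilds them with plain NumPy broadcasting instead of `DistanceMetric`. It illustrates the indexing only and is not how scikit-learn computes the result internally. The variable names are chosen for this example.

```python
import numpy as np

# Same data as above: row 0 is vector a, row 1 is vector b
X = np.arange(6).reshape((2, 3))

# diff[i, j, :] = X[i] - X[j], built by broadcasting (shape 2 x 2 x 3)
diff = X[:, None, :] - X[None, :, :]

# Reduce over the last axis so that entry [i, j] is the distance
# between row i and row j
euclidean = np.sqrt((diff ** 2).sum(axis=-1))
manhattan = np.abs(diff).sum(axis=-1)
chebyshev = np.abs(diff).max(axis=-1)

print(euclidean)
print(manhattan)
print(chebyshev)
```

The diagonal entries are zero because every vector is at distance 0 from itself, and each matrix is symmetric because the distance from a to b equals the distance from b to a.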