#### **Discriminant Analysis**

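* **Distance discriminant method**
    The distance discriminant method computes the distance $d(x, A_i)$ from the object $x$ to be classified to each class $A_i$, then assigns $x$ to the nearest class. The discriminant function is $W(i, x) = d(x, A_i)$; if $W(k, x) = \min\{W(i, x) \mid i = 1, 2, \dots, r\}$, then $x \in A_k$. The distance $d(x, A_i)$ is usually taken to be the Mahalanobis distance.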
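
As a rough illustration of the nearest-distance rule above, the sketch below computes the squared Mahalanobis distance from a point to each class and picks the smallest. It assumes the distance to a class is measured from the class sample mean using that class's own sample covariance. The names `mahalanobis_sq` and `distance_discriminant` and the toy data are made up for this sketch.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = np.asarray(x, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))

def distance_discriminant(x, class_samples):
    """Assign x to the class with the smallest distance.

    class_samples maps a label to an (n_i, p) array of training samples.
    Assumption: each class uses its own sample mean and sample covariance.
    """
    dists = {}
    for label, samples in class_samples.items():
        samples = np.asarray(samples, dtype=float)
        mean = samples.mean(axis=0)
        cov = np.cov(samples.T)
        dists[label] = mahalanobis_sq(x, mean, cov)
    nearest = min(dists, key=dists.get)
    return nearest, dists

# Toy data (illustrative only): two well-separated classes in 2-D
classes = {
    1: np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    2: np.array([[10.0, 10.0], [11.0, 10.0], [10.0, 11.0], [11.0, 11.0]]),
}
label, dists = distance_discriminant([0.4, 0.6], classes)
print(label, dists)
```

The notebook cells in this section use a k-nearest-neighbor classifier with the Mahalanobis metric instead. That rule measures distance to individual training samples rather than to a class as a whole, so its results need not agree with this sketch.
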
![](./img/Snipaste_2025-07-09_12-00-53.png)

In [1]:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

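# Known samples (15 samples, 2 features each)
x0 = np.array([
    [1.24, 1.27], [1.36, 1.74], [1.38, 1.64], [1.38, 1.82], [1.38, 1.90],
    [1.40, 1.70], [1.48, 1.82], [1.54, 1.82], [1.56, 2.08], [1.14, 1.78],
    [1.18, 1.96], [1.20, 1.86], [1.26, 2.00], [1.28, 2.00], [1.30, 1.96]
])

# Samples to classify (3 samples)
x = np.array([
    [1.24, 1.80], [1.28, 1.84], [1.40, 2.04]
])

# Class labels of the known samples (first 9 are class 1, last 6 are class 2)
g = np.hstack([np.ones(9), 2 * np.ones(6)])

# Covariance matrix of the known samples (used by the Mahalanobis distance)
v = np.cov(x0.T)

# KNN classifier using the Mahalanobis distance
# n_neighbors is the number of nearest training samples that vote on each prediction
knn = KNeighborsClassifier(n_neighbors=2, metric='mahalanobis', metric_params={'V': v})
knn.fit(x0, g)
pre = knn.predict(x)
print("Mahalanobis distance, predicted classes:", pre)
print("Mahalanobis distance, misclassification rate on known samples:", 1 - knn.score(x0, g))

# KNN classifier using the Euclidean distance
knn2 = KNeighborsClassifier(n_neighbors=2)  # default metric is 'minkowski' with p=2, i.e. Euclidean
knn2.fit(x0, g)
pre2 = knn2.predict(x)
print("Euclidean distance, predicted classes:", pre2)
print("Euclidean distance, misclassification rate on known samples:", 1 - knn2.score(x0, g))

Mahalanobis distance, predicted classes: [2. 2. 1.]
Mahalanobis distance, misclassification rate on known samples: 0.0
Euclidean distance, predicted classes: [2. 1. 2.]
Euclidean distance, misclassification rate on known samples: 0.0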


![](./img/Snipaste_2025-07-09_12-25-07.png)

In [2]:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
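a = pd.read_excel("./10.1心电图数据.xlsx")
b = a.values

# Known samples: every row except the last two, without the first and last columns
x0 = b[:-2,1:-1].astype(float)
# Samples to classify: the last two rows
x = b[-2:,1:-1].astype(float)
# Class labels: the last column of the known samples
g = b[:-2,-1:].astype(int).ravel()
v = np.cov(x0.T)

# KNN classifier using the Mahalanobis distance
# n_neighbors is the number of nearest training samples that vote on each prediction
knn = KNeighborsClassifier(n_neighbors=3, metric='mahalanobis', metric_params={'V': v})
knn.fit(x0, g)
pre = knn.predict(x)
print("Mahalanobis distance, predicted classes:", pre)
print("Mahalanobis distance, misclassification rate on known samples:", 1 - knn.score(x0, g))

Mahalanobis distance, predicted classes: [1 1]
Mahalanobis distance, misclassification rate on known samples: 0.15000000000000002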


* **Fisher discriminant analysis**

In [3]:
# Problem 11.1. Whether or not the algorithm makes sense yet, all I can say is: at least know how to copy this
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Known samples (15 samples, 2 features each)
x0 = np.array([
    [1.24, 1.27], [1.36, 1.74], [1.38, 1.64], [1.38, 1.82], [1.38, 1.90],
    [1.40, 1.70], [1.48, 1.82], [1.54, 1.82], [1.56, 2.08], [1.14, 1.78],
    [1.18, 1.96], [1.20, 1.86], [1.26, 2.00], [1.28, 2.00], [1.30, 1.96]
])
# Samples to classify (3 samples)
x = np.array([[1.24, 1.80], [1.28, 1.84], [1.40, 2.04]])
# Class labels of the known samples (first 9 are class 1, last 6 are class 2)
g = np.hstack([np.ones(9), 2 * np.ones(6)])

clf = LDA()
clf.fit(x0, g)
print("Predicted classes:", clf.predict(x))
print("Misclassification rate on known samples:", 1 - clf.score(x0, g))

Predicted classes: [2. 2. 2.]
Misclassification rate on known samples: 0.0
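
For intuition about what a two-class Fisher discriminant does, here is a minimal NumPy sketch. It projects onto the direction $w = S_w^{-1}(m_1 - m_2)$, where $S_w$ is the within-class scatter matrix, and splits at the midpoint of the projected class means. The midpoint threshold and the names `fisher_direction` and `fisher_classify` are assumptions of this sketch. scikit-learn's `LinearDiscriminantAnalysis` also takes class priors into account, so its decision boundary can differ.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return the Fisher projection direction w and a midpoint threshold."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two classes' scatter matrices
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)
    # Assumption: split halfway between the projected class means
    threshold = w @ (m1 + m2) / 2
    return w, threshold

def fisher_classify(x, w, threshold, labels=(1, 2)):
    """Class 1 projects above the threshold because w points from m2 toward m1."""
    return labels[0] if w @ np.asarray(x, dtype=float) > threshold else labels[1]

# Toy data (illustrative only)
X1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X2 = X1 + 10.0
w, threshold = fisher_direction(X1, X2)
print(w, threshold)
print(fisher_classify([0.5, 0.5], w, threshold))
print(fisher_classify([10.5, 10.5], w, threshold))
```
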


In [4]:
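import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
a = pd.read_excel("./10.1心电图数据.xlsx")
b = a.values
# Known samples, samples to classify (last two rows), and class labels (last column)
x0 = b[:-2,1:-1].astype(float)
x = b[-2:,1:-1].astype(float)
g = b[:-2,-1:].astype(int).ravel()
clf = LDA()
clf.fit(x0, g)
print("Predicted classes:", clf.predict(x))
print("Misclassification rate on known samples:", 1 - clf.score(x0, g))

Predicted classes: [1 2]
Misclassification rate on known samples: 0.0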


* **Bayes discriminant analysis**

In [None]:
import numpy as np
from sklearn.naive_bayes import GaussianNB
# Known samples (15 samples, 2 features each)
x0 = np.array([
    [1.24, 1.27], [1.36, 1.74], [1.38, 1.64], [1.38, 1.82], [1.38, 1.90],
    [1.40, 1.70], [1.48, 1.82], [1.54, 1.82], [1.56, 2.08], [1.14, 1.78],
    [1.18, 1.96], [1.20, 1.86], [1.26, 2.00], [1.28, 2.00], [1.30, 1.96]
])
# Samples to classify (3 samples)
x = np.array([
    [1.24, 1.80], [1.28, 1.84], [1.40, 2.04]
])
# Class labels of the known samples (first 9 are class 1, last 6 are class 2)
g = np.hstack([np.ones(9), 2 * np.ones(6)])

# GaussianNB applies Bayes' rule with Gaussian class densities and treats features as independent
clf = GaussianNB()
clf.fit(x0, g)
print("Predicted classes:", clf.predict(x))
print("Misclassification rate on known samples:", 1 - clf.score(x0, g))

In [None]:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
a = pd.read_excel("./10.1心电图数据.xlsx")
b = a.values
# Known samples, samples to classify (last two rows), and class labels (last column)
x0 = b[:-2,1:-1].astype(float)
x = b[-2:,1:-1].astype(float)
g = b[:-2,-1:].astype(int).ravel()
clf = GaussianNB()
clf.fit(x0, g)
print("Predicted classes:", clf.predict(x))
print("Misclassification rate on known samples:", 1 - clf.score(x0, g))

* **Evaluating discriminant rules**
    * **Resubstitution misclassification rate**
    * **Cross-validation misclassification rate**
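
The two criteria above can be sketched as follows. Resubstitution classifies the known samples with a rule fitted on those same samples, which tends to be optimistic. The cross-validation variant shown here is leave-one-out: each sample is classified by a rule fitted on all the others. The classifier is a nearest-class-mean rule in Euclidean distance, a stand-in chosen so the sketch needs only NumPy. All function names and the toy data are hypothetical.

```python
import numpy as np

def nearest_mean_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test point to the nearest class mean."""
    labels = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    # Pairwise Euclidean distances, shape (n_test, n_classes)
    d = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

def resubstitution_error(X, y):
    """Fit on all samples, then classify those same samples."""
    return float(np.mean(nearest_mean_predict(X, y, X) != y))

def leave_one_out_error(X, y):
    """Classify each sample with a rule fitted on the remaining samples."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = nearest_mean_predict(X[keep], y[keep], X[i:i + 1])
        wrong += int(pred[0] != y[i])
    return wrong / n

# Toy data (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y = np.array([1, 1, 1, 2, 2, 2])
print("resubstitution error:", resubstitution_error(X, y))
print("leave-one-out error:", leave_one_out_error(X, y))
```
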

In [7]:
# Cross-validation for linear discriminant analysis
# (prints the accuracy of each fold; misclassification rate = 1 - accuracy)
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
a = pd.read_excel("./10.1心电图数据.xlsx")
b = a.values
x0 = b[:-2,1:-1].astype(float)
g = b[:-2,-1:].astype(int).ravel()
model = LinearDiscriminantAnalysis()
print(f"Accuracy per fold: {cross_val_score(model,x0,g,cv=2)}")

Accuracy per fold: [0.9 0.8]


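#### **Principal Component Analysis**
[See reference](./Method_Reference/PCA.ipynb)
* **The goal of principal component analysis is dimensionality reduction, i.e. reducing the number of variables. Only a few principal components are therefore kept (usually no more than 6), just enough for the cumulative contribution rate to exceed 85%.**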
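
A minimal sketch of the 85% rule: accumulate the contribution rates and take the smallest number of components whose running total reaches the threshold. The helper name `n_components_for` is made up here; the rates are the ones printed for the height, chest circumference, and weight example in this section.

```python
import numpy as np

def n_components_for(ratios, threshold=0.85):
    """Smallest k whose cumulative contribution rate reaches the threshold."""
    cumulative = np.cumsum(ratios)
    # Index of the first cumulative value >= threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding leaving the total just below the threshold
    return min(k, len(cumulative)), cumulative

# Contribution rates of the three principal components in this section's example
ratios = np.array([0.80355601, 0.18498975, 0.01145425])
k, cumulative = n_components_for(ratios)
print(k)           # 2: the first component alone covers about 80%, the first two about 99%
print(cumulative)
```
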

![](./img/Snipaste_2025-07-09_14-05-08.png)

In [11]:
import numpy as np
from sklearn.decomposition import PCA
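# Build a NumPy array from the data in Table 11.3
# Column order: index | height (cm) | chest circumference (cm) | weight (kg)
data = np.array([
    [1, 149.5, 69.5, 38.5],    # sample 1
    [2, 162.5, 77.0, 55.5],     # sample 2
    [3, 162.7, 78.5, 50.8],     # sample 3
    [4, 162.2, 87.5, 65.5],     # sample 4
    [5, 156.5, 74.5, 49.0],     # sample 5
    [6, 156.1, 74.5, 45.5],     # sample 6
    [7, 172.0, 76.5, 51.0],     # sample 7
    [8, 173.2, 81.5, 59.5],     # sample 8
    [9, 159.5, 74.5, 43.5],     # sample 9
    [10, 157.7, 79.0, 53.5]     # sample 10
])

b=data[:,1:4] # data matrix without the index column
md = PCA().fit(b)
print("Eigenvalues:",md.explained_variance_)
print("Contribution rate of each principal component:",md.explained_variance_ratio_)
print("Singular values:",md.singular_values_)
print("Principal component coefficients: \n",md.components_) # each row is one principal component

print("Now compute the eigenvalues and eigenvectors directly and compare with the library result")
print("------------------\n")
cf=np.cov(b.T) # covariance matrix
c, d=np.linalg.eig(cf) # eigenvalues and eigenvectors (each column of d is one eigenvector)
print("Eigenvalues:",c)
print("Eigenvectors:\n",d)
print("Contribution rate of each principal component:",c/np.sum(c))

Eigenvalues: [110.00413886  25.32447973   1.56804807]
Contribution rate of each principal component: [0.80355601 0.18498975 0.01145425]
Singular values: [31.46485738 15.09703009  3.75665179]
Principal component coefficients: 
 [[ 0.55915657  0.42128705  0.71404562]
 [ 0.82767368 -0.33348264 -0.45138188]
 [ 0.04796048  0.84338992 -0.53515721]]
Now compute the eigenvalues and eigenvectors directly and compare with the library result
------------------

Eigenvalues: [110.00413886  25.32447973   1.56804807]
Eigenvectors:
 [[ 0.55915657  0.82767368 -0.04796048]
 [ 0.42128705 -0.33348264 -0.84338992]
 [ 0.71404562 -0.45138188  0.53515721]]
Contribution rate of each principal component: [0.80355601 0.18498975 0.01145425]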
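
Two details are worth noting when comparing the two results. The third vector differs in sign between them, which is expected because an eigenvector is only determined up to sign. Also, `np.linalg.eig` does not promise any particular ordering of the eigenvalues. The sketch below is one hedged way to make the manual computation reproducible: it uses `np.linalg.eigh` (meant for symmetric matrices), sorts in descending order, and fixes signs with an arbitrary convention chosen for this sketch. The function name `pca_by_eigh` is hypothetical.

```python
import numpy as np

def pca_by_eigh(X):
    """PCA via eigendecomposition of the sample covariance matrix."""
    cov = np.cov(np.asarray(X, dtype=float).T)
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]          # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    # Sign convention (arbitrary, for this sketch only): make the
    # largest-magnitude entry of each eigenvector positive
    idx = np.argmax(np.abs(vecs), axis=0)
    signs = np.sign(vecs[idx, np.arange(vecs.shape[1])])
    return vals, vecs * signs

# Height (cm), chest circumference (cm), weight (kg) from the table above
b = np.array([
    [149.5, 69.5, 38.5], [162.5, 77.0, 55.5], [162.7, 78.5, 50.8],
    [162.2, 87.5, 65.5], [156.5, 74.5, 49.0], [156.1, 74.5, 45.5],
    [172.0, 76.5, 51.0], [173.2, 81.5, 59.5], [159.5, 74.5, 43.5],
    [157.7, 79.0, 53.5],
])
vals, vecs = pca_by_eigh(b)
print("eigenvalues:", vals)
print("eigenvectors (columns):\n", vecs)
print("contribution rates:", vals / vals.sum())
```

The eigenvalues and contribution rates should agree with the values printed above. The eigenvectors should agree up to the sign of each column.
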


#### **Factor Analysis**

![](./img/Snipaste_2025-07-09_15-43-51.png)

***An image is attached here to help understand the difference between PCA and factor analysis***
![](./img/Snipaste_2025-07-09_15-33-29.png)