### Common Methods
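Some features matter a great deal to a model while others contribute little, so **feature selection** is one of the most basic and common operations in feature engineering. Training can also run into the curse of dimensionality when there are too many features. In that case we want to reduce the number of dimensions without losing the important features, which is where **dimensionality reduction** comes in.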

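1. **Low-variance filter**: The simplest way to select features is to judge them directly by their variance. A low variance means the feature takes nearly the same value across all samples, so it has little effect on predictions and can be dropped.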
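The idea can also be sketched by hand with NumPy before turning to a library. This is a minimal illustration, not the notebook's own code: the helper `low_variance_mask`, the array `X_demo`, and the fixed seed are hypothetical names and choices introduced here. The sketch keeps a column only when its population variance is strictly greater than the threshold.

```python
import numpy as np

def low_variance_mask(X, threshold):
    """Hypothetical helper: True for columns whose variance exceeds threshold."""
    variances = np.var(X, axis=0)  # population variance (ddof=0) of each column
    return variances > threshold

rng = np.random.default_rng(0)  # fixed seed (an assumption) so the run is repeatable
X_demo = np.column_stack([
    np.full(100, 5.0),            # constant feature: variance is exactly 0
    rng.normal(5, 1, size=100),   # feature with real spread: variance near 1
])

mask = low_variance_mask(X_demo, threshold=0.1)
X_kept = X_demo[:, mask]          # keep only the columns flagged True
print(mask)
print(X_kept.shape)
```

The constant column is removed and the column with real spread is kept, which is the behaviour the low-variance filter is meant to provide.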

In [1]:
import numpy as np

In [4]:
# Build a feature with a small spread (mean 5, standard deviation 0.1)
a = np.random.normal(5, 0.1, size=100)
# Check its variance
print(np.var(a))

0.010281775186515725


In [5]:
b = np.random.normal(5, 1, size=100)
print(np.var(b))

0.8099404257396152


In [8]:
# Stack the two features as columns to form the feature matrix
X = np.vstack((a, b)).T
print(X)
print(X.shape)

[[5.23533838 6.07195069]
 [4.91058285 5.2761389 ]
 [5.13464856 5.66409772]
 [4.98587004 5.65480261]
 [5.15803313 5.59307049]
 [5.01279418 4.7308407 ]
 [4.9892487  5.07123266]
 [4.98061731 5.01905576]
 [5.09872776 6.61295432]
 [5.08082606 4.37513313]
 [5.06372088 3.31339841]
 [5.13169128 3.6460076 ]
 [5.0070733  4.94513214]
 [5.04927684 4.32199743]
 [5.125244   4.61267481]
 [4.78207413 4.69798229]
 [4.99577224 3.00538308]
 [5.22315706 5.42365884]
 [4.91335098 6.20760284]
 [4.86585906 5.52468593]
 [4.84390804 3.86877592]
 [5.06637425 5.04978041]
 [5.01235851 4.90093065]
 [4.92228654 6.5915852 ]
 [4.85050158 5.75762579]
 [4.96786305 4.7455235 ]
 [5.20468148 4.40779167]
 [4.89314463 5.83813422]
 [4.9682027  5.25509874]
 [5.06387739 5.4080415 ]
 [4.81535872 4.45684964]
 [4.91925436 5.56958795]
 [5.13192894 4.10893903]
 [4.98544276 5.12027164]
 [4.93004754 6.80577293]
 [5.16514018 7.29421871]
 [5.06632473 4.22992605]
 [5.12461742 3.93663184]
 [5.23205107 5.75098207]
 [5.07253825 4.10980999]


In [10]:
# Low-variance filtering with VarianceThreshold
from sklearn.feature_selection import VarianceThreshold
# Set a threshold to drop the features whose variance is too small
vt = VarianceThreshold(threshold=0.1)
X_filtered = vt.fit_transform(X)
print(X_filtered)
print(X_filtered.shape)

[[6.07195069]
 [5.2761389 ]
 [5.66409772]
 [5.65480261]
 [5.59307049]
 [4.7308407 ]
 [5.07123266]
 [5.01905576]
 [6.61295432]
 [4.37513313]
 [3.31339841]
 [3.6460076 ]
 [4.94513214]
 [4.32199743]
 [4.61267481]
 [4.69798229]
 [3.00538308]
 [5.42365884]
 [6.20760284]
 [5.52468593]
 [3.86877592]
 [5.04978041]
 [4.90093065]
 [6.5915852 ]
 [5.75762579]
 [4.7455235 ]
 [4.40779167]
 [5.83813422]
 [5.25509874]
 [5.4080415 ]
 [4.45684964]
 [5.56958795]
 [4.10893903]
 [5.12027164]
 [6.80577293]
 [7.29421871]
 [4.22992605]
 [3.93663184]
 [5.75098207]
 [4.10980999]
 [6.29443726]
 [5.47308742]
 [5.15388982]
 [6.03702489]
 [4.01523114]
 [6.22360892]
 [6.21645152]
 [6.18002805]
 [4.70806789]
 [5.39576309]
 [6.24549971]
 [3.67056897]
 [6.11069944]
 [6.11608005]
 [5.82584887]
 [4.00284664]
 [5.2753846 ]
 [4.54909584]
 [4.48858659]
 [5.02207416]
 [5.07672508]
 [5.60862237]
 [4.88580683]
 [4.09418066]
 [3.93103881]
 [5.66593067]
 [6.74953797]
 [6.97577466]
 [4.19966038]
 [4.28978131]
 [6.94997467]
 [4.98