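## Implementing Hard/Soft Voting

* Hard voting (without scikit-learn's built-in `VotingClassifier`)
* Hard voting (with scikit-learn's built-in `VotingClassifier`)
* Soft voting (with scikit-learn's built-in `VotingClassifier`)

### Load the modules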

In [2]:
from sklearn import datasets, linear_model, svm, neighbors
from sklearn.metrics import accuracy_score
from numpy import argmax

### Load the breast cancer data

In [3]:
breast_cancer = datasets.load_breast_cancer()
x, y = breast_cancer.data, breast_cancer.target

### Hard voting (without scikit-learn's built-in `VotingClassifier`)

**Instantiate the base learners**

In [4]:
learner_1 = neighbors.KNeighborsClassifier(n_neighbors=5)  
learner_2 = linear_model.Perceptron(tol=1e-2, random_state=0)
learner_3 = svm.SVC(gamma=0.001)

**Split the data into training and test sets**

In [5]:
test_samples = 100
x_train, y_train = x[:-test_samples], y[:-test_samples]
x_test, y_test = x[-test_samples:], y[-test_samples:]
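
The slice above holds out the last 100 rows in the order the dataset stores them. A shuffled, stratified split is a common alternative; the sketch below shows one way to get a 100-sample test set with `train_test_split`. The `random_state=0` seed, the use of `stratify`, and the names `x_tr`, `x_te`, `y_tr`, `y_te` are assumptions made for illustration, and the accuracies printed in this notebook come from the slice, not from this split.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

breast_cancer = datasets.load_breast_cancer()
x, y = breast_cancer.data, breast_cancer.target

# An integer test_size is an absolute number of samples;
# stratify=y keeps the class ratio similar in both parts.
x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, test_size=100, stratify=y, random_state=0)

print(x_tr.shape, x_te.shape)
```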

**Train the three base learners**

In [6]:
learner_1.fit(x_train, y_train)
learner_2.fit(x_train, y_train)
learner_3.fit(x_train, y_train)

predictions_1 = learner_1.predict(x_test)
predictions_2 = learner_2.predict(x_test)
predictions_3 = learner_3.predict(x_test)

**Apply hard voting**

In [7]:
hard_predictions = []
for i in range(test_samples):
    # Count the votes for each class
    counts = [0, 0]  # one counter per class (0 and 1)
    counts[predictions_1[i]] += 1
    counts[predictions_2[i]] += 1
    counts[predictions_3[i]] += 1
    # Find the class with most votes
    final = argmax(counts)
    # Add the class to the final predictions
    hard_predictions.append(final) 
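
The loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming every learner outputs non-negative integer class labels; the helper name `hard_vote` and the toy label lists are hypothetical and not part of the notebook.

```python
import numpy as np

def hard_vote(*predictions, n_classes=2):
    """Majority vote over equally long sequences of integer class labels."""
    # Stack the predictions into shape (n_learners, n_samples)
    votes = np.vstack(predictions)
    # Count the votes per class for every sample (one column per sample);
    # the result has shape (n_classes, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    # Ties go to the lowest class index, the same rule argmax applies in the loop
    return counts.argmax(axis=0)

# Toy labels from three hypothetical learners on four samples
toy = hard_vote([0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1])
print(toy)
```

With the arrays from this notebook the call would be `hard_vote(predictions_1, predictions_2, predictions_3)`.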

**Print the results**

In [8]:
# Accuracies of base learners
print('L1:', accuracy_score(y_test, predictions_1))
print('L2:', accuracy_score(y_test, predictions_2))
print('L3:', accuracy_score(y_test, predictions_3))
# Accuracy of hard voting
print('-'*30)
print('Hard Voting:', accuracy_score(y_test, hard_predictions))

L1: 0.94
L2: 0.78
L3: 0.88
------------------------------
Hard Voting: 0.9


### Hard voting (with scikit-learn's built-in `VotingClassifier`)

**Import the voting module**

In [15]:
from sklearn.ensemble import VotingClassifier
from sklearn import datasets, naive_bayes, svm, neighbors

**Train the model with the voting module**

In [11]:
learner_1 = neighbors.KNeighborsClassifier(n_neighbors=5)
learner_2 = linear_model.Perceptron(tol=1e-2, random_state=0)
learner_3 = svm.SVC(gamma=0.001)
# Combine the learners; voting='hard' (majority vote) is the default
voting = VotingClassifier([('KNN', learner_1),
                           ('Prc', learner_2),
                           ('SVM', learner_3)],
                          voting='hard')

In [12]:
# Fit classifier with the training data
voting.fit(x_train, y_train)

# Predict the most voted class
hard_predictions = voting.predict(x_test)

**Print the results**

In [13]:
print('-'*30)
print('Hard Voting:', accuracy_score(y_test, hard_predictions))

------------------------------
Hard Voting: 0.9


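### Soft voting (with scikit-learn's built-in `VotingClassifier`)

Here we replace the Perceptron with Gaussian naive Bayes, because soft voting needs base learners that can output class probabilities, and `Perceptron` has no `predict_proba` method. `SVC` can produce probabilities when it is constructed with `probability=True`.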
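
To make the role of the probabilities concrete, here is a minimal sketch of what soft voting computes: the class probabilities of the learners are averaged (optionally with weights) and the class with the highest average wins. The helper name `soft_vote` and the toy probability tables are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_vote(probas, weights=None):
    """Average class probabilities across learners and pick the top class.

    probas: list of arrays, each of shape (n_samples, n_classes).
    """
    stacked = np.asarray(probas, dtype=float)  # (n_learners, n_samples, n_classes)
    avg = np.average(stacked, axis=0, weights=weights)
    return avg.argmax(axis=1), avg

# Toy probabilities from three hypothetical learners on two samples
p_knn = [[0.8, 0.2], [0.4, 0.6]]
p_nb = [[0.6, 0.4], [0.1, 0.9]]
p_svm = [[0.3, 0.7], [0.7, 0.3]]

labels, avg = soft_vote([p_knn, p_nb, p_svm])
print(labels)
print(avg)
```

For fitted scikit-learn classifiers, the inputs would be the arrays returned by each learner's `predict_proba(x_test)`.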

In [25]:
# Instantiate the learners (classifiers)
learner_1 = neighbors.KNeighborsClassifier(n_neighbors=5)
learner_2 = naive_bayes.GaussianNB()
learner_3 = svm.SVC(gamma=0.001, probability=True)

# Instantiate the voting classifier
voting = VotingClassifier([('KNN', learner_1),
                           ('NB', learner_2),
                           ('SVM', learner_3)],
                          voting='soft')

In [26]:
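# Fit the ensemble. VotingClassifier fits clones of the learners, so the
# originals are fitted separately to inspect their individual accuracies.
voting.fit(x_train, y_train)
learner_1.fit(x_train, y_train)
learner_2.fit(x_train, y_train)
learner_3.fit(x_train, y_train)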

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
    verbose=False)

In [27]:
# Predict the most probable class
soft_predictions = voting.predict(x_test)

# Get the base learner predictions
predictions_1 = learner_1.predict(x_test)
predictions_2 = learner_2.predict(x_test)
predictions_3 = learner_3.predict(x_test)

**Print the results**

In [24]:
# Accuracies of base learners
print('L1:', accuracy_score(y_test, predictions_1))
print('L2:', accuracy_score(y_test, predictions_2))
print('L3:', accuracy_score(y_test, predictions_3))
# Accuracy of soft voting
print('-'*30)
print('Soft Voting:', accuracy_score(y_test, soft_predictions))

L1: 0.9
L2: 0.96
L3: 0.88
------------------------------
Soft Voting: 0.94


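Soft voting requires every base learner to return a probability for each prediction. If a base learner substantially overestimates or underestimates those probabilities (that is, it is poorly calibrated), the ensemble's predictive performance will suffer.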
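
A small worked example may help show why this matters. The sketch below compares hard and soft voting on a single sample, using made-up probabilities of class 1 from three hypothetical learners; the helper name `vote` and all numbers are assumptions for illustration only.

```python
import numpy as np

def vote(p_class1):
    """Return (hard, soft) decisions for one sample in a binary problem.

    p_class1: each learner's predicted probability of class 1.
    """
    p = np.asarray(p_class1, dtype=float)
    # Hard voting: threshold each learner at 0.5, then take the majority
    hard = int((p > 0.5).sum() > p.size / 2)
    # Soft voting: average the probabilities, then threshold
    soft = int(p.mean() > 0.5)
    return hard, soft

# Two learners lean slightly towards class 0, one is extremely confident in class 1
overconfident = vote([0.45, 0.45, 0.99])
# The same situation if the third learner reported a more moderate probability
moderate = vote([0.45, 0.45, 0.55])
print(overconfident, moderate)
```

In the first case the single overconfident learner pulls the average to 0.63 and overrides the other two, while the majority vote still picks class 0. In the second case both rules agree on class 0.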