## AdaBoost

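#### AdaBoost is an iterative algorithm for improving the accuracy of weak classifiers. It trains a sequence of weak classifiers and combines them, by weighted vote, into a single strong classifier. After each iteration, AdaBoost adjusts the sample weights according to the previous round's result, so that misclassified samples receive more attention in the next round.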
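
The reweighting loop described above can be sketched in a few lines. This is a minimal illustration of discrete AdaBoost for labels in {-1, +1}, not the implementation inside `AdaBoostClassifier`; the function names `adaboost_fit` and `adaboost_predict` and the toy data are made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=40):
    """Discrete AdaBoost with depth-1 trees (stumps); y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start from uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error of this weak learner, clipped to avoid log(0).
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's vote weight
        # Misclassified samples (y * pred == -1) gain weight, the rest lose it.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def adaboost_predict(stumps, alphas, X):
    """Weighted vote of all weak learners."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(votes >= 0, 1, -1)


# Toy data with a diagonal boundary, which a single axis-aligned stump fits poorly.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 2))
y_toy = np.where(X_toy[:, 0] + X_toy[:, 1] > 0, 1, -1)

stumps, alphas = adaboost_fit(X_toy, y_toy)
acc_stump = np.mean(stumps[0].predict(X_toy) == y_toy)
acc_boost = np.mean(adaboost_predict(stumps, alphas, X_toy) == y_toy)
print(f"first stump: {acc_stump:.3f}, boosted ensemble: {acc_boost:.3f}")
```

Comparing the first stump with the full ensemble on the training data shows the effect of combining many weak learners.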

In [8]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from imblearn.over_sampling import RandomOverSampler
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Import the author's own helper functions
from my_tools import *
# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

### Load the data

In [9]:
jibing_res = pd.read_excel("./jibing_feature_res_final.xlsx")
jibing = pd.read_excel("./jibing_feature_final.xlsx")

### Normalization

In [10]:
jibing = guiyihua(jibing)

### Standardization

In [11]:
jibing = biaozhunhua(jibing)
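
`guiyihua` and `biaozhunhua` come from `my_tools` and their code is not shown here. Assuming they perform column-wise min-max scaling and z-score standardization respectively (an assumption based only on the section titles), the sketch below shows what such helpers might look like. The names `min_max_normalize` and `standardize` and the demo values are invented for illustration.

```python
import numpy as np
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for guiyihua: rescale each column to [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for biaozhunhua: zero mean, unit variance per column."""
    return (df - df.mean()) / df.std(ddof=0)


# Made-up demo values, not taken from the dataset.
demo = pd.DataFrame({"x1": [30.0, 45.0, 60.0, 75.0], "x2": [18.5, 22.0, 27.5, 31.0]})

both = standardize(min_max_normalize(demo))
only_z = standardize(demo)
same = np.allclose(both, only_z)
print("normalize then standardize equals standardize alone:", same)
```

Under that assumption, the z-score step undoes any earlier linear rescaling of a column, so applying both steps would give the same features as standardizing alone.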

### Splitting and balancing

In [12]:
sampler = RandomOverSampler(sampling_strategy=1, random_state=42)
X_resampled, y_resampled = sampler.fit_resample(jibing,jibing_res)
Xtrain,Xtest,Ytrain,Ytest = train_test_split(X_resampled,y_resampled,test_size=0.3)
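
The cell above oversamples before splitting, so duplicated minority rows can end up in both the training and the test set, which tends to inflate test metrics. A minimal sketch of the alternative order (split first, then balance only the training part) follows. It uses made-up toy data and a small NumPy stand-in for the oversampler; with imblearn, calling `sampler.fit_resample` on the training split alone would play the same role.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Toy imbalanced data (made up): 90 negatives, 10 positives.
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# 1) Split first; stratify so both parts keep the original class ratio.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# 2) Randomly duplicate minority rows in the training part only.
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
keep = np.concatenate([majority, minority, extra])
X_tr_bal, y_tr_bal = X_tr[keep], y_tr[keep]

print("train class counts:", np.bincount(y_tr_bal))
print("test class counts: ", np.bincount(y_te))
```

The test set keeps its original imbalance, so the reported precision and recall reflect the class ratio the model would actually face.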

### Training

In [13]:
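# Create the AdaBoost classifier (all defaults; the commented lines show the default values)
clf= AdaBoostClassifier(
#     base_estimator=None, # use a decision tree as the base learner (default); renamed to `estimator` in scikit-learn 1.2
#     n_estimators=50, # use 50 weak learners
#     learning_rate=1.0, # learning rate of 1.0
#     algorithm='SAMME.R', # use the SAMME.R algorithm (deprecated in scikit-learn 1.4)
#     random_state=42 # set the random seed so results can be reproduced
)

# Train the model
clf.fit(Xtrain, Ytrain)

# Predict on the test set
y_pred = clf.predict(Xtest)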

### Metrics

In [14]:
metrics_ = res_metrics(Ytest,y_pred,"Adaboost")

#####################Adaboost#####################
+-------------------+--------------------+-------------------+
|     precision     |       recall       |         f1        |
+-------------------+--------------------+-------------------+
| 0.742152466367713 | 0.8112745098039216 | 0.775175644028103 |
+-------------------+--------------------+-------------------+
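
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The snippet recomputes it from the two reported values; `res_metrics` itself is the author's helper and is not reproduced here.

```python
# Values copied from the printed table above.
precision = 0.742152466367713
recall = 0.8112745098039216

# F1 = 2PR / (P + R), the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)
```

Precision being lower than recall means the model produced more false positives than false negatives on this test split.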


### Hyperparameter tuning
- Type of weak classifier
- Number of iterations
- Learning rate

In [None]:
param_grid = {'n_estimators': np.linspace(10,200,20,dtype=int),
              'learning_rate': np.linspace(0.1,1,10),
              'algorithm':["SAMME.R","SAMME"]}
clf = AdaBoostClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(Xtrain, Ytrain)

In [None]:
grid_search.best_params_
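
The `In [None]` prompts mean these two cells were not executed in this export. Before running them it helps to know how large the search is; the sketch below counts the candidates in the grid defined above using `ParameterGrid`.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the tuning cell above.
param_grid = {
    "n_estimators": np.linspace(10, 200, 20, dtype=int),
    "learning_rate": np.linspace(0.1, 1, 10),
    "algorithm": ["SAMME.R", "SAMME"],
}

n_candidates = len(ParameterGrid(param_grid))  # 20 * 10 * 2 combinations
n_fits = n_candidates * 5                      # cv=5 folds per candidate
print(n_candidates, n_fits)
```

With the default `refit=True`, `GridSearchCV` adds one final fit of the best candidate on the whole training set. Note also that `"SAMME.R"` was deprecated in scikit-learn 1.4, so on recent releases that grid entry may need to be dropped.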