### sklearn.ensemble.AdaBoostClassifier

* _class_ sklearn.ensemble.AdaBoostClassifier(_base_estimator=None_, _*_, _n_estimators=50_, _learning_rate=1.0_, _algorithm='SAMME.R'_, _random_state=None_)[[source]](https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/ensemble/_weight_boosting.py#L328)

Parameters:

**base_estimator** object, default=None

The base estimator from which the boosted ensemble is built. Support for sample weighting is required, as well as proper `classes_` and `n_classes_` attributes. If `None`, then the base estimator is [`DecisionTreeClassifier`](https://scikit-learn.org/1.1/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier "sklearn.tree.DecisionTreeClassifier") initialized with `max_depth=1`.

**n_estimators** int, default=50

The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early. Values must be in the range `[1, inf)`.

**learning_rate** float, default=1.0

Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the `learning_rate` and `n_estimators` parameters. Values must be in the range `(0.0, inf)`.

**algorithm** {'SAMME', 'SAMME.R'}, default='SAMME.R'

If 'SAMME.R' then use the SAMME.R real boosting algorithm. `base_estimator` must support calculation of class probabilities. If 'SAMME' then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations.

**random_state** int, RandomState instance or None, default=None

Controls the random seed given to each `base_estimator` at each boosting iteration. Thus, it is only used when `base_estimator` exposes a `random_state`. Pass an int for reproducible output across multiple function calls. See [Glossary](https://scikit-learn.org/1.1/glossary.html#term-random_state).
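To make the roles of `learning_rate` and `algorithm='SAMME'` more concrete, here is a minimal sketch of the discrete SAMME update for a single boosting round. The helper names `samme_estimator_weight` and `update_sample_weights` are hypothetical, and the formula follows the usual textbook statement of SAMME; scikit-learn's internal implementation may differ in detail.

```python
import math


def samme_estimator_weight(error, n_classes, learning_rate=1.0):
    """Weight of one weak learner under discrete SAMME (illustrative helper).

    error is the weighted misclassification rate of the weak learner.
    The learner must beat random guessing, i.e. error < 1 - 1/n_classes.
    """
    if not 0.0 < error < 1.0 - 1.0 / n_classes:
        raise ValueError("error must lie in (0, 1 - 1/n_classes)")
    return learning_rate * (
        math.log((1.0 - error) / error) + math.log(n_classes - 1.0)
    )


def update_sample_weights(weights, misclassified, alpha):
    """Up-weight misclassified samples by exp(alpha), then renormalize."""
    boosted = [
        w * math.exp(alpha) if miss else w
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(boosted)
    return [w / total for w in boosted]


# Binary problem, weak learner with 25% weighted error.
alpha_full = samme_estimator_weight(0.25, n_classes=2)
alpha_half = samme_estimator_weight(0.25, n_classes=2, learning_rate=0.5)

# Four equally weighted samples, only the first one misclassified.
new_weights = update_sample_weights([0.25] * 4, [True, False, False, False], alpha_full)
print(alpha_full, alpha_half, new_weights)
```

Halving `learning_rate` halves each learner's weight, which is why a smaller learning rate usually needs more estimators to reach the same fit.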

Attributes:

**base_estimator_** estimator

The base estimator from which the ensemble is grown.

**estimators_** list of classifiers

The collection of fitted sub-estimators.

**classes_** ndarray of shape (n_classes,)

The class labels.

**n_classes_** int

The number of classes.

**estimator_weights_** ndarray of floats

Weights for each estimator in the boosted ensemble.

**estimator_errors_** ndarray of floats

Classification error for each estimator in the boosted ensemble.

[`feature_importances_`](https://scikit-learn.org/1.1/modules/generated/sklearn.ensemble.AdaBoostClassifier.html?highlight=ada+boost#sklearn.ensemble.AdaBoostClassifier.feature_importances_ "sklearn.ensemble.AdaBoostClassifier.feature_importances_") ndarray of shape (n_features,)

The impurity-based feature importances.

**n_features_in_** int

Number of features seen during [fit](https://scikit-learn.org/1.1/glossary.html#term-fit).

New in version 0.24.

**feature_names_in_** ndarray of shape (`n_features_in_`,)

Names of features seen during [fit](https://scikit-learn.org/1.1/glossary.html#term-fit). Defined only when `X` has feature names that are all strings.

New in version 1.0.
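As a quick way to see these fitted attributes, the sketch below trains a small ensemble on the iris data bundled with scikit-learn and prints a few of them. The dataset and `n_estimators=10` are illustrative choices, not part of the original notebook. `base_estimator_` is left out on purpose, since that attribute is tied to this 1.1-era API.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

# Small bundled dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

clf = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

print("classes_:", clf.classes_, "n_classes_:", clf.n_classes_)
print("fitted sub-estimators:", len(clf.estimators_))
print("estimator_weights_ shape:", clf.estimator_weights_.shape)
print("estimator_errors_ shape:", clf.estimator_errors_.shape)
print("feature_importances_ shape:", clf.feature_importances_.shape)
print("n_features_in_:", clf.n_features_in_)
```

Because `X` is a plain NumPy array here, `feature_names_in_` is not defined; it only appears when fitting on data with string column names, such as a pandas DataFrame.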

In [2]:
import pandas as pd


# features.txt is whitespace-delimited: a column index followed by a feature name.
feature_name_df = pd.read_csv("./datasets/human_activity/features.txt", sep=r"\s+", header=None,
                              names=["column_index", "column_name"])

In [3]:
# Count how many rows share each feature name, then report how many names occur more than once.
feature_dup_df = feature_name_df.groupby("column_name").count()
print(feature_dup_df[feature_dup_df["column_index"]>1].count())
feature_dup_df[feature_dup_df["column_index"]>1].head()


def get_new_feature_name_df(old_feature_name_df):
    # Number the occurrences of each column_name with cumcount(): the first occurrence gets 0,
    # the second gets 1, and so on (e.g. col1, col1 -> 0, 1).
    feature_dup_df = pd.DataFrame(data=old_feature_name_df.groupby('column_name').cumcount(), columns=['dup_cnt'])
    # Turn the row index into an 'index' column with reset_index() so it can serve as the join key.
    feature_dup_df = feature_dup_df.reset_index()
    # Join the feature-name DataFrame passed in with feature_dup_df on that 'index' column.
    new_feature_name_df = pd.merge(old_feature_name_df.reset_index(), feature_dup_df, how='outer')
    # Build the new column name by appending the duplicate count as a suffix (only when it is > 0).
    new_feature_name_df['column_name'] = new_feature_name_df[['column_name', 'dup_cnt']].apply(
        lambda x: x['column_name'] + '_' + str(x['dup_cnt']) if x['dup_cnt'] > 0 else x['column_name'], axis=1)
    new_feature_name_df = new_feature_name_df.drop(['index'], axis=1)
    return new_feature_name_df

def get_human_dataset():

    # The data files are whitespace-delimited, so pass a whitespace regex as sep to read_csv.
    feature_name_df = pd.read_csv('./datasets/human_activity/features.txt', sep=r'\s+',
                        header=None, names=['column_index', 'column_name'])

    # Build a feature-name DataFrame with duplicates renamed, using get_new_feature_name_df().
    new_feature_name_df = get_new_feature_name_df(feature_name_df)

    # Convert the feature names back to a list so they can be used as DataFrame column names.
    feature_name = new_feature_name_df.iloc[:, 1].values.tolist()

    # Load the training and test feature sets as DataFrames, using feature_name for the columns.
    X_train = pd.read_csv('./datasets/human_activity/train/X_train.txt', sep=r'\s+', names=feature_name)
    X_test = pd.read_csv('./datasets/human_activity/test/X_test.txt', sep=r'\s+', names=feature_name)

    # Load the training and test labels as DataFrames with a single column named 'action'.
    y_train = pd.read_csv('./datasets/human_activity/train/y_train.txt', sep=r'\s+', header=None, names=['action'])
    y_test = pd.read_csv('./datasets/human_activity/test/y_test.txt', sep=r'\s+', header=None, names=['action'])

    # Return all of the loaded training/test DataFrames.
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = get_human_dataset()

column_index    42
dtype: int64
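The renaming idea behind `get_new_feature_name_df` can be checked on a toy frame without the dataset files. This is a minimal sketch with made-up column names (`a`, `b`); it uses the same `groupby(...).cumcount()` numbering but builds the names with a list comprehension instead of a merge.

```python
import pandas as pd

# Toy stand-in for features.txt: the name "a" appears three times.
toy = pd.DataFrame({
    "column_index": [1, 2, 3, 4],
    "column_name": ["a", "b", "a", "a"],
})

# cumcount() numbers repeated names 0, 1, 2, ... in order of appearance.
dup_cnt = toy.groupby("column_name").cumcount()

# Keep the first occurrence as-is and suffix later ones with their count.
unique_names = [
    name if cnt == 0 else f"{name}_{cnt}"
    for name, cnt in zip(toy["column_name"], dup_cnt)
]
print(unique_names)  # ['a', 'b', 'a_1', 'a_2']
```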


In [6]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

ada_clf = AdaBoostClassifier(n_estimators=100, random_state=0)
# Pass the labels as a 1-D array; fitting on the one-column DataFrame raises a DataConversionWarning.
ada_clf.fit(X_train, y_train.values.ravel())
ada_pred = ada_clf.predict(X_test)
ada_accuracy = accuracy_score(y_test, ada_pred)
ada_accuracy


0.5310485239226331
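One way to explore the trade-off between `n_estimators` and `learning_rate` noted in the parameter list is to track accuracy after each boosting round with `staged_score`. The sketch below uses synthetic data from `make_classification` rather than the human activity dataset, so its numbers say nothing about the result above; the sample sizes and hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = AdaBoostClassifier(n_estimators=30, learning_rate=0.5, random_state=0)
clf.fit(X_tr, y_tr)

# staged_score yields the test accuracy after each boosting round.
staged_acc = list(clf.staged_score(X_te, y_te))

# 1-based index of the round with the highest test accuracy.
best_round = max(range(len(staged_acc)), key=staged_acc.__getitem__) + 1
print(f"rounds fitted: {len(staged_acc)}, best round: {best_round}, "
      f"accuracy there: {staged_acc[best_round - 1]:.3f}")
```

If the best round is well before the last one, fewer estimators may be enough; if accuracy is still rising at the end, more estimators or a larger learning rate are worth trying.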