# AdaBoost with cost-sensitive weight initialization

AdaBoost is an ensemble of *weak* classifiers trained in sequence. Each weak classifier is trained on a reweighted version of the original dataset, in which samples misclassified by the preceding classifiers receive larger weights than the rest. For the initial classifier, all samples are weighted equally.
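The reweighting described above can be sketched in a few lines. This is a minimal illustration of the textbook discrete AdaBoost update on synthetic data, not necessarily the exact variant scikit-learn implements; the function name `adaboost_weight_trace` and the data are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_weight_trace(X, y, n_rounds=5):
    """Run a few rounds of discrete AdaBoost and return the sample weights after each round."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # initial classifier: all samples weighted equally
    history = [w.copy()]
    for _ in range(n_rounds):
        # Weak learner: a depth-1 decision tree (stump) fit on the weighted data
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.sum(w[miss])  # weighted training error of this stump
        if err <= 0 or err >= 0.5:
            break  # perfect or no better than chance: stop boosting
        alpha = 0.5 * np.log((1 - err) / err)
        # Increase the weights of misclassified samples, decrease the others
        w = w * np.exp(np.where(miss, alpha, -alpha))
        w = w / w.sum()  # renormalize so the weights sum to 1
        history.append(w.copy())
    return history


X, y = make_classification(n_samples=200, random_state=0)
history = adaboost_weight_trace(X, y, n_rounds=3)
```

Each entry of `history` is a weight vector summing to one; comparing consecutive entries shows which samples the next weak classifier is pushed to focus on.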

An easy way to make the ensemble cost-sensitive is to use the **misclassification costs as initial sample weights**.

In [None]:
from lib.creditcard_fraud_dataset import get_train_test_dfs

df_train, df_test = get_train_test_dfs()

## Train

In [None]:
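from lib.cs_train import train_clf
from sklearn.ensemble import AdaBoostClassifier

# Cost-sensitive variant: per-sample misclassification costs as initial sample weights
clf_ada_weighted = train_clf(
    df_train,
    Classifier=AdaBoostClassifier,
    sample_weight=df_train['C_misclf'],
)

# Baseline: same classifier, no sample weights passed
clf_ada_unweighted = train_clf(
    df_train,
    Classifier=AdaBoostClassifier,
)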

## Evaluate

In [None]:
from lib.cs_eval import evaluate_clf

# Evaluate both ensembles on the same held-out test set
eval_metrics_weighted = evaluate_clf(clf_ada_weighted, df_test)
eval_metrics_unweighted = evaluate_clf(clf_ada_unweighted, df_test)

In [None]:
import pandas as pd

df = pd.DataFrame([
    {
        'method': 'AdaBoost, CS Weighted',
        **eval_metrics_weighted
    },
    {
        'method': 'AdaBoost baseline',
        **eval_metrics_unweighted
    }
])

In [None]:
df

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.barplot(data=df, x='method', y='cost_recall')
plt.title('Cost Recall')

In [None]:
sns.barplot(data=df, x='method', y='cost_precision')
plt.title('Cost Precision')