<a href="https://www.kaggle.com/code/minhazengg/ciol-task-3?scriptVersionId=215964752" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

### In the previous task, the best model was a Stacking Classifier (with RandomForest and Gradient Boosting as base estimators). Now we tune its hyperparameters to try to improve performance.

#### Its previous performance was:
 - Accuracy: 0.9990
 - Precision: 1.0000
 - Recall: 0.9672
 - F1-Score: 0.9833
 - AUROC: 0.9832
 - AUPRC: 0.9694

#### Importing the required libraries

In [1]:
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, average_precision_score, make_scorer,
)
import pandas as pd

#### Loading the scaled dataset saved in the previous task

In [2]:
df = pd.read_csv("/kaggle/input/processed-dataset/processed_dataset.csv")
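A column named `Unnamed: 0` usually appears when a CSV was written with its row index and then read back without declaring that index. The sketch below reproduces this on a toy frame (the frame and its column names are made up for illustration, not taken from the processed dataset) and shows that `index_col=0` avoids the extra column. Whether that applies to `processed_dataset.csv` depends on how the file was saved, which is an assumption here.

```python
import io

import pandas as pd

# Toy frame standing in for a processed dataset (hypothetical columns).
original = pd.DataFrame({"feature": [0.1, 0.2], "Target": [0, 1]})

# to_csv() writes the row index as an unnamed first column by default.
csv_text = original.to_csv()

# Reading it back naively turns that index into a regular column.
naive = pd.read_csv(io.StringIO(csv_text))

# Declaring the first column as the index keeps it out of the features.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(naive.columns))
print(list(clean.columns))
```

If the index were kept as a column, `df.drop(columns=["Target"])` would pass it to the model as a feature.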

In [3]:
df

| Unnamed: 0 | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Type_encoded | Failure_Type_encoded | Temperature_diff | Torque_per_speed | Temperature_avg | log_Temperature_diff | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.952389 | -0.947360 | 0.068185 | 0.282200 | -1.695984 | 1.333889 | -0.102889 | 0.498849 | 0.079191 | -0.980410 | 0.528532 | 0 |
| 1 | -0.902393 | -0.879959 | -0.729472 | 0.633308 | -1.648852 | -0.332223 | -0.102889 | 0.498849 | 0.677574 | -0.921180 | 0.528532 | 0 |
| 2 | -0.952389 | -1.014761 | -0.227450 | 0.944290 | -1.617430 | -0.332223 | -0.102889 | 0.398954 | 0.688185 | -1.010026 | 0.433531 | 0 |
| 3 | -0.902393 | -0.947360 | -0.590021 | -0.048845 | -1.586009 | -0.332223 | -0.102889 | 0.398954 | 0.075735 | -0.950795 | 0.433531 | 0 |
| 4 | -0.902393 | -0.879959 | -0.729472 | 0.001313 | -1.554588 | -0.332223 | -0.102889 | 0.498849 | 0.171294 | -0.921180 | 0.528532 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | -0.602417 | -1.082162 | 0.363820 | -1.052012 | -1.476034 | 1.333889 | -0.102889 | -0.400212 | -0.962189 | -0.832334 | -0.357912 | 0 |
| 9996 | -0.552421 | -1.082162 | 0.520005 | -0.821283 | -1.428902 | -1.998335 | -0.102889 | -0.500108 | -0.838429 | -0.802719 | -0.461017 | 0 |
| 9997 | -0.502425 | -0.947360 | 0.592519 | -0.660777 | -1.350349 | 1.333889 | -0.102889 | -0.400212 | -0.745799 | -0.713873 | -0.357912 | 0 |
| 9998 | -0.502425 | -0.879959 | -0.729472 | 0.854005 | -1.303217 | -1.998335 | -0.102889 | -0.300317 | 0.854370 | -0.684258 | -0.255775 | 0 |


#### Splitting

In [4]:
X = df.drop(columns=["Target"])
y = df["Target"]

#### Stratified k-fold for consistent splits

In [5]:
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
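To illustrate why stratification matters on an imbalanced target, here is a minimal sketch on toy labels (the 5% positive rate and the array names are assumptions, not the notebook's data). It counts the positives that land in each held-out fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: 50 positives, 950 negatives (assumed for illustration).
y_toy = np.array([1] * 50 + [0] * 950)
X_toy = np.zeros((len(y_toy), 1))  # feature values do not affect the split

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Number of positive samples in each held-out fold.
positives_per_fold = [
    int(y_toy[test_idx].sum()) for _, test_idx in skf_demo.split(X_toy, y_toy)
]
fold_sizes = [len(test_idx) for _, test_idx in skf_demo.split(X_toy, y_toy)]

print(positives_per_fold)
print(fold_sizes)
```

Each fold keeps the same share of positives as the full label vector, so per-fold recall and precision are computed on comparable class mixes.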

#### Now we define the base models

In [6]:
base_estimators = [
    ('rf', RandomForestClassifier(random_state=42)),
    ('gb', GradientBoostingClassifier(random_state=42))
]

#### Let's initialize the stacking classifier

In [7]:
stacking_model = StackingClassifier(
    estimators=base_estimators,
    final_estimator=RandomForestClassifier(random_state=42),
    n_jobs=-1
)

#### Hyperparameter grid (final estimator only)

In [8]:
param_grid = {
    'final_estimator__n_estimators': [50, 100, 200],
    'final_estimator__max_depth': [3, 5, 7],
    'final_estimator__min_samples_split': [2, 5],
    'final_estimator__min_samples_leaf': [2, 4],
    'final_estimator__criterion': ['gini', 'entropy']
}
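As a quick sanity check on search cost, the size of this grid can be counted before fitting anything. The sketch below copies the grid into a separate variable (`param_grid_demo`, a name introduced here) and uses `ParameterGrid` to enumerate the combinations. The fold count of 5 matches the `StratifiedKFold` defined earlier.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as above, copied so the snippet runs on its own.
param_grid_demo = {
    'final_estimator__n_estimators': [50, 100, 200],
    'final_estimator__max_depth': [3, 5, 7],
    'final_estimator__min_samples_split': [2, 5],
    'final_estimator__min_samples_leaf': [2, 4],
    'final_estimator__criterion': ['gini', 'entropy'],
}

n_candidates = len(ParameterGrid(param_grid_demo))  # 3 * 3 * 2 * 2 * 2
n_folds = 5
total_fits = n_candidates * n_folds

print(n_candidates, total_fits)
```

Each of those fits trains the whole stack, base estimators included, because only the final estimator's parameters vary while the stacking model is refit per candidate and fold.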

#### Defining the metrics we need

In [9]:
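# average='weighted' weights each class's score by its support, so on an
# imbalanced target these values are dominated by the majority class
# (weighted recall is always equal to accuracy).
scorers = {
    'accuracy': make_scorer(accuracy_score),
    'precision': make_scorer(precision_score, average='weighted'),
    'recall': make_scorer(recall_score, average='weighted'),
    'f1_score': make_scorer(f1_score, average='weighted'),
    # multi_class='ovo' has no effect while Target is binary.
    # needs_proba is deprecated in newer scikit-learn releases, which use
    # response_method='predict_proba' instead.
    'roc_auc': make_scorer(roc_auc_score, needs_proba=True, multi_class='ovo'),
    'average_precision': make_scorer(average_precision_score, needs_proba=True)
}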
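The choice of `average` changes what precision, recall, and F1 report. The sketch below uses made-up predictions (10 positives out of 100, with 8 of them detected and no false alarms) to compare weighted averaging with the default positive-class scores.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Toy labels and predictions (assumed for illustration).
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.array([1] * 8 + [0] * 2 + [0] * 90)

acc = accuracy_score(y_true, y_pred)

# Weighted: per-class recall averaged by class support.
rec_weighted = recall_score(y_true, y_pred, average='weighted')
f1_weighted = f1_score(y_true, y_pred, average='weighted')

# Default (average='binary'): scores for the positive class only.
rec_binary = recall_score(y_true, y_pred)
f1_binary = f1_score(y_true, y_pred)

print(f"accuracy={acc:.4f}")
print(f"recall  weighted={rec_weighted:.4f}  binary={rec_binary:.4f}")
print(f"f1      weighted={f1_weighted:.4f}  binary={f1_binary:.4f}")
```

Weighted recall matches accuracy here, while the positive-class recall exposes the two missed failures. This is worth keeping in mind when comparing scores produced with different averaging.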

#### Let's initialize the GridSearchCV object

In [10]:
grid_search = GridSearchCV(
    estimator=stacking_model,
    param_grid=param_grid,
    scoring=scorers,
    refit='f1_score',
    cv=skf,
    n_jobs=-1,
    verbose=4
)

#### Running the grid search

In [11]:
import time


start_time = time.time()
grid_search.fit(X, y)
end_time = time.time()
training_time = end_time - start_time

Fitting 5 folds for each of 72 candidates, totalling 360 fits


In [12]:
print(f"\nBest Parameters: {grid_search.best_params_}")
print(f"Best F1-Score: {grid_search.best_score_:.4f}")
print(f"Total Training Time: {training_time:.2f} seconds")


Best Parameters: {'final_estimator__criterion': 'entropy', 'final_estimator__max_depth': 5, 'final_estimator__min_samples_leaf': 4, 'final_estimator__min_samples_split': 2, 'final_estimator__n_estimators': 100}
Best F1-Score: 0.9990
Total Training Time: 1684.92 seconds
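With several scorers, `best_score_` only reports the `refit` metric. The other metrics for every candidate are stored in `cv_results_` under keys such as `mean_test_<name>` and `rank_test_<name>`. The following sketch shows how to read them, using a small synthetic dataset and a decision tree as stand-ins (both are assumptions chosen to keep the run short, not the notebook's model or data).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Small imbalanced synthetic dataset (stand-in for the real one).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, weights=[0.9, 0.1], random_state=0
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4], "min_samples_leaf": [1, 5]},
    scoring={"accuracy": "accuracy", "f1_score": "f1_weighted"},
    refit="f1_score",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_demo, y_demo)

# One row per candidate; columns follow the scorer names given above.
results = pd.DataFrame(search.cv_results_)
cols = [
    "params",
    "mean_test_f1_score",
    "std_test_f1_score",
    "rank_test_f1_score",
    "mean_test_accuracy",
]
print(results[cols].sort_values("rank_test_f1_score"))
```

Looking at the standard deviation next to the mean helps judge whether the top-ranked candidate is meaningfully better than the runners-up.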


#### Getting the best model

In [13]:
best_model = grid_search.best_estimator_

#### Let's cross-validate the best model

In [14]:
cv_results = cross_validate(
    best_model, X, y, cv=skf, 
    scoring=scorers, 
    return_train_score=False, 
    n_jobs=-1
)

In [15]:
print("\nEvaluation Metrics Across Folds:")
print(f"Accuracy: {cv_results['test_accuracy'].mean():.4f}")
print(f"Precision: {cv_results['test_precision'].mean():.4f}")
print(f"Recall: {cv_results['test_recall'].mean():.4f}")
print(f"F1-Score: {cv_results['test_f1_score'].mean():.4f}")
print(f"AUROC: {cv_results['test_roc_auc'].mean():.4f}")
print(f"AUPRC: {cv_results['test_average_precision'].mean():.4f}")


Evaluation Metrics Across Folds:
Accuracy: 0.9990
Precision: 0.9990
Recall: 0.9990
F1-Score: 0.9990
AUROC: 0.9890
AUPRC: 0.9758
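For a like-for-like comparison with positive-class metrics, one option is to cross-validate with scikit-learn's built-in scorer names, which score the positive class for `precision`, `recall`, and `f1`, and to report the spread across folds as well as the mean. The sketch below does this on synthetic data with a logistic regression (both are stand-ins chosen for speed, not the tuned stacking model).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced data (assumed for illustration).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Built-in scorer names; precision/recall/f1 refer to the positive class.
binary_scorers = ["accuracy", "precision", "recall", "f1", "roc_auc", "average_precision"]

cv_demo = cross_validate(
    LogisticRegression(max_iter=1000),
    X_demo,
    y_demo,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring=binary_scorers,
)

for name in binary_scorers:
    scores = cv_demo[f"test_{name}"]
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Using the string names also avoids passing `needs_proba` to `make_scorer`, which newer scikit-learn releases deprecate.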


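## The tuned model shows a small gain in AUROC and AUPRC

Accuracy is unchanged at 0.9990, AUROC rises from 0.9832 to 0.9890, and AUPRC from 0.9694 to 0.9758. The precision, recall, and F1 values above use `average='weighted'`, so they are not directly comparable with the earlier figures, which look like positive-class scores (0.9833 is the harmonic mean of 1.0000 and 0.9672). Weighted recall is always equal to accuracy, which explains the identical 0.9990.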
#### Its previous performance was:
 - Accuracy: 0.9990
 - Precision: 1.0000
 - Recall: 0.9672
 - F1-Score: 0.9833
 - AUROC: 0.9832
 - AUPRC: 0.9694