**ENSEMBLE MODEL: STACKING**

**DATA LOADING AND PREPARATION**

In [2]:
import pandas as pd

In [3]:
# Read the CSV (comma-separated, so the default parser settings suffice)
tfidf_df = pd.read_csv("tfidf_sncb.csv")

# Treat incident types as categorical labels rather than numbers
tfidf_df['incident_type'] = tfidf_df['incident_type'].astype('string')

tfidf_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1011 entries, 0 to 1010
Columns: 828 entries, incident_id to 998
dtypes: float64(826), int64(1), string(1)
memory usage: 6.4 MB


In [4]:
from sklearn.model_selection import train_test_split

# Features: the TF-IDF values computed from each incident's event sequence
X = tfidf_df.drop(['incident_type', 'incident_id'], axis=1)

# Target: the incident type labels
y = tfidf_df['incident_type']

# Fixed random_state, reused throughout the pipeline for reproducibility
r_state = 123

# Split the data into a training+validation set (80%) and a test set (20%)
train_val_X, test_X, train_val_y, test_y = train_test_split(X, 
                                                            y, 
                                                            train_size = 0.8, 
                                                            random_state = r_state, # fixed seed for reproducibility
                                                            stratify = y) # keep the class proportions of the imbalanced labels in both sets

print(f"The train_val_X pandas df has {len(train_val_X)} rows and {len(train_val_X.columns)} columns.")
print(f"The test_y pandas series has {len(test_y)} rows and 1 column.")

The train_val_X pandas df has 808 rows and 826 columns.
The test_y pandas series has 203 rows and 1 column.
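Passing `stratify=y` makes `train_test_split` keep each class's share of the data in both splits, which matters when some incident types are rare. A small check on hypothetical labels (not the SNCB data):

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 of class "a", 15 of "b", 5 of "c"
y_demo = np.array(["a"] * 80 + ["b"] * 15 + ["c"] * 5)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, random_state=123, stratify=y_demo
)

# Each class keeps its 80/15/5 proportion in both splits
print(sorted(Counter(y_tr.tolist()).items()))  # [('a', 64), ('b', 12), ('c', 4)]
print(sorted(Counter(y_te.tolist()).items()))  # [('a', 16), ('b', 3), ('c', 1)]
```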


In [5]:
from collections import Counter

# Size of the smallest incident type class in the training+validation set
value_counts = Counter(train_val_y)
min_class_setsize = min(value_counts.values())

print(f"In RepeatedStratifiedKFold(), n_splits should be at most {min_class_setsize}, the size of the smallest class, because of the class imbalance in the labels.")

In RepeatedStratifiedKFold(), n_splits should be at most 3, the size of the smallest class, because of the class imbalance in the labels.
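Stratified k-fold spreads every class across the validation folds, so a class with only 3 members can appear in at most 3 distinct validation folds. If `n_splits` is larger, scikit-learn only emits a `UserWarning` and some folds simply lack that class. The sketch below uses hypothetical labels whose smallest class has 3 members and checks that choosing `n_splits` equal to that size leaves every class present in every validation fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: the smallest class ("c") has exactly 3 members
y_demo = np.array(["a"] * 30 + ["b"] * 12 + ["c"] * 3)
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant to the split

# Same rule as in the notebook: n_splits = size of the smallest class
_, class_counts = np.unique(y_demo, return_counts=True)
n_splits = int(class_counts.min())

skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=123)
fold_classes = []
for train_idx, val_idx in skf.split(X_demo, y_demo):
    # Classes present in this validation fold, plus the fold size
    classes_in_fold = sorted(set(y_demo[val_idx].tolist()))
    fold_classes.append(classes_in_fold)
    print(classes_in_fold, len(val_idx))
```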


**MODEL TRAINING AND VALIDATION**
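The next cell combines three base models with a `StackingClassifier`. Its final `LogisticRegression` is not trained on predictions the base models make for rows they were fit on: scikit-learn obtains out-of-fold predictions through an internal `cross_val_predict` call (5-fold stratified by default for classifiers, stacking class probabilities when the base models provide them) and then refits each base model on the whole training data. The manual sketch below illustrates that idea on synthetic `make_classification` data, with a random forest and a shallow decision tree standing in for the notebook's three base models to keep it fast. Because `cross_val_predict` needs each sample in exactly one validation fold, a repeated splitter such as `RepeatedStratifiedKFold` cannot be passed as the stacking `cv` argument; scikit-learn rejects it with `ValueError: cross_val_predict only works for partitions`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic 3-class data standing in for the TF-IDF features
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=123)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=123),
    DecisionTreeClassifier(max_depth=5, random_state=123),
]
# The inner CV must be a partition: every sample validated exactly once
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

# Level-1 features: out-of-fold class probabilities from each base model
meta_features = np.hstack([
    cross_val_predict(model, X, y, cv=inner_cv, method="predict_proba")
    for model in base_models
])

# Base models are refit on all rows; the meta-model learns from out-of-fold probabilities
fitted_bases = [model.fit(X, y) for model in base_models]
meta_model = LogisticRegression(max_iter=1000).fit(meta_features, y)

def stacked_predict(X_new):
    """Feed the refit base models' probabilities to the meta-model."""
    new_meta = np.hstack([m.predict_proba(X_new) for m in fitted_bases])
    return meta_model.predict(new_meta)

print(meta_features.shape)  # (300, 6): 2 models x 3 classes
print(stacked_predict(X[:5]))
```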

In [6]:
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from datetime import datetime

# Define array of base models
base_models = [
    ('Random Forest', RandomForestClassifier(random_state = r_state)),
    ('Neural Networks', MLPClassifier(random_state = r_state)),
    ('Gradient Boosting', GradientBoostingClassifier(random_state = r_state))
]

# Define Stacking
model_stacking = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())

# Set up cross-validation
rskf = RepeatedStratifiedKFold(n_splits = min_class_setsize, 
                               n_repeats = 34, 
                               random_state = r_state)

# Start timing
start_time = datetime.now()

# Initialize lists to store metrics
accuracy_scores = []
weighted_f1_scores = []
micro_f1_scores = []
macro_f1_scores = []
fold = 1

# Cross-validation loop
for train_idx, val_idx in rskf.split(train_val_X, train_val_y):

    # Uncomment the print statement for debugging only
    #print(f"Starting model training and validation on fold {fold}")
    
    # Split data
    fold_train_X, fold_val_X = train_val_X.iloc[train_idx], train_val_X.iloc[val_idx]
    fold_train_y, fold_val_y = train_val_y.iloc[train_idx], train_val_y.iloc[val_idx]

    # Train the ensemble model
    model_stacking.fit(fold_train_X, fold_train_y)
    
    # Predict on the validation fold
    fold_pred_y = model_stacking.predict(fold_val_X)
    
    # Compute metrics
    accuracy_scores.append(accuracy_score(fold_val_y, fold_pred_y))
    weighted_f1_scores.append(f1_score(fold_val_y, fold_pred_y, average='weighted'))
    micro_f1_scores.append(f1_score(fold_val_y, fold_pred_y, average='micro'))
    macro_f1_scores.append(f1_score(fold_val_y, fold_pred_y, average='macro'))
    
    fold += 1

# End timing
end_time = datetime.now()
elapsed_time = end_time - start_time

# Aggregate results
print(f"The training and validation of model_stacking took {elapsed_time.total_seconds():.2f} seconds across {rskf.get_n_splits()} iterations.\n")

print(f"Mean Accuracy:          {np.mean(accuracy_scores):.8f} ± {np.std(accuracy_scores):.8f}")
print(f"Mean Weighted F1-Score: {np.mean(weighted_f1_scores):.8f} ± {np.std(weighted_f1_scores):.8f}")
print(f"Mean Micro F1-Score:    {np.mean(micro_f1_scores):.8f} ± {np.std(micro_f1_scores):.8f}")
print(f"Mean Macro F1-Score:    {np.mean(macro_f1_scores):.8f} ± {np.std(macro_f1_scores):.8f}\n")

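# Reports the sizes of the last fold only; fold sizes can differ by one entry because 808 is not divisible by 3
print(f"Each fold had {len(fold_train_X)} entries for training and {len(fold_val_X)} for validation.")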

**The training and validation of model_stacking took 8862.37 seconds across 102 iterations.**

Mean Accuracy:          0.68379134 ± 0.02067524

Mean Weighted F1-Score: 0.66741732 ± 0.01987092

Mean Micro F1-Score:    0.68379134 ± 0.02067524

Mean Macro F1-Score:    0.35068933 ± 0.01554369

Each fold had 539 entries for training and 269 for validation.
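The mean accuracy and the mean micro F1-score above are identical, which is expected: in single-label multiclass classification, every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro-averaged precision, recall and F1 all equal accuracy. A quick check on hypothetical labels:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical single-label multiclass predictions (5 of 8 correct)
y_true = ["2", "13", "13", "9", "99", "2", "13", "4"]
y_pred = ["2", "13", "9", "9", "13", "2", "13", "2"]

acc = accuracy_score(y_true, y_pred)
micro = f1_score(y_true, y_pred, average="micro")
print(acc, micro)  # 0.625 0.625
```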

**TEST SET RESULTS**

In [13]:
from sklearn.metrics import classification_report, confusion_matrix

# Retrain the ensemble model on the full training+validation set
model_stacking.fit(train_val_X, train_val_y)

# Predict on the test set
test_pred_y = model_stacking.predict(test_X)

# Compute and display metrics
print(f"The model classified correctly {sum(test_y == test_pred_y)} entries from a total of {len(test_X)}.\n")

print(f"Accuracy on test set:          {accuracy_score(test_y, test_pred_y)}")
print(f"Weighted F1-Score on test set: {f1_score(test_y, test_pred_y, average='weighted')}\n")

print("F1-Score per class\n")

# Generate classification report
report = classification_report(test_y, test_pred_y, output_dict=True, zero_division=0)

# Display F1-score per class
for class_label, metrics in report.items():
    if isinstance(metrics, dict) and 'f1-score' in metrics:
        print(f"Class {class_label}: F1-Score = {metrics['f1-score']:.6f}")

print("\nAccuracy per class\n")

# Display recall per class
for class_label, metrics in report.items():
    if isinstance(metrics, dict) and 'recall' in metrics:
        print(f"Class {class_label}: Recall = {metrics['recall']:.6f}") # Recall = share of the class's samples predicted correctly, i.e. per-class accuracy


The model classified correctly 146 entries from a total of 203.

Accuracy on test set:          0.7192118226600985
Weighted F1-Score on test set: 0.6991354711180391

F1-Score per class

Class 11: F1-Score = 0.000000
Class 13: F1-Score = 0.852941
Class 14: F1-Score = 0.703704
Class 16: F1-Score = 0.000000
Class 17: F1-Score = 0.000000
Class 2: F1-Score = 0.807692
Class 3: F1-Score = 0.000000
Class 4: F1-Score = 0.642857
Class 6: F1-Score = 0.000000
Class 7: F1-Score = 0.000000
Class 9: F1-Score = 0.708333
Class 99: F1-Score = 0.578947
Class macro avg: F1-Score = 0.357873
Class weighted avg: F1-Score = 0.699135

Accuracy per class

Class 11: Recall = 0.000000
Class 13: Recall = 0.906250
Class 14: Recall = 0.633333
Class 16: Recall = 0.000000
Class 17: Recall = 0.000000
Class 2: Recall = 0.875000
Class 3: Recall = 0.000000
Class 4: Recall = 0.562500
Class 6: Recall = 0.000000
Class 7: Recall = 0.000000
Class 9: Recall = 0.739130
Class 99: Recall = 0.628571
Class macro avg: Recall = 0.3620

**The model classified correctly 146 entries from a total of 203.**

Accuracy on test set:          0.7192118226600985

Weighted F1-Score on test set: 0.6990450091747766

**F1-Score per class**

Class 11: F1-Score = 0.000000

Class 13: F1-Score = 0.846715

Class 14: F1-Score = 0.703704

Class 16: F1-Score = 0.000000

Class 17: F1-Score = 0.000000

Class 2: F1-Score = 0.823529

Class 3: F1-Score = 0.000000

Class 4: F1-Score = 0.642857

Class 6: F1-Score = 0.000000

Class 7: F1-Score = 0.000000

Class 9: F1-Score = 0.708333

Class 99: F1-Score = 0.578947

Class macro avg: F1-Score = 0.358674

Class weighted avg: F1-Score = 0.699045
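The recall loop treats per-class recall as per-class accuracy. That holds in the sense of "the share of a class's samples that were classified correctly", which is the confusion-matrix diagonal divided by the row sums. It is not the one-vs-rest accuracy (TP + TN) / N, which also credits true negatives. A check on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical single-label multiclass predictions
y_true = ["2", "13", "13", "9", "99", "2", "13", "4"]
y_pred = ["2", "13", "9", "9", "13", "2", "13", "2"]
labels = ["13", "2", "4", "9", "99"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
# Row i counts samples whose true class is labels[i]; the diagonal holds correct predictions
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)

for label, acc, rec in zip(labels, per_class_accuracy, per_class_recall):
    print(f"Class {label}: diagonal/row sum = {acc:.3f}, recall = {rec:.3f}")
```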

**SAVE AND EXPORT RESULTS**

In [8]:
# Export is disabled by wrapping the code in a string literal; remove the triple quotes to write the CSV files
"""
# Create a DataFrame for each list of scores
accuracy_stacking_df = pd.DataFrame({'score': accuracy_scores})
weighted_f1_stacking_df = pd.DataFrame({'score': weighted_f1_scores})
micro_f1_stacking_df = pd.DataFrame({'score': micro_f1_scores})
macro_f1_stacking_df = pd.DataFrame({'score': macro_f1_scores})

# Export each DataFrame to a CSV file
accuracy_stacking_df.to_csv('accuracy_stacking.csv', index=False)
weighted_f1_stacking_df.to_csv('weighted_f1_stacking.csv', index=False)
micro_f1_stacking_df.to_csv('micro_f1_stacking.csv', index=False)
macro_f1_stacking_df.to_csv('macro_f1_stacking.csv', index=False)
"""
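As an alternative to four single-column files, the per-iteration scores can be kept in one table with a column per metric. The sketch uses placeholder score lists (in the notebook they would be `accuracy_scores`, `weighted_f1_scores`, `micro_f1_scores` and `macro_f1_scores`) and round-trips the CSV through memory instead of writing a file. Note that `DataFrame.std()` defaults to `ddof=1`, while `np.std()`, used for the ± values above, defaults to `ddof=0`, so the two spreads differ slightly.

```python
import io

import pandas as pd

# Hypothetical placeholder scores; in the notebook these are the lists filled in the CV loop
scores = {
    "accuracy": [0.68, 0.70, 0.66],
    "weighted_f1": [0.67, 0.68, 0.65],
    "micro_f1": [0.68, 0.70, 0.66],
    "macro_f1": [0.35, 0.36, 0.34],
}

# One row per CV iteration, one column per metric
scores_df = pd.DataFrame(scores)
scores_df.insert(0, "iteration", range(1, len(scores_df) + 1))

# to_csv without a path returns the CSV text; pass a filename such as
# "scores_stacking.csv" (hypothetical name) to write a file instead
csv_text = scores_df.to_csv(index=False)
roundtrip_df = pd.read_csv(io.StringIO(csv_text))
print(roundtrip_df.describe().loc[["mean", "std"]])
```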