# DATA622 -  Assignment 2
## Experimentation & Model Training

### Introduction
In machine learning, experimentation refers to the systematic process of designing, executing, and analyzing different configurations to identify the settings that perform best on a given task. Experimentation is learning by doing. It involves systematically changing parameters, evaluating results with metrics, and comparing different approaches to find the best solution; essentially, it is the practice of testing and refining machine learning models through controlled experiments to improve their performance.

The key is to modify only one or a few variables at a time to isolate the impact of each change and understand its effect on model performance. In the assignment you will conduct at least 6 experiments. In real life, data scientists run anywhere from a dozen to hundreds of experiments (depending on the dataset and problem domain). 

### Assignment
This assignment consists of conducting at least two (2) experiments for different algorithms: Decision Trees, Random Forest and Adaboost. That is, at least six (6) experiments in total (3 algorithms x 2 experiments each). For each experiment you will define what you are trying to achieve (before each run), conduct the experiment, and at the end you will review how your experiment went. These experiments will allow you to compare algorithms and choose the optimal model. 

Using the dataset and EDA from the previous assignment, perform the following: 
#### Algorithm Selection
You will perform experiments using the following algorithms:
- Decision Trees
- Random Forest
- Adaboost

#### Experiment 
For each of the algorithms (above), perform at least two (2) experiments. In a typical experiment you should:
- Define the objective of the experiment (hypothesis)
- Decide what will change, and what will stay the same
- Select the evaluation metric (what you want to measure)
- Perform the experiment
- Document the experiment so you compare results (track progress)
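
One way to follow these steps is to write each experiment down as a small record before running it. The sketch below is only an illustration: the `ExperimentRecord` class, its field names, and the example values are hypothetical placeholders, not something the assignment prescribes.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ExperimentRecord:
    """One entry in a hypothetical experiment log."""
    name: str                       # short identifier for the run
    hypothesis: str                 # objective, written before the run
    changed: dict                   # settings that differ from the previous run
    held_constant: dict             # settings deliberately kept the same
    metric: str                     # evaluation metric chosen up front
    result: Optional[float] = None  # filled in after the run
    notes: str = ""                 # short review of how the run went


experiment_log = []

record = ExperimentRecord(
    name="tree_depth_limit",
    hypothesis="Limiting tree depth reduces overfitting.",
    changed={"max_depth": 10},
    held_constant={"test_size": 0.2, "random_state": 42},
    metric="pr_auc",
)
experiment_log.append(record)

# After the run, store the measured value (placeholder number) and a note.
record.result = 0.40
record.notes = "Placeholder note: compare against the baseline run."

for entry in experiment_log:
    print(asdict(entry))
```

Keeping the hypothesis and the held-constant settings in the same record makes it easier to build the results table later.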
    
#### Variations
There are many things you can vary between experiments. Here are some examples:
- Data sampling  (feature selection)
- Data augmentation e.g., regularization, normalization, scaling
- Hyperparameter optimization (you decide, random search, grid search, etc.)
- Decision Tree breadth & depth (this is an example of a hyperparameter)
- Evaluation metrics e.g., Accuracy, precision, recall, F1-score, AUC-ROC
- Cross-validation strategy e.g., holdout, k-fold, leave-one-out
- Number of trees (for ensemble models)
- Train-test split: Using different data splits to assess model generalization ability
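
As a rough illustration of the grid search option listed above, the sketch below enumerates every combination of a few hyperparameter values using only the standard library. The grid values and the `expand_grid` helper are assumptions chosen for the example; each resulting dictionary could be passed to a classifier as keyword arguments.

```python
from itertools import product

# Hypothetical grid: the specific values are examples only.
param_grid = {
    "max_depth": [5, 10, 20],
    "min_samples_leaf": [1, 15],
    "class_weight": [None, "balanced"],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
print(len(candidates), "candidate configurations")
print(candidates[0])
```

A random search would instead sample a subset of `candidates`, which is cheaper when the grid is large.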

#### Deliverable
##### Essay (minimum 500 words)
- Format: PDF
- Write a short essay summarizing your findings. Your essay should include:
    - Explain why you chose the experiments you did
    - Discuss bias & variance across the experiments e.g., between Decision Tree experiments, and with Random Forest & Adaboost
    - A table with experiments & results
    - What was the optimal model you found, and why
    - What conclusion did you come to? What do you recommend?
- Code
    - This should include your code, as well as the outputs of your code e.g. correlation chart
    - Format: Code should be saved in https://rpubs.com or https://github.com. Please provide a link to your code repo in the submission.
    - Please do not submit your code via Google Colab (due to permissioning issues).

### Start of Assignment 2 Work

In [1]:
from ucimlrepo import fetch_ucirepo 
import pandas as pd
from pandas.plotting import scatter_matrix
import numpy as np
import json
# import seaborn as sns
# import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import classification_report
# from sklearn.compose import ColumnTransformer
# from sklearn.pipeline import Pipeline
# from sklearn.preprocessing import OneHotEncoder
# from sklearn.impute import SimpleImputer

from sklearn import metrics

In [2]:
### Pulling in the data from Assignment 1 
df = pd.read_csv("raw_df_dropped.csv")

In [3]:
df.columns # No duration because dropped in assignment one. (Data Leakage) 

## Encoding the y var for (1/0) | (yes/no)
df['y'] = df['y'].map({'no':0, 'yes':1})

## Parsing the 'y' from the x vars 
X = df.drop(columns='y')
y = df['y']

## Splitting the test and train sets for future experiments and modeling
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Confirming no nulls; no additional imputation needed because of Assignment 1 cleaning etc. 
# X_train.info()
# y_train.info()
# X_test.info()
# y_test.info()

In [4]:
# Getting lists of the x vars and their types For encoding pre-modeling 
num_cols = X_train.select_dtypes(include=['number']).columns.tolist()
print(num_cols)
cat_cols = [c for c in X_train.columns if c not in num_cols]
print(cat_cols)

['age', 'balance', 'day_of_week', 'campaign', 'previous']
['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'poutcome', 'previously_contacted']


In [5]:
X_train_encoded = pd.get_dummies(X_train, columns=cat_cols, drop_first=False)
X_test_encoded  = pd.get_dummies(X_test,  columns=cat_cols, drop_first=False)

In [6]:
print([i for i in X_train_encoded.columns if i not in X_test_encoded.columns])
print([i for i in X_test_encoded.columns if i not in X_train_encoded.columns])
## Columns are identical, no missing categories in each df so no need for alignment 

[]
[]
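
The check above came back empty, so no alignment was needed here. If a category ever appears in only one of the two splits, a common safeguard is to reindex the test columns to the training columns. The sketch below uses small made-up frames (the `train` and `test` data are hypothetical) to show the idea.

```python
import pandas as pd

# Toy data: "unknown" appears only in test, "student"/"retired" only in train.
train = pd.DataFrame({"age": [30, 45, 52], "job": ["admin", "student", "retired"]})
test = pd.DataFrame({"age": [41, 28], "job": ["admin", "unknown"]})

train_enc = pd.get_dummies(train, columns=["job"])
test_enc = pd.get_dummies(test, columns=["job"])

# Keep exactly the training columns, in the same order.
# Columns missing from test are filled with 0; unseen test columns are dropped.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(train_enc.columns))
print(list(test_aligned.columns))
```

Aligning to the training columns, rather than the other way around, keeps the model's expected feature set fixed.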


In [7]:
running_results = {}

### Beginning Experiments
- Keeping the training and test sets the same throughout, because I want to get a better sense of how each model works with the same data.
- The primary metrics I will be watching across these experiments are PR-AUC and the number of false negatives. False negatives are missed clients, and PR-AUC summarizes how well the model ranks the minority "yes" class.
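
To make the PR-AUC metric concrete, here is a minimal pure-Python sketch of average precision, the quantity that `average_precision_score` reports. It is a simplified version that assumes no tied scores, and the labels and scores are made up. A model with no skill scores roughly the share of positives in the data, which is why PR-AUC is read against that baseline.

```python
def average_precision(y_true, scores):
    """Mean of precision@k taken at each true positive, scanning
    predictions from highest to lowest score. Assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos


# Hypothetical labels and predicted probabilities for the "yes" class.
y_true = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1]

ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # no-skill reference level
print("average precision:", round(ap, 4))
print("prevalence baseline:", prevalence)
```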


### Decision Tree
#### Experiment 1 
**GOAL:** Use the decision tree classifier with its default inputs in order to establish a baseline decision tree model for the data.

In [8]:
## Default settings for the baseline, so the decision threshold is 0.5
dec_tree1 = DecisionTreeClassifier(random_state=42)
dec_tree1.fit(X_train_encoded, y_train)
prob1 = dec_tree1.predict_proba(X_test_encoded)[:, 1]
y_pred = dec_tree1.predict(X_test_encoded)

## Depth 
print(dec_tree1.tree_.max_depth)

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score")
print(metrics.accuracy_score(y_test, y_pred))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, prob1), 4))        
print("PR-AUC  :", round(metrics.average_precision_score(y_test, prob1), 4)) 

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

## Adding to the dict 
running_results.update(
    {"dec_tree1":{
        "prob":prob1,
        "y_pred":y_pred,       
    }
    })

43
Classification Report:
              precision    recall  f1-score   support

           0       0.91      0.89      0.90      7833
           1       0.29      0.33      0.31      1051

    accuracy                           0.82      8884
   macro avg       0.60      0.61      0.60      8884
weighted avg       0.84      0.82      0.83      8884

Accuracy Score
0.8246285457001351
ROC-AUC : 0.6106
PR-AUC  : 0.1746
TP: 347 FP: 854 FN: 704 TN: 6979


**Notes on Baseline Decision Tree Experiment 1:** 
 - The "no" predictions perform very well, while the "yes" predictions do not. This holds for precision, recall, and F1 score. This is most likely because of the imbalanced nature of the training and test data: "no" values are much more prominent than "yes" values.
 - When the DT model predicts a "yes" value it is only correct about 29% of the time, and it also misses a lot of the actual "yes" values (recall is 0.33). This is not a good model.
 - I think in experiment 2 we need to add balanced class weights in order to obtain better results for the "yes" values.
 - The macro average, which weights both classes equally, is much lower than the weighted average, which is dominated by the majority "no" class. We need to tweak the weightings.
 - ROC-AUC (0.61) is only slightly better than 50/50 chance; similarly, the PR-AUC (0.17) is only slightly better than the share of "yes" values in the test set (about 0.12).
 - The false positives in the confusion matrix are very high; we want to reduce this number.
 - The false negatives are also very high. We want to reduce this.
 - For the next experiment we want to change the settings so that the imbalance between "yes" and "no" values is considered. Also, maybe limit the depth of the tree to reduce overfitting and improve generalization.
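
The "yes"-class numbers in the report can be recomputed directly from the confusion matrix counts printed above. The helper name `positive_class_metrics` is just for this sketch.

```python
def positive_class_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 for the positive class, plus its prevalence."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    prevalence = (tp + fn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "prevalence": prevalence,
    }


# Counts from the baseline decision tree output.
baseline = positive_class_metrics(tp=347, fp=854, fn=704, tn=6979)
for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
```

The prevalence value is the reference level for PR-AUC: a model that ranks clients at random would score close to it.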


#### Experiment 2
**GOAL:** Now that we have a baseline performance with the default settings of the decision tree model, we want to change the input settings to improve the model. The max depth of the tree will be limited to 10, as it went over 40 in the first one. Additionally, the minimum samples per leaf will be set to 15 to prevent small splits. Also, adding balanced class weights because of the yes vs. no ratio.
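
For intuition on what `class_weight='balanced'` does, scikit-learn documents the weight of each class as `n_samples / (n_classes * count_of_class)`. The sketch below applies that formula by hand. As an assumption for illustration, it uses the test-set class counts shown in the reports (7833 "no", 1051 "yes"); the stratified training split has the same proportions, so the weights come out about the same.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights following n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Stand-in labels with the class counts from the test set (illustrative only).
labels = [0] * 7833 + [1] * 1051
weights = balanced_class_weights(labels)

print({cls: round(w, 4) for cls, w in weights.items()})
print("yes/no weight ratio:", round(weights[1] / weights[0], 2))
```

Each "yes" row therefore counts roughly seven to eight times as much as a "no" row when the tree evaluates a split.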

In [9]:
dec_tree2 = DecisionTreeClassifier(
    # Limiting the depth of the tree; with the baseline the depth was 43, way too deep.
    max_depth=10,
    # Keeping the leaves larger; small leaves are prone to overfitting the model.
    min_samples_leaf=15,
    # Balancing the weights for the minority "yes" values for more accurate results.
    class_weight='balanced',
    # Reproducibility.
    random_state=42)
dec_tree2.fit(X_train_encoded, y_train)
prob2 = dec_tree2.predict_proba(X_test_encoded)[:, 1]
y_pred = dec_tree2.predict(X_test_encoded)


## Depth 
print("maxDepth")
print(dec_tree2.tree_.max_depth)
# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score")
print(metrics.accuracy_score(y_test, y_pred))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, prob2), 4))        
print("PR-AUC  :", round(metrics.average_precision_score(y_test, prob2), 4)) 

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

## Adding to the dict 
running_results.update(
    {"dec_tree2":{
        "prob":prob2,
        "y_pred":y_pred,       
    }
    })

maxDepth
10
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.85      0.89      7833
           1       0.35      0.59      0.44      1051

    accuracy                           0.82      8884
   macro avg       0.64      0.72      0.67      8884
weighted avg       0.87      0.82      0.84      8884

Accuracy Score
0.8208014407924359
ROC-AUC : 0.7744
PR-AUC  : 0.404
TP: 624 FP: 1165 FN: 427 TN: 6668


**Notes on Decision Tree Experiment 2:**
- The confusion matrix results have improved. The true positives increased (347 to 624). However, the false positives also increased (854 to 1165), which is what we don't want. That being said, the false negatives decreased (704 to 427), which is good.
- Precision for the "yes" values moved only modestly (0.29 to 0.35), but in the proper direction. Recall increased a substantial amount (0.33 to 0.59), so the model now finds roughly 59% of actual "yes" values. These shifts give a better F1 score.
- For the AUC values, the ROC-AUC increased, which indicates better performance, as did the PR-AUC.
- OVERALL: BETTER THAN DT EXPERIMENT 1

### Random Forest
#### Experiment 1
**GOAL:** Using the random forest model type, I want to again establish a baseline of performance for the first experiment with this model, using default input settings.

In [10]:
rand_forest1 = RandomForestClassifier(random_state=42)
rand_forest1.fit(X_train_encoded, y_train)
rf_prob1 = rand_forest1.predict_proba(X_test_encoded)[:, 1]
y_pred = rand_forest1.predict(X_test_encoded)

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score")
print(metrics.accuracy_score(y_test, y_pred))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, rf_prob1), 4))        
print("PR-AUC  :", round(metrics.average_precision_score(y_test, rf_prob1), 4)) 

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

# depths of all trees
depths = [est.tree_.max_depth for est in rand_forest1.estimators_]
print("Tree depth — min:", min(depths), "mean:", round(np.mean(depths), 2), "max:", max(depths))

# number of leaves (optional)
leaves = [est.get_n_leaves() for est in rand_forest1.estimators_]
print("Leaves — min:", min(leaves), "mean:", round(np.mean(leaves), 1), "max:", max(leaves))

## Adding to the dict 
running_results.update(
    {"rand_forest1":{
        "prob":rf_prob1,
        "y_pred":y_pred,       
    }
    })


Classification Report:
              precision    recall  f1-score   support

           0       0.91      0.98      0.94      7833
           1       0.60      0.24      0.34      1051

    accuracy                           0.89      8884
   macro avg       0.75      0.61      0.64      8884
weighted avg       0.87      0.89      0.87      8884

Accuracy Score
0.8912651958577218
ROC-AUC : 0.7861
PR-AUC  : 0.4239
TP: 253 FP: 168 FN: 798 TN: 7665
Tree depth — min: 33 mean: 38.08 max: 51
Leaves — min: 5234 mean: 5400.9 max: 5571


**Random Forest Experiment 1 Notes:**
- As a baseline, the random forest with default inputs performed better than the initial decision tree baseline. This shows up specifically in the AUC numbers, which is expected because of the methodology shift from a single tree to an ensemble of trees.
- This margin of better performance was most relevant for the precision of the model, not the recall. Because of the low recall, the F1 score for the "yes" values is very similar to that of the baseline decision tree.
- When looking at the confusion matrix, the TP (253 vs. 347) and FN (798 vs. 704) are actually worse than the decision tree baseline, while the FP are much lower (168 vs. 854). This may just be because the very deep single tree in the first DT model was overfitting to the data.
- OVERALL: BETTER THAN DT 2 EXPERIMENT
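
Low recall with high precision is partly a consequence of the default cut-off of about 0.5 on the predicted probability. The sketch below, using made-up labels and probabilities, shows how the confusion counts shift when a lower cut-off is applied to the same scores. The function name and the `>=` convention are choices made for this example, not what `predict` does internally.

```python
def confusion_at_threshold(y_true, prob_yes, threshold=0.5):
    """Count TP/FP/FN/TN when 'yes' is predicted for prob >= threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for label, p in zip(y_true, prob_yes):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical labels and predicted probabilities of "yes".
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
prob_yes = [0.8, 0.45, 0.3, 0.6, 0.35, 0.2, 0.1, 0.05]

for t in (0.5, 0.3):
    print(t, confusion_at_threshold(y_true, prob_yes, threshold=t))
```

Lowering the cut-off removes false negatives at the cost of extra false positives, while threshold-free metrics such as PR-AUC stay unchanged.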

#### Experiment 2
**GOAL:** Improve upon the baseline default settings of the random forest technique with this data set. We will shift the inputs in order to get a better result. We will be limiting the depth of the trees to prevent overfitting, and adding class weights for the yes/no ratio imbalance. We're also going to require a minimum of 5 samples before splitting a node.

In [11]:
rand_forest2 = RandomForestClassifier(
    random_state=42,
    max_depth = 20, ## max tree depth was 51 in experiment 1; limiting this to 20
    min_samples_split=5,
    class_weight='balanced')

rand_forest2.fit(X_train_encoded, y_train)
rf_prob2 = rand_forest2.predict_proba(X_test_encoded)[:, 1]
y_pred = rand_forest2.predict(X_test_encoded)

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score")
print(metrics.accuracy_score(y_test, y_pred))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, rf_prob2), 4))        
print("PR-AUC  :", round(metrics.average_precision_score(y_test, rf_prob2), 4)) 

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

# depths of all trees
depths = [est.tree_.max_depth for est in rand_forest2.estimators_]
print("Tree depth — min:", min(depths), "mean:", round(np.mean(depths), 2), "max:", max(depths))

# number of leaves (optional)
leaves = [est.get_n_leaves() for est in rand_forest2.estimators_]
print("Leaves — min:", min(leaves), "mean:", round(np.mean(leaves), 1), "max:", max(leaves))


## Adding to the dict 
running_results.update(
    {"rand_forest2":{
        "prob":rf_prob2,
        "y_pred":y_pred,       
    }
    })


Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.94      0.93      7833
           1       0.50      0.41      0.45      1051

    accuracy                           0.88      8884
   macro avg       0.71      0.68      0.69      8884
weighted avg       0.87      0.88      0.88      8884

Accuracy Score
0.8811346240432237
ROC-AUC : 0.7958
PR-AUC  : 0.435
TP: 435 FP: 440 FN: 616 TN: 7393
Tree depth — min: 20 mean: 20.0 max: 20
Leaves — min: 2279 mean: 2814.8 max: 3400


**Random Forest Experiment 2 Notes:**
- Precision for the "yes" values dropped from 0.60 to 0.50; however, recall for the minority "yes" values improved from 0.24 to 0.41. This led to an increase in the F1 score.
- The ROC-AUC and the PR-AUC both essentially stayed the same, with slight increases in performance.
- The minimum number of leaves was ~2300; this should factor in for the next run.
- OVERALL: IMPROVEMENT ON RF 1

#### Experiment 3
**GOAL:** Improve upon the baseline default settings and upon the second experiment for random forest. Again, shifting the inputs in order to get a better result: increasing tree depth slightly and increasing the number of estimators in hopes of improving results. Additionally, we're increasing the minimum number of samples needed to split a node.

In [12]:
rand_forest3 = RandomForestClassifier(
    random_state=42,
    max_depth = 25, ## max tree depth was 51 in experiment 1; limiting this to 25
    min_samples_split=20,
    n_estimators=200, ## also increasing this from default
    class_weight='balanced')
rand_forest3.fit(X_train_encoded, y_train)
rf_prob3 = rand_forest3.predict_proba(X_test_encoded)[:, 1]
y_pred = rand_forest3.predict(X_test_encoded)


# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy Score")
print(metrics.accuracy_score(y_test, y_pred))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, rf_prob3), 4))        
print("PR-AUC  :", round(metrics.average_precision_score(y_test, rf_prob3), 4)) 

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

# depths of all trees
depths = [est.tree_.max_depth for est in rand_forest3.estimators_]
print("Tree depth — min:", min(depths), "mean:", round(np.mean(depths), 2), "max:", max(depths))

# number of leaves (optional)
leaves = [est.get_n_leaves() for est in rand_forest3.estimators_]
print("Leaves — min:", min(leaves), "mean:", round(np.mean(leaves), 1), "max:", max(leaves))


## Adding to the dict 
running_results.update(
    {"rand_forest3":{
        "prob":rf_prob3,
        "y_pred":y_pred,       
    }
    })


Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.91      0.92      7833
           1       0.45      0.52      0.48      1051

    accuracy                           0.87      8884
   macro avg       0.69      0.72      0.70      8884
weighted avg       0.88      0.87      0.87      8884

Accuracy Score
0.8681900045024764
ROC-AUC : 0.8017
PR-AUC  : 0.4422
TP: 551 FP: 671 FN: 500 TN: 7162
Tree depth — min: 25 mean: 25.0 max: 25
Leaves — min: 1553 mean: 1744.1 max: 1937


**Random Forest Experiment 3 Notes:**
- Further refined the inputs based on the second experiment with random forest. This time allowed for more depth in the trees and increased the minimum number of samples required to split each node. Also increased the n_estimators variable.
- This yielded additional gains in recall but a slight decrease in precision for the "yes" values.
- Very slight gains in the AUC values.
- True positives increased, but so did false positives. True negatives decreased, and false negatives decreased as well.
- OVERALL: BETTER THAN RF 2. BEST SO FAR.
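
The movement between the second and third random forest runs can be read off by differencing the confusion counts printed above. This small sketch does that; the `rf2` and `rf3` dictionaries simply restate those outputs.

```python
# Confusion counts copied from the experiment 2 and experiment 3 outputs.
rf2 = {"tp": 435, "fp": 440, "fn": 616, "tn": 7393}
rf3 = {"tp": 551, "fp": 671, "fn": 500, "tn": 7162}

# Positive delta means the count grew from experiment 2 to experiment 3.
delta = {key: rf3[key] - rf2[key] for key in rf2}
print(delta)

# Extra false positives accepted for each additional true positive.
fp_per_extra_tp = delta["fp"] / delta["tp"]
print("additional FP per additional TP:", round(fp_per_extra_tp, 2))
```

Each additional client correctly identified came with roughly two additional false alarms, which is the trade-off behind the lower precision.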

### AdaBoost
#### Experiment 1
**GOAL:** Again getting a baseline for this modeling methodology at first. Default inputs. 

In [14]:
## Running defaults
ab1 = AdaBoostClassifier(random_state=42)   
ab1.fit(X_train_encoded, y_train)
ab1_proba = ab1.predict_proba(X_test_encoded)[:, 1]
y_pred = ab1.predict(X_test_encoded)


print("=== AdaBoost Baseline (defaults) ===")
print(classification_report(y_test, y_pred, target_names=['no','yes']))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, ab1_proba), 4))
print("PR-AUC  :", round(metrics.average_precision_score(y_test, ab1_proba), 4))

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

## Adding to the dict 
running_results.update(
    {"adaboost1":{
        "prob":ab1_proba,
        "y_pred":y_pred,       
    }
    })


=== AdaBoost Baseline (defaults) ===
              precision    recall  f1-score   support

          no       0.90      0.99      0.94      7833
         yes       0.64      0.18      0.28      1051

    accuracy                           0.89      8884
   macro avg       0.77      0.58      0.61      8884
weighted avg       0.87      0.89      0.86      8884

ROC-AUC : 0.767
PR-AUC  : 0.4153
TP: 190 FP: 105 FN: 861 TN: 7728


**AdaBoost Experiment 1 Notes**
- This method did the best for precision score on the first defa...ever, the recall values here are super low for the 'yes' values.
- The ROC-AUC and PR-AUC are lower than in all of the random forest experiments.
- The confusion matrix is unimpressive compared to all of the random forest experiments.
- OVERALL: WORSE THAN RF 3, WHICH IS THE CURRENT BEST.

#### Experiment 2
**GOAL:** Adjusting the inputs to tune the model toward better results. Limiting the depth of the trees to 5, setting the number of estimators to 100, and increasing the learning rate by 50% (from the default of 1.0 to 1.5).
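
To see what the learning rate changes, the sketch below runs one reweighting step of textbook two-class AdaBoost (the SAMME form), where the learner weight is `learning_rate * ln((1 - err) / err)`. It is a simplified illustration with made-up data; scikit-learn's actual implementation may differ in details depending on the version and algorithm option.

```python
import math


def boosting_round(sample_weights, is_wrong, learning_rate=1.0):
    """One reweighting step: returns the learner weight and new sample weights."""
    total = sum(sample_weights)
    err = sum(w for w, wrong in zip(sample_weights, is_wrong) if wrong) / total
    alpha = learning_rate * math.log((1 - err) / err)
    # Misclassified samples are up-weighted; the rest keep their weight.
    raised = [w * math.exp(alpha) if wrong else w
              for w, wrong in zip(sample_weights, is_wrong)]
    norm = sum(raised)
    return alpha, [w / norm for w in raised]


# Ten equally weighted samples; the weak learner gets two of them wrong.
weights = [0.1] * 10
is_wrong = [True, True] + [False] * 8

for lr in (1.0, 1.5):
    alpha, new_weights = boosting_round(weights, is_wrong, learning_rate=lr)
    wrong_share = sum(w for w, wrong in zip(new_weights, is_wrong) if wrong)
    print(f"learning_rate={lr}: alpha={alpha:.4f}, "
          f"weight on misclassified samples={wrong_share:.4f}")
```

A larger learning rate shifts more weight onto the hard samples after every round, so later trees chase them more aggressively.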

In [15]:
## Tuned settings: depth-5 base trees, 100 estimators, learning rate 1.5
tree_obj = DecisionTreeClassifier(max_depth=5, random_state=42) 
ab2 = AdaBoostClassifier(random_state=42,
                        estimator= tree_obj ,
                        n_estimators=100,
                        learning_rate = 1.5)   
ab2.fit(X_train_encoded, y_train)
ab2_proba = ab2.predict_proba(X_test_encoded)[:, 1]
y_pred = ab2.predict(X_test_encoded)

print("=== AdaBoost Baseline (defaults) ===")
print(classification_report(y_test, y_pred, target_names=['no','yes']))

print("ROC-AUC :", round(metrics.roc_auc_score(y_test, ab2_proba), 4))
print("PR-AUC  :", round(metrics.average_precision_score(y_test, ab2_proba), 4))

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

## Adding to the dict 
running_results.update(
    {"adaboost2":{
        "prob":ab2_proba,
        "y_pred":y_pred,       
    }
    })

=== AdaBoost Baseline (defaults) ===
              precision    recall  f1-score   support

          no       0.91      0.97      0.94      7833
         yes       0.54      0.28      0.37      1051

    accuracy                           0.89      8884
   macro avg       0.72      0.62      0.65      8884
weighted avg       0.87      0.89      0.87      8884

ROC-AUC : 0.7721
PR-AUC  : 0.3874
TP: 291 FP: 250 FN: 760 TN: 7583


**AdaBoost Experiment 2 Notes**
- Precision decreased for "yes" values when compared to experiment 1. The recall for "yes" values did increase, yielding a better F1 score.
- The PR-AUC decreased with these changes.
- False positives increased a bit; however, the true positive increase seems to outweigh this.
- False negatives decreased compared to the first experiment.
- OVERALL: WORSE THAN RF 3, WHICH IS CURRENTLY THE OVERALL BEST.

#### Experiment 3
**GOAL:** Again increasing the depth of trees a bit and the learning rate to see if this improves the model, as the second experiment's shifts were unimpressive.

In [16]:
## Tuned settings: depth-10 base trees, 100 estimators, learning rate 2
tree_obj = DecisionTreeClassifier(max_depth=10, random_state=42) 
ab3 = AdaBoostClassifier(random_state=42,
                        estimator= tree_obj ,
                        n_estimators=100,
                        learning_rate = 2)   
ab3.fit(X_train_encoded, y_train)
ab3_proba = ab3.predict_proba(X_test_encoded)[:, 1]
y_pred = ab3.predict(X_test_encoded)

print("=== AdaBoost Baseline (defaults) ===")
print(classification_report(y_test, y_pred, target_names=['no','yes']))
print("ROC-AUC :", round(metrics.roc_auc_score(y_test, ab3_proba), 4))
print("PR-AUC  :", round(metrics.average_precision_score(y_test, ab3_proba), 4))

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

## Adding to the dict 
running_results.update(
    {"adaboost3":{
        "prob":ab3_proba,
        "y_pred":y_pred,       
    }
    })

=== AdaBoost Baseline (defaults) ===
              precision    recall  f1-score   support

          no       0.90      0.80      0.85      7833
         yes       0.20      0.37      0.26      1051

    accuracy                           0.75      8884
   macro avg       0.55      0.58      0.55      8884
weighted avg       0.82      0.75      0.78      8884

ROC-AUC : 0.6406
PR-AUC  : 0.2746
TP: 389 FP: 1596 FN: 662 TN: 6237


**AdaBoost Experiment 3 Notes**
- True positives did increase; however, the false positives increased drastically. The false negatives decreased a bit.
- PR-AUC decreased. Worse than the second experiment.
- OVERALL: WORSE THAN RF 3, WHICH IS STILL THE CURRENT BEST.


#### Experiment 4
**GOAL:** Trying to better tune the model. I will keep the depth the same; however, I will be lowering the learning rate and increasing the number of estimators in order to try to get better results.

In [17]:
## Tuned settings: depth-10 base trees, 300 estimators, learning rate 1.25
tree_obj = DecisionTreeClassifier(max_depth=10, random_state=42) 
ab4 = AdaBoostClassifier(random_state=42,
                        estimator= tree_obj ,
                        n_estimators=300,
                        learning_rate = 1.25)   
ab4.fit(X_train_encoded, y_train)
ab4_proba = ab4.predict_proba(X_test_encoded)[:, 1]
y_pred = ab4.predict(X_test_encoded)

print("=== AdaBoost Baseline (defaults) ===")
print(classification_report(y_test, y_pred, target_names=['no','yes']))
print("ROC-AUC :", round(metrics.roc_auc_score(y_test, ab4_proba), 4))
print("PR-AUC  :", round(metrics.average_precision_score(y_test, ab4_proba), 4))

tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

## Adding to the dict 
running_results.update(
    {"adaboost4":{
        "prob":ab4_proba,
        "y_pred":y_pred,       
    }
    })

=== AdaBoost Baseline (defaults) ===
              precision    recall  f1-score   support

          no       0.90      0.98      0.94      7833
         yes       0.56      0.23      0.33      1051

    accuracy                           0.89      8884
   macro avg       0.73      0.60      0.63      8884
weighted avg       0.86      0.89      0.87      8884

ROC-AUC : 0.7636
PR-AUC  : 0.3879
TP: 242 FP: 191 FN: 809 TN: 7642


**AdaBoost Experiment 4 Notes**
- False negatives increased. True positives decreased.
- PR-AUC increased from experiment 3, as did ROC-AUC.
- Precision increased a decent amount, but recall fell.
- OVERALL: WORSE THAN RF 3, WHICH IS STILL THE CURRENT BEST.

### RESULTS WORK

In [18]:
rows = []

for model_name, vals in running_results.items():
    prob = np.array(vals["prob"])
    y_pred = np.array(vals["y_pred"]).ravel()
    if prob.ndim == 2 and prob.shape[1] == 2:
        prob = prob[:, 1]
    else:
        prob = prob.ravel()
    classfctn_rpt = classification_report(y_test, y_pred, target_names=['no','yes'],output_dict=True)
    roc_auc = round(metrics.roc_auc_score(y_test, prob), 4)
    pr_auc = round(metrics.average_precision_score(y_test, prob), 4)
    tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()

    rows.append({
        "model": model_name,
        "roc_auc": roc_auc,
        "pr_auc": pr_auc,
        "precision_pos": classfctn_rpt["yes"]["precision"],
        "recall_pos": classfctn_rpt["yes"]["recall"],
        "f1_pos": classfctn_rpt["yes"]["f1-score"],
        "precision_neg": classfctn_rpt["no"]["precision"],
        "recall_neg": classfctn_rpt["no"]["recall"],
        "f1_neg": classfctn_rpt["no"]["f1-score"],
        "macro_precision": classfctn_rpt["macro avg"]["precision"],
        "macro_recall": classfctn_rpt["macro avg"]["recall"],
        "macro_f1": classfctn_rpt["macro avg"]["f1-score"],
        "weighted_precision": classfctn_rpt["weighted avg"]["precision"],
        "weighted_recall": classfctn_rpt["weighted avg"]["recall"],
        "weighted_f1": classfctn_rpt["weighted avg"]["f1-score"],
        "tn": tn, "fp": fp, "fn": fn, "tp": tp
    })

results_df = pd.DataFrame(rows).set_index("model").round(4)
results_df  = results_df.reset_index()

In [25]:
results_df.sort_values(by =["pr_auc"], ascending=False)

Unnamed: 0,model,roc_auc,pr_auc,precision_pos,recall_pos,f1_pos,precision_neg,recall_neg,f1_neg,macro_precision,macro_recall,macro_f1,weighted_precision,weighted_recall,weighted_f1,tn,fp,fn,tp
4,rand_forest3,0.8017,0.4422,0.4509,0.5243,0.4848,0.9347,0.9143,0.9244,0.6928,0.7193,0.7046,0.8775,0.8682,0.8724,7162,671,500,551
3,rand_forest2,0.7958,0.435,0.4971,0.4139,0.4517,0.9231,0.9438,0.9333,0.7101,0.6789,0.6925,0.8727,0.8811,0.8764,7393,440,616,435
2,rand_forest1,0.7861,0.4239,0.601,0.2407,0.3438,0.9057,0.9786,0.9407,0.7533,0.6096,0.6422,0.8697,0.8913,0.8701,7665,168,798,253
5,adaboost1,0.767,0.4153,0.6441,0.1808,0.2823,0.8998,0.9866,0.9412,0.7719,0.5837,0.6117,0.8695,0.8913,0.8632,7728,105,861,190
1,dec_tree2,0.7744,0.404,0.3488,0.5937,0.4394,0.9398,0.8513,0.8934,0.6443,0.7225,0.6664,0.8699,0.8208,0.8397,6668,1165,427,624
8,adaboost4,0.7636,0.3879,0.5589,0.2303,0.3261,0.9043,0.9756,0.9386,0.7316,0.6029,0.6324,0.8634,0.8874,0.8661,7642,191,809,242
6,adaboost2,0.7721,0.3874,0.5379,0.2769,0.3656,0.9089,0.9681,0.9376,0.7234,0.6225,0.6516,0.865,0.8863,0.8699,7583,250,760,291
7,adaboost3,0.6406,0.2746,0.196,0.3701,0.2563,0.904,0.7962,0.8467,0.55,0.5832,0.5515,0.8203,0.7458,0.7769,6237,1596,662,389
0,dec_tree1,0.6106,0.1746,0.2889,0.3302,0.3082,0.9084,0.891,0.8996,0.5986,0.6106,0.6039,0.8351,0.8246,0.8296,6979,854,704,347


In [35]:
# results_df[['model', 'pr_auc','recall_pos','precision_pos','f1_pos','fn','roc_auc']][results_df["model"].astype(str).str.contains("ada")].sort_values(by =["model"], ascending=True)


In [21]:
### From this DF I'm taking my top 3 models based on my chosen metrics (PR-AUC and false negatives).
top_3 = results_df.sort_values(by =["pr_auc"], ascending=False)[:3]
top_3

# The top 3 models were all the random forest models. Based on the pr_auc score.

Unnamed: 0,model,roc_auc,pr_auc,precision_pos,recall_pos,f1_pos,precision_neg,recall_neg,f1_neg,macro_precision,macro_recall,macro_f1,weighted_precision,weighted_recall,weighted_f1,tn,fp,fn,tp
4,rand_forest3,0.8017,0.4422,0.4509,0.5243,0.4848,0.9347,0.9143,0.9244,0.6928,0.7193,0.7046,0.8775,0.8682,0.8724,7162,671,500,551
3,rand_forest2,0.7958,0.435,0.4971,0.4139,0.4517,0.9231,0.9438,0.9333,0.7101,0.6789,0.6925,0.8727,0.8811,0.8764,7393,440,616,435
2,rand_forest1,0.7861,0.4239,0.601,0.2407,0.3438,0.9057,0.9786,0.9407,0.7533,0.6096,0.6422,0.8697,0.8913,0.8701,7665,168,798,253
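
Since both PR-AUC and false negatives are tracked, it can help to look at the two rankings side by side. The sketch below restates the `pr_auc` and `fn` columns from the results table as plain dictionaries and sorts them both ways. With the DataFrame itself, a roughly equivalent call would be `results_df.sort_values(by=["pr_auc", "fn"], ascending=[False, True])`.

```python
# pr_auc and fn values copied from the results table.
results = [
    {"model": "dec_tree1", "pr_auc": 0.1746, "fn": 704},
    {"model": "dec_tree2", "pr_auc": 0.4040, "fn": 427},
    {"model": "rand_forest1", "pr_auc": 0.4239, "fn": 798},
    {"model": "rand_forest2", "pr_auc": 0.4350, "fn": 616},
    {"model": "rand_forest3", "pr_auc": 0.4422, "fn": 500},
    {"model": "adaboost1", "pr_auc": 0.4153, "fn": 861},
    {"model": "adaboost2", "pr_auc": 0.3874, "fn": 760},
    {"model": "adaboost3", "pr_auc": 0.2746, "fn": 662},
    {"model": "adaboost4", "pr_auc": 0.3879, "fn": 809},
]

# Highest PR-AUC first; fewer false negatives breaks any ties.
by_pr_auc = sorted(results, key=lambda r: (-r["pr_auc"], r["fn"]))
# Fewest false negatives first.
by_fn = sorted(results, key=lambda r: r["fn"])

top3_pr_auc = [r["model"] for r in by_pr_auc[:3]]
top3_fewest_fn = [r["model"] for r in by_fn[:3]]
print("Top 3 by PR-AUC:", top3_pr_auc)
print("Top 3 by fewest FN:", top3_fewest_fn)
```

The two orderings are not identical, so it is worth stating which metric takes priority when naming a final model.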


In [24]:
### If looking at the best model per category, 
best_dt  = results_df[results_df['model'].str.startswith('dec_tree')].sort_values('pr_auc', ascending=False).iloc[[0]]
best_rf  = results_df[results_df['model'].str.startswith('rand_forest')].sort_values('pr_auc', ascending=False).iloc[[0]]
best_ab  = results_df[results_df['model'].str.startswith('adaboost')].sort_values('pr_auc', ascending=False).iloc[[0]]

best_of_each = pd.concat([best_rf,best_dt,best_ab])
best_of_each
## The third random forest experiment, the first AdaBoost experiment, and the second decision tree experiment were the best within their respective model types.

Unnamed: 0,model,roc_auc,pr_auc,precision_pos,recall_pos,f1_pos,precision_neg,recall_neg,f1_neg,macro_precision,macro_recall,macro_f1,weighted_precision,weighted_recall,weighted_f1,tn,fp,fn,tp
4,rand_forest3,0.8017,0.4422,0.4509,0.5243,0.4848,0.9347,0.9143,0.9244,0.6928,0.7193,0.7046,0.8775,0.8682,0.8724,7162,671,500,551
1,dec_tree2,0.7744,0.404,0.3488,0.5937,0.4394,0.9398,0.8513,0.8934,0.6443,0.7225,0.6664,0.8699,0.8208,0.8397,6668,1165,427,624
5,adaboost1,0.767,0.4153,0.6441,0.1808,0.2823,0.8998,0.9866,0.9412,0.7719,0.5837,0.6117,0.8695,0.8913,0.8632,7728,105,861,190


## Overall best model was the third experiment for the Random Forest methodology. It scores the best for PR-AUC.