# Experiment - MacBook Training Performance
* StellarAlgo Data Science
* Ryan Kazmerik, Grant Donst, Peter Morrison
* Mar 07, 2023

This experiment compares how long three MacBooks (Intel i7, Intel i9, and Apple M1) take to train our event propensity model. The measure is the wall-clock time of the model comparison step below.
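Because the same notebook is meant to run on several machines, it can help to record which hardware produced a given timing. The following is a minimal sketch using only the Python standard library; the `describe_machine` helper and its dictionary keys are hypothetical names introduced here, not part of the original notebook.

```python
import os
import platform


def describe_machine() -> dict:
    """Collect basic identifiers for labeling a benchmark run (hypothetical helper)."""
    return {
        # Typically "arm64" on Apple Silicon and "x86_64" on Intel Macs
        # when Python runs natively.
        "machine": platform.machine(),
        # May be an empty string on some systems.
        "processor": platform.processor(),
        "system": platform.system(),
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


run_info = describe_machine()
print(run_info)
```

Storing this dictionary next to the reported execution time makes it easier to tell the i7, i9, and M1 runs apart later.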

In [1]:
import pandas as pd

from datetime import datetime
# Import only the PyCaret functions this notebook calls.
from pycaret.classification import compare_models, setup

In [2]:
df = pd.read_parquet("./data/macbook-perf-dataset.parquet")

In [3]:
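# Configure the PyCaret experiment: 85% of rows for training, shuffled before the split.
setup(
    data=df,
    target="did_purchase",
    train_size=0.85,
    data_split_shuffle=True,
    # Columns excluded from training.
    ignore_features=[
        "count_merchowned",
        "daysout",
        "dimcustomermasterid",
        "eventdate",
        "inmarket",
        "mindaysout",
        "maxdaysout"
    ],
    silent=True,  # PyCaret 2.x option; skips the interactive dtype confirmation
    verbose=False,
    # Columns that must be treated as numeric.
    numeric_features=[
        "distancetovenue",
        "events_purchased",
        "frequency_eventday",
        "frequency_opponent",
        "frequency_eventtime",
        "tenure"
    ]
);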

In [4]:
# Time the full model comparison: 10-fold cross-validation across five classifiers.
start = datetime.now()

model_matrix = compare_models(
    fold=10,
    include=["dt", "lightgbm", "lr", "xgboost", "rf"]
)

end = datetime.now()

| ID | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| dt | Decision Tree Classifier | 0.7931 | 0.8179 | 0.6927 | 0.8668 | 0.7701 | 0.5862 | 0.5984 | 0.524 |
| rf | Random Forest Classifier | 0.7887 | 0.8608 | 0.7164 | 0.8377 | 0.7723 | 0.5775 | 0.5836 | 15.941 |
| xgboost | Extreme Gradient Boosting | 0.6998 | 0.7781 | 0.7190 | 0.6925 | 0.7055 | 0.3996 | 0.3999 | 3.942 |
| lightgbm | Light Gradient Boosting Machine | 0.6850 | 0.7584 | 0.7159 | 0.6743 | 0.6945 | 0.3700 | 0.3707 | 0.661 |
| lr | Logistic Regression | 0.6198 | 0.6718 | 0.5971 | 0.6256 | 0.6110 | 0.2397 | 0.2399 | 0.409 |
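The comparison output is sorted by accuracy, so the top row is not necessarily the best model on every metric. As a rough illustration, the sketch below copies a few of the reported columns into a DataFrame and looks up the leader per metric; the `results` frame and the `best_by` helper are names assumed for this example only.

```python
import pandas as pd

# Values copied from the compare_models output reported above.
results = pd.DataFrame(
    {
        "Model": ["dt", "rf", "xgboost", "lightgbm", "lr"],
        "Accuracy": [0.7931, 0.7887, 0.6998, 0.6850, 0.6198],
        "AUC": [0.8179, 0.8608, 0.7781, 0.7584, 0.6718],
        "F1": [0.7701, 0.7723, 0.7055, 0.6945, 0.6110],
        "TT (Sec)": [0.524, 15.941, 3.942, 0.661, 0.409],
    }
).set_index("Model")


def best_by(metric: str) -> str:
    """Return the model ID with the highest value for the given metric."""
    return results[metric].idxmax()


for metric in ["Accuracy", "AUC", "F1"]:
    print(f"{metric}: {best_by(metric)}")

# Lowest per-fold training time reported by PyCaret.
print(f"Fastest: {results['TT (Sec)'].idxmin()}")
```

On these numbers the decision tree leads on accuracy, while the random forest leads on AUC and F1 at a much higher training time.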


In [6]:
print(f"EXECUTION TIME: {end-start}")

EXECUTION TIME: 0:03:38.121161
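The notebook times the run by subtracting two `datetime.now()` values. One possible alternative, sketched here as an assumption rather than the authors' method, is a small context manager built on `time.perf_counter()`, a monotonic clock intended for measuring intervals. The `stopwatch` helper and the `timings` dictionary are hypothetical names, and the loop is only a stand-in for the `compare_models(...)` call.

```python
import time
from contextlib import contextmanager
from datetime import timedelta


@contextmanager
def stopwatch(label: str, results: dict):
    """Store the elapsed wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = timedelta(seconds=time.perf_counter() - start)


timings = {}
with stopwatch("demo", timings):
    sum(i * i for i in range(100_000))  # stand-in for compare_models(...)

print(f"EXECUTION TIME: {timings['demo']}")

# The time reported above, converted to seconds for comparison across machines.
reported = timedelta(minutes=3, seconds=38, microseconds=121161)
print(reported.total_seconds())  # 218.121161
```

Keeping results as `timedelta` objects preserves the same `H:MM:SS.ffffff` print format used above, while `total_seconds()` gives a single number that is easier to compare between the three MacBooks.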
