# Experiment - MacBook Training Performance
* StellarAlgo Data Science
* Ryan Kazmerik, Grant Donst, Peter Morrison
* Mar 07, 2023

This experiment compares the performance of three different MacBooks (i7, i9, M1) when training our event propensity model, measured as the wall-clock time of the model comparison step.
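Since the comparison only makes sense if each timing can be traced back to the machine that produced it, it may help to record the host's details alongside the result. This is a minimal sketch using only the Python standard library; the helper name `describe_host` and the dictionary keys are illustrative and not part of the original notebook:

```python
import platform


def describe_host() -> dict:
    """Collect basic hardware and OS details for labelling a benchmark run."""
    return {
        "machine": platform.machine(),      # CPU architecture, e.g. "x86_64" or "arm64"
        "processor": platform.processor(),  # may be an empty string on some systems
        "system": platform.system(),        # OS name, e.g. "Darwin" on macOS
        "python": platform.python_version(),
    }


host_info = describe_host()
print(host_info)
```

The architecture string alone does not distinguish an i7 from an i9, so the exact CPU model would still need to be noted by hand for each run.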

In [2]:
import pandas as pd

from datetime import datetime, timedelta
from data_sci_toolkit.aws_tools import redshift_tools
from pycaret.classification import *



In [3]:
df = pd.read_parquet("./data/macbook-perf-dataset.parquet")

In [4]:
setup(
    data=df,
    target="did_purchase",
    train_size=0.85,  # hold out 15% of rows for testing
    data_split_shuffle=True,
    # Columns excluded from training
    ignore_features=[
        "count_merchowned",
        "daysout",
        "dimcustomermasterid",
        "eventdate",
        "inmarket",
        "mindaysout",
        "maxdaysout"
    ],
    silent=True,  # skip the interactive data type confirmation (PyCaret 2.x argument)
    verbose=False,
    # Columns to treat as numeric
    numeric_features=[
        "distancetovenue",
        "events_purchased",
        "frequency_eventday",
        "frequency_opponent",
        "frequency_eventtime",
        "tenure"
    ]
);

In [5]:
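# Time the full model comparison with wall-clock timestamps
start = datetime.now()

model_matrix = compare_models(
    fold=10,
    include=["dt", "lightgbm", "lr", "xgboost", "rf"]
)

end = datetime.now()
delta = end - start  # elapsed time, printed in a later cell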

| ID | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| dt | Decision Tree Classifier | 0.7939 | 0.8188 | 0.6941 | 0.8677 | 0.7712 | 0.5879 | 0.6001 | 1.206 |
| rf | Random Forest Classifier | 0.789 | 0.8614 | 0.717 | 0.8379 | 0.7728 | 0.578 | 0.5841 | 38.576 |
| xgboost | Extreme Gradient Boosting | 0.6992 | 0.7775 | 0.7201 | 0.6916 | 0.7056 | 0.3985 | 0.3988 | 10.597 |
| lightgbm | Light Gradient Boosting Machine | 0.6851 | 0.7584 | 0.7144 | 0.6752 | 0.6942 | 0.3701 | 0.3708 | 1.563 |
| lr | Logistic Regression | 0.62 | 0.6724 | 0.5989 | 0.6257 | 0.612 | 0.24 | 0.2403 | 1.178 |


In [12]:
print(f"EXECUTION TIME: {delta}")

EXECUTION TIME: 0:08:56.243204
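
As a rough sanity check, the per-model `TT (Sec)` values from the comparison table can be set against the measured execution time. The sketch below assumes that `TT (Sec)` is an average per fold, which is an assumption about PyCaret's output rather than something this notebook verifies. Under that assumption, ten folds across the five models would account for roughly 531 of the 536 measured seconds:

```python
from datetime import timedelta

# TT (Sec) values copied from the comparison table above
tt_seconds = {
    "dt": 1.206,
    "rf": 38.576,
    "xgboost": 10.597,
    "lightgbm": 1.563,
    "lr": 1.178,
}
folds = 10  # matches the fold argument passed to compare_models

# Assumption: TT (Sec) is the mean training time per fold
estimated_training = timedelta(seconds=sum(tt_seconds.values()) * folds)

# Measured wall-clock time reported by the notebook (0:08:56.243204)
measured = timedelta(minutes=8, seconds=56.243204)

# Time not explained by the per-model figures under the assumption above
overhead = measured - estimated_training

print(f"Estimated training time: {estimated_training}")
print(f"Measured execution time: {measured}")
print(f"Unaccounted overhead:    {overhead}")
```

If the assumption holds, the random forest dominates the run, so differences between machines would mostly reflect how quickly each one trains that model.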
