# Experiment - MacBook Training Performance
* StellarAlgo Data Science
* Ryan Kazmerik, Grant Donst, Peter Morrison
* Mar 07, 2023

This experiment compares how three MacBooks (i7, i9, M1) perform when training our event propensity model. Performance is measured as the total execution time of the model comparison run on each machine.
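
Because the results are only meaningful when each run can be traced back to the machine that produced it, it may help to log basic hardware details alongside the timing. The following is a minimal sketch using only the standard library; `describe_machine` is a hypothetical helper that is not part of the original notebook, and the fields it collects are an assumption about what is useful here.

```python
import os
import platform


def describe_machine():
    """Collect basic hardware and OS details for labelling a benchmark run."""
    return {
        "machine": platform.machine(),      # architecture string reported by the OS
        "processor": platform.processor(),  # may be an empty string on some systems
        "system": platform.system(),
        "release": platform.release(),
        "cpu_count": os.cpu_count(),        # logical cores; None if undetermined
        "python": platform.python_version(),
    }


machine_info = describe_machine()
print(machine_info)
```

Storing this dictionary next to the recorded execution time would make it easier to compare runs from the three machines afterwards.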

In [5]:
# Standard library: used to time the model comparison run
from datetime import datetime

# Third party: data loading and model training
import pandas as pd
from pycaret.classification import compare_models, setup

In [6]:
df = pd.read_parquet("./data/macbook-perf-dataset.parquet")

In [7]:
setup(
    data=df,
    target="did_purchase",
    train_size=0.85,
    data_split_shuffle=True,
    ignore_features=[
        "count_merchowned",
        "daysout",
        "dimcustomermasterid",
        "eventdate",
        "inmarket",
        "mindaysout",
        "maxdaysout"
    ],
    silent=True,
    verbose=False,
    numeric_features=[
        "distancetovenue",
        "events_purchased",
        "frequency_eventday",
        "frequency_opponent",
        "frequency_eventtime",
        "tenure"
    ]
);

In [8]:
start = datetime.now()

model_matrix = compare_models(
    fold=10,
    include=["dt", "lightgbm", "lr", "xgboost", "rf"]
)

end = datetime.now()

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| dt | Decision Tree Classifier | 0.7932 | 0.8182 | 0.6931 | 0.8664 | 0.7701 | 0.5863 | 0.5984 | 1.046 |
| rf | Random Forest Classifier | 0.7887 | 0.8609 | 0.717 | 0.837 | 0.7724 | 0.5774 | 0.5834 | 49.401 |
| xgboost | Extreme Gradient Boosting | 0.7001 | 0.7783 | 0.7178 | 0.6931 | 0.7052 | 0.4001 | 0.4004 | 11.92 |
| lightgbm | Light Gradient Boosting Machine | 0.6853 | 0.7587 | 0.7135 | 0.6753 | 0.6939 | 0.3706 | 0.3712 | 1.38 |
| lr | Logistic Regression | 0.6201 | 0.6722 | 0.5955 | 0.6262 | 0.6104 | 0.2401 | 0.2404 | 1.199 |


In [11]:
print(f"EXECUTION TIME: {end-start}")

EXECUTION TIME: 0:10:53.726453
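
The timing above relies on `datetime.now()`, which follows the system clock and can shift if the clock is adjusted during a run. For cross-machine comparisons, a monotonic timer such as `time.perf_counter()` may be a safer choice. The sketch below is one way to wrap that; `timed` and `timings` are hypothetical names that do not appear in the original notebook, and the workload inside the `with` block is a stand-in for the `compare_models(...)` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed seconds for the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so partial runs are still timed.
        results[label] = time.perf_counter() - start


timings = {}
with timed("compare_models", timings):
    # Placeholder workload; in the notebook this would be compare_models(...).
    sum(i * i for i in range(100_000))

print(f"EXECUTION TIME: {timings['compare_models']:.3f}s")
```

Collecting the durations in a dictionary keeps the results from repeated runs in one place, which could simplify averaging over several trials per machine.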
