## Optimizing the Mall Customer Segmentation Model

The model's efficiency can be improved with two strategies: a more efficient numeric representation (smaller data types) and modified learning dynamics (tuned hyperparameters). The goal is to reduce memory usage and training time without significantly degrading model performance.

First, I will re-create the original model and benchmark it on the following metrics:
- training time
- prediction speed
- performance (log-likelihood, AIC, BIC)
- memory usage of the scaled feature array

In [55]:
# The following cell is copied and pasted from the D804_PA_Model_Customer_Segmentation.ipynb notebook
import pandas as pd
df = pd.read_csv('Mall_Customers.csv')

# Encode gender column
df['Gender'] = df['Gender'].astype('category').cat.codes
# Male = 1, Female = 0
# Dropping the ID column as that has little impact on patterns
df.drop(['CustomerID'], axis=1, inplace=True)

# Scale the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)
print("Data type: ", X_scaled.dtype, " | Memory Usage: ", X_scaled.nbytes, " bytes") # Get memory usage

from sklearn.mixture import GaussianMixture
import time

# Base model
base_model = GaussianMixture(n_components=5, covariance_type='full', 
                        reg_covar=0.0001, n_init=5, random_state=42)

# Measure training time
train_base_start = time.time()
base_model.fit(X_scaled)
train_base_end = time.time()
print("Training time: ", train_base_end - train_base_start, " s")

# Measure prediction speed
pred_base_start = time.time()
base_labels = base_model.predict(X_scaled)
base_probas = base_model.predict_proba(X_scaled)
pred_base_end = time.time()
print("Prediction speed: ", pred_base_end - pred_base_start, " s")

# Performance
print("Log-Likelihood: ", base_model.score(X_scaled))
print("AIC: ", base_model.aic(X_scaled))
print("BIC: ", base_model.bic(X_scaled))

Data type:  float64  | Memory Usage:  6400  bytes
Training time:  0.08642077445983887  s
Prediction speed:  0.0006823539733886719  s
Log-Likelihood:  -0.6597161202802081
AIC:  411.88644811208326
BIC:  655.961933236638


### Baseline Benchmarks
The baseline model yields the following benchmark metrics:
- training time: 0.086s
- prediction speed: 0.0007s
- performance: Log-likelihood -0.66, AIC 411.89, BIC 655.96
- memory usage: scaled data using 6400 bytes
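
Each timing above comes from a single run, and at sub-second scale a single run can be dominated by noise. The sketch below shows one way to repeat a measurement and summarize it. The helper names `benchmark` and `summarize` and the stand-in `workload` function are hypothetical and are not part of the original notebook.

```python
import statistics
import time


def benchmark(fn, repeats=20):
    """Call fn() `repeats` times and return each run's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def workload():
    # Stand-in for the real work, e.g. fitting or predicting with the model
    return sum(i * i for i in range(10_000))


stats = summarize(benchmark(workload, repeats=10))
print({name: f"{value:.6f} s" for name, value in stats.items()})
```

In the notebook, `workload` could be replaced with something like `lambda: base_model.fit(X_scaled)` or `lambda: base_model.predict(X_scaled)`. Comparing medians is usually more robust than comparing two single measurements.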

## Model Optimization Process

To improve the benchmarks of the current working model, I will employ the following optimization strategies:
- efficient numerics representation by changing data types
- modify learning dynamics by changing hyperparameters

In [61]:
# Efficient numerics: cast the float64 features to float16 (this overwrites X_scaled)
X_scaled = X_scaled.astype("float16")
print("Data type: ", X_scaled.dtype, " | Memory Usage: ", X_scaled.nbytes, " bytes") # Get memory usage

# Modifying model hyperparameters
opt_model = GaussianMixture(n_components=5, covariance_type='full', 
                        reg_covar=0.00007, n_init=5, random_state=42,
                        max_iter=90, tol=0.001)
# max_iter and tol are now set explicitly; reg_covar is lowered from 0.0001 to 0.00007

# Measure training time
train_opt_start = time.time()
opt_model.fit(X_scaled)
train_opt_end = time.time()
print("Training time: ", train_opt_end - train_opt_start, " s")

# Measure prediction speed
pred_opt_start = time.time()
opt_labels = opt_model.predict(X_scaled)
opt_probas = opt_model.predict_proba(X_scaled)
pred_opt_end = time.time()
print("Prediction speed: ", pred_opt_end - pred_opt_start, " s")

# Performance
print("Log-Likelihood: ", opt_model.score(X_scaled))
print("AIC: ", opt_model.aic(X_scaled))
print("BIC: ", opt_model.bic(X_scaled))

Data type:  float16  | Memory Usage:  1600  bytes
Training time:  0.07280659675598145  s
Prediction speed:  0.0007600784301757812  s
Log-Likelihood:  -0.4815075219378944
AIC:  340.6030087751578
BIC:  584.6784938997125


In [None]:
# Compare cluster assignment confidence
base_conf = base_probas.max(axis=1)
opt_conf = opt_probas.max(axis=1)

print("Baseline Cluster Assignment Confidence")
print(pd.Series(base_conf).describe())

print("-"*50)

print("Optimized Cluster Assignment Confidence")
print(pd.Series(opt_conf).describe())

Baseline Cluster Assignment Confidence
count    200.000000
mean       0.981629
std        0.070013
min        0.504017
25%        0.997215
50%        0.999951
75%        1.000000
max        1.000000
dtype: float64
--------------------------------------------------
Optimized Cluster Assignment Confidence
count    200.000000
mean       0.981756
std        0.069346
min        0.507706
25%        0.997243
50%        0.999951
75%        1.000000
max        1.000000
dtype: float64


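## Model Optimization Actions
- reduced the feature data type from float64 to float16
- set two hyperparameters explicitly: max_iter and tol
    - max_iter=90: caps the number of E-M iterations per initialization (the scikit-learn default is 100), which bounds training time
    - tol=0.001: convergence threshold at which E-M stops early; this equals the scikit-learn default, so it documents the setting rather than changing behavior
- modified an existing hyperparameter: reg_covar
    - lowered from 0.0001 to 0.00007: less regularization is added to the diagonal of each covariance matrix, which reduces smoothing and lets components fit more tightly; values that are too small can make the covariance estimates less stable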
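
Whether a lower `max_iter` actually shortens training depends on how many E-M iterations a fit needs before it meets `tol`. The sketch below checks this with the fitted attributes `converged_` and `n_iter_`. It uses a synthetic array (`X_demo`, an assumption) in place of the scaled mall data, so the printed iteration counts will not match the notebook's.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the scaled data: 200 samples, 4 features (assumption)
rng = np.random.default_rng(42)
X_demo = rng.standard_normal((200, 4))

runs = []
for cap in (100, 90):
    gmm = GaussianMixture(n_components=5, covariance_type="full",
                          reg_covar=1e-4, n_init=5, random_state=42,
                          max_iter=cap, tol=1e-3)
    gmm.fit(X_demo)
    # n_iter_ is the number of E-M steps used by the best initialization
    runs.append((cap, gmm.converged_, gmm.n_iter_))
    print(f"max_iter={cap}: converged={gmm.converged_}, n_iter={gmm.n_iter_}")
```

If `n_iter_` is well below the cap in both runs, the cap is not the binding constraint and any timing difference likely comes from elsewhere. If `converged_` is False, the cap stopped E-M before the tolerance was reached.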

## Evaluating Optimized Model
- training time: 0.073s
- prediction speed: 0.0008s
- performance: Log-likelihood -0.48, AIC 340.60, BIC 584.68
- memory usage: scaled data using 1600 bytes
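
The byte counts follow directly from the array shape: 200 rows x 4 features x 8 bytes for float64, and 2 bytes per value for float16. The sketch below reproduces those sizes on a synthetic array of the same shape (the real data is not loaded here) and also measures the rounding error each cast introduces. As far as I know, scikit-learn's input validation for mixture models accepts float64 and float32 and converts other dtypes to float64, so float16 probably reduces only the stored array, not the memory used during fitting.

```python
import numpy as np

# Synthetic stand-in with the same shape as the scaled mall data (assumption)
rng = np.random.default_rng(0)
X64 = rng.standard_normal((200, 4))  # float64 by default

results = {}
for dtype in (np.float64, np.float32, np.float16):
    X_cast = X64.astype(dtype)
    # Round-trip error: how far the stored values moved from the originals
    max_abs_err = float(np.abs(X_cast.astype(np.float64) - X64).max())
    name = np.dtype(dtype).name
    results[name] = (X_cast.nbytes, max_abs_err)
    print(f"{name:>8}: {X_cast.nbytes:>5} bytes, max rounding error {max_abs_err:.2e}")
```

float32 halves the footprint with a much smaller rounding error than float16, which makes it a reasonable middle ground when precision matters.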

As the results show, the optimized model trained slightly faster, predicted at roughly the same speed, and used less memory for the scaled data. Below is a comparison of before and after metrics:
- training time: 0.086s >> 0.073s
- prediction speed: 0.0007s >> 0.0008s
- performance: 
    - log-likelihood: -0.66 >> -0.48
    - AIC: 411.89 >> 340.60
    - BIC: 655.96 >> 584.68
- memory usage: 6400 bytes >> 1600 bytes
- cluster assignment confidence remained consistent between both models, indicating stable predictions
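
Similar confidence distributions do not by themselves show that both models place the same customers in the same clusters, and component indices can be permuted between fits. One way to check agreement is the adjusted Rand index, sketched below in plain Python. The function name `adjusted_rand_index` and the small label lists are illustrative and do not come from the notebook.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: trivially identical

    # Count pairs of points grouped together in both, in a, and in b
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (together_both - expected) / (maximum - expected)


# Same grouping with permuted cluster ids: perfect agreement
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# Unrelated grouping: agreement at or below chance level
different = adjusted_rand_index([0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1])
print(same, different)
```

In the notebook this would be called as `adjusted_rand_index(base_labels.tolist(), opt_labels.tolist())`. `sklearn.metrics.adjusted_rand_score` computes the same quantity. A value near 1 would support the claim that predictions are stable.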

I observe that the optimized model's training time is slightly lower (0.086s to 0.073s) and its prediction time is essentially unchanged (0.0007s to 0.0008s). Both timings come from single runs, so differences this small may be within measurement noise. The fit metrics improved: log-likelihood rose from -0.66 to -0.48, and AIC and BIC each fell by about 71 points, indicating a better fit to the data. Note that the optimized model is scored on the float16-rounded features, so the two sets of scores are not computed on exactly the same values. Casting to float16 reduces the scaled data from 6400 bytes to 1600 bytes, a 75% reduction. Better fit metrics and lower memory usage at similar speeds indicate that the optimization was successful.
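
The AIC and BIC values can be reproduced from the reported log-likelihoods, which makes it easier to see what drives the improvement. The sketch below uses the standard definitions, AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n, and relies on `score` returning the per-sample average log-likelihood. The helper names are hypothetical.

```python
import math


def gmm_full_n_parameters(n_components, n_features):
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    cov_params = n_components * n_features * (n_features + 1) // 2
    mean_params = n_components * n_features
    weight_params = n_components - 1  # weights sum to 1
    return cov_params + mean_params + weight_params


def aic_bic(mean_log_likelihood, n_samples, n_params):
    """AIC and BIC from the per-sample average log-likelihood."""
    total_log_likelihood = mean_log_likelihood * n_samples
    aic = -2 * total_log_likelihood + 2 * n_params
    bic = -2 * total_log_likelihood + n_params * math.log(n_samples)
    return aic, bic


k = gmm_full_n_parameters(n_components=5, n_features=4)
base_aic, base_bic = aic_bic(-0.6597161202802081, n_samples=200, n_params=k)
opt_aic, opt_bic = aic_bic(-0.4815075219378944, n_samples=200, n_params=k)

print(f"parameters: {k}")                                  # parameters: 74
print(f"baseline:  AIC {base_aic:.2f}, BIC {base_bic:.2f}")  # AIC 411.89, BIC 655.96
print(f"optimized: AIC {opt_aic:.2f}, BIC {opt_bic:.2f}")    # AIC 340.60, BIC 584.68
```

Both models have the same 74 parameters and the same 200 samples, so the penalty terms cancel. The drop of about 71.28 in both AIC and BIC comes entirely from the higher log-likelihood.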