In [1]:
# R code (mclust is an R package)

library(mclust, lib.loc = Sys.getenv("R_LIBS_USER"))



Package 'mclust' version 6.1.2
Type 'citation("mclust")' for citing this R package in publications.



In [2]:
# Load the flow features
flow <- read.csv("flow_features.csv")

# Select the 11 numeric feature columns used for clustering
X <- flow[, c("duration_seconds", "event_count", "unique_items",
              "view_count", "addtocart_count", "transaction_count",
              "view_to_cart_ratio", "cart_to_purchase_ratio",
              "events_per_minute", "hour_of_day", "day_of_week")]

# Standardize each column to mean 0, SD 1 (the spherical and diagonal
# covariance models in Mclust are not scale-invariant)
X_scaled <- scale(X)

# Fit Mclust with automatic BIC selection over all covariance models
# and the default range of G = 1:9 components
cat("Fitting Mclust: this takes ~27 minutes on 1.7M rows...\n")

system.time({
  # verbose is a logical flag: TRUE shows a text progress bar during fitting
  model <- Mclust(X_scaled, verbose = TRUE)
})

# Show results
cat("\nBEST MODEL FOUND:\n")
summary(model)

Fitting Mclust: this takes ~27 minutes on 1.7M rows...
fitting ...


   user  system elapsed 
1454.10   22.18 1595.49 


BEST MODEL FOUND:


---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust XXX (ellipsoidal multivariate normal) model with 1 component: 

 log-likelihood       n df      BIC      ICL
       20521268 1761675 77 41041428 41041428

Clustering table:
      1 
1761675 
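You can check the summary by hand. mclust reports BIC as `2 * loglik - df * log(n)`, where larger is better. A full-covariance Gaussian in d dimensions has d means and d(d + 1)/2 covariance parameters per component. With d = 11 and one component, df = 11 + 66 = 77. ICL equals BIC here because a one-component model has zero classification entropy.

For a single Gaussian fitted by maximum likelihood, the log-likelihood also fixes the log-determinant of the fitted covariance:

loglik = -(n/2)(d log 2π + log|Σ| + d)

The sketch below plugs in the printed, rounded values. For standardized data the fitted covariance is essentially a correlation matrix, so log|Σ| is at most about 0. A value far below zero suggests a nearly singular covariance, for example from linearly dependent columns. That is an inference from the numbers, offered as one plausible reason the search settled on a single component. The output itself does not state it.

```r
# Values copied from the printed summary (rounded as displayed)
loglik <- 20521268
n      <- 1761675
d      <- 11
G      <- 1

# Free parameters: G*d means + G*d*(d+1)/2 covariances + (G - 1) mixing weights
df  <- G * d + G * d * (d + 1) / 2 + (G - 1)
bic <- 2 * loglik - df * log(n)
cat("df  =", df, "\n")                       # 77, matching the summary
cat("BIC =", sprintf("%.1f", bic), "\n")     # close to the printed 41041428

# Implied log-determinant of the ML covariance, from
# loglik = -(n/2) * (d * log(2*pi) + logdet + d)
logdet <- -2 * loglik / n - d * log(2 * pi) - d
cat("log|Sigma| =", sprintf("%.2f", logdet), "\n")
```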

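A full-covariance Gaussian becomes singular in two cases:

- a column is constant, so its standard deviation is 0 and `scale()` turns it into `NaN`;
- a column is an exact linear combination of other columns.

Both are cheap to check with base R. A minimal sketch on synthetic data follows. The idea that `event_count` might equal `view_count + addtocart_count + transaction_count` is a hypothetical example, not something the notebook confirms. On the real data, run the same two checks on `X` and `X_scaled`.

```r
# Synthetic stand-in for the feature table (hypothetical data)
set.seed(1)
n <- 1000
view_count        <- rpois(n, 5)
addtocart_count   <- rpois(n, 1)
transaction_count <- rbinom(n, 1, 0.1)
X_demo <- data.frame(
  view_count, addtocart_count, transaction_count,
  # hypothetical: total events as the exact sum of the three event types
  event_count  = view_count + addtocart_count + transaction_count,
  constant_col = 1
)

# 1. Constant columns: SD of 0, which scale() would turn into NaN
sds <- vapply(X_demo, sd, numeric(1))
const_cols <- names(sds)[sds == 0]
cat("Constant columns:", const_cols, "\n")

# 2. Exact linear dependence: numerical rank below the column count
X_demo_scaled <- scale(X_demo[, setdiff(names(X_demo), const_cols)])
rk <- qr(X_demo_scaled)$rank
cat("Rank", rk, "of", ncol(X_demo_scaled), "columns\n")
```

A rank below the column count means at least one column can be dropped without losing information. Dropping it avoids a degenerate covariance.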
In [3]:
# 1. BIC plot (your paper figure)
png("BIC_plot.png", width=800, height=600)
# plot.Mclust's `main` is a logical flag, so add the custom title with title()
plot(model, what = "BIC")
title(main = "BIC: Optimal Clusters & Covariance")
dev.off()

# 2. Classification map
png("Clusters_map.png", width=800, height=600)
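# With 11 features this draws a scatterplot matrix of feature pairs coloured
# by cluster ("User Flow Tribes"); plot.Mclust's `main` is a logical flag,
# so a title string is not used
plot(model, what = "classification")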
dev.off()

# 3. Add cluster column
flow$cluster <- model$classification

# 4. Save everything
write.csv(flow, "flow_features_clustered.csv", row.names = FALSE)

# 5. Victory message
cat("DONE!\n")
cat("Clusters found :", model$G, "\n")
cat("Best model      :", model$modelName, "\n")
cat("Files created   : flow_features_clustered.csv, BIC_plot.png, Clusters_map.png\n")
cat("Open the PNGs → paste into your paper → done!\n")

DONE!
Clusters found : 1 
Best model      : XXX 
Files created   : flow_features_clustered.csv, BIC_plot.png, Clusters_map.png
Open the PNGs → paste into your paper → done!
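Once `flow$cluster` is attached, a size table plus per-cluster feature means shows what each component represents. A minimal sketch on a hypothetical five-row data frame follows. It borrows two column names from the feature list; on the real data, the same `table()` and `aggregate()` calls would run on `flow`. With the single-component result above, the output would be a single row.

```r
# Hypothetical toy data standing in for `flow` after clustering
flow_demo <- data.frame(
  duration_seconds = c(30, 45, 600, 720, 40),
  event_count      = c(2, 3, 25, 30, 2),
  cluster          = c(1, 1, 2, 2, 1)
)

# Number of rows assigned to each cluster
sizes <- table(flow_demo$cluster)
print(sizes)

# Mean of each feature within each cluster
profile <- aggregate(cbind(duration_seconds, event_count) ~ cluster,
                     data = flow_demo, FUN = mean)
print(profile)
```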

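The full model search in cell [2] took 1,595 s elapsed, about 27 minutes. One way to shorten it is to select the model on a random subsample, then assign every row with `predict()`.

A minimal sketch on synthetic two-group data follows. These settings are illustrative assumptions, not settings from the original analysis:

- subsample size of 2,000 rows;
- `G = 1:4`;
- `modelNames = c("EII", "VVV")`.

On the real data, `X_scaled` would take the place of `X_big`.

```r
library(mclust)

set.seed(42)
# Synthetic stand-in for X_scaled: two well-separated 2-D Gaussian groups
X_big <- rbind(
  matrix(rnorm(2 * 5000, mean = 0), ncol = 2),
  matrix(rnorm(2 * 5000, mean = 6), ncol = 2)
)

# Select the model on a random subsample (size chosen for illustration)
idx <- sample(nrow(X_big), 2000)
fit_sub <- Mclust(X_big[idx, ], G = 1:4,
                  modelNames = c("EII", "VVV"), verbose = FALSE)

# Classify every row with the model fitted on the subsample
pred <- predict(fit_sub, newdata = X_big)
cat("Components:", fit_sub$G, " Model:", fit_sub$modelName, "\n")
print(table(pred$classification))
```

The trade-off is that rare but distinct groups may be missing from a small subsample. Enlarging the subsample or repeating the selection over several seeds reduces that risk.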