# KMeans Clustering with Scikit-learn Wrapper (KMeansSK)

## 1. Import Required Libraries
In this section, we import the libraries used for generating or loading data, clustering, and visualization. We also add `../src/models` to `sys.path` so that the custom `KMeansSK` class can be imported from `K_Means.py`.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs  # For generating synthetic data
import sys
import os

# Add path to access K_Means.py or the module containing KMeansSK
sys.path.append(os.path.abspath('../src/models'))  # Adjust path as needed
from K_Means import KMeansSK  # Assuming KMeansSK is in K_Means.py

# Plotting settings
%matplotlib inline
sns.set_theme()  # replaces plt.style.use('seaborn'); that bare style name was removed in newer Matplotlib releases

## 2. Generate or Load Data
We will generate synthetic data using `make_blobs` for demonstration purposes. If you have your own data (e.g., `placeholder_1.csv` in the repo), you can replace this section with loading your CSV file.

In [None]:
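# Generate 2-D synthetic data with 4 blobs (matches n_clusters=4 below)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)
df = pd.DataFrame(X, columns=['Feature_1', 'Feature_2'])
# Own data instead: df = pd.read_csv('../data/placeholder_1.csv'); X = df[['Feature_1', 'Feature_2']].values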

## 3. Apply KMeansSK (Scikit-learn Wrapper)
We will use the `KMeansSK` class to train the model on the data with custom inertia tracking and optional early stopping.

In [None]:
# Create KMeansSK model with early stopping enabled
kmeans_sk = KMeansSK(
    n_clusters=4,
    init='k-means++',
    n_init=10,
    max_iter=300,
    early_stopping=True,
    n_iter_no_change=10,
    tol=1e-4,
    verbose=True,
    random_state=42
)

# Train the model
kmeans_sk.fit(X)

# Assign cluster labels to the data
df['Cluster'] = kmeans_sk.predict(X)

# Display the first 5 rows with cluster labels
df.head()
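The internals of `KMeansSK` are not shown in this notebook. As a rough mental model only, the sketch below assumes the wrapper records the inertia after every Lloyd iteration and stops once inertia has failed to drop by more than `tol` for `n_iter_no_change` consecutive iterations. The function `lloyd_with_history` is hypothetical: it uses plain random initialization instead of k-means++, a single run instead of `n_init=10`, and an absolute inertia threshold (scikit-learn's own `tol` is defined on center movement instead).

```python
import numpy as np


def lloyd_with_history(X, n_clusters, max_iter=300, tol=1e-4,
                       n_iter_no_change=10, random_state=None):
    """Hypothetical sketch of Lloyd's k-means with per-iteration inertia tracking
    and inertia-based early stopping. Not the actual KMeansSK implementation."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)

    def assign(centers):
        # Squared Euclidean distance from every point to every center: shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        return labels, d2[np.arange(len(X)), labels].sum()

    # Plain random initialization: pick n_clusters distinct data points as centers
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    history, stalled = [], 0
    for _ in range(max_iter):
        labels, inertia = assign(centers)
        # Count consecutive iterations whose inertia gain is at most tol
        stalled = stalled + 1 if history and history[-1] - inertia <= tol else 0
        history.append(float(inertia))
        if stalled >= n_iter_no_change:
            break  # early stop: no meaningful improvement for n_iter_no_change steps
        # Update step: move each center to the mean of its points (keep it if empty)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    labels, _ = assign(centers)  # final assignment consistent with returned centers
    return centers, labels, history
```

For example, `centers, labels, history = lloyd_with_history(X, n_clusters=4, random_state=42)` followed by `plt.plot(history)` gives the same kind of curve as section 5; when early stopping fires, the curve ends in a flat tail of roughly `n_iter_no_change` points.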

## 4. Visualize the Results
We will plot the data points colored by their assigned clusters and mark the centroids, read from the `cluster_centers_` attribute of the wrapped scikit-learn model (`kmeans_sk.model`).

In [None]:
# Plot the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Feature_1', y='Feature_2', hue='Cluster', data=df, palette='Set1', s=100)
plt.scatter(kmeans_sk.model.cluster_centers_[:, 0], kmeans_sk.model.cluster_centers_[:, 1], 
            s=300, c='black', marker='X', label='Centroids')
plt.title('KMeansSK Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
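A quick sanity check on the plotted centroids is to recompute each one as the mean of the points assigned to it. This is a sketch assuming `kmeans_sk.model` is a fitted scikit-learn `KMeans` and that every cluster received at least one point; `centroids_from_labels` is a hypothetical helper, not part of `KMeansSK`.

```python
import pandas as pd


def centroids_from_labels(df, feature_cols, label_col='Cluster'):
    """Recompute each cluster's centroid as the mean of its member points.

    Rows are indexed by cluster label in ascending order, matching the row
    order of scikit-learn's ``cluster_centers_`` (row k belongs to label k).
    """
    return df.groupby(label_col)[feature_cols].mean()


# Tiny hand-made example: two clusters with two points each
toy = pd.DataFrame({
    'Feature_1': [0.0, 2.0, 10.0, 12.0],
    'Feature_2': [1.0, 3.0, 5.0, 7.0],
    'Cluster': [0, 0, 1, 1],
})
print(centroids_from_labels(toy, ['Feature_1', 'Feature_2']))
```

In the notebook, `np.allclose(centroids_from_labels(df, ['Feature_1', 'Feature_2']).to_numpy(), kmeans_sk.model.cluster_centers_)` should hold once the run has converged, since a converged k-means center is exactly the mean of its assigned points.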

## 5. Plot Inertia History
We will plot the inertia history tracked by `KMeansSK` to observe the convergence of the algorithm, including the effect of early stopping if triggered.

In [None]:
# Get inertia history
inertia_history = kmeans_sk.get_inertia_history()

# Plot inertia history
plt.figure(figsize=(8, 5))
plt.plot(inertia_history, marker='o')
plt.title('Inertia History Over Iterations (KMeansSK)')
plt.xlabel('Iteration')
plt.ylabel('Inertia')
plt.grid(True)
plt.show()
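Each point on the inertia curve is a within-cluster sum of squares, $\sum_i \lVert x_i - \mu_{c(i)} \rVert^2$, where $\mu_{c(i)}$ is the center assigned to point $x_i$. A minimal NumPy sketch of that quantity follows; the `inertia` helper is hypothetical and independent of `KMeansSK`.

```python
import numpy as np


def inertia(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # centers[labels] lines up each point with the center of its cluster
    return float(((X - centers[labels]) ** 2).sum())


# Toy check: points 0 and 1 sit 1 unit from center (1, 0); point 2 sits on (10, 0)
X_toy = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
print(inertia(X_toy, np.array([0, 0, 1]), np.array([[1.0, 0.0], [10.0, 0.0]])))  # 2.0
```

Assuming `kmeans_sk.model` is a fitted scikit-learn `KMeans`, `inertia(X, df['Cluster'].to_numpy(), kmeans_sk.model.cluster_centers_)` should be close to `kmeans_sk.model.inertia_`, which scikit-learn defines as the sum of squared distances of samples to their closest cluster center.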

## 6. Conclusion
In this notebook, we used the `KMeansSK` class, a wrapper around scikit-learn's `KMeans` with custom inertia tracking and optional early stopping. We generated synthetic data, trained the model, visualized the clustering results, and plotted the inertia history to monitor convergence.