# KMeans Clustering (From Scratch)

## 1. Import Required Libraries
In this section, we import the necessary libraries for loading data, applying KMeans, and visualizing the results. We also import the `KMeans` class from the `K_Means.py` file.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs  # For generating synthetic data
import sys
import os

# Add path to access K_Means.py
sys.path.append(os.path.abspath('../src/models'))
from K_Means import KMeans

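# Plotting settings
%matplotlib inline
# The 'seaborn' Matplotlib style name was deprecated in Matplotlib 3.6;
# seaborn's own theme gives a comparable look.
sns.set_theme()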

## 2. Generate or Load Data
The cell below loads the data from `placeholder_1.csv` in the repo and selects two feature columns as the input matrix `X`. To use your own data, point `pd.read_csv` at your CSV file and adjust the column names. For a quick demonstration without a CSV, synthetic data from `make_blobs` works as well.

In [None]:

df = pd.read_csv('../data/placeholder_1.csv')
X = df[['Feature_1', 'Feature_2']].values  # Adjust column names as needed
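
If the CSV is not available, a synthetic dataset can stand in for it. The sketch below is one way to do that with `make_blobs`. The sample count, number of centers, and spread are illustrative assumptions, and the column names are chosen to match the cell above.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Assumed parameters for illustration: 300 points around 4 centers in 2-D.
X, y_true = make_blobs(n_samples=300, centers=4, n_features=2,
                       cluster_std=0.8, random_state=42)

# Wrap the array in a DataFrame so later cells that index df by column name still work.
df = pd.DataFrame(X, columns=['Feature_1', 'Feature_2'])
```

`y_true` holds the generating blob of each point. KMeans never sees it, but it can be handy for a sanity check of the clustering.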

## 3. Apply KMeans (From Scratch)
We will use the `KMeans` class from `K_Means.py` to train the model on the data.

In [None]:
# Create KMeans model
kmeans = KMeans(n_clusters=4, n_init=10, max_iter=300, tol=1e-4, verbose=True, random_state=42)

# Train the model
kmeans.fit(X)

# Assign cluster labels to the data
df['Cluster'] = kmeans.labels_

# Display the first 5 rows with cluster labels
df.head()
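
`K_Means.py` is not reproduced in this notebook, so the following is only a minimal sketch of what a single run of Lloyd's algorithm typically looks like, not the repository's implementation. The function name `lloyd_kmeans`, the random-point initialization, and the toy data are assumptions made for illustration.

```python
import numpy as np

def lloyd_kmeans(X, n_clusters, max_iter=300, tol=1e-4, seed=42):
    """Hypothetical single-run KMeans (Lloyd's algorithm) for illustration."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as distinct random data points.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    history = []
    for _ in range(max_iter):
        # Assignment step: squared distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Inertia: sum of squared distances to the assigned centroid.
        history.append(d2[np.arange(len(X)), labels].sum())
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # stop once the centroids barely move
            break
    return centroids, labels, history

# Toy data: three Gaussian blobs (assumed for the demo).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                    for c in ([0, 0], [5, 5], [0, 5])])
centroids, labels, history = lloyd_kmeans(X_demo, n_clusters=3)
```

A parameter such as `n_init=10` would usually mean repeating a run like this from several initializations and keeping the one with the lowest final inertia.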

## 4. Visualize the Results
We will plot the data points colored by their assigned clusters and mark the centroids.

In [None]:
# Plot the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Feature_1', y='Feature_2', hue='Cluster', data=df, palette='Set1', s=100)
plt.scatter(kmeans.centroids_[:, 0], kmeans.centroids_[:, 1], s=300, c='black', marker='X', label='Centroids')
plt.title('KMeans Clustering Results (From Scratch)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()

## 5. Plot Loss History
We will plot the loss history (inertia) over iterations for the best run to observe the convergence of the algorithm.

In [None]:
# Get loss history
loss_history = kmeans.get_loss_history()

# Plot loss history
plt.figure(figsize=(8, 5))
plt.plot(loss_history, marker='o')
plt.title('Loss History (Inertia) Over Iterations')
plt.xlabel('Iteration')
plt.ylabel('Inertia')
plt.grid(True)
plt.show()
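
For reference, inertia is the sum of squared distances from each point to the centroid of its assigned cluster. The helper below is a small sketch of that computation. The name `compute_inertia` and the four-point example are hypothetical and not part of `K_Means.py`.

```python
import numpy as np

def compute_inertia(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]  # each row minus its own cluster's centroid
    return float((diffs ** 2).sum())

# Tiny worked example: two clusters, every point exactly 1 unit from its centroid.
X_small = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels_small = np.array([0, 0, 1, 1])
centroids_small = np.array([[0.0, 1.0], [10.0, 1.0]])

inertia_small = compute_inertia(X_small, labels_small, centroids_small)
print(inertia_small)  # 4.0
```

Because neither the assignment step nor the update step can increase this quantity, the curve in the plot should never go up from one iteration to the next.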

## 6. Conclusion
In this notebook, we applied the from-scratch KMeans implementation in `K_Means.py`. We loaded the data, trained the model, visualized the clustering results, and plotted the loss history to monitor the algorithm's convergence.