
# Week 9: Clustering Models - Practice Exercises (Student Version)

This notebook contains hands-on exercises designed to help you practice and apply clustering techniques.  
You will encounter **partially completed code cells** that require your input to complete the tasks.  
Instructions and hints are provided to guide you through each step.

---



## Exercise 1: k-Means Clustering on Iris Dataset

### Task:

1. Load the **Iris dataset**.
2. Apply **k-Means clustering** with 3 clusters.
3. Visualize the clusters (hint: scatter plot using two features).
4. Use the **Elbow method** to determine the optimal number of clusters.
5. Evaluate clustering performance using **Silhouette Score**.
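
The Elbow method in step 4 plots inertia, which scikit-learn reports as the sum of squared distances from each sample to its closest cluster center. As a rough sketch of what that number measures, the snippet below recomputes it by hand on hypothetical toy data from `make_blobs` (not the Iris data). The names `X_demo`, `km`, and `manual_inertia` are illustrative assumptions and are chosen so they do not overwrite the exercise variables.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical toy data: three 2-D blobs, used only for illustration.
X_demo, _ = make_blobs(n_samples=200, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# Look up the centroid assigned to every sample, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = float(np.sum((X_demo - assigned_centers) ** 2))

print("Manual inertia:      ", manual_inertia)
print("KMeans.inertia_ value:", km.inertia_)
```

Inertia can only go down or stay flat as k grows, which is why the exercise looks for the bend in the curve rather than its minimum.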

### Starter Code:


from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Step 1: Load the dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

# Step 2: Apply k-Means clustering
# Fill in the code below
kmeans = KMeans(n_clusters=____, random_state=42)
labels = kmeans.fit_predict(X)

# Step 3: Visualize clusters
sns.scatterplot(x=X.iloc[:, 0], y=X.iloc[:, 1], hue=labels, palette='deep')
plt.title('k-Means Clustering on Iris Dataset')
plt.show()

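# Step 4: Use the Elbow method to determine optimal k
# A separate name (km) keeps the fitted `kmeans` model from Step 2 intact.
inertia = []
k_range = range(1, 10)
for k in k_range:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X)
    inertia.append(km.inertia_)  # within-cluster sum of squared distances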

plt.plot(k_range, inertia, 'bo-')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k')
plt.show()

# Step 5: Calculate Silhouette Score
print("Silhouette Score:", silhouette_score(X, labels))
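
Step 5 reports a single Silhouette Score for one choice of k. The score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters, so it can also be compared across several values of k as a cross-check on the elbow plot. The following is a minimal sketch on hypothetical toy data (four well-separated blobs, not the Iris data); `X_demo`, `silhouette_by_k`, and `best_k` are illustrative names.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical toy data: four well-separated 2-D blobs.
demo_centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X_demo, _ = make_blobs(n_samples=400, centers=demo_centers,
                       cluster_std=1.0, random_state=0)

# The silhouette score needs at least two clusters, so k starts at 2.
silhouette_by_k = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    silhouette_by_k[k] = silhouette_score(X_demo, km.fit_predict(X_demo))

# Pick the k with the highest average silhouette.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("Best k by silhouette:", best_k)
```

On real data the peak is usually less clear-cut than on these blobs, so treat the result as one piece of evidence alongside the elbow plot.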



## Exercise 2: DBSCAN on a Noisy Dataset

### Task:

1. Generate a synthetic dataset using `make_moons` with noise.
2. Apply **DBSCAN clustering**.
3. Visualize the clusters and noise points.
4. Experiment by changing `eps` and `min_samples` values.
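
For step 4, one common heuristic (an optional aid, not a required part of this exercise) is a k-distance curve: for a chosen `min_samples`, compute each point's distance to its `min_samples`-th nearest neighbor, counting the point itself, and sort those distances. A bend in the sorted curve is often taken as a starting guess for `eps`. The sketch below uses hypothetical random data, and `X_demo`, `min_samples_demo`, and `k_distances` are illustrative names.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data, used only to show the mechanics.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))

min_samples_demo = 5  # illustrative value, not a recommendation

# Querying the training data returns each point as its own nearest
# neighbor (distance 0), which matches DBSCAN counting the point itself.
nn = NearestNeighbors(n_neighbors=min_samples_demo).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)

# The last column is the distance to the min_samples-th neighbor.
k_distances = np.sort(distances[:, -1])
print("Smallest and largest k-distance:", k_distances[0], k_distances[-1])
```

Plotting `k_distances` with `plt.plot` and reading off the value near the bend gives a candidate `eps` to try before experimenting further.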

### Starter Code:


import matplotlib.pyplot as plt  # repeated so this cell also runs on its own
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Step 1: Generate synthetic dataset
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Step 2: Apply DBSCAN clustering
# Fill in the missing parameters
dbscan = DBSCAN(eps=____, min_samples=____)
labels = dbscan.fit_predict(X)

# Step 3: Visualize clusters and noise points
# DBSCAN assigns the label -1 to noise points, so they appear as their own color.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title('DBSCAN Clustering on Noisy Dataset')
plt.show()
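
When experimenting with `eps` and `min_samples`, it helps to count clusters and noise points rather than judge by the plot alone. DBSCAN marks noise with the label -1, so that label has to be excluded when counting clusters. The sketch below uses hypothetical toy data (two tight groups plus three isolated points), and the parameter values are chosen for that toy data only; they are not suggested answers for the moons dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups and three far-away points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2))
outliers = np.array([[10.0, -10.0], [-10.0, 10.0], [20.0, 20.0]])
X_demo = np.vstack([group_a, group_b, outliers])

# Illustrative parameters for this toy data only.
demo_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X_demo)

# Exclude the noise label (-1) when counting clusters.
n_clusters = len(set(demo_labels) - {-1})
n_noise = int(np.sum(demo_labels == -1))
print("Clusters:", n_clusters, "Noise points:", n_noise)
```

Printing these two counts after each parameter change makes it easier to see when `eps` is too small (many noise points) or too large (clusters merging).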



## Exercise 3: Dimensionality Reduction + Clustering

### Task:

1. Load the **Breast Cancer dataset**.
2. Apply **PCA** to reduce dimensions to 2.
3. Apply **k-Means clustering** on reduced data.
4. Visualize clusters.
5. Evaluate using **Silhouette Score**.

### Starter Code:


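# Imports repeated here so this cell also runs on its own
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score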

# Step 1: Load Breast Cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)

# Step 2: Apply PCA
pca = PCA(n_components=____)
X_pca = pca.fit_transform(X)

# Step 3: Apply k-Means clustering
kmeans = KMeans(n_clusters=____, random_state=42)
labels = kmeans.fit_predict(X_pca)

# Step 4: Visualize
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='coolwarm')
plt.title('k-Means Clustering after PCA')
plt.show()

# Step 5: Evaluate clustering
print("Silhouette Score:", silhouette_score(X_pca, labels))
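
PCA picks directions of maximum variance, so features measured on large scales can dominate the components. The Breast Cancer features span very different ranges, so it may be worth comparing results with and without standardization. The sketch below illustrates the effect on hypothetical toy data with three independent features on different scales. Using `StandardScaler` here is an assumption of this sketch, not a step the exercise requires.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three independent features on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3)) * np.array([1000.0, 1.0, 0.01])

# PCA on the raw data: the large-scale feature dominates the first component.
raw_ratio = PCA(n_components=2).fit(X_demo).explained_variance_ratio_

# PCA after standardizing each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_demo)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("Explained variance ratio (raw):   ", raw_ratio)
print("Explained variance ratio (scaled):", scaled_ratio)
```

Checking `explained_variance_ratio_` on the fitted `pca` object in the exercise shows how much of the original variance the two components keep.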
