> **Question 1:** What is the difference between K-Means and
> Hierarchical Clustering? Provide a use case for each.
>
> **Answer 1 :-**  
> **K-Means Clustering**  
> **Definition:**  
> ● A **partitioning method** that divides data into *K* predefined
> clusters.  
> ● Each cluster is represented by a **centroid** (the mean of its
> points).
>
> **Key Points:**  
> ●​ Requires the number of clusters (*K*) as input.​  
> ●​ Works well with large datasets.​  
> ●​ Sensitive to initial centroid placement and outliers.​  
> ●​ Produces **spherical clusters** (best when clusters are convex and
> of similar size).​  
> ● Fast and efficient: each iteration costs roughly O(n · K · d), i.e.
> linear in the number of points *n*.
>
> **Use Case:**  
> ●​ **Customer Segmentation in Marketing​**  
> ○ Group customers based on purchasing behavior into clusters like
> “bargain buyers,” “loyal customers,” and “premium spenders.”
>
> **Hierarchical Clustering**  
> **Definition:**  
> ●​ A **tree-based method** that builds a hierarchy (dendrogram) of
> clusters.​
>
> ●​ Two types:​  
> ○​ **Agglomerative** (bottom-up: each point starts as a cluster, then
> merges).​  
> ○​ **Divisive** (top-down: one big cluster splits into smaller ones).​
>
> **Key Points:**  
> ●​ Doesn’t require specifying number of clusters initially (can be
> decided by cutting the dendrogram).​  
> ●​ Works well with **small to medium datasets**.​  
> ●​ Captures nested clusters and hierarchy.​  
> ● More computationally expensive: standard agglomerative algorithms
> need O(n²) memory and between O(n²) and O(n³) time, so they scale
> poorly to very large datasets.
>
> **Use Case:**  
> ●​ **Gene Expression Analysis in Biology​**  
> ○​ Group genes or proteins based on similarity in expression levels to
> study relationships between them.
>
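To make the contrast concrete, the sketch below fits both methods to the same toy data. The hand-picked blob centers, the Ward linkage, and the use of SciPy's `linkage`/`fcluster` are assumptions made for illustration, not part of the original answer. K-Means needs K before fitting, whereas the hierarchical tree is built once and then cut at different levels.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Toy data: three well-separated blobs (centers chosen for illustration)
centers = [(-10, -10), (0, 0), (10, 10)]
X, _ = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

# K-Means: K must be supplied before fitting
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: build the merge tree once (Ward linkage) ...
tree = linkage(X, method="ward")

# ... then cut the same tree at different levels without refitting
labels_3 = fcluster(tree, t=3, criterion="maxclust")
labels_2 = fcluster(tree, t=2, criterion="maxclust")

ari = adjusted_rand_score(kmeans_labels, labels_3)
print("K-Means clusters:", len(set(kmeans_labels)))
print("Tree cut into 3:", len(set(labels_3)), "| cut into 2:", len(set(labels_2)))
print("Agreement (ARI) between K-Means and the 3-cluster cut:", ari)
```

On data this clean the two methods should agree; the practical difference is that the tree also exposes the coarser 2-cluster grouping at no extra cost.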
> **Question 2:** Explain the purpose of the Silhouette Score in
> evaluating clustering algorithms.
>
> **Answer 2 :-**  
> **Silhouette Score – Purpose**  
> The **Silhouette Score** is a metric used to **evaluate the quality of
> clusters** formed by a clustering algorithm.
>
> It measures how well each data point fits within its assigned cluster
> compared to other clusters.
>
> **Formula**
>
> For each data point *i*:
>
> ● **a(i):** Average distance from point *i* to all other points in the
> same cluster (intra-cluster distance).
>
> ● **b(i):** The smallest average distance from point *i* to the points
> of any other cluster, i.e. the average distance to the nearest
> neighboring cluster.

$$s(i) = \frac{b(i) - a(i)}{\max\big(a(i),\, b(i)\big)}$$

> **Interpretation**
>
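> ● **Close to +1** → The point sits well inside its own cluster and far
> from neighboring clusters.
>
> ● **Around 0** → The point is on or near the boundary between two
> clusters.
>
> ● **Close to -1** → The point is probably misassigned (closer on
> average to another cluster than to its own).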
>
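The definitions of a(i) and b(i) can be checked by computing the score by hand and comparing it with scikit-learn's `silhouette_samples`. This is a minimal sketch: the five points and their labels are made up, and `silhouette_by_hand` is a hypothetical helper written only for this comparison.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Made-up points; the last one sits between the two groups
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [2.5, 3.0]])
labels = np.array([0, 0, 1, 1, 0])

def silhouette_by_hand(X, labels):
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                      # a(i) excludes the point itself
        if not own.any():
            continue                        # singleton cluster: score stays 0
        a = dist[i, own].mean()
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

by_hand = silhouette_by_hand(X, labels)
print(np.round(by_hand, 3))
print("matches sklearn:", np.allclose(by_hand, silhouette_samples(X, labels)))
```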
> **Purpose**
>
> ●​ Helps determine how **cohesive** (tight within clusters) and
> **separated** (distinct between clusters) the clusters are.​
>
> ●​ Provides a way to **compare different clustering algorithms** (e.g.,
> K-Means vs Hierarchical).​
>
> ●​ Helps in choosing the **optimal number of clusters (K)**.​
>
> **Example**
>
> ●​ If K=3 gives an average silhouette score of **0.65** and K=5 gives
> **0.35**, then K=3 is a better choice.
>
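A comparison like the one in the example can be produced with a short loop. The sketch below assumes toy data with three hand-placed blobs, so the numbers it prints are specific to that data and will not reproduce the 0.65 and 0.35 quoted above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed toy data: three well-separated blobs
X, _ = make_blobs(n_samples=300, centers=[(-8, -8), (0, 8), (8, -8)],
                  cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # mean silhouette over all points

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"K={k}: mean silhouette = {s:.3f}")
print("Highest mean silhouette at K =", best_k)
```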
> **Question 3:** What are the core parameters of DBSCAN, and how do
> they influence the clustering process?
>
> **Answer 3 :-**
>
> **Core Parameters of DBSCAN**  
> DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
> mainly relies on **two parameters**:  
> **1. ε (Epsilon / eps)**  
> ●​ Defines the **radius of the neighborhood** around a point.​  
> ●​ Determines how close points should be to be considered
> **neighbors**.​
>
> **Effect:**  
> ●​ **Small ε:** Many small clusters, lots of noise points.​  
> ●​ **Large ε:** Fewer, larger clusters (may merge distinct clusters).
>
> **2. minPts (Minimum Points)**  
> ●​ Minimum number of points required within ε-radius for a region to be
> considered a **core point**.​  
> ●​ Helps define **cluster density**.​
>
> **Effect:**  
> ● **Small minPts:** More clusters, risk of noise being treated as
> clusters.  
> ● **Large minPts:** Fewer clusters, may mark too many points
> as noise.
>
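The effect of ε is easy to observe by sweeping it with minPts fixed. The sketch below uses `make_moons` as stand-in data; the specific ε values and the helper name `summarize` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)

def summarize(eps, min_samples):
    """Return (number of clusters excluding noise, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

results = {}
for eps in (0.05, 0.2, 0.3, 1.5):
    results[eps] = summarize(eps, min_samples=5)
    print(f"eps={eps}: clusters={results[eps][0]}, noise={results[eps][1]}")
```

With a very small ε most points have too few neighbors and end up as noise; with a very large ε everything is absorbed into a single cluster.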
> **How These Influence Clustering**  
> ● A point is a **core point** if at least *minPts* points (counting
> itself) lie within distance ε of it.  
> ● A point is a **border point** if it lies within ε of a core point but
> has fewer than *minPts* points in its own ε-neighborhood.
>
> ● A point is **noise (outlier)** if it is neither a core nor a border
> point.
>
> Thus:  
> ●​ **ε controls neighborhood size** → affects cluster spread.​
>
> ●​ **minPts controls density requirement** → affects how strict the
> clustering is.
>
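The three point types can be recovered from a fitted scikit-learn model through its `core_sample_indices_` and `labels_` attributes. This is a sketch on `make_moons` data with assumed values `eps=0.2` and `min_samples=5`; it also checks the core-point definition directly from pairwise distances.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)
eps, min_pts = 0.2, 5          # assumed values for illustration
db = DBSCAN(eps=eps, min_samples=min_pts).fit(X)

is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True
is_noise = db.labels_ == -1
is_border = ~is_core & ~is_noise   # clustered, but not dense enough to be core

# Core definition checked by hand: >= min_pts points within eps, self included
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
neighbor_counts = (dist <= eps).sum(axis=1)
matches_definition = np.array_equal(neighbor_counts >= min_pts, is_core)

print("core:", is_core.sum(), "border:", is_border.sum(), "noise:", is_noise.sum())
print("core points match the definition:", matches_definition)
```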
> **Choosing Parameters**  
> ●​ **Rule of thumb for minPts:** Usually minPts ≥ dimensions + 1.​
>
> ● **Choosing ε:** Often determined using the **k-distance graph**:
> compute each point's distance to its k-th nearest neighbor (with k tied
> to minPts), sort these distances, plot them, and look for an "elbow".
>
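The k-distance values behind that graph can be computed with `NearestNeighbors`. The sketch below assumes `make_moons` data and `min_pts = 5`; it prints summary values instead of plotting so it runs without a display.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)
min_pts = 5            # assumed value for illustration
k = min_pts - 1        # neighbors besides the point itself

# Query k + 1 neighbors because each point's nearest neighbor
# within its own training set is itself (distance 0)
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, -1])   # sorted distance to the k-th other point

print("smallest k-distance:", round(float(k_dist[0]), 3))
print("median k-distance:  ", round(float(np.median(k_dist)), 3))
print("largest k-distance: ", round(float(k_dist[-1]), 3))
```

Plotting `k_dist` (for example with `plt.plot(k_dist)`) gives the curve; a candidate ε is read off where the curve bends sharply upward.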
> **Question 4:** Why is feature scaling important when applying
> clustering algorithms like K-Means and DBSCAN?
>
> **Answer 4 :-**
>
> **Why Feature Scaling Matters in Clustering**
>
> **1. Distance-Based Nature**  
> ●​ Both **K-Means** and **DBSCAN** rely on **distance metrics**
> (usually Euclidean distance) to measure similarity.​
>
> ●​ If features are on different scales, the feature with the **largest
> range** dominates the distance calculation.​
>
> Example:  
> ●​ Feature 1: Age (20–60)​
>
> ●​ Feature 2: Income (20,000–200,000)​
>
> ●​ Income will overshadow Age in distance calculation unless scaled.​
>
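A small numeric check shows the effect. The three customers below are made up, and the min-max ranges are the ones quoted in the example (Age 20–60, Income 20,000–200,000).

```python
import numpy as np

# Customers as (age, income); values are made up for illustration
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])   # 35 years older, nearly the same income
c = np.array([26.0, 52_000.0])   # almost the same age, slightly higher income

# Min-max scaling with the ranges quoted in the example
lo = np.array([20.0, 20_000.0])
hi = np.array([60.0, 200_000.0])

def scale(v):
    return (v - lo) / (hi - lo)

raw_ab, raw_ac = np.linalg.norm(a - b), np.linalg.norm(a - c)
scaled_ab = np.linalg.norm(scale(a) - scale(b))
scaled_ac = np.linalg.norm(scale(a) - scale(c))

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On raw values the 35-year age gap is invisible and `b` looks closer to `a`; after scaling, `c` is the nearer neighbor.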
> **2. K-Means Impact**  
> ●​ K-Means assigns points to the nearest centroid.​  
> ● Without scaling, the nearest-centroid assignment is driven almost
> entirely by the high-range features.  
> ● Scaling puts **all features on a comparable footing**.
>
> **3. DBSCAN Impact**  
> ●​ DBSCAN uses **ε (neighborhood radius)** to determine clusters.​  
> ●​ If features are not scaled, ε may be too small/large in certain
> dimensions, leading to:​  
> ○​ Wrong cluster formation​  
> ○​ More noise points​  
> ○​ Missed clusters​
>
> **4. Ensures Fair Feature Contribution**  
> ●​ Scaling prevents bias toward features with larger units (e.g.,
> kilometers vs meters, dollars vs percentages).​  
> ●​ Allows clustering to reflect **true similarity** rather than
> measurement scale.​
>
> **Common Scaling Techniques**
>
> ● **Standardization (Z-score):**  
> $z = \dfrac{x - \mu}{\sigma}$  
> → mean = 0, std = 1.
>
> ● **Min-Max Normalization:**  
> $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$  
> → values between 0 and 1.
>
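Both formulas correspond to scikit-learn's `StandardScaler` and `MinMaxScaler`. The sketch below applies them by hand to three made-up (age, income) rows and compares the result with the library output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up (age, income) rows
X = np.array([[20.0, 20_000.0], [35.0, 80_000.0], [60.0, 200_000.0]])

# Z-score by hand; StandardScaler uses the population std (ddof=0)
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
# Min-max by hand
mm_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

z_sklearn = StandardScaler().fit_transform(X)
mm_sklearn = MinMaxScaler().fit_transform(X)

print("z-score matches: ", np.allclose(z_manual, z_sklearn))
print("min-max matches: ", np.allclose(mm_manual, mm_sklearn))
print(np.round(mm_sklearn, 3))
```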
> **Question 5:** What is the Elbow Method in K-Means clustering and how
> does it help determine the optimal number of clusters?
>
> **Answer 5 :-**
>
> **Elbow Method in K-Means**
>
> **1. What It Is**
>
> ● The **Elbow Method** is a technique used to determine the **optimal
> number of clusters (K)** in K-Means.
>
> ●​ It’s based on analyzing the **Within-Cluster Sum of Squares
> (WCSS)**, also called **inertia**.​

$$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2$$

> (where $C_k$ is cluster $k$, and $\mu_k$ is its centroid).
>
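In scikit-learn this quantity is exposed as the `inertia_` attribute of a fitted `KMeans` model. As a sanity check, the sketch below evaluates the double sum directly on assumed `make_blobs` data and compares it with `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Outer sum over clusters k, inner sum over the points assigned to cluster k
wcss = 0.0
for k in range(kmeans.n_clusters):
    members = X[kmeans.labels_ == k]
    wcss += np.sum((members - kmeans.cluster_centers_[k]) ** 2)

print("manual WCSS:", round(float(wcss), 4))
print("inertia_:   ", round(float(kmeans.inertia_), 4))
```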
> **2. How It Works**
>
> 1. Run K-Means with different values of *K* (e.g., 1 to 10).
>
> 2. Compute the **WCSS (inertia)** for each *K*.
>
> 3.​ Plot **WCSS vs K**.​
>
> 4.​ Look for a point where the WCSS starts to **decrease slowly** —
> this point looks like an **“elbow”** in the curve.​
>
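The steps above can be sketched as a short loop. The data here is an assumed `make_blobs` sample with four centers, and the drop in WCSS between consecutive values of K is printed instead of plotted so the elbow can be read from the numbers.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

# Steps 1-2: fit K-Means for each K and record the WCSS (inertia)
wcss = {}
for k in range(1, 9):
    wcss[k] = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_

# Steps 3-4: the elbow is where the drop from K-1 to K becomes small
for k in range(2, 9):
    print(f"K={k}: WCSS={wcss[k]:9.1f}  drop={wcss[k - 1] - wcss[k]:9.1f}")
```

For plotting, `plt.plot(list(wcss), list(wcss.values()), marker="o")` produces the usual WCSS-vs-K curve.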
> **3. Interpretation**
>
> ●​ **Before the elbow:** Adding clusters reduces WCSS significantly
> (better clustering).​
>
> ●​ **After the elbow:** Adding clusters gives **diminishing returns**
> (not much improvement).​
>
> ●​ The **elbow point** suggests the **optimal K**.​
>
> **4. Why It’s Useful**
>
> ● Helps avoid **under-clustering** (too few clusters) and
> **over-clustering** (too many clusters).
>
> ● Provides a **visual, intuitive way** to select *K*.
>
> **Dataset: Use make_blobs, make_moons, and
> sklearn.datasets.load_wine() as specified.**
>
> **Question 6:** Generate synthetic data using make_blobs(n_samples=300,
> centers=4), apply KMeans clustering, and visualize the results with
> cluster centers.
>
> **Answer 6 :-**  
> import matplotlib.pyplot as plt  
> from sklearn.datasets import make_blobs  
> from sklearn.cluster import KMeans  
> import numpy as np
>
> \# Step 1: Generate synthetic data  
> X, y_true = make_blobs(n_samples=300, centers=4,
> cluster_std=0.60, random_state=42)
>
> \# Step 2: Apply KMeans clustering  
> kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)  
> y_kmeans = kmeans.fit_predict(X)
>
> \# Step 3: Get cluster centers  
> centroids = np.round(kmeans.cluster_centers\_, 2)
>
> \# Step 4: Visualize results with cluster centers  
> plt.figure(figsize=(8,6))  
> plt.scatter(X\[:, 0\], X\[:, 1\], c=y_kmeans, s=30, cmap='viridis')  
> plt.scatter(kmeans.cluster_centers\_\[:, 0\],
> kmeans.cluster_centers\_\[:, 1\],
> c='red', marker='X', s=200, label='Centroids')  
> plt.title("KMeans Clustering on make_blobs Data")  
> plt.legend()  
> plt.show()
>
> print("Cluster Centers (rounded):")  
> print(centroids)
>
> **Output**
>
> ● **Cluster centers:**  
> \[\[ 4.69 2.01\]  
> \[-2.61 8.99\]  
> \[-6.85 -6.85\]  
> \[-8.83 7.24\]\]
>
> ●​ **Visualization:​**  
> The scatterplot shows **4 clusters** in different colors, with **red X
> markers** representing the centroids.
>
> **Question 7:** Load the Wine dataset, apply StandardScaler, and then
> train a DBSCAN model. Print the number of clusters found (excluding
> noise).
>
> **Answer 7 :-**  
> from sklearn.datasets import load_wine  
> from sklearn.preprocessing import StandardScaler  
> from sklearn.cluster import DBSCAN  
> import numpy as np
>
> \# Step 1: Load the Wine dataset  
> wine = load_wine()  
> X = wine.data
>
> \# Step 2: Apply StandardScaler  
> scaler = StandardScaler()  
> X_scaled = scaler.fit_transform(X)
>
> \# Step 3: Train DBSCAN model  
> dbscan = DBSCAN(eps=1.8, min_samples=6) \# tuned parameters  
> y_dbscan = dbscan.fit_predict(X_scaled)
>
> \# Step 4: Count clusters (excluding noise)  
> n_clusters = len(set(y_dbscan)) - (1 if -1 in y_dbscan else 0)
>
> print("Number of clusters found (excluding noise):", n_clusters)
>
> **Output**  
> Number of clusters found (excluding noise): 6
>
> **Question 8:** Generate moon-shaped synthetic data using  
> make_moons(n_samples=200, noise=0.1), apply DBSCAN, and highlight the
> outliers in the plot.
>
> **Answer 8 :-**  
> import matplotlib.pyplot as plt  
> from sklearn.datasets import make_moons  
> from sklearn.cluster import DBSCAN
>
> \# Step 1: Generate moon-shaped synthetic data  
> X_moons, y_moons = make_moons(n_samples=200,
> noise=0.1, random_state=42)
>
> \# Step 2: Apply DBSCAN  
> dbscan_moons = DBSCAN(eps=0.3, min_samples=5) \# tune eps and min_samples  
> y_dbscan_moons = dbscan_moons.fit_predict(X_moons)
>
> \# Step 3: Visualization  
> plt.figure(figsize=(8,6))
>
> \# Plot clustered points  
> plt.scatter(X_moons\[y_dbscan_moons \>= 0, 0\],
> X_moons\[y_dbscan_moons \>= 0, 1\],
> c=y_dbscan_moons\[y_dbscan_moons \>= 0\], cmap='plasma', s=30,
> label="Clusters")
>
> \# Plot outliers (noise points = -1) in black with 'x'  
> plt.scatter(X_moons\[y_dbscan_moons == -1, 0\],
> X_moons\[y_dbscan_moons == -1, 1\],
> c='black', marker='x', s=60, label="Outliers")
>
> plt.title("DBSCAN on Moon-shaped Data (make_moons)")  
> plt.legend()  
> plt.show()
>
> \# Step 4: Print results  
> n_clusters = len(set(y_dbscan_moons)) - (1 if -1 in y_dbscan_moons else 0)  
> n_noise = list(y_dbscan_moons).count(-1)
>
> print("Number of clusters found (excluding noise):", n_clusters)  
> print("Number of outliers (noise points):", n_noise)
>
> **Expected Output**  
> ●​ **Plot:​**
>
> ○​ Two moon-shaped clusters (different colors).​
>
> ○​ Outliers (noise points) marked as **black ‘x’**.​
>
> ●​ **Printed Results (example):**  
> Number of clusters found (excluding noise): 2  
> Number of outliers (noise points): 4
>
> **Question 9:** Load the Wine dataset, reduce it to 2D using PCA, then
> apply Agglomerative Clustering and visualize the result in 2D with a
> scatter plot.
>
> **Answer 9 :-**  
> import matplotlib.pyplot as plt  
> from sklearn.datasets import load_wine  
> from sklearn.preprocessing import StandardScaler  
> from sklearn.decomposition import PCA  
> from sklearn.cluster import AgglomerativeClustering
>
> \# Step 1: Load Wine dataset  
> wine = load_wine()  
> X = wine.data
>
> \# Step 2: Standardize the features  
> scaler = StandardScaler()  
> X_scaled = scaler.fit_transform(X)
>
> \# Step 3: Reduce to 2D using PCA  
> pca = PCA(n_components=2)  
> X_pca = pca.fit_transform(X_scaled)
>
> \# Step 4: Apply Agglomerative Clustering  
> agg = AgglomerativeClustering(n_clusters=3) \# wine has 3 classes  
> y_agg = agg.fit_predict(X_pca)
>
> \# Step 5: Visualization  
> plt.figure(figsize=(8,6))  
> plt.scatter(X_pca\[:, 0\], X_pca\[:, 1\], c=y_agg, cmap="viridis", s=40)  
> plt.title("Agglomerative Clustering on Wine Dataset (2D PCA)")  
> plt.xlabel("PCA Component 1")  
> plt.ylabel("PCA Component 2")  
> plt.show()
>
> **Expected Output**
>
> ●​ **Scatter Plot:​**
>
> ○​ Data points plotted in **2D PCA space**.​
>
> ○​ Colors represent **clusters found by Agglomerative Clustering**.​
>
> ●​ Since the Wine dataset has **3 natural classes**, setting
> n_clusters=3 usually produces **three distinct groups** in the plot.
>
> **Question 10:** You are working as a data analyst at an e-commerce
> company. The marketing team wants to segment customers based on their
> purchasing behavior to run targeted promotions. The dataset contains
> customer demographics and their product purchase history across
> categories. Describe your real-world data science workflow using
> clustering:
>
> ● Which clustering algorithm(s) would you use and why?
>
> ● How would you preprocess the data (missing values, scaling)?
>
> ● How would you determine the number of clusters?
>
> ● How would the marketing team benefit from your clustering analysis?
>
> **Answer 10 :-**
>
> **Clustering Workflow for Customer Segmentation**
>
> **1. Choice of Clustering Algorithm**
>
> ● **K-Means:**
>
> ○ Works well for large datasets, efficient, and easy to interpret.
>
> ○ Suitable if we expect roughly spherical, well-separated clusters.
>
> ● **DBSCAN (alternative):**
>
> ○ Useful if customer groups have irregular shapes or the data contains
> noise (outliers). Because it uses a single ε, it is harder to tune when
> group densities differ widely.
>
> ● **Hierarchical Clustering (exploration):**
>
> ○ Can be used initially to explore natural cluster structures before
> choosing the final K.
>
> **Final Choice:**  
> ● Start with K-Means for scalability and interpretability.  
> ● Use Hierarchical for exploration and validation.
>
> **2. Data Preprocessing**  
> ● **Handle Missing Values:**  
> ○ Numerical: impute with mean/median.  
> ○ Categorical: impute with mode or add a “missing” category.  
> ● **Feature Scaling:**  
> ○ Apply StandardScaler or Min-Max Scaling so that features like
> “Age (20–70)” and “Annual Spending (₹1000–₹10,00,000)”
> contribute equally.  
> ● **Encoding Categorical Features:**  
> ○ Use One-Hot Encoding for customer demographics (e.g., gender,
> region).  
> ● **Feature Engineering:**  
> ○ Aggregate purchase behavior into features like average spend
> per category, purchase frequency, and recency of last purchase (RFM
> features).
>
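The imputation, scaling, and encoding steps can be combined into one scikit-learn preprocessing object. This is a minimal sketch: the four-row table and its column names (`age`, `annual_spend`, `region`) are hypothetical, and the categorical gap is filled with an explicit "missing" category, one of the two options listed above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer table with gaps in both column types
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0],
    "annual_spend": [12_000.0, 85_000.0, np.nan, 40_000.0],
    "region": ["north", "south", np.nan, "north"],
})

# Numeric columns: median imputation, then z-score scaling
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: explicit "missing" category, then one-hot encoding
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [("num", numeric, ["age", "annual_spend"]),
     ("cat", categorical, ["region"])],
    sparse_threshold=0.0,   # always return a dense array
)

X_ready = preprocess.fit_transform(df)
print(X_ready.shape)
print(np.round(X_ready, 2))
```

The resulting array can be passed directly to `KMeans`; keeping the steps in one object means the same transformations are reapplied consistently to new customers via `preprocess.transform`.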
> **3. Determining Number of Clusters**
>
> ● **Elbow Method:** Plot inertia (WCSS) vs K and look for the elbow point.  
> ● **Silhouette Score:** Choose the K with the highest average silhouette score.  
> ● **Domain Knowledge:** Marketing may want 3–6 actionable segments, not
> 20 tiny ones.
>
> **4. Benefits to Marketing Team**  
> ● **Targeted Promotions:**  
> ○ High spenders → premium offers.  
> ○ Discount shoppers → coupon-based promotions.  
> ○ Loyal customers → retention campaigns.  
> ● **Product Recommendations:**  
> ○ Cluster insights can reveal product affinities for personalized
> recommendations.  
> ● **Improved ROI:**  
> ○ Focus marketing budget on clusters with higher conversion
> potential.  
> ● **Customer Insights:**  
> ○ Understand demographic + behavioral patterns (e.g., young
> professionals buying gadgets vs families buying household
> items).
>
> **Python code :-**  
> import pandas as pd
>
> import numpy as np  
> from sklearn.preprocessing import StandardScaler  
> from sklearn.impute import SimpleImputer  
> from sklearn.cluster import KMeans  
> from sklearn.metrics import silhouette_score  
> import matplotlib.pyplot as plt
>
> \# -------------------------------  
> \# Step 1: Simulate a dataset  
> \# -------------------------------  
> np.random.seed(42)  
> data = {  
>     "Age": np.random.randint(18, 65, 200),  
>     "Annual_Income": np.random.randint(20000, 150000, 200),  
>     "Spending_Score": np.random.randint(1, 100, 200), \# proxy for purchase behavior  
>     "Electronics_Spend": np.random.randint(0, 5000, 200),  
>     "Clothing_Spend": np.random.randint(0, 5000, 200),  
>     "Grocery_Spend": np.random.randint(0, 5000, 200)  
> }  
> df = pd.DataFrame(data)
>
> print("Sample of dataset:")  
> print(df.head())
>
> \# -------------------------------  
> \# Step 2: Handle missing values (if any)  
> \# -------------------------------  
> imputer = SimpleImputer(strategy="median")  
> X_imputed = imputer.fit_transform(df)
>
> \# -------------------------------  
> \# Step 3: Scale features  
> \# -------------------------------  
> scaler = StandardScaler()  
> X_scaled = scaler.fit_transform(X_imputed)
>
> \# -------------------------------  
> \# Step 4: Find optimal number of clusters using Elbow + Silhouette  
> \# -------------------------------  
> wcss = \[\]  
> silhouette_scores = \[\]  
> K_range = range(2, 8)
>
> for k in K_range:  
>     kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)  
>     labels = kmeans.fit_predict(X_scaled)  
>     wcss.append(kmeans.inertia\_)  
>     silhouette_scores.append(silhouette_score(X_scaled, labels))
>
> \# Plot Elbow Method  
> plt.figure(figsize=(12,5))
>
> plt.subplot(1,2,1)  
> plt.plot(K_range, wcss, marker='o')  
> plt.title("Elbow Method (WCSS vs K)")  
> plt.xlabel("Number of Clusters")  
> plt.ylabel("WCSS")
>
> \# Plot Silhouette Scores  
> plt.subplot(1,2,2)  
> plt.plot(K_range, silhouette_scores, marker='o')  
> plt.title("Silhouette Score vs K")  
> plt.xlabel("Number of Clusters")  
> plt.ylabel("Silhouette Score")
>
> plt.tight_layout()  
> plt.show()
>
> \# -------------------------------  
> \# Step 5: Train final KMeans model  
> \# -------------------------------  
> best_k = 4 \# assumed here; read the actual value off the elbow/silhouette plots  
> kmeans = KMeans(n_clusters=best_k, random_state=42, n_init=10)
>
> df\["Cluster"\] = kmeans.fit_predict(X_scaled)
>
> print("\nCluster distribution:")  
> print(df\["Cluster"\].value_counts())
>
> print("\nCluster centroids (scaled space):")  
> print(kmeans.cluster_centers\_)
>
> **Output**  
> Sample of dataset:
>
> | | Age | Annual_Income | Spending_Score | Electronics_Spend | Clothing_Spend | Grocery_Spend |
> |---|---|---|---|---|---|---|
> | 0 | 56 | 72733 | 93 | 2896 | 2048 | 9 |
> | 1 | 46 | 85318 | 46 | 4185 | 4146 | 260 |
> | 2 | 32 | 129953 | 6 | 775 | 4780 | 1673 |
> | 3 | 60 | 109474 | 99 | 1625 | 716 | 2219 |
> | 4 | 25 | 43664 | 37 | 3069 | 4544 | 2060 |
>
> Cluster distribution:  
> Cluster  
> 0 55  
> 1 54  
> 3 49  
> 2 42  
> Name: count, dtype: int64
>
> Cluster centroids (scaled space):
>
> \[\[ 0.56168671 0.37082543 0.99612086 -0.3105289 -0.19813552 0.37980593\]  
> \[-0.33538872 -0.99932247 -0.67323957 -0.27642637 -0.13495824 0.5137487 \]  
> \[ 0.59006959 -0.18464394 -0.05517582 0.79110238 -0.27984004 -1.09766207\]  
> \[-0.76662654 0.84332776 -0.3288638 -0.02490177 0.61098939 -0.05163161\]\]
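>

Centroids in scaled space are hard for a marketing team to read. One option, sketched below under assumptions, is to map them back to the original units with `StandardScaler.inverse_transform` and to summarize each cluster's size and feature means. The three-column table here is a random synthetic stand-in, not the dataset used above, so its numbers carry no meaning beyond illustrating the steps.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table (values are random)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Age": rng.integers(18, 65, 200),
    "Annual_Income": rng.integers(20_000, 150_000, 200),
    "Electronics_Spend": rng.integers(0, 5_000, 200),
})
features = list(df.columns)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)
df["Cluster"] = kmeans.labels_

# Centroids mapped back from z-scores to the original units
centroids = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=features
).round(1)
print(centroids)

# Per-cluster feature means and sizes: a starting point for naming segments
profile = df.groupby("Cluster")[features].mean().round(1)
profile["size"] = df["Cluster"].value_counts().sort_index()
print(profile)
```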