## K-Means Clustering with scikit-learn

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
# **KMeans Parameters in scikit-learn**

The `KMeans` class in scikit-learn provides an implementation of the k-means clustering algorithm. Here’s an explanation of the key parameters:

---

## **Parameters**

### **1. `n_clusters` (default = 8)**
- **Description**: The number of clusters to form, and the number of centroids to generate.
- **Usage**: Specify the desired number of clusters (e.g., `n_clusters=3` for dividing data into 3 clusters).
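
One common way to pick `n_clusters` is to fit several values and compare the resulting inertia (within-cluster sum of squares). The sketch below assumes synthetic data with three well-separated blobs, which is not part of the original notes; on real data the drop in inertia is usually less clear-cut.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three tight, well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit one model per candidate k and record its inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(f"k={k}: inertia={model.inertia_:.1f}")
```

Inertia keeps shrinking as `k` grows, so look for the value of `k` after which the improvement flattens out rather than for the minimum.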

---

### **2. `init` (default = 'k-means++')**
- **Description**: Method for initializing the centroids.
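  - `'k-means++'`: Picks initial centers by sampling points with probability proportional to their squared distance from the centers already chosen. This spreads the centers out and usually speeds up convergence.
  - `'random'`: Chooses `n_clusters` observations at random from the data as initial centroids.
  - A **numpy array** of shape `(n_clusters, n_features)`: Allows you to manually specify initial centroids. A callable that returns such an array is also accepted.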
- **Usage**: Stick with `'k-means++'` for most cases as it's efficient and accurate.
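
Passing an array to `init` can be sketched as follows. The data points and the starting centroids are made-up values chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (hypothetical): two obvious groups of three points each.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

# Hypothetical starting centroids, one row per cluster: shape (n_clusters, n_features).
initial_centers = np.array([[0.0, 0.0], [10.0, 10.0]])

# n_init=1 because a fixed starting point makes repeated runs identical.
model = KMeans(n_clusters=2, init=initial_centers, n_init=1).fit(X)
labels = model.labels_

print("Labels:", labels)
print("Final centers:", model.cluster_centers_)
```

With explicit starting centroids, cluster `0` grows from the first row of the array and cluster `1` from the second, so label numbering is predictable.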

---

### **3. `n_init` (default = 'auto')**
- **Description**: Number of times the k-means algorithm is run with different centroid seeds. The final result is the run with the lowest inertia.
  - `'auto'`: Uses `1` run when `init` is `'k-means++'` or an array, and `10` runs when `init` is `'random'` or a callable.
  - If an integer is specified, it sets the number of runs directly.
- **Usage**: Increase this number if you want to improve the stability of clustering (e.g., when clusters are highly overlapping).
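
The effect of `n_init` can be sketched by comparing a single run against the best of several runs. The overlapping synthetic blobs below are an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption): four overlapping blobs, where a single
# random initialization may settle in a poor local optimum.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
X = np.vstack([c + rng.normal(size=(75, 2)) for c in centers])

# One run versus the best of ten runs, both starting from the same seed.
single = KMeans(n_clusters=4, init="random", n_init=1, random_state=0).fit(X)
multi = KMeans(n_clusters=4, init="random", n_init=10, random_state=0).fit(X)

print(f"n_init=1:  inertia={single.inertia_:.2f}")
print(f"n_init=10: inertia={multi.inertia_:.2f}")
```

The two numbers may be identical when the first run already finds a good solution. Extra runs only help when some initializations get stuck.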

---

### **4. `max_iter` (default = 300)**
- **Description**: Maximum number of iterations for a single k-means run.
- **Usage**: Increase this if the algorithm takes longer to converge on your dataset. However, 300 is sufficient in most cases.
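
To judge whether `max_iter` is too low, you can compare it with the fitted model's `n_iter_` attribute. This is a minimal sketch on synthetic data; the variable `hit_cap` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption): two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
               rng.normal(loc=6.0, size=(100, 2))])

max_iter = 300
model = KMeans(n_clusters=2, n_init=1, max_iter=max_iter, random_state=0).fit(X)

# n_iter_ reports how many iterations the selected run actually used.
print("Iterations used:", model.n_iter_)

# If the run used every allowed iteration, it may have stopped before converging.
hit_cap = model.n_iter_ >= max_iter
print("Stopped at the iteration cap:", hit_cap)
```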

---

### **5. `tol` (default = 0.0001)**
- **Description**: Relative tolerance used to declare convergence.
  - Convergence is declared when the Frobenius norm of the difference between the cluster centers of two consecutive iterations falls below the tolerance.
- **Usage**: Use a smaller value (e.g., `tol=1e-5`) for more precise clusters at the cost of additional iterations.
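
The idea behind a centroid-shift stopping rule can be sketched with plain NumPy. The function `centers_converged` is a hypothetical helper written for illustration, not scikit-learn's internal code, and the library's own check may differ in details such as how `tol` is scaled.

```python
import numpy as np

def centers_converged(old_centers, new_centers, tol):
    """Illustrative check: compare the squared centroid shift with a threshold."""
    # For a 2-D array, np.linalg.norm defaults to the Frobenius norm.
    shift = np.linalg.norm(new_centers - old_centers)
    return bool(shift ** 2 <= tol)

old = np.array([[0.0, 0.0], [1.0, 1.0]])
tiny_move = np.array([[0.0, 0.001], [1.0, 1.0]])
big_move = np.array([[0.0, 0.1], [1.0, 1.0]])

print(centers_converged(old, tiny_move, tol=1e-4))  # centroids barely moved
print(centers_converged(old, big_move, tol=1e-4))   # centroids still moving
```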

---

### **6. `verbose` (default = 0)**
- **Description**: Controls the verbosity of the output during training.
  - `0`: Silent mode (no output).
  - `1` or higher: Outputs details of the algorithm's progress.
- **Usage**: Use this when debugging or needing insight into how the algorithm is progressing.

---

### **7. `random_state` (default = None)**
- **Description**: Controls the random number generation used for centroid initialization.
  - If an integer is given, results are reproducible across runs (e.g., `random_state=42`).
  - If `None`, NumPy's global random state (`numpy.random`) is used, so results can differ between runs.
- **Usage**: Use this for reproducibility when comparing results.
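
A quick way to see the effect of `random_state` is to fit the same model twice with the same seed and compare the results. The data below is synthetic and assumed for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption): unstructured points, so the outcome depends on initialization.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Two independent fits with the same seed.
first = KMeans(n_clusters=3, init="random", n_init=1, random_state=42).fit(X)
second = KMeans(n_clusters=3, init="random", n_init=1, random_state=42).fit(X)

same_labels = np.array_equal(first.labels_, second.labels_)
print("Identical labels:", same_labels)
```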

---

### **8. `copy_x` (default = True)**
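- **Description**: Controls whether the input `X` may be modified while the data is centered before distances are computed. Centering first improves numerical accuracy.
  - `True`: The original data is not modified.
  - `False`: The original data is centered in place and restored before the function returns, which can introduce small numerical differences. A copy is still made if `X` is not C-contiguous, or if it is sparse but not in CSR format.
- **Usage**: Leave as `True` unless memory usage is a concern.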

---

### **9. `algorithm` (default = 'lloyd')**
- **Description**: Algorithm used for clustering.
  - `'lloyd'`: Standard EM-style k-means algorithm (the default).
  - `'elkan'`: Uses the triangle inequality to skip distance computations, which can be faster on datasets with well-defined clusters. It needs more memory because it allocates an extra array of shape `(n_samples, n_clusters)`.
  - `'auto'` and `'full'`: Older aliases that were deprecated in scikit-learn 1.1 and removed in 1.3. Use `'lloyd'` instead.
- **Usage**: Try `'elkan'` for speed when clusters are well separated and the extra memory is acceptable.

---

## **Example Usage**

```python
from sklearn.cluster import KMeans

# Example data: four 2-D points
data = [[1, 2], [3, 4], [5, 6], [7, 8]]

# KMeans clustering
kmeans = KMeans(
    n_clusters=3,
    init='k-means++',
    n_init=10,
    max_iter=300,
    tol=0.0001,
    random_state=42,
    algorithm='lloyd',  # 'auto' was deprecated in scikit-learn 1.1 and removed in 1.3
)
kmeans.fit(data)

# Results
print("Cluster centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
```