# Clustering
Clustering is an unsupervised machine learning technique used to group similar data points into clusters. It identifies patterns or structures in the data without relying on predefined labels. Clustering algorithms aim to maximize intra-cluster similarity and minimize inter-cluster similarity.

- Type of Learning: Unsupervised Learning.
- Objective: Group data into clusters based on similarity or distance metrics.
- Input: Unlabeled data.
- Output: A set of clusters, where each data point belongs to exactly one cluster (hard clustering) or to several clusters with varying degrees of membership (soft clustering).
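
To make "distance metrics" concrete, here is a minimal sketch of two common choices. Euclidean distance is the one named later for K-Means. Manhattan distance is added only as a contrasting example. The function names and the sample points are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (an alternative metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)  # hypothetical sample points
d_euclid = euclidean(p, q)     # 5.0
d_manhat = manhattan(p, q)     # 7.0
```

The two metrics can rank neighbors differently, so the choice of metric can change which points end up grouped together.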

## Types of Clustering
- Partitioning-Based: Divides data into non-overlapping clusters.
    - K-Means, K-Medoids.
- Density-Based: Identifies clusters as dense regions in the data.
    - DBSCAN, OPTICS.
- Hierarchical: Creates a hierarchy of clusters.
    - Agglomerative Clustering, Divisive Clustering.
- Model-Based: Assumes data is generated by a mixture of underlying probability distributions.
    - Gaussian Mixture Models (GMMs).

| Algorithm        | Description                                                                 |
|------------------|-----------------------------------------------------------------------------|
| K-Means          | Partitions data into K clusters by minimizing the sum of squared distances between each point and its cluster centroid. |
| DBSCAN           | Groups points that are closely packed together and labels points in low-density regions as noise (outliers). |
| Agglomerative    | Hierarchical approach that merges smaller clusters into larger ones.        |
| Gaussian Mixture | Assumes data is generated from a mixture of Gaussian distributions.         |
| Mean-Shift       | Groups data by iteratively shifting points toward the nearest density peak (mode) in feature space. |


## K-Means Clustering
K-Means aims to partition a dataset into K distinct clusters, where each data point belongs to the cluster with the nearest mean (centroid).
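
The "sum of squared distances within clusters" that K-Means minimizes can be written out as follows. The notation is introduced here for illustration: $C_k$ is the set of points assigned to cluster $k$, $\mu_k$ is its centroid, and $x_i$ is a data point.

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$

Each assignment or centroid update leaves $J$ unchanged or lowers it, which is why the procedure converges, although possibly to a local rather than a global minimum.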

*How It Works:*
- Initialization: Choose the number of clusters, K, and initialize K centroids randomly.
- Assignment Step: Assign each data point to the nearest centroid (using a distance metric like Euclidean distance).
- Update Step: Recalculate the centroids as the mean of the points in each cluster.
- Repeat: Alternate between the assignment and update steps until convergence (when centroids stop changing significantly or the maximum number of iterations is reached).
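
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production implementation. The function name `kmeans`, its parameters (`max_iters`, `tol`, `seed`), the sample points, and the policy of keeping the old centroid when a cluster becomes empty are all choices made for this example.

```python
import math
import random


def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iters):
        # Assignment step: label each point with its nearest centroid's index.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Empty cluster: keep the old centroid (an assumed policy).
                new_centroids.append(centroids[j])

        # Convergence check: stop once no centroid moved more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization. For real workloads, a library implementation such as scikit-learn's `KMeans` is usually a better choice, since it offers k-means++ initialization and multiple restarts.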