# K-Means Clustering

---

## References

[Geeks for Geeks - K means Clustering - Introduction](https://www.geeksforgeeks.org/k-means-clustering-introduction/)

[Geeks for Geeks - K-means++ Algorithm](https://www.geeksforgeeks.org/ml-k-means-algorithm/)

[Scikit Learn - KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)

---

## Notes

#### Characteristics
- Unsupervised Learning
- Clustering task
- Group similar data points into clusters
- $k$ is the number of clusters
- Each cluster is represented by its centroid (center)
    - assign each data point to the cluster with the closest centroid
    - update each centroid to the mean of its assigned data points
- use K-means++ to get better initial centroids
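
The assign/update loop above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, its default values, and the choice to leave an empty cluster's centroid unchanged are assumptions made for this example.

```python
import math


def kmeans(points, centroids, max_iter=100, tol=1e-4):
    """Run the assign/update loop from the given initial centroids.

    points and centroids are lists of equal-length numeric sequences.
    Returns (centroids, labels).
    """
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        # Early stopping: stop once no centroid moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
print(labels)     # [0, 0, 1, 1]
```

Here the initial centroids are passed in explicitly, so any initialization method (random or K-means++) can be plugged in.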

#### Input & Output
- **Input**: feature matrix $X$ with shape (n_samples, n_features)
- **Output**: label vector $y$ with shape (n_samples,)

#### Parameters
- $k$ centroids

#### Hyperparameters

- $k$, number of clusters
- maximum iterations
- tolerance, threshold of change in centroids for early stopping
- initialization method for the centroids (e.g. random or K-means++)

#### Runtime Complexity
- **Training**: $O(i\cdot n \cdot k \cdot d)$
    - in each iteration,
        - assignment: for each of the $n$ data points, we compute the distance to each of the $k$ centroids at $O(d)$ per distance, so $O(n\cdot k\cdot d)$ in total
        - update: recomputing all centroids as the means of their assigned points takes $O(n\cdot d)$ in total, which is dominated by the assignment step
- **Inference**: $O(k \cdot d)$ per data point
    - we compute the distance from the data point to each of the $k$ centroids

where
- $i$: number of iterations
- $n$: number of samples
- $k$: number of clusters
- $d$: number of features
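
As a rough illustration of the inference cost, a single prediction can be sketched as below. The helper name `predict` and the example centroids are hypothetical, chosen only for this sketch.

```python
import math


def predict(x, centroids):
    """Return the index of the centroid closest to x."""
    # k distance computations, each over d features: O(k * d) overall.
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))


centroids = [(0.0, 0.5), (10.0, 10.5)]
print(predict((9.0, 9.0), centroids))  # 1
```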

#### Pros & Cons
- **Pros**: 
    - time complexity is linear in $n$, so it scales to large datasets
    - centroids and results are interpretable
    - structure is simple and does not require complex mathematics
    - performs well with spherical clusters
- **Cons**: 
    - choosing the optimal $k$ is not easy
    - performance degrades with non-spherical or unequally sized clusters
    - sensitive to the initial centroids
    - sensitive to outliers, since they pull the mean toward themselves
    - not guaranteed to find the global optimum; it may converge to a local one

---

## Mathematics

#### Euclidean Distance

$$\text{distance}(x,\mu)=\sqrt{\sum_{i=1}^{d}(x_i-\mu_i)^2}$$

where
- $x$: data point
- $\mu$: centroid
- $d$: number of features
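
A direct translation of the formula, as a small sketch (the function name `euclidean_distance` is chosen for this example; Python 3.8+ also provides `math.dist`, which computes the same quantity):

```python
import math


def euclidean_distance(x, mu):
    """Square root of the summed squared differences over all d features."""
    return math.sqrt(sum((x_i - mu_i) ** 2 for x_i, mu_i in zip(x, mu)))


print(euclidean_distance((1.0, 2.0), (4.0, 6.0)))  # 5.0
```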

#### Mean

$$\mu_j=\frac{1}{|C_j|}\sum_{x\in C_j}x$$

where
- $\mu_j$: new centroid of $j$-th cluster
- $C_j$: set of data points in $j$-th cluster
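
The update is a feature-wise average over the points in the cluster. A minimal sketch, assuming the cluster is non-empty (the name `centroid_mean` is hypothetical):

```python
def centroid_mean(cluster):
    """Feature-wise mean of the points in one cluster (assumed non-empty)."""
    return [sum(feature) / len(cluster) for feature in zip(*cluster)]


print(centroid_mean([(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]))  # [3.0, 5.0]
```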

#### K-Means++ Initialization

1. pick the first centroid uniformly at random from the dataset; $\mu_1=x_i\in X$
2. for every data point, compute the squared distance to the closest centroid chosen so far; $D(x_i)^2 = \min_j\|x_i-\mu_j\|^2$
3. pick one of the data points as the next centroid with probability $P(x_i)=\frac{D(x_i)^2}{\sum_{x\in X}D(x)^2}$
4. repeat 2-3 until we get $k$ centroids

Because points far from the existing centroids are more likely to be chosen, the initial centroids tend to be well separated, which can reduce overlapping clusters and speed up convergence.
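
The four steps can be sketched with the standard library alone. This is an illustrative version under stated assumptions: the names `kmeans_pp_init` and `squared_distance` and the `seed` parameter are made up for this example, and the input is assumed to contain at least $k$ distinct points (otherwise all weights can become zero and sampling fails).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centroids from points following the four steps above.

    Assumes points contains at least k distinct points.
    """
    rng = random.Random(seed)
    # Step 1: first centroid uniformly at random.
    centroids = [rng.choice(points)]
    # Step 4: repeat steps 2-3 until there are k centroids.
    while len(centroids) < k:
        # Step 2: squared distance from each point to its closest centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        # Step 3: sample the next centroid with probability proportional to D(x)^2.
        # Already chosen points have weight 0, so they cannot be picked again.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
centroids = kmeans_pp_init(points, k=3, seed=0)
print(centroids)
```

`random.choices` normalizes the weights internally, so the division by $\sum_{x\in X}D(x)^2$ does not need to be written out.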

---

## Comments