A clustering tutorial with scikit-learn for beginners.
- Introduction to k-means, k-means++ and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
- Explore common drawbacks of k-means and present solutions for them:
  - Need to choose the right number of clusters.
  - Cannot handle noisy data and outliers.
  - Cannot handle non-spherical data.
- Introduction to methods for measuring cluster quality, both supervised (homogeneity and completeness, which need ground-truth labels) and unsupervised (the Silhouette Coefficient, which does not). This is part of section 2.
- Two simple exercises (k-means and DBSCAN) accompany the tutorial.
- Please refer to the slides in `slides/`, or review them on Google Drive; Chinese and English versions are available.
- Code is in `tutorial_and_labs/`; each `.ipynb` has a corresponding `.html`.
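
For a quick taste of the topics listed above, here is a minimal sketch. It is not taken from the notebooks. It runs k-means (with k-means++ initialisation) and DBSCAN on scikit-learn's two-moons toy dataset, then prints homogeneity, completeness and the Silhouette Coefficient for each. The dataset, `eps`, `min_samples` and the other parameter values are illustrative assumptions, not the tutorial's settings.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import (
    completeness_score,
    homogeneity_score,
    silhouette_score,
)

# Two interleaving half-circles: a common non-spherical toy dataset
# (an assumption for this sketch, not necessarily the tutorial's data).
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# k-means with k-means++ initialisation; the cluster count is fixed up front.
kmeans_labels = KMeans(
    n_clusters=2, init="k-means++", n_init=10, random_state=0
).fit_predict(X)

# DBSCAN needs no cluster count; eps and min_samples are illustrative guesses.
# Points it considers noise receive the label -1.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

for name, labels in [("k-means", kmeans_labels), ("DBSCAN", dbscan_labels)]:
    # Supervised measures: compare against the known labels.
    hom = homogeneity_score(y_true, labels)
    com = completeness_score(y_true, labels)
    # Unsupervised measure: only defined when at least two labels are present.
    sil = silhouette_score(X, labels) if len(set(labels)) > 1 else float("nan")
    print(f"{name}: homogeneity={hom:.2f} completeness={com:.2f} silhouette={sil:.2f}")
```

On data like this, a high Silhouette Coefficient does not necessarily mean the clustering matches the true grouping, since the silhouette tends to favour compact, convex clusters. That is one reason to look at supervised and unsupervised measures side by side.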