Clustering Models for Machine Learning

Introduction to Clustering

Clustering is a type of unsupervised learning method. In unsupervised learning, we draw inferences from datasets consisting of input data without labeled responses. Generally, clustering is used to find meaningful structures, explanatory underlying processes, generative features, and groupings inherent in a set of examples.

Clustering is the task of dividing a population or set of data points into groups such that points in the same group are more similar to each other than to points in other groups. In other words, it groups objects by how similar or dissimilar they are.

For example, in the graph below, data points that lie close together can be assigned to a single group. We can tell the groups apart and see that there are 3 clusters in the picture.

Why Clustering?

Clustering is important because it reveals the intrinsic grouping in unlabeled data. There are no strict criteria for a good clustering; it depends on the user's needs. For instance, we could be interested in:

Finding representatives for homogeneous groups (data reduction)
Finding “natural clusters” and describing their unknown properties (“natural” data types)
Finding useful and suitable groupings (“useful” data classes)
Finding unusual data objects (outlier detection)

Clustering Methods

Density-Based Methods

These methods treat clusters as dense regions of the data space, separated from one another by regions of lower density. They tend to have good accuracy and can merge two clusters when their dense regions connect.

Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points to Identify Clustering Structure)
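To make the density idea concrete, here is a minimal sketch loosely following DBSCAN. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, not code from this repository, and the brute-force neighbor search is an assumption made for brevity (real implementations usually use a spatial index).

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Two dense squares and one isolated point (hypothetical toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point has too few neighbors to start a cluster and is not reachable from any dense point, so it is labeled as noise rather than forced into a group.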

Hierarchical-Based Methods

These methods organize clusters into a tree-like hierarchy. New clusters are formed from previously formed ones.

Categories: Agglomerative (bottom-up approach), Divisive (top-down approach)
Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing Clustering and using Hierarchies)
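The bottom-up (agglomerative) approach can be sketched as follows. This is a simplified illustration using single linkage, not an implementation of CURE or BIRCH; the function name `single_linkage` and its parameters are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges): clusters is a list of lists of point indices,
    merges records each merge as (cluster_a, cluster_b, distance).
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    merges = []
    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical toy data: two tight pairs and one far-away point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = single_linkage(points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

The `merges` list is the tree: reading it in order shows which previously formed clusters were combined at each level. A divisive method would run in the opposite direction, starting from one cluster and splitting it.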

Partitioning Methods

These methods partition the objects into k clusters, with each partition forming one cluster. The partition is chosen to optimize an objective criterion, typically a similarity function in which distance is the major parameter.

Examples: K-means, CLARANS (Clustering Large Applications based upon Randomized Search)
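One common objective criterion is the within-cluster sum of squared distances, which K-means tries to minimize. The sketch below only evaluates that objective for a given assignment; the function name `within_cluster_sse` and the toy data are assumptions for illustration.

```python
def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return total

# Hypothetical toy data: two vertical pairs, 10 units apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(within_cluster_sse(points, [0, 0, 1, 1]))  # 4.0   (nearby points grouped)
print(within_cluster_sse(points, [0, 1, 0, 1]))  # 100.0 (distant points grouped)
```

A lower value means the points sit closer to their cluster means, so the first partition is the better one under this criterion.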

Grid-Based Methods

In this method, the data space is divided into a finite number of cells that form a grid-like structure. Clustering operations are then performed on the cells rather than on individual points, which makes them fast and largely independent of the number of data objects.

Examples: STING (Statistical Information Grid), WaveCluster, CLIQUE (CLustering In Quest)
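The general grid idea can be sketched as: bin the points into cells, keep the dense cells, and join neighboring dense cells. This is a generic illustration, not STING, WaveCluster, or CLIQUE; `grid_clusters`, `cell_size`, and `min_count` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import product

def grid_clusters(points, cell_size, min_count):
    """Group points by connected dense grid cells; returns lists of point indices."""
    # 1. Assign every point to a cell; this is the only pass over the raw data.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)
    # 2. Keep only cells holding at least min_count points.
    dense = {key for key, members in cells.items() if len(members) >= min_count}
    # 3. Flood-fill over neighboring dense cells (sharing a face, edge, or corner).
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            cell = stack.pop()
            members.extend(cells[cell])
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(members))
    return clusters

# Hypothetical toy data; the last point falls in a sparse cell and is left out.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (8.0, 8.0), (8.5, 8.2), (4.0, 4.0)]
print(grid_clusters(points, cell_size=1.0, min_count=2))  # [[0, 1, 2, 3], [4, 5]]
```

After step 1 the work depends on the number of occupied cells, not on the number of points, which is where the speed of grid-based methods comes from.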

Clustering Algorithms

K-means Clustering Algorithm

K-means is one of the simplest unsupervised learning algorithms for solving clustering problems. It partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean (centroid), which serves as the prototype of the cluster.
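A minimal sketch of the standard iterative procedure (often called Lloyd's algorithm) is shown here, assuming Euclidean distance and random data points as starting centroids. The function name `kmeans` and the parameters `n_iter` and `seed` are illustrative assumptions, not part of this repository.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, n_iter=100, seed=0):
    """Return (centroids, labels) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated squares.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))  # [(0.5, 0.5), (10.5, 10.5)]
```

Which cluster receives label 0 depends on the random start, so only the grouping is meaningful, not the label numbers. On less cleanly separated data the result can also depend on the starting centroids, which is why K-means is commonly run several times with different seeds.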

Applications of Clustering in Different Fields

Marketing: Characterize and discover customer segments for marketing purposes.
Biology: Classification among different species of plants and animals.
Libraries: Clustering different books based on topics and information.
Insurance: Characterize customers and their policies, and identify fraud.
City Planning: Group houses and study their values based on geographical locations and other factors.
Earthquake Studies: Determine dangerous zones by learning about earthquake-affected areas.

References

Feel free to contribute to this repository by adding more clustering methods, algorithms, and applications!

sayantann11/clustering-modelsfor-ML
