Clustering in Machine Learning. Introduction to Clustering: It is basically a type of unsupervised learning method. An unsupervised learning method is a method in which we draw inferences from datasets consisting of input data without labeled responses. Generally, it is used as a process to find meaningful structure, explanatory underlying proces…

sayantann11/clustering-modelsfor-ML

Clustering Models for Machine Learning

Clustering

Introduction to Clustering

Clustering is a type of unsupervised learning method. In unsupervised learning, we draw inferences from datasets consisting of input data without labeled responses. Generally, clustering is used to find meaningful structures, explanatory underlying processes, generative features, and groupings inherent in a set of examples.

Clustering is the task of dividing a population or set of data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. In essence, it groups objects according to their similarity and dissimilarity.

For example, in the graph below, data points that lie close together can be classified into a single group. We can distinguish the clusters and see that there are three clusters in the picture.

Example Clusters
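The "more similar within a group than across groups" idea can be checked numerically. The sketch below is a minimal illustration, assuming Euclidean distance as the dissimilarity measure and points given as coordinate tuples; the function name `mean_intra_inter` is hypothetical. For a good grouping, the average within-group distance should be smaller than the average between-group distance.

```python
import math
from itertools import combinations

def mean_intra_inter(points, labels):
    """Return (average within-group distance, average between-group distance).

    Assumes Euclidean distance and at least one within-group and one
    between-group pair of points.
    """
    intra, inter = [], []
    # Visit every unordered pair of points once
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        (intra if a == b else inter).append(math.dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter(points, labels)
print(intra, inter)  # intra is 1.0, inter is about 10.02
```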

Why Clustering?

Clustering is important because it determines the intrinsic grouping among unlabeled data. There are no strict criteria for a good clustering; what counts as good depends on the user's needs. For instance, we could be interested in:

  • Finding representatives for homogeneous groups (data reduction)
  • Finding “natural clusters” and describing their unknown properties (“natural” data types)
  • Finding useful and suitable groupings (“useful” data classes)
  • Finding unusual data objects (outlier detection)

Clustering Methods

Density-Based Methods

These methods treat clusters as dense regions of points, separated from one another by regions of lower density. They can find clusters of arbitrary shape, label points in sparse regions as noise, and merge two clusters that are connected by a dense region.

  • Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure)
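To illustrate the density-based idea, here is a minimal from-scratch sketch in the spirit of DBSCAN. It assumes Euclidean distance, points as tuples, and the usual two parameters: `eps` (neighborhood radius) and `min_pts` (minimum number of points, including the point itself, for a core point). The neighbor search is brute force, so this is for reading rather than for large datasets.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one cluster label per point (-1 means noise)."""
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: grow through it
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (50, 50) has no neighbors within `eps`, so it is reported as noise, which is how density-based methods support outlier detection.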

Hierarchical-Based Methods

These methods build a tree-like hierarchy of clusters (often drawn as a dendrogram), in which new clusters are formed from previously formed ones.

  • Categories: Agglomerative (bottom-up approach), Divisive (top-down approach)
  • Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
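The agglomerative (bottom-up) strategy can be sketched in a few lines. This is a naive single-linkage version written only for illustration; it is not CURE or BIRCH, which use more elaborate cluster representations to scale to large data. It assumes Euclidean distance, stops once `k` clusters remain, and records each merge so the hierarchy can be traced from the leaves upward. The names `single_linkage` and `agglomerative` are hypothetical.

```python
import math

def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of points across a and b."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k):
    """Start with singleton clusters and repeatedly merge the closest pair until k remain."""
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, bottom-up
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j leaves index i valid
        clusters[i] = merged
    return clusters, merges

clusters, merges = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], k=3)
print(clusters)     # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(20, 20)]]
print(len(merges))  # 2
```

Running the loop down to `k=1` would record the full tree; the divisive (top-down) approach works in the opposite direction, splitting one all-inclusive cluster.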

Partitioning Methods

These methods divide the objects into k partitions, each of which forms one cluster. They work by optimizing an objective criterion, typically a similarity function in which distance is the main parameter.

  • Examples: K-means, CLARANS (Clustering Large Applications based upon Randomized Search)
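A common example of such an objective criterion is the within-cluster sum of squared distances from each point to its cluster's centroid, which is the quantity K-means tries to minimize (CLARANS instead works with medoids, i.e. actual data points used as cluster centers). The sketch below, with hypothetical helper names and Euclidean distance assumed, scores two candidate partitions of the same four points; the lower score indicates tighter clusters.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def within_cluster_sse(partition):
    """Sum of squared distances from each point to its own cluster's centroid."""
    total = 0.0
    for cluster in partition:
        c = centroid(cluster)
        total += sum(math.dist(p, c) ** 2 for p in cluster)
    return total

good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
print(within_cluster_sse(good))  # 4.0
print(within_cluster_sse(bad))   # 100.0
```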

Grid-Based Methods

In these methods, the data space is quantized into a finite number of cells that form a grid structure. Clustering operations are then performed on the grid rather than on individual points, which makes them fast and largely independent of the number of data objects (the cost depends mainly on the number of cells).

  • Examples: STING (Statistical Information Grid), WaveCluster, CLIQUE (CLustering In Quest)
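The grid idea can be sketched as three steps: bin each point into a cell, keep only cells with enough points, and join neighboring dense cells into clusters. This is a simplified, generic illustration for 2-D points, not an implementation of STING, WaveCluster, or CLIQUE; `cell_size`, `density_threshold`, and `grid_clusters` are hypothetical names. After the single binning pass, the component search touches only cells, which is where the speed of grid-based methods comes from.

```python
from collections import defaultdict

def grid_clusters(points, cell_size, density_threshold):
    """Cluster 2-D points by joining adjacent dense grid cells."""
    # 1. Quantize: map every point to the integer grid cell containing it
    cells = defaultdict(list)
    for p in points:
        key = (int(p[0] // cell_size), int(p[1] // cell_size))
        cells[key].append(p)
    # 2. Keep only dense cells; points in sparse cells are discarded
    dense = {key for key, members in cells.items() if len(members) >= density_threshold}
    # 3. Connected components over dense cells (8-neighborhood), using cells only
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            cx, cy = stack.pop()
            component.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append([p for cell in component for p in cells[cell]])
    return clusters

points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),  # two adjacent dense cells
          (5.1, 5.1), (5.2, 5.5),                          # one dense cell
          (9.5, 0.5)]                                      # a sparse cell
clusters = grid_clusters(points, cell_size=1.0, density_threshold=2)
print(sorted(len(c) for c in clusters))  # [2, 4]
```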

Clustering Algorithms

K-means Clustering Algorithm

K-means is one of the simplest unsupervised learning algorithms for solving clustering problems. It partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean (the cluster's centroid), which serves as a prototype of the cluster.

K-means Clustering
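K-means is usually computed with Lloyd's algorithm, which alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop moving. The sketch below assumes Euclidean distance, points as tuples, and random initialization from the input points; practical libraries often use smarter seeding such as k-means++ and several restarts. `kmeans` and its parameters are illustrative names, not a specific library's API.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm sketch: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old centroid
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Because the result depends on the initial centroids, K-means can converge to a local optimum on less well-separated data, which is why restarts with different seeds are common.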

Applications of Clustering in Different Fields

  • Marketing: Discover and characterize customer segments for marketing purposes.
  • Biology: Classify different species of plants and animals.
  • Libraries: Group books by topic and content.
  • Insurance: Identify groups of customers and their policies, and detect fraud.
  • City Planning: Group houses and study their values based on geographical location and other factors.
  • Earthquake Studies: Identify dangerous zones by studying earthquake-affected areas.

Feel free to contribute to this repository by adding more clustering methods, algorithms, and applications!
