Skip to content

Python implementation of Agglomerative Clustering algorithm for unsupervised learning. Hierarchical clustering method that merges similar clusters iteratively. Suitable for various data types and shapes, offering insights into hierarchical structures within datasets.

Notifications You must be signed in to change notification settings

sumony2j/Agglomerative_Clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Agglomerative Clustering

Introduction

Agglomerative clustering is a hierarchical clustering technique used for grouping similar data points into clusters. It starts with each data point as a single cluster and iteratively merges the closest clusters until only one cluster remains. Agglomerative clustering is versatile and capable of handling various types of data. This repository provides an overview of Agglomerative clustering along with examples and implementations in Python.

How Agglomerative Clustering Works

Agglomerative clustering works by iteratively merging clusters based on a linkage criterion until a stopping condition is met. The linkage criterion determines the distance between clusters and can vary depending on the specific algorithm used (e.g., single linkage, complete linkage, average linkage).

Detailed Steps:

  1. Initialization:

    • Start with each data point as a single cluster.
  2. Merge Clusters:

    • Compute the pairwise distances between all clusters.
    • Merge the two closest clusters based on the chosen linkage criterion.
    • Update the distance matrix to reflect the merged clusters.
    • Repeat this process until a stopping condition is met (e.g., a predefined number of clusters or a specific distance threshold).
  3. Hierarchy Construction:

    • As clusters are merged, a dendrogram is constructed to represent the hierarchical structure of the clusters.
    • The dendrogram provides insights into the relationships between clusters and can be used to determine the optimal number of clusters.

Key Parameters in Agglomerative Clustering

  • Linkage Criterion: The method used to compute the distance between clusters, such as single linkage, complete linkage, or average linkage.
  • Distance Metric: The metric used to compute the distance between data points within and between clusters, such as Euclidean distance, Manhattan distance, or cosine similarity.

Advantages of Agglomerative Clustering

  • Capable of handling various types of data and distance metrics.
  • Produces a hierarchical structure of clusters, providing insights into the relationships between clusters.
  • Does not require the number of clusters to be specified in advance.

Limitations of Agglomerative Clustering

  • Computationally expensive for large datasets, as it requires computing pairwise distances between all data points.
  • May be sensitive to the choice of linkage criterion and distance metric.

Applications of Agglomerative Clustering

  • Image segmentation and object detection.
  • Identifying natural groupings in biological data.
  • Customer segmentation in marketing.
  • Document clustering in natural language processing.
  • Anomaly detection in cybersecurity.

Datasets

This repository includes sample datasets in CSV format that can be used to practice Agglomerative clustering. The datasets contain relevant attributes for clustering tasks.

Repository Structure

└── Agglomerative/
    ├── CC GENERAL.csv
    ├── CreditCard_Dataset_Agglomerative.ipynb
    ├── README.md
    └── requirements.txt

Getting Started

Requirements

Ensure you have the following dependencies installed on your system:

  • JupyterNotebook

Installation

  1. Clone the Agglomerative repository:
git clone https://github.com/sumony2j/Agglomerative.git
  1. Change to the project directory:
cd Agglomerative
  1. Install the dependencies:
pip install -r requirements.txt

Running Agglomerative

Use the following command to run Agglomerative:

jupyter nbconvert --execute notebook.ipynb

Contributing

Contributions are welcome! Here are several ways you can contribute:

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a Git client.
    git clone https://github.com/sumony2j/Agglomerative.git
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.

Once your PR is reviewed and approved, it will be merged into the main branch.

About

Python implementation of Agglomerative Clustering algorithm for unsupervised learning. Hierarchical clustering method that merges similar clusters iteratively. Suitable for various data types and shapes, offering insights into hierarchical structures within datasets.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published