This project applies hierarchical, K-Means, and DBSCAN clustering to U.S. crime data. The aim is to identify the optimal number of clusters and draw inferences about crime patterns from four features: Murder, Assault, UrbanPop (urban population), and Rape.
- Murder: Murder rates across different regions of the United States.
- Assault: Assault rates across different regions of the United States.
- UrbanPop: Urban population across different regions of the United States.
- Rape: Rape rates across different regions of the United States.
- Used dendrograms to identify the number of clusters.
- Applied Agglomerative Clustering to group similar regions.
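
The two hierarchical steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the data is a synthetic stand-in with the same four column names, and the standardization step, the Ward linkage, and `n_clusters = 4` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real crime data); only the column names match.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Murder": rng.uniform(1, 17, 50),
    "Assault": rng.uniform(45, 340, 50),
    "UrbanPop": rng.uniform(30, 90, 50),
    "Rape": rng.uniform(7, 46, 50),
})

# Standardize so that no single feature dominates the distance computation.
X = StandardScaler().fit_transform(df)

# Build the linkage matrix; Ward linkage is an assumed choice.
Z = linkage(X, method="ward")

# no_plot=True returns the dendrogram structure without drawing it.
# Drop the flag (with matplotlib installed) to render the tree and
# pick a cut height visually.
tree = dendrogram(Z, no_plot=True)

# Hypothetical cluster count, as if read off the dendrogram.
n_clusters = 4
model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
df["hier_cluster"] = model.fit_predict(X)

print(df["hier_cluster"].value_counts().sort_index())
```

Using the same linkage for the dendrogram and for `AgglomerativeClustering` keeps the plotted tree consistent with the labels that are eventually assigned.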
- Used an elbow curve to find the optimal number of clusters.
- Clustered data using K-Means and analyzed intra-cluster similarities.
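
One possible way to produce the elbow curve data is shown below. It is a sketch under stated assumptions: the data is synthetic, the range of `k` from 1 to 10 is arbitrary, and `n_init` and `random_state` are illustrative settings rather than values taken from this project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real crime data); only the column names match.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Murder": rng.uniform(1, 17, 50),
    "Assault": rng.uniform(45, 340, 50),
    "UrbanPop": rng.uniform(30, 90, 50),
    "Rape": rng.uniform(7, 46, 50),
})
X = StandardScaler().fit_transform(df)

# Fit K-Means for each candidate k and record the inertia
# (within-cluster sum of squared distances to the nearest centroid).
inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(f"k={k:2d}  inertia={value:.2f}")
```

Plotting `inertias` against `k` gives the elbow curve; the "elbow" is the point after which adding clusters yields only small reductions in inertia.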
- Applied DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify arbitrarily shaped clusters.
- Tuned `eps` and `min_samples` parameters for optimal clustering.
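
Tuning `eps` and `min_samples` might look something like the small grid search below. The candidate values and the synthetic data are assumptions chosen for illustration; the parameter ranges actually explored in this project are not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real crime data); only the column names match.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Murder": rng.uniform(1, 17, 50),
    "Assault": rng.uniform(45, 340, 50),
    "UrbanPop": rng.uniform(30, 90, 50),
    "Rape": rng.uniform(7, 46, 50),
})
X = StandardScaler().fit_transform(df)

# Hypothetical candidate values; adjust to the scale of the real data.
results = []
for eps in (0.5, 1.0, 1.5):
    for min_samples in (3, 5):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        # DBSCAN marks noise points with the label -1.
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int((labels == -1).sum())
        results.append({
            "eps": eps,
            "min_samples": min_samples,
            "n_clusters": n_clusters,
            "n_noise": n_noise,
        })

print(pd.DataFrame(results))
```

Comparing the number of clusters and noise points across settings shows how sensitive the result is to each parameter: a small `eps` tends to label most points as noise, while a large one tends to merge everything into a single cluster.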
- Dendrograms for hierarchical clustering.
- Elbow curve for K-Means optimization.
- Scatter plots for cluster visualization.
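
A cluster scatter plot of the kind listed above could be drawn along these lines. This is only a sketch: the choice of `UrbanPop` and `Murder` as axes, the three K-Means clusters, the output filename `clusters.png`, and the synthetic data are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real crime data); only the column names match.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Murder": rng.uniform(1, 17, 50),
    "Assault": rng.uniform(45, 340, 50),
    "UrbanPop": rng.uniform(30, 90, 50),
    "Rape": rng.uniform(7, 46, 50),
})
X = StandardScaler().fit_transform(df)

# Hypothetical cluster count, used only to colour the points.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Plot two of the four features, coloured by cluster label.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["UrbanPop"], df["Murder"], c=labels, cmap="viridis")
ax.set_xlabel("UrbanPop")
ax.set_ylabel("Murder")
ax.set_title("K-Means clusters (synthetic data)")
fig.savefig("clusters.png", dpi=100)
```

Because the clustering uses all four features, a two-feature scatter plot shows only a projection; clusters that overlap in this view may still be separated along the other features.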
- Hierarchical and K-Means clustering identified consistent cluster patterns.
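- DBSCAN was sensitive to parameter tuning and struggled with noise in the data (silhouette score: -0.268). A negative score suggests that many points lie closer to a neighboring cluster than to their assigned one.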
- Urban population and crime rates influence clustering significantly.
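
A silhouette score like the one quoted for DBSCAN can be computed with scikit-learn's `silhouette_score`. The sketch below assumes synthetic data and illustrative parameters, so its numbers will not match the figure above; `safe_silhouette` is a hypothetical helper, not part of this project or of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real crime data); only the column names match.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Murder": rng.uniform(1, 17, 50),
    "Assault": rng.uniform(45, 340, 50),
    "UrbanPop": rng.uniform(30, 90, 50),
    "Rape": rng.uniform(7, 46, 50),
})
X = StandardScaler().fit_transform(df)


def safe_silhouette(X, labels):
    """Hypothetical helper: return the silhouette score, or None when undefined.

    silhouette_score needs at least 2 distinct labels and fewer labels
    than samples, which DBSCAN does not always produce.
    """
    n_labels = len(set(labels))
    if n_labels < 2 or n_labels >= len(labels):
        return None
    return silhouette_score(X, labels)


# Illustrative parameters, not the values used in this project.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# Note: DBSCAN noise points (label -1) are treated here as one more cluster.
print("K-Means silhouette:", safe_silhouette(X, kmeans_labels))
print("DBSCAN silhouette:", safe_silhouette(X, dbscan_labels))
```

Scores range from -1 to 1. Values near 1 indicate compact, well-separated clusters, while values below 0 indicate that points are, on average, closer to another cluster than to their own.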
- Clone the repository:
git clone https://github.com/R-Mahesh45/crime-data-clustering.git
- Install required libraries:
pip install pandas numpy matplotlib seaborn scipy scikit-learn
- Run the clustering analysis:
python clustering_analysis.py
- Hierarchical Clustering: X clusters were formed.
- K-Means Clustering: Y clusters were optimal based on the elbow method.
- DBSCAN Clustering: Challenging to form clusters due to noise.