
mayursrt/k-means-on-iris-dataset


k-means-on-iris-dataset

K-Means Clustering on Iris Dataset


Overview

This notebook focuses on separating Iris species into clusters based on their Sepal Length, Sepal Width, Petal Length and Petal Width. The Iris Dataset is used to form the clusters.

The Iris Dataset

This dataset consists of the sepal and petal measurements (length and width) of 3 different types of irises (Setosa, Versicolour, and Virginica), with 50 samples of each, stored in a 150x4 numpy.ndarray.

The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length and Petal Width.

The plot below uses the first two features (Sepal Length and Sepal Width). See here for more information on this dataset.
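As a minimal sketch, assuming scikit-learn is installed (it is listed under Dependencies), the 150x4 array described above can be loaded with sklearn.datasets.load_iris. The variable names X and y are illustrative and not necessarily the ones used in the notebook.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # numpy.ndarray of shape (150, 4): one row per sample
y = iris.target  # integer species label (0, 1 or 2) for each row

print(X.shape)             # number of samples and features
print(iris.feature_names)  # column order: sepal length/width, petal length/width (cm)
print(iris.target_names)   # species names matching the labels in y
```

Because K-Means is unsupervised, only X is needed to form the clusters; y can be kept aside to compare the clusters with the true species afterwards.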

K-Means Clustering

K-Means is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to exactly one group. It tries to make the data points within a cluster as similar as possible while keeping the clusters as different (far apart) as possible. It assigns data points to clusters such that the sum of the squared distances between the data points and their cluster's centroid is as small as possible; in practice the algorithm reaches a local minimum that depends on the initial centroids. The less variation there is within clusters, the more homogeneous the data points in each cluster are. Here, K-Means Clustering is used to group Iris samples with similar sepal and petal measurements.

The algorithm starts by choosing K initial centroids, for example K data points selected at random. Each data point is then assigned to its closest centroid, where closeness is measured by Euclidean Distance, and each centroid is moved to the mean of the points assigned to it. These two steps repeat until the cluster assignments stop changing.
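The assign-and-update loop described above can be sketched from scratch in plain Python. This is an illustration of the standard (Lloyd-style) K-Means procedure, not the notebook's own code, which presumably relies on scikit-learn; the function names kmeans and inertia are hypothetical, and the initial centroids are assumed to be K randomly sampled data points.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: returns (centroids, labels) for k clusters."""
    rng = random.Random(seed)
    # Initialisation: k randomly chosen data points become the starting centroids.
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = mean(members)
    return centroids, labels


def inertia(data: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(data, labels))


# Usage: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(inertia(points, centroids, labels))
```

The inertia function computes the quantity K-Means tries to minimise. Since the result depends on the starting centroids, a common design choice is to run the algorithm from several seeds and keep the run with the lowest inertia.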

This is the code for a project made for THE SPARKS FOUNDATION.

Clone Repository

Clone this repository using:

git clone https://github.com/mayursrt/k-means-on-iris-dataset.git

Usage

Install Jupyter from here or use:

pip install jupyter

After installing Jupyter, run the following command in a terminal; the notebook interface then opens in your web browser:

jupyter notebook

Create Environment

Install the dependencies listed in requirements.txt with pip using the following command, so you don't have to install them one by one:

pip install -r requirements.txt

If you prefer to use conda to create the environment, read the conda documentation on managing environments here.

Dependencies

Install any missing dependencies using:

pip install pandas numpy matplotlib seaborn scikit-learn
