K-Means Clustering on Iris Dataset
This notebook clusters Iris species by their Sepal Length, Sepal Width, Petal Length and Petal Width. The Iris dataset is used to form the clusters.
The dataset consists of the petal and sepal measurements of 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray.
The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length and Petal Width.
The plot below uses the first two features. See here for more information on this dataset.
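As a minimal sketch, the dataset described above can be loaded via scikit-learn's bundled copy of Iris (assuming scikit-learn is installed):

```python
# Load the Iris dataset: 150 samples, 4 features per sample.
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # 150x4 ndarray: sepal length, sepal width, petal length, petal width

print(X.shape)            # (150, 4)
print(iris.target_names)  # the three species names
```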
K-Means is an iterative algorithm that partitions the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to exactly one group. It tries to make the data points within a cluster as similar as possible while keeping the clusters as different (far apart) as possible: each data point is assigned to a cluster so that the sum of squared distances between the data points and their cluster's centroid is minimised. The less variation there is within a cluster, the more homogeneous its data points are.

We then perform K-Means clustering, which groups customers with similar spending activity based on their age and annual income. K-Means first selects K random points from the data as initial centroids and assigns every point to its closest centroid, where closeness is measured by Euclidean distance. The centroids are then updated to the mean of their assigned points, and the assignment and update steps repeat until the clusters stabilise.
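The assign/update loop described above can be sketched with scikit-learn's KMeans; here it is run on the Iris measurements with K=3 (one cluster per species), rather than on the customer data:

```python
# Cluster the Iris measurements into K=3 groups with K-Means.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # 150 samples x 4 features

# n_init=10 restarts from 10 random initialisations and keeps the best result.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)  # assign each sample to its nearest centroid

print(km.cluster_centers_.shape)  # (3, 4): one centroid per cluster
print(km.inertia_)                # sum of squared distances to the centroids
```

The `inertia_` attribute is the quantity K-Means minimises: the total within-cluster sum of squared Euclidean distances.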
This is the code for Customer Segmentation Project made for THE SPARKS FOUNDATION.
Clone this Repository using,
git clone https://github.com/mayursrt/customer-segmentation-using-k-means.git
Install Jupyter from here, or use
pip install jupyter
After installing Jupyter, just run jupyter notebook
in the terminal, and you can visit the notebook in your web browser.
Create an environment from the requirements.txt
using pip with the following command, so you don't have to install the dependencies one by one:
pip install -r requirements.txt
If you need to use conda to create the environment, read the conda docs on managing environments here.
Install any missing dependencies using,
pip install pandas numpy matplotlib seaborn scikit-learn