<center>
    
<img src="https://1.bp.blogspot.com/-fHdsJ8Q5TFU/WjqTHcKqZ-I/AAAAAAAAAic/_tVg-_c5XjcU96uWkMlzvkJ-yY3kyx2JgCLcBGAs/s1600/K-Means-Clustering-In-Machine-Learning.jpg" height=500 width=500/>

<br>
<img src="https://media.giphy.com/media/12vVAGkaqHUqCQ/giphy.gif"/>

</center>

## Table of Contents

1. [Problem Statement](#section1)<br>
 
2. [Importing packages](#section2)<br>
    
3. [Data Loading and Description](#section3)<br>
    - 3.1 [Description of the Dataset](#section301)<br>
    - 3.2 [Pandas Profiling before Data Preprocessing](#section302)<br>
4. [Preprocessing](#section4)<br>
    - 4.1 [Dropping the Highly Correlated Columns](#section401)<br>
    - 4.2 [Encode the categorical feature](#section402)<br>
    - 4.3 [Pandas Profiling after Data Preprocessing](#section403)<br>
    
5. [Data Exploration](#section5)<br>
    - 5.1 [Scatterplot of Grad.Rate vs Room.Board](#section501)
    - 5.2 [Scatterplot of P.Undergrad vs Outstate](#section502)
    
6. [K-means Clustering Tree](#section6)<br>
    - 6.1 [k-means Clustering Use case](#section601)<br>
    - 6.2 [Overview : What is Clustering?](#section602)<br>
        - 6.2.1 [Types of Clustering](#section60201)
    - 6.3 [Types of Clustering Algorithms](#section603)<br>
    - 6.4 [Introduction to K-means Algorithm](#section604)<br>
    - 6.5 [Business Cases](#section605)<br>
    - 6.6 [Algorithm](#section606)<br>
        - 6.6.1 [Cluster Assignment](#section60601)<br>
        - 6.6.2 [Move Centroid](#section60602)<br>
    - 6.7 [Choosing K](#section607)<br>
    - 6.8 [How good is K-means?](#section608)<br>
    - 6.9 [Introduction to Hierarchical Clustering](#section609)<br>
        - 6.9.1 [How to measure closeness of points?](#section60901)<br>
        - 6.9.2 [How to calculate distance between two clusters?](#section60902)<br>
        - 6.9.3 [Algorithm Explained](#section60903)<br>
        - 6.9.4 [How many clusters to form?](#section60904)<br>
        - 6.9.5 [Good Cluster Analysis](#section60905)<br>
7. [K-means Cluster Creation](#section7)<br>
    - 7.1 [Normalize data](#section701)<br>
    - 7.2 [Evaluation](#section702)<br>
    - 7.3 [Create a confusion matrix](#section703)<br>
    - 7.4 [Question: What about the K Value?](#section704)<br>
    - 7.5 [Choosing appropriate value of k](#section705)<br>
    - 7.6 [Clustering Visualization](#section706)<br>
        - 7.6.1 [Dimension Reduction using PCA](#section70601)
    - 7.7 [Centroid Visualization](#section707)<br>
    
8. [Conclusion](#section8)<br>

<a id=section1></a>
## 1. Problem Statement

We are given the __U.S. News and World Report’s College Data__ about different __colleges and universities__, and using this information our task is to __predict__ whether each college or university is a __Private University__ or a __Public University__.

__Imp Note__: Under normal circumstances you use the K-means algorithm because you don't have __labels__. In this case we will use the labels to get an idea of how well the algorithm performed (so that we can build a real intuition for K-means clustering), but you usually won't be able to do this with K-means.

<img src="http://ssd6.org/files/2016/06/high-school-graduation-traditions.jpg"/>

<a id=section2></a>
## 2. Importing packages                                          

In [None]:
import numpy as np                                   # Implements multi-dimensional arrays and matrices
np.set_printoptions(precision=4)                     # To display values only up to four decimal places.

import pandas as pd                                  # For data manipulation and analysis
pd.set_option('mode.chained_assignment', None)       # To suppress pandas chained-assignment warnings.
pd.set_option('display.max_colwidth', None)          # To display the full contents of each column (None = no limit).
pd.options.display.max_columns = 40                  # To display up to 40 columns.

import pandas_profiling                              # Registers the DataFrame.profile_report() method

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')            # To apply seaborn's whitegrid style to the plots ('seaborn-whitegrid' was renamed in newer matplotlib).
plt.rc('figure', figsize=(10, 8))     # Set the default figure size of plots.
%matplotlib inline


import warnings
warnings.filterwarnings('ignore')     # To suppress all the warnings in the notebook.

from sklearn.metrics import classification_report, confusion_matrix

<a id=section3></a>
## 3. Data Loading and Description

In [None]:
# Importing the Dataset
college_data = pd.read_csv("/kaggle/input/us-news-and-world-reports-college-data/College.csv",index_col=0)

In [None]:
college_data.head()

<a id=section301></a>
### 3.1 Description of the Dataset

| Column Name  | Description |
| ------------ |:-----------:|
| Private      | A factor with levels No and Yes indicating a private or public university |
| Apps         | Number of applications received |
| Accept       | Number of applications accepted |
| Enroll       | Number of new students enrolled |
| Top10perc    | Pct. new students from top 10% of H.S. class |
| Top25perc    | Pct. new students from top 25% of H.S. class |
| F.Undergrad  | Number of full-time undergraduates |
| P.Undergrad  | Number of part-time undergraduates |
| Outstate     | Out-of-state tuition |
| Room.Board   | Room and board costs |
| Books        | Estimated book costs |
| Personal     | Estimated personal spending |
| PhD          | Pct. of faculty with Ph.D.’s |
| Terminal     | Pct. of faculty with terminal degree |
| S.F.Ratio    | Student/faculty ratio |
| perc.alumni  | Pct. alumni who donate |
| Expend       | Instructional expenditure per student |
| Grad.Rate    | Graduation rate |



In [None]:
college_data.info()

- The ```info``` function gives us the following insights into the data:
  - There are a total of 777 samples (rows) and 18 columns in the dataset.
  - One column is of __object__ dtype.
  - The remaining columns are of __int__ and __float__ dtype.
  - None of the features has missing values.

In [None]:
college_data.describe()

- The ```describe``` function gives us the following insights into the data:
  - The count, mean, standard deviation, minimum, quartiles and maximum of each numeric feature.
  - A rough idea of the skewness of each feature from its __Mean__ and __Median__ (the 50% quartile): if the __Mean__ is greater than the __Median__, the feature is right-skewed, and vice versa.
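
The mean-versus-median rule of thumb can be checked numerically. This is a minimal sketch, assuming a DataFrame with numeric columns such as `college_data`; the helper name `skew_summary` is hypothetical, and it relies on pandas' `Series.skew()` (sample skewness, where a positive value indicates a long right tail).

```python
import pandas as pd

def skew_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Compare mean and median of each numeric column alongside pandas' skewness."""
    numeric = df.select_dtypes(include="number")
    summary = pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "skew": numeric.skew(),
    })
    # Mean above median is a rough hint of a long right tail (positive skew)
    summary["mean_gt_median"] = summary["mean"] > summary["median"]
    return summary

# Toy example: one large value pulls the mean above the median
toy = pd.DataFrame({"x": [1, 2, 2, 3, 3, 3, 50]})
s = skew_summary(toy)
print(s)
```

On the real data this would be called as `skew_summary(college_data)`; the `skew` column is a more direct measure than the mean/median comparison alone.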

<a id=section302></a>
### 3.2 Pandas Profiling before Data Preprocessing

<img src="https://raw.githubusercontent.com/insaid2018/Term-2/master/images/Pandas%20profiling.png" height="500" width="500"/>

- Here, we will perform **Pandas Profiling before preprocessing** our dataset, so we will name the **output file** as __data_before_preprocessing.html__. 


- The file will be stored in the directory of your notebook. Open it using the jupyter notebook file explorer and take a look at it and see what insights you can develop from it. 


- Or you can **output the profiling report** in the **current jupyter notebook**, as shown in the code below. 

In [None]:
# Performing pandas profiling before data preparation
# Saving the output as data_before_preprocessing.html

# To output pandas profiling report to an external html file.
'''
profile = college_data.profile_report(title='Pandas Profiling before Data Preprocessing')
profile.to_file(output_file="data_before_preprocessing.html")
'''

# To output the pandas profiling report on the notebook.

college_data.profile_report(title='Pandas Profiling before Data Preprocessing', style={'full_width':True})

**Observations from Pandas Profiling before Data Preprocessing** <br><br>
__Dataset info__:
- Number of variables: 19
    - __Note__: This is one more than the 18 columns reported by `info()` because pandas_profiling also counts the index column (the college names) as a variable.
    
- Variable types
    - __14__ Numeric
    - __1__ Text
    - __1__ Boolean
    - __3__ Rejected (high correlation)
    
- Rejected variables

    - __Apps__ is highly correlated with __Accept__ (ρ = 0.943450572)
    - __Enroll__ is highly correlated with __Accept__ (ρ = 0.9116366634)
    - __F.Undergrad__ is highly correlated with __Enroll__ (ρ = 0.964639652)

- Number of observations: 777
- Missing cells: 0 (0.0%)

- __Accept__ Feature:
    - Right Skewed 
    - High value of kurtosis suggests __long tail__ and possibly the presence of outliers in the feature.
<center><h3>Outlier Detection Formula</h3></center>
<center><img src="https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/outliers.png"/></center>
    
- __Expend__ Feature:
    - Percentage of unique values : 95.8%.
    - High value of kurtosis suggests __long tail__ and possibly the presence of outliers in the feature.
    - Right Skewed
- __Grad.Rate__:
    - Skewness value: -0.11, which means the data is nearly symmetric.
    - Kurtosis value: -0.2, which means the data has light tails, i.e. it follows a __platykurtic distribution__.
    
- The __Outstate__ feature is nearly symmetric.

- __P.Undergrad__:
    - Skewness value is close to 6, meaning the feature is strongly right-skewed.
    - High value of kurtosis suggests __long tail__ and possibly the presence of outliers in the feature.
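
The formula above is shown only as an image. Assuming it is the usual Tukey IQR rule (values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged), it can be sketched in pandas as below. The helper names are hypothetical; passing `k=3` gives the "extreme outlier" fences (Q1 - 3*IQR, Q3 + 3*IQR) used later in this notebook.

```python
import pandas as pd

def iqr_fences(series: pd.Series, k: float = 1.5):
    """Return the (lower, upper) Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values that fall outside the Tukey fences."""
    lower, upper = iqr_fences(series, k)
    return series[(series < lower) | (series > upper)]

toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(iqr_fences(toy))       # fences for k = 1.5
print(flag_outliers(toy))    # only the value 100 lies outside the fences
```

On the real data, `flag_outliers(college_data['P.Undergrad'], k=3)` would list only the extreme outliers of that feature.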
    


<a id=section4></a>
## 4. Preprocessing the data

<a id=section401></a>

### 4.1 Dropping the Highly Correlated Columns
We drop the columns that pandas profiling rejected for high correlation, i.e.
- __Apps__
- __Enroll__
- __F.Undergrad__
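
Before these columns are dropped, the correlated pairs can be cross-checked directly in pandas. A minimal sketch: it lists column pairs whose absolute Pearson correlation exceeds a threshold. The function name is hypothetical, and the 0.9 threshold is an assumption chosen to roughly match the ρ values reported by the profiling above.

```python
import pandas as pd

def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """Return (col_a, col_b, |rho|) for numeric column pairs with |rho| > threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    # Walk the upper triangle only, so each pair is reported once
    pairs = [
        (cols[i], cols[j], corr.iloc[i, j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if corr.iloc[i, j] > threshold
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]})
result = highly_correlated_pairs(toy)
print(result)   # only the (a, b) pair is listed
```

Running `highly_correlated_pairs(college_data)` before the `drop` call should surface pairs similar to those listed above, though the exact set depends on the chosen threshold.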

In [None]:
# Dropping the highly correlated features
college_data.drop(['Apps','Enroll','F.Undergrad'],inplace=True,axis=1)

In [None]:
college_data.head()

<a id=section402></a>
### 4.2 Encode the categorical feature

In [None]:
college_data['Private'].value_counts()

__Yes__: Private University
<br>
__No__: Public University

In [None]:
# Map the feature to integer values (Yes -> 1, No -> 0).
college_data['Private']=college_data['Private'].map({'Yes':1,"No":0})

In [None]:
college_data.head()

<a id=section403></a>
### 4.3 Pandas Profiling after Data Preprocessing

In [None]:
# To output the pandas profiling report on the notebook.

college_data.profile_report(title='Pandas Profiling after Data Preprocessing', style={'full_width':True})

**Observations from Pandas Profiling after Data Preprocessing** <br><br>
__Dataset info__:
- Number of variables: 16
    - __Note__: As before, pandas_profiling counts the index column as a variable, so this is one more than the 15 remaining columns.
    
- Variable types
    - __14__ Numeric
    - __1__ Text
    - __1__ Boolean

<a id=section5></a>
## 5. Data Exploration


<a id="section501"></a>
### 5.1 Scatterplot of Grad.Rate vs Room.Board

In [None]:
sns.set_style('whitegrid')
sns.lmplot(x='Room.Board', y='Grad.Rate', data=college_data, hue='Private',
           palette='coolwarm', height=6, aspect=1, fit_reg=False)   # 'height' replaced the old 'size' argument

From the above scatter plot we can see:
- As the __Room.Board__ cost increases, the __Graduation Rate__ starts decreasing, and this is most common in Private Universities.
- Most of the data has a maximum ```Grad.Rate``` of around __100__, but one point has a ```Grad.Rate``` near __120__, which is a possible __outlier__ (a graduation rate above 100% is impossible, so this value is suspect).
    - Checking __pandas-profiling__ and the __outlier formula__, this data point lies only just beyond the usual outlier threshold. Since we already have limited data, we do not drop it; we only consider __extreme outliers__, i.e. values above __Q3 + 3*IQR__ or below __Q1 - 3*IQR__.
    

<a id="section502"></a>
### 5.2 Scatterplot of P.Undergrad vs Outstate  

In [None]:
sns.set_style('whitegrid')
sns.lmplot(x='Outstate', y='P.Undergrad', data=college_data, hue='Private',
           palette='coolwarm', height=6, aspect=1, fit_reg=False)   # 'height' replaced the old 'size' argument

From the above scatter plot we can see:
- The number of part-time undergraduates (```P.Undergrad```) has a few extreme values, the largest beyond `20000`. Refer to pandas profiling and the outlier formula to confirm them; below we treat colleges with more than `10000` part-time undergraduates as outliers and drop them.

#### Drop the P.Undergrad outliers

In [None]:
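# Index labels (college names) of colleges with more than 10000 part-time undergraduates
c = college_data[college_data['P.Undergrad'] > 10000].index.values
print(c)

# Drop these rows (for this dataset: Northeastern University,
# University of Minnesota Twin Cities and University of South Florida)
college_data.drop(index=c, inplace=True)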

In [None]:
college_data.head()

In [None]:
college_data.shape

<a id=section6></a>
## 6. K-means Clustering Tree

<a id=section601></a>
### 6.1 k-means Clustering Use case

Have you come across a situation when a Chief Marketing Officer of a company tells you – __“Help me understand our customers better so that we can market our products to them in a better manner!"__<br>
- If the person had asked me to calculate Life Time Value (LTV) or the propensity to cross-sell, I wouldn’t have blinked. But this question looked very broad to me!<br>
- You are _not looking for __specific insights__ into a phenomenon_; what you are looking for are __structures within the data _without them being tied down_ to a specific outcome__.<br>
- The method of __identifying similar groups of data__ in a data set is called __clustering__. Entities in each group are __comparatively more similar__ to entities of that group than those of the other groups. In this article, I will be taking you through the __types of clustering__, __different clustering algorithms__ and a _comparison between two of the __most commonly used__ cluster methods_.

![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/clustering%201.png)

<a id="section602"></a>
### 6.2 Overview : What is Clustering?

Clustering is the task of __dividing the population__ or __data points__ into _a number of groups_ such that _data points in the same group are more similar to each other_ than to data points in other groups. 
> __In simple words, the aim is to segregate groups with similar traits and assign them into clusters__. 

![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/clustering%202.png)

Let’s understand this with an example.<br>
1. Suppose you are the __head of a rental store__ and wish to __understand the preferences__ of your customers to __scale up your business__. 
2. Is it possible for you to look at the _details of each customer and devise a unique business strategy_ for each one of them? __Definitely not__. 
3. But what you can do is __cluster__ all of your _customers into, say, 10 groups based on their purchasing habits_ and use a __separate strategy__ for _the customers in each of these 10 groups_. 
4. This is what we call __clustering__.

<a id="section60201"></a>
#### 6.2.1 Types of Clustering

Clustering can be divided into two subgroups :

1. __Hard Clustering__: In hard clustering, each data point __either belongs to a cluster completely or not__.<br> _For example_, in the above example each customer is put into one group out of the 10 groups.<br><br>

2. __Soft Clustering__: In soft clustering, instead of putting each data point into a single cluster, a __probability or likelihood__ of that data point belonging to each cluster is assigned.<br>_For example_, in the above scenario each customer is assigned a probability of belonging to each of the 10 clusters of the rental store.

![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/Types%20of%20clustering.png)
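
The hard/soft distinction can be seen directly in scikit-learn: `KMeans` returns exactly one label per point, while a Gaussian mixture model (fitted with the expectation-maximization algorithm) returns a probability for every cluster. A minimal sketch on synthetic data; the dataset and parameter values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data with two groups (illustrative only)
X, _ = make_blobs(n_samples=200, centers=2, random_state=42)

# Hard clustering: each point gets exactly one cluster label
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Soft clustering: each point gets a probability for every cluster
gmm = GaussianMixture(n_components=2, random_state=42).fit(X)
soft_probs = gmm.predict_proba(X)

print(hard_labels[:5])          # one integer label per point
print(soft_probs[:5].round(3))  # one row of probabilities per point, each row sums to 1
```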


<a id="section603"></a>
### 6.3 Types of Clustering Algorithms

Since the task of clustering is subjective, there are many methods that can be used to achieve this goal. Every methodology follows a different set of rules for defining the ‘similarity’ among data points. In fact, more than 100 clustering algorithms are known, but only a few of them are widely used. Let’s look at them in detail:
- __Connectivity models__: These models are based on the notion that data points _closer in data space __exhibit more similarity__ to each other __than__ data points __lying farther__ away_. These models can follow two approaches: start with every point in its own cluster and merge clusters (agglomerative, bottom-up), or start with all points in one cluster and split it (divisive, top-down).<br> Examples of these models are the __hierarchical clustering algorithm__ and its variants.<br><br>
- __Centroid models__: These are __iterative clustering algorithms__ in which the notion of similarity is derived from the __closeness of a data point to the centroid of the clusters__. The K-Means clustering algorithm is a popular algorithm that falls into this category. These models run iteratively to find a __local optimum__.<br><br>
- __Distribution models__: These clustering models are based on the notion of __how probable it is that all data points in a cluster belong to the same distribution__ (for example, a normal, i.e. Gaussian, distribution). These models often suffer from __overfitting__. A popular example of these models is the __Expectation-maximization algorithm__, which uses multivariate normal distributions.<br><br>
- __Density Models__: These models search the data space for __areas of varied density of data points__. They isolate regions of different density and assign the data points within each region to the same cluster. Popular examples of density models are __DBSCAN and OPTICS__.


<a id="section604"></a>
### 6.4 Introduction to K-means Algorithm

K-means clustering is a type of __unsupervised learning__, which is used when you have __unlabeled data (i.e., data without defined categories or groups)__. 
> __The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K__. <br>

- The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. <br>
- Data points are clustered based on feature similarity.<br>

The results of the __K-means clustering algorithm__ are:
1. The centroids of the K clusters, which can be used to label new data
2. Labels for the training data (each data point is assigned to a single cluster)

![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/clustering%203.gif)
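
In symbols (the notation below is assumed, not taken from the figures), K-means looks for clusters $C_1, \dots, C_K$ with centroids $\mu_1, \dots, \mu_K$ that locally minimise the within-cluster sum of squared Euclidean distances $J$. Neither the assignment step nor the centroid-update step can increase $J$, which is why the iteration settles at a local optimum; scikit-learn reports this quantity as `inertia_`.

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$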


<a id="section605"></a>
### 6.5 Business Cases

The K-means clustering algorithm is used to __find groups__ which have not been __explicitly labeled__ in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.<br>
This is a versatile algorithm that can be used for any type of grouping. Some examples of __use cases__ are:
- Behavioral Segmentation
- Inventory Categorization
- Sorting Sensor measurements
- Detecting bots and anomalies
- Computer Vision
- Astronomy

<a id="section606"></a>
### 6.6 Algorithm

To start the K-means algorithm, you first randomly initialize K points called the cluster centroids.<br>
K-means is an __iterative algorithm__ that repeats __two__ steps:<br>


<a id="section60601"></a>
#### 6.6.1 Cluster Assignment
The algorithm goes through each of the data points and assigns it to the closest of the K cluster centroids (three in the figure below).

![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/clustering%204.png)

![image.png](https://cdn-images-1.medium.com/max/800/1*4LOxZL6bFl3rXlr2uCiKlQ.gif)

<a id="section60602"></a>
#### 6.6.2 Move Centroid
Here, K-means moves the centroids to the average of the points in a cluster. In other words, the algorithm calculates the average of all the points in a cluster and moves the centroid to that average location.

This process is repeated until there is no change in the clusters (or possibly until some other stopping condition is met, such as a maximum number of iterations). The initial centroids are either chosen randomly or supplied by the user as specific starting points.

![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/clustering%205.png)
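
The two steps above can be sketched in plain NumPy. This is a minimal illustration of the basic iteration under simple assumptions (initial centroids picked at random from the data, Euclidean distance, stop when the centroids stop moving); the function name is hypothetical, and scikit-learn's `KMeans` adds refinements such as k-means++ initialisation and multiple restarts (`n_init`).

```python
import numpy as np

def kmeans_numpy(X, k, n_iter=100, seed=0):
    """Basic K-means: alternate cluster assignment and centroid moves."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Cluster assignment: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move centroid: mean of the points assigned to each cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans_numpy(X, k=2)
print(labels)
print(centroids)
```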

<a id="section607"></a>
### 6.7 Choosing K

One of the metrics that is commonly used to compare results across different values of 'K' is the __mean distance between data points and their cluster centroid__. 
- Since _increasing the number of clusters will always reduce the distance to data points, increasing K will always decrease this metric, to the extreme of reaching zero when K is the same as the number of data points_. 
Thus, this metric cannot be used as the sole target. 
Instead, mean distance to the centroid as a function of K is plotted and the __"elbow point,"__ where the __rate of decrease sharply shifts__, can be used to roughly __determine K__.

A number of other techniques exist for validating K, including __cross-validation__, __information criteria__, the __information theoretic jump method__, the __silhouette method__, and the __G-means algorithm__. In addition, monitoring the distribution of data points across groups provides __insight__ into how the _algorithm is splitting the data for each K_.![image.png](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/elbow.png)
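
Both the elbow idea and the silhouette method can be tried with scikit-learn. A minimal sketch on synthetic data with three well-separated groups (the data and the range of K are illustrative assumptions): `inertia_` is the sum of squared distances to the closest centroid, and `silhouette_score` is only defined for at least 2 clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=0.6, random_state=0)

inertias, silhouettes = {}, {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ always shrinks as k grows, so look for the "elbow"
    inertias[k] = model.inertia_
    # The silhouette score needs at least 2 clusters
    if k >= 2:
        silhouettes[k] = silhouette_score(X, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print(inertias)
print(best_k)
```

Plotting `inertias` against K gives the elbow curve, while the silhouette score gives a single K to compare against it.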

<a id="section7"></a>
## 7.  K-means Cluster Creation

- Now it is time to create the Cluster labels!

In [None]:
# Storing the target variable
y=college_data['Private']

college_data.drop(['Private'],inplace=True,axis=1)

In [None]:
college_data.head()

In [None]:
# Import KMeans from SciKit Learn.
from sklearn.cluster import KMeans

# Create an instance of a K Means model with 2 clusters
kmeans=KMeans(n_clusters=2)

<a id="section701"></a>
### 7.1 Normalize data

- Normalize the data with MinMax scaling provided by sklearn
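
Min-max scaling maps each column $j$ to the range $[0, 1]$ using that column's minimum and maximum over all rows $i$, which is what `MinMaxScaler` does with its default `feature_range=(0, 1)`. Scaling matters for K-means because it relies on Euclidean distances, so a feature measured in thousands (such as `Outstate`) would otherwise dominate features measured in percentages (such as `PhD`).

$$
x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}
$$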

In [None]:
from sklearn import preprocessing
# Scaling the data
minmax_processed = preprocessing.MinMaxScaler().fit_transform(college_data)
college_data.columns

In [None]:
df_numeric_scaled = pd.DataFrame(minmax_processed, index=college_data.index, columns=college_data.columns)

In [None]:
df_numeric_scaled.head()

In [None]:
# Fit the model to all the data except for the Private label.
kmeans.fit(df_numeric_scaled)

In [None]:
# What are the cluster center vectors?
kmeans.cluster_centers_
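
The centers above live in the scaled [0, 1] space, which makes them hard to read. A minimal sketch of mapping them back to the original units with `MinMaxScaler.inverse_transform`; the toy values below are hypothetical stand-ins for `college_data`. Note that the notebook calls `MinMaxScaler().fit_transform(...)` inline, so to do this on the real data you would need to keep a reference to the fitted scaler.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data in original units (stand-in for college_data)
df = pd.DataFrame({"Outstate": [7000, 7500, 8000, 15000, 16000, 17000],
                   "Room.Board": [3500, 3600, 3700, 5000, 5200, 5400]})

scaler = MinMaxScaler()               # keep the fitted scaler for later
X_scaled = scaler.fit_transform(df)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# cluster_centers_ are in scaled units; map them back to the original units
centers_original = pd.DataFrame(scaler.inverse_transform(model.cluster_centers_),
                                columns=df.columns)
print(centers_original)
```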

<a id="section702"></a>
### 7.2 Evaluation

There is no perfect way to evaluate clustering without labels. However, since this is just an exercise and we do __have the labels__, we take advantage of them to __evaluate our clusters__. Keep in mind that you usually won't have this luxury in the real world.

<a id="section703"></a>
### 7.3 Create a confusion matrix 

Create a confusion matrix and classification report to see how well K-means clustering worked without being given any labels.

A confusion matrix is a summary of prediction results on a classification problem.

The numbers of correct and incorrect predictions are summarized with count values and broken down by each class.

In [None]:
print(confusion_matrix(y,kmeans.labels_))
# print(classification_report(y,kmeans.labels_))
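
K-means numbers its clusters arbitrarily, so cluster 0 is not guaranteed to correspond to `Private = 0`; if the ids come out swapped, the confusion matrix above looks inverted. A minimal sketch of aligning cluster ids to the majority true label in each cluster before scoring; the helper name is hypothetical, and majority mapping is one common convention rather than the notebook's own method.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def align_cluster_labels(y_true, cluster_labels):
    """Map each cluster id to the most frequent true label among its members."""
    y_true = np.asarray(y_true)
    cluster_labels = np.asarray(cluster_labels)
    mapping = {}
    for c in np.unique(cluster_labels):
        values, counts = np.unique(y_true[cluster_labels == c], return_counts=True)
        mapping[c] = values[counts.argmax()]
    return np.array([mapping[c] for c in cluster_labels])

# Toy example: the grouping is correct but the cluster ids are swapped
y_true = np.array([1, 1, 1, 0, 0, 0])
clusters = np.array([0, 0, 0, 1, 1, 1])
aligned = align_cluster_labels(y_true, clusters)
print(confusion_matrix(y_true, aligned))
print(accuracy_score(y_true, aligned))   # 1.0
```

In this notebook it would be used as `confusion_matrix(y, align_cluster_labels(y, kmeans.labels_))`.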

<a id="section704"></a>
### 7.4 Question: What about the K Value?

While creating the clusters we already knew the number of labels in the target variable, so we chose __k=2__. But what method should we follow for unlabeled data, which is the practical use case for the K-means clustering algorithm?

<a id="section705"></a>
### 7.5 Choosing appropriate value of k

In [None]:
# Fit K-means with 1 to 19 clusters (range(1, 20) excludes 20) and look at the corresponding score values.
Nc = range(1, 20)
kmeans = [KMeans(n_clusters=i) for i in Nc]

In [None]:
score = [kmeans[i].fit(df_numeric_scaled).score(df_numeric_scaled) for i in range(len(kmeans))]

- `KMeans.score` returns the negative of the sum of squared distances from each observation to its closest cluster center (i.e. the negative of `inertia_`). It is always zero or negative: values closer to 0 mean the observations are closer to their cluster centers, while a large negative value means the cluster centers are far from the observations.

- Based on these score values, we plot an elbow curve to decide which number of clusters is optimal. Note that we are dealing with a tradeoff between the number of clusters (and hence the computation required) and how closely the clusters fit the data.
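
The relationship between `score` and `inertia_` can be checked on a small synthetic example. A minimal sketch (the data is an illustrative assumption): for a fitted model, `score(X)` on the training data equals `-inertia_` up to floating-point error, and it approaches 0 as K grows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=1)

results = {}
for k in (1, 2, 3, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # score() is the negative sum of squared distances to the closest centroid
    results[k] = (model.score(X), model.inertia_)
    print(k, results[k][0], -results[k][1])
```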

In [None]:
plt.plot(Nc,score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()

Our elbow point is at around 2 clusters, which we already knew, but here we can see in practice how to choose an appropriate value of K.

![](https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/elbow%202.png)

<a id="section706"></a>
### 7.6  Clustering Visualization

Let's try to understand the clustering concept using visualization of data points.



In [None]:
from sklearn.decomposition import PCA

<a id="section70601"></a>
#### 7.6.1 Dimension Reduction using PCA

In [None]:
# Here we compress the data to two dimensions
pca=PCA(n_components=2)
principalComponents = pca.fit_transform(df_numeric_scaled)

In [None]:
principalComponents

In [None]:
# Plot the explained variances
# features = range(pca.n_components_)
# plt.bar(features, pca.explained_variance_ratio_, color='black')
# plt.xlabel('PCA features')
# plt.ylabel('variance %')
# plt.xticks(features)

In [None]:
# Save components to a DataFrame
PCA_components = pd.DataFrame(principalComponents)

In [None]:
PCA_components.head()

In [None]:
k_means2=KMeans(n_clusters=2)

In [None]:
# Compute cluster centers and predict cluster indices 
X_clustered=k_means2.fit_predict(PCA_components)

In [None]:
# Define your own color map
Label_color_map={0:'r',1:'g'}
label_color=[Label_color_map[i] for i in X_clustered]

In [None]:
# Plot the scatter diagram
plt.figure(figsize=(7,5))
plt.scatter(principalComponents[:,0],principalComponents[:,1],c=label_color,alpha=0.5)
plt.show()

```Plot Observation```
- From the plot you can directly see the data set in two different colors. Each color represents one cluster.

<img src="https://raw.githubusercontent.com/insaid2018/Term-3/master/Images/cluster2.png" height=500 width=500/>

<a id="section707"></a>
### 7.7 Centroid Visualization

The centroids move during clustering, but once we get the final clusters, we also get the final centroid of each cluster.


In [None]:
# plot the centroid
center=k_means2.cluster_centers_

plt.scatter(principalComponents[:,0],principalComponents[:,1],c=label_color,alpha=0.5)
plt.scatter(center[:, 0], center[:, 1], c='blue', s=300, alpha=0.9,label = 'Centroids')

plt.show()

The ```Big-Blue-Dots``` represent the centroid of each cluster.

<a id="section8"></a>
## 8. Conclusion

- With K-means clustering we saw how to cluster our data points, and since we already have labels for our data points, we could check the clustering against them in practice.
- The clustering result is not very impressive, possibly because we are dealing with a small dataset. With a larger dataset the result might improve.
- But to improve your ML model further you need to try out with different machine learning algorithms like 
  - **LogisticRegression**
  -  **SVM**
  - **Random Forest**
  - **Ensemble Learning Algorithms** 
  - **Artificial Neural Networks(ANNs)**
  
  - etc..
  


I am super excited to share my kernel with the Kaggle community. I prepared this kernel to make the concept of clustering easy to understand for every beginner, and I will continue to explain other algorithms in a very simple way. As I go on in this journey and learn new topics, I will incorporate them in new updates, so check back for them, and please leave a comment if you have any suggestions to make them better!! 


<br>
<br>



<center><h3>If you learned something from it, leave your appreciation by simply hitting upvote!</h3></center>