Write a program to cluster a set of points using K-means for IRIS
dataset. Consider, K=3, clusters. Consider Euclidean distance as the
distance measure. Randomly initialize a cluster mean as one of the data
points. Iterate at least for 10 iterations. After iterations are over, print the
final cluster means for each of the clusters.

Algorithm:

1. Initialize Cluster Means: Randomly select 3 points from the dataset as the initial cluster centroids.
2. Assign Points to Nearest Centroid: For each point, calculate the Euclidean distance to each centroid and assign the point to the closest centroid.
3. Recalculate Centroids: After assigning all points, recalculate the centroids as the mean of all points assigned to each cluster.
4. Repeat for 10 Iterations: Repeat the above two steps for 10 iterations or until convergence (we'll limit it to 10 iterations here).

In [1]:
import numpy as np
import pandas as pd

In [2]:
df=pd.read_csv('Iris.csv')
X=df.drop(columns=['Id','Species']).values

In [5]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [3]:
X.shape

(150, 4)

In [7]:
# euclidian distance
def euclidean_distance(p1, p2):
    return np.sqrt(np.sum((p1-p2)**2))

In [8]:
def k_mean(points, m1, m2, m3):
    cluster1=[]
    cluster2=[]
    cluster3=[]

    for point in points:
        distances=[euclidean_distance(point,m1), euclidean_distance(point,m2), euclidean_distance(point,m3)]
        min_idx= np.argmax(distances)

        if min_idx==0:
            cluster1.append(point)
            m1=np.mean(cluster1, axis=0)
        elif min_idx==1:
            cluster2.append(point)
            m2=np.mean(cluster2, axis=0)
        else:
            cluster3.append(point)
            m3=np.mean(cluster3, axis=0)
    
    return cluster1, cluster2, cluster3, m1, m2, m3


In [9]:
cluster1, cluster2, cluster3, m1, m2, m3 = k_mean(X, X[0], X[1], X[3])

In [10]:
print(f'centroid of cluster1 = {m1}')
print(f'centroid of cluster2 = {m2}')
print(f'centroid of cluster3 = {m3}')

centroid of cluster1 = [5.86041667 3.025      3.7625     1.16875   ]
centroid of cluster2 = [5.83114754 3.05901639 3.75409836 1.20983607]
centroid of cluster3 = [5.84146341 3.0804878  3.76097561 1.21707317]


Question-21 for K=4