K-means

This is the third assignment of the Introduction to Machine Learning (COMP 462) course. In this assignment, I implemented the K-means clustering algorithm and used it to cluster three given datasets.

Datasets

The dataset files contain 2D features and class labels. I did not use the class labels in this assignment, since K-means is an unsupervised algorithm and does not need them. Scatter plots of the datasets are shown in Figure 1.
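As an illustration of dropping the class labels, here is a minimal sketch of a loader. The row layout (two feature values followed by a label, separated by whitespace or commas) and the helper name `load_features` are assumptions made for this example; the actual format of the data files may differ.

```python
def load_features(lines, n_features=2):
    """Parse rows of text into feature tuples, ignoring the class label.

    Assumption: each non-empty row lists the feature values first and the
    class label last, separated by whitespace or commas.
    """
    points = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        # Keep only the first n_features columns; the label is discarded.
        points.append(tuple(float(v) for v in fields[:n_features]))
    return points


sample = ["1.0 2.0 0", "3.5 4.5 1", "", "5.0,6.0,1"]
print(load_features(sample))  # [(1.0, 2.0), (3.5, 4.5), (5.0, 6.0)]
```

Because the function only iterates over lines, an open file handle could be passed in place of the sample list.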

Figure 1: Three datasets.

K-means Algorithm

K-means clustering is a simple and popular unsupervised machine learning algorithm for unlabeled data. Its goal is to find groups in the data, with the number of groups given by the variable K. The algorithm works iteratively, assigning each data point to one of K groups based on feature similarity. The steps of K-means are as follows.

Randomly pick K observations and set them as the initial cluster centers.
Iterate until the cluster assignments stop changing:
- Assign each observation to the cluster whose centroid is closest (where closest is defined using Euclidean distance).
- For each of the K clusters, recompute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.
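The steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the K-Means class from this repository: the function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to keep the old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of floats) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Pick K distinct observations as the initial cluster centers.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assign each observation to the closest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing
        assignments = new_assignments
        # Recompute each centroid as the per-feature mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

On this toy input the two well-separated groups of three points should end up in different clusters; which group receives label 0 depends on the random initialization.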

With N samples and K clusters, the K-means algorithm tries to minimize an objective function: the sum of squared Euclidean distances between each sample and the centroid of the cluster it is assigned to.
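In a commonly used notation, this objective can be written as shown below. The symbols are assumptions introduced here rather than taken from the assignment: $x_n$ is the nth sample, $\mu_k$ is the centroid of cluster k, and $r_{nk}$ is 1 if sample n is assigned to cluster k and 0 otherwise.

```latex
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^{2},
\qquad
r_{nk} =
\begin{cases}
1 & \text{if } k = \arg\min_{j} \lVert x_n - \mu_j \rVert^{2} \\
0 & \text{otherwise}
\end{cases}
```

Both steps of the iteration can only lower J or leave it unchanged: the assignment step minimizes J over $r_{nk}$ with the centroids fixed, and the centroid update minimizes J over $\mu_k$ with the assignments fixed.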

Clustering Results

I implemented a K-Means class for the algorithm and clustered the three given datasets with the following configurations:

Dataset1: k=3, k=7
Dataset2: k=2, k=5
Dataset3: k=3, k=8

The clustering results for each configuration are given below.

Figure 2: Change in centroid positions for the first dataset when k = 3.

Figure 3: Objective function value vs iteration number (Dataset1, k=3).

Figure 4: Change in centroid positions for the first dataset when k = 7.

Figure 5: Objective function value vs iteration number (Dataset1, k=7).

Figure 6: Change in centroid positions for the second dataset when k = 2.

Figure 7: Objective function value vs iteration number (Dataset2, k=2).

Figure 8: Change in centroid positions for the second dataset when k = 5.

Figure 9: Objective function value vs iteration number (Dataset2, k=5).

Figure 10: Change in centroid positions for the third dataset when k = 3.

Figure 11: Objective function value vs iteration number (Dataset3, k=3).

Figure 12: Change in centroid positions for the third dataset when k = 8.

Figure 13: Objective function value vs iteration number (Dataset3, k=8).
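The objective curves in the figures track one number per iteration. A minimal sketch of how such a value could be computed from the current centroids and assignments is shown below; the helper name `objective` and the toy data are hypothetical and are not taken from this repository's code.

```python
import math


def objective(points, centroids, assignments):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(
        math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments)
    )


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centroids = [(1.0, 0.0), (10.0, 0.0)]
# Squared distances are 1, 1 and 0, so the total is 2.0.
print(objective(points, centroids, [0, 0, 1]))
```

Calling this once per iteration and appending the result to a list would give the data for an objective-vs-iteration curve.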
