# Unsupervised learning and k-means
## Instructions:
* Go through the notebook and complete the tasks. 
* Make sure you understand the examples given. If you need any help, refer to the documentation links provided or go to the discussion forum. 
* When a question allows a free-form answer (e.g. what do you observe?), create a new markdown cell below and answer the question in the notebook. 
* Save your notebooks when you are done. 

**Task:**
In this notebook, your goal is to design an unsupervised classifier based on k-means and compare its performance with that of a supervised classifier such as k-NN. The first task is to load the Iris data with scikit-learn (as in previous topic notebooks) and then apply k-means to it. The task is outlined as follows:

#- Load the Iris data using scikit-learn (as in previous notebooks).
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, v_measure_score

# Load the IRIS dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target


#- Apply k-means to the Iris dataset. Note that k-means cannot use the labels: we treat this dataset as entirely unlabelled.
# Apply k-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # 3 clusters (one per species); n_init set explicitly
kmeans.fit(X)

# Get the cluster labels assigned by k-means
kmeans_labels = kmeans.labels_


#- Use the labels to train a k-NN classifier with scikit-learn (as we saw in previous topics). This is a supervised approach.
# Train a k-NN classifier
knn = KNeighborsClassifier(n_neighbors=5)  # You can adjust the number of neighbors
knn.fit(X, y)

# Make predictions on the training set (optimistic: each training point is among its own nearest neighbours)
knn_predictions = knn.predict(X)


#- Compare how well k-means recovers the Iris classes (you can use the function sklearn.metrics.cluster.v_measure_score) with the accuracy of k-NN.

# Calculate the accuracy of k-NN
knn_accuracy = accuracy_score(y, knn_predictions)

# Calculate the v-measure score for k-means
kmeans_v_measure = v_measure_score(y, kmeans_labels)

# Print the results
print("K-NN Accuracy: {:.2f}".format(knn_accuracy))
print("K-Means V-Measure Score: {:.2f}".format(kmeans_v_measure))

### Observations

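#1. **K-NN Accuracy**: The accuracy of the k-NN classifier is very high (close to 1.0) because it is a supervised method trained on the actual labels. It is, however, measured on the same data the classifier was trained on, so it overestimates performance on unseen samples; a held-out split or cross-validation gives a fairer estimate.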

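#2. **K-Means V-Measure Score**: The v-measure score for k-means is lower than the k-NN accuracy. k-means is unsupervised and groups points by their distance to cluster centroids, without knowledge of the true labels. Note that v-measure (the harmonic mean of homogeneity and completeness, independent of how clusters are numbered) and accuracy are different metrics, so this comparison is indicative rather than like-for-like.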

#3. **Comparison**: This exercise highlights the difference between supervised and unsupervised learning. While k-NN can achieve high accuracy by learning from labeled data, k-means relies solely on the data's inherent structure, which may not always correspond to the actual classifications.

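#4. **Clustering Limitations**: K-means does not perfectly recover the three species (Setosa, Versicolor and Virginica). Setosa is well separated from the other two and typically forms its own cluster, whereas Versicolor and Virginica overlap in feature space, so k-means, which groups points purely by Euclidean distance, mixes some of their samples.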


K-NN Accuracy: 0.97
K-Means V-Measure Score: 0.76
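The k-NN accuracy above is computed on the training data and is compared with a v-measure, which is a different quantity. A minimal sketch for putting the two methods on a more equal footing, assuming a recent scikit-learn: estimate k-NN accuracy with 5-fold cross-validation, and turn the k-means clusters into class predictions by giving each cluster the majority true label of its members. The helper `majority_label_mapping` is hypothetical, written only for this example. The labels are used after clustering, purely for evaluation, so k-means itself stays unsupervised.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, homogeneity_completeness_v_measure
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# k-NN accuracy estimated on held-out folds rather than on the training data.
knn = KNeighborsClassifier(n_neighbors=5)
knn_cv_scores = cross_val_score(knn, X, y, cv=5)
knn_cv_accuracy = knn_cv_scores.mean()

# k-means never sees y while fitting.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)


def majority_label_mapping(cluster_ids, y):
    """Hypothetical helper: relabel each cluster with the most frequent true label among its members."""
    mapping = {}
    for c in np.unique(cluster_ids):
        mapping[c] = np.bincount(y[cluster_ids == c]).argmax()
    return np.array([mapping[c] for c in cluster_ids])


# Accuracy-style score for k-means, computed after clustering.
kmeans_mapped = majority_label_mapping(cluster_ids, y)
kmeans_accuracy = accuracy_score(y, kmeans_mapped)

# v-measure together with the two quantities it combines.
h, c, v = homogeneity_completeness_v_measure(y, cluster_ids)

print("k-NN 5-fold CV accuracy: {:.2f}".format(knn_cv_accuracy))
print("k-means majority-vote accuracy: {:.2f}".format(kmeans_accuracy))
print("homogeneity={:.2f} completeness={:.2f} v-measure={:.2f}".format(h, c, v))
```

Majority voting can map two clusters to the same class; a one-to-one matching between clusters and classes (for example with `scipy.optimize.linear_sum_assignment`) is a stricter alternative.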


To help you undertake this task, you can read the k-means reference and a selection of examples on the scikit-learn website <a href="http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#examples-using-sklearn-cluster-kmeans">here</a>.
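Alongside the reference, it can help to see what `KMeans` computes. Below is a from-scratch NumPy sketch of Lloyd's algorithm, the classic k-means iteration that alternates an assignment step and an update step. It makes simplifying assumptions: centroids are initialised from random data points rather than with k-means++ (scikit-learn's default initialisation), only a single run is performed, and the function names `lloyd_kmeans` and `squared_distances` are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import v_measure_score


def squared_distances(X, centroids):
    # Shape (n_samples, k): squared Euclidean distance from every point to every centroid.
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical sketch of Lloyd's algorithm with random initialisation."""
    rng = np.random.default_rng(seed)
    # Random initialisation: k distinct data points (simpler than k-means++).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = squared_distances(X, centroids).argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (kept where it is if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so that the labels match the returned centroids.
    d2 = squared_distances(X, centroids)
    labels = d2.argmin(axis=1)
    # Inertia: sum of squared distances of points to their closest centroid.
    inertia = d2.min(axis=1).sum()
    return labels, centroids, inertia


iris = datasets.load_iris()
labels, centroids, inertia = lloyd_kmeans(iris.data, k=3, seed=0)
print("inertia: {:.2f}".format(inertia))
print("v-measure vs. true species: {:.2f}".format(v_measure_score(iris.target, labels)))
```

Because the result depends on the random starting centroids, different `seed` values can converge to different local optima; this is why `KMeans` runs several initialisations (`n_init`) and keeps the one with the lowest inertia.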
