# Hands-On Exercise 7.1:
# Cluster Analysis on Structured Data With Python
***

## Objectives

#### In this exercise, you will perform cluster analysis on structured data using Python. You will find natural groupings within a data set using several attributes and a distance measure. The goal is to show how clustering can divide a data set into previously unknown groups.

### Overview

You will work on a data set called Iris that is included with the datasets package. You will:<br>
● Assess what might be an appropriate number of clusters<br>
● Cluster the data using a distance measure<br>
● Evaluate the resulting clusters<br><br>


**Major Step: Data loading and preprocessing**

1. ❏ Import the **pandas** and **numpy** libraries<br><br>
*Hint: Use pd and np as aliases*

2. ❏ Import **cluster** from **sklearn**

3. ❏ Import the **iris** data set from the **iris.csv** file and preview the first few rows using the **.head()** method

4. ❏ Remove the species attribute and convert the remaining columns to a NumPy array <br>

5. ❏ Apply K-Means clustering to the data with the number of clusters set to 3

6. ❏ View the assigned labels using the **.labels_** attribute

7. ❏ View the cluster centers using the **.cluster_centers_** attribute

8. ❏ Import **pyplot** from **matplotlib**

9. ❏ Plot the three clusters using two attributes at a time (e.g., attributes 0 and 1), and plot the centroids of the three clusters on the same axes

10. ❏ Import the **dendrogram** and **linkage** functions from **scipy.cluster.hierarchy**

11. ❏ Using a **ward** agglomeration technique, build a dendrogram to visualize the clusters hierarchically

12. ❏ How many natural clusters do you think there are?
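
Steps 1 to 7 above could be sketched roughly as follows. This is only one possible approach, not the official solution. It assumes scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for **iris.csv**, so the class column is named `target` rather than `species`; the `n_init` and `random_state` values are arbitrary choices made so that reruns give the same result.

```python
import numpy as np
import pandas as pd
from sklearn import cluster
from sklearn.datasets import load_iris  # assumption: stands in for iris.csv

# Load Iris as a pandas DataFrame and preview it (the exercise reads iris.csv instead)
iris = load_iris(as_frame=True).frame
print(iris.head())

# Drop the class column ("target" here, "species" in iris.csv) and convert to a NumPy array
X = iris.drop(columns="target").to_numpy()

# Fit K-Means with 3 clusters; random_state is fixed only for reproducibility
kmeans = cluster.KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_             # one cluster index per row of X
centers = kmeans.cluster_centers_   # one centroid (4 attribute values) per cluster

print(np.bincount(labels))          # number of rows assigned to each cluster
print(centers)
```

For steps 8 and 9, one option is `plt.scatter(X[:, 0], X[:, 1], c=labels)` followed by a second `plt.scatter` call on `centers[:, 0]` and `centers[:, 1]` with a distinct marker.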

![image.png](attachment:image.png)<br><br>
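
For steps 10 and 11, a minimal sketch under the same assumption (scikit-learn's bundled Iris data in place of **iris.csv**) might look like this. It builds the Ward linkage and prints the last few merge distances, which is one informal way to judge where the tree could be cut. `no_plot=True` is used here only so the snippet runs without a display; in a notebook you would normally omit it so the dendrogram is drawn.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris  # assumption: stands in for iris.csv

X = load_iris().data   # 150 x 4 array of attributes, class column not included

# Ward linkage: at each step, merge the two clusters whose union gives the
# smallest increase in total within-cluster variance
Z = linkage(X, method="ward")

# Each row of Z describes one merge: [cluster_a, cluster_b, distance, new_cluster_size]
# A large jump between consecutive distances near the end hints at a natural cut
print(np.round(Z[-5:, 2], 2))

# Compute the dendrogram layout without drawing it; remove no_plot=True
# (and call plt.show()) to visualize the hierarchy
tree = dendrogram(Z, no_plot=True)
print(len(tree["leaves"]))   # number of leaves in the tree
```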

**If you have time, you may like to examine and run the demo code below which uses the KMeans Clustering Algorithm to segment an image. Image Segmentation is the process of assigning a label to every pixel in an image. Segmentation is used in object detection, medical imaging, facial recognition and many other applications.**<br><br>

13. ❏ Import packages and image

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt
original_image = cv2.imread("Nature.jpg")  # loaded in BGR order; returns None if the file cannot be read
original_image.shape                       # (height, width, channels)

14. ❏ Pre-process the image

In [None]:
img = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)  # convert image to RGB color for matplotlib
vectorized = img.reshape((-1, 3))                      # flatten to one row per pixel (N x 3)
vectorized = np.float32(vectorized)                    # and convert to float32 (required for kmeans)

15. ❏ Cluster the colors using OpenCV's **kmeans** algorithm<br>
Documentation: https://docs.opencv.org/2.4/modules/core/doc/clustering.html

In [None]:
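# Stop each run after 10 iterations or once the required accuracy (epsilon = 1.0) is reached
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4            # No. of clusters to find
attempts = 10    # No. of times to run the algorithm with different initial centers
# ret: compactness of the best attempt, label: cluster index per pixel, center: cluster centers
ret, label, center = cv2.kmeans(vectorized, K, None, criteria, attempts, cv2.KMEANS_PP_CENTERS)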

16. ❏ Regenerate the clustered image

In [None]:
center = np.uint8(center)   # Convert back into uint8
res = center[label.flatten()]   # Access the labels to regenerate the clustered image
seg_image = res.reshape((img.shape))   # Clustered image
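
To see why the indexing in step 16 rebuilds an image, here is a small sketch with made-up values: two hypothetical cluster centers and labels for a 2 x 3 image. It uses only NumPy, and the label array is shaped `(N, 1)` to mimic what `cv2.kmeans` returns.

```python
import numpy as np

# Hypothetical toy values: 2 cluster centers (RGB colors) ...
center = np.array([[255, 0, 0], [0, 0, 255]], dtype=np.uint8)
# ... and one label per pixel for a 2 x 3 image, shaped (6, 1)
label = np.array([[0], [1], [1], [0], [0], [1]])

# Indexing the centers with the flattened labels replaces every pixel
# by the color of the cluster it was assigned to
res = center[label.flatten()]         # shape (6, 3)
seg_image = res.reshape((2, 3, 3))    # back to height x width x channels

print(seg_image.shape)
print(seg_image[0, 0], seg_image[0, 1])
```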

17. ❏ Display the original image and the segmented image

In [None]:
plt.figure(figsize=(12,12))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title('Original Image')
plt.subplot(1, 2, 2)
plt.imshow(seg_image)
plt.title('Segmented image when K is %d' % K)   # %d prints K in decimal (%x would print hexadecimal)
plt.show()

## <center>**Congratulations! You have successfully performed clustering analysis on structured and unstructured data in Python.**</center>

![image.png](attachment:image.png)

# <center>**This is the end of the exercise.**</center>