# Unsupervised Learning: Clustering Lab





In [55]:
from sklearn.base import BaseEstimator, ClassifierMixin, ClusterMixin
from sklearn.cluster import AgglomerativeClustering, KMeans
from scipy.cluster.hierarchy import dendrogram
from scipy.io import arff
from sklearn.preprocessing import LabelEncoder
from numpy.random import default_rng
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import copy

## 1. (50%) Implement the k-means clustering algorithm and the HAC (Hierarchical Agglomerative Clustering) algorithm.

### 1.1.1 HAC

### Code requirements 
- HAC should support both single link and complete link options.
- HAC automatically generates all clusterings from n to 1.  To simplify the amount of output you may want to implement a mechanism to specify for which k values actual output will be generated.
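
One possible way to limit output to chosen k values is to run the merge loop once and keep a copy of the clustering only when the cluster count hits a requested k. The sketch below is a standalone illustration, not the lab's reference solution: the function name `hac_snapshots` and the parameter `report_ks` are hypothetical, and clusters are stored as lists of row indices.

```python
import numpy as np

def hac_snapshots(X, report_ks, link="single"):
    """Merge from n clusters downward, saving the clustering at each k in report_ks."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between points, computed once.
    point_dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]  # every point starts as its own cluster
    reduce_fn = np.min if link == "single" else np.max
    wanted = set(report_ks)  # assumed non-empty
    snapshots = {}
    while True:
        if len(clusters) in wanted:
            snapshots[len(clusters)] = [list(c) for c in clusters]
        if len(clusters) <= min(wanted) or len(clusters) == 1:
            break
        # Find the closest pair of clusters; strict "<" keeps the first pair on ties.
        best = (np.inf, 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = reduce_fn(point_dist[np.ix_(clusters[i], clusters[j])])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return snapshots

snaps = hac_snapshots([[0.0], [1.0], [10.0], [11.0], [50.0]], report_ks=[3, 2])
for k in sorted(snaps, reverse=True):
    print(k, snaps[k])
```

Stopping at the smallest requested k avoids the remaining merges, which matters because each merge step scans all cluster pairs.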


---
The output should include the following:
- The number of clusters (k).
- The total SSE of the full clustering. 


For each cluster report include:


- The centroid id.
- The number of instances tied to that centroid. 
- The SSE of that cluster. (The sum squared error (SSE) of a single cluster is the sum of the squared Euclidean distances from each cluster member to the cluster centroid.)
---
You only need to handle continuous features
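
The SSE definition above can be sketched in a few lines of NumPy. The helper names `cluster_sse` and `total_sse` are illustrative only, and each cluster is assumed to be a 2D array with one row per instance.

```python
import numpy as np

def cluster_sse(cluster):
    """Sum of squared Euclidean distances from each member to the cluster centroid."""
    cluster = np.asarray(cluster, dtype=float)
    centroid = cluster.mean(axis=0)
    return float(((cluster - centroid) ** 2).sum())

def total_sse(clusters):
    """Total SSE of a clustering: the sum of the per-cluster SSE values."""
    return sum(cluster_sse(c) for c in clusters)

example = [np.array([[0.0, 0.0], [2.0, 0.0]]), np.array([[1.0, 1.0]])]
print(total_sse(example))
```

A cluster with a single member always has an SSE of 0, since the member is its own centroid.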

In [2]:
class HACClustering(BaseEstimator,ClusterMixin):

    def __init__(self,k=3,link_type='single'): ## add parameters here
        """
        Args:
            k = how many final clusters to have
            link_type = single or complete. when combining two clusters use complete link or single link
        """
        self.link_type = link_type
        self.k = k
        
    def find_link(self, clusters, link):
        """Return the index pair (i, j), i < j, of the two closest clusters and their link distance."""
        # Only the upper triangle (i < j) is filled; the rest stays at inf so argmin ignores it.
        dist = np.full((len(clusters), len(clusters)), np.inf)
        single = link == 'single'
        for i in range(dist.shape[0]):
            for j in range(i + 1, dist.shape[0]):
                # Single link keeps the smallest member-to-member distance, complete link the largest.
                link_dist = np.inf if single else 0
                for a in clusters[i]:
                    for b in clusters[j]:
                        d = np.linalg.norm(a - b, ord=2)
                        if (single and d < link_dist) or (not single and d > link_dist):
                            link_dist = d
                dist[i,j] = link_dist
        ind = np.unravel_index(dist.argmin(), dist.shape)
        val = dist[ind]
        return ind, val
        
    def fit(self,X,y=None):
        """ Fit the data; In this lab this will make the K clusters :D
        Args:
            X (array-like): A 2D numpy array with the training data
            y (array-like): An optional argument. Clustering is usually unsupervised so you don't need labels
        Returns:
            self: this allows this to be chained, e.g. model.fit(X,y).predict(X_test)
        """
        
        self.clusters = X.tolist()
        for i in range(len(self.clusters)):
            self.clusters[i] = np.array(self.clusters[i]).reshape((1,-1))
        # Merge the closest pair of clusters until only k remain.
        while len(self.clusters) > self.k:
            link, val = self.find_link(self.clusters, self.link_type)
            # link[0] < link[1], so deleting link[1] leaves the merged cluster at link[0].
            self.clusters[link[0]] = np.concatenate((self.clusters[link[0]], self.clusters[link[1]]))
            del(self.clusters[link[1]])

        return self
    
    def sse(self, cluster):
        centroid = np.mean(cluster, axis=0)
        error = 0
        for i in cluster:
            error += (np.linalg.norm(i - centroid, ord=2)) ** 2
        return error
    
    def print_clusters(self):
        """
            Used for grading.
            print("{:d}\n".format(k))
            print("{:.4f}\n\n".format(total SSE))
            for each cluster and centroid:
                print(np.array2string(centroid,precision=4,separator=","))
                print("\n")
                print("{:d}\n".format(size of cluster))
                print("{:.4f}\n\n".format(SSE of cluster))
        """
        
        print(self.k)
        total_error = 0
        for i in self.clusters:
            total_error += self.sse(i)
        print("{:.4f}\n".format(total_error))
        for i in self.clusters:
            print(np.array2string(np.mean(i, axis=0),precision=4,separator=","))
            print(i.shape[0])
            print("{:.4f}\n".format(self.sse(i)))
        

### 1.1.2 Debug 

Debug your model by running it on the [Debug Dataset](https://raw.githubusercontent.com/cs472ta/CS472/master/datasets/abalone.arff)


---
The dataset was modified to be much smaller. The last data point should be on line 359, which is the point 0.585,0.46,0.185,0.922,0.3635,0.213,0.285,10. The remaining points should be commented out.



- Normalize Data
- K = 5
- Use the first k instances as the initial centroids
- Use 4 decimal places and DO NOT ROUND when reporting SSE and centroid values.
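
For the "Normalize Data" step, a minimal min-max normalization sketch is shown here. The helper name `min_max_normalize` is hypothetical, and mapping constant columns to 0 is an assumption made to avoid division by zero, not something the lab specifies.

```python
import numpy as np

def min_max_normalize(X):
    """Scale every column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    # A constant column has range 0; divide by 1 instead so it maps to 0.
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    return (X - mins) / safe_ranges

print(min_max_normalize([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]]))
```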


---
Solutions in files:

[Debug HAC Single.txt](https://raw.githubusercontent.com/cs472ta/CS472/master/debug_solutions/Debug%20HAC%20Single%20Link.txt)

[Debug HAC Complete.txt](https://raw.githubusercontent.com/cs472ta/CS472/master/debug_solutions/Debug%20HAC%20Complete%20Link.txt)

In [3]:
# Debug Here

data = arff.loadarff('datasets/abalone.arff')
df = pd.DataFrame(data[0])
data = np.array(df)

# Train on debug data

norm_data = copy.deepcopy(data)
mins = np.amin(norm_data, axis=0).reshape((1,-1))
maxs = np.amax(norm_data, axis=0).reshape((1,-1))

norm_data = ((norm_data - mins) / (maxs - mins))

print("****** single link ******")
HAC = HACClustering(5)
HAC.fit(norm_data).print_clusters()

print("****** complete link ******")
HAC2 = HACClustering(5, 'complete')
HAC2.fit(norm_data).print_clusters()

****** single link ******
5
54.4392

[0.599 ,0.5923,0.4915,0.2826,0.2682,0.2921,0.2316,0.3849]
195
54.3917

[0.9189,0.9438,0.7105,0.7016,0.759 ,0.7222,0.4472,0.8824]
1
0.0000

[1.    ,0.9831,0.8026,0.8343,0.6575,0.7825,0.9221,0.8824]
2
0.0475

[1.    ,0.9888,0.7895,1.    ,1.    ,0.8915,0.7186,0.5882]
1
0.0000

[0.9189,0.9888,0.8684,0.719 ,0.5797,0.7512,0.6432,0.9412]
1
0.0000

****** complete link ******
5
13.0824

[0.6544,0.649 ,0.5256,0.2879,0.2815,0.3057,0.2288,0.3911]
71
3.8232

[0.3661,0.3505,0.271 ,0.1008,0.1024,0.1058,0.0836,0.2116]
67
5.2786

[0.7622,0.7658,0.6759,0.4265,0.4016,0.4536,0.3376,0.5217]
38
1.4989

[0.8818,0.8904,0.7582,0.614 ,0.5433,0.5317,0.561 ,0.7794]
16
1.5328

[0.9471,0.934 ,0.8158,0.7457,0.6434,0.7944,0.6457,0.625 ]
8
0.9490



### 1.1.3 Evaluation

We will evaluate your model based on its print_clusters() output using [Evaluation Dataset](https://raw.githubusercontent.com/cs472ta/CS472/master/datasets/seismic-bumps_train.arff)

In [4]:
# Load evaluation data

data = arff.loadarff('datasets/seismic-bumps_train.arff')
df = pd.DataFrame(data[0])
data = np.array(df)


norm_data_eval = copy.deepcopy(data).astype(np.float64)
mins = np.amin(norm_data_eval, axis=0).reshape((1,-1))
maxs = np.amax(norm_data_eval, axis=0).reshape((1,-1))

norm_data_eval = ((norm_data_eval - mins) / (maxs - mins))

# Train on evaluation data

print("****** single link ******")
HAC_eval_single = HACClustering(5)
HAC_eval_single.fit(norm_data_eval).print_clusters()

print("****** complete link ******")
HAC_eval_complete = HACClustering(5, 'complete')
HAC_eval_complete.fit(norm_data_eval).print_clusters()

# Print clusters

****** single link ******
2
24.0330

[0.312 ,0.3602,0.5167,0.3418,0.3336,0.3213,0.2798,0.    ]
70
11.6039

[0.714 ,0.7588,0.5603,0.7028,0.6994,0.4864,0.7393,1.    ]
70
12.4291

****** complete link ******
2
24.0330

[0.312 ,0.3602,0.5167,0.3418,0.3336,0.3213,0.2798,0.    ]
70
11.6039

[0.714 ,0.7588,0.5603,0.7028,0.6994,0.4864,0.7393,1.    ]
70
12.4291



### 1.2.1 K-Means

### Code requirements 
- Ability to choose k and specify k initial centroids
- Use Euclidean Distance as metric
- Ability to handle distance ties
- Include output label as a cluster feature


---
The output should include the following:
- The number of clusters (k).
- The total SSE of the full clustering. 


For each cluster report include:


- The centroid id.
- The number of instances tied to that centroid. 
- The SSE of that cluster. (The sum squared error (SSE) of a single cluster is the sum of the squared Euclidean distances from each cluster member to the cluster centroid.)
---
You only need to handle continuous features
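
One simple policy for the distance-tie requirement is to give a tied point to the lowest-indexed centroid. The sketch below relies on the fact that `argmin` returns the first minimum; the function name `assign_to_centroids` is illustrative, and other tie-breaking policies would be equally valid.

```python
import numpy as np

def assign_to_centroids(X, centroids):
    """Return, for each row of X, the index of its nearest centroid (Euclidean)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # dists[i, c] is the distance from point i to centroid c.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # argmin picks the first minimum, so ties go to the lowest centroid index.
    return dists.argmin(axis=1)

# The middle point is equally far from both centroids and lands in cluster 0.
print(assign_to_centroids([[0.0], [1.0], [2.0]], [[0.0], [2.0]]))
```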

In [52]:
class KMEANSClustering(BaseEstimator,ClusterMixin):

    def __init__(self,k=3,debug=False): ## add parameters here
        """
        Args:
            k = how many final clusters to have
            debug = if debug is true use the first k instances as the initial centroids otherwise choose random points as the initial centroids.
        """
        self.k = k
        self.debug = debug

    def fit(self,X,y=None):
        """ Fit the data; In this lab this will make the K clusters :D
        Args:
            X (array-like): A 2D numpy array with the training data
            y (array-like): An optional argument. Clustering is usually unsupervised so you don't need labels
        Returns:
            self: this allows this to be chained, e.g. model.fit(X,y).predict(X_test)
        """
        self.centroids = np.zeros((self.k,X.shape[1]))
        self.prev_centroids = copy.deepcopy(self.centroids)
        
        if self.debug:
            for i in range(self.k):
                self.centroids[i] = X[i]
        else:
            rng = default_rng()
            ind = rng.choice(X.shape[0], size=self.k, replace=False)
            for i in range(self.centroids.shape[0]):
                self.centroids[i] = X[ind[i]]
        
        
        while not np.array_equal(self.centroids, self.prev_centroids):
            self.prev_centroids = copy.deepcopy(self.centroids)
            self.clusters = []
            for i in self.centroids:
                self.clusters.append(None)
            for i in X:
                # np.argmin returns the first minimum, so a distance tie goes to the lowest-indexed centroid.
                cluster = np.argmin([np.linalg.norm(i - j, ord=2) for j in self.centroids])
                if self.clusters[cluster] is None:
                    self.clusters[cluster] = i.reshape((1,-1))
                else:
                    self.clusters[cluster] = np.concatenate((self.clusters[cluster], i.reshape((1,-1))))
            for i in range(len(self.centroids)):
                self.centroids[i] = np.mean(self.clusters[i], axis=0)
                
        return self
    
    def sse(self, cluster):
        centroid = np.mean(cluster, axis=0)
        error = 0
        for i in cluster:
            error += (np.linalg.norm(i - centroid, ord=2)) ** 2
        return error
    
    def print_clusters(self):
        """
            Used for grading.
            print("{:d}\n".format(k))
            print("{:.4f}\n\n".format(total SSE))
            for each cluster and centroid:
                print(np.array2string(centroid,precision=4,separator=","))
                print("\n")
                print("{:d}\n".format(size of cluster))
                print("{:.4f}\n\n".format(SSE of cluster))
        """
        
        print(self.k)
        total_error = 0
        for i in self.clusters:
            total_error += self.sse(i)
        print("{:.4f}\n".format(total_error))
        for i in self.clusters:
            print(np.array2string(np.mean(i, axis=0),precision=4,separator=","))
            print(i.shape[0])
            print("{:.4f}\n".format(self.sse(i)))

### 1.2.2 Debug 

Debug your model by running it on the [Debug Dataset](https://raw.githubusercontent.com/cs472ta/CS472/master/datasets/abalone.arff)


- Train until convergence
- Normalize Data
- K = 5
- Use the first k instances as the initial centroids
- Use 4 decimal places and DO NOT ROUND when reporting SSE and centroid values


---
Solutions in files:

[Debug K Means.txt](https://raw.githubusercontent.com/cs472ta/CS472/master/debug_solutions/Debug%20K%20Means.txt)

In [53]:
# Load debug data

# Train on debug data

KM = KMEANSClustering(5, True)
KM.fit(norm_data)

# Print clusters

KM.print_clusters()

5
9.7826

[0.7325,0.7327,0.627 ,0.3817,0.3633,0.4045,0.3046,0.4839]
75
4.0454

[0.3704,0.3519,0.2686,0.0926,0.0935,0.094 ,0.0792,0.218 ]
34
0.6609

[0.9035,0.905 ,0.7774,0.6579,0.5767,0.6193,0.5893,0.7279]
24
3.2116

[0.5692,0.5628,0.4376,0.211 ,0.2113,0.2248,0.1659,0.317 ]
54
1.5452

[0.1296,0.1037,0.1053,0.0177,0.0211,0.0272,0.0135,0.0724]
13
0.3195



### 1.2.3 Evaluation

We will evaluate your model based on its print_clusters() output using [Evaluation Dataset](https://raw.githubusercontent.com/cs472ta/CS472/master/datasets/seismic-bumps_train.arff)

In [54]:
# Load evaluation data

# Train on evaluation data

KM_eval = KMEANSClustering(5, True)
KM_eval.fit(norm_data_eval)

# Print clusters

KM_eval.print_clusters()

5
14.7279

[0.5349,0.6013,0.4617,0.5541,0.5232,0.5544,0.6248,1.    ]
22
2.0160

[0.3684,0.4338,0.4356,0.4388,0.3573,0.2988,0.3673,0.    ]
31
2.4076

[0.3658,0.3854,0.7563,0.3093,0.4659,0.3691,0.2413,0.    ]
20
2.5248

[0.1634,0.2137,0.3968,0.2178,0.1555,0.3078,0.1775,0.    ]
19
1.7770

[0.7961,0.831 ,0.6054,0.7709,0.7802,0.4553,0.7919,1.    ]
48
6.0025



## 2.1.1 (7.5%) Clustering the Iris Classification problem - HAC

Load the Iris Dataset [Iris Dataset](https://raw.githubusercontent.com/cs472ta/CS472/master/datasets/iris.arff)

- Use single-link and complete link clustering algorithms
- State whether you normalize your data or not (your choice).  
- Show your results for clusterings using k = 2-7.  
- Graph the total SSE for each k and discuss your results (i.e. what kind of clusters are being made).
---
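
A sketch of the SSE-versus-k graph follows. The function name `plot_sse_curve` is hypothetical. The values in `single_link_sse` are copied from the single-link output printed in this notebook for k = 2 through 6; the k = 7 value is truncated in the saved output and is left out rather than guessed.

```python
# Total SSE per k, copied from the single-link Iris output in this notebook.
single_link_sse = {2: 12.1437, 3: 11.9008, 4: 11.0010, 5: 10.8369, 6: 10.5036}

def plot_sse_curve(sse_by_k, title):
    """Plot total SSE against k and return the matplotlib figure."""
    # Imported here so the data above can be inspected without a plotting backend.
    import matplotlib.pyplot as plt

    ks = sorted(sse_by_k)
    fig, ax = plt.subplots()
    ax.plot(ks, [sse_by_k[k] for k in ks], marker="o")
    ax.set_xlabel("k (number of clusters)")
    ax.set_ylabel("Total SSE")
    ax.set_title(title)
    return fig
```

Calling `plot_sse_curve(single_link_sse, "Iris, single link")` in a notebook cell should display the curve. For other runs, the dictionary could be filled after each fit with `sum(model.sse(c) for c in model.clusters)`, using the `sse` method and `clusters` attribute of the classes defined above.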

In [62]:
# Iris Classification using single-link


data = arff.loadarff('datasets/iris.arff')
df = pd.DataFrame(data[0])
data = np.array(df)

le = LabelEncoder()
le.fit(data[:,-1])
data[:,-1] = le.transform(data[:,-1])

norm_iris = copy.deepcopy(data).astype(np.float64)
mins = np.amin(norm_iris, axis=0).reshape((1,-1))
maxs = np.amax(norm_iris, axis=0).reshape((1,-1))

norm_iris = ((norm_iris - mins) / (maxs - mins))

for i in range(2,8):
    print("****** k = " + str(i) + " ******\n")
    HAC_iris_s = HACClustering(i)
    HAC_iris_s.fit(norm_iris[:,:-1])
    HAC_iris_s.print_clusters()

****** k = 2 ******

2
12.1437

[0.1961,0.5908,0.0786,0.06  ]
50
1.8450

[0.545 ,0.3633,0.662 ,0.6567]
100
10.2987

****** k = 3 ******

3
11.9008

[0.199 ,0.6003,0.0792,0.0595]
49
1.6020

[0.0556,0.125 ,0.0508,0.0833]
1
0.0000

[0.545 ,0.3633,0.662 ,0.6567]
100
10.2987

****** k = 4 ******

4
11.0010

[0.199 ,0.6003,0.0792,0.0595]
49
1.6020

[0.0556,0.125 ,0.0508,0.0833]
1
0.0000

[0.5363,0.3554,0.6563,0.6531]
98
9.3927

[0.9722,0.75  ,0.9407,0.8333]
2
0.0063

****** k = 5 ******

5
10.8369

[0.199 ,0.6003,0.0792,0.0595]
49
1.6020

[0.0556,0.125 ,0.0508,0.0833]
1
0.0000

[0.5401,0.357 ,0.657 ,0.6529]
97
9.2286

[0.1667,0.2083,0.5932,0.6667]
1
0.0000

[0.9722,0.75  ,0.9407,0.8333]
2
0.0063

****** k = 6 ******

6
10.5036

[0.199 ,0.6003,0.0792,0.0595]
49
1.6020

[0.0556,0.125 ,0.0508,0.0833]
1
0.0000

[0.5373,0.3537,0.6548,0.6493]
96
8.8952

[0.1667,0.2083,0.5932,0.6667]
1
0.0000

[0.8056,0.6667,0.8644,1.    ]
1
0.0000

[0.9722,0.75  ,0.9407,0.8333]
2
0.0063

****** k = 7 ******

7
10.

In [63]:
# Iris Classification using complete-link

for i in range(2,8):
    print("****** k = " + str(i) + " ******\n")
    HAC_iris_c = HACClustering(i, 'complete')
    HAC_iris_c.fit(norm_iris[:,:-1])
    HAC_iris_c.print_clusters()

****** k = 2 ******

2
25.7462

[0.3436,0.4318,0.367 ,0.3452]
116
24.1422

[0.719 ,0.4645,0.8106,0.8419]
34
1.6041

****** k = 3 ******

3
7.1537

[0.1961,0.5908,0.0786,0.06  ]
50
1.8450

[0.4554,0.3112,0.5855,0.5612]
66
3.7046

[0.719 ,0.4645,0.8106,0.8419]
34
1.6041

****** k = 4 ******

4
6.2517

[0.2424,0.6742,0.0827,0.0732]
33
0.7438

[0.1062,0.4289,0.0708,0.0343]
17
0.1991

[0.4554,0.3112,0.5855,0.5612]
66
3.7046

[0.719 ,0.4645,0.8106,0.8419]
34
1.6041

****** k = 5 ******

5
4.7586

[0.2424,0.6742,0.0827,0.0732]
33
0.7438

[0.1062,0.4289,0.0708,0.0343]
17
0.1991

[0.5368,0.3671,0.6432,0.6295]
37
1.2271

[0.3515,0.2399,0.512 ,0.4741]
29
0.9845

[0.719 ,0.4645,0.8106,0.8419]
34
1.6041

****** k = 6 ******

6
4.1143

[0.2424,0.6742,0.0827,0.0732]
33
0.7438

[0.1062,0.4289,0.0708,0.0343]
17
0.1991

[0.5368,0.3671,0.6432,0.6295]
37
1.2271

[0.3515,0.2399,0.512 ,0.4741]
29
0.9845

[0.6365,0.4601,0.7657,0.8569]
23
0.4522

[0.8914,0.4735,0.9045,0.8106]
11
0.5076

****** k = 7 ******

7

Discuss differences between single-link and complete-link

## 2.1.2 (5%) Clustering the Iris Classification problem - HAC

Requirements:
- Repeat exercise 2.1.1 and include the output label as one of the input features.

In [64]:
# Clustering Labels using single-link

for i in range(2,8):
    print("****** k = " + str(i) + " ******\n")
    HAC_iris_s_labels = HACClustering(i)
    HAC_iris_s_labels.fit(norm_iris)
    HAC_iris_s_labels.print_clusters()

****** k = 2 ******

2
18.3937

[0.1961,0.5908,0.0786,0.06  ,0.    ]
50
1.8450

[0.545 ,0.3633,0.662 ,0.6567,0.75  ]
100
16.5487

****** k = 3 ******

3
7.8175

[0.1961,0.5908,0.0786,0.06  ,0.    ]
50
1.8450

[0.4544,0.3208,0.5525,0.5108,0.5   ]
50
2.4885

[0.6356,0.4058,0.7715,0.8025,1.    ]
50
3.4840

****** k = 4 ******

4
7.5020

[0.1961,0.5908,0.0786,0.06  ,0.    ]
50
1.8450

[0.4544,0.3208,0.5525,0.5108,0.5   ]
50
2.4885

[0.6451,0.4099,0.7752,0.8053,1.    ]
49
3.1686

[0.1667,0.2083,0.5932,0.6667,1.    ]
1
0.0000

****** k = 5 ******

5
7.2591

[0.199 ,0.6003,0.0792,0.0595,0.    ]
49
1.6020

[0.0556,0.125 ,0.0508,0.0833,0.    ]
1
0.0000

[0.4544,0.3208,0.5525,0.5108,0.5   ]
50
2.4885

[0.6451,0.4099,0.7752,0.8053,1.    ]
49
3.1686

[0.1667,0.2083,0.5932,0.6667,1.    ]
1
0.0000

****** k = 6 ******

6
6.7360

[0.199 ,0.6003,0.0792,0.0595,0.    ]
49
1.6020

[0.0556,0.125 ,0.0508,0.0833,0.    ]
1
0.0000

[0.4544,0.3208,0.5525,0.5108,0.5   ]
50
2.4885

[0.6312,0.3954,0.7681,0.8041,1

In [65]:
# Clustering Labels using complete-link
for i in range(2,8):
    print("****** k = " + str(i) + " ******\n")
    HAC_iris_c_labels = HACClustering(i, 'complete')
    HAC_iris_c_labels.fit(norm_iris)
    HAC_iris_c_labels.print_clusters()

****** k = 2 ******

2
18.3937

[0.1961,0.5908,0.0786,0.06  ,0.    ]
50
1.8450

[0.545 ,0.3633,0.662 ,0.6567,0.75  ]
100
16.5487

****** k = 3 ******

3
10.8347

[0.1961,0.5908,0.0786,0.06  ,0.    ]
50
1.8450

[0.4639,0.3173,0.5947,0.569 ,0.6429]
70
7.5866

[0.7343,0.4708,0.8192,0.8611,1.    ]
30
1.4031

****** k = 4 ******

4
9.9327

[0.2424,0.6742,0.0827,0.0732,0.    ]
33
0.7438

[0.1062,0.4289,0.0708,0.0343,0.    ]
17
0.1991

[0.4639,0.3173,0.5947,0.569 ,0.6429]
70
7.5866

[0.7343,0.4708,0.8192,0.8611,1.    ]
30
1.4031

****** k = 5 ******

5
5.4397

[0.2424,0.6742,0.0827,0.0732,0.    ]
33
0.7438

[0.1062,0.4289,0.0708,0.0343,0.    ]
17
0.1991

[0.4544,0.3208,0.5525,0.5108,0.5   ]
50
2.4885

[0.7343,0.4708,0.8192,0.8611,1.    ]
30
1.4031

[0.4875,0.3083,0.7   ,0.7146,1.    ]
20
0.6051

****** k = 6 ******

6
4.3133

[0.2424,0.6742,0.0827,0.0732,0.    ]
33
0.7438

[0.1062,0.4289,0.0708,0.0343,0.    ]
17
0.1991

[0.578 ,0.4226,0.6045,0.5635,0.5   ]
21
0.3918

[0.3649,0.2471,0.5149,0.4

Discuss any differences between the results from 2.1.1 and 2.1.2.

## 2.2.1 (7.5%) Clustering the Iris Classification problem: K-Means

Load the Iris Dataset [Iris Dataset](https://raw.githubusercontent.com/cs472ta/CS472/master/datasets/iris.arff)

Run K-Means on the Iris dataset using the output label as a feature and without using the output label as a feature

Requirements:
- State whether you normalize your data or not (your choice).  
- Show your results for clusterings using k = 2-7.  
- Graph the total SSE for each k and discuss your results (i.e. what kind of clusters are being made).
---

In [None]:
# Iris Classification without output label

In [None]:
# Iris Classification with output label

Compare results and differences between using the output label and excluding the output label

## 2.2.2 (5%) Clustering the Iris Classification problem: K-Means

Requirements:
- Use the output label as an input feature
- Run K-Means 5 times with k=4, each time with different initial random centroids and discuss any variations in the results. 
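
The repeated-runs experiment can be sketched as below. This is a compact standalone k-means, not the `KMEANSClustering` class from this notebook; the names `kmeans` and `random_restarts`, the `max_iter` cap, and the fixed `seed` are assumptions added for illustration and reproducibility.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Plain k-means from given initial centroids; returns labels, centroids, total SSE."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # An empty cluster keeps its previous centroid.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(len(centroids))])
        if np.array_equal(new, centroids):
            break
        centroids = new
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

def random_restarts(X, k, n_runs=5, seed=0):
    """Run k-means n_runs times from different random initial points; return each run's SSE."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    sses = []
    for _ in range(n_runs):
        idx = rng.choice(len(X), size=k, replace=False)
        sses.append(kmeans(X, X[idx])[2])
    return sses

toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
print(random_restarts(toy, k=2))
```

Different starting points can settle in different local optima, so the SSE values across runs need not match; comparing them is one way to frame the discussion of variation.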

In [None]:
#K-Means 5 times

Discuss any variations in the results

## 3.1 (12.5%) Run the SK versions of HAC (both single and complete link) on iris including the output label and compare your results with those above.
Use the silhouette score for this iris problem (k = 2-7). You may write your own code to compute the silhouette score (optional extra credit) or you can use sklearn.metrics.silhouette_score. Please state whether you coded your own silhouette score function to receive the extra credit points (described below). Discuss how helpful Silhouette appeared to be for selecting which clustering is best. You do not need to supply full Silhouette graphs, but you may if you want to.

Requirements
- Use the Silhouette score for this iris problem (k = 2-7)
- Use at least one other scoring function from [sklearn.metrics](https://scikit-learn.org/stable/modules/model_evaluation.html) and compare the results. State which metric was used. 
- Possible sklearn metrics include (metrics marked with * require ground truth labels):
    - adjusted_mutual_info_score*
    - adjusted_rand_score*
    - homogeneity_score*
    - completeness_score*
    - fowlkes_mallows_score*
    - calinski_harabasz_score
    - davies_bouldin_score
- Experiment using different hyper-parameters. Discuss Results
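
A minimal sketch of this experiment with scikit-learn follows. It assumes the copy of Iris bundled with scikit-learn (`load_iris`) as a stand-in for `iris.arff`, so values may differ slightly from runs on the course file. The helper name `score_hac` is hypothetical, and `davies_bouldin_score` is just one choice for the second metric.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import MinMaxScaler

def score_hac(X, ks=range(2, 8), linkage="single"):
    """Return (k, silhouette, Davies-Bouldin) for each k using sklearn's HAC."""
    rows = []
    for k in ks:
        labels = AgglomerativeClustering(n_clusters=k, linkage=linkage).fit_predict(X)
        rows.append((k, silhouette_score(X, labels), davies_bouldin_score(X, labels)))
    return rows

iris = load_iris()
# Append the class label as a fifth feature, then min-max normalize all columns.
X = MinMaxScaler().fit_transform(np.column_stack([iris.data, iris.target]))

for linkage in ("single", "complete"):
    print(f"****** {linkage} link ******")
    for k, sil, dbi in score_hac(X, linkage=linkage):
        print(f"k={k}  silhouette={sil:.4f}  davies_bouldin={dbi:.4f}")
```

Higher silhouette values and lower Davies-Bouldin values both indicate tighter, better separated clusters, so the two metrics can be compared by checking whether they favor the same k.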

In [None]:
# Load sklearn



*Record impressions*

## 3.2 (12.5%) Run the SK version of k-means on iris including the output label and compare your results with those above. 

Use the silhouette score for this iris problem (k = 2-7). You may write your own code to compute the silhouette score (optional extra credit) or you can use sklearn.metrics.silhouette_score. Please state whether you coded your own silhouette score function to receive the extra credit points (described below). Discuss how helpful Silhouette appeared to be for selecting which clustering is best. You do not need to supply full Silhouette graphs, but you may if you want to.

Requirements
- Use the Silhouette score for this iris problem (k = 2-7)
- Use at least one other scoring function from sklearn.metrics and compare the results. State which metric was used
- Experiment using different hyper-parameters. Discuss Results
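
A possible starting point for the scikit-learn k-means comparison is sketched here. As an assumption, it uses the Iris copy bundled with scikit-learn rather than `iris.arff`; the helper name `score_kmeans` is hypothetical, and `calinski_harabasz_score` is one option for the second metric. `n_init` and `random_state` are examples of hyper-parameters to vary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.preprocessing import MinMaxScaler

def score_kmeans(X, ks=range(2, 8), n_init=10, seed=0):
    """Return (k, total SSE, silhouette, Calinski-Harabasz) for each k."""
    rows = []
    for k in ks:
        model = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(X)
        rows.append((k,
                     model.inertia_,  # sum of squared distances to the closest centroid
                     silhouette_score(X, model.labels_),
                     calinski_harabasz_score(X, model.labels_)))
    return rows

iris = load_iris()
# Include the class label as a feature and min-max normalize every column.
X = MinMaxScaler().fit_transform(np.column_stack([iris.data, iris.target]))

for k, sse, sil, ch in score_kmeans(X):
    print(f"k={k}  SSE={sse:.4f}  silhouette={sil:.4f}  calinski_harabasz={ch:.2f}")
```

The `inertia_` attribute is directly comparable to the total SSE printed by `print_clusters()`, provided the same data and normalization are used.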

In [None]:
# Load sklearn 



*Record impressions*

## 4. (Optional 5% extra credit) For your silhouette experiment above, write and use your own code to calculate the silhouette scores, rather than the SK or other version. 
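
For orientation, the silhouette definition can be sketched as follows, under these assumptions: Euclidean distance, at least two clusters, and a score of 0 for any point in a singleton cluster. The function name `silhouette` is illustrative; for point i, a is the mean distance to the other members of its cluster and b is the smallest mean distance to the members of any other cluster.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette score over all points (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix; fine for small datasets such as Iris.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays 0
        # The distance from i to itself is 0, so divide by the other members only.
        a = dists[i, same].sum() / (n_same - 1)
        b = min(dists[i, labels == c].mean() for c in unique if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

print(silhouette([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1]))
```

Comparing the result against `sklearn.metrics.silhouette_score` on the same data and labels is a quick way to check a hand-written version.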


*Show findings here*

In [None]:
# Copy function Below