### Object-Oriented Programming Lab for Data Science


#### Objectives
K-means clustering is a widely used unsupervised machine learning algorithm for partitioning a data set into k groups (i.e. k clusters). In this lab, you will practise implementing k-means clustering with Python object-oriented programming techniques. You do not need to know the k-means algorithm in detail, but you should complete the class definition, create the class object, and call the methods needed to run the program. We have provided a dataset called "sales.csv" that you will use to implement the k-means clustering.
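
To give a feel for what the provided script does on each pass, here is a minimal sketch of one k-means iteration: assign every record to its nearest centroid, then move each centroid to the mean of its records. The toy points, the starting centroids, and the helper names `assign` and `update` are assumptions made for illustration and are not part of the lab script. Distance here is the sum of absolute differences, which is what the script's get_distance method computes; textbook k-means normally uses squared Euclidean distance.

```python
import numpy as np

# Hypothetical toy data: six 2-D points forming two obvious groups.
points = np.array([[1, 1], [1, 2], [2, 1],
                   [8, 8], [8, 9], [9, 8]], dtype=float)

# Assumed starting centroids (k = 2); the lab script picks them at random.
centroids = np.array([[1, 1], [9, 8]], dtype=float)

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # distances[i, j] = sum of absolute differences between point i and centroid j
    distances = np.abs(points[:, None, :] - centroids[None, :, :]).sum(axis=2)
    return distances.argmin(axis=1)

def update(points, assignment, k):
    """Return the mean of the points assigned to each of the k clusters.

    Assumes every cluster has at least one point.
    """
    return np.array([points[assignment == j].mean(axis=0) for j in range(k)])

assignment = assign(points, centroids)
new_centroids = update(points, assignment, k=2)
print(assignment)
print(new_centroids)
```

Repeating these two steps until the centroids stop changing gives the final clusters.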

The script is partially complete. Your task is to fill in the missing code as instructed:

Task 1: The first task is to write a Python class. Name this class "KMeans".

1. Inside the class, define a constructor with two parameters, called "file_name" and "n_clusters".
2. Inside the constructor, declare three variables named labels, attributes and centroids, and initialise each of them with the value None.
3. Declare an empty dictionary called "clusters".

4. Now, the body of the constructor should call the following methods:
    - a) The read_data(parameter name) method, with one parameter that references the constructor parameter "file_name",
    - b) The create_clusters(parameter name) method, with one parameter that references the constructor parameter "n_clusters",
    - c) The set_initial_centroids(parameter name) method, with one parameter that references the constructor parameter "n_clusters".
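
The pattern described in steps 1 to 4 (attributes that start as None, an empty dictionary, and a constructor that forwards its parameters to helper methods) can be sketched on a small, unrelated class. The class `Temperatures` and all of its method and attribute names are hypothetical and chosen only for illustration; they are not part of the lab script.

```python
class Temperatures:
    """Hypothetical example class, unrelated to the lab's KMeans class."""

    def __init__(self, readings, unit):
        # Attributes start as None (or empty) and are filled in by helper methods.
        self.values = None
        self.unit = None
        self.summary = {}

        # The constructor forwards its own parameters to the helper methods.
        self.load(readings)
        self.set_unit(unit)
        self.summarise()

    def load(self, readings):
        self.values = list(readings)

    def set_unit(self, unit):
        self.unit = unit

    def summarise(self):
        self.summary['min'] = min(self.values)
        self.summary['max'] = max(self.values)


t = Temperatures([21, 19, 24], 'C')
print(t.values, t.unit, t.summary)
```

Declaring every attribute at the top of the constructor keeps the full list of instance variables in one place, even when their real values are set later by other methods.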


In [1]:
import numpy as np

# Code 1: Define the class here...
class KMeans:
    # Code 2: Define the class constructor here...
    def __init__(self, file_name, n_clusters):
        # Code 3: declare variable 1
        # Code 4: declare variable 2
        # Code 5: declare variable 3
        # Code 6: declare an empty dictionary
        self.labels = None
        self.attributes  = None
        self.centroids = None
        self.clusters = {}

        # Code 7: call function 1
        # Code 8: call function 2
        # Code 9: call function 3
        self.read_data(file_name)
        self.create_clusters(n_clusters)
        self.set_initial_centroids(n_clusters)
    #-------------------------------------------------------

    def read_data(self, file_name):
        data = np.loadtxt(file_name, dtype=str, delimiter=',', skiprows=1)
        self.labels = data[:, 0]
        self.attributes = data[:, 1:].astype('int32') 
    #-------------------------------------------------------
    
    def create_clusters(self, n_clusters):
        for n in range(n_clusters):
            self.clusters[n] = {'labels': [], 'attributes': []}
    #-------------------------------------------------------

    def set_initial_centroids(self, n_clusters):
        rand_idxs = np.random.choice(self.attributes.shape[0], n_clusters, replace=False)
        self.centroids = self.attributes[rand_idxs]
    #-------------------------------------------------------

    def get_distance(self, record, centroid):
        return np.sum(np.abs(record - centroid))
    #-------------------------------------------------------

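    def get_new_centroids(self):
        # Start from the current centroids so that an empty cluster keeps its
        # old centroid instead of becoming NaN (the mean of no records).
        new_centroids = np.array(self.centroids, dtype=float)
        for idx, cluster in enumerate(self.clusters.values()):
            if cluster['attributes']:
                new_centroids[idx] = np.mean(cluster['attributes'], axis=0)
        return new_centroids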
    #-------------------------------------------------------

    def build_clusters(self):
        while True:
            for cluster in self.clusters.values():
                cluster['labels'] = []
                cluster['attributes'] = []
                
            for record_idx, record in enumerate(self.attributes):
                distances = []
                for centroid in self.centroids:
                    distances.append(self.get_distance(record, centroid))
                min_dist_cluster = np.argmin(distances)
                self.clusters[min_dist_cluster]['labels'].append(self.labels[record_idx])
                self.clusters[min_dist_cluster]['attributes'].append(record)

            new_centroids = self.get_new_centroids()
            
            if np.array_equal(self.centroids, new_centroids):
                break  # if no change in clusters

            self.centroids = new_centroids
    #-------------------------------------------------------

    def print_results(self):
        for cluster in self.clusters:
            labels = self.clusters[cluster]['labels']
            print('Cluster', cluster, 'data:')
            print('-'*15)
            print(labels)
            print()
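
The read_data method above expects a comma-separated file with one header row, a text label in the first column, and integer attributes in the remaining columns. The sketch below shows that layout on an in-memory file. The column names and values are hypothetical, since the contents of the real data file are not shown in this lab; only the `np.loadtxt` call and the slicing mirror the script.

```python
import io
import numpy as np

# Hypothetical file contents, made up for illustration only.
sample = io.StringIO(
    "product,week1,week2,week3\n"
    "P1,11,12,10\n"
    "P2,7,6,3\n"
    "P3,30,28,35\n"
)

# Same call as in read_data: read everything as text and skip the header row.
data = np.loadtxt(sample, dtype=str, delimiter=',', skiprows=1)
labels = data[:, 0]                        # first column: record labels
attributes = data[:, 1:].astype('int32')   # remaining columns: numeric values

print(labels)
print(attributes.shape)
```

If any attribute column held non-integer text, the `astype('int32')` conversion would raise a ValueError, so the file must contain whole numbers only.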


Task 2: Create the class object

1. Declare a variable that takes user input for the number of clusters.
2. Create the class object and pass it two arguments:
    - a) The first argument passes the data file name (including its path) to the constructor, and the second argument passes the value taken from the user as the number of clusters. The second argument should be written as n_clusters = k, where k is the variable holding the user input. Finally, store the class object in a new variable called "k_means".
    - b) Using the class object variable, call the methods "build_clusters()" and "print_results()".
    - c) Finally, run the code; it should ask you to enter a value for k. If everything is correct, the program will group the data into k clusters.
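
User input arrives as text, so it is worth checking before it reaches the constructor. The following is a minimal sketch of such a check, assuming a hypothetical helper named `parse_k` that is not part of the lab script. The upper bound matters because set_initial_centroids samples records without replacement, and `np.random.choice` raises a ValueError when asked for more items than the data contains.

```python
def parse_k(text, n_records):
    """Convert user text to a cluster count, or raise ValueError."""
    k = int(text)  # raises ValueError for non-numeric input such as 'two'
    if not 1 <= k <= n_records:
        raise ValueError(f'k must be between 1 and {n_records}, got {k}')
    return k


# Example with an assumed data set of 10 records.
print(parse_k('2', n_records=10))
```

In the notebook, the text passed to `parse_k` would come from `input(...)`, and `n_records` would be the number of rows in the data file.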

In [2]:
# Code 10: Declare a variable that will take user input to create cluster
# Code 11: Create class object and store it in a new object variable
# Code 12: Call the build_clusters() method
# Code 13: Call the print_results() method
k = int(input('Enter your K: '))
k_means = KMeans('sales2.csv', n_clusters=k)
k_means.build_clusters()
k_means.print_results()

Enter your K: 2
Cluster 0 data:
---------------
['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'P9', 'P10', 'P11', 'P12', 'P13', 'P14', 'P20', 'P21', 'P22', 'P23', 'P26', 'P29', 'P31', 'P32', 'P33', 'P50', 'P51', 'P53', 'P59', 'P62', 'P65', 'P68', 'P71', 'P74', 'P77', 'P81', 'P82', 'P91', 'P93', 'P94', 'P95', 'P98', 'P99', 'P100', 'P103', 'P104', 'P105', 'P106', 'P107', 'P108', 'P109', 'P110', 'P111', 'P114', 'P115', 'P116', 'P117', 'P118', 'P121', 'P122', 'P123', 'P124', 'P125', 'P126', 'P127', 'P144', 'P145', 'P146', 'P147', 'P148', 'P149', 'P150', 'P151', 'P152', 'P153', 'P154', 'P155', 'P156', 'P157', 'P158', 'P159', 'P160', 'P161', 'P162', 'P163', 'P164', 'P165', 'P166', 'P171', 'P195', 'P197', 'P198', 'P199', 'P200']

Cluster 1 data:
---------------
['P15', 'P16', 'P17', 'P18', 'P19', 'P24', 'P25', 'P27', 'P28', 'P30', 'P34', 'P35', 'P36', 'P37', 'P38', 'P39', 'P40', 'P41', 'P42', 'P43', 'P44', 'P45', 'P46', 'P47', 'P48', 'P49', 'P52', 'P54', 'P55', 'P56', 'P57', 'P58', 'P60', 

#### Created by Dr Nazmul Hussain