###### The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2021 Semester 1

## Assignment 1: Pose classification with naive Bayes


**Student ID(s):**     `PLEASE ENTER YOUR ID(S) HERE`


This iPython notebook is a template which you will use for your Assignment 1 submission.

Marking will be applied to the four functions that are defined in this notebook, and to your responses to the questions at the end of this notebook (submitted in a separate PDF file).

**NOTE: YOU SHOULD ADD YOUR RESULTS, DIAGRAMS AND IMAGES FROM YOUR OBSERVATIONS IN THIS FILE TO YOUR REPORT (the PDF file).**

You may change the prototypes of these functions, and you may write other functions, according to your requirements. We would appreciate it if the required functions were prominent/easy to find.

**Adding proper comments to your code is MANDATORY.**

In [5]:
# This function should prepare the data by reading it from a file and converting it into a useful format for training and testing

import numpy as np
from collections import defaultdict
import re
from math import sqrt
from math import exp
from math import pow
from math import pi
from math import log
import matplotlib.pyplot as plt
%matplotlib inline


def open_file(class_count_dict,class_dict, X, y):
    f = open("COMP30027_2021_assignment1_data/train.csv",'r')
    n_instances = 0
    for line in f.readlines()[0:]:
        cleaned_line = line.strip().split(",")
        attributes = cleaned_line[1:]
        class_name = cleaned_line[0]
        X.append(attributes)
        y.append(class_name)
        if class_dict.get(class_name) is None:
            class_dict[class_name]=[]
        # append this instance's attributes to the list for its class
        class_dict[class_name].append(attributes)
        n_instances += 1
        class_count_dict[cleaned_line[0]] += 1
    f.close()
    return None

#change all missing values from 9999 to 0 
def remove_missing_values(X):
    for i in range(len(X)):
        for j in range(len(X[i])):
            if X[i][j] == 9999:
                X[i][j] = 0
    return None

#convert data type from string to float                
def str_to_float(X):
    for i in range(len(X)):
        for j in range(len(X[i])):
            X[i][j] = float(X[i][j])
    return None

In [6]:
# This function should calculate prior probabilities and likelihoods from the training data and using
# them to build a naive Bayes model

def mean(array):
    return (sum(array)/len(array))

def standard_deviation(array,mean_val):
    total = 0
    for one in array:
        total+=(pow((one-mean_val),2))
    result = sqrt(total/(len(array)-1))
    return result

def train(class_dict, class_count_dict,training_details,attr_length,total_inst):
    
    # dictionary format {key : [[means], [standard deviation], class_prob]}
    for key in class_dict.keys():
        
        if training_details.get(key) is None:
            training_details[key]=[]
            
        #array to store each attributes' mean value
        mean_array=[]
        
        #array to store each attribute's standard deviation
        std_dev_array=[]
        
        for i in range(attr_length):
            attr_array = [instance[i] for instance in class_dict[key]]
            
            mean_val = mean(attr_array)
            std_dev_val = standard_deviation(attr_array,mean_val) 
            
            mean_array.append(mean_val)
            std_dev_array.append(std_dev_val)
        #append mean, standard deviation & class probability to training details dictionary
        training_details[key].append(mean_array)
        training_details[key].append(std_dev_array)
        class_prob = class_count_dict[key]/total_inst
        training_details[key].append(class_prob)
    return


In [19]:
# This function should predict classes for new items in a test dataset (for the purposes of this assignment, you
# can re-use the training data as a test set)
def gaussian_distribution(val,mean,std_dev):
    exponential = exp(-(1/2)* (pow(((val-mean)/std_dev),2)))
    result = (1/(std_dev * sqrt(2*pi)))* exponential
    return result

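def take_log(val):
    # NOTE: returning 0 for a zero density is a placeholder: 0 is log(1), so an
    # attribute value that underflows to zero density is scored as if it were certain
    if(val==0):
        return 0
    else:
        return log(val)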

def probability(instance,training_details,attr_num,class_num):
    probs = defaultdict()
    
    #find the likelihood/probability of the instance in each class
    for key in training_details.keys():
        total_prob=0
        sum_gaussian_prob=0
        
        # likelihood of the instance's attribute based on the class
        for i in range(attr_num):
            attr_mean = training_details[key][0][i]
            attr_std_dev = training_details[key][1][i]
            gauss_prob = gaussian_distribution(instance[i],attr_mean,attr_std_dev)
            #use take_log here because log(0) cannot be computed
            sum_gaussian_prob+=take_log(gauss_prob)
            
        # probability of the instance being a class :  log(class prob) + sum(log(each attribute))
        total_prob+=sum_gaussian_prob
        class_prob=training_details[key][2]
        total_prob+=log(class_prob)
        
        # put in dictionary
        probs[key] = total_prob

    return probs
            
def predict(training_details,attr_num,class_num,X_test,y_test):
    #will need to read the test.csv , now just testing the functions on first instance of the test.csv below
    sample_test=[126.8358,99.9275,0,0,47.5551,-9.7848,7.3779,-65.1004,-62.3788,-80.8448,-63.5874,2.4797,-14.5613,0,0,-30.3392,-41.2163,23.3146,57.5625,-24.693,61.5094,-34.0565]
    
    
    f = open("COMP30027_2021_assignment1_data/test.csv",'r')
    for line in f.readlines()[0:]:
        cleaned_line = line.strip().split(",")
        attributes = cleaned_line[1:]
        class_name = cleaned_line[0]
        X_test.append(attributes)
        y_test.append(class_name)
    
    f.close()
    
    str_to_float(X_test)
    remove_missing_values(X_test)
    
    predicted_classes=[]
    for instance in X_test:
        #dictionary of log-probabilities for every class, for one test instance
        probability_dict = probability(instance,training_details,attr_num,class_num)
        #find the class name with the highest probability
        max_prob_class = max(probability_dict, key=probability_dict.get)
        predicted_classes.append(max_prob_class)
    
    return predicted_classes
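
The `gaussian_distribution` and `take_log` pair above can underflow: for a value far from the class mean, `exp()` returns exactly 0 and the log is then undefined. One possible alternative, sketched here with a hypothetical helper name `gaussian_log_density` that is not part of the template, is to compute the log of the density directly so that no `exp()` call is needed. It assumes `std_dev` is strictly positive.

```python
from math import log, pi


def gaussian_log_density(val, mean, std_dev):
    """Log of the Gaussian density at val, computed without calling exp().

    Hypothetical helper (not part of the assignment template).
    Assumes std_dev > 0.
    """
    z = (val - mean) / std_dev
    # log( 1/(std_dev*sqrt(2*pi)) * exp(-z^2/2) ) expanded term by term
    return -0.5 * z * z - log(std_dev) - 0.5 * log(2 * pi)
```

Because the result is already a log value, it could be added straight into `sum_gaussian_prob` in place of `take_log(gauss_prob)`; whether that changes the reported accuracy would need to be checked on the data.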


In [20]:
# This function should evaluate the prediction performance by comparing your model’s class outputs to ground
# truth labels

def evaluate(actual,predicted):
    correct_predict=0
    test_num=len(actual)
    for i in range(test_num):
        if(actual[i]==predicted[i]):
            correct_predict+=1
    
    accuracy = (correct_predict / test_num) * 100
    return accuracy

In [24]:
#RUN HERE
#store value of attributes per class
X = []

#store class belonging to a set of attributes
y = []

#store number of instances per class
class_count_dict = defaultdict(int)

#store all instances per class
class_dict = defaultdict()

#store the details of each class after training
training_details = defaultdict()

#number of attributes
attr_num = 0

#number of instances
total_instances = 0

open_file(class_count_dict,class_dict, X, y)
str_to_float(X)
remove_missing_values(X)

attr_num = len(X[0])
total_instances = sum([class_count_dict[key] for key in class_count_dict.keys()])

#train the data
train(class_dict, class_count_dict, training_details, attr_num, total_instances)

# FOR TESTING INSTANCES
X_test=[]
#store actual class
y_test=[]
predicted_classes = predict(training_details,attr_num,len(training_details.keys()),X_test,y_test)

accuracy = evaluate(y_test,predicted_classes)
print("Accuracy is ", accuracy, "%")

Accuracy is  73.27586206896551 %


## Questions 


If you are in a group of 1, you will respond to **two** questions of your choosing.

If you are in a group of 2, you will respond to **four** questions of your choosing.

A response to a question should take about 100–250 words, and make reference to the data wherever possible.

#### NOTE: you may develop code or functions to help respond to the question here, but your formal answer should be submitted separately as a PDF.

### Q1
Since this is a multiclass classification problem, there are multiple ways to compute precision, recall, and F-score for this classifier. Implement at least two of the methods from the "Model Evaluation" lecture and discuss any differences between them. (The implementation should be your own and should not just call a pre-existing function.)
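
As a starting point, the sketch below shows two common averaging schemes, macro and micro, computed from per-class counts. The function names are hypothetical, and the macro F-score here is taken as the F-score of the macro-averaged precision and recall (averaging the per-class F-scores is another convention), so check both against the definitions used in the lecture.

```python
from collections import Counter


def per_class_counts(actual, predicted):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[a] += 1  # actual class a gains a false negative
    return tp, fp, fn


def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_average(actual, predicted):
    """Average per-class precision and recall, weighting every class equally."""
    tp, fp, fn = per_class_counts(actual, predicted)
    classes = sorted(set(actual) | set(predicted))
    precision = sum(safe_div(tp[c], tp[c] + fp[c]) for c in classes) / len(classes)
    recall = sum(safe_div(tp[c], tp[c] + fn[c]) for c in classes) / len(classes)
    return precision, recall, safe_div(2 * precision * recall, precision + recall)


def micro_average(actual, predicted):
    """Pool the counts over all classes before computing precision and recall."""
    tp, fp, fn = per_class_counts(actual, predicted)
    total_tp = sum(tp.values())
    precision = safe_div(total_tp, total_tp + sum(fp.values()))
    recall = safe_div(total_tp, total_tp + sum(fn.values()))
    return precision, recall, safe_div(2 * precision * recall, precision + recall)
```

With one predicted label per instance, every error is both a false positive for one class and a false negative for another, so micro-averaged precision and recall both equal accuracy. Macro averaging gives small classes the same weight as large ones.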

### Q2
The Gaussian naive Bayes classifier assumes that numeric attributes come from a Gaussian distribution. Is this assumption always true for the numeric attributes in this dataset? Identify some cases where the Gaussian assumption is violated and describe any evidence (or lack thereof) that this has some effect on the classifier’s predictions.
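
One rough way to screen attributes, alongside plotting histograms, is to compute the sample skewness and excess kurtosis of each attribute within each class; both are zero for a Gaussian. The helper below is a hypothetical sketch, not part of the template, and assumes the values are not all identical.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from the central moments of values.

    Hypothetical helper. Assumes values has non-zero variance.
    """
    n = len(values)
    mean_val = sum(values) / n
    # second, third and fourth central moments
    m2 = sum((v - mean_val) ** 2 for v in values) / n
    m3 = sum((v - mean_val) ** 3 for v in values) / n
    m4 = sum((v - mean_val) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis
```

It could be applied to each `attr_array` built inside `train`. Values far from zero only flag candidates for a closer look; how far is "far" depends on the sample size, so this is a screening aid rather than a formal test.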

### Q3
Implement a kernel density estimate (KDE) naive Bayes classifier and compare its performance to the Gaussian naive Bayes classifier. Recall that KDE has kernel bandwidth as a free parameter -- you can choose an arbitrary value for this, but a value in the range 5-25 is recommended. Discuss any differences you observe between the Gaussian and KDE naive Bayes classifiers. (As with the Gaussian naive Bayes, this KDE naive Bayes implementation should be your own and should not just call a pre-existing function.)
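
For orientation, a KDE likelihood for one attribute can be sketched as the average of Gaussian kernels centred on the training values of that attribute within a class. The function name `kde_log_likelihood` and the density floor of `1e-300` are assumptions made for this sketch, not part of the assignment.

```python
from math import exp, log, pi, sqrt


def kde_log_likelihood(val, training_values, bandwidth):
    """Log of a Gaussian-kernel density estimate at val.

    training_values: values of one attribute for one class.
    bandwidth: kernel standard deviation (must be > 0).
    """
    norm = 1.0 / (bandwidth * sqrt(2 * pi))
    total = 0.0
    for x in training_values:
        z = (val - x) / bandwidth
        total += norm * exp(-0.5 * z * z)
    density = total / len(training_values)
    # Assumed floor: avoids log(0) when every kernel underflows to zero
    return log(max(density, 1e-300))
```

Summing this over attributes and adding the log prior would mirror the structure of `probability` above. Note that the cost per prediction grows with the number of training instances, unlike the Gaussian model, which only stores a mean and standard deviation.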

### Q4
Instead of using an arbitrary kernel bandwidth for the KDE naive Bayes classifier, use random hold-out or cross-validation to choose the kernel bandwidth. Discuss how this changes the model performance compared to using an arbitrary kernel bandwidth.
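
A minimal cross-validation loop might look like the sketch below. The names `k_fold_indices` and `choose_bandwidth` are hypothetical, and `fit_and_score` stands for a function you would supply yourself: it should train on the first pair of arguments with the given bandwidth and return accuracy on the second pair.

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Shuffle the instance indices and split them into k disjoint folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def choose_bandwidth(X, y, candidates, fit_and_score, k=5):
    """Return the candidate bandwidth with the best mean held-out score.

    fit_and_score(train_X, train_y, test_X, test_y, bandwidth) -> float
    is a user-supplied function (hypothetical interface).
    """
    folds = k_fold_indices(len(X), k)
    best_bw, best_score = None, float("-inf")
    for bw in candidates:
        scores = []
        for fold in folds:
            held = set(fold)
            train_X = [x for i, x in enumerate(X) if i not in held]
            train_y = [c for i, c in enumerate(y) if i not in held]
            test_X = [X[i] for i in fold]
            test_y = [y[i] for i in fold]
            scores.append(fit_and_score(train_X, train_y, test_X, test_y, bw))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_bw, best_score = bw, mean_score
    return best_bw, best_score
```

The bandwidth should be chosen using the training data only; reporting the cross-validated score of the winning bandwidth as the final test accuracy would be optimistic.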

### Q5
Naive Bayes ignores missing values, but in pose recognition tasks the missing values can be informative. Missing values indicate that some part of the body was obscured and sometimes this is relevant to the pose (e.g., holding one hand behind the back). Are missing values useful for this task? Implement a method that incorporates information about missing values and demonstrate whether it changes the classification results.
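
One option, sketched below, is to model "missing or not" for each attribute as a Bernoulli variable per class and add its log-likelihood to the class score. The function name and the add-one (Laplace) smoothing are assumptions of this sketch. It relies on the `9999` sentinel used in the code above, so it must see the data before `remove_missing_values` replaces those entries with 0.

```python
from math import log

MISSING = 9999  # sentinel for a missing value, as in remove_missing_values


def missing_log_likelihood(instance, class_instances, alpha=1.0):
    """Log-likelihood of the instance's missing/present pattern under one class.

    class_instances: training instances of the class, sentinels still in place.
    alpha: smoothing constant (assumed; keeps probabilities away from 0 and 1).
    """
    n = len(class_instances)
    total = 0.0
    for i, val in enumerate(instance):
        n_missing = sum(1 for row in class_instances if row[i] == MISSING)
        p_missing = (n_missing + alpha) / (n + 2 * alpha)
        total += log(p_missing) if val == MISSING else log(1 - p_missing)
    return total
```

If this term is used, it would make sense to skip the Gaussian term for attributes that are missing in the test instance rather than scoring the substituted 0.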

### Q6
Engineer your own pose features from the provided keypoints. Instead of using the (x,y) positions of keypoints, you might consider the angles of the limbs or body, or the distances between pairs of keypoints. How does a naive Bayes classifier based on your engineered features compare to the classifier using (x,y) values? Please note that we are interested in explainable features for pose recognition, so simply putting the (x,y) values in a neural network or similar to get an arbitrary embedding will not receive full credit for this question. You should be able to explain the rationale behind your proposed features. Also, don't forget the conditional independence assumption of naive Bayes when proposing new features -- a large set of highly-correlated features may not work well.
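
As an illustration of the distance and angle idea, the sketch below computes the length and orientation of the segment joining two keypoints. It assumes the 22 attributes are laid out as 11 x-values followed by 11 y-values; that layout is not stated in this notebook, so verify it against the data description before using the indices. The function names are hypothetical.

```python
from math import atan2, degrees, hypot

N_KEYPOINTS = 11  # assumed: 22 attributes = 11 x-values then 11 y-values
MISSING = 9999    # sentinel for a missing value


def keypoint(instance, index):
    """Return the (x, y) pair of one keypoint under the assumed layout."""
    return instance[index], instance[index + N_KEYPOINTS]


def limb_features(instance, a, b):
    """Length and angle (in degrees) of the segment from keypoint a to b.

    Returns (MISSING, MISSING) if either keypoint has a missing coordinate.
    """
    ax, ay = keypoint(instance, a)
    bx, by = keypoint(instance, b)
    if MISSING in (ax, ay, bx, by):
        return MISSING, MISSING
    length = hypot(bx - ax, by - ay)
    angle = degrees(atan2(by - ay, bx - ax))
    return length, angle
```

Angles from `atan2` wrap around at ±180 degrees, so a limb pointing roughly in that direction produces values at both extremes, which a single Gaussian fits poorly. Features like these also need to be computed before missing values are replaced with 0.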