# Multi-Class Perceptron for Emotion Classification
This notebook implements the multi-class perceptron algorithm for emotion classification on the ISEAR dataset. The data files are .csv files of texts, each tagged with one of 7 self-reported emotions: joy, fear, anger, sadness, disgust, shame, and guilt.
## Perceptron Class
The Perceptron class is initialized with three parameters: X_train, a list of texts for training the model; y_train, a list of labels corresponding to X_train; and epoch, an integer giving the number of training iterations (passes over the training data).

In [1]:
import data_utils as du
import random

In [2]:
# Class definition for perceptron contains methods init, fit, and predict
class Perceptron:
    def __init__(self, X_train, y_train, epoch):
        self.X_train = X_train
        self.y_train = y_train
        self.epoch = epoch
        # These will be updated below
        self.features = None
        self.weight_vecs = None
        self.labels = None

        # Create dictionary where features are keys and values are indices
        self.features = du.build_vocab(X_train)
        
 
        # Get unique labels
        self.labels = list(set(y_train))

        # Create weight vectors length of vocabulary for each label. This is a 2D dictionary where:
        # Keys are labels and values are dictionaries containing a weight (initialized to value between 0-1) for each feature (word) in our vocabulary.
        self.weight_vecs = {key: dict() for key in self.labels}
        for key in self.weight_vecs:
            for i in self.features.keys():
                self.weight_vecs[key][i] = random.random()
    
  
    def fit(self):
        # We iterate over the model for the specified number of epochs
        for i in range(self.epoch):
            # Score each label by the dot product of its weight vector with the text's
            # features, then take the argmax. Each word occurrence counts as 1, so the
            # dot product reduces to summing the weights of the words in the text.
            for j in range(len(self.X_train)):
                emotion_scores = dict(joy=0, fear=0, guilt=0, anger=0, disgust=0, sadness=0, shame=0)
                # Use the data stored on the instance, not the notebook-level globals
                text = self.X_train[j].split()
                for label in self.labels:
                    dot_product = 0
                    for word in text:
                        dot_product += self.weight_vecs[label][word]
                    emotion_scores[label] = dot_product
                
                predicted_label = max(emotion_scores, key=emotion_scores.get)
                true_label = self.y_train[j]
                if predicted_label != true_label:
                    for word in text:
                        self.weight_vecs[predicted_label][word] -= 1
                        self.weight_vecs[true_label][word] += 1
    
    # Returns list of predicted labels given X_test parameter
    def predict(self, X_test):
        y_predict = []
        for i in range(len(X_test)):
            emotion_scores = dict(joy=0, fear=0, guilt=0, anger=0, disgust=0, sadness=0, shame=0)
            text = X_test[i].split()
            for label in self.labels:
                dot_product = 0
                for word in text:
                    # Skip words that were not seen during training
                    if word in self.features:
                        dot_product += self.weight_vecs[label][word]
                emotion_scores[label] = dot_product
            
            predicted_label = max(emotion_scores, key=emotion_scores.get)      
            y_predict.append(predicted_label)
            
        return y_predict
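
The scoring and update rule used in fit() can be seen in isolation on a toy example. The sketch below is only an illustration, not the class itself: the two labels, the three-word vocabulary, and the helper names `score` and `update` are hypothetical, and the weights start at zero (rather than random.random()) so that the run is reproducible.

```python
# Toy illustration of one multi-class perceptron update (hypothetical data).
labels = ["joy", "fear"]
vocab = ["happy", "scared", "today"]

# Zero-initialised weights for reproducibility; the class above uses random.random().
weights = {label: {word: 0.0 for word in vocab} for label in labels}


def score(tokens, label):
    # Dot product with occurrence features: sum the weights of the known words.
    return sum(weights[label][w] for w in tokens if w in weights[label])


def update(tokens, true_label):
    # Predict the highest-scoring label; on a mistake, move weight away from
    # the predicted label and towards the true label for every known word.
    scores = {label: score(tokens, label) for label in labels}
    predicted = max(scores, key=scores.get)
    if predicted != true_label:
        for w in tokens:
            if w in weights[predicted]:
                weights[predicted][w] -= 1
                weights[true_label][w] += 1
    return predicted


first_guess = update("scared today".split(), "fear")
second_guess = update("scared today".split(), "fear")
print(first_guess, second_guess)
```

With all weights at zero the first call ties and falls back to the first label, so the mistake triggers an update; the second call then scores "fear" higher and no further update is made.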




        


## Prep Training and Test Data
Next, we prepare the training and test data to be used by the Perceptron class. This section defines two functions: sep_labels() and prep_data(). The first, sep_labels(), decouples the labels from the documents and returns a list of documents, X, and a list of corresponding labels, y. The prep_data() function accepts two arguments: a path to the .csv training data and a path to the .csv test data. It returns 4 lists: X_train, y_train, X_test, y_test.

In [3]:

# Separate labels
def sep_labels(data):
    X = []
    y = []
    for i in data:
        y.append(i[0])
        X.append(i[1])
    return X, y

# Preparing data for model
def prep_data(train_file, test_file):
    training_data = du.getdata(train_file)
    testing_data = du.getdata(test_file)

    X_train, y_train = sep_labels(training_data)
    X_test, y_test = sep_labels(testing_data)

    return X_train, y_train, X_test, y_test


X_train, y_train, X_test, y_test = prep_data('isear-train.csv', 'isear-test.csv')
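
To make the expected data layout concrete, here is a small sketch of what sep_labels() does. It assumes that each row is a (label, text) pair, which is what the indexing `i[0]` and `i[1]` implies; the actual return format of du.getdata() is not shown in this notebook, and the two rows below are made up.

```python
# Hypothetical rows in the (label, text) layout that sep_labels() indexes into.
rows = [
    ("joy", "I passed my exam"),
    ("fear", "I heard footsteps behind me"),
]


def sep_labels(data):
    # Same logic as the notebook's sep_labels(): label at index 0, text at index 1.
    X = [row[1] for row in data]
    y = [row[0] for row in data]
    return X, y


X_toy, y_toy = sep_labels(rows)
print(X_toy)
print(y_toy)
```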





## Testing the model
We initialize the Perceptron with X_train and y_train and select a number of epochs. We then call model.fit() to train the model and model.predict() to obtain a list of predicted labels for X_test.

In [4]:
model = Perceptron(X_train, y_train, epoch = 5)
model.fit()
predictions = model.predict(X_test)


## Evaluation
We import the Evaluator class from the evaluator module. The evaluator object is initialized with a list of predictions and a list of true values. The precision, recall, and f_score values for each label show that the model needs some work: the per-label F-scores range from roughly 0.38 (guilt and anger) to 0.58 (joy). Possible next steps are removing common stop words and weighting features with TF-IDF.

In [5]:
from evaluator import Evaluator

# Compute per-label precision, recall, and F-score
evaluator = Evaluator(predictions, y_test)
evaluator.ret_precision()
evaluator.ret_recall()
evaluator.ret_fscore()

In [6]:
print(evaluator.precision)
print(evaluator.recall)
print(evaluator.f_score)

{'joy': 0.5284974093264249, 'fear': 0.6176470588235294, 'guilt': 0.4444444444444444, 'anger': 0.39869281045751637, 'disgust': 0.47904191616766467, 'sadness': 0.47191011235955055, 'shame': 0.4010989010989011}
{'joy': 0.6375, 'fear': 0.5121951219512195, 'guilt': 0.32432432432432434, 'anger': 0.3567251461988304, 'disgust': 0.47337278106508873, 'sadness': 0.5637583892617449, 'shame': 0.46794871794871795}
{'joy': 0.5779036827195467, 'fear': 0.56, 'guilt': 0.375, 'anger': 0.3765432098765432, 'disgust': 0.4761904761904762, 'sadness': 0.5137614678899083, 'shame': 0.43195266272189353}
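
The Evaluator class itself is not shown in this notebook. As a rough guide to how per-label metrics of this kind are usually computed, the sketch below uses the standard definitions of precision, recall, and F1; the function name `per_label_scores` and the toy lists are hypothetical, and this is not necessarily how Evaluator is implemented. The reported f_score values above are at least consistent with the harmonic mean of the reported precision and recall.

```python
def per_label_scores(predictions, gold):
    """Standard per-label precision, recall, and F1 (a sketch, not the Evaluator class)."""
    labels = sorted(set(gold) | set(predictions))
    pairs = list(zip(predictions, gold))
    scores = {}
    for label in labels:
        tp = sum(1 for p, g in pairs if p == label and g == label)  # true positives
        fp = sum(1 for p, g in pairs if p == label and g != label)  # false positives
        fn = sum(1 for p, g in pairs if p != label and g == label)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_score = 2 * precision * recall / (precision + recall)
        else:
            f_score = 0.0
        scores[label] = {"precision": precision, "recall": recall, "f_score": f_score}
    return scores


# Toy example: one "fear" text is misclassified as "joy".
toy_pred = ["joy", "joy", "fear", "fear"]
toy_gold = ["joy", "fear", "fear", "fear"]
toy_scores = per_label_scores(toy_pred, toy_gold)
print(toy_scores)
```

In the toy example the single error lowers precision for joy and recall for fear, which is the same pattern to look for when comparing the per-label values reported above.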
