# LXMLS 2017 - Day 1

## Classification

This day will serve as an introduction to machine learning. We recall some fundamental concepts about decision
theory and classification. We also present some widely used models and algorithms and try to provide
the main motivation behind them. There are several textbooks that provide a thorough description of some
of the concepts introduced here: for example, Mitchell (1997), Duda et al. (2001), Schölkopf and Smola (2002),
Joachims (2002), Bishop (2006), Manning et al. (2008), to name just a few. The concepts that we introduce in this
chapter will be revisited in later chapters, where the same algorithms and models will be adapted to structured
inputs and outputs. For now, we are concerned only with multi-class classification (with just a few classes).

**Exercise 1.1** In this exercise we will use the Amazon sentiment analysis data (Blitzer et al., 2007), where the goal is to classify text documents as expressing a positive or negative sentiment (i.e., a classification problem with two classes). We are going to focus on book reviews. To load the data, type:

In [2]:
import sys
sys.path.append('../../../')
import pdb


import lxmls.readers.simple_data_set as sds
import lxmls.readers.sentiment_reader as srs
import lxmls.classifiers.linear_classifier as lcc
import lxmls.classifiers.perceptron as percc
import lxmls.classifiers.mira as mirac
import lxmls.classifiers.gaussian_naive_bayes as gnbc
import lxmls.classifiers.multinomial_naive_bayes as mnb

reload(mnb)  # lets you edit the module and rerun this script without restarting Python
import lxmls.classifiers.max_ent_batch as mebc
import lxmls.classifiers.max_ent_online as meoc
import lxmls.classifiers.svm as svmc

scr = srs.SentimentCorpus("books")

2000
1600


Solution to Exercise 1.1

In [3]:
def train(self, x, y):
    # n_docs = no. of documents
    # n_words = no. of unique words
    n_docs, n_words = x.shape

    # classes = a list of possible classes
    classes = np.unique(y)
    # n_classes = no. of classes
    n_classes = np.unique(y).shape[0]

    # initialization of the prior and likelihood variables
    prior = np.zeros(n_classes)
    likelihood = np.zeros((n_words, n_classes))

    # TODO: This is where you have to write your code!
    # You need to compute the values of the prior and likelihood parameters
    # and place them in the variables called "prior" and "likelihood".
    # Examples:
    # prior[0] is the prior probability of a document being of class 0
    # likelihood[4, 0] is the likelihood of the fifth(*) feature being
    # active, given that the document is of class 0
    # (*) recall that Python starts indices at 0, so an index of 4
    # corresponds to the fifth feature!

    ##########################
    # Solution to Exercise 1.1
    ##########################
    for i in xrange(n_classes):
        docs_in_class, _ = np.nonzero(y == classes[i])  # docs_in_class = indices of documents in class i
        prior[i] = 1.0 * len(docs_in_class) / n_docs  # prior = fraction of documents with this class

        # word_count_in_class = count of word occurrences in documents of class i
        word_count_in_class = x[docs_in_class, :].sum(0)
        total_words_in_class = word_count_in_class.sum()  # total_words_in_class = total number of words in documents of class i
        if not self.smooth:
            # likelihood = relative frequency of each word within the class (no smoothing)
            likelihood[:, i] = word_count_in_class / total_words_in_class
        else:
            likelihood[:, i] = (word_count_in_class+self.smooth_param) / (total_words_in_class + self.smooth_param*n_words)
    ##############################
    # End solution to Exercise 1.1
    ##############################

    params = np.zeros((n_words+1, n_classes))
    for i in xrange(n_classes):
        params[0, i] = np.log(prior[i])
        params[1:, i] = np.nan_to_num(np.log(likelihood[:, i]))
    self.likelihood = likelihood
    self.prior = prior
    self.trained = True
    return params
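
The estimation step in the solution above can be checked by hand on a tiny corpus. The following is a minimal sketch in plain Python (no NumPy), not the toolkit's implementation: the function name `estimate_multinomial_nb` and the toy data are hypothetical and introduced only for illustration. It computes the class prior as the fraction of documents in each class and the smoothed likelihood as (word count + `smooth_param`) / (total words in class + `smooth_param` × vocabulary size), mirroring the formulas in `train`.

```python
def estimate_multinomial_nb(x, y, smooth_param=1.0):
    """Estimate class priors and per-class word likelihoods from word counts.

    x: list of documents, each a list of word counts (n_docs x n_words)
    y: list of class labels, one per document
    smooth_param: pseudo-count added to every word (0 disables smoothing)
    """
    classes = sorted(set(y))
    n_docs = len(x)
    n_words = len(x[0])
    prior = {}
    likelihood = {}
    for c in classes:
        docs_in_class = [doc for doc, label in zip(x, y) if label == c]
        # prior = fraction of documents that belong to class c
        prior[c] = len(docs_in_class) / float(n_docs)
        # per-word counts summed over all documents of class c
        word_count = [sum(doc[w] for doc in docs_in_class) for w in range(n_words)]
        denom = sum(word_count) + smooth_param * n_words
        likelihood[c] = [(count + smooth_param) / denom for count in word_count]
    return prior, likelihood


# Hypothetical toy corpus: 3 documents over a vocabulary of 3 words.
toy_x = [[2, 1, 0],
         [1, 0, 0],
         [0, 1, 3]]
toy_y = [0, 0, 1]

prior, likelihood = estimate_multinomial_nb(toy_x, toy_y, smooth_param=1.0)
print(prior)       # class 0 holds 2 of the 3 documents
print(likelihood)  # class 0 word counts are [3, 1, 0], smoothed to [4/7, 2/7, 1/7]
```

With smoothing, the third word still receives probability 1/7 in class 0 even though it never occurs there, so its logarithm stays finite.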


In [4]:
import lxmls.classifiers.multinomial_naive_bayes as mnbb
mnb = mnbb.MultinomialNaiveBayes()
params_nb_sc = mnb.train(scr.train_X, scr.train_y)
y_pred_train = mnb.test(scr.train_X, params_nb_sc)
acc_train = mnb.evaluate(scr.train_y, y_pred_train)
y_pred_test = mnb.test(scr.test_X, params_nb_sc)
acc_test = mnb.evaluate(scr.test_y, y_pred_test)
print "Multinomial Naive Bayes Amazon Sentiment Accuracy train: %f test: %f" % (
    acc_train, acc_test)

Multinomial Naive Bayes Amazon Sentiment Accuracy train: 0.974375 test: 0.840000
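
The `params` matrix returned by `train` stores the log prior in its first row and the log likelihoods in the remaining rows, so a document can be scored for each class as log prior plus the count-weighted sum of log likelihoods, and the highest score wins. The `test` method itself is not shown here, so the sketch below is an assumption about how such parameters are typically used, not the toolkit's code; `predict_log_linear` and the parameter values are hypothetical.

```python
import math


def predict_log_linear(doc_counts, log_prior, log_likelihood):
    """Return (best class, per-class scores) for one document of word counts."""
    scores = []
    for c in range(len(log_prior)):
        # score = log P(class) + sum over words of count * log P(word | class)
        score = log_prior[c] + sum(
            n * lw for n, lw in zip(doc_counts, log_likelihood[c]))
        scores.append(score)
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, scores


# Hypothetical parameters for 2 classes over a 3-word vocabulary.
log_prior = [math.log(2.0 / 3.0), math.log(1.0 / 3.0)]
log_likelihood = [
    [math.log(p) for p in (4.0 / 7.0, 2.0 / 7.0, 1.0 / 7.0)],  # class 0
    [math.log(p) for p in (1.0 / 7.0, 2.0 / 7.0, 4.0 / 7.0)],  # class 1
]

label, scores = predict_log_linear([0, 1, 2], log_prior, log_likelihood)
print(label, scores)
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on long documents.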


In [5]:
# Exercise 1.1: run all classifiers on 2D data ####

# This instruction generates a simple 2D dataset with two classes.
# Each class is a Gaussian distribution.
# Input parameters (feel free to change them):
# nr_examples: number of points in the dataset
# g1: parameters for the first Gaussian, of the form:
# g1 = [[mean_x,mean_y], std]
# mean_x and mean_y are the x and y coordinates of the mean of the Gaussian
# std is the standard deviation of the Gaussian
# g2: parameters for the second Gaussian, in the same form as g1
# balance: fraction of points drawn from the first Gaussian
# split: fraction of points to use for train, development, and test respectively
sd = sds.SimpleDataSet(nr_examples=100,
                       g1=[[-1, -1], 1],
                       g2=[[1, 1], 1],
                       balance=0.5,
                       split=[0.5, 0, 0.5])

# Plot the data and the Bayes Optimal classifier
fig, axis = sd.plot_data()

# Initialize the Naive Bayes (NB) classifier for Gaussian data
gnb = gnbc.GaussianNaiveBayes()

# Learn the NB parameters from the train data
params_nb_sd = gnb.train(sd.train_X, sd.train_y)

# Use the learned parameters to predict labels for the training data
y_pred_train = gnb.test(sd.train_X, params_nb_sd)

# Compute accuracy on training data from predicted labels and true labels
acc_train = gnb.evaluate(sd.train_y, y_pred_train)

# Use the learned parameters to predict labels for the test data
y_pred_test = gnb.test(sd.test_X, params_nb_sd)

# Compute accuracy on test data from predicted labels and true labels
acc_test = gnb.evaluate(sd.test_y, y_pred_test)

# Add the decision boundary of the NB classifier to the plot
fig, axis = sd.add_line(fig, axis, params_nb_sd, "Naive Bayes", "red")

[[-1.69314718 -1.69314718]
 [-1.          1.        ]
 [-1.          1.        ]]
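
The matrix printed above is consistent with the linear form of the Bayes-optimal classifier for two Gaussians that share an isotropic variance. Expanding log P(c) + log N(x; mu_c, std^2 I) and dropping the terms that do not depend on the class leaves a bias of log P(c) - ||mu_c||^2 / (2 std^2) and a weight vector of mu_c / std^2. The sketch below assumes this is how the numbers arise (the code that prints them is not shown here); `gaussian_linear_params` is a hypothetical helper.

```python
import math


def gaussian_linear_params(means, std, priors):
    """Linear parameters for Gaussian classes with shared isotropic variance.

    Returns a list of rows: the first row holds one bias per class, and each
    following row holds the weight of one input dimension for every class.
    """
    var = std ** 2
    bias = [math.log(p) - sum(m * m for m in mu) / (2.0 * var)
            for mu, p in zip(means, priors)]
    n_dims = len(means[0])
    weights = [[mu[d] / var for mu in means] for d in range(n_dims)]
    return [bias] + weights


# Same settings as the dataset above: g1 = [[-1, -1], 1], g2 = [[1, 1], 1], balance = 0.5
params = gaussian_linear_params(means=[[-1, -1], [1, 1]], std=1.0, priors=[0.5, 0.5])
for row in params:
    print(row)
```

For these settings the bias of each class is log(0.5) - 1, which is approximately -1.69314718, and the weights are -1 and 1, matching the printed values.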


In [6]:
# Print these two accuracies to the terminal
print "Naive Bayes Simple Dataset Accuracy train: %f test: %f" % (acc_train, acc_test)
print

# Same as above, but for the perceptron classifier (instead of Naive Bayes)
perc = percc.Perceptron()
params_perc_sd = perc.train(sd.train_X, sd.train_y)
y_pred_train = perc.test(sd.train_X, params_perc_sd)
acc_train = perc.evaluate(sd.train_y, y_pred_train)
y_pred_test = perc.test(sd.test_X, params_perc_sd)
acc_test = perc.evaluate(sd.test_y, y_pred_test)
fig, axis = sd.add_line(fig, axis, params_perc_sd, "Perceptron", "blue")
print "Perceptron Simple Dataset Accuracy train: %f test: %f" % (acc_train, acc_test)
print

# Same as above, but for the MIRA classifier
mira = mirac.Mira()
params_mira_sd = mira.train(sd.train_X, sd.train_y)
y_pred_train = mira.test(sd.train_X, params_mira_sd)
acc_train = mira.evaluate(sd.train_y, y_pred_train)
y_pred_test = mira.test(sd.test_X, params_mira_sd)
acc_test = mira.evaluate(sd.test_y, y_pred_test)
fig, axis = sd.add_line(fig, axis, params_mira_sd, "Mira", "green")
print "Mira Simple Dataset Accuracy train: %f test: %f" % (acc_train, acc_test)
print

# Same as above, but for the Maximum Entropy classifier, batch version
me_lbfgs = mebc.MaxEntBatch()
params_meb_sd = me_lbfgs.train(sd.train_X, sd.train_y)
y_pred_train = me_lbfgs.test(sd.train_X, params_meb_sd)
acc_train = me_lbfgs.evaluate(sd.train_y, y_pred_train)
y_pred_test = me_lbfgs.test(sd.test_X, params_meb_sd)
acc_test = me_lbfgs.evaluate(sd.test_y, y_pred_test)
fig, axis = sd.add_line(fig, axis, params_meb_sd, "Max-Ent-Batch", "orange")
print "Max-Ent batch Simple Dataset Accuracy train: %f test: %f" % (acc_train, acc_test)
print

# Same as above, but for the Maximum Entropy classifier, online version
me_sgd = meoc.MaxEntOnline()
params_meo_sd = me_sgd.train(sd.train_X, sd.train_y)
y_pred_train = me_sgd.test(sd.train_X, params_meo_sd)
acc_train = me_sgd.evaluate(sd.train_y, y_pred_train)
y_pred_test = me_sgd.test(sd.test_X, params_meo_sd)
acc_test = me_sgd.evaluate(sd.test_y, y_pred_test)
fig, axis = sd.add_line(fig, axis, params_meo_sd, "Max-Ent-Online", "magenta")
print "Max-Ent Online Simple Dataset Accuracy train: %f test: %f" % (acc_train, acc_test)
print

# Same as above, but for the SVM classifier
svm = svmc.SVM()
params_svm_sd = svm.train(sd.train_X, sd.train_y)
y_pred_train = svm.test(sd.train_X, params_svm_sd)
acc_train = svm.evaluate(sd.train_y, y_pred_train)
y_pred_test = svm.test(sd.test_X, params_svm_sd)
acc_test = svm.evaluate(sd.test_y, y_pred_test)
fig, axis = sd.add_line(fig, axis, params_svm_sd, "SVM", "yellow")
print "SVM Online Simple Dataset Accuracy train: %f test: %f" % (acc_train, acc_test)
print

# End of exercise 1.1 #########

Naive Bayes Simple Dataset Accuracy train: 0.900000 test: 0.840000

Rounds: 0 Accuracy: 0.900000
Rounds: 1 Accuracy: 0.780000
Rounds: 2 Accuracy: 0.880000
Rounds: 3 Accuracy: 0.860000
Rounds: 4 Accuracy: 0.820000
Rounds: 5 Accuracy: 0.860000
Rounds: 6 Accuracy: 0.900000
Rounds: 7 Accuracy: 0.860000
Rounds: 8 Accuracy: 0.900000
Rounds: 9 Accuracy: 0.820000
Perceptron Simple Dataset Accuracy train: 0.900000 test: 0.820000

Rounds: 0 Accuracy: 0.880000
Rounds: 1 Accuracy: 0.900000
Rounds: 2 Accuracy: 0.680000
Rounds: 3 Accuracy: 0.760000
Rounds: 4 Accuracy: 0.680000
Rounds: 5 Accuracy: 0.820000
Rounds: 6 Accuracy: 0.840000
Rounds: 7 Accuracy: 0.840000
Rounds: 8 Accuracy: 0.320000
Rounds: 9 Accuracy: 0.880000
Mira Simple Dataset Accuracy train: 0.860000 test: 0.800000

Objective = 0.69314718056
Objective = 0.820050382317
Objective = 0.518005699926
Objective = 0.517603782644
Objective = 0.517541454694
Objective = 0.517541361737
Objective = 0.517541360472
Max-Ent batch Simple Dataset Accurac
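
The first objective value in the log above, 0.69314718056, equals log 2. That is what an average negative log-likelihood evaluates to for a two-class model whose weights are all zero, since every class then gets probability 1/2 (and any L2 penalty on zero weights is zero as well). The toolkit's exact objective is not shown here, so the sketch below only illustrates this property; `average_neg_log_likelihood` and the data are hypothetical.

```python
import math


def average_neg_log_likelihood(x, y, w):
    """Average negative log-likelihood of a multi-class log-linear model.

    x: list of feature vectors, y: list of labels, w[c]: weights of class c.
    """
    total = 0.0
    for features, label in zip(x, y):
        scores = [sum(wj * fj for wj, fj in zip(w_c, features)) for w_c in w]
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]
    return total / len(x)


# Hypothetical data: 3 examples, 3 features each (the first acts as an intercept).
toy_x = [[1.0, -0.5, 2.0], [1.0, 0.3, -1.2], [1.0, 1.5, 0.4]]
toy_y = [0, 1, 1]
zero_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # two classes, all weights zero

objective = average_neg_log_likelihood(toy_x, toy_y, zero_w)
print(objective, math.log(2))
```

The result does not depend on the data: with zero weights every score is zero, so each example contributes exactly log(number of classes).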

**Exercise 1.2** We provide an implementation of the perceptron algorithm in the class Perceptron (file perceptron.py).

In [7]:
def train(self, x, y, seed=1):
    self.params_per_round = []
    x_orig = x[:, :]
    x = self.add_intercept_term(x)
    nr_x, nr_f = x.shape
    nr_c = np.unique(y).shape[0]
    w = np.zeros((nr_f, nr_c))
    for epoch_nr in xrange(self.nr_epochs):

        # use seed to generate permutation
        np.random.seed(seed)
        perm = np.random.permutation(nr_x)

        # change the seed so next epoch we don't get the same permutation
        seed += 1

        for nr in xrange(nr_x):
            # print "iter %i" %( epoch_nr*nr_x + nr)
            inst = perm[nr]
            y_hat = self.get_label(x[inst:inst+1, :], w)

            if y[inst:inst+1, 0] != y_hat:
                # Increase features of the truth
                w[:, y[inst:inst+1, 0]] += self.learning_rate * x[inst:inst+1, :].transpose()

                # Decrease features of the prediction
                w[:, y_hat] += -1 * self.learning_rate * x[inst:inst+1, :].transpose()

        self.params_per_round.append(w.copy())
        self.trained = True
        y_pred = self.test(x_orig, w)
        acc = self.evaluate(y, y_pred)
        self.trained = False
        print "Rounds: %i Accuracy: %f" % (epoch_nr, acc)
    self.trained = True

    if self.averaged:
        new_w = 0
        for old_w in self.params_per_round:
            new_w += old_w
        new_w /= len(self.params_per_round)
        return new_w
    return w
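
The update rule in `train` can be tried in isolation on a toy problem. The following is a minimal sketch in plain Python under stated assumptions, not the toolkit's class: `train_perceptron`, `predict`, and `add_intercept` are hypothetical names, weights are stored per class (the transpose of the layout used above), and Python's `random` module replaces the NumPy permutation. As in the code above, a mistake adds the features to the true class, subtracts them from the predicted class, and averaging is taken over the weights saved at the end of each epoch.

```python
import random


def add_intercept(features):
    """Prepend a constant 1.0 so the first weight acts as a bias."""
    return [1.0] + list(features)


def predict(w, features):
    """Return the class whose weight vector gives the highest score."""
    feats = add_intercept(features)
    scores = [sum(wj * fj for wj, fj in zip(w_c, feats)) for w_c in w]
    return max(range(len(w)), key=lambda c: scores[c])


def train_perceptron(x, y, n_classes, n_epochs=10, learning_rate=1.0,
                     averaged=True, seed=1):
    n_feats = len(x[0]) + 1  # +1 for the intercept term
    w = [[0.0] * n_feats for _ in range(n_classes)]
    snapshots = []  # copy of the weights at the end of every epoch
    rng = random.Random(seed)
    order = list(range(len(x)))
    for _ in range(n_epochs):
        rng.shuffle(order)  # visit the examples in a new random order
        for i in order:
            y_hat = predict(w, x[i])
            if y_hat != y[i]:
                for j, f in enumerate(add_intercept(x[i])):
                    w[y[i]][j] += learning_rate * f   # increase features of the truth
                    w[y_hat][j] -= learning_rate * f  # decrease features of the prediction
        snapshots.append([row[:] for row in w])
    if not averaged:
        return w
    return [[sum(s[c][j] for s in snapshots) / len(snapshots)
             for j in range(n_feats)] for c in range(n_classes)]


# Hypothetical, linearly separable 2D data with two classes.
toy_x = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -0.5],
         [1.0, 2.0], [2.0, 1.0], [0.5, 1.5]]
toy_y = [0, 0, 0, 1, 1, 1]

w_avg = train_perceptron(toy_x, toy_y, n_classes=2)
print([predict(w_avg, xi) for xi in toy_x])
```

On data that is not linearly separable the per-epoch accuracy can oscillate, as in the "Rounds" log earlier, which is one motivation for returning the averaged weights instead of the last ones.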
