# Multi-Layer Perceptron for Text Classification

In this notebook, we classify the documents of the Brown corpus by category.

In [None]:
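import nltk
from nltk.corpus import brown

# nltk.download('brown')  # uncomment on first use to fetch the corpus
# Pair the word list of each file with its list of categories.
corpus = [(brown.words(fileid), brown.categories(fileid)) for fileid in brown.fileids()]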

In [None]:
import random
random.seed(0)
random.shuffle(corpus)

In [None]:
# Join the tokens of each document (and its category list) into a single string.
docs = [' '.join(words) for words, _ in corpus]
cats = [' '.join(file_cats) for _, file_cats in corpus]

Each document is represented by a word frequency vector.

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_df=0.8, min_df=3)
vecs = vectorizer.fit_transform(docs)
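
As a rough illustration of what a word frequency vector is, the following plain-Python sketch builds count vectors for three made-up documents. The names `toy_docs`, `vocab`, and `to_count_vector` are hypothetical and only serve this example. `CountVectorizer` does more than this: it lowercases and tokenizes the text, and with the settings above it drops words that occur in more than 80% of the documents (`max_df=0.8`) or in fewer than 3 documents (`min_df=3`). None of that is reproduced here.

```python
from collections import Counter

# Toy documents (hypothetical, not taken from the Brown corpus).
toy_docs = ["the cat sat", "the dog sat down", "the cat ran"]

# Vocabulary: every distinct token, sorted so that the column order is stable.
vocab = sorted({token for doc in toy_docs for token in doc.split()})


def to_count_vector(doc, vocab):
    """Return how often each vocabulary word occurs in doc."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]


toy_vecs = [to_count_vector(doc, vocab) for doc in toy_docs]
print(vocab)
for vec in toy_vecs:
    print(vec)
```

Each row has one entry per vocabulary word, so all documents are mapped to vectors of the same length regardless of how long the text is.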

To process the labels in PyTorch, we need to convert each category name from a string into an integer.
The following code assigns a distinct ID to each category.

In [None]:
cat_to_id = dict()
id_to_cat = dict()
for i, cat in enumerate(brown.categories()):
    cat_to_id[cat] = i
    id_to_cat[i] = cat

In [None]:
cat_ids = [cat_to_id[cat] for cat in cats]

In [None]:
print(cat_ids)
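
The same mapping can be tried on a few made-up labels to check that it round-trips. This is a minimal sketch; the category names and the `toy_` variables are hypothetical and stand in for `brown.categories()` and the notebook's own dictionaries.

```python
# Hypothetical label names; the notebook uses brown.categories() instead.
toy_categories = ["fiction", "news", "reviews"]

# Forward and inverse mappings between category names and integer IDs.
toy_cat_to_id = {cat: i for i, cat in enumerate(toy_categories)}
toy_id_to_cat = {i: cat for cat, i in toy_cat_to_id.items()}

toy_labels = ["news", "fiction", "news", "reviews"]
toy_ids = [toy_cat_to_id[cat] for cat in toy_labels]
print(toy_ids)

# Mapping the IDs back recovers the original strings.
decoded = [toy_id_to_cat[i] for i in toy_ids]
print(decoded)
```

The inverse dictionary is what lets us turn a predicted class ID back into a readable category name later on.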

The input dimension of the MLP is the vocabulary size, and the output dimension is the number of classes.

In [None]:
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
features = vectorizer.get_feature_names_out()
input_dimension = len(features)
output_dimension = len(brown.categories())

In [None]:
print(input_dimension)
print(output_dimension)

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

In PyTorch, all the variables in a computational graph should be represented as `Tensor` objects.
The following code converts the word frequency vectors and the category IDs into `Tensor` objects.

In [None]:
xs = torch.FloatTensor(vecs.toarray())
ys = torch.LongTensor(cat_ids)

In [None]:
# Use the first 450 shuffled documents for training and the rest for testing.
xs_train = xs[:450]
ys_train = ys[:450]
xs_test = xs[450:]
ys_test = ys[450:]

### Model Definition

A three-layer perceptron applies two linear transformations in sequence, with a nonlinear activation (here ReLU) between them, to a feature vector $x$ to obtain a prediction of $y$.
PyTorch provides the `Sequential` container, which lets us build a complex model by chaining transformation functions in sequence.
As we have seen in the previous chapter, each `nn.Linear` object contains parameters $w$ and $b$.
Therefore, the model defined below contains four kinds of parameters: $w^{(1)}$ and $b^{(1)}$ for the first linear transformation, and $w^{(2)}$ and $b^{(2)}$ for the second linear transformation.

In [None]:
hidden_dimension = 10

model = torch.nn.Sequential(
    torch.nn.Linear(input_dimension, hidden_dimension),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_dimension, output_dimension)
)
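
To make the computation concrete, here is a minimal plain-Python sketch of the same forward pass, $z = w^{(2)}\,\mathrm{ReLU}(w^{(1)} x + b^{(1)}) + b^{(2)}$, on tiny hand-picked weights. The helper names (`linear`, `relu`, `count_params`) and all the numbers are hypothetical. The weight matrices are written with one row per output unit, which matches how `nn.Linear` stores its `weight` (shape `(out_features, in_features)`).

```python
def linear(x, w, b):
    """Compute w x + b, where each row of w produces one output value."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(w, b)]


def relu(v):
    """Replace negative entries by zero."""
    return [max(0.0, v_i) for v_i in v]


def count_params(input_dim, hidden_dim, output_dim):
    """Number of scalars in w1, b1, w2, and b2 of a three-layer perceptron."""
    return (input_dim * hidden_dim + hidden_dim
            + hidden_dim * output_dim + output_dim)


# Hypothetical toy sizes: 3 input features, 2 hidden units, 2 classes.
w1 = [[1.0, -1.0, 0.0],
      [0.5, 0.5, 0.5]]
b1 = [0.0, -1.0]
w2 = [[1.0, 2.0],
      [-1.0, 0.0]]
b2 = [0.1, 0.2]

x = [2.0, 1.0, 3.0]
h = relu(linear(x, w1, b1))  # hidden representation
z = linear(h, w2, b2)        # one raw score per class
print(h, z)
print(count_params(3, 2, 2))
```

With the notebook's settings, `count_params(input_dimension, hidden_dimension, output_dimension)` would give the total number of scalars that the optimizer has to update.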

In [None]:
for param in model.parameters():
    print(param.data)

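We use the cross-entropy loss because this is a multi-class classification problem (see https://pytorch.org/docs/master/nn.html#torch.nn.CrossEntropyLoss). `nn.CrossEntropyLoss` expects the raw outputs of the model, which is why the model does not end with a softmax layer.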
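
For intuition, the quantity being minimized can be sketched in plain Python as the mean negative log-probability that the softmax assigns to the correct class. The function names and the toy numbers below are hypothetical, and the sketch only mirrors the default behaviour of `nn.CrossEntropyLoss` (raw scores in, mean over the batch out); it is not the library's implementation.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def toy_cross_entropy(batch_logits, targets):
    """Mean negative log-probability of the target class over the batch."""
    losses = [-log_softmax(logits)[t]
              for logits, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Two hypothetical examples with three classes each.
batch_logits = [[2.0, 0.0, 0.0],   # fairly confident in class 0
                [0.0, 0.0, 0.0]]   # no preference at all
targets = [0, 2]
loss = toy_cross_entropy(batch_logits, targets)
print(loss)
```

The second example contributes $\log 3$ to the sum, which is the loss of a model that spreads its probability evenly over three classes; a loss below that level means the model has learned something about the labels.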

The training code is the same as usual, even though we are using a compound model.

In [None]:
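cross_entropy = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for epoch in range(300):
    # Forward pass: compute the class scores and the loss on the training set.
    zs = model(xs_train)
    loss = cross_entropy(zs, ys_train)

    # Backward pass: reset the gradients, backpropagate, and update the parameters.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()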

In [None]:
for param in model.parameters():
    print(param.data)

The probability of each class can be obtained by applying the softmax function to the output of the model.
We then take the class with the maximum probability using the `max()` method, which returns both the maximum value and its index along the given dimension.

In [None]:
with torch.no_grad():  # gradients are not needed for prediction
    ys_prob, ys_pred = F.softmax(model(xs_test), dim=1).max(dim=1)

In [None]:
ys_prob, ys_pred
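
The softmax-then-max step can be sketched for a single example in plain Python. The `softmax` helper and the scores below are hypothetical. The sketch also checks that the most probable class is simply the class with the largest raw score, since softmax preserves the ordering of its inputs.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one document and three classes.
logits = [1.0, 3.0, 0.5]
probs = softmax(logits)

best_prob = max(probs)               # counterpart of ys_prob
best_class = probs.index(best_prob)  # counterpart of ys_pred

# The same class is found directly on the raw scores.
same_class = logits.index(max(logits))
print(probs, best_prob, best_class, same_class)
```

So the softmax is only needed when the probability itself is of interest; for the predicted class alone, taking the maximum of the model output would be enough.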

Finally, we compare the predictions with the true categories and calculate the classification accuracy.

In [None]:
ys_test

In [None]:
ys_test == ys_pred

In [None]:
print((ys_test == ys_pred).sum().double() / len(ys_test))
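
The accuracy above is the fraction of test documents whose predicted ID equals the true ID. A minimal plain-Python sketch of the same calculation follows, with a hypothetical `accuracy` helper and made-up label lists.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical true and predicted class IDs for five documents.
toy_true = [0, 2, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 2]
acc = accuracy(toy_true, toy_pred)
print(acc)
```

When reading the result, it helps to compare it with the accuracy of always predicting the most frequent category, because the classes in a corpus are usually not equally frequent.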