## Neural Networks #2: Document Classification with a Multilayer Network
This example presents a multilayer network (multilayer perceptron, MLP). It is almost identical to the example on single-layer perceptrons. The main differences are in the definition of the network architecture (part 2) and in the parameters passed to the Experiment class to train the model.

## 1. Creation of the dataset

As in the previous example, we create our **training and validation sets** from the 20 Newsgroups corpus.

In [1]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# We use the 20 Newsgroups corpus and restrict the training examples to 4 classes
wanted_categories = ['rec.sport.hockey', 'sci.space', 'rec.autos', 'sci.med']
training_corpus = fetch_20newsgroups(subset='train', categories=wanted_categories, shuffle=True)
validation_corpus = fetch_20newsgroups(subset='test', categories=wanted_categories, shuffle=True)
target_categories = training_corpus.target_names

# Build a bag-of-words representation from the training set
vectorizer = CountVectorizer(lowercase=True)
X_train = vectorizer.fit_transform(training_corpus.data)
y_train = training_corpus.target

# Reuse the same transformation on the validation set
X_valid = vectorizer.transform(validation_corpus.data)
y_valid = validation_corpus.target
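
To make the bag-of-words step concrete, here is a minimal pure-Python sketch of the fit/transform idea. It only approximates what `CountVectorizer` does (lowercasing, tokens of two or more word characters, alphabetically ordered columns); the toy documents and the helper names `tokenize`, `fit_vocabulary` and `transform` are hypothetical and not part of the notebook.

```python
import re
from collections import Counter

# Hypothetical toy documents; the notebook itself uses 20 Newsgroups posts.
train_docs = ["The goalie stopped the puck", "The rocket reached orbit"]
new_doc = "The puck hit the rocket and the post"

def tokenize(text):
    # Lowercase, then keep tokens made of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())

def fit_vocabulary(docs):
    # Map each distinct training token to a column index (alphabetical order).
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def transform(doc, vocabulary):
    # Count known tokens only; words unseen during fitting are ignored.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocabulary)
    return [counts[tok] for tok in sorted(vocabulary, key=vocabulary.get)]

vocabulary = fit_vocabulary(train_docs)
vector = transform(new_doc, vocabulary)
print(vocabulary)
print(vector)  # [0, 0, 1, 0, 1, 0, 3]
```

The vocabulary is fitted on the training documents only, which is why the validation set is passed through `transform` rather than `fit_transform`: words such as "hit" or "post" that never appeared during fitting simply have no column.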

## 2. Creation of a multilayer neural network architecture
The network architecture of this example contains 2 layers:

* a first layer that converts a document vector into an intermediate representation (the hidden layer), and
* a second layer that produces the output values from the hidden layer.

The 2 layers correspond to linear transformations (z = Wx + b).

We apply a **ReLU** (Rectified Linear Unit) activation function to the output of the first layer.
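
As an illustration of the computation described above, the following sketch runs one forward pass by hand through a tiny network. The weights, biases and input are made-up values chosen for easy arithmetic, and the helpers `linear` and `relu` are hypothetical stand-ins for what `nn.Linear` and the ReLU activation compute.

```python
def linear(weights, bias, x):
    # z = Wx + b, with W stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(z):
    # Negative values are clipped to zero, positive values pass through
    return [max(0.0, v) for v in z]

# Hypothetical tiny network: 3 input features, 2 hidden units, 2 classes
W1 = [[1.0, -1.0, 0.0],
      [0.5, 0.5, -2.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0],
      [-1.0, 2.0]]
b2 = [0.5, -0.5]

x = [2.0, 1.0, 1.0]
pre_activation = linear(W1, b1, x)   # [1.0, -0.5]
hidden = relu(pre_activation)        # [1.0, 0.0]
logits = linear(W2, b2, hidden)      # [1.5, -1.5]
print(pre_activation, hidden, logits)
```

The second hidden unit receives a negative value and is zeroed by the ReLU, so it contributes nothing to the output scores for this input.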

In [2]:
from torch import nn

bow_size = X_train.shape[1]
nb_classes = len(target_categories)

# input: a word vector (bag of words)
# output: one score per class of our classification problem

class MultiLayerPerceptron(nn.Module):

    def __init__(self, input_size, hidden_layer_size, output_size):
        super().__init__()
        self.input_layer = nn.Linear(input_size, hidden_layer_size)
        self.output_layer = nn.Linear(hidden_layer_size, output_size)

    def forward(self, x):
        x = self.input_layer(x)
        x = nn.functional.relu(x)
        # Return the raw class scores; without a return statement,
        # forward() yields None and both training and prediction fail.
        return self.output_layer(x)

## 3. Creation of a dataloader to iterate over the data in minibatches
This part is identical to that of the previous example on perceptrons.

In [3]:
from torch.utils.data import Dataset, DataLoader
from torch import FloatTensor, LongTensor
from scipy.sparse import csr_matrix
import numpy as np

class SparseMatrixDataset(Dataset):
    
    def __init__(self, dataset_in_csr_matrix: csr_matrix, target: np.ndarray):
        self.dataset = dataset_in_csr_matrix
        self.target = target
    
    def __len__(self):
        return self.dataset.shape[0]

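    def __getitem__(self, index):
        # Densify a single sparse row on demand instead of the whole matrix
        features = FloatTensor(self.dataset[index, :].toarray()).squeeze(0)
        label = LongTensor([self.target[index]]).squeeze(0)
        return features, label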
        
def get_dataloader(base_dataset, dataset_target, dataset_class):
    return DataLoader(dataset_class(base_dataset, dataset_target), batch_size=16, shuffle=True)

train_loader = get_dataloader(X_train, y_train, SparseMatrixDataset)
valid_loader = get_dataloader(X_valid, y_valid, SparseMatrixDataset)
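
The following sketch shows, in plain Python, the kind of index batching a shuffled loader with `batch_size=16` performs. It is a simplified illustration, not PyTorch's implementation; the function name `minibatch_indices`, the seed and the dataset size of 50 are assumptions made for the example.

```python
import random

def minibatch_indices(n_examples, batch_size, seed=0):
    # Shuffle the example indices, then cut them into consecutive batches.
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size]
            for i in range(0, n_examples, batch_size)]

batches = minibatch_indices(n_examples=50, batch_size=16)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [16, 16, 16, 2]
```

Every example appears exactly once per pass, and the last batch is smaller when the dataset size is not a multiple of the batch size, which matches `DataLoader`'s default behaviour of keeping the final incomplete batch.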

## 4. Creating a training loop
This part is also similar to the previous example.

The main difference is that we must define the size of the hidden layer, set here to 100 neurons (hidden_size = 100). This choice is arbitrary; it could instead be selected with a grid search.

Another, more minor difference: Experiment is told to save the model and the training statistics in the 'model/mlp' directory.

Note: Patience is essential when working with neural networks. Despite the small size of the network, training takes a few minutes (about 7 seconds per epoch on my computer).

In [34]:
from poutyne.framework import Experiment
from poutyne import set_seeds
from torch.optim import SGD
import numpy as np

set_seeds(42)
hidden_size = 100

model = MultiLayerPerceptron(bow_size, hidden_size, nb_classes)
print(model)

experiment = Experiment('model/mlp', model, optimizer="SGD", task="classification")

MultiLayerPerceptron(
  (input_layer): Linear(in_features=37000, out_features=100, bias=True)
  (output_layer): Linear(in_features=100, out_features=4, bias=True)
)
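
The layer sizes in the model summary above make it easy to see how the hidden layer size drives the number of trainable parameters. The sketch below is a back-of-the-envelope count, assuming each linear layer holds one weight per input/output pair plus one bias per output; the helper names and the alternative hidden sizes 50 and 200 are hypothetical.

```python
def linear_param_count(in_features, out_features):
    # One weight per (input, output) pair, plus one bias per output
    return in_features * out_features + out_features

def mlp_param_count(input_size, hidden_size, output_size):
    return (linear_param_count(input_size, hidden_size)
            + linear_param_count(hidden_size, output_size))

# Input and output sizes taken from the printed model summary
input_size, output_size = 37000, 4

counts = {hidden: mlp_param_count(input_size, hidden, output_size)
          for hidden in (50, 100, 200)}
for hidden, count in counts.items():
    print(f"hidden_size={hidden}: {count:,} parameters")
# hidden_size=50: 1,850,254 parameters
# hidden_size=100: 3,700,504 parameters
# hidden_size=200: 7,401,004 parameters
```

Almost all parameters sit in the first layer, so the cost of a grid search over the hidden size grows roughly linearly with the values tried.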


In [38]:
logging = experiment.train(train_loader, valid_loader, epochs=30, disable_tensorboard=True)



## 5. Predictions with the model
Now that the model is trained, we test it on new examples to see what we get.

In [6]:
from torch.nn.functional import softmax 

def get_most_probable_class(sentence, model):
    # Vectorize the sentence with the bag-of-words vocabulary fitted on the training set
    vectorized_sentence = vectorizer.transform([sentence]).toarray()
    prediction = model(FloatTensor(vectorized_sentence).squeeze(0)).detach()
    output = softmax(prediction, dim=0)
    max_category_index = int(output.argmax())
    max_category = target_categories[max_category_index]
    print("\nClassifying the sentence:", sentence)
    print("Neural network outputs:", prediction)
    print("Values obtained after applying softmax:", output)
    print("Best class: {}, which corresponds to output neuron {}".format(max_category, max_category_index))
    return max_category
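
To show what the softmax and argmax steps do to the network outputs, here is a small standalone sketch. The scores in `logits` are invented for illustration, and the category order is an assumption (alphabetical order of the four selected newsgroups).

```python
import math

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability
    highest = max(logits)
    exps = [math.exp(v - highest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed category order; hypothetical raw scores for one sentence
categories = ['rec.autos', 'rec.sport.hockey', 'sci.med', 'sci.space']
logits = [0.2, 2.5, -1.0, 0.3]

probabilities = softmax(logits)
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(probabilities)
print(categories[best_index])  # rec.sport.hockey
```

Softmax preserves the ranking of the raw scores, so the predicted class is the same with or without it; its role here is to turn the scores into values that sum to 1 and can be read as probabilities.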

In [26]:
test_docs = ['Getzky was a center, not a goaltender', 
             'Mazda and BMW cars are esthetic',
             'Doctor, doctor, gimme the news', 
             'Take me to the moon']

[get_most_probable_class(sentence, model) for sentence in test_docs]


In [27]:
# We test the model with longer sentences, which seems to fix our problem

test_docs = ['Getzky was a center, not a goaltender but a fantastic hockey player', 
             'Mazda and BMW are esthetic cars but the motors are quite different',
             'Doctor, doctor, gimme the news', 
             'Take me to the moon, the sun and planet Mars']

[get_most_probable_class(sentence, model) for sentence in test_docs]
