## Neural Networks #1: Classification with a 1-Layer Network
This notebook gives a first example of using a simple neural network to classify texts. It introduces the basic concepts of neural networks and some basics of the PyTorch library.

In this example, we use an architecture made of a single layer of neurons (perceptrons). A perceptron is a neuron that:
* takes a vector of numeric values as input
* multiplies each value by a weight and sums the results (plus a bias term)
* optionally applies an activation function to produce its output.

See the description on Wikipedia for more details.

This is equivalent to a neural network without a hidden layer. It is also equivalent to a logistic regression if the activation function is a sigmoid (binary classification) or a softmax (multiclass classification).
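To make the perceptron description concrete, here is a minimal sketch of a single neuron with a sigmoid activation, written in plain Python. The function name `perceptron` and the input, weight, and bias values are hypothetical and chosen only for illustration; they do not come from the model trained in this notebook.

```python
import math

def perceptron(x, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a sigmoid activation."""
    z = sum(w * x_i for w, x_i in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only for illustration
x = [1.0, 0.0, 2.0]
weights = [0.5, -1.0, 0.25]
bias = -1.0

probability = perceptron(x, weights, bias)
print(probability)  # z = 0.5 + 0.0 + 0.5 - 1.0 = 0.0, and sigmoid(0) = 0.5
```

With a sigmoid, the output can be read as the probability of the positive class, which is the logistic regression case mentioned above.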

The dependencies needed to run these examples are:
* gensim
* torch==1.6
* torchvision
* wget
* scikit-learn (imported as sklearn)
* numpy
* matplotlib
* poutyne
* pandas
* spacy

## 1. Creation of the dataset
As in the text classification example from the 3rd week of this course, we use the 20 Newsgroups corpus for our tests. This corpus is available through scikit-learn.

We create 2 sets of examples:

* train: texts used for network training
* valid: texts used to evaluate the performance of the network after each training epoch

In [1]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# We use the 20 Newsgroups corpus and restrict the training examples to 4 classes
wanted_categories = ['rec.sport.hockey', 'sci.space', 'rec.autos', 'sci.med']
training_corpus = fetch_20newsgroups(subset='train', categories=wanted_categories, shuffle=True)
validation_corpus = fetch_20newsgroups(subset='test', categories=wanted_categories, shuffle=True)

target_categories = training_corpus.target_names

# Build a bag of words (BoW) from the training set
vectorizer = CountVectorizer(lowercase=True)
X_train = vectorizer.fit_transform(training_corpus.data)
y_train = training_corpus.target

# Reuse the CountVectorizer to transform the validation set into a bag of words
X_valid = vectorizer.transform(validation_corpus.data)
y_valid = validation_corpus.target 

print("The classes are:", target_categories)

The classes are: ['rec.autos', 'rec.sport.hockey', 'sci.med', 'sci.space']
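To illustrate what the bag-of-words step produces, here is a small sketch on a toy corpus. The two sentences and the names `toy_corpus`, `toy_vectorizer`, `X_toy` and `X_new` are made up for this illustration; only `CountVectorizer` itself comes from the cell above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration only
toy_corpus = ["the puck hit the goalie", "the rocket reached orbit"]

toy_vectorizer = CountVectorizer(lowercase=True)
X_toy = toy_vectorizer.fit_transform(toy_corpus)

print(X_toy.shape)  # (2, 7): 2 documents, 7 distinct words
the_index = toy_vectorizer.vocabulary_["the"]
print(X_toy[0, the_index])  # 2: "the" appears twice in the first document

# Words unseen during fit are silently ignored by transform
X_new = toy_vectorizer.transform(["the shuttle reached orbit"])
print(X_new.sum())  # 3: "shuttle" is not in the vocabulary
```

This is why the vectorizer is fitted on the training set only: the validation set is encoded with the same vocabulary, and any word outside it contributes nothing to the vector.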


## 2. Creating a 1-layer neural network architecture
The architecture for this example is a single-layer network that:

* takes as input the vectors generated by the CountVectorizer. These vectors have 37,000 dimensions (the number of distinct words in the corpus)
* has 4 output neurons, one for each class of our texts (cars, hockey, med and space).

In the PyTorch library, this corresponds to a linear layer (nn.Linear). In other words, we apply a linear transformation to the input count vector x using the weights W and the bias b of the network (z = Wx + b).

In [6]:
from torch import nn

bow_size = X_train.shape[1]
nb_classes = len(target_categories)

perceptron = nn.Linear(bow_size, nb_classes)
print(perceptron)

Linear(in_features=37000, out_features=4, bias=True)
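The relation z = Wx + b can be checked directly on a small layer. The sketch below uses toy sizes (10 inputs, 4 outputs) and a random input vector, both chosen for illustration; the names `layer`, `z_manual` and `n_params` are not part of the notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(10, 4)   # toy sizes: 10 input features, 4 classes
x = torch.rand(10)         # random input, for illustration only

z = layer(x)
z_manual = layer.weight @ x + layer.bias   # z = Wx + b computed by hand

print(layer.weight.shape)  # torch.Size([4, 10]): one row of weights per output neuron
print(torch.allclose(z, z_manual, atol=1e-5))

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 10 * 4 + 4 = 44
```

By the same count, a layer with 37,000 inputs and 4 outputs has 37,000 × 4 + 4 = 148,004 trainable parameters.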


## 3. Creating a DataLoader to iterate over the data in minibatches
In this section, we define a utility function that groups the examples into sets of fixed size (minibatches). See the section "training a neural network" in the course notes or the first video clip on neural networks.

The main ideas to remember are as follows:

* When training a neural network, each weight update can be based on a single example, a few examples, or all the examples.
* For stochastic gradient descent (SGD) training, a few examples are used: a minibatch.
* A minibatch is therefore a small group of examples presented to the network before updating the weights of its connections.
* In deep learning libraries, examples are represented as tensors. A tensor is a multi-dimensional array.
* In our case, the tensors have 2 dimensions:
 * one that corresponds to the word vectors (| V | = 37,000) and
 * another that corresponds to the number of examples presented to the neural network (batch_size = 16).
* A DataLoader is a PyTorch utility class that manages minibatches. It takes a set of examples and divides it into groups of batch_size examples (the last group may be smaller).
* Last detail: we use a sparse matrix here (csr_matrix from scipy.sparse) because our document vectors contain many 0s.

This should give you the essentials to understand the rest. You do not need to remember all the implementation details, because you will not be evaluated on them.
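As a small illustration of the sparse representation, the sketch below builds a mostly empty matrix and converts one row to a dense vector, which is the conversion needed before handing an example to PyTorch. The matrix contents and the names `dense`, `sparse` and `row` are invented for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy document-term matrix: 3 documents, 1000 words, almost all counts are 0
dense = np.zeros((3, 1000), dtype=np.float32)
dense[0, 5] = 2
dense[1, 42] = 1
dense[2, 999] = 3

sparse = csr_matrix(dense)
print(sparse.shape)  # (3, 1000)
print(sparse.nnz)    # 3: only the non-zero values are stored, not all 3000 cells

# A single row can be converted back to a dense vector when needed
row = sparse[1, :].toarray().squeeze(0)
print(row.shape)     # (1000,)
```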

In [7]:
from torch.utils.data import Dataset, DataLoader
from torch import FloatTensor, LongTensor
from scipy.sparse import csr_matrix
import numpy as np

class SparseMatrixDataset(Dataset):
    def __init__(self, dataset_in_csr_matrix: csr_matrix, target: np.ndarray):
        self.dataset = dataset_in_csr_matrix
        self.target = target
    
    def __len__(self):
        return self.dataset.shape[0]

    def __getitem__(self, index):
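        # Convert one sparse row to a dense float vector and its label to a scalar tensor
        features = FloatTensor(self.dataset[index,:].todense()).squeeze(0)
        label = LongTensor([self.target[index]]).squeeze(0)
        return features, label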
        
def get_dataloader(base_dataset, dataset_target, dataset_class):
    return DataLoader(dataset_class(base_dataset, dataset_target), batch_size=16, shuffle=True)

train_loader = get_dataloader(X_train, y_train, SparseMatrixDataset)
valid_loader = get_dataloader(X_valid, y_valid, SparseMatrixDataset)
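To see how a DataLoader cuts a dataset into minibatches, here is a minimal sketch on a toy dataset of 10 examples with 2 features each. `TensorDataset` is used instead of the sparse dataset class above to keep the example short; the names `features`, `labels` and `toy_loader` are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 examples with 2 features each, and one label per example
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
toy_loader = DataLoader(TensorDataset(features, labels), batch_size=4, shuffle=False)

batch_shapes = [tuple(x.shape) for x, y in toy_loader]
print(len(toy_loader))  # 3 minibatches
print(batch_shapes)     # [(4, 2), (4, 2), (2, 2)]
```

The first dimension of each batch is the number of examples and the second is the size of each example vector. The last batch holds the 2 leftover examples.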

## 4. Creating a training loop
In this step, we implement the network training loop. As seen in the course, this loop consists of the following steps:

* Present a set of examples (minibatch) to the neural network
* Obtain the results for each of these examples (forward propagation)
* Calculate the cumulative loss on these examples
* Propagate errors backwards (backpropagation) in order to modify the network weights using gradient descent
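For reference, the four steps above can be written by hand in plain PyTorch. This is only a sketch on synthetic data (64 random examples, 5 features, 2 classes), with a learning rate and number of epochs picked arbitrarily; it is not the code that Poutyne runs internally, and all variable names are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data, for illustration only: the class depends on the sign of feature 0
X = torch.randn(64, 5)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(5, 2)
loss_function = nn.CrossEntropyLoss()   # applies a log-softmax to the outputs internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    total_loss = 0.0
    for inputs, targets in loader:               # 1. present a minibatch
        optimizer.zero_grad()                    #    clear gradients from the previous step
        outputs = model(inputs)                  # 2. forward propagation
        loss = loss_function(outputs, targets)   # 3. mean loss over the minibatch
        loss.backward()                          # 4. backpropagation
        optimizer.step()                         #    gradient descent update of the weights
        total_loss += loss.item() * len(inputs)
    epoch_losses.append(total_loss / len(X))

print(epoch_losses[0], epoch_losses[-1])  # the average loss should go down
```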

To keep things simple, we use the Poutyne library, which hides the complexity of the instructions needed to train a neural network. Everything is handled by the Experiment class, which manages training (experiment.train) given only the network (perceptron) and a few parameters describing how to conduct the training:

* we have a classification task
* we want stochastic gradient descent (SGD) optimization
* we go through our dataset 30 times to train the network (epochs = 30)
* the model and the training statistics are saved in the "./model/perceptron" directory
* we use the DataLoaders created in the previous step to feed the examples during training (train_loader) and during model evaluation (valid_loader)

It takes about 1 minute to train the model.

Poutyne was developed by one of our doctoral students (Frédérik Paradis) and is a good tool to add to your toolbox if you want to work with PyTorch.
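As a quick sanity check on these settings, the arithmetic below relates the batch size, the number of epochs and the "Step: 149/149" counter shown in the training log. It assumes the DataLoader keeps the last, possibly smaller, minibatch, which is PyTorch's default behaviour; the variable names are hypothetical.

```python
import math

batch_size = 16
steps_per_epoch = 149   # "Step: 149/149" in the training log
epochs = 30

total_updates = epochs * steps_per_epoch
print(total_updates)  # 4470 weight updates over the whole training

# ceil(n / batch_size) == steps_per_epoch bounds the number of training examples
min_examples = (steps_per_epoch - 1) * batch_size + 1
max_examples = steps_per_epoch * batch_size
print(min_examples, max_examples)  # 2369 2384
print(math.ceil(min_examples / batch_size))  # 149
```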

In [10]:
from poutyne.framework import Experiment
from poutyne import set_seeds
from torch.optim import SGD

set_seeds(42)
experiment = Experiment('model/perceptron', perceptron, optimizer="SGD", task="classification")

In [11]:
logging = experiment.train(train_loader, valid_loader, epochs=30, disable_tensorboard=True)


Epoch: 1/30 Step: 149/149 100.00% |█████████████████████████|3.54s loss: 0.923344 acc: 72.574549 fscore_micro: 0.725745 val_loss: 0.733501 val_acc: 81.324921 val_fscore_micro: 0.813249
Epoch 1: val_acc improved from -inf to 81.32492, saving file to model/perceptron\checkpoint_epoch_1.ckpt
Epoch: 2/30 Step: 149/149 100.00% |█████████████████████████|3.71s loss: 0.493188 acc: 91.852163 fscore_micro: 0.918522 val_loss: 0.584016 val_acc: 87.066246 val_fscore_micro: 0.870662
Epoch 2: val_acc improved from 81.32492 to 87.06625, saving file to model/perceptron\checkpoint_epoch_2.ckpt
Epoch: 3/30 Step: 149/149 100.00% |█████████████████████████|3.93s loss: 0.382614 acc: 94.960101 fscore_micro: 0.949601 val_loss:

Epoch 23: val_acc improved from 92.05047 to 92.11356, saving file to model/perceptron\checkpoint_epoch_23.ckpt
Epoch: 24/30 Step: 149/149 100.00% |█████████████████████████|3.81s loss: 0.097974 acc: 99.454011 fscore_micro: 0.994540 val_loss: 0.298575 val_acc: 91.861199 val_fscore_micro: 0.918612
Epoch: 25/30 Step: 149/149 100.00% |█████████████████████████|3.90s loss: 0.094751 acc: 99.538009 fscore_micro: 0.995380 val_loss: 0.297725 val_acc: 91.987382 val_fscore_micro: 0.919874
Epoch: 26/30 Step: 149/149 100.00% |█████████████████████████|4.95s loss: 0.092009 acc: 99.622008 fscore_micro: 0.996220 val_loss: 0.292738 val_acc: 92.239748 val_fscore_micro: 0.922397
Epoch 26: val_acc i

## 5. Prediction with the model
Now that the model is trained, we test it on new examples to see what we get.

In [12]:
from torch.nn.functional import softmax 

def get_most_probable_class(sentence, model):
    vectorized_sentence = vectorizer.transform([sentence]).todense()
    prediction = model(FloatTensor(vectorized_sentence).squeeze(0)).detach()
    output = softmax(prediction, dim=0)
    max_category_index = int(output.argmax())
    max_category = target_categories[max_category_index]
    print("\nClassifying the sentence: ", sentence)
    print("Neural network outputs:", prediction)
    print("Values obtained after applying softmax:", output)
    print("Best class: {}, which corresponds to output neuron {}".format(max_category, max_category_index))
    return max_category

In [14]:
# We test the model with a few sentences

test_docs = ['Getzky was a center, not a goaltender', 
             'Mazda and BMW cars are esthetic ',
             'Doctor, doctor, gimme the news', 
             'Take me to the moon']

[get_most_probable_class(sentence, perceptron) for sentence in test_docs]


Classifying the sentence:  Getzky was a center, not a goaltender
Neural network outputs: tensor([ 0.0709, -0.0175,  0.0134, -0.0737])
Values obtained after applying softmax: tensor([0.2685, 0.2457, 0.2535, 0.2323])
Best class: rec.autos, which corresponds to output neuron 0

Classifying the sentence:  Mazda and BMW cars are esthetic 
Neural network outputs: tensor([ 0.7499, -0.3200, -0.1551, -0.2804])
Values obtained after applying softmax: tensor([0.4752, 0.1630, 0.1922, 0.1696])
Best class: rec.autos, which corresponds to output neuron 0

Classifying the sentence:  Doctor, doctor, gimme the news
Neural network outputs: tensor([-0.2045,  0.0228,  0.3622, -0.1550])
Values obtained after applying softmax: tensor([0.1973, 0.2476, 0.3477, 0.2073])
Best class: sci.med, which corresponds to output neuron 2

Classifying the sentence:  Take me to the moon
Neural network outputs: tensor([-0.0958,  0.0906, -0.0269, 

['rec.autos', 'rec.autos', 'sci.med', 'rec.sport.hockey']

In [15]:
# We test the model with longer sentences, which seems to solve our problem

test_docs = ['Getzky was a center, not a goaltender but a fantastic hockey player', 
             'Mazda and BMW are esthetic cars but the motors are quite different',
             'Doctor, doctor, gimme the news', 
             'Take me to the moon, the sun and planet Mars']

[get_most_probable_class(sentence, perceptron) for sentence in test_docs]


Classifying the sentence:  Getzky was a center, not a goaltender but a fantastic hockey player
Neural network outputs: tensor([-0.1697,  0.6352, -0.2488, -0.2230])
Values obtained after applying softmax: tensor([0.1958, 0.4378, 0.1809, 0.1856])
Best class: rec.sport.hockey, which corresponds to output neuron 1

Classifying the sentence:  Mazda and BMW are esthetic cars but the motors are quite different
Neural network outputs: tensor([ 0.7716, -0.2390, -0.2703, -0.2506])
Values obtained after applying softmax: tensor([0.4816, 0.1753, 0.1699, 0.1733])
Best class: rec.autos, which corresponds to output neuron 0

Classifying the sentence:  Doctor, doctor, gimme the news
Neural network outputs: tensor([-0.2045,  0.0228,  0.3622, -0.1550])
Values obtained after applying softmax: tensor([0.1973, 0.2476, 0.3477, 0.2073])
Best class: sci.med, which corresponds to output neuron 2

Classifying the sentence:  Take me to the 

['rec.sport.hockey', 'rec.autos', 'sci.med', 'sci.space']
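The softmax values printed above can be reproduced by hand from the raw network outputs. The sketch below is a plain-Python softmax (not the `torch.nn.functional.softmax` used in the notebook) applied to the four outputs printed for the second sentence of the last run; since those outputs are rounded to four decimals, the result should only agree with the printed probabilities to about three decimals.

```python
import math

def softmax(values):
    """Exponentiate each value and normalize so that the results sum to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]   # shift by the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Network outputs printed above for the second sentence of the last run
logits = [0.7716, -0.2390, -0.2703, -0.2506]
probabilities = softmax(logits)
print([round(p, 4) for p in probabilities])

best = probabilities.index(max(probabilities))
print(best)  # 0, the index of 'rec.autos' in the class list
```

Softmax does not change which neuron has the highest value. It only rescales the outputs into probabilities, which makes it easier to see how confident the network is in its choice.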