# Chapter 8: Neural Networks
## Training a Classifier on the *Salammbô* Dataset with Keras

We use three classes: French, English, and German

Programs from the book: [_Python for Natural Language Processing_](https://link.springer.com/book/9783031575488)

__Author__: Pierre Nugues

## Modules

We first need to import some modules

In [1]:
import numpy as np

In [2]:
import keras
import os
os.environ["KERAS_BACKEND"] = 'torch'

### Reading the dataset
We can read the data from a file with the svmlight format or directly create numpy arrays

In [3]:
X = np.array(
    [[35680, 2217], [42514, 2761], [15162, 990], [35298, 2274],
     [29800, 1865], [40255, 2606], [74532, 4805], [37464, 2396],
     [31030, 1993], [24843, 1627], [36172, 2375], [39552, 2560],
     [72545, 4597], [75352, 4871], [18031, 1119], [36961, 2503],
     [43621, 2992], [15694, 1042], [36231, 2487], [29945, 2014],
     [40588, 2805], [75255, 5062], [37709, 2643], [30899, 2126],
     [25486, 1784], [37497, 2641], [40398, 2766], [74105, 5047],
     [76725, 5312], [18317, 1215]
     ])

y = np.array(
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

We add German data and we adjust `y`

In [4]:
X_de = np.array(
    [[37599, 1771], [44565, 2116], [16156, 715], [37697, 1804],
     [29800, 1865], [42606, 2146], [78242, 3813], [40341, 1955],
     [31030, 1993], [26676, 1346], [39250, 1902], [41780, 2106],
     [72545, 4597], [79195, 3988], [19020, 928]
     ])

X = np.vstack((X, X_de))

y = np.array(
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

## Scaling the Data
Scaling and normalizing are usually very significant with neural networks. We use sklean transformers. They consist of two main methods: `fit()` and `transform()`.

### Normalizing

In [5]:
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
X_norm = normalizer.fit_transform(X)
X_norm[:4]

array([[0.99807515, 0.06201605],
       [0.99789783, 0.06480679],
       [0.99787509, 0.06515607],
       [0.99793128, 0.06428964]])

### Standardizing

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler(with_mean=True, with_std=True)
X_scaled = scaler.fit_transform(X_norm)
X_scaled[:4]

array([[-0.03108396,  0.0944527 ],
       [-0.4126595 ,  0.44232074],
       [-0.46160343,  0.48585864],
       [-0.34067721,  0.37785758]])

In [7]:
Y_cat = keras.utils.to_categorical(y)
Y_cat[:4]

array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.]])

## Creating a Model

We set a seed to have reproducible results

In [8]:
np.random.seed(1337)

We create a classifier equivalent to a logistic regression with `softmax`

In [9]:
model = keras.Sequential([
    keras.layers.Dense(3, activation='softmax')
])

Or with one hidden layer

In [10]:
model2 = keras.Sequential([
    keras.layers.Dense(5, activation='relu'),
    # layers.Dropout(0.5),
    keras.layers.Dense(3, activation='softmax')
])

To try the network with one hidden layer, set `simple` to false

In [11]:
simple = False
if not simple:
    model = model2

## Fitting the Model

We compile and fit the model

In [12]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(X_scaled, Y_cat, epochs=50, batch_size=1)

Epoch 1/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 278us/step - accuracy: 0.4388 - loss: 0.8730
Epoch 2/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 224us/step - accuracy: 0.4029 - loss: 0.8783  
Epoch 3/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 189us/step - accuracy: 0.7955 - loss: 0.7923
Epoch 4/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 190us/step - accuracy: 0.4887 - loss: 0.9273  
Epoch 5/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 191us/step - accuracy: 0.6353 - loss: 0.8428  
Epoch 6/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 199us/step - accuracy: 0.6815 - loss: 0.8368
Epoch 7/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 207us/step - accuracy: 0.7961 - loss: 0.7399
Epoch 8/50
[1m45/45[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 202us/step - accuracy: 0.8579 - loss: 0.7616
Epoch 9/50
[1m45/45[0m [32m━━━━

<keras.src.callbacks.history.History at 0x32ad35750>

### The weights

In [13]:
model.get_weights()

[array([[-0.332442  , -0.6867883 , -0.1025905 ,  0.23111534,  0.88336825],
        [ 1.0086831 ,  1.2684585 , -0.11404803, -0.5456369 , -1.1680247 ]],
       dtype=float32),
 array([-0.52488416, -0.2642969 , -0.04652902,  0.04949662,  0.1343793 ],
       dtype=float32),
 array([[-1.2261132 ,  0.49274623,  0.3206304 ],
        [-0.7903693 ,  0.57375276, -1.3231831 ],
        [ 0.76134884,  0.8069629 , -0.5780432 ],
        [-0.23806335,  0.26463315,  0.90389764],
        [-0.80682236, -0.98075837,  0.80323315]], dtype=float32),
 array([ 0.9722076, -0.4286995, -0.5435087], dtype=float32)]

## Prediction
### Probabilities

We compute the probabilities to belong to the classes for all the training set

In [14]:
Y_pred_proba = model.predict(X_scaled)

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step 


In [15]:
np.set_printoptions(precision=3, suppress=True)
Y_pred_proba

array([[0.682, 0.168, 0.15 ],
       [0.563, 0.338, 0.099],
       [0.516, 0.388, 0.095],
       [0.614, 0.279, 0.106],
       [0.682, 0.168, 0.15 ],
       [0.589, 0.31 , 0.101],
       [0.612, 0.283, 0.105],
       [0.636, 0.246, 0.117],
       [0.624, 0.265, 0.111],
       [0.49 , 0.418, 0.093],
       [0.467, 0.443, 0.09 ],
       [0.59 , 0.309, 0.101],
       [0.66 , 0.209, 0.131],
       [0.6  , 0.298, 0.102],
       [0.68 , 0.167, 0.153],
       [0.207, 0.741, 0.052],
       [0.132, 0.831, 0.037],
       [0.365, 0.557, 0.078],
       [0.128, 0.835, 0.037],
       [0.257, 0.682, 0.061],
       [0.099, 0.871, 0.03 ],
       [0.256, 0.683, 0.061],
       [0.055, 0.926, 0.019],
       [0.117, 0.849, 0.034],
       [0.058, 0.922, 0.02 ],
       [0.045, 0.939, 0.016],
       [0.141, 0.82 , 0.039],
       [0.17 , 0.784, 0.045],
       [0.092, 0.88 , 0.028],
       [0.374, 0.547, 0.079],
       [0.002, 0.001, 0.997],
       [0.003, 0.001, 0.997],
       [0.001, 0.   , 0.999],
       [0.

We recompute it with matrices

In [16]:
from keras.activations import softmax, relu
if simple:
    print(softmax((X_scaled@model.get_weights()
          [0] + model.get_weights()[1]))[:4])
else:
    print(softmax((relu(X_scaled@model.get_weights()
          [0] + model.get_weights()[1]))@model.get_weights()[2] + model.get_weights()[3])[:4])

tf.Tensor(
[[0.682 0.168 0.15 ]
 [0.563 0.338 0.099]
 [0.516 0.388 0.095]
 [0.614 0.279 0.106]], shape=(4, 3), dtype=float64)


### Classes

In [17]:
y_pred = np.argmax(Y_pred_proba, axis=-1)
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2,
       2])

## Loss
We recompute the loss

For one observation

In [18]:
- Y_cat[0] @ np.log(Y_pred_proba[0]).T

0.3825540244579315

For the dataset

In [19]:
-np.mean(np.log(Y_pred_proba[range(0, len(y)), y]))

0.3929104

## Evaluation

With Keras

In [20]:
model.evaluate(X_scaled, Y_cat)

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - accuracy: 0.9556 - loss: 0.3814 


[0.39291033148765564, 0.9333333373069763]

With sklearn

In [21]:
from sklearn.metrics import classification_report

print(classification_report(y, y_pred))

              precision    recall  f1-score   support

           0       0.83      1.00      0.91        15
           1       1.00      1.00      1.00        15
           2       1.00      0.80      0.89        15

    accuracy                           0.93        45
   macro avg       0.94      0.93      0.93        45
weighted avg       0.94      0.93      0.93        45



We computed the accuracy from the training set. This is not a good practice. We should use a dedicated test set instead.