In this lab, we will explore activation functions and see how they are used in a neural network.

Some activations like ReLU and Softmax are also available as layers.

In [37]:
import warnings
warnings.filterwarnings("ignore")  # suppress warnings in this cell

import tensorflow as tf
from tensorflow import keras

# ReLU used as a standalone layer
layer = keras.layers.ReLU()

Let's run our data through the ReLU
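
The cell that feeds the data through the layer is not reproduced here. As a rough sketch of what ReLU computes, the plain-Python function below replaces every negative number with 0 and passes the rest through unchanged. The name `relu` and the sample values are assumptions for illustration; with Keras, the equivalent step would be calling `layer(...)` on the ReLU layer created above.

```python
def relu(values):
    """Element-wise ReLU: max(0, x) for every x."""
    return [max(0.0, x) for x in values]

# Hypothetical sample input, not the data used in the original lab
sample = [-3.0, -1.0, 0.0, 2.0, 5.0]
relu_output = relu(sample)
print(relu_output)
```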

As expected, ReLU has replaced all the negative numbers with 0.
<br/><br/>
You can customize the threshold of ReLU:
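
A minimal sketch of the thresholded behaviour, assuming the default `negative_slope` of 0 and no `max_value`: inputs below the threshold become 0 and the rest pass through unchanged. In Keras the threshold is set with `keras.layers.ReLU(threshold=...)`; the function name `thresholded_relu` and the sample values here are hypothetical.

```python
def thresholded_relu(values, threshold=0.0):
    """Keep x when x >= threshold, otherwise output 0."""
    return [x if x >= threshold else 0.0 for x in values]

# Hypothetical sample input chosen so that one value lies between 0 and the threshold
sample = [-3.0, -1.0, 0.0, 1.0, 2.0, 5.0]
default_output = thresholded_relu(sample)
raised_output = thresholded_relu(sample, threshold=1.5)
print(default_output)
print(raised_output)
```

With the raised threshold, the value 1.0 is also set to 0, which is the difference to look for.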

Notice any difference?

Let's try other activation functions.

Softmax converts the input values into a probability distribution.

The sum of a probability distribution is 1!
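
The softmax cell is not shown here, so the following is only a plain-Python sketch of the computation: exponentiate each value and divide by the total. The function name and the sample input are assumptions; subtracting the maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import math

def softmax(values):
    """Exponentiate each value and normalise so the outputs sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sample input
sample = [1.0, 2.0, 3.0]
probs = softmax(sample)
print(probs)
print(sum(probs))
```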

Sigmoid, on the other hand, maps each value independently into the range 0 to 1 using 1 / (1 + exp(-x)):

1.   if value < -5, sigmoid returns a value close to 0
2.   if value > 5, sigmoid returns a value close to 1
3.   for all other values, sigmoid returns a value that rises smoothly between these limits, with sigmoid(0) = 0.5.



In [29]:
activation = tf.keras.activations.sigmoid
data = [-6.0,-5.0,-4,0,-1.1,0.0,2.0,5.0,12,42]
print(data)
print(activation(data))

[-6.0, -5.0, -4, 0, -1.1, 0.0, 2.0, 5.0, 12, 42]
tf.Tensor(
[0.00247262 0.00669285 0.01798621 0.5        0.24973987 0.5
 0.8807971  0.9933072  0.99999386 1.        ], shape=(10,), dtype=float32)


Swish is defined as data * sigmoid(data). <br/><br/>
Swish is bounded below but remains unbounded above.
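
To make the lower bound concrete, here is a small plain-Python sketch that scans a grid of inputs and reports where Swish is smallest. The helper names and the grid are assumptions chosen for illustration; the scan only approximates the true minimum to the grid resolution.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    return x * sigmoid(x)

# Scan x from -10 to 10 in steps of 0.001 and pick the input with the lowest output
grid = [i / 1000.0 for i in range(-10000, 10001)]
lowest_x = min(grid, key=swish)
print(lowest_x, swish(lowest_x))

# For large positive inputs Swish grows roughly like x itself
print(swish(100.0))
```

On this grid the lowest output is roughly -0.278, reached near x = -1.28, while large positive inputs pass through almost unchanged.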

In [30]:
activation = tf.keras.activations.swish
print(data)
print(activation(data))

[-6.0, -5.0, -4, 0, -1.1, 0.0, 2.0, 5.0, 12, 42]
tf.Tensor(
[-1.4835738e-02 -3.3464260e-02 -7.1944840e-02  0.0000000e+00
 -2.7471387e-01  0.0000000e+00  1.7615942e+00  4.9665360e+00
  1.1999927e+01  4.2000000e+01], shape=(10,), dtype=float32)


Tanh, or hyperbolic tangent, is similar to sigmoid, but while sigmoid's output range is 0 to 1, tanh's output range is -1 to 1.
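
The similarity can be made precise: tanh is a rescaled and shifted sigmoid, tanh(x) = 2 * sigmoid(2x) - 1. The sketch below checks this identity numerically in plain Python; the helper names and test inputs are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    """Stretch sigmoid's (0, 1) range to (-1, 1) and double the input."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

# Hypothetical test inputs
for x in [-6.0, -1.1, 0.0, 2.0, 5.0]:
    print(x, math.tanh(x), tanh_from_sigmoid(x))
```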

In [31]:
activation = tf.keras.activations.tanh
print(data)
print(activation(data))

[-6.0, -5.0, -4, 0, -1.1, 0.0, 2.0, 5.0, 12, 42]
tf.Tensor(
[-0.99998784 -0.99990916 -0.9993292   0.         -0.8004991   0.
  0.9640276   0.99990916  1.          1.        ], shape=(10,), dtype=float32)


Now let's apply what we have learned to a classification task!

Let's use the good old IMDB dataset for sentiment analysis.

Define the hyperparameters and load the dataset.

In [32]:
HP_vocab_size = 10000
HP_epochs = 50
HP_batch_size = 32
HP_maxlen = 256
HP_initial_LR = 0.001

In [33]:
import warnings
warnings.filterwarnings("ignore")

from tensorflow import keras

data = keras.datasets.imdb
# Keep only the HP_vocab_size most frequent words
(xtrain, ytrain), (xtest, ytest) = data.load_data(
    num_words = HP_vocab_size
)
vocab = data.get_word_index()

# load_data shifts word indices by 3 by default (index_from=3),
# which leaves the lowest indices free for the special tokens below
fixed_dict = { k:v+3 for k,v in vocab.items()}
rev_fixed_vocab = { v:k for k,v in fixed_dict.items() }

rev_fixed_vocab[0] = "<PAD>"
rev_fixed_vocab[1] = "<START>"
rev_fixed_vocab[2] = "<UNK>"
rev_fixed_vocab[3] = "<UNUSED>"

# Pad or truncate every review at the end so that all have HP_maxlen tokens
xtrain_padded = keras.preprocessing.sequence.pad_sequences(
    xtrain, maxlen = HP_maxlen,
    truncating = 'post', padding = 'post'
)
xtest_padded = keras.preprocessing.sequence.pad_sequences(
    xtest, maxlen = HP_maxlen,
    truncating = 'post', padding = 'post'
)
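
The reverse vocabulary built above is what lets you turn an encoded review back into words. The sketch below shows the idea on a tiny, made-up word index so that it runs without downloading the dataset; `build_reverse_vocab`, `decode_review`, and `toy_word_index` are hypothetical names, not part of the lab.

```python
def build_reverse_vocab(word_index, offset=3):
    """Shift every word index by `offset` and add the special tokens."""
    reverse = {index + offset: word for word, index in word_index.items()}
    reverse[0] = "<PAD>"
    reverse[1] = "<START>"
    reverse[2] = "<UNK>"
    reverse[3] = "<UNUSED>"
    return reverse

def decode_review(token_ids, reverse_vocab):
    """Map token ids back to words; ids missing from the vocabulary become <UNK>."""
    return " ".join(reverse_vocab.get(t, "<UNK>") for t in token_ids)

# Toy word index standing in for the real IMDB vocabulary
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
toy_reverse = build_reverse_vocab(toy_word_index)

# A padded, encoded toy review: start marker, four words, an unknown word, padding
encoded = [1, 4, 5, 6, 7, 2, 0, 0]
decoded = decode_review(encoded, toy_reverse)
print(decoded)
```

With the real data, the same lookup could be applied to a row of `xtrain_padded` using `rev_fixed_vocab`.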

Now let's build the network.

In [34]:
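import tensorflow as tf
from keras.layers import Dense, Embedding, GlobalAvgPool1D

m = keras.Sequential([
        # Learn a 32-dimensional vector for each word in the vocabulary
        Embedding(HP_vocab_size, 32),
        # Average the word vectors of a review into a single vector
        GlobalAvgPool1D(),
        # Activations passed as arguments to the Dense layers
        Dense(256, activation=tf.nn.relu),
        Dense(512, activation=tf.nn.relu),
        # Sigmoid output: one value between 0 and 1 per review
        Dense(1, activation=tf.nn.sigmoid)
])
m.compile(loss = 'binary_crossentropy',
          optimizer='adam', metrics=['accuracy'])
# This run trains for 20 epochs with the default batch size
m.fit(xtrain_padded, ytrain, epochs=20)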

Epoch 1/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - accuracy: 0.6601 - loss: 0.5683
Epoch 2/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 11s 9ms/step - accuracy: 0.8818 - loss: 0.2831
Epoch 3/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.9071 - loss: 0.2359
Epoch 4/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - accuracy: 0.9273 - loss: 0.1941
Epoch 5/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9325 - loss: 0.1811
Epoch 6/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 7s 8ms/step - accuracy: 0.9427 - loss: 0.1562
Epoch 7/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - accuracy: 0.9520 - loss: 0.1330
Epoch 8/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9458 - loss: 0.1443
Epoch 9/20
782/782 ━━━━

<keras.src.callbacks.history.History at 0x7c8f32f23350>

Get the predictions

In [35]:
p_unknown = m.predict(xtest_padded)

[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step


Check the performance

In [36]:
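from sklearn.metrics import accuracy_score
# Map each predicted probability to a class label using a 0.75 threshold
mapping_logic = lambda x: 1 if x>0.75 else 0
pred_unknown = [mapping_logic(pred) for pred in p_unknown]
print('Accuracy on unknown data')
# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(ytest, pred_unknown))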

Accuracy on unknown data
0.83008
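
The cut-off used to turn sigmoid outputs into class labels affects the measured accuracy. The sketch below illustrates this on made-up probabilities and labels, comparing a 0.5 cut-off with the 0.75 used above; the helper names and toy values are assumptions, and the numbers say nothing about the IMDB result.

```python
def to_labels(probabilities, threshold):
    """Label a sample 1 when its probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Toy sigmoid outputs and true labels (hypothetical, for illustration only)
toy_probs = [0.95, 0.80, 0.60, 0.65, 0.30, 0.10]
toy_labels = [1, 1, 1, 1, 0, 0]

acc_at_050 = accuracy(toy_labels, to_labels(toy_probs, 0.5))
acc_at_075 = accuracy(toy_labels, to_labels(toy_probs, 0.75))
print(acc_at_050, acc_at_075)
```

In this toy case the stricter cut-off turns two moderately confident positives into negatives, so it may be worth trying more than one threshold on the real predictions.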


**Conclusion**

In this lab, you learned how to apply activations as:

*   independent layers
*   arguments embedded inside other network layers

Choosing suitable activations helps the network learn better!

