## MBA in Data Science
# Neural Networks and Deep Architectures

### <span style="color:darkred">Module 6 - Neural networks for sequential data</span>

#### <span style="color:darkred">**Part 4: Transformer Network**</span>

Moacir Antonelli Ponti

CeMEAI - ICMC/USP São Carlos

---

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd

from tensorflow import keras
from tensorflow.keras import layers
from numpy.random import seed
from tensorflow.random import set_seed

In [2]:
!wget http://143.107.183.175:22980/download.php?file=embeddings/glove/glove_s50.zip

--2020-10-18 00:09:58--  http://143.107.183.175:22980/download.php?file=embeddings/glove/glove_s50.zip
Connecting to 143.107.183.175:22980... connected.
HTTP request sent, awaiting response... 200 OK
Length: 181356545 (173M) [application/octet-stream]
Saving to: ‘download.php?file=embeddings%2Fglove%2Fglove_s50.zip’


2020-10-18 00:10:26 (6.37 MB/s) - ‘download.php?file=embeddings%2Fglove%2Fglove_s50.zip’ saved [181356545/181356545]



In [3]:
!mv download.php?file=embeddings%2Fglove%2Fglove_s50.zip glove_s50.zip
!unzip -q glove_s50.zip

In [4]:
# os.path.join drops earlier components when a later one is absolute,
# so the absolute path is given directly.
path_to_glove_file = "/content/glove_s50.txt"

# Map each token to its vector of GloVe coefficients
embeddings_index = {}
with open(path_to_glove_file, encoding="utf-8") as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

print("Found %s word vectors." % len(embeddings_index))

Found 929594 word vectors.
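
As a minimal sketch of what the loading loop does, the snippet below parses a tiny in-memory file laid out as a token followed by space-separated coefficients. The sample lines, the `toy_parse_embeddings` helper and the 3-dimensional vectors are illustrative assumptions, not part of the notebook's data. Lines that do not yield a vector of the expected length are skipped, which mirrors the shape check applied later when the embedding matrix is built.

```python
import io

def toy_parse_embeddings(lines, dim):
    """Parse 'token c1 c2 ... c_dim' lines into a dict; skip malformed lines."""
    index = {}
    skipped = 0
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        word, values = parts[0], parts[1:]
        try:
            vector = [float(v) for v in values]
        except ValueError:
            vector = None  # a coefficient field was not numeric
        if vector is None or len(vector) != dim:
            skipped += 1  # e.g. a header line or a token containing spaces
            continue
        index[word] = vector
    return index, skipped

# Hypothetical sample, not taken from glove_s50.txt
toy_file = io.StringIO(
    "3 3\n"
    "casa 0.1 0.2 0.3\n"
    "voto -0.5 0.0 1.5\n"
    "linha quebrada 0.1 0.2\n"
)
toy_index, toy_skipped = toy_parse_embeddings(toy_file, dim=3)
print(len(toy_index), toy_skipped)
```

Counting skipped lines explicitly makes it easy to see how much of the file was unusable before any vectors are copied into a model.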


In [5]:
print(embeddings_index['aprovação'])
print(len(embeddings_index['aprovação']))

[ 6.984870e-01  1.938170e-01  1.839920e-01 -2.590166e+00 -3.155430e-01
 -1.469410e-01  1.290320e-01  3.814410e-01 -4.846610e-01  3.721310e-01
  6.471990e-01 -1.248160e+00 -3.151210e-01  3.676890e-01 -7.965720e-01
  2.589710e-01 -1.260200e-02 -6.782460e-01 -4.735670e-01  3.739230e-01
  1.437597e+00  2.001800e-02  9.999200e-02 -1.829620e-01  2.779400e-01
  1.222500e-01 -2.345070e-01 -7.791430e-01  6.422940e-01  3.167230e-01
 -3.914640e-01  3.333300e-01  2.291640e-01 -9.465310e-01 -2.157560e-01
 -3.246800e-02 -3.029230e-01  9.146800e-02 -1.788646e+00 -2.995630e-01
 -3.183580e-01 -7.586490e-01  2.524000e-03 -6.656960e-01  7.843900e-01
  1.341660e-01  6.273990e-01  3.014050e-01 -4.354190e-01  1.121057e+00]
50


In [56]:
df = pd.read_csv("rumor-election-brazil-2018.csv", delimiter=';')
texto = df['texto']
rotulos = (df['rotulo']=='VERDADE').astype(int)

class_names = ["FALSO", "VERDADEIRO"]

print(texto[:10])
print(rotulos[:10])

0    Salário Mínimo: R$ 950,00. Bolsa Presidiário: ...
1    Empresa contratada pelo TSE para apuração dos ...
2    O Aloizio Mercadante, ministro da Educação, mo...
3    Há um complô espalhando fake news descaradas e...
4    Somente em 2017, mais de 800 milhões de tonela...
5    Nunca vi o Lula pronunciar essa palavra fascis...
6    O Mourão, por exemplo, foi ele próprio tortura...
7    O PSB, todos os seus governadores e o seu pres...
8    Bolsonaro Nunca aprovou um projeto de seguranç...
9    Ele Lula não pode aparecer mais que 25% no hor...
Name: texto, dtype: object
0    0
1    0
2    0
3    0
4    1
5    0
6    0
7    0
8    1
9    1
Name: rotulo, dtype: int64


In [57]:
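# Shuffle texts and labels with the same permutation so each text keeps its label.
# Indexing with .iloc builds new Series instead of shuffling a DataFrame column in place.
rng = np.random.RandomState(1)
perm = rng.permutation(len(texto))
texto = texto.iloc[perm].reset_index(drop=True)
rotulos = rotulos.iloc[perm].reset_index(drop=True)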

validation_split = 0.1
num_validation = int(validation_split * len(texto))
x_train = texto[:-num_validation]
x_val = texto[-num_validation:]
y_train = rotulos[:-num_validation]
y_val = rotulos[-num_validation:]
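
The shuffle-then-split logic can be checked on toy data. The sketch below is an illustration under assumed inputs (ten made-up texts whose label equals their index, and a 20% hold-out); all `toy_` names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: label i belongs to text "t<i>"
toy_texts = np.array(["t%d" % i for i in range(10)])
toy_labels = np.arange(10)

# One permutation applied to both arrays keeps every (text, label) pair together
toy_perm = np.random.RandomState(1).permutation(len(toy_texts))
toy_texts, toy_labels = toy_texts[toy_perm], toy_labels[toy_perm]

# Hold out the last 20% for validation
toy_num_val = int(0.2 * len(toy_texts))
toy_x_train, toy_x_val = toy_texts[:-toy_num_val], toy_texts[-toy_num_val:]
toy_y_train, toy_y_val = toy_labels[:-toy_num_val], toy_labels[-toy_num_val:]

print(len(toy_x_train), len(toy_x_val))
```

Note that a slice of the form `[:-n]` is empty when `n` is 0, so a very small dataset would need a guard before splitting this way.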



The vocabulary is limited to at most 20,000 words, and every sequence is truncated (or padded) to exactly 25 tokens.

In [58]:
vocab_size = 20000 
maxlen = 25

from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

vectorizer = TextVectorization(max_tokens=vocab_size, output_sequence_length=maxlen)
text_ds = tf.data.Dataset.from_tensor_slices(x_train).batch(16)
vectorizer.adapt(text_ds)

voc = vectorizer.get_vocabulary()
word_index = dict(zip(voc, range(len(voc))))
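
To make the truncation and padding concrete, here is a minimal pure-Python sketch of what a fixed-length vectorizer does. It assumes, as in the `TextVectorization` vocabulary, that index 0 is reserved for padding and index 1 for out-of-vocabulary tokens. The `toy_vocab` list and the `toy_vectorize` helper are hypothetical, and the real layer also strips punctuation by default, which this sketch omits.

```python
def toy_vectorize(text, vocab, maxlen):
    """Map whitespace-separated tokens to indices, then pad/truncate to maxlen."""
    lookup = {word: i for i, word in enumerate(vocab)}
    ids = [lookup.get(tok, 1) for tok in text.lower().split()]  # 1 = OOV
    ids = ids[:maxlen]                      # truncate long sequences
    return ids + [0] * (maxlen - len(ids))  # pad short ones with 0

# Hypothetical vocabulary: padding and OOV entries first, then known words
toy_vocab = ["", "[UNK]", "o", "voto", "é", "secreto"]

toy_short = toy_vectorize("O voto é secreto", toy_vocab, maxlen=6)
toy_long = toy_vectorize("O voto é secreto e obrigatório no Brasil", toy_vocab, maxlen=6)
print(toy_short)
print(toy_long)
```

Both results have the same length, which is what allows sentences of different sizes to be stacked into a single `(num_samples, maxlen)` array.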

In [60]:
num_tokens = len(voc) + 2
print("Number of tokens: ", num_tokens)
embedding_dim = 50
convertidas = 0  # words copied into the embedding matrix
falhas = 0  # words missing from GloVe or with a vector of the wrong length

# Prepare the embedding matrix. Rows that are never filled stay all-zeros;
# this includes the representation for "padding" and "OOV".
embedding_matrix = np.zeros((num_tokens, embedding_dim))
print(embedding_matrix.shape)
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None and embedding_vector.shape[0] == embedding_dim:
        embedding_matrix[i] = embedding_vector
        convertidas += 1
    else:
        # Word not found in the embedding index, or its vector is malformed
        falhas += 1

print("Words converted: %d / not converted: %d" % (convertidas, falhas))


Number of tokens:  1944
(1944, 50)
Words converted: 1785 / not converted: 157


In [61]:
x_train = vectorizer(np.array([[s] for s in x_train])).numpy()
x_val = vectorizer(np.array([[s] for s in x_val])).numpy()

y_train = np.array(y_train)
y_val = np.array(y_val)

print(x_train.shape)
print(x_val.shape)

(414, 25)
(46, 25)


---
## Transformer implementation

Apoorv Nandan

https://keras.io/examples/nlp/text_classification_with_transformer/

Multi-head self-attention layer

In [101]:
class MultiHeadSelfAttention(layers.Layer):
    def __init__(self, embed_dim, num_heads=8):
        super(MultiHeadSelfAttention, self).__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        if embed_dim % num_heads != 0:
            raise ValueError(
                f"embedding dimension = {embed_dim} should be divisible by number of heads = {num_heads}"
            )
        self.projection_dim = embed_dim // num_heads
        self.query_dense = layers.Dense(embed_dim)
        self.key_dense = layers.Dense(embed_dim)
        self.value_dense = layers.Dense(embed_dim)
        self.combine_heads = layers.Dense(embed_dim)

    def attention(self, query, key, value):
        score = tf.matmul(query, key, transpose_b=True)
        dim_key = tf.cast(tf.shape(key)[-1], tf.float32)
        scaled_score = score / tf.math.sqrt(dim_key)
        weights = tf.nn.softmax(scaled_score, axis=-1)
        output = tf.matmul(weights, value)
        return output, weights

    def separate_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.projection_dim))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs):
        # inputs.shape = [batch_size, seq_len, embed_dim]
        batch_size = tf.shape(inputs)[0]
        query = self.query_dense(inputs)  # (batch_size, seq_len, embed_dim)
        key = self.key_dense(inputs)  # (batch_size, seq_len, embed_dim)
        value = self.value_dense(inputs)  # (batch_size, seq_len, embed_dim)
        query = self.separate_heads(
            query, batch_size
        )  # (batch_size, num_heads, seq_len, projection_dim)
        key = self.separate_heads(
            key, batch_size
        )  # (batch_size, num_heads, seq_len, projection_dim)
        value = self.separate_heads(
            value, batch_size
        )  # (batch_size, num_heads, seq_len, projection_dim)
        attention, weights = self.attention(query, key, value)
        attention = tf.transpose(
            attention, perm=[0, 2, 1, 3]
        )  # (batch_size, seq_len, num_heads, projection_dim)
        concat_attention = tf.reshape(
            attention, (batch_size, -1, self.embed_dim)
        )  # (batch_size, seq_len, embed_dim)
        output = self.combine_heads(
            concat_attention
        )  # (batch_size, seq_len, embed_dim)
        return output
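
The `attention` method above computes scaled dot-product attention, `softmax(Q Kᵀ / sqrt(d_k)) V`, independently for each head. A minimal NumPy sketch for a single head follows; the shapes (sequence length 4, projection size 3), the random inputs and the `toy_` helpers are assumptions made for illustration.

```python
import numpy as np

def toy_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_attention(query, key, value):
    """Scaled dot-product attention for one head; inputs are (seq_len, d_k)."""
    d_k = key.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)   # (seq_len, seq_len)
    weights = toy_softmax(scores, axis=-1)  # each row sums to 1
    return weights @ value, weights

toy_rng = np.random.RandomState(0)
toy_q, toy_k, toy_v = (toy_rng.randn(4, 3) for _ in range(3))
toy_out, toy_w = toy_attention(toy_q, toy_k, toy_v)
print(toy_out.shape, toy_w.sum(axis=-1))
```

Each output row is a weighted average of the value vectors, with weights given by how strongly that position's query matches every key. In the layer above, `embed_dim` is split evenly across heads, so `embed_dim = 50` with `num_heads = 2` gives each head a `projection_dim` of 25.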

Transformer block: attention + residual connection + normalization + dropout

In [64]:
class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = MultiHeadSelfAttention(embed_dim, num_heads)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim),]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training=None):
        attn_output = self.att(inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
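
The residual-plus-normalization step can be sketched in NumPy as follows. This is a simplified illustration: the `toy_layer_norm` helper omits the learnable scale and offset that `LayerNormalization` has, dropout is left out, and the stand-in sublayer is an arbitrary function chosen for the example.

```python
import numpy as np

def toy_layer_norm(x, epsilon=1e-6):
    """Normalize each token vector (last axis) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

def toy_residual_block(x, sublayer):
    # Add the sublayer output back onto its input, then normalize
    return toy_layer_norm(x + sublayer(x))

toy_x = np.random.RandomState(0).randn(2, 5, 8)  # (batch, seq_len, embed_dim)
toy_y = toy_residual_block(toy_x, lambda t: 0.5 * np.tanh(t))
print(toy_y.shape)
```

Because the input is added back before normalizing, the sublayer only has to learn a correction to its input, and the output keeps the same shape so blocks can be stacked.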



#### Embedding layer, combining word embeddings with a vector of word positions

In [65]:
class TokenAndPositionEmbedding(layers.Layer):
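    def __init__(self, maxlen, vocab_size, embed_dim, embedding_matrix):
        super(TokenAndPositionEmbedding, self).__init__()
        # Token table: frozen pretrained vectors. Its row count must match
        # embedding_matrix, so it is taken from the matrix itself rather than
        # from maxlen (vocab_size is kept only for the call signature).
        self.token_emb = layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embed_dim,
            embeddings_initializer=keras.initializers.Constant(embedding_matrix),
            trainable=False)
        # Position table: one trainable vector per position index below maxlen
        self.pos_emb = layers.Embedding(
            input_dim=maxlen,
            output_dim=embed_dim)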
        
    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
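
What the layer's `call` does can be sketched with plain NumPy lookups. The table sizes, the random values and the token ids below are made-up assumptions for illustration, and every `toy_` name is hypothetical.

```python
import numpy as np

toy_vocab_rows, toy_maxlen, toy_dim = 6, 4, 3
toy_rng = np.random.RandomState(0)
toy_token_table = toy_rng.randn(toy_vocab_rows, toy_dim)  # one row per token index
toy_pos_table = toy_rng.randn(toy_maxlen, toy_dim)        # one row per position

toy_ids = np.array([[2, 5, 1, 0],
                    [3, 3, 0, 0]])                        # (batch, seq_len)
toy_tok = toy_token_table[toy_ids]                        # (batch, seq_len, dim)
toy_pos = toy_pos_table[np.arange(toy_ids.shape[-1])]     # (seq_len, dim)
toy_emb = toy_tok + toy_pos                               # broadcasts over the batch
print(toy_emb.shape)
```

In the second row the same token id 3 appears at positions 0 and 1 and ends up with two different vectors. This is how word order reaches the model, since self-attention by itself does not depend on the order of its inputs.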



### Assembling the Transformer network

In [96]:
num_heads = 2  # Number of attention heads
ff_dim = 32  # Hidden layer size in feed forward network inside transformer

inputs = layers.Input(shape=(maxlen,))
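# The first argument sizes the position table; num_tokens (>= maxlen) works here,
# although only the first maxlen rows are ever looked up.
embedding_layer = TokenAndPositionEmbedding(num_tokens, vocab_size, embedding_dim, embedding_matrix)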
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embedding_dim, num_heads, ff_dim)
x = transformer_block(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(16, activation="relu")(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

modelT = keras.Model(inputs=inputs, outputs=outputs)
modelT.summary()

Model: "functional_49"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_34 (InputLayer)        [(None, 25)]              0         
_________________________________________________________________
token_and_position_embedding (None, 25, 50)            194400    
_________________________________________________________________
transformer_block_18 (Transf (None, 25, 50)            13682     
_________________________________________________________________
global_average_pooling1d_17  (None, 50)                0         
_________________________________________________________________
dropout_70 (Dropout)         (None, 50)                0         
_________________________________________________________________
dense_142 (Dense)            (None, 16)                816       
_________________________________________________________________
dropout_71 (Dropout)         (None, 16)              
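
The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the sizes used in this notebook (embedding dimension 50, feed-forward size 32, 1944 embedding rows); the `dense_params` helper and the variable names are hypothetical, introduced only for the calculation.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

d_model, d_ff, n_rows = 50, 32, 1944

# Token and position tables, both built with n_rows rows in this notebook
embedding_total = 2 * n_rows * d_model

# Attention: query, key, value and output projections, each Dense(d_model -> d_model)
attention_total = 4 * dense_params(d_model, d_model)
# Feed-forward network: Dense(d_model -> d_ff) followed by Dense(d_ff -> d_model)
ffn_total = dense_params(d_model, d_ff) + dense_params(d_ff, d_model)
# Two LayerNormalization layers, each with a scale and an offset vector
layernorm_total = 2 * 2 * d_model
block_total = attention_total + ffn_total + layernorm_total

# First Dense layer of the classification head: Dense(d_model -> 16)
head_total = dense_params(d_model, 16)

print(embedding_total, block_total, head_total)
```

The three values match the `Param #` column for the embedding layer, the Transformer block and the first dense layer. The position table only needs one row per position, so sizing it with `maxlen = 25` would bring the embedding count down to 1944 * 50 + 25 * 50 = 98450.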

In [97]:
modelT.compile("adam", "binary_crossentropy", metrics=["accuracy"])
history = modelT.fit(
    x_train, y_train, batch_size=32, epochs=20, validation_data=(x_val, y_val)
)

Epoch 1/20
...
Epoch 20/20


In [100]:
string_input = keras.Input(shape=(1,), dtype="string")
x = vectorizer(string_input)
preds = modelT(x)
end_to_end_model = keras.Model(string_input, preds)

# Sentences to classify (in Portuguese, like the training data)
frases = [
    "Na pós graduação, as mulheres são maioria",
    "As queimadas esse ano são equivalentes a uma área do tamanho do Reino Unido",
    "Acabou a corrupção no Brasil",
    "Para poder ganhar eleições, presidente faz aliança com partidos grandes",
]

for frase in frases:
    # Threshold the sigmoid output at 0.5 to pick the class
    classe = (end_to_end_model.predict([[frase]])[0] > 0.5).astype(int)
    print(frase, ': ', class_names[classe[0]])

Na pós graduação, as mulheres são maioria :  VERDADEIRO
As queimadas esse ano são equivalentes a uma área do tamanho do Reino Unido :  VERDADEIRO
Acabou a corrupção no Brasil :  FALSO
Para poder ganhar eleições, presidente faz aliança com partidos grandes :  VERDADEIRO
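
The decision rule used above can be isolated in a small sketch. The probabilities below are made-up values for illustration, not outputs of `modelT`, and the `toy_` names are hypothetical.

```python
import numpy as np

toy_class_names = ["FALSO", "VERDADEIRO"]
toy_probs = np.array([[0.91], [0.08], [0.50]])  # hypothetical sigmoid outputs, shape (n, 1)

# A probability strictly above 0.5 maps to index 1, anything else to index 0
toy_classes = (toy_probs[:, 0] > 0.5).astype(int)
toy_predicted = [toy_class_names[c] for c in toy_classes]
print(toy_predicted)
```

A score of exactly 0.5 falls on the `FALSO` side because the comparison is strict. Printing the raw probability next to the label is a simple way to see how confident the model is in each prediction.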
