<a href="https://colab.research.google.com/github/OPzinn/lia1_2024_2/blob/main/Entregas%20-%20Jo%C3%A3o%20Septimio%20Zeferino/C%C3%B3pia_de_Aula_12_Construindo_um_modelo_com_TensorFlow_IMDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **End-to-End Project - Building a Model with TensorFlow - IMDB**

**Problem:** build an artificial intelligence model that classifies IMDB movie reviews as positive or negative, using a vocabulary limited to the 10,000 most frequent words.

**There is no magic. There is mathematics!** 🧙

In [14]:
# Import what we need
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
import matplotlib.pyplot as plt
import numpy as np

**Loading the Training and Test Data and Defining Parameters**


In [3]:
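# IMDB parameters
max_words = 10000  # vocabulary size: keep only the 10,000 most frequent words
max_len = 500      # every review is padded or truncated to 500 tokens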

In [4]:
# Load the IMDB dataset - it already ships with Keras!
# All Keras datasets -> https://keras.io/api/datasets/
(x_treino, y_treino), (x_teste, y_teste) = tf.keras.datasets.imdb.load_data(num_words = max_words)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step

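For context, `imdb.load_data` returns each review as a list of integers rather than text. With its default arguments, index 1 marks the start of a review, index 2 replaces out-of-vocabulary words, and every word rank from `get_word_index()` is shifted by 3, leaving 0 free for padding. The plain-Python sketch below illustrates that convention; the tiny `toy_word_index` and the helper `decode_review` are hypothetical stand-ins, not part of Keras.

```python
# Hypothetical toy vocabulary shaped like imdb.get_word_index():
# word -> frequency rank, starting at 1.
toy_word_index = {"the": 1, "movie": 2, "great": 3}

INDEX_FROM = 3  # default offset applied by imdb.load_data
SPECIAL = {0: "<pad>", 1: "<start>", 2: "<oov>"}


def decode_review(encoded, word_index):
    """Map a list of IMDB-style integers back to tokens."""
    # Shift every rank by the offset so it lines up with the encoded data.
    index_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    return [SPECIAL.get(i) or index_to_word.get(i, "<oov>") for i in encoded]


encoded = [1, 4, 5, 2, 6]
print(decode_review(encoded, toy_word_index))
# ['<start>', 'the', 'movie', '<oov>', 'great']
```

The same offset matters in the other direction: any new text encoded for this model needs its word ranks shifted by 3 to line up with the training data.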

In [5]:
x_treino = pad_sequences(x_treino, maxlen = max_len)
x_teste = pad_sequences(x_teste, maxlen = max_len)
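To make the padding step concrete, here is a minimal plain-Python sketch of what `pad_sequences` does to a single review under its default settings (zeros added at the front, long sequences cut from the front). The helper `pad_pre` is a hypothetical illustration, not the Keras implementation, and it assumes `maxlen` is a positive integer.

```python
def pad_pre(sequence, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence."""
    truncated = sequence[-maxlen:]                  # keep the LAST maxlen tokens
    padding = [value] * (maxlen - len(truncated))   # zeros go in FRONT
    return padding + truncated


print(pad_pre([5, 8, 3], maxlen=5))               # [0, 0, 5, 8, 3]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5))   # [3, 4, 5, 6, 7]
```

With `max_len = 500`, short reviews gain leading zeros and reviews longer than 500 tokens lose their beginning, so every row of `x_treino` and `x_teste` has the same length.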

**Building the Recurrent Neural Network (LSTM)**

Keras is the high-level neural network API that ships with TensorFlow as `tf.keras`.

In [10]:
modelo_joao = Sequential()
# model layers (input_length is omitted: recent Keras versions deprecate it and infer the length from the data)
modelo_joao.add(Embedding(input_dim=max_words, output_dim=128))  # Embedding: maps each word index to a 128-dimensional vector
modelo_joao.add(LSTM(64))  # LSTM to capture sequential dependencies
modelo_joao.add(Dense(1, activation='sigmoid'))  # Binary output: probability of a positive review
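As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the default Keras layer configurations (a standard four-gate LSTM with biases, a Dense layer with bias); the helper function names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


counts = {
    "embedding": embedding_params(10000, 128),
    "lstm": lstm_params(128, 64),
    "dense": dense_params(64, 1),
}
total = sum(counts.values())

print(counts)  # {'embedding': 1280000, 'lstm': 49408, 'dense': 65}
print(total)   # 1329473
```

Almost all of the parameters sit in the embedding table, which is one reason the vocabulary is capped at 10,000 words. Once the model has been built, `modelo_joao.summary()` should report comparable figures.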



In [11]:
# Compile the model
modelo_joao.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# adam - the optimizer: an adaptive variant of gradient descent that updates the weights using gradients computed by backpropagation
# loss - the error function. Training boils down to minimizing this mathematical function!
# metrics - how success is measured!
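To show what the `binary_crossentropy` loss measures, here is a minimal plain-Python sketch of the formula. The labels and probabilities are made-up illustrative values, and the clipping constant `eps` is an assumption chosen only to avoid `log(0)`.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical predictions for one positive (1) and one negative (0) review.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(round(confident_right, 4))  # 0.1054
print(round(confident_wrong, 4))  # 2.3026
```

Confident correct predictions give a loss close to zero, while confident wrong ones are penalized heavily; the optimizer adjusts the weights to push this number down.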

**Training**

In [13]:
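%%time
# Run the training (the %%time cell magic has to be the first line of the cell)
# Note: the test set doubles as validation data here
history = modelo_joao.fit(x_treino, y_treino, epochs=5, batch_size=64, validation_data=(x_teste, y_teste))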

Epoch 1/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 16s 29ms/step - accuracy: 0.7179 - loss: 0.5371 - val_accuracy: 0.8192 - val_loss: 0.4094
Epoch 2/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 13s 34ms/step - accuracy: 0.8239 - loss: 0.3914 - val_accuracy: 0.7887 - val_loss: 0.4672
Epoch 3/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 14s 35ms/step - accuracy: 0.8791 - loss: 0.2981 - val_accuracy: 0.8734 - val_loss: 0.3134
Epoch 4/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 20s 33ms/step - accuracy: 0.9359 - loss: 0.1751 - val_accuracy: 0.8772 - val_loss: 0.3196
Epoch 5/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 18s 27ms/step - accuracy: 0.9493 - loss: 0.1399 - val_accuracy: 0.8645 - val_loss: 0.3587
CPU times: user 1min, sys: 2.11 s, total: 1min 2s
Wall time: 1min 30s


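Training completed successfully! 💪 Check whether accuracy increases at the end of each epoch. In the log above, training accuracy rises every epoch (0.7179 to 0.9493), but validation accuracy peaks at 0.8772 in epoch 4 and falls to 0.8645 in epoch 5 while `val_loss` climbs, a sign that the model is starting to overfit.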
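One way to read the training log is to look for the epoch where the validation metrics were best. The sketch below copies the numbers from the output above into plain lists (in a live session they would come from `history.history`) and finds the best epoch by each criterion; the variable names are illustrative.

```python
# Values copied from the training log above.
train_accuracy = [0.7179, 0.8239, 0.8791, 0.9359, 0.9493]
val_accuracy = [0.8192, 0.7887, 0.8734, 0.8772, 0.8645]
val_loss = [0.4094, 0.4672, 0.3134, 0.3196, 0.3587]

# Epochs are numbered from 1, list positions from 0.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__) + 1

# Gap between training and validation accuracy, per epoch.
gap = [round(t - v, 4) for t, v in zip(train_accuracy, val_accuracy)]

print(best_loss_epoch)  # 3
print(best_acc_epoch)   # 4
print(gap[-1])          # 0.0848
```

The widening gap in the last two epochs suggests that training longer would mostly memorize the training set. Keras provides a `tf.keras.callbacks.EarlyStopping` callback that can stop training when a monitored metric such as `val_loss` stops improving, which may be worth trying here.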

**Prediction - Testing the Model (Deploy)**

Test the model on a single review to see whether it is classified as positive or negative.


In [39]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb

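# Classify a review as positive or negative
def analise_filme(modelo_joao, review, word_index, max_words=10000, max_len=500):
    # imdb.load_data shifts every word rank by 3 by default (0 = padding,
    # 1 = start of review, 2 = unknown word), so apply the same offset here
    tokenizer = Tokenizer(num_words=max_words)
    tokenizer.word_index = {word: rank + 3 for word, rank in word_index.items()}

    # convert the text to integers (words outside the vocabulary are dropped)
    sequences = tokenizer.texts_to_sequences([review])

    # bail out if no word in the review is in the vocabulary
    if len(sequences[0]) == 0:
        return "Review contains no known words."

    # prepend the start marker used in the training data, then pad to max_len
    sequences = [[1] + sequences[0]]
    padded_sequence = pad_sequences(sequences, maxlen=max_len)

    # predict: the sigmoid output is the probability of a positive review
    prediction = modelo_joao.predict(padded_sequence)[0][0]

    # return the label
    return 'Positiva' if prediction >= 0.5 else 'Negativa'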

# load the IMDB word index (word -> frequency rank)
word_index = imdb.get_word_index()

# example review of Click (2006) to test with
review = """It's not the typical Adam Sandler movie and thank goodness for that. This movie has some actual drama,
some real heart to it. It's not all lowbrow toilet humor. Has Adam Sandler grown up? Even in this more grownup venture
he apparently just couldn't help himself, tossing in the obligatory disgusting fart joke. But we'll give him a pass on
that one because pretty much everything else in the movie shows a refreshing maturity. Well, OK maybe not the humping
dogs. But what do you want? Sandler's never going to go full-blown serious dramatist and who'd want him to? This movie
maintains the humor Sandler is known for but also gives you a story you actually care about and moments of great emotion
and poignancy. Along the way Sandler gets to show that he does have some actual serious acting chops. One scene with him
and his father, played by Henry Winkler, particularly stands out. Here Sandler's character has so much emotion coursing
through him. And Sandler performs the scene so well you feel the emotion right along with him. Very well done, and more
than a little surprising from an actor who is not known for this sort of thing."""

# Make the prediction
resultado = analise_filme(modelo_joao, review, word_index)
print(f"The review sentiment is: {resultado}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step
The review sentiment is: Positiva


The end! 🔥