<a href="https://colab.research.google.com/github/irshandyaditya/machine_learning/blob/main/P10/Praktikum_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Setup**

In [4]:
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf

tfds.disable_progress_bar()

In [5]:
import matplotlib.pyplot as plt

def plot_graphs(history, metric):
  plt.plot(history.history[metric])
  plt.plot(history.history['val_'+metric], '')
  plt.xlabel("Epochs")
  plt.ylabel(metric)
  plt.legend([metric, 'val_'+metric])

### **Setup input pipeline**

In [6]:
dataset, info = tfds.load('imdb_reviews', with_info=True,
                          as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

train_dataset.element_spec

Downloading and preparing dataset 80.23 MiB (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...
Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.


(TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

Mengembalikan dataset (teks, pasangan label):

In [7]:
for example, label in train_dataset.take(1):
  print('text: ', example.numpy())
  print('label: ', label.numpy())

text:  b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it."
label:  0


Acak data untuk pelatihan dan membuat kumpulan pasangan (teks, label)

In [8]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

for example, label in train_dataset.take(1):
  print('texts: ', example.numpy()[:3])
  print()
  print('labels: ', label.numpy()[:3])


texts:  [b"I found myself very caught up in this movie, at least at the beginning, and any credit I give to this movie, is Lacey Chabert, she was fantastic!! But thats where it ends. I seem to be very good at figuring out who the killer is, and I like it when a movie is able to completely baffel me, but I felt out and out lied to, they whole time they lead you in one direction and then suddenly they decided to go in a completely different direction at the end, they gave no hit to it at all, thats not misleading that very bad writing and planning, someone did not think at all!<br /><br />I felt the movie would have been much better if they had stuck to the plot that the lead you on, they also seemed to not answer anything, why did Jane(maria) burn down the professor's house.<br /><br />Its a great pity as I felt it started out as a relatively good movie."
 b"This was my first introduction to the world of Bollywood and I'm now hooked! Okay so it requires adoption of a different mindset t

### **Buat Teks Encoder**

In [9]:
VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))

In [10]:
vocab = np.array(encoder.get_vocabulary())
vocab[:20]

array(['', '[UNK]', 'the', 'and', 'a', 'of', 'to', 'is', 'in', 'it', 'i',
       'this', 'that', 'br', 'was', 'as', 'for', 'with', 'movie', 'but'],
      dtype='<U14')

In [11]:
encoded_example = encoder(example)[:3].numpy()
encoded_example

array([[ 10, 249, 532, ...,   0,   0,   0],
       [ 11,  14,  56, ...,   0,   0,   0],
       [  1,   2, 805, ...,   0,   0,   0]])

In [12]:
for n in range(3):
  print("Original: ", example[n].numpy())
  print("Round-trip: ", " ".join(vocab[encoded_example[n]]))
  print()

Original:  b"I found myself very caught up in this movie, at least at the beginning, and any credit I give to this movie, is Lacey Chabert, she was fantastic!! But thats where it ends. I seem to be very good at figuring out who the killer is, and I like it when a movie is able to completely baffel me, but I felt out and out lied to, they whole time they lead you in one direction and then suddenly they decided to go in a completely different direction at the end, they gave no hit to it at all, thats not misleading that very bad writing and planning, someone did not think at all!<br /><br />I felt the movie would have been much better if they had stuck to the plot that the lead you on, they also seemed to not answer anything, why did Jane(maria) burn down the professor's house.<br /><br />Its a great pity as I felt it started out as a relatively good movie."
Round-trip:  i found myself very [UNK] up in this movie at least at the beginning and any [UNK] i give to this movie is [UNK] [UNK]

### **Buat Model**

In [47]:
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

In [48]:
print([layer.supports_masking for layer in model.layers])

[False, True, True, True, True]


In [49]:
# predict on a sample text without padding.

sample_text = ['The movie was cool. The animation and the graphics ',
               'were out of this world. I would recommend this movie.']

# Lakukan prediksi
text_tensor = tf.convert_to_tensor(sample_text)
predictions = model.predict(text_tensor)
print(predictions[0])

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[2.6549213e-05]


In [50]:
# predict on a sample text with padding

padding = "the " * 2000
text_padding = padding + " " .join(sample_text)
text_padding_convert = tf.convert_to_tensor([text_padding])
predictions = model.predict(text_padding_convert)
print(predictions[0])

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 645ms/step
[0.01217136]


In [51]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

# **Train Model**

In [None]:
history = model.fit(train_dataset, epochs=10,
 validation_data=test_dataset,
 validation_steps=30)

Epoch 1/10
[1m169/391[0m [32m━━━━━━━━[0m[37m━━━━━━━━━━━━[0m [1m6:31[0m 2s/step - accuracy: 0.4873 - loss: 0.6928