
## Practicum 2 - RNN

In [1]:
import tensorflow as tf
import numpy as np
import os
import time

## Preprocessing

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [3]:
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [4]:
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [5]:
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


## Text Processing

In [6]:
example_texts = ['abcdefg', 'xyz']
chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [7]:
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=list(vocab), mask_token=None)

In [8]:
ids = ids_from_chars(chars)
ids

<tf.RaggedTensor [[40, 41, 42, 43, 44, 45, 46], [63, 64, 65]]>

In [9]:
chars_from_ids = tf.keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

In [10]:
chars = chars_from_ids(ids)
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [11]:
tf.strings.reduce_join(chars, axis=-1).numpy()

array([b'abcdefg', b'xyz'], dtype=object)

In [12]:
def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

In [13]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(1115394,), dtype=int64, numpy=array([19, 48, 57, ..., 46,  9,  1])>

In [14]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

In [15]:
for ids in ids_dataset.take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

F
i
r
s
t
 
C
i
t
i


In [16]:
seq_length = 100

In [17]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)


In [18]:
for seq in sequences.take(5):
    print(text_from_ids(seq).numpy())

b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
b'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
b"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
b"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
b'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


In [19]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

In [20]:
split_input_target(list("Tensorflow"))

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

In [21]:
dataset = sequences.map(split_input_target)

In [22]:
for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target: b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


In [23]:
BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>
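
As a rough sanity check on the dataset above, the number of sequences and the number of batches per epoch can be worked out from the text length alone. The sketch below is plain Python; the names `n_sequences` and `batches_per_epoch` are illustrative and do not appear in the notebook.

```python
# Values taken from the cells above.
text_length = 1115394   # characters in the corpus
seq_length = 100
batch_size = 64

# Each sequence holds seq_length + 1 ids; drop_remainder=True discards the tail.
n_sequences = text_length // (seq_length + 1)

# Batching also uses drop_remainder=True, so a partial last batch is dropped.
batches_per_epoch = n_sequences // batch_size

print(n_sequences, batches_per_epoch)  # 11043 172
```

This matches the training log later in the notebook, where the batch counter printed every 50 batches never goes past 150 within an epoch.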

In [24]:
vocab_size = len(ids_from_chars.get_vocabulary())

embedding_dim = 256

rnn_units = 1024

In [25]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()  # do not pass self again; it is bound implicitly
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [26]:
model = MyModel(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [27]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 66) # (batch_size, sequence_length, vocab_size)


In [28]:
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  16896     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  67650     
                                                                 
Total params: 4022850 (15.35 MB)
Trainable params: 4022850 (15.35 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
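
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras GRU default `reset_after=True`, under which each of the three gates has an input kernel, a recurrent kernel, and two bias vectors; the variable names are illustrative.

```python
# Hyperparameters from the cells above (66 = 65 characters + [UNK]).
vocab_size = 66
embedding_dim = 256
rnn_units = 1024

# Embedding: one vector of size embedding_dim per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GRU: 3 gates, each with input kernel, recurrent kernel, and two biases
# (assumes reset_after=True, the Keras default).
gru_params = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + 2 * rnn_units)

# Dense: weight matrix plus bias.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
# 16896 3938304 67650 4022850
```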


In [29]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [30]:
print("Input:\n", text_from_ids(input_example_batch[0]).numpy())
print()
print("Next Char Predictions:\n", text_from_ids(sampled_indices).numpy())

Input:
 b's yet I do not: but, as I can learn,\nHe hearkens after prophecies and dreams;\nAnd from the cross-row'

Next Char Predictions:
 b'QvX[UNK]utoVV3k-wV,\n$utTQ.BZ:aGZcoTJLil[UNK]qqG-putxEUgCcDNWlPOlQCqUw:PetU,ilCjn-qAFI?ERm?CsL$rMFEGVVs R\nYhy'


In [31]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

In [32]:
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", example_batch_mean_loss)

Prediction shape:  (64, 100, 66)  # (batch_size, sequence_length, vocab_size)
Mean loss:         tf.Tensor(4.1905766, shape=(), dtype=float32)


In [33]:
tf.exp(example_batch_mean_loss).numpy()

66.06087
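
The exponential of the mean loss is close to the vocabulary size because an untrained model predicts roughly uniformly over the 66 ids, and the cross-entropy of a uniform prediction is the natural log of the number of classes. A minimal check in plain Python, using the loss value printed above:

```python
import math

vocab_size = 66
observed_loss = 4.1905766          # mean loss of the untrained model, from above

# Cross-entropy of a uniform distribution over vocab_size classes.
uniform_loss = math.log(vocab_size)

# exp(mean loss) is the perplexity; for a uniform model it equals vocab_size.
perplexity = math.exp(observed_loss)

print(round(uniform_loss, 4), round(perplexity, 2))  # 4.1897 66.06
```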

In [34]:
model.compile(optimizer='adam', loss=loss)

In [35]:
checkpoint_dir = './training_checkpoints'

checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [36]:
EPOCHS = 20

In [37]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [38]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()
    predicted_logits, states = self.model(inputs=input_ids, states=states,
                                          return_state=True)

    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    predicted_logits = predicted_logits + self.prediction_mask

    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    predicted_chars = self.chars_from_ids(predicted_ids)
    return predicted_chars, states

In [39]:
one_step_model = OneStep(model, chars_from_ids, ids_from_chars)

In [40]:
start = time.time()
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

ROMEO:
Come to me: you know the manacles of our bitter
continuent sup ill-body, and Some puts and
To call me some scather, brainly stay.

Nurse:
Ay, brain! why? Now, afore God for best
To serve thy conscience to me from fasting yet in one to age
thy counsel's knees. Behold the royal dry too late
I give unto be moght, and, whilst I play't
And take the souls rid on san so nink.

YORK:
They should knew by his remedy?

BISHOP OF CARLISLE:
My lord.

CORIOLANUS:
Let them have watch'd, with no less cancelliness
That kiss the sweets? adden his lessons
And manner to do so remove
As the shepherds do with a lover.

TRANIO:
So this is Ludy steel.

LEONTES:
Hark!

First Senator:
The conscience sir,
Your tributary deeds must wish one for tyick,
And then draw interrupts, these weeps,
Which makes me down, all in your hateful prince.

WARWICK:
And till the duke my husband--whither York,
No better woadd! he wakes us well.
I will be hurt'd with nothing.
More legst you speak upon't!
What, do you jealous u

In [41]:
start = time.time()
states = None
next_char = tf.constant(['ROMEO:', 'ROMEO:', 'ROMEO:', 'ROMEO:', 'ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result, '\n\n' + '_'*80)
print('\nRun time:', end - start)

tf.Tensor(
[b"ROMEO:\nGrimmNord's in a man'st friend.\nLet them plant but when I seem cowards her love?\nWhere's Clifford duty undischarged at unpregame?\n\nTailor:\nWhy I should bid him or no happy dry he of such\nBirst against thy brother veil'd than steal\nThan party from the light tomb company.\n\nABPHOSER:\nLet me enjoy' with mildness arms.\n\nBRUTUS:\nLet them but tell me, I'll be your king now. Where brother,\nA wife to look here, and this is England's king\nHath now his son for and name of get:\nUnto my vice is nothing; for the other flours\nAnd brought it, even here I came from year,\nSo shalt he be made for moved to begg\na beggars.\n\nPROSPERO:\nTwife is banish'd; and he shall be cross'd.\n\nWARWICK:\nSoft father Edward Bianca, stand alood,\nYou shall harm in her present, course of justice;\nWhere, ere we now? what with a fault? Who's there?\n\nKING RICHARD II:\nRight.\n\nARCHARDINEN:\nA dig!\nTurn gild, my wife, and leave his native years,\nUnto his revolit, we should have 

In [42]:
tf.saved_model.save(one_step_model, 'one_step')
one_step_reloaded = tf.saved_model.load('one_step')



In [43]:
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(100):
  next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)
  result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode("utf-8"))

ROMEO:
She may, my lord, think you:
if you shall come to the people, not with law;
Remom his ghost in pala


## Practicum Assignment

In [44]:
class CustomTraining(MyModel):
  @tf.function
  def train_step(self, inputs):
    inputs, labels = inputs
    with tf.GradientTape() as tape:
      predictions = self(inputs, training=True)
      loss = self.loss(labels, predictions)
    # Use self, not the global `model`, so the step updates this instance's weights.
    grads = tape.gradient(loss, self.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, self.trainable_variables))

    return {'loss': loss}

In [45]:
model = CustomTraining(
  vocab_size=len(ids_from_chars.get_vocabulary()),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units)

In [46]:
model.compile(optimizer = tf.keras.optimizers.Adam(),
       loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

In [47]:
model.fit(dataset, epochs=1)



<keras.src.callbacks.History at 0x7e256cc7a2f0>

In [48]:
EPOCHS = 10

mean = tf.metrics.Mean()

for epoch in range(EPOCHS):
  start = time.time()

  mean.reset_state()
  for (batch_n, (inp, target)) in enumerate(dataset):
    logs = model.train_step([inp, target])
    mean.update_state(logs['loss'])

    if batch_n % 50 == 0:
      template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"
      print(template)

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print()
  print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
  print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
  print("_"*80)

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 2.1720
Epoch 1 Batch 50 Loss 2.0585
Epoch 1 Batch 100 Loss 1.9523
Epoch 1 Batch 150 Loss 1.8397

Epoch 1 Loss: 1.9776
Time taken for 1 epoch 12.88 sec
________________________________________________________________________________
Epoch 2 Batch 0 Loss 1.7987
Epoch 2 Batch 50 Loss 1.7510
Epoch 2 Batch 100 Loss 1.6993
Epoch 2 Batch 150 Loss 1.6618

Epoch 2 Loss: 1.6999
Time taken for 1 epoch 11.79 sec
________________________________________________________________________________
Epoch 3 Batch 0 Loss 1.5536
Epoch 3 Batch 50 Loss 1.5854
Epoch 3 Batch 100 Loss 1.5495
Epoch 3 Batch 150 Loss 1.5125

Epoch 3 Loss: 1.5420
Time taken for 1 epoch 11.94 sec
________________________________________________________________________________
Epoch 4 Batch 0 Loss 1.4976
Epoch 4 Batch 50 Loss 1.4258
Epoch 4 Batch 100 Loss 1.4511
Epoch 4 Batch 150 Loss 1.3881

Epoch 4 Loss: 1.4463
Time taken for 1 epoch 11.61 sec
_____________________________________________________________________

### Question
Based on the code above, the two classes ``OneStep`` and ``CustomTraining`` differ as follows:

1. Purpose of the class
  * ``OneStep``: performs a single step of text generation. It takes the preceding input text and produces the next character (token) of the text being generated. It is used to generate text from the trained language model.
  * ``CustomTraining``: trains the language model. It implements the training step, which covers computing the loss, computing the gradients, and updating the model parameters.
2. Main method
  * ``OneStep``: the main method is ``generate_one_step``, which takes the preceding input text and returns the next character (token) predicted by the language model, along with the updated RNN state.
  * ``CustomTraining``: the main method is ``train_step``. It is called during training and covers computing the loss, computing the gradients, and updating the model parameters.
3. Training procedure
  * ``OneStep``: this class takes no part in training. It is only used to generate text from a model that has already been trained.
  * ``CustomTraining``: this class is used during training, where it computes the loss and the gradients and updates the parameters.
4. Output
  * ``OneStep``: returns the next character (token) of the text being generated. The output is the text the model produces from the preceding input text.
  * ``CustomTraining``: returns the loss computed during training, which measures how closely the model's predictions match the expected labels.
5. Execution time
  * Execution time differs between the two classes because it depends on the amount of computation involved. ``OneStep`` only runs a forward pass for one generation step, which is fast, whereas ``CustomTraining`` also computes gradients and updates the parameters at every step, so it will likely take longer.
6. Prediction accuracy
  * ``OneStep``: prediction accuracy is hard to evaluate directly in this class because it is not involved in training and has no labels to compare against. Its focus is on generating coherent text, not on scoring the output.
  * ``CustomTraining``: during training the model produces predictions, and their quality is judged by how closely they match the expected labels. The class reports the training loss as that measure: the lower the loss, the better the model's predictions match the labels.
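
To make the generation step concrete, the core of ``generate_one_step`` (temperature scaling, masking ``[UNK]``, then sampling) can be sketched without TensorFlow. This is a simplified illustration in plain Python, not the notebook's implementation; the function name `sample_next_id` and the example logits are hypothetical.

```python
import math
import random

def sample_next_id(logits, temperature=1.0, masked_ids=(), rng=random):
    """Sample one index from logits after temperature scaling and masking."""
    scaled = [x / temperature for x in logits]
    for i in masked_ids:
        scaled[i] = -math.inf           # masked entries get zero probability
    peak = max(scaled)                  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)
logits = [5.0, 1.0, 0.5, 0.2]           # index 0 plays the role of [UNK]
samples = [sample_next_id(logits, temperature=1.0, masked_ids=[0], rng=rng)
           for _ in range(1000)]
print(sorted(set(samples)))             # index 0 never appears
```

Lower temperatures concentrate the probability on the highest logit, so the output becomes more predictable; higher temperatures flatten the distribution and make the text more varied.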


