
# **Faricha Aulia ( 2141720155 )**
Machine Learning Course: Module 10 Lab Work Assignment

---

### **Set Up**

Import library

In [None]:
import tensorflow as tf
import numpy as np
import os
import time

Download the Shakespeare Dataset

In [None]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt','https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Load Data

In [None]:
# Read the file as bytes, then decode it to a str.
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

In [None]:
# Take a look at the first 250 characters in text
print(text[:250])

In [None]:
# The unique characters in the file
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

### **Text Processing**

Vectorize Text

In [None]:
example_texts = ['abcdefg','xyz']
chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [None]:
ids_from_chars = tf.keras.layers.StringLookup(vocabulary=list(vocab),mask_token=None)

In [None]:
ids = ids_from_chars(chars)
ids

<tf.RaggedTensor [[40, 41, 42, 43, 44, 45, 46], [63, 64, 65]]>

In [None]:
chars_from_ids = tf.keras.layers.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

In [None]:
chars = chars_from_ids(ids)
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [None]:
tf.strings.reduce_join(chars,axis=-1).numpy()

array([b'abcdefg', b'xyz'], dtype=object)
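
To make the round trip above concrete, here is a minimal pure-Python sketch of what the two StringLookup layers do. It assumes a toy vocabulary built from the string "hello world" (not the Shakespeare vocabulary), and the helper names encode and decode are hypothetical. The one behaviour taken from the notebook is that index 0 is reserved for the [UNK] token.

```python
# Toy vocabulary (assumption: built from a short string, not the Shakespeare text).
toy_vocab = sorted(set("hello world"))

# Index 0 is reserved for unknown characters, mirroring StringLookup's [UNK] token.
id_to_char = ["[UNK]"] + toy_vocab
char_to_id = {ch: i for i, ch in enumerate(id_to_char)}


def encode(text):
    """Map each character to its integer ID; unknown characters map to 0."""
    return [char_to_id.get(ch, 0) for ch in text]


def decode(ids):
    """Map integer IDs back to characters and join them into one string."""
    return "".join(id_to_char[i] for i in ids)


hello_ids = encode("hello")
print(hello_ids)
print(decode(hello_ids))
print(encode("z"))  # 'z' is not in the toy vocabulary
```

Encoding followed by decoding returns the original string, which is the same property the ids_from_chars and chars_from_ids cells demonstrate.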

Creating the Training Set and Target

In [None]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(1115394,), dtype=int64, numpy=array([19, 48, 57, ..., 46,  9,  1])>

In [None]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

In [None]:
for ids in ids_dataset.take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

F
i
r
s
t
 
C
i
t
i


In [None]:
seq_length = 100

In [None]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)


In [None]:
# Convert a tensor of IDs back into a single byte string,
# using the inverse lookup chars_from_ids defined earlier.
def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1).numpy()

In [None]:
for seq in sequences.take(5):
    print(text_from_ids(seq))

b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
b'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
b"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
b"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
b'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


In [None]:
def split_input_target(sequence):
  input_text = sequence[:-1]
  target_text = sequence[1:]
  return input_text, target_text

In [None]:
split_input_target(list("Tensorflow"))

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

In [None]:
dataset = sequences.map(split_input_target)

In [None]:
for input_example, target_example in dataset.take(1):
  print("Input :", text_from_ids(input_example))
  print("Target:", text_from_ids(target_example))

Input : b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target: b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


Creating Training Batches

In [None]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>
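
As a rough check on the pipeline above, the number of sequences and batches can be worked out by hand. This is a small arithmetic sketch, assuming the text length of 1115394 IDs reported for all_ids and the seq_length and BATCH_SIZE values used in the cells above.

```python
text_length = 1115394  # number of IDs in all_ids, as reported above
seq_length = 100
batch_size = 64

# Each sequence holds seq_length + 1 IDs so it can be split into input and target.
# drop_remainder=True discards the leftover IDs that do not fill a sequence.
num_sequences = text_length // (seq_length + 1)

# drop_remainder=True on the second batch() discards the final partial batch.
batches_per_epoch = num_sequences // batch_size

print(num_sequences, batches_per_epoch)
```

This gives 11043 sequences and 172 batches per epoch, which is consistent with the training log further down, where the last progress line printed for an epoch is batch 150. Since BUFFER_SIZE (10000) is smaller than the number of sequences, the shuffle is approximate rather than a full permutation.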

### **Create a Model**

In [None]:
# Length of the vocabulary in StringLookup Layer
vocab_size = len(ids_from_chars.get_vocabulary())

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [None]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [None]:
model = MyModel(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

### **Test The Model**

In [None]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 66) # (batch_size, sequence_length, vocab_size)


In [None]:
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  16896     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  67650     
                                                                 
Total params: 4022850 (15.35 MB)
Trainable params: 4022850 (15.35 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
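
The parameter counts in the summary can be reproduced by hand. The sketch below assumes vocab_size is 66 (the last dimension of the prediction shape above) and that the GRU uses the Keras default reset_after=True, which keeps two bias vectors per weight set.

```python
vocab_size = 66
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GRU: three weight sets (update gate, reset gate, candidate state), each with
# an input kernel, a recurrent kernel, and two bias vectors
# (assumption: reset_after=True, the Keras default).
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: one weight per (rnn unit, vocabulary entry) pair plus one bias per entry.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```

The three values and their sum agree with the Param # column and the total of 4022850 reported by model.summary().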


In [None]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [None]:
sampled_indices

array([24, 24, 48, 23, 60,  0, 11, 19, 19,  2, 43,  4, 12, 16,  7, 14, 10,
       47, 24, 31, 52,  4, 47, 18, 52, 43, 13, 45, 18, 38, 47, 34, 51,  6,
       56, 40, 57, 32, 65, 32, 49, 63, 23, 30, 50, 12, 62, 50, 39, 17, 12,
       28, 65, 41,  8,  7, 22, 46, 65, 32, 40, 54, 37, 34, 28, 38,  2, 62,
       25, 37, 40, 12, 30, 36, 58, 52, 47, 56, 32, 16, 47, 57, 21, 33,  7,
        3, 23, 10, 52, 22, 58,  0, 42, 45, 29, 21, 15, 43, 18, 29])

In [None]:
print("Input:\n", text_from_ids(input_example_batch[0]))
print()
print("Next Char Predictions:\n", text_from_ids(sampled_indices))

Input:
 b"s wife by the arm,\nThat little thinks she has been sluiced in's absence\nAnd his pond fish'd by his n"

Next Char Predictions:
 b"KKiJu[UNK]:FF d$;C,A3hKRm$hEmd?fEYhUl'qarSzSjxJQk;wkZD;Ozb-,IgzSaoXUOY wLXa;QWsmhqSChrHT,!J3mIs[UNK]cfPHBdEP"
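
The sampling step above draws each next character from the distribution defined by the logits instead of always taking the most likely one. The following pure-Python sketch illustrates that idea on a hypothetical 4-character vocabulary. The helper names, the toy logits, and the fixed seed are assumptions for illustration, and it is not how tf.random.categorical is implemented internally.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract the max for stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, rng):
    """Draw one index with probability given by softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable
toy_logits = [2.0, 0.5, 0.1, -1.0]  # hypothetical logits for 4 characters

samples = [sample_index(toy_logits, rng) for _ in range(1000)]
counts = [samples.count(i) for i in range(len(toy_logits))]
print(softmax(toy_logits))
print(counts)
```

Index 0 is drawn most often, but the other indices still appear. For the untrained model the logits are close to uniform, which is why the sampled text above looks random.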


## **Labwork Assignment**

The training procedure in Practicum 2 is simple but gives you little control. It uses teacher forcing: at every step the model receives the true previous character as input, so its own bad predictions are never fed back and it never learns to recover from mistakes. Now that you have seen how to run the model manually, you will implement a custom training loop. This gives you a starting point if, for example, you want to implement curriculum learning to help stabilize the model's open-loop output. The most important part of a custom training loop is the training step function.

Use tf.GradientTape to record the forward pass so that gradients of the loss can be computed from it. You can learn more about this approach in the eager execution guide.
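
To show what a single training step computes, here is a minimal pure-Python sketch on one logit vector: forward pass, loss, gradient, and parameter update. It is an illustration under simplifying assumptions, not the notebook's method: the gradient is derived by hand instead of being recorded by tf.GradientTape, the logits themselves stand in for the trainable variables, and plain gradient descent replaces Adam. The names loss_and_grad and learning_rate are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(logits, label):
    """Sparse categorical cross-entropy from logits, plus its gradient.

    For this loss, the gradient with respect to the logits is
    softmax(logits) - one_hot(label).
    """
    probs = softmax(logits)
    loss = -math.log(probs[label])
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return loss, grad


logits = [0.0, 0.0, 0.0]  # hypothetical trainable parameters
label = 2                 # index of the correct next character
learning_rate = 0.5       # hypothetical value

for step in range(3):
    loss, grad = loss_and_grad(logits, label)
    # Plain gradient descent update (the notebook uses Adam instead).
    logits = [w - learning_rate * g for w, g in zip(logits, grad)]
    print(f"step {step} loss {loss:.4f}")
```

The loss falls at every step because each update moves the logits against the gradient. The train_step method follows the same sequence, with the tape supplying the gradients for all model variables.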

In [None]:
class CustomTraining(MyModel):
  @tf.function
  def train_step(self, inputs):
      inputs, labels = inputs
      with tf.GradientTape() as tape:
          predictions = self(inputs, training=True)
          loss = self.loss(labels, predictions)
      grads = tape.gradient(loss, self.trainable_variables)
      self.optimizer.apply_gradients(zip(grads, self.trainable_variables))

      return {'loss': loss}

In [None]:
model = CustomTraining(vocab_size=len(ids_from_chars.get_vocabulary()), embedding_dim=embedding_dim, rnn_units=rnn_units)

In [None]:
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

In [None]:
model.fit(dataset, epochs=1)



<keras.src.callbacks.History at 0x7910445ba6b0>

In [None]:
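EPOCHS = 10

# checkpoint_prefix is used below but is not defined elsewhere in this notebook.
# Assumption: checkpoints are written to ./training_checkpoints, one prefix per epoch.
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')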

mean = tf.metrics.Mean()

for epoch in range(EPOCHS):
    start = time.time()

    mean.reset_states()
    for (batch_n, (inp, target)) in enumerate(dataset):
        logs = model.train_step([inp, target])
        mean.update_state(logs['loss'])

        if batch_n % 50 == 0:
            template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"
            print(template)

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print()
    print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
    print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
    print("_"*80)

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 2.1702
Epoch 1 Batch 50 Loss 2.0578
Epoch 1 Batch 100 Loss 1.9357
Epoch 1 Batch 150 Loss 1.8700

Epoch 1 Loss: 2.0004
Time taken for 1 epoch 1101.92 sec
________________________________________________________________________________
Epoch 2 Batch 0 Loss 1.8065
Epoch 2 Batch 50 Loss 1.7755
Epoch 2 Batch 100 Loss 1.6923
Epoch 2 Batch 150 Loss 1.6657

Epoch 2 Loss: 1.7217
Time taken for 1 epoch 1041.91 sec
________________________________________________________________________________
Epoch 3 Batch 0 Loss 1.6007
Epoch 3 Batch 50 Loss 1.6007


**Run the code above. How does it differ from Practicum 2?**

The difference between Practicum 2 and this assignment lies in the training procedure.

1. Practicum 2 (Classic Training Loop):
  * Trains the model with model.fit, the standard high-level training approach in TensorFlow.
  * The loss, 'tf.losses.SparseCategoricalCrossentropy', is defined separately and is not called explicitly in the training code; model.fit applies it internally. Practicum 2 also contains instructions to print the loss, but they may not suit that context.
  * Training only requires passing the dataset and the number of epochs to model.fit.

2. Assignment (Custom Training Loop):
  * Uses a more customized approach: a custom class, CustomTraining, overrides the train_step method.
  * Inside train_step, the loss is computed with self.loss, the loss function passed to model.compile. Gradients are computed with tf.GradientTape and applied through the optimizer explicitly.
  * Training uses an explicit loop over the batches of the dataset. Each batch is processed by calling train_step on the custom model, and the mean loss is computed for each epoch.

Conclusion: The assignment code provides more flexibility and control over the training procedure, because steps such as loss calculation and gradient computation are written out directly. The Practicum 2 code is easier to use but does not offer the same level of control.