
# **Text Assistant Generative Model (TAGModel)**
---
## **Introduction**
Text generation is a field of natural language processing (NLP) concerned with producing coherent, contextually relevant text from a given input. In this notebook, we build a word-level text generation model on the Transformer architecture, which models sequences with self-attention instead of recurrence, and train it to predict the next word of each line in a conversation dataset.



## **Importing Libraries**
We start by importing the necessary libraries and modules required for building and training our Transformer model. These include TensorFlow, NumPy, Matplotlib, and other relevant components.


In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout, LayerNormalization, MultiHeadAttention, Embedding
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.callbacks import ModelCheckpoint
import pickle
import os
import matplotlib.pyplot as plt

## **Scaled Dot-Product Attention Mechanism**
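The scaled dot-product attention mechanism is a key component of the Transformer architecture. It multiplies the queries by the transposed keys and divides the result by √d_k (the key dimension) so the softmax does not saturate. Where a mask is given, it adds a large negative value (-1e9) at positions whose mask entry is 1. It then applies a softmax over the key axis to obtain the attention weights and returns the weighted sum of the value vectors, i.e. softmax(QKᵀ/√d_k)V.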

In [None]:
def scaled_dot_product_attention(queries, keys, values, mask=None):
    product = tf.matmul(queries, keys, transpose_b=True)

    keys_dim = tf.cast(tf.shape(keys)[-1], tf.float32)
    scaled_product = product / tf.math.sqrt(keys_dim)

    if mask is not None:
        scaled_product += (mask * -1e9)

    attention_weights = tf.nn.softmax(scaled_product, axis=-1)

    output = tf.matmul(attention_weights, values)
    return output, attention_weights

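To check the masking convention in isolation, here is a minimal NumPy re-implementation of the same formula, softmax(QKᵀ/√d_k + mask·(-1e9))·V. It is a sketch for experimentation, not the notebook's TensorFlow code, and the helper name `attention_np` is hypothetical. As in the function above, a mask value of 1 marks a key position that must be ignored.

```python
import numpy as np

def attention_np(q, k, v, mask=None):
    """NumPy sketch of scaled dot-product attention (hypothetical helper).

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v).
    mask: broadcastable to (..., len_q, len_k); 1 means "do not attend".
    """
    d_k = k.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = scores + mask * -1e9  # same additive-mask trick as the TF version
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 4))   # batch=2, len_q=3, d_k=4
k = rng.normal(size=(2, 5, 4))   # len_k=5
v = rng.normal(size=(2, 5, 6))   # d_v=6
mask = np.zeros((1, 1, 5))
mask[..., 0] = 1.0               # hide the first key position

out, w = attention_np(q, k, v, mask)
print(out.shape, w.shape)          # (2, 3, 6) (2, 3, 5)
print(np.allclose(w.sum(-1), 1))   # True
print(w[..., 0].max() < 1e-6)      # True
```

Each row of the weights sums to 1, and the masked key receives effectively zero weight, which is exactly what the `mask * -1e9` term is for.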
## **Multi-Head Attention Layer**
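The multi-head attention layer projects the queries, keys and values with separate dense layers, splits each projection into `num_heads` heads of size `head_size // num_heads`, runs scaled dot-product attention in every head independently, and concatenates the head outputs before a final dense projection. This lets different heads attend to different parts of the input sequence. Note that in this code `head_size` is the total model width, not the size of a single head.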

In [None]:
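class MultiHeadAttentionLayer(tf.keras.layers.Layer):
    def __init__(self, num_heads, head_size, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        # Despite its name, head_size is the full model width (d_model);
        # each head works on a slice of size depth = head_size // num_heads.
        if head_size % num_heads != 0:
            raise ValueError("head_size must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_size = head_size
        self.dropout = dropout  # stored for get_config(); not applied inside this layer

        self.depth = self.head_size // self.num_heads
        # Linear projections for queries, keys, values and the concatenated output
        self.wq = Dense(self.head_size)
        self.wk = Dense(self.head_size)
        self.wv = Dense(self.head_size)
        self.dense = Dense(self.head_size)

    def get_config(self):
        # Records the constructor arguments so the layer can be serialized with the model
        config = super().get_config()
        config.update({"num_heads": self.num_heads, "head_size": self.head_size, "dropout": self.dropout})
        return config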

    def split_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs):
        q, k, v, mask = inputs['query'], inputs['key'], inputs['value'], inputs['mask']
        batch_size = tf.shape(q)[0]

        query = self.wq(q)
        key = self.wk(k)
        value = self.wv(v)

        query = self.split_heads(query, batch_size)
        key = self.split_heads(key, batch_size)
        value = self.split_heads(value, batch_size)

        scaled_attention, attention_weights = scaled_dot_product_attention(query, key, value, mask)
        scaled_attention = tf.transpose(scaled_attention, perm=[0, 2, 1, 3])

        concat_attention = tf.reshape(scaled_attention, (batch_size, -1, self.head_size))

        outputs = self.dense(concat_attention)
        return outputs, attention_weights

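The reshape/transpose in `split_heads`, and its inverse at the end of `call`, only rearrange numbers; no values are mixed across heads. The NumPy sketch below mirrors those two steps for `head_size = 720` and `num_heads = 8` (so `depth = 90`) and checks that merging the heads restores the original tensor. The helper names are hypothetical.

```python
import numpy as np

def split_heads_np(x, num_heads):
    # (batch, seq, d_model) -> (batch, num_heads, seq, depth), as in split_heads()
    batch, seq, d_model = x.shape
    depth = d_model // num_heads
    x = x.reshape(batch, seq, num_heads, depth)
    return x.transpose(0, 2, 1, 3)

def merge_heads_np(x):
    # (batch, num_heads, seq, depth) -> (batch, seq, d_model), as at the end of call()
    batch, num_heads, seq, depth = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, seq, num_heads * depth)

x = np.arange(2 * 4 * 720, dtype=np.float32).reshape(2, 4, 720)
heads = split_heads_np(x, num_heads=8)
print(heads.shape)  # (2, 8, 4, 90)
# Head h sees feature slice [h*depth, (h+1)*depth) of every token
print(np.array_equal(heads[:, 3], x[:, :, 270:360]))  # True
print(np.array_equal(merge_heads_np(heads), x))       # True
```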
## **Positional Encoding**
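Self-attention by itself is insensitive to the order of the tokens, so a positional encoding is added to the embeddings to tell the model where each token sits in the sequence. This implementation uses the sinusoidal encoding of the original Transformer, but places all sine terms in the first half of the vector and all cosine terms in the second half instead of interleaving them.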

In [None]:
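def positional_encoding(max_len, head_size):
    # Position index as a column vector: shape (max_len, 1)
    pos = tf.cast(tf.range(max_len)[:, tf.newaxis], dtype=tf.float32)
    # Feature index as a row vector: shape (1, head_size)
    i = tf.cast(tf.range(head_size)[tf.newaxis, :], dtype=tf.float32)
    # Features 2k and 2k+1 share the frequency 1 / 10000^(2k / head_size)
    angle_rads = pos / tf.pow(10000.0, (i - i % 2) / head_size)
    # Sines of the even features, then cosines of the odd features (concatenated, not interleaved)
    sines = tf.math.sin(angle_rads[:, 0::2])
    cosines = tf.math.cos(angle_rads[:, 1::2])
    return tf.concat([sines, cosines], axis=-1)  # shape (max_len, head_size)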

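The angle for position pos and feature i is pos / 10000^((i - i mod 2)/d), so feature 2k (a sine) and feature 2k+1 (a cosine) share one frequency. The NumPy port below, with the hypothetical name `positional_encoding_np`, reproduces the function's concatenated layout so its properties can be checked: position 0 is all zeros in the sine half and all ones in the cosine half, and column j pairs with column j + d/2 on the unit circle.

```python
import numpy as np

def positional_encoding_np(max_len, d_model):
    # NumPy port of positional_encoding(): sines of the even-indexed angles
    # followed by cosines of the odd-indexed angles (concatenated, not interleaved).
    pos = np.arange(max_len)[:, np.newaxis].astype(np.float64)
    i = np.arange(d_model)[np.newaxis, :].astype(np.float64)
    angle_rads = pos / np.power(10000.0, (i - i % 2) / d_model)
    sines = np.sin(angle_rads[:, 0::2])
    cosines = np.cos(angle_rads[:, 1::2])
    return np.concatenate([sines, cosines], axis=-1)

pe = positional_encoding_np(max_len=50, d_model=720)
print(pe.shape)                                         # (50, 720)
print(np.allclose(pe[0, :360], 0.0))                    # True: sin(0) at position 0
print(np.allclose(pe[0, 360:], 1.0))                    # True: cos(0) at position 0
print(np.allclose(pe[:, :360]**2 + pe[:, 360:]**2, 1))  # True: paired columns share a frequency
print(np.abs(pe).max() <= 1.0)                          # True
```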
## **Transformer Encoder Layer**
The transformer encoder layer consists of a multi-head self-attention mechanism followed by a position-wise feed-forward neural network. It applies layer normalization and residual connections around each sub-layer.

In [None]:
def transformer_encoder_layer(inputs, head_size, num_heads, ff_dim, dropout=0):
    attention, _ = MultiHeadAttentionLayer(num_heads, head_size)(inputs={'query': inputs, 'key': inputs, 'value': inputs, 'mask': None})
    attention = Dropout(dropout)(attention)
    attention = LayerNormalization(epsilon=1e-6)(inputs + attention)

    ffn = tf.keras.Sequential(
        [Dense(ff_dim, activation="relu"), Dense(inputs.shape[-1]),]
    )
    ffn_out = ffn(attention)
    ffn_out = Dropout(dropout)(ffn_out)
    return LayerNormalization(epsilon=1e-6)(attention + ffn_out)

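Each sub-layer in `transformer_encoder_layer` follows the post-norm pattern of the original Transformer: `LayerNorm(x + Dropout(Sublayer(x)))`. The NumPy sketch below shows that pattern for the feed-forward sub-layer, with dropout left out (as at inference time) and layer normalization's learned scale and offset fixed at 1 and 0, which is how Keras initializes them. The function names and random weights are placeholders, not trained values.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # LayerNormalization at initialization: gamma = 1, beta = 0
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feed_forward(x, w1, b1, w2, b2):
    # Dense(ff_dim, relu) followed by Dense(d_model), applied to every position
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def post_norm_ffn_block(x, params):
    # Residual connection first, then normalization (post-LN)
    return layer_norm(x + feed_forward(x, *params))

rng = np.random.default_rng(1)
d_model, ff_dim = 16, 64
params = (rng.normal(size=(d_model, ff_dim)) * 0.1, np.zeros(ff_dim),
          rng.normal(size=(ff_dim, d_model)) * 0.1, np.zeros(d_model))
x = rng.normal(size=(2, 5, d_model))
y = post_norm_ffn_block(x, params)
print(y.shape)                                        # (2, 5, 16)
print(np.allclose(y.mean(axis=-1), 0.0, atol=1e-6))   # True
print(np.allclose(y.std(axis=-1), 1.0, atol=1e-3))    # True
```

The residual path keeps the original token representation available to later layers, while the normalization keeps each token's features at zero mean and unit variance before the learned scale and offset are applied.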

## **Building the Transformer Model**
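We build the model by embedding the tokenized input, adding the positional encoding, and stacking `num_layers` encoder layers. Rather than generating a whole output sequence, the model reads the representation at the last position and maps it through a softmax layer to a probability distribution over the vocabulary for the next word.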

In [None]:
def transformer_encoder(inputs, head_size, num_heads, ff_dim, num_layers, dropout=0):
    # Use the static sequence length so the encoding is a constant added to the embeddings
    seq_len = inputs.shape[1]
    x = inputs + positional_encoding(seq_len, head_size)
    for _ in range(num_layers):
        x = transformer_encoder_layer(x, head_size, num_heads, ff_dim, dropout)
    return x

def build_transformer_model(max_len, vocab_size, head_size, num_heads, ff_dim, num_layers, dropout=0.5):
    inputs = Input(shape=(max_len,))
    # head_size is the embedding width (d_model) shared by every layer
    embedding_layer = Embedding(input_dim=vocab_size, output_dim=head_size)(inputs)
    x = transformer_encoder(embedding_layer, head_size, num_heads, ff_dim, num_layers, dropout)
    # Inputs are pre-padded, so position -1 holds the latest real token;
    # its representation is mapped to a probability distribution over the vocabulary
    x = Dense(vocab_size, activation="softmax")(x[:, -1, :])
    return Model(inputs=inputs, outputs=x)

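A quick way to sanity-check the configuration used in training is to count trainable weights by hand. Assuming every `Dense` layer has a bias (the Keras default) and each `LayerNormalization` has one scale and one offset per feature, an encoder layer of width d (`head_size`) with feed-forward size `ff_dim` holds 4(d² + d) attention weights, 2·d·ff_dim + ff_dim + d feed-forward weights, and 4d normalization weights. The helpers below are a hypothetical sketch of that arithmetic; `model.summary()` gives the authoritative count.

```python
def encoder_layer_params(d, ff_dim):
    attention = 4 * (d * d + d)              # wq, wk, wv and the output dense layer
    feed_forward = (d * ff_dim + ff_dim) + (ff_dim * d + d)
    layer_norms = 2 * (2 * d)                # two LayerNormalization layers (gamma, beta)
    return attention + feed_forward + layer_norms

def model_params(vocab_size, d, ff_dim, num_layers):
    embedding = vocab_size * d
    output = d * vocab_size + vocab_size     # final softmax Dense layer
    # The positional encoding and the slicing step have no weights
    return embedding + num_layers * encoder_layer_params(d, ff_dim) + output

per_layer = encoder_layer_params(d=720, ff_dim=2045)
print(per_layer)        # 5026925
print(14 * per_layer)   # 70376950
# 20000 is a made-up vocabulary size, for illustration only
print(model_params(vocab_size=20000, d=720, ff_dim=2045, num_layers=14))  # 99196950
```

With the values from the training cell, the 14 encoder layers alone come to roughly 70 million parameters; the embedding and output layers add about 1,441 × `total_words` more.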
## **Loading and Preprocessing the Dataset**
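We read every `.txt` file in the dataset folder, split the text into lines, fit a word-level tokenizer on those lines (saving it for later use), and turn each line into training pairs: every prefix of the line, minus its last token, is an input, and that last token is the target.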


In [None]:
dataset_path = "TAGModel/dataset/"
conversations = []
for file_name in os.listdir(dataset_path):
    if file_name.endswith('.txt'):
        file_path = os.path.join(dataset_path, file_name)
        with open(file_path, 'r', encoding='utf-8') as file:
            conversations.append(file.read())

conversations = [line for convo in conversations for line in convo.splitlines()]

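# Use a dedicated out-of-vocabulary token; "[-START-]" and "[-END-]" are vocabulary words only if they appear in the dataset text
tokenizer = tf.keras.preprocessing.text.Tokenizer(char_level=False, filters='', lower=False, oov_token="[-OOV-]")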
tokenizer.fit_on_texts(conversations)
total_words = len(tokenizer.word_index) + 1

tokenizer_path = "TAGModel/models/tokenizer.pkl"
os.makedirs(os.path.dirname(tokenizer_path), exist_ok=True)
with open(tokenizer_path, 'wb') as tokenizer_file:
    pickle.dump(tokenizer, tokenizer_file)

input_sequences = []
for line in conversations:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

X, y = input_sequences[:,:-1], input_sequences[:,-1]

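The loop above turns each line into every prefix of length two or more, left-pads them with zeros (Keras' tokenizer never assigns index 0 to a word), and splits off the last token as the label. The plain-Python sketch below replays those steps on a made-up token list so the resulting `X` and `y` can be read directly; `make_training_pairs` is a hypothetical helper, not part of the notebook.

```python
def make_training_pairs(token_lists):
    # Every prefix of length >= 2, as in the input_sequences loop
    sequences = [tokens[:i + 1] for tokens in token_lists for i in range(1, len(tokens))]
    max_len = max(len(s) for s in sequences)
    # padding='pre': zeros go on the left, so the newest token stays in the last slot
    padded = [[0] * (max_len - len(s)) + s for s in sequences]
    X = [row[:-1] for row in padded]
    y = [row[-1] for row in padded]
    return X, y

X, y = make_training_pairs([[5, 8, 2, 9]])
print(X)  # [[0, 0, 5], [0, 5, 8], [5, 8, 2]]
print(y)  # [8, 2, 9]
```

Because padding is on the left, the last position of each input row always holds the most recent real token, which is why `build_transformer_model` reads its prediction from `x[:, -1, :]`.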
## **Training the Model**
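With the dataset prepared, we compile the model with the Adam optimizer (learning rate 1e-4) and sparse categorical cross-entropy loss, train it for 50 epochs with 30% of the samples held out for validation, and keep the weights with the lowest validation loss.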

In [None]:
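head_size = 720   # model width (d_model); must be divisible by num_heads
num_heads = 8     # 720 / 8 = 90 features per head
ff_dim = 2045     # hidden size of the position-wise feed-forward network
num_layers = 14
dropout = 0.2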

checkpoint_path = "TAGModel/checkpoint/"
checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    monitor='val_loss',
    mode='min',
    save_best_only=True
)



model = build_transformer_model(max_sequence_len - 1, total_words, head_size, num_heads, ff_dim, num_layers, dropout)
model.compile(optimizer=Adam(learning_rate=0.0001), loss="sparse_categorical_crossentropy", metrics=['accuracy'])

history = model.fit(X, y, batch_size=245, epochs=50, validation_split=0.3, verbose=1, callbacks=[checkpoint_callback])

model.load_weights(checkpoint_path)

model.save("TAGModel/models/TAGModel")

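Two details of the training cell are easy to miss. The Keras `fit` documentation states that `validation_split` takes the validation data from the *last* samples of `X` and `y`, before shuffling, so here the validation set is the tail of the n-gram list (the last files and lines read). And with `save_best_only=True` and `mode='min'`, a checkpoint is written only in epochs where `val_loss` improves on the best value so far. The plain-Python sketch below reproduces both rules; the function names are hypothetical, and the exact rounding of the split index is an assumption.

```python
import math

def tail_validation_split(n_samples, validation_split):
    # Assumed rounding: floor of the training fraction; the tail becomes validation data
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return range(0, split_at), range(split_at, n_samples)

def checkpoint_epochs(val_losses):
    # Epochs (1-based) in which save_best_only=True with mode='min' would write weights
    best = math.inf
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved

train_idx, val_idx = tail_validation_split(1000, 0.3)
print(len(train_idx), val_idx[0], val_idx[-1])         # 700 700 999
print(checkpoint_epochs([2.1, 1.8, 1.9, 1.7, 1.75]))   # [1, 2, 4]
```

If the text files cover different topics, shuffling the sequences before calling `fit` would make the validation set more representative of the whole dataset.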
## **Generating Text**
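After training, we generate text autoregressively: the model predicts a distribution over the next word, a word is sampled from it after temperature scaling and appended to the input, and the loop repeats until `[-END-]` is sampled or `max_length` words have been produced.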

In [None]:

# Predict function: sample one word at a time until '[-END-]' or max_length words
def generate_text(model, tokenizer, initial_text, temperature=1.0, max_length=1024):
    # The model expects inputs of length max_sequence_len - 1; read it from the model itself
    input_len = model.input_shape[1]
    input_text = initial_text + " [-START-]"
    generated_words = []
    for _ in range(max_length):
        tokenized_input = tokenizer.texts_to_sequences([input_text])[0]
        padded_input = pad_sequences([tokenized_input], maxlen=input_len, padding='pre')
        # Calling the model directly avoids model.predict()'s per-call overhead inside a loop
        response = model(padded_input, training=False).numpy()[0].astype(np.float64)

        # Adjust predictions with temperature scaling (the epsilon avoids log(0))
        logits = np.log(response + 1e-12) / temperature
        logits[0] = -np.inf  # index 0 is padding and has no entry in index_word
        logits -= np.max(logits)
        probs = np.exp(logits)
        probs /= np.sum(probs)

        # Sample the next token
        predicted_word_index = np.random.choice(len(probs), p=probs)
        predicted_word = tokenizer.index_word.get(predicted_word_index, "")

        if predicted_word == '[-END-]':
            break

        input_text += ' ' + predicted_word
        generated_words.append(predicted_word)

    return ' '.join(generated_words)

# Example usage
user_input = "Halo! Apa kabar?"  # Indonesian for "Hello! How are you?"
response = generate_text(model=model, tokenizer=tokenizer, initial_text=user_input, temperature=0.3)
print("\033[92m" + "GENERATE >> " + "\033[0m" + " " + "\033[96m" + response + "\033[0m")

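Dividing the log-probabilities by the temperature T and re-normalizing turns each probability p_i into p_i^(1/T) / Σ_j p_j^(1/T): T < 1 sharpens the distribution toward the most likely word, while T > 1 flattens it. The NumPy sketch below isolates that step, plus a guard that keeps the padding index 0 from ever being sampled (Keras' tokenizer assigns no word to index 0). `sample_with_temperature` and the example probabilities are illustrative only.

```python
import numpy as np

def sample_with_temperature(probs, temperature, rng):
    # Temperature scaling in log space: p_i ** (1 / T), then renormalize
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    logits[0] = -np.inf                      # index 0 is padding, never a real word
    logits -= logits.max()                   # numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return rng.choice(len(scaled), p=scaled), scaled

probs = [0.05, 0.60, 0.25, 0.10]
rng = np.random.default_rng(42)
_, sharp = sample_with_temperature(probs, 0.3, rng)
_, flat = sample_with_temperature(probs, 2.0, rng)
print(sharp.round(3))
print(flat.round(3))
```

With `temperature=0.3`, as in the example call above, the most likely word receives the large majority of the probability mass, so generations are more predictable but also more repetitive.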
## **Visualizing Training History**
Lastly, we visualize the training history of our model using scatter plots, which display the loss and accuracy metrics over each epoch of training.

In [None]:
plt.scatter(range(1, len(history.history['loss']) + 1), history.history['loss'], label='Training Loss')
plt.scatter(range(1, len(history.history['val_loss']) + 1), history.history['val_loss'], label='Validation Loss')
plt.scatter(range(1, len(history.history['accuracy']) + 1), history.history['accuracy'], label='Training Accuracy')
plt.scatter(range(1, len(history.history['val_accuracy']) + 1), history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Metric Value')
plt.title('Training History')
plt.legend()
os.makedirs("TAGModel/image", exist_ok=True)
plt.savefig("TAGModel/image/1.jpeg")
plt.show()

## **Conclusion**

In this notebook, we've explored the implementation of a Transformer-based text generation model using TensorFlow and Keras. Let's recap the key functions and their roles in the model:

1. **Scaled Dot-Product Attention Mechanism**: This function computes the scaled dot product of the query and key vectors, applies a softmax function to obtain attention weights, and computes a weighted sum of the value vectors. It allows the model to focus on different parts of the input sequence based on their importance.

2. **Multi-Head Attention Layer**: The multi-head attention layer splits the input into multiple heads, computes attention independently, and then concatenates the results. This mechanism enables the model to capture different aspects of the input sequence simultaneously.

3. **Positional Encoding**: Since the Transformer model lacks inherent understanding of sequence order, positional encoding is added to provide positional information to the model. It helps the model learn the sequential relationships between tokens in the input sequence.

4. **Transformer Encoder Layer**: This layer consists of a multi-head self-attention mechanism followed by a position-wise feed-forward neural network. It applies layer normalization and residual connections around each sub-layer, allowing the model to effectively capture and process sequential information.

5. **Building the Transformer Model**: We stack multiple transformer encoder layers to build the complete Transformer model. The model embeds tokenized input sequences, passes them through the encoder layers, and predicts a probability distribution over the vocabulary for the next word.

6. **Loading and Preprocessing the Dataset**: We load and preprocess the dataset by tokenizing the text data and generating input sequences for training the model.

7. **Training the Model**: With the dataset prepared, we train our Transformer model using the compiled model, optimizer, loss function, and training data. We monitor the training progress and save the best weights using callbacks.

8. **Generating Text**: After training, we use the model to continue a given input. It predicts the next token iteratively, sampling each word with temperature scaling, until an end token is produced or the length limit is reached.

9. **Visualizing Training History**: Lastly, we visualize the training history of our model using scatter plots. These plots display the loss and accuracy metrics over each epoch of training, providing insights into the model's performance and convergence.

By implementing these components, we've built a Transformer-based text generation model that continues a given input one word at a time. How coherent and contextually relevant its output is depends on the dataset and on training, which the training-history plot and sample generations help judge. This notebook can serve as a starting point for building and training Transformer-based models for text generation tasks.




---



Code by ExzDeveloper

Developed by Ezra Valen Ne Tofa

Email: officialbangezz@gmail.com

Github: https://github.com/exzgit

Repository: https://github.com/exzgit/TAG-Model

Support me: https://ko-fi.com/exzcsm
