# Seq2Seq Model for Text Transformation

This notebook trains an LSTM encoder-decoder (Seq2Seq) model for text transformation with TensorFlow's Keras API. The model learns to map input texts to output texts from the paired examples in the training data.

The notebook is organized into the following sections:
1. Load Data from JSON
2. Tokenize Sequences
3. Create the Seq2Seq Model
4. Compile and Train the Model
5. Save the Model
6. Generate Transformed Output
7. Test the Model


In [None]:
import tensorflow as tf
import numpy as np
import json

## 1. Load Data from JSON

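In this section, we load the paired training data from a JSON file. The file is expected to contain a list of objects, each with an `input` string and the corresponding `output` string.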

In [None]:
# Load data from JSON file (a list of objects with 'input' and 'output' keys)
with open('mariya_anti_data.json', encoding='utf-8') as f:
    data = json.load(f)

input_texts = [item['input'] for item in data]
output_texts = [item['output'] for item in data]
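
The loading step assumes every record has both keys. A minimal sketch of a stricter loader follows; `load_pairs` is a hypothetical helper, not part of the original notebook, and it reports the index of the first malformed record.

```python
import json

def load_pairs(path):
    """Return (input_texts, output_texts) from a JSON list of input/output records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    input_texts, output_texts = [], []
    for i, item in enumerate(data):
        # Fail early, naming the record, if it is not an object with both keys.
        if not isinstance(item, dict) or "input" not in item or "output" not in item:
            raise ValueError(f"record {i} is missing 'input' or 'output'")
        input_texts.append(item["input"])
        output_texts.append(item["output"])
    return input_texts, output_texts
```

With the file used above, this would be called as `input_texts, output_texts = load_pairs('mariya_anti_data.json')`.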

## 2. Tokenize Sequences

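In this section, we fit separate tokenizers on the input and output texts, convert each text to a sequence of integer word indices, and pad every sequence to the length of the longest one. `pad_sequences` pads on the left with 0 by default. The output tokenizer is created with `filters=''` so that punctuation in the target texts is kept as part of the tokens.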

In [None]:
# Tokenize input and output sequences
input_tokenizer = tf.keras.preprocessing.text.Tokenizer()
input_tokenizer.fit_on_texts(input_texts)
input_sequences = input_tokenizer.texts_to_sequences(input_texts)
max_input_length = max(len(seq) for seq in input_sequences)
input_sequences = tf.keras.preprocessing.sequence.pad_sequences(input_sequences, maxlen=max_input_length)

output_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
output_tokenizer.fit_on_texts(output_texts)
output_sequences = output_tokenizer.texts_to_sequences(output_texts)
max_output_length = max(len(seq) for seq in output_sequences)
output_sequences = tf.keras.preprocessing.sequence.pad_sequences(output_sequences, maxlen=max_output_length)
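
To make the two steps concrete, here is a simplified pure-Python sketch of word indexing and left-padding. It is only an illustration of the idea, not the Keras implementation: `build_word_index` and `texts_to_padded` are hypothetical helpers, and they split on whitespace without any punctuation filtering.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer id, most frequent first, starting at 1 (0 is padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def texts_to_padded(texts, word_index):
    """Convert texts to id lists and left-pad them with 0 to the longest length."""
    sequences = [[word_index[w] for w in t.lower().split() if w in word_index]
                 for t in texts]
    max_len = max(len(s) for s in sequences)
    return [[0] * (max_len - len(s)) + s for s in sequences]

texts = ["the cat sat", "the dog"]
word_index = build_word_index(texts)
padded = texts_to_padded(texts, word_index)
print(word_index)  # {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4}
print(padded)      # [[1, 2, 3], [0, 1, 4]]
```

The shorter text gains a leading 0, which is why index 0 is reserved and the vocabulary size passed to the embedding layers is `len(word_index) + 1`.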

## 3. Create the Seq2Seq Model

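In this section, we define the Seq2Seq architecture with TensorFlow's Keras API: an LSTM encoder reads the embedded input sequence, and its final hidden and cell states initialize an LSTM decoder, whose outputs pass through a softmax layer over the output vocabulary.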

In [None]:
# Create the Seq2Seq model
embedding_dim = 128
hidden_units = 256

encoder_inputs = tf.keras.layers.Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(len(input_tokenizer.word_index) + 1, embedding_dim)(encoder_inputs)
encoder_lstm = tf.keras.layers.LSTM(hidden_units, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

decoder_inputs = tf.keras.layers.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(len(output_tokenizer.word_index) + 1, embedding_dim)(decoder_inputs)
decoder_lstm = tf.keras.layers.LSTM(hidden_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(len(output_tokenizer.word_index) + 1, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)  # Keep every timestep; the one-step shift is applied to the training data

model = tf.keras.models.Model([encoder_inputs, decoder_inputs], decoder_outputs)
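
As a rough size check, the trainable parameter count of this architecture can be estimated by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers. The vocabulary sizes are made-up placeholders, since the real ones depend on the data and equal `len(word_index) + 1` for each tokenizer.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding vector per vocabulary id.
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus bias.
    return input_dim * output_dim + output_dim

embedding_dim, hidden_units = 128, 256
input_vocab, output_vocab = 5000, 6000  # hypothetical vocabulary sizes

total = (
    embedding_params(input_vocab, embedding_dim)     # encoder embedding
    + lstm_params(embedding_dim, hidden_units)       # encoder LSTM
    + embedding_params(output_vocab, embedding_dim)  # decoder embedding
    + lstm_params(embedding_dim, hidden_units)       # decoder LSTM
    + dense_params(hidden_units, output_vocab)       # softmax layer
)
print(lstm_params(embedding_dim, hidden_units))  # 394240
print(total)
```

For vocabularies of this size, the embedding and softmax layers account for most of the parameters, so the count grows mainly with the vocabulary.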

## 4. Compile and Train the Model

In this section, we compile the model and train it with teacher forcing: the decoder receives each target sequence without its last token and learns to predict the same sequence shifted one step ahead.

In [None]:
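# Compile and train the model
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy')
# Teacher forcing: decoder input drops the last token; the target is shifted one step left
decoder_input_data = output_sequences[:, :-1]
decoder_target_data = np.expand_dims(output_sequences[:, 1:], -1)
model.fit([input_sequences, decoder_input_data], decoder_target_data, batch_size=1, epochs=50)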
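
The two slices passed to `fit` line up so that, at each step, the decoder sees one token and is asked for the next one. A small sketch with made-up token ids shows the alignment; `teacher_forcing_pairs` is a hypothetical helper used only for illustration.

```python
def teacher_forcing_pairs(sequence):
    """Split one target sequence into (decoder_input, decoder_target)."""
    return sequence[:-1], sequence[1:]

seq = [0, 0, 7, 3, 9]  # made-up ids, left-padded with 0
dec_in, dec_target = teacher_forcing_pairs(seq)
for step, (given, expected) in enumerate(zip(dec_in, dec_target)):
    print(f"step {step}: decoder sees {given}, should predict {expected}")
```

Both halves have length `len(seq) - 1`, so the model has to produce one prediction per decoder input step for the shapes to match.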

## 5. Save the Model

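In this section, we save the trained Seq2Seq model for future use. The model file does not include the tokenizers, which are also needed to run the model on new text.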

In [None]:
# Save the model
model.save('seq2seq_model.h5')
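
One way to keep the vocabularies next to the model is to write each tokenizer's `word_index` dictionary to JSON. This is a sketch under assumptions: `save_word_index` and `load_word_index` are hypothetical helpers, and the file names in the usage note are placeholders. It stores only the word-to-id mapping, so settings such as `filters` must be reapplied when the text is tokenized again.

```python
import json

def save_word_index(word_index, path):
    """Write a word-to-id mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_index, f, ensure_ascii=False)

def load_word_index(path):
    """Read a word-to-id mapping back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

In this notebook the calls would look like `save_word_index(input_tokenizer.word_index, 'input_word_index.json')` and `save_word_index(output_tokenizer.word_index, 'output_word_index.json')`.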

## 6. Generate Transformed Output

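In this section, we define a function that generates the transformed output for a new input text by greedy decoding: at each step the model's most probable token is written into the decoder input and the model is run again.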

In [None]:
# Generate transformed output using the trained model (greedy decoding)
def generate_transformed_output(input_text):
    input_sequence = input_tokenizer.texts_to_sequences([input_text])
    input_sequence = tf.keras.preprocessing.sequence.pad_sequences(input_sequence, maxlen=input_sequences.shape[1])
    max_len = output_sequences.shape[1]
    # The decoder input starts as all padding (0) and is filled in one token at a time
    output_sequence = np.zeros((1, max_len), dtype='int32')

    for i in range(max_len - 1):
        predictions = model.predict([input_sequence, output_sequence], verbose=0).argmax(axis=-1)
        # The prediction at position i is the token that follows position i
        output_sequence[0, i + 1] = predictions[0, i]

    # Index 0 is padding and has no entry in the tokenizer, so it is skipped here
    output_text = output_tokenizer.sequences_to_texts(output_sequence.tolist())[0]
    return output_text
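
The decoding loop can be studied without the trained model. The sketch below runs the same greedy procedure against a toy scoring function. `greedy_decode` and `toy_predict_next` are hypothetical names, and the toy function only stands in for a model that returns one score per vocabulary id.

```python
def greedy_decode(predict_next, max_len, start_id=0):
    """Build a sequence left to right, always taking the highest-scoring next id."""
    sequence = [start_id]
    for _ in range(max_len - 1):
        scores = predict_next(sequence)  # one score per vocabulary id
        next_id = max(range(len(scores)), key=scores.__getitem__)
        sequence.append(next_id)
    return sequence

def toy_predict_next(sequence):
    # Stand-in for the model: always prefers (last id + 1) modulo 5.
    target = (sequence[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]

print(greedy_decode(toy_predict_next, max_len=4))  # [0, 1, 2, 3]
```

Greedy decoding commits to the single best token at every step, which is simple but can miss sequences that a wider search such as beam search would find.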

## 7. Test the Model

In this section, you can test the model with sample input texts and observe the transformed outputs.

In [None]:
# Test the model with a sample input
input_text = input("Enter text: ")
transformed_output = generate_transformed_output(input_text)
print("Input:", input_text)
print("Transformed Output:", transformed_output)