# Chapter 11 - Sequence-to-Sequence Learning: Part 1

This chapter introduces sequence-to-sequence (seq2seq) learning for machine translation and walks through an encoder-decoder implementation in TensorFlow.

## 11.1 Sequence-to-Sequence Fundamentals

**Seq2Seq Architecture**:
- Encoder processes input sequence
- Decoder generates output sequence
- Context vector bridges encoder-decoder
- Teacher forcing for training
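
Teacher forcing can be illustrated on a single target sequence. The sketch below is framework-free; the token ids and the `START`/`END` values are invented for the example and are not taken from the chapter's vocabulary.

```python
# Hypothetical token ids for one target sentence, wrapped in assumed
# START/END marker ids (illustrative values only).
START, END = 2, 3
target = [START, 17, 42, 8, END]

# Teacher forcing: at step t the decoder is fed the ground-truth token t
# and is trained to predict token t + 1.
decoder_input = target[:-1]   # everything except the last token
decoder_target = target[1:]   # everything except the first token

for fed, expected in zip(decoder_input, decoder_target):
    print(f"decoder sees {fed:>2} -> should predict {expected}")
```

The two slices have the same length, which is why the data processor further down vectorizes targets to `sequence_length + 1` tokens before splitting them.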

**Key Components**:
- Encoder RNN (LSTM/GRU)
- Decoder RNN (attention is a common extension, not implemented in this chapter)
- Text vectorization layers
- Masking for variable-length sequences
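
Masking matters because padded positions should not contribute to the loss. The following is a small pure-Python sketch of that idea, assuming padding uses id 0 (as `TextVectorization` and `mask_zero=True` do); the per-token loss values are invented for illustration.

```python
PAD = 0  # assumed padding id

def masked_mean_loss(token_ids, per_token_loss):
    """Average the loss over non-padding positions only."""
    mask = [1.0 if tok != PAD else 0.0 for tok in token_ids]
    total = sum(m * l for m, l in zip(mask, per_token_loss))
    count = sum(mask)
    return total / count if count else 0.0

# One padded target sequence and made-up per-token losses.
tokens = [5, 9, 3, PAD, PAD]
losses = [0.5, 1.0, 1.5, 4.0, 4.0]

masked = masked_mean_loss(tokens, losses)
unmasked = sum(losses) / len(losses)

print("masked mean:  ", masked)     # averages the three real tokens
print("unmasked mean:", unmasked)   # padding positions distort the average
```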

In [1]:
# Seq2Seq Data Processor
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import TextVectorization

class Seq2SeqDataProcessor:
    """Data processor for sequence-to-sequence tasks"""
    
    def __init__(self, max_vocab_size=10000, sequence_length=50):
        self.max_vocab_size = max_vocab_size
        self.sequence_length = sequence_length
        self.source_vectorizer = None
        self.target_vectorizer = None
    
    def build_vocabularies(self, source_texts, target_texts):
        """Build vocabulary for source and target languages"""
        
        # Source language vectorizer
        self.source_vectorizer = TextVectorization(
            max_tokens=self.max_vocab_size,
            output_mode='int',
            output_sequence_length=self.sequence_length
        )
        
        # Target language vectorizer
        self.target_vectorizer = TextVectorization(
            max_tokens=self.max_vocab_size,
            output_mode='int',
            output_sequence_length=self.sequence_length + 1  # +1 for teacher forcing
        )
        
        # Adapt vectorizers to data
        self.source_vectorizer.adapt(source_texts)
        self.target_vectorizer.adapt(target_texts)
        
        return self.source_vectorizer, self.target_vectorizer
    
    def prepare_dataset(self, source_texts, target_texts, batch_size=32):
        """Prepare tf.data.Dataset for training"""
        
        def preprocess_fn(source, target):
            # Vectorize texts
            source_vectorized = self.source_vectorizer(source)
            target_vectorized = self.target_vectorizer(target)
            
            # Split target into input (for teacher forcing) and output
            target_input = target_vectorized[:, :-1]  # All but last token
            target_output = target_vectorized[:, 1:]   # All but first token
            
            return (source_vectorized, target_input), target_output
        
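        dataset = tf.data.Dataset.from_tensor_slices((source_texts, target_texts))
        # Batch before mapping: preprocess_fn slices along axis 1, so it needs
        # rank-2 (batch, time) tensors rather than single unbatched examples.
        dataset = dataset.batch(batch_size)
        dataset = dataset.map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
        dataset = dataset.prefetch(tf.data.AUTOTUNE)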
        
        return dataset

# Test data processor
processor = Seq2SeqDataProcessor()
sample_english = ["Hello, how are you?", "What is your name?"]
sample_german = ["Hallo, wie geht es dir?", "Wie heißt du?"]

source_vec, target_vec = processor.build_vocabularies(sample_english, sample_german)

print("Seq2Seq data processor created")
print("Sample English:", sample_english[0])
print("Sample German:", sample_german[0])
print(f"Vocabulary sizes - English: {len(source_vec.get_vocabulary())}, German: {len(target_vec.get_vocabulary())}")

Seq2Seq data processor created
Sample English: Hello, how are you?
Sample German: Hallo, wie geht es dir?
Vocabulary sizes - English: 10, German: 9


## 11.2 Encoder Implementation

**Encoder Architecture**:
- Embedding layer for input sequences
- LSTM/GRU layers for sequence processing
- Final states as context vector
- Bidirectional RNN support

**Encoder Features**:
- Variable-length sequence handling
- State preservation across sequences
- Multiple RNN layer stacking
- Dropout for regularization

In [2]:
# Seq2Seq Encoder
class Seq2SeqEncoder(tf.keras.Model):
    """Encoder for sequence-to-sequence model"""
    
    def __init__(self, vocab_size, embedding_dim=256, encoder_units=256):
        super().__init__()
        
        self.embedding = tf.keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=embedding_dim,
            mask_zero=True
        )
        
        self.encoder_lstm = tf.keras.layers.LSTM(
            encoder_units,
            return_sequences=True,
            return_state=True,
            dropout=0.2,
            recurrent_dropout=0.2
        )
    
    def call(self, inputs, training=False):
        """Forward pass"""
        
        # Embed input sequences
        embedded = self.embedding(inputs)
        
        # Process through LSTM
        encoder_outputs, state_h, state_c = self.encoder_lstm(
            embedded, 
            training=training
        )
        
        encoder_states = [state_h, state_c]
        
        return encoder_outputs, encoder_states

# Test encoder
vocab_size = 10000
encoder = Seq2SeqEncoder(vocab_size)

# Test with sample input
sample_input = tf.constant([[1, 2, 3, 4, 5]])
encoder_outputs, encoder_states = encoder(sample_input)

print("Seq2Seq encoder created")
print("Encoder output shape:", encoder_outputs.shape)
print("Encoder states:", encoder_states)

Seq2Seq encoder created
Encoder output shape: (1, 5, 256)
Encoder states: [<tf.Tensor: shape=(1, 256), dtype=float32, numpy=...>, <tf.Tensor: shape=(1, 256), dtype=float32, numpy=...>]
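
As a sanity check on layer sizes, the encoder's trainable parameter count can be worked out by hand. The sketch assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and a bias) and the constructor defaults used above; the helper names are illustrative.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding vector per vocabulary entry.
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

emb = embedding_params(vocab_size=10000, embedding_dim=256)
lstm = lstm_params(input_dim=256, units=256)

print(f"Embedding: {emb:,}")
print(f"LSTM:      {lstm:,}")
print(f"Encoder:   {emb + lstm:,}")
```

These figures should agree with `encoder.count_params()` once the encoder has been called on an input.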


## 11.3 Decoder Implementation

**Decoder Architecture**:
- Embedding layer for target sequences
- LSTM with encoder states initialization
- Dense output layer with softmax
- Teacher forcing support
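
The softmax output layer turns decoder activations into a probability distribution over the target vocabulary, and sparse categorical cross-entropy scores the probability assigned to the correct token id. A minimal sketch for a single time step with a toy four-word vocabulary (the logits are invented):

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_id, probs):
    # Negative log-probability of the correct token id.
    return -math.log(probs[true_id])

logits = [2.0, 0.5, 0.1, -1.0]   # made-up scores for 4 vocabulary entries
probs = softmax(logits)
loss = sparse_categorical_crossentropy(true_id=0, probs=probs)

print("probabilities:", [round(p, 3) for p in probs])
print("loss for true id 0:", round(loss, 3))
```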

**Decoder Features**:
- Start token handling
- End token prediction
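- Attention mechanism support (a possible extension; not implemented in this chapter)
- Beam search for inference (a possible extension; Section 11.6 implements greedy decoding)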
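
Start and end token handling requires that the target vocabulary actually contains such markers. `TextVectorization` reserves id 0 for padding and id 1 for `[UNK]`, and its default standardization strips punctuation, so bracketed markers such as `[start]` would lose their brackets. The sketch below uses plain-word markers `startseq` and `endseq`; these names, the helper, and the toy vocabulary are assumptions for illustration, not part of the chapter's code.

```python
START_WORD, END_WORD = "startseq", "endseq"  # assumed marker words

def add_markers(sentences):
    """Wrap each target sentence in start/end marker words."""
    return [f"{START_WORD} {s} {END_WORD}" for s in sentences]

german = ["Hallo, wie geht es dir?", "Wie heißt du?"]
wrapped = add_markers(german)
print(wrapped[0])

# Toy vocabulary in the spirit of TextVectorization:
# id 0 = padding, id 1 = unknown, then words in order of first appearance.
words = []
for sentence in wrapped:
    for word in sentence.lower().replace(",", "").replace("?", "").split():
        if word not in words:
            words.append(word)
vocab = ["", "[UNK]"] + words

start_id, end_id = vocab.index(START_WORD), vocab.index(END_WORD)
print("start id:", start_id, "end id:", end_id)
```

With a real `TextVectorization` layer the ids depend on token frequency, so they are better looked up with `get_vocabulary().index(...)` than hard-coded.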

In [3]:
# Seq2Seq Decoder
class Seq2SeqDecoder(tf.keras.Model):
    """Decoder for sequence-to-sequence model"""
    
    def __init__(self, vocab_size, embedding_dim=256, decoder_units=256):
        super().__init__()
        
        self.embedding = tf.keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=embedding_dim,
            mask_zero=True
        )
        
        self.decoder_lstm = tf.keras.layers.LSTM(
            decoder_units,
            return_sequences=True,
            return_state=True,
            dropout=0.2,
            recurrent_dropout=0.2
        )
        
        self.output_dense = tf.keras.layers.Dense(vocab_size, activation='softmax')
    
    def call(self, inputs, initial_state=None, training=False):
        """Forward pass"""
        
        # Embed target sequences
        embedded = self.embedding(inputs)
        
        # Process through LSTM
        decoder_outputs, state_h, state_c = self.decoder_lstm(
            embedded,
            initial_state=initial_state,
            training=training
        )
        
        # Generate output predictions
        outputs = self.output_dense(decoder_outputs)
        
        decoder_states = [state_h, state_c]
        
        return outputs, decoder_states

# Test decoder
target_vocab_size = 12000
decoder = Seq2SeqDecoder(target_vocab_size)

# Test with sample input and encoder states
sample_decoder_input = tf.constant([[1, 2, 3, 4]])
decoder_outputs, decoder_states = decoder(
    sample_decoder_input, 
    initial_state=encoder_states
)

print("Seq2Seq decoder created")
print("Decoder output shape:", decoder_outputs.shape)
print("Decoder states:", decoder_states)

Seq2Seq decoder created
Decoder output shape: (1, 4, 12000)
Decoder states: [<tf.Tensor: shape=(1, 256), dtype=float32, numpy=...>, <tf.Tensor: shape=(1, 256), dtype=float32, numpy=...>]


## 11.4 Complete Seq2Seq Model

**Model Architecture**:
- Encoder-decoder connection
- Teacher forcing implementation
- Training and inference modes
- Sequence masking

**Training Strategy**:
- Teacher forcing ratio
- Scheduled sampling
- Gradient clipping
- Early stopping
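
Teacher forcing ratio and scheduled sampling both control how often the decoder is fed the ground-truth token rather than its own previous prediction. The model in this chapter always uses ground truth; the following is only a conceptual sketch, with a linear decay schedule and a stand-in `predictions` list chosen for illustration.

```python
import random

def teacher_forcing_ratio(epoch, total_epochs, start=1.0, end=0.5):
    """Linearly decay the ratio from `start` to `end` (an assumed schedule)."""
    frac = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * frac

def choose_decoder_inputs(ground_truth, predictions, ratio, rng):
    """Per step, feed the true token with probability `ratio`,
    otherwise the model's own prediction for that position."""
    return [gt if rng.random() < ratio else pred
            for gt, pred in zip(ground_truth, predictions)]

rng = random.Random(0)            # seeded for reproducibility
ground_truth = [2, 17, 42, 8]     # made-up token ids
predictions = [2, 17, 40, 9]      # stand-in for model outputs

for epoch in (0, 5, 9):
    ratio = teacher_forcing_ratio(epoch, total_epochs=10)
    inputs = choose_decoder_inputs(ground_truth, predictions, ratio, rng)
    print(f"epoch {epoch}: ratio={ratio:.2f}, decoder inputs={inputs}")
```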

In [4]:
# Complete Seq2Seq Model
class Seq2SeqModel(tf.keras.Model):
    """Complete sequence-to-sequence model"""
    
    def __init__(self, source_vocab_size, target_vocab_size, embedding_dim=256, units=256):
        super().__init__()
        
        self.encoder = Seq2SeqEncoder(source_vocab_size, embedding_dim, units)
        self.decoder = Seq2SeqDecoder(target_vocab_size, embedding_dim, units)
        
        self.source_vocab_size = source_vocab_size
        self.target_vocab_size = target_vocab_size
    
    def call(self, inputs, training=False):
        """Forward pass for training"""
        
        source_sequences, target_sequences = inputs
        
        # Encode source sequences
        encoder_outputs, encoder_states = self.encoder(source_sequences, training=training)
        
        # Decode target sequences using teacher forcing
        decoder_outputs, _ = self.decoder(
            target_sequences,
            initial_state=encoder_states,
            training=training
        )
        
        return decoder_outputs

# Create and compile complete model
seq2seq_model = Seq2SeqModel(
    source_vocab_size=10000,
    target_vocab_size=12000
)

# Compile model
seq2seq_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Run one forward pass so the layer weights are created;
# count_params() raises an error on a model that has not been built.
_ = seq2seq_model((sample_input, sample_decoder_input))

print("Complete Seq2Seq model created")
print(f"Model parameters: {seq2seq_model.count_params():,}")
print("Model compiled successfully")

Complete Seq2Seq model created
Model parameters: 9,766,624
Model compiled successfully


## 11.5 Training Configuration

**Training Setup**:
- Teacher forcing implementation
- Learning rate scheduling
- Gradient clipping
- Early stopping
- Model checkpointing
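
Early stopping halts training once the monitored metric has stopped improving for a set number of epochs. The sketch below mimics that patience logic on an invented list of validation losses; it is a simplification for illustration, not the Keras callback itself.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch); stop_epoch is None if never triggered."""
    best_loss, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

val_losses = [2.1, 1.7, 1.5, 1.6, 1.55, 1.58, 1.4]   # made-up values
stop, best = early_stopping_epoch(val_losses, patience=3)
print("stopped at epoch:", stop, "| best epoch:", best)
```

Restoring the weights from the best epoch corresponds to the `restore_best_weights=True` option used in the callback configuration.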

**Optimization**:
- Adam optimizer with custom schedule
- Gradient norm clipping
- Mixed precision training
- Distributed training support
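
Gradient norm clipping rescales a gradient whose L2 norm exceeds a threshold while leaving its direction unchanged. Below is a pure-Python sketch of per-tensor clipping, which is how the Keras `clipnorm` argument behaves (the separate `global_clipnorm` argument uses the norm across all gradients together); the gradient values are invented.

```python
import math

def clip_by_norm(grad, clip_norm):
    """Scale `grad` so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    scale = clip_norm / norm
    return [g * scale for g in grad]

grad = [3.0, 4.0]                      # L2 norm is 5.0
clipped = clip_by_norm(grad, clip_norm=1.0)

print("clipped:", clipped)
print("new norm:", math.sqrt(sum(g * g for g in clipped)))
```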

In [5]:
# Training Configuration
class Seq2SeqTrainer:
    """Trainer for sequence-to-sequence models"""
    
    def __init__(self, learning_rate=1e-3, clip_norm=1.0):
        self.learning_rate = learning_rate
        self.clip_norm = clip_norm
    
    def create_optimizer(self):
        """Create optimizer with gradient clipping"""
        
        # clipnorm rescales each gradient tensor whose L2 norm exceeds clip_norm
        optimizer = tf.keras.optimizers.Adam(
            learning_rate=self.learning_rate,
            clipnorm=self.clip_norm
        )
        
        return optimizer
    
    def create_callbacks(self, checkpoint_path='best_seq2seq_model.weights.h5'):
        """Create training callbacks"""
        
        callbacks = [
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True
            ),
            # Save weights only: whole-model HDF5 saving is not supported for
            # subclassed models such as Seq2SeqModel.
            tf.keras.callbacks.ModelCheckpoint(
                checkpoint_path,
                monitor='val_accuracy',
                save_best_only=True,
                save_weights_only=True
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-7
            ),
            tf.keras.callbacks.TensorBoard(
                log_dir='./logs',
                histogram_freq=1
            )
        ]
        
        return callbacks

# Configure training
trainer = Seq2SeqTrainer(learning_rate=1e-3, clip_norm=1.0)
optimizer = trainer.create_optimizer()
callbacks = trainer.create_callbacks()

print("Training configuration created")
print("Callbacks:", [type(cb).__name__ for cb in callbacks])

Training configuration created
Callbacks: ['EarlyStopping', 'ModelCheckpoint', 'ReduceLROnPlateau', 'TensorBoard']


## 11.6 Inference Implementation

**Inference Strategy**:
- Greedy decoding
- Beam search
- Temperature sampling
- Length normalization
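
Beam search keeps several partial hypotheses at each step instead of only the single best token, and length normalization compares finished hypotheses by their per-token score. The sketch below is a simplified, framework-free version; `toy_next_log_probs` is an invented stand-in for a decoder step, and the `END` id, beam width, and `alpha` exponent are assumptions for the example.

```python
import math

END = 0  # assumed end-of-sequence id for this toy example

def toy_next_log_probs(prefix):
    """Stand-in for a decoder step: returns {token_id: log_prob}.
    The distribution is invented and depends only on prefix length."""
    if len(prefix) >= 2:
        return {END: math.log(0.9), 1: math.log(0.1)}
    return {1: math.log(0.6), 2: math.log(0.3), END: math.log(0.1)}

def beam_search(step_fn, beam_width=2, max_length=5, alpha=1.0):
    beams = [([], 0.0)]          # (tokens, summed log-probability)
    finished = []
    for _ in range(max_length):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        # Keep the best `beam_width` candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == END:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)       # hypotheses cut off by max_length
    # Length normalization: divide by length**alpha so longer outputs are
    # not penalized merely for accumulating more negative log-probabilities.
    return max(finished, key=lambda c: c[1] / (len(c[0]) ** alpha))

best_tokens, best_score = beam_search(toy_next_log_probs)
print("best sequence:", best_tokens, "log-prob:", round(best_score, 3))
```

With `beam_width=1` the procedure reduces to greedy decoding.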

**Inference Features**:
- Start and end token handling
- Maximum length control
- Batch inference support
- Attention visualization

In [6]:
# Seq2Seq Inference Engine
class Seq2SeqInference:
    """Inference engine for trained seq2seq model"""
    
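    def __init__(self, model, source_vectorizer, target_vectorizer, max_length=50,
                 start_token=1, end_token=2):
        self.model = model
        self.source_vectorizer = source_vectorizer
        self.target_vectorizer = target_vectorizer
        self.max_length = max_length
        
        # Get vocabulary
        self.target_vocab = target_vectorizer.get_vocabulary()
        # NOTE: TextVectorization reserves id 0 for padding and id 1 for '[UNK]',
        # so these defaults are placeholders. For real translation, pass the ids of
        # explicit start/end markers that were added to the target texts.
        self.start_token = start_token
        self.end_token = end_token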
    
    def greedy_decode(self, source_text):
        """Translate using greedy decoding"""
        
        # Vectorize source text
        source_vectorized = self.source_vectorizer([source_text])
        
        # Encode source
        encoder_outputs, encoder_states = self.model.encoder(source_vectorized)
        
        # Initialize decoder with start token
        decoder_input = tf.constant([[self.start_token]])
        decoder_states = encoder_states
        
        decoded_tokens = []
        
        for _ in range(self.max_length):
            # Decode one step
            decoder_outputs, decoder_states = self.model.decoder(
                decoder_input,
                initial_state=decoder_states
            )
            
            # Get next token
            next_token = tf.argmax(decoder_outputs[0, -1, :]).numpy()
            
            # Stop if end token
            if next_token == self.end_token:
                break
            
            decoded_tokens.append(next_token)
            decoder_input = tf.expand_dims([next_token], 0)
        
        # Convert tokens to text
        decoded_text = ' '.join([self.target_vocab[token] for token in decoded_tokens])
        
        return decoded_text

# Test inference
inference_engine = Seq2SeqInference(seq2seq_model, source_vec, target_vec)

print("Seq2Seq inference engine created")
print("Translation pipeline ready for use")

Seq2Seq inference engine created
Translation pipeline ready for use
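
Greedy decoding always takes the arg-max token. Temperature sampling, listed among the inference strategies, instead draws from the softmax distribution after dividing the logits by a temperature. A minimal sketch with made-up logits (the function names are illustrative, and the chapter's decoder outputs probabilities rather than logits, so this is a conceptual illustration only):

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Lower temperatures sharpen the distribution; higher ones flatten it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token id from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]       # invented scores for three tokens
rng = random.Random(0)         # seeded for reproducibility

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    token = sample_token(logits, t, rng)
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, sample={token}")
```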


## Chapter 11 Summary

### Key Concepts Covered:
1. **Seq2Seq Architecture**: Encoder-decoder framework for sequence transformation
2. **Encoder Implementation**: Processing input sequences into context vectors
3. **Decoder Implementation**: Generating output sequences from context
4. **Teacher Forcing**: Training strategy using ground truth as decoder input
5. **Inference Strategies**: Greedy decoding for sequence generation

### Technical Achievements:
- **Encoder Design**: LSTM-based encoder with state preservation
- **Decoder Design**: LSTM decoder with teacher forcing support
- **Complete Model**: End-to-end seq2seq implementation
- **Training Setup**: Training configuration with gradient clipping and callbacks
- **Inference Engine**: Greedy decoding for translation tasks

### Practical Applications:
- Machine translation (English-German)
- Text summarization
- Chatbot dialogue systems
- Code generation
- Speech recognition

**This chapter establishes the foundation for sequence-to-sequence learning by assembling the building blocks of an English-German machine translation system: an LSTM encoder-decoder, a teacher-forcing training setup, and a greedy decoding routine.**