# Processing Sequences with RNNs and CNNs
## Chapter 15 - Sequence Modeling Implementation Guide

## 1. Introduction to Sequence Processing

Sequence models are crucial for:
- Time series forecasting
- Natural language processing
- Speech recognition
- Music generation

Key challenges:
- Variable-length inputs/outputs
- Long-term dependencies
- Temporal patterns at different scales
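Dense batch tensors must be rectangular. One common way to handle variable-length inputs is therefore to pad every sequence in a batch to the longest length and keep a mask of which steps are real. The sketch below does this in plain NumPy. The helper `pad_batch` is hypothetical and only illustrates the idea. In Keras, a `tf.keras.layers.Masking` layer or an `Embedding` layer with `mask_zero=True` can produce such a mask for downstream recurrent layers.

```python
import numpy as np

def pad_batch(sequences, pad_value=0.0):
    """Right-pad a list of (length_i, features) arrays to a common length (hypothetical helper).

    Returns the padded batch, shape (batch, max_len, features), and a boolean
    mask, shape (batch, max_len), that is True on real (non-padded) time steps.
    """
    max_len = max(len(s) for s in sequences)
    n_features = sequences[0].shape[1]
    batch = np.full((len(sequences), max_len, n_features), pad_value, dtype=np.float32)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, seq in enumerate(sequences):
        batch[i, :len(seq)] = seq   # copy the real steps
        mask[i, :len(seq)] = True   # mark them as valid
    return batch, mask

# Three sequences of different lengths, 10 features per time step
rng = np.random.default_rng(0)
seqs = [rng.normal(size=(n, 10)) for n in (3, 7, 5)]
padded, mask = pad_batch(seqs)
print(padded.shape)      # (3, 7, 10)
print(mask.sum(axis=1))  # [3 7 5]
```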

## 2. Recurrent Neural Networks (RNNs)

### 2.1 Basic RNN Architecture
- Process sequences step-by-step
- Maintain hidden state between time steps
- Mathematical formulation:
  \[
  h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t + b_h)
  \]
  \[
  y_t = W_{hy}h_t + b_y
  \]
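The following NumPy sketch unrolls the two equations above over a short sequence, starting from `h_0 = 0`, to make the recurrence concrete. The function name, the weight shapes, and the random initialization are illustrative assumptions. Keras stores its `SimpleRNN` weights in its own layout, so this is not a drop-in reimplementation of that layer.

```python
import numpy as np

def simple_rnn_forward(x, W_xh, W_hh, b_h, W_hy, b_y):
    """Unroll h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h) and y_t = W_hy h_t + b_y.

    x has shape (T, input_dim); returns hidden states (T, hidden) and outputs (T, output_dim).
    """
    h = np.zeros(W_hh.shape[0])  # h_0 = 0
    hs, ys = [], []
    for x_t in x:  # process the sequence one time step at a time
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(42)
T, input_dim, hidden, output_dim = 5, 10, 32, 1
params = dict(
    W_xh=rng.normal(scale=0.1, size=(hidden, input_dim)),
    W_hh=rng.normal(scale=0.1, size=(hidden, hidden)),
    b_h=np.zeros(hidden),
    W_hy=rng.normal(scale=0.1, size=(output_dim, hidden)),
    b_y=np.zeros(output_dim),
)
x = rng.normal(size=(T, input_dim))
hs, ys = simple_rnn_forward(x, **params)
print(hs.shape, ys.shape)  # (5, 32) (5, 1)
```

With the default `return_sequences=False`, `SimpleRNN` returns only the last hidden state, which corresponds to `hs[-1]` here.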

```python
import tensorflow as tf

# Basic RNN example: one SimpleRNN layer followed by a scalar regression head
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 10)),     # any sequence length, 10 features per time step
    tf.keras.layers.SimpleRNN(units=32),  # returns only the final hidden state
    tf.keras.layers.Dense(1)              # single output prediction
])

model.compile(optimizer='adam', loss='mse')
model.summary()
```

### 2.2 Long Short-Term Memory (LSTM)
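- Mitigates the vanishing gradient problem that limits simple RNNs on long sequences
- Uses gating mechanisms to control a separate cell state:
  - Forget gate (what to discard from the cell state)
  - Input gate (what new information to store)
  - Output gate (what part of the cell state to output as the hidden state)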
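The gates can be written out for a single time step. This NumPy sketch follows the standard LSTM update. Stacking the four blocks as [input, forget, candidate, output] in one weight matrix is a convenience choice for the sketch. `lstm_step` is a hypothetical helper, not a Keras API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper).

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    Row blocks are stacked as [input gate, forget gate, candidate, output gate].
    """
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i = sigmoid(i)   # input gate: how much of the candidate to write
    f = sigmoid(f)   # forget gate: how much of the old cell state to keep
    g = np.tanh(g)   # candidate values
    o = sigmoid(o)   # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g   # additive cell-state update
    h_t = o * np.tanh(c_t)     # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
units, input_dim = 64, 10
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(20, input_dim)):  # run over a 20-step sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (64,) (64,)
```

The additive update `c_t = f * c_prev + i * g` is the usual explanation for why LSTMs mitigate vanishing gradients. When the forget gate stays close to 1, the cell state and its gradient can pass through many steps without repeatedly shrinking.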

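```python
# Stacked LSTM: the first layer returns the full sequence so that the second
# layer receives one vector per time step; the second returns only its final state
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 10)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.summary()
```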

## 3. Sequence-to-Sequence Models

### 3.1 Encoder-Decoder Architecture
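- The encoder reads the input sequence and summarizes it in its final states
- The decoder generates the output sequence, starting from the encoder's final states
- Used for tasks such as machine translation and summarization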

```python
# Basic encoder-decoder implementation (training setup)
encoder_inputs = tf.keras.Input(shape=(None, 10))
encoder = tf.keras.layers.LSTM(32, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]  # final hidden and cell states summarize the input

# 8 features per decoder step (e.g. the previous target tokens, one-hot encoded)
decoder_inputs = tf.keras.Input(shape=(None, 8))
decoder_lstm = tf.keras.layers.LSTM(32, return_sequences=True, return_state=True)
# The decoder starts from the encoder's final states
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(8, activation='softmax')  # distribution over 8 classes per step
decoder_outputs = decoder_dense(decoder_outputs)

model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```
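The encoder-decoder model is trained with two inputs. This sketch assumes a common setup called teacher forcing. In that setup, the decoder receives the target sequence shifted right by one step, beginning with a special start token, and learns to predict the unshifted target. The helper `make_teacher_forcing_pair` and the use of index 0 as the start token are hypothetical, for illustration only.

```python
import numpy as np

def make_teacher_forcing_pair(target_ids, n_classes, start_id):
    """Build one-hot (decoder_input, decoder_target) arrays for one sequence (hypothetical helper).

    decoder_input[t] is the target token at t-1 (the start token at t=0);
    decoder_target[t] is the target token at t.
    """
    target_ids = np.asarray(target_ids)
    input_ids = np.concatenate([[start_id], target_ids[:-1]])  # shift right by one step
    eye = np.eye(n_classes, dtype=np.float32)
    return eye[input_ids], eye[target_ids]

# Assumption: index 0 is reserved as the start token
dec_in, dec_out = make_teacher_forcing_pair([3, 5, 2, 7], n_classes=8, start_id=0)
print(dec_in.argmax(axis=1))   # [0 3 5 2]
print(dec_out.argmax(axis=1))  # [3 5 2 7]
```

Batches of such pairs would then be passed as `model.fit([encoder_batch, decoder_input_batch], decoder_target_batch, ...)`, where the array names are hypothetical. At inference time the targets are unknown. The decoder is therefore typically run one step at a time on its own previous predictions, a loop not shown here.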

## 4. Temporal Convolutional Networks

### 4.1 1D Convolution for Sequences
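- Apply 1D convolutions along the time axis of a sequence
- Benefits:
  - Parallel computation across time steps, since there is no step-by-step recurrence
  - Long-range patterns captured by stacking dilated convolutions
  - Shorter gradient paths than an unrolled RNN, which tends to make training more stable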

```python
# WaveNet-like architecture: stacked causal, dilated 1D convolutions
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=[None, 1]))  # any sequence length, 1 feature per step

# Two blocks of dilation rates 1, 2, 4, 8 (tuple repetition yields 8 layers)
for rate in (1, 2, 4, 8) * 2:
    model.add(tf.keras.layers.Conv1D(
        filters=20, kernel_size=2, padding="causal",  # output at t sees only inputs up to t
        activation="relu", dilation_rate=rate))

# Output layer: a 1x1 convolution producing 10 values per time step
model.add(tf.keras.layers.Conv1D(filters=10, kernel_size=1))

model.compile(loss="mse", optimizer="adam")
model.summary()
```
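The WaveNet-like stack uses `kernel_size=2` and dilation rates `(1, 2, 4, 8)` twice. The NumPy sketch below shows what a causal, dilated convolution computes for a single channel and how far back one output of the stack can see. Both helpers are hypothetical. The weight indexing is chosen for readability and may be reversed relative to the convention Keras uses internally, which does not affect the causality property being illustrated.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Single-channel causal dilated convolution (hypothetical helper).

    y[t] = sum_k w[k] * x[t - k * dilation], treating negative indices as 0,
    so y[t] depends only on x[0..t].
    """
    y = np.zeros(len(x))
    for t in range(len(x)):
        for k, w_k in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:  # implicit zero padding on the left (the past)
                y[t] += w_k * x[idx]
    return y

def receptive_field(kernel_size, dilation_rates):
    """Number of input steps visible to one output of stacked causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)

# Receptive field of the stack above: kernel_size=2, dilation rates (1, 2, 4, 8) * 2
print(receptive_field(2, (1, 2, 4, 8) * 2))  # 31

# Causality check: changing a later input leaves earlier outputs unchanged
rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = np.array([0.5, -0.3])
y1 = causal_dilated_conv1d(x, w, dilation=4)
x2 = x.copy()
x2[10] += 1.0
y2 = causal_dilated_conv1d(x2, w, dilation=4)
print(np.allclose(y1[:10], y2[:10]))  # True
```

Each extra layer adds `(kernel_size - 1) * dilation_rate` steps to the receptive field. Doubling the dilation rate at every layer therefore makes the receptive field grow exponentially with depth, while the number of parameters grows only linearly.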

## 5. Attention Mechanisms

### 5.1 Attention Layer
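- Lets the model focus on the most relevant parts of the input at each step
- Computes a context vector as a weighted sum of the values (for example, encoder outputs), with weights given by a softmax over alignment scores
- Transformers use self-attention, in which queries, keys, and values all come from the same sequence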

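```python
# Simplified additive (Bahdanau-style) attention
class AttentionLayer(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.W1 = tf.keras.layers.Dense(units)  # projects the query
        self.W2 = tf.keras.layers.Dense(units)  # projects the values
        self.V = tf.keras.layers.Dense(1)       # maps each projection to a scalar score

    def call(self, query, values):
        # query: (batch, 1, d), values: (batch, T, d)
        # Score calculation; broadcasting adds the query to every time step -> (batch, T, 1)
        score = self.V(tf.nn.tanh(self.W1(query) + self.W2(values)))

        # Attention weights: softmax over the time axis, so they sum to 1 across T
        attention_weights = tf.nn.softmax(score, axis=1)

        # Context vector: weighted sum of the values over time -> (batch, d)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights

# Example usage
attention = AttentionLayer(10)
query = tf.random.normal((32, 1, 64))    # e.g. a decoder state with a time axis of length 1
values = tf.random.normal((32, 10, 64))  # e.g. 10 encoder outputs
context, weights = attention(query, values)
print("Context vector shape:", context.shape)     # (32, 64)
print("Attention weights shape:", weights.shape)  # (32, 10, 1)
```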
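The `AttentionLayer` above scores each step additively. Transformers instead use scaled dot-product attention. In self-attention, the queries, keys, and values are all linear projections of the same sequence. The NumPy sketch below illustrates that computation. The projection sizes and random weights are arbitrary choices for the sketch. For real models, Keras provides `tf.keras.layers.MultiHeadAttention`.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for batched inputs.

    Q: (batch, T_q, d_k), K: (batch, T_k, d_k), V: (batch, T_k, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)  # (batch, T_q, T_k)
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    return weights @ V, weights

# Self-attention: queries, keys and values are projections of the same sequence x
rng = np.random.default_rng(0)
batch, T, d_model, d_k = 32, 10, 64, 16
x = rng.normal(size=(batch, T, d_model))
W_q, W_k, W_v = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_k)) for _ in range(3))
out, w = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape, w.shape)  # (32, 10, 16) (32, 10, 10)
```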

## 6. Practical Applications

### 6.1 Time Series Forecasting
- Predict future values from historical data
- Examples: stock prices, weather, sales
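Before a sequence model can be trained for forecasting, a long series is usually sliced into fixed-length input windows, each paired with the value(s) that follow it. The following is a minimal NumPy sketch, assuming a univariate series and a hypothetical helper `make_windows`. The sine wave merely stands in for real data.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Slice a 1-D series into (inputs, targets) for supervised forecasting (hypothetical helper).

    inputs:  (n_windows, window, 1), i.e. one feature per time step
    targets: (n_windows, horizon), the next `horizon` values after each window
    """
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - window - horizon + 1
    X = np.stack([series[i:i + window] for i in range(n)])[..., np.newaxis]
    y = np.stack([series[i + window:i + window + horizon] for i in range(n)])
    return X, y

series = np.sin(np.arange(100) * 0.1)  # toy signal standing in for real data
X, y = make_windows(series, window=20, horizon=1)
print(X.shape, y.shape)  # (80, 20, 1) (80, 1)
```

The resulting `X` has the `(batch, time, features)` layout that Keras recurrent layers expect. It could therefore feed an RNN whose input shape is `(None, 1)`, with a `Dense(horizon)` output.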

### 6.2 Text Generation
- Character-level or word-level models
- Learn to predict the next character or word, which lets them capture the style and structure of the training text

```python
import numpy as np
import tensorflow as tf

# Text generation example
text = "The quick brown fox jumps over the lazy dog."

# Create character-level dataset: map each character to an integer index
vocab = sorted(set(text))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

# Create training examples: seq_length inputs plus one extra character for the target
seq_length = 10
examples_per_epoch = len(text) // (seq_length + 1)

# Create training batches
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    # The input drops the last character; the target is the same chunk shifted by one
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
# No shuffling: a stateful RNN carries its state from one batch to the next,
# so consecutive batches must stay in their original order
dataset = dataset.batch(1, drop_remainder=True)

# Build the model (simplified example; training is not shown)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None,), batch_size=1),  # stateful layers need a fixed batch size
    tf.keras.layers.Embedding(len(vocab), 8),
    tf.keras.layers.GRU(32, return_sequences=True, stateful=True),
    tf.keras.layers.Dense(len(vocab))  # logits over the vocabulary at each time step
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
```
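The text-generation model outputs logits over `vocab` at every time step. To generate text, one option is to take the logits at the last step, sample a character index, append that character, and feed it back in. The sketch below covers only the sampling step. It adds a temperature parameter, a common but optional refinement. `sample_next_id` is a hypothetical helper. In TensorFlow, `tf.random.categorical` is an alternative for sampling directly from logits.

```python
import numpy as np

def sample_next_id(logits, temperature=1.0, rng=None):
    """Sample a token index from unnormalized logits (hypothetical helper).

    Lower temperature sharpens the distribution toward the argmax;
    higher temperature flattens it toward uniform.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # numerical stability before exponentiating
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 3.0])
print(sample_next_id(logits, temperature=1e-6, rng=rng))  # 2 (effectively the argmax)
samples = [sample_next_id(logits, temperature=1.0, rng=rng) for _ in range(1000)]
print(np.bincount(samples, minlength=3))  # counts roughly proportional to softmax(logits)
```

A sampled index maps back to a character through `idx2char`.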

## 7. Exercises

1. Implement an LSTM for time series forecasting
2. Compare performance of RNN vs CNN for sequence processing
3. Add attention to a sequence-to-sequence model
4. Experiment with different RNN cell types (GRU, LSTM, SimpleRNN)
5. Implement a character-level text generator

## 8. Key Takeaways

- RNNs process sequences step-by-step but can struggle with long-term dependencies
- LSTMs and GRUs mitigate the vanishing gradient problem with gating mechanisms
- 1D CNNs, especially with causal and dilated convolutions, can process sequences effectively and compute all time steps in parallel, unlike RNNs
- Attention mechanisms help focus on relevant parts of input sequences
- Sequence models power applications from forecasting to text generation