# <font color="#418FDE" size="6.5" uppercase>**Text Models**</font>

>Last update: 20260130.
    
By the end of this Lecture, you will be able to:
- Use nn.Embedding to convert token indices into dense vector representations. 
- Build a simple sequence model (e.g., RNN or CNN) for text classification in PyTorch. 
- Train and evaluate the text model on a labeled dataset, monitoring accuracy and loss. 


## **1. Working With Embeddings**

### **1.1. Using Embedding Layers**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_01_01.jpg?v=1769756724" width="250">



>* Embedding layers turn token IDs into vectors
>* Model uses these vectors to capture token similarities

>* Embeddings start random and are trained via gradients
>* Similar tokens move closer; different tokens separate

>* Embeddings replace large, sparse one-hot vectors with compact dense ones
>* They capture similar meanings and enable better generalization



In [None]:
#@title Python Code - Using Embedding Layers

# This script shows basic text embeddings.
# It uses TensorFlow to mimic PyTorch embeddings.
# Focus is on converting token indices to vectors.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras layers.
import tensorflow as tf
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Define a tiny toy vocabulary.
vocab = {"<pad>": 0, "good": 1, "bad": 2, "movie": 3}

# Show the vocabulary mapping.
print("Vocabulary mapping:", vocab)

# Prepare a small batch of token index sequences.
sequences = np.array([[1, 3, 0], [2, 3, 0]], dtype=np.int32)

# Print the raw index sequences.
print("Token index sequences:", sequences)

# Define embedding hyperparameters.
vocab_size = len(vocab)
embedding_dim = 4

# Create an Embedding layer like nn.Embedding.
embedding_layer = layers.Embedding(
    input_dim=vocab_size,
    output_dim=embedding_dim,
    embeddings_initializer="uniform",
)

# Pass the sequences through the embedding layer.
embedded = embedding_layer(sequences)

# Convert embeddings to a NumPy array.
embedded_np = embedded.numpy()

# Validate the resulting embedding shape.
print("Embedding shape:", embedded_np.shape)

# Print embeddings for the first sequence only.
print("Embeddings for first sequence:")
print(embedded_np[0])

# Show that same token index shares same vector.
print("Vector for 'movie' in both sequences:")
print(embedded_np[0, 1], embedded_np[1, 1])
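The cell above uses Keras as a stand-in. Because the lecture objective names `nn.Embedding`, here is a minimal PyTorch sketch of the same lookup on the same toy vocabulary. The final check shows that an embedding lookup gives the same result as multiplying a one-hot matrix by the weight matrix. This is a standard identity, included here only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Same toy vocabulary as the Keras cell: index 0 is padding.
vocab = {"<pad>": 0, "good": 1, "bad": 2, "movie": 3}
sequences = torch.tensor([[1, 3, 0], [2, 3, 0]])  # (batch=2, seq_len=3), int64

# nn.Embedding stores a (num_embeddings, embedding_dim) weight matrix.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)
embedded = embedding(sequences)  # (2, 3, 4)
print("Embedding shape:", tuple(embedded.shape))

# The same index always maps to the same row of the weight matrix.
movie_a, movie_b = embedded[0, 1], embedded[1, 1]
print("'movie' vectors equal:", torch.equal(movie_a, movie_b))

# A lookup equals one-hot @ weight, but without building the one-hot matrix.
one_hot = nn.functional.one_hot(sequences, num_classes=len(vocab)).float()
via_matmul = one_hot @ embedding.weight
print("Lookup matches one-hot matmul:", torch.allclose(embedded, via_matmul))
```

The printed values will not match the Keras cell in scale. PyTorch initializes `nn.Embedding` weights from a standard normal N(0, 1), while the Keras `"uniform"` initializer draws from a small uniform range.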




### **1.2. Handling Padding Tokens**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_01_02.jpg?v=1769756764" width="250">



>* Padding tokens standardize sequence lengths for batching
>* If learned, padding embeddings can mislead models

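>* Reserve a dedicated padding index, usually 0, and treat it specially
>* PyTorch `padding_idx` keeps that vector at zero and out of gradient updates; Keras `mask_zero=True` masks those positions instead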

>* Padding noise can corrupt sequence information aggregation
>* Neutral padding improves generalization to varying lengths



In [None]:
#@title Python Code - Handling Padding Tokens

# This script shows padding token embeddings clearly.
# It uses TensorFlow to mimic PyTorch style embeddings.
# Focus on padding index behavior inside embedding layers.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras layers.
import tensorflow as tf
from tensorflow.keras import layers

# Set deterministic random seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Define a tiny toy vocabulary with a padding token.
PAD_TOKEN = "<pad>"
WORD_TO_INDEX = {PAD_TOKEN: 0, "thanks": 1, "for": 2}

# Show the mapping so learners see the padding index.
print("Vocabulary mapping:", WORD_TO_INDEX)

# Create two short example sentences as index sequences.
sentence_a = [WORD_TO_INDEX["thanks"], WORD_TO_INDEX["for"]]
sentence_b = [WORD_TO_INDEX["thanks"]]

# Pad the shorter sentence manually using the padding index.
max_len = 2
padded_a = sentence_a
padded_b = sentence_b + [WORD_TO_INDEX[PAD_TOKEN]]

# Stack sequences into a small batch tensor.
batch_indices = np.array([padded_a, padded_b], dtype=np.int32)

# Validate the batch shape before embedding lookup.
print("Batch indices shape:", batch_indices.shape)

# Define embedding dimensions and vocabulary size.
vocab_size = len(WORD_TO_INDEX)
embedding_dim = 4

# mask_zero=True marks index 0 as padding; it does not zero that vector.
embedding_layer = layers.Embedding(
    input_dim=vocab_size,
    output_dim=embedding_dim,
    mask_zero=True,
    name="toy_embedding",
)

# Pass the batch through the embedding layer once.
embedded_batch = embedding_layer(batch_indices)

# Retrieve the boolean mask created for padding positions.
mask = embedding_layer.compute_mask(batch_indices)

# Convert tensors to numpy arrays for easy printing.
embedded_np = embedded_batch.numpy()
mask_np = mask.numpy()

# Print the raw batch indices and corresponding mask.
print("Batch indices:\n", batch_indices)
print("Mask for non padding positions:\n", mask_np)

# Show embeddings for the first sequence tokens.
print("Embeddings for first sequence:\n", embedded_np[0])

# Show embeddings for the second sequence including padding.
print("Embeddings for second sequence:\n", embedded_np[1])

# Demonstrate how the mask can ignore padding in a mean.
masked_embeddings = embedded_batch * tf.cast(
    tf.expand_dims(mask, axis=-1), tf.float32
)

# Compute a simple masked mean over sequence length.
sum_embeddings = tf.reduce_sum(masked_embeddings, axis=1)
valid_counts = tf.reduce_sum(tf.cast(mask, tf.float32), axis=1)
mean_embeddings = sum_embeddings / tf.expand_dims(tf.maximum(valid_counts, 1.0), axis=-1)

# Print the final masked sentence level representations.
print("Masked mean sentence embeddings:\n", mean_embeddings.numpy())
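In Keras, `mask_zero=True` only builds a mask. The vector stored at index 0 is still an ordinary random vector, as the printed second sequence shows. PyTorch handles padding differently: passing `padding_idx` to `nn.Embedding` initializes that row to zeros and keeps it out of gradient updates. Here is a minimal sketch on the same toy vocabulary, using a dummy `sum()` loss only to expose the gradients:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

PAD_IDX = 0
word_to_index = {"<pad>": PAD_IDX, "thanks": 1, "for": 2}
batch = torch.tensor([[1, 2], [1, PAD_IDX]])  # second sentence is padded

# padding_idx: this row starts as zeros and receives no gradient.
embedding = nn.Embedding(len(word_to_index), 4, padding_idx=PAD_IDX)
print("Padding row:", embedding.weight[PAD_IDX])

# Backpropagate a dummy loss through the lookup.
out = embedding(batch)
out.sum().backward()
grad = embedding.weight.grad
print("Gradient on padding row:", grad[PAD_IDX])
print("Gradient on 'thanks' row:", grad[1])  # "thanks" appears twice

# Masked mean, mirroring the Keras cell: divide by real-token counts.
mask = (batch != PAD_IDX).unsqueeze(-1).float()  # (2, 2, 1)
mean_emb = (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
print("Masked mean shape:", tuple(mean_emb.shape))
```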



### **1.3. Initializing Embeddings**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_01_03.jpg?v=1769756837" width="250">



>* Embeddings start as randomly initialized, meaningless vectors
>* Training gradients reshape them so similar words cluster

>* Use pre-trained embeddings for faster, better learning
>* Match vocab, copy known vectors, randomly init unknowns

>* Choose to freeze or fine-tune embeddings
>* Initialization injects prior knowledge and guides learning



In [None]:
#@title Python Code - Initializing embeddings

# This script shows how embeddings are initialized.
# It uses TensorFlow to create simple embeddings.
# Focus on shapes and values not heavy training.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and check version.
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)

# Set TensorFlow random seed.
tf.random.set_seed(seed_value)

# Define a tiny toy vocabulary.
vocab = ["<pad>", "cat", "dog", "banana", "excellent"]

# Map tokens to integer indices.
word_to_index = {w: i for i, w in enumerate(vocab)}

# Show the vocabulary mapping.
print("Vocabulary mapping:", word_to_index)

# Define embedding configuration.
vocab_size = len(vocab)
embedding_dim = 4

# Create a randomly initialized embedding layer.
random_embedding_layer = tf.keras.layers.Embedding(
    input_dim=vocab_size,
    output_dim=embedding_dim,
    embeddings_initializer="uniform",
    name="random_embedding",
)

# Build the layer by calling it once.
dummy_indices = tf.constant([[1, 2, 3]])
random_vectors = random_embedding_layer(dummy_indices)

# Confirm output shape is as expected.
print("Random embedding output shape:", random_vectors.shape)

# Inspect the underlying random embedding matrix.
random_matrix = random_embedding_layer.get_weights()[0]
print("Random embedding matrix shape:", random_matrix.shape)

# Show embeddings for specific example tokens.
cat_index = word_to_index["cat"]
dog_index = word_to_index["dog"]

# Gather their vectors from the matrix.
cat_vector = random_matrix[cat_index]
dog_vector = random_matrix[dog_index]

# Print the random vectors for cat and dog.
print("Random cat vector:", np.round(cat_vector, 3))
print("Random dog vector:", np.round(dog_vector, 3))

# Create a toy pre-trained embedding matrix with hand-made values.
pretrained_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)

# Manually define simple semantic patterns.
pretrained_matrix[word_to_index["cat"]] = np.array([1.0, 0.9, 0.0, 0.0])
pretrained_matrix[word_to_index["dog"]] = np.array([0.9, 1.0, 0.0, 0.0])

# Put fruit in a different region.
pretrained_matrix[word_to_index["banana"]] = np.array([0.0, 0.0, 1.0, 0.8])

# Put positive word near animals for illustration.
pretrained_matrix[word_to_index["excellent"]] = np.array([0.8, 0.7, 0.2, 0.1])

# Keep padding token as zeros.
pretrained_matrix[word_to_index["<pad>"]] = np.zeros(embedding_dim)

# Create an embedding layer using pretrained weights.
pretrained_embedding_layer = tf.keras.layers.Embedding(
    input_dim=vocab_size,
    output_dim=embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_matrix),
    trainable=False,
    name="pretrained_embedding",
)

# Build the pretrained layer by calling it.
pretrained_vectors = pretrained_embedding_layer(dummy_indices)

# Confirm pretrained output shape.
print("Pretrained embedding output shape:", pretrained_vectors.shape)

# Extract cat and dog vectors from pretrained matrix.
pre_cat_vector = pretrained_matrix[cat_index]
pre_dog_vector = pretrained_matrix[dog_index]

# Print pretrained vectors for comparison.
print("Pretrained cat vector:", pre_cat_vector)
print("Pretrained dog vector:", pre_dog_vector)

# Compute cosine similarity between cat and dog.
def cosine_similarity(a, b):
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    num = np.dot(a, b)
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float(num / den) if den != 0 else 0.0

# Compare similarity in pretrained space.
cat_dog_sim = cosine_similarity(pre_cat_vector, pre_dog_vector)

# Print final summary line about similarity.
print("Cosine similarity between pretrained cat and dog:", round(cat_dog_sim, 3))
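In PyTorch, the usual way to start from pre-trained vectors is `nn.Embedding.from_pretrained`. Its `freeze` flag plays the role of Keras `trainable=False`. The sketch below also follows the "copy known vectors, randomly initialize unknowns" recipe from the bullets. The `pretrained` dictionary is a made-up stand-in for a real vector file, and the extra word `"zebra"` is a hypothetical out-of-vocabulary entry.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

vocab = ["<pad>", "cat", "dog", "banana", "excellent", "zebra"]
embedding_dim = 4

# Hypothetical pre-trained vectors; "zebra" is deliberately missing.
pretrained = {
    "cat": [1.0, 0.9, 0.0, 0.0],
    "dog": [0.9, 1.0, 0.0, 0.0],
    "banana": [0.0, 0.0, 1.0, 0.8],
    "excellent": [0.8, 0.7, 0.2, 0.1],
}

# Start from small random values, copy known vectors, zero the padding row.
weights = torch.randn(len(vocab), embedding_dim) * 0.1
for i, word in enumerate(vocab):
    if word in pretrained:
        weights[i] = torch.tensor(pretrained[word])
weights[0] = 0.0

# freeze=True keeps vectors fixed; freeze=False fine-tunes them.
# clone() so the two layers do not share (and co-modify) one tensor.
frozen = nn.Embedding.from_pretrained(weights, freeze=True, padding_idx=0)
tunable = nn.Embedding.from_pretrained(weights.clone(), freeze=False, padding_idx=0)
print("Frozen requires_grad:", frozen.weight.requires_grad)
print("Tunable requires_grad:", tunable.weight.requires_grad)

# Cosine similarity between cat and dog in the initialized space.
cat, dog = frozen(torch.tensor([1, 2]))
sim = nn.functional.cosine_similarity(cat, dog, dim=0)
print("cat/dog cosine similarity:", round(sim.item(), 3))
```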



## **2. Text Sequence Models**

### **2.1. Recurrent Layers Overview**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_02_01.jpg?v=1769756911" width="250">



>* Recurrent layers read sequences step by step
>* Hidden state summarizes past context and word meanings

>* Recurrent layers turn variable text into vectors
>* Shared parameters handle phrases, negations, and lengths

>* Simple RNNs, LSTMs, and GRUs trade long-range memory against compute cost
>* Choosing a layer balances context needs and resources
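Parameter counts make the cost side of this trade-off concrete. In PyTorch, a single-layer `nn.LSTM` has four gate blocks and `nn.GRU` three. With biases on and the same sizes, they therefore carry exactly 4× and 3× the parameters of a plain `nn.RNN`. The sizes below are arbitrary illustrative choices.

```python
import torch.nn as nn

embedding_dim, hidden_size = 8, 16

def count_params(module):
    """Total number of scalar parameters in a module."""
    return sum(p.numel() for p in module.parameters())

rnn = nn.RNN(embedding_dim, hidden_size, batch_first=True)
gru = nn.GRU(embedding_dim, hidden_size, batch_first=True)
lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)

# One gate block: W_ih (H x E) + W_hh (H x H) + b_ih (H) + b_hh (H).
per_gate = hidden_size * embedding_dim + hidden_size * hidden_size + 2 * hidden_size
for name, module, gates in [("RNN", rnn, 1), ("GRU", gru, 3), ("LSTM", lstm, 4)]:
    print(f"{name}: {count_params(module)} params (expected {gates * per_gate})")
```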



In [None]:
#@title Python Code - Recurrent Layers Overview

# This script shows simple recurrent layers usage.
# We build a tiny RNN text classifier example.
# Focus is understanding sequence processing mechanics.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import tensorflow and keras submodules.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds for reproducibility.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Create a tiny toy sentiment dataset.
texts = [
    "i love this movie",
    "this film was great",
    "what a fantastic story",
    "i really enjoyed it",
    "i hate this movie",
    "this film was terrible",
    "what a boring story",
    "i really disliked it",
]

# Create integer labels for sentiment classes.
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=np.int32)

# Build a simple word level vocabulary.
vocab = {}
for text in texts:
    for token in text.split():
        if token not in vocab:
            vocab[token] = len(vocab) + 1

# Show vocabulary mapping for clarity.
print("Vocabulary:", vocab)

# Convert texts into lists of token indices.
sequences = []
for text in texts:
    indices = [vocab[token] for token in text.split()]
    sequences.append(indices)

# Pad sequences to equal length for batching.
max_len = max(len(seq) for seq in sequences)
X = keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=max_len, padding="post", truncating="post"
)

# Validate shapes before building the model.
print("Input shape:", X.shape, "Labels shape:", labels.shape)

# Define basic hyperparameters for the model.
vocab_size = len(vocab) + 1
embedding_dim = 8
rnn_units = 16

# Build a simple RNN based text classifier.
model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim),
    layers.SimpleRNN(rnn_units),
    layers.Dense(1, activation="sigmoid"),
])

# Compile model with binary crossentropy loss.
model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)

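# Shuffle once: validation_split takes the last 25% without shuffling,
# which here would hold out only negative examples.
perm = np.random.permutation(len(X))
X, labels = X[perm], labels[perm]

# Train briefly with silent verbose setting.
history = model.fit(
    X, labels, epochs=20, batch_size=4, verbose=0, validation_split=0.25
)

# Evaluate on all eight examples (training plus validation split).
loss, acc = model.evaluate(X, labels, verbose=0)

# Print final loss and accuracy over the full toy set.
print("Final loss (all examples):", round(float(loss), 4))
print("Final accuracy (all examples):", round(float(acc), 4))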

# Inspect model prediction for one sample.
sample_text = "i really enjoyed this movie"

# Tokenize sample text using existing vocabulary.
sample_tokens = [
    vocab.get(token, 0) for token in sample_text.split()
]

# Pad sample sequence to required length.
sample_seq = keras.preprocessing.sequence.pad_sequences(
    [sample_tokens], maxlen=max_len, padding="post", truncating="post"
)

# Run model prediction on the sample.
prob = float(model.predict(sample_seq, verbose=0)[0][0])

# Print predicted probability for positive sentiment.
print("Sample text:", sample_text)
print("Predicted positive probability:", round(prob, 4))
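A PyTorch version of the same idea might look like the sketch below. It is an illustration under stated assumptions, not a port of the Keras cell. It swaps in `nn.GRU` for the simple RNN. It also uses `pack_padded_sequence` so that padded steps are skipped and the final hidden state comes from each sequence's last real token. The code assumes right padding with index 0, and the class name `RNNClassifier` is hypothetical.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(42)

class RNNClassifier(nn.Module):
    """Hypothetical embedding -> GRU -> linear classifier."""

    def __init__(self, vocab_size, embedding_dim, hidden_size, num_classes, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.rnn = nn.GRU(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
        self.pad_idx = pad_idx

    def forward(self, tokens):
        # tokens: (batch, seq_len), right-padded with pad_idx.
        lengths = (tokens != self.pad_idx).sum(dim=1).clamp(min=1).cpu()
        packed = pack_padded_sequence(
            self.embedding(tokens), lengths, batch_first=True, enforce_sorted=False
        )
        _, h_n = self.rnn(packed)   # h_n: (num_layers, batch, hidden)
        return self.fc(h_n[-1])     # logits: (batch, num_classes)

model = RNNClassifier(vocab_size=20, embedding_dim=8, hidden_size=16, num_classes=2)
batch = torch.tensor([[3, 5, 7, 2], [4, 6, 0, 0]])  # second sequence is padded
logits = model(batch)
print("Logits shape:", tuple(logits.shape))
```

Because of the packing, the padded sequence `[4, 6, 0, 0]` gets the same logits as `[4, 6]` on its own. The Keras `SimpleRNN` in the cell above does not skip padding unless a mask reaches it.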




### **2.2. Convolutional Text Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_02_02.jpg?v=1769756973" width="250">



>* CNN filters scan local n-gram token windows
>* Filters learn phrase patterns linked to labels

>* Filters detect useful phrases anywhere in text
>* Different filters learn varied local and stylistic patterns

>* Filters act as feature detectors for phrases
>* Stacked filters capture richer, increasingly abstract patterns



In [None]:
#@title Python Code - Convolutional Text Patterns

# This script shows convolutional text patterns.
# We build a tiny CNN text classifier.
# Focus is on local ngram style filters.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import tensorflow and keras submodules.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds everywhere.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print tensorflow version in one short line.
print("TensorFlow version:", tf.__version__)

# Prepare a tiny toy text dataset.
texts = [
    "I absolutely love this product",
    "This is not good at all",
    "I am very disappointed today",
    "I highly recommend this item",
    "The quality is really bad",
    "It works great and I am happy",
]

# Create matching sentiment labels list.
labels = [1, 0, 0, 1, 0, 1]

# Build a TextVectorization layer. Keras 3 no longer ships the legacy
# keras.preprocessing.text.Tokenizer. This layer lowercases and strips
# punctuation, and it reserves index 0 for padding and 1 for unknown words.
max_len = 8
vectorizer = layers.TextVectorization(
    max_tokens=50, output_mode="int", output_sequence_length=max_len
)

# Fit the vocabulary on the small texts.
vectorizer.adapt(tf.constant(texts))

# Convert texts into integer sequences padded or truncated to max_len.
X = vectorizer(tf.constant(texts)).numpy()

# Convert labels into numpy array.
y = np.array(labels, dtype="int32")

# Validate shapes before modeling.
print("Input shape:", X.shape, "Labels shape:", y.shape)

# Define vocabulary size from the vectorizer.
vocab_size = vectorizer.vocabulary_size()

# Create a simple CNN text classification model with an explicit input.
model = keras.Sequential([
    keras.Input(shape=(max_len,), dtype="int64"),
    layers.Embedding(vocab_size, 8),
    layers.Conv1D(4, 3, activation="relu", name="conv"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])

# Compile model with binary crossentropy loss.
model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)

# Train model silently for a few epochs.
history = model.fit(
    X, y, epochs=20, batch_size=2, verbose=0, validation_split=0.3
)

# Evaluate on all six examples (training plus validation split).
loss, acc = model.evaluate(X, y, verbose=0)

# Print final loss and accuracy values.
print("Loss (all examples):", round(float(loss), 4))
print("Accuracy (all examples):", round(float(acc), 4))

# Select one example to inspect convolution.
example_text = "I am not happy with this"

# Convert example text with the same vectorizer used for training.
example_pad = vectorizer(tf.constant([example_text])).numpy()

# Build a submodel that exposes the convolution outputs.
conv_layer = model.get_layer("conv")
conv_model = keras.Model(model.inputs, conv_layer.output)

# Get convolution feature map for example.
conv_output = conv_model.predict(example_pad, verbose=0)

# Print shape of convolution feature map: (1, max_len - 3 + 1, 4).
print("Conv feature map shape:", conv_output.shape)

# Print feature map values for first filter.
print("First filter activations:", np.round(conv_output[0, :, 0], 3))

# Show model prediction for the example.
pred_prob = float(model.predict(example_pad, verbose=0)[0, 0])
print("Predicted positive probability:", round(pred_prob, 4))
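Unlike Keras `Conv1D`, which works on `(batch, length, channels)`, PyTorch's `nn.Conv1d` expects channels before time: `(batch, channels, length)`. The embedding output `(batch, length, dim)` must therefore be transposed first. Here is a minimal sketch with several filter widths. The widths 2, 3, 4 and the class name `TextCNN` are illustrative assumptions, and every input must be at least as long as the widest kernel.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

class TextCNN(nn.Module):
    """Hypothetical embedding -> parallel Conv1d -> max over time -> linear."""

    def __init__(self, vocab_size, embedding_dim=8, num_filters=4,
                 kernel_sizes=(2, 3, 4), num_classes=2, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        # One convolution per n-gram width; each filter is a phrase detector.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embedding_dim, num_filters, kernel_size=k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, tokens):
        x = self.embedding(tokens).transpose(1, 2)  # (batch, dim, seq_len)
        # Each map is (batch, num_filters, seq_len - k + 1); keep the max over time.
        pooled = [torch.relu(conv(x)).amax(dim=2) for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))    # (batch, num_classes)

model = TextCNN(vocab_size=50)
tokens = torch.randint(1, 50, (3, 8))  # 3 random sequences of length 8
feature_map = model.convs[1](model.embedding(tokens).transpose(1, 2))
print("Width-3 feature map shape:", tuple(feature_map.shape))
print("Logits shape:", tuple(model(tokens).shape))
```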



### **2.3. Temporal Pooling Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_02_03.jpg?v=1769757089" width="250">



>* Temporal pooling compresses variable-length sequences into fixed-size vectors
>* Summarizes all token information for classification

>* Average pooling captures overall context across tokens
>* Max pooling highlights strongest, most informative time steps

>* Combine pooling methods to capture varied sequence information
>* Choose pooling carefully to summarize long, noisy texts



In [None]:
#@title Python Code - Temporal Pooling Basics

# This script shows temporal pooling basics clearly.
# We use tiny sequences and simple PyTorch tensors.
# Focus is on average and max temporal pooling.

# import required PyTorch modules for tensors.
import torch

# set deterministic seed for reproducible tensors.
torch.manual_seed(0)

# create a tiny batch of token index sequences.
sequences = torch.tensor([[1, 2, 3, 0], [4, 5, 0, 0]])

# print the raw token index sequences batch.
print("Token index sequences:", sequences)

# define a small embedding layer for tokens.
embedding_dim = 4
vocab_size = 10
embed = torch.nn.Embedding(vocab_size, embedding_dim)

# get embedded representations for each token.
embedded = embed(sequences)

# confirm embedded tensor has expected shape.
print("Embedded shape (batch, time, dim):", embedded.shape)

# create a mask to ignore padding tokens.
pad_token = 0
mask = (sequences != pad_token).unsqueeze(-1)

# validate mask shape matches embedded shape.
assert mask.shape == embedded.shape[:2] + (1,)

# apply mask so padding embeddings become zero.
masked_embedded = embedded * mask

# compute lengths of each non padded sequence.
lengths = mask.sum(dim=1).clamp(min=1)

# perform average pooling over time dimension.
avg_pooled = masked_embedded.sum(dim=1) / lengths

# perform max pooling over time dimension safely.
masked_for_max = masked_embedded.clone()

# replace padded positions with a large negative so they never win the max.
masked_for_max[~mask.expand_as(masked_for_max)] = -1e9

# compute max pooled representation across time.
max_pooled, _ = masked_for_max.max(dim=1)

# print pooled vectors for interpretation.
print("Average pooled vectors:", avg_pooled)

# show how max pooling keeps strongest features.
print("Max pooled vectors:", max_pooled)

# final print summarizing temporal pooling usage.
print("Temporal pooling gives fixed size sequence summaries.")
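The bullets above suggest combining pooling methods. Below is a small hypothetical helper, `masked_mean_max_pool`, that concatenates masked average pooling and masked max pooling into one `(batch, 2 * dim)` vector. It uses `masked_fill` with `-inf` in place of the `-1e9` trick. This assumes every sequence has at least one real token; an all-padding row would produce `-inf`.

```python
import torch

def masked_mean_max_pool(embedded, tokens, pad_idx=0):
    """Hypothetical helper: concat masked mean and masked max over time."""
    mask = (tokens != pad_idx).unsqueeze(-1)        # (batch, time, 1)
    lengths = mask.sum(dim=1).clamp(min=1)          # (batch, 1)
    mean = (embedded * mask).sum(dim=1) / lengths   # (batch, dim)
    # Padded steps become -inf so they can never be the maximum.
    maxed = embedded.masked_fill(~mask, float("-inf")).amax(dim=1)
    return torch.cat([mean, maxed], dim=-1)         # (batch, 2 * dim)

torch.manual_seed(0)
tokens = torch.tensor([[1, 2, 3, 0], [4, 5, 0, 0]])
embedded = torch.nn.Embedding(10, 4)(tokens)        # (2, 4, 4)
summary = masked_mean_max_pool(embedded, tokens)
print("Combined summary shape:", tuple(summary.shape))
```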



## **3. Training Text Models**

### **3.1. CrossEntropy Loss for Labels**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_03_01.jpg?v=1769757135" width="250">



>* Cross entropy is the negative log-probability given to the correct class
>* Lower loss means higher probability on the correct label

>* Penalizes wrong, overconfident predictions more strongly
>* Training reduces loss, boosting confidence in correct labels

>* Batch loss averages per-example cross entropy values
>* Optimizer updates weights to boost correct label probabilities



In [None]:
#@title Python Code - CrossEntropy Loss for Labels

# This script shows cross entropy loss usage.
# We build a tiny text classifier example.
# Focus is on labels and loss behavior.

# Optional install for TensorFlow in some environments.
# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Define a tiny toy text dataset.
texts = [
    "i love this movie",
    "this film was great",
    "what a fantastic story",
    "i hate this movie",
    "this film was terrible",
    "what a boring story",
]

# Define integer labels for sentiment classes.
labels = [1, 1, 1, 0, 0, 0]

# Create a simple word index mapping.
vocab = {}
for sentence in texts:
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1

# Convert sentences to sequences of token indices.
sequences = []
for sentence in texts:
    seq = [vocab[word] for word in sentence.split()]
    sequences.append(seq)

# Pad sequences to equal length for batching.
max_len = max(len(seq) for seq in sequences)
X = keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=max_len, padding="post"
)

# Convert labels to a NumPy array of integers.
y = np.array(labels, dtype="int32")

# Validate shapes before building the model.
print("Input shape:", X.shape, "Labels shape:", y.shape)

# Define vocabulary size and embedding dimension.
vocab_size = len(vocab) + 1
embedding_dim = 8

# Build a simple text classification model.
inputs = keras.Input(shape=(max_len,))
embedding = layers.Embedding(
    input_dim=vocab_size, output_dim=embedding_dim
)(inputs)

# Pool over time to get a fixed size vector.
pooled = layers.GlobalAveragePooling1D()(embedding)

# Add a small dense layer for representation.
hidden = layers.Dense(8, activation="relu")(pooled)

# Final layer outputs logits for two classes.
logits = layers.Dense(2, activation=None)(hidden)

# Create the Keras model object.
model = keras.Model(inputs=inputs, outputs=logits)

# Compile model with sparse categorical crossentropy.
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Train the model silently for a few epochs.
history = model.fit(
    X,
    y,
    epochs=20,
    batch_size=2,
    verbose=0,
)

# Evaluate model on the same tiny dataset.
loss_value, acc_value = model.evaluate(
    X,
    y,
    verbose=0,
)

# Print final loss and accuracy for inspection.
print("Final loss (cross entropy):", round(loss_value, 4))
print("Final accuracy:", round(acc_value, 4))

# Get model logits and probabilities for first example.
logits_example = model.predict(X[:1], verbose=0)
probs_example = tf.nn.softmax(logits_example, axis=-1)

# Print predicted probabilities and true label.
print("Predicted probabilities:", probs_example.numpy()[0])
print("True label:", int(y[0]))

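# Manually compute cross entropy for the first example: -log p(true class).
true_label_index = int(y[0])
true_prob = probs_example.numpy()[0][true_label_index]
manual_loss = -np.log(true_prob + 1e-7)

# This is one example's loss; the loss printed above averages all six.
print("Manual cross entropy example:", round(float(manual_loss), 4))

# Keras loss on the same single example should match the manual value.
single_loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)(
    y[:1], logits_example
)
print("Keras cross entropy, same example:", round(float(single_loss), 4))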
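PyTorch's `nn.CrossEntropyLoss` follows the same convention as Keras `SparseCategoricalCrossentropy(from_logits=True)`: it takes raw logits and integer class labels, and it averages over the batch by default. Each example's loss is `-log(softmax(logits)[true_label])`. The sketch checks this against a manual computation on made-up logits, one of which is confidently wrong.

```python
import torch
import torch.nn as nn

# Made-up logits for 3 examples and 2 classes, plus integer labels.
logits = torch.tensor([[2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]])
labels = torch.tensor([0, 1, 0])  # the last example is confidently wrong

loss_fn = nn.CrossEntropyLoss()  # reduction="mean" by default
batch_loss = loss_fn(logits, labels)

# Manual version: negative log-probability of each true class, then average.
log_probs = torch.log_softmax(logits, dim=1)
per_example = -log_probs[torch.arange(len(labels)), labels]
print("Per-example losses:", per_example)
print("Batch loss:", round(batch_loss.item(), 4))
print("Manual mean:", round(per_example.mean().item(), 4))
```

The confidently wrong third example dominates the batch loss, which is the "penalizes overconfident mistakes" behavior described in the bullets.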



### **3.2. Ignoring Padding Tokens**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_03_02.jpg?v=1769757204" width="250">



>* Padding makes variable-length sequences share batch size
>* Padding tokens are meaningless and must not affect loss

>* Use a mask to ignore padded positions
>* Focus loss and gradients on real tokens

>* Padding noise grows when sequence lengths vary widely
>* Masking padding gives cleaner gradients, better generalization



In [None]:
#@title Python Code - Ignoring Padding Tokens

# This script shows how padding masks work.
# We build a tiny text classifier with padding.
# We monitor loss while ignoring padding tokens.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras layers.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Define a small toy text dataset.
texts = [
    "good movie",
    "great film",
    "excellent acting",
    "bad movie",
    "terrible film",
    "poor acting",
]

# Define binary sentiment labels.
labels = np.array([1, 1, 1, 0, 0, 0], dtype=np.int32)

# Create a simple word index mapping.
word_index = {"<pad>": 0, "good": 1, "movie": 2, "great": 3}
word_index.update({"film": 4, "excellent": 5, "acting": 6})
word_index.update({"bad": 7, "terrible": 8, "poor": 9})

# Define maximum sequence length.
max_len = 4

# Encode texts into index sequences.
encoded_sequences = []
for text in texts:
    tokens = text.split()
    seq = [word_index.get(t, 0) for t in tokens]
    encoded_sequences.append(seq)

# Pad sequences with the pad index.
padded_sequences = keras.preprocessing.sequence.pad_sequences(
    encoded_sequences, maxlen=max_len, padding="post", value=0
)

# Convert padded sequences to int32 array.
inputs = np.array(padded_sequences, dtype=np.int32)

# Validate shapes before building model.
assert inputs.shape[0] == labels.shape[0]
assert inputs.shape[1] == max_len

# Define vocabulary size and embedding dimension.
vocab_size = len(word_index)
embedding_dim = 8

# Build a simple text classification model.
inputs_layer = keras.Input(shape=(max_len,), dtype="int32")
embedding_layer = layers.Embedding(
    input_dim=vocab_size, output_dim=embedding_dim, mask_zero=True
)(inputs_layer)

# GlobalAveragePooling1D reads the Embedding mask and averages real tokens only.
pooled = layers.GlobalAveragePooling1D()(embedding_layer)

# Add a small dense classification head.
outputs_layer = layers.Dense(1, activation="sigmoid")(pooled)
model = keras.Model(inputs=inputs_layer, outputs=outputs_layer)

# Compile model with binary crossentropy loss.
model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)

# Train model silently for a few epochs.
history = model.fit(
    inputs,
    labels,
    epochs=10,
    batch_size=2,
    verbose=0,
    validation_split=0.33,
)

# Evaluate on all six examples (training split plus held-out validation split).
loss, acc = model.evaluate(inputs, labels, verbose=0)

# Print final loss and accuracy values.
print("Final loss, all examples (masked):", round(float(loss), 4))
print("Final accuracy, all examples (masked):", round(float(acc), 4))

# Show padded inputs and model predictions.
predictions = model.predict(inputs, verbose=0)
print("Padded input sequences:")
print(inputs)
print("Predicted probabilities:")
print(np.round(predictions.squeeze(), 4))
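In the Keras cell, the mask shapes the pooled sentence vector, while the loss itself is computed once per sentence. When the loss is computed per token (for example, when tagging each token), PyTorch's `nn.CrossEntropyLoss` accepts `ignore_index`. Targets equal to that value are dropped from both the loss and the averaging denominator. The value `-100` used below is PyTorch's default `ignore_index`; the logits and targets are made-up toy values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_LABEL = -100  # PyTorch's default ignore_index

# Toy token-level task: 2 sequences x 4 positions, 3 classes.
logits = torch.randn(2, 4, 3)
targets = torch.tensor([[0, 2, 1, PAD_LABEL],
                        [1, 1, PAD_LABEL, PAD_LABEL]])

loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_LABEL)
# CrossEntropyLoss expects (N, C) logits, so flatten batch and time.
masked_loss = loss_fn(logits.reshape(-1, 3), targets.reshape(-1))

# Reference: average only over the 5 real positions.
real = targets != PAD_LABEL
reference = nn.functional.cross_entropy(logits[real], targets[real])
print("Loss with ignore_index:", round(masked_loss.item(), 4))
print("Loss over real tokens only:", round(reference.item(), 4))
```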




### **3.3. Measuring Model Accuracy**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_06/Lecture_B/image_03_03.jpg?v=1769757264" width="250">



>* Accuracy counts how many predictions are correct
>* Accuracy = correct predictions / total predictions, often shown as a percentage

>* Track accuracy on training and validation sets
>* Diverging accuracies signal overfitting and poor generalization

>* Accuracy works best on balanced classification tasks
>* On imbalanced data, add precision, recall, or F1, and track loss and accuracy trends together
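A tiny hand-made example shows why accuracy alone can mislead on imbalanced data. With one positive among ten labels, a model that always predicts "negative" scores 90% accuracy but zero recall. The helper name `binary_metrics` is hypothetical; libraries such as scikit-learn provide equivalent metric functions.

```python
import torch

def binary_metrics(preds, labels):
    """Hypothetical helper: accuracy, precision, recall, F1 for 0/1 tensors."""
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    accuracy = (preds == labels).float().mean().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

labels = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
always_negative = torch.zeros_like(labels)
acc, prec, rec, f1 = binary_metrics(always_negative, labels)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```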



In [None]:
#@title Python Code - Measuring Model Accuracy

# This script shows how to measure accuracy.
# We use a tiny text classification example.
# Focus is on computing accuracy clearly.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import tensorflow and keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print tensorflow version briefly.
print("TensorFlow version:", tf.__version__)

# Prepare a tiny toy text dataset.
texts = [
    "good movie", "great film", "nice story", "bad movie",
    "terrible film", "awful story", "loved it", "hated it",
]

# Create binary sentiment labels.
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=np.int32)

# Use a simple text vectorization layer.
vectorizer = layers.TextVectorization(
    max_tokens=20, output_mode="int", output_sequence_length=4
)

# Adapt vectorizer on the small corpus.
vectorizer.adapt(np.array(texts))

# Vectorize all texts into a (num_texts, 4) integer array in one call.
X_all = vectorizer(np.array(texts)).numpy()

# Validate shapes before modeling.
print("Input shape:", X_all.shape, "Labels shape:", labels.shape)

# Split into small train and validation sets.
X_train, X_val = X_all[:6], X_all[6:]
y_train, y_val = labels[:6], labels[6:]

# Build a simple embedding based model.
model = keras.Sequential([
    layers.Embedding(input_dim=20, output_dim=8),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])

# Compile model with accuracy metric.
model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)

# Train briefly with silent verbose setting.
history = model.fit(
    X_train, y_train, epochs=10, batch_size=2, verbose=0,
    validation_data=(X_val, y_val)
)

# Get final training and validation accuracy from history.
train_acc = history.history["accuracy"][-1]
val_acc = history.history["val_accuracy"][-1]

# Print accuracy values in a friendly way.
print("Final training accuracy:", round(float(train_acc), 3))
print("Final validation accuracy:", round(float(val_acc), 3))

# Compute predictions on validation set.
val_probs = model.predict(X_val, verbose=0)
# Keras binary accuracy counts a prediction as positive when p > 0.5.
val_preds = (val_probs.flatten() > 0.5).astype("int32")

# Manually compute accuracy step by step.
correct = np.sum(val_preds == y_val)
manual_acc = correct / y_val.shape[0]

# Manual accuracy should match the final val_accuracy above.
print("Manual validation accuracy:", round(float(manual_acc), 3))
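In PyTorch, you do by hand what `model.fit` and `metrics=["accuracy"]` do here. That means switching between `model.train()` and `model.eval()`, disabling gradients during evaluation, and turning logits into predicted labels with `argmax`. The sketch below is not a port of the Keras model. It assumes made-up token IDs that are already padded with 0. It uses `nn.EmbeddingBag(mode="mean", padding_idx=0)` for masked average pooling. The names `BagClassifier` and `evaluate` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Made-up token IDs (0 = padding), loosely mirroring "good movie", "bad film", ...
X = torch.tensor([[1, 7, 0, 0], [2, 8, 0, 0], [3, 9, 0, 0], [4, 7, 0, 0],
                  [5, 8, 0, 0], [6, 9, 0, 0], [1, 10, 0, 0], [4, 10, 0, 0]])
y = torch.tensor([1, 1, 1, 0, 0, 0, 1, 0])
X_train, y_train, X_val, y_val = X[:6], y[:6], X[6:], y[6:]

class BagClassifier(nn.Module):
    """Hypothetical model: mean of non-padding embeddings -> linear logits."""

    def __init__(self, vocab_size=11, dim=8, num_classes=2):
        super().__init__()
        # mode="mean" with padding_idx excludes padding from each row's average.
        self.bag = nn.EmbeddingBag(vocab_size, dim, mode="mean", padding_idx=0)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        return self.fc(self.bag(tokens))

def evaluate(model, inputs, targets, loss_fn):
    """Return (mean loss, accuracy) without tracking gradients."""
    model.eval()
    with torch.no_grad():
        logits = model(inputs)
        loss = loss_fn(logits, targets).item()
        accuracy = (logits.argmax(dim=1) == targets).float().mean().item()
    return loss, accuracy

model = BagClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

history = []
for epoch in range(30):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)  # full-batch step on toy data
    loss.backward()
    optimizer.step()
    history.append((evaluate(model, X_train, y_train, loss_fn),
                    evaluate(model, X_val, y_val, loss_fn)))

(train_loss, train_acc), (val_loss, val_acc) = history[-1]
print(f"train loss={train_loss:.3f} acc={train_acc:.2f}")
print(f"val   loss={val_loss:.3f} acc={val_acc:.2f}")
```

Recording both pairs every epoch is what lets you spot the train/validation divergence described in the bullets.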




# <font color="#418FDE" size="6.5" uppercase>**Text Models**</font>


In this lecture, you learned to:
- Use nn.Embedding to convert token indices into dense vector representations. 
- Build a simple sequence model (e.g., RNN or CNN) for text classification in PyTorch. 
- Train and evaluate the text model on a labeled dataset, monitoring accuracy and loss. 

In the next Module (Module 7), we will go over 'torch.compile and Speed'.