# Football Tactics Model using DLA Transformer Architecture

**Author**: Deep Learning Academy  
**Date**: {}

This notebook implements a football tactics prediction model using advanced Transformer architectures from DLA notebooks:
- **Notebook 9**: Decoder-only LLM with temperature and top-k sampling
- **Notebook 8B**: Encoder-decoder seq2seq architecture

We'll build a model that:
1. Understands game state (positions, formations, ball location)
2. Predicts optimal tactical sequences
3. Generates diverse tactics using sampling strategies

## 1.0 Background: Football Tactics as Sequence-to-Sequence

### Problem Formulation

**Input (Game State)**: Current field positions, ball location, formation  
**Output (Tactics)**: Sequence of tactical actions (pass, press, position)

### Architecture Choice: Encoder-Decoder Transformer

```
Game State → Encoder → Context
                         ↓
Start Token → Decoder → Cross-Attention → Next Tactic
                ↓
         (autoregressive)
```

### Key DLA Techniques Used

From **Notebook 8B** (Translation Model):
- TransformerEncoder with self-attention
- TransformerDecoder with cross-attention
- PositionalEmbedding for sequences
- Causal masking for autoregressive generation

From **Notebook 9** (LLM Building):
- Temperature sampling for diversity
- Top-K sampling for quality control
- Greedy decoding as a deterministic baseline
- Efficient batch processing

## 2.0 Setup and Dependencies

In [None]:
# Install if needed
# !pip install tensorflow keras numpy

In [None]:
import os
import numpy as np
import random
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

# Set seed for reproducibility
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

## 3.0 Football Tactics Data Preparation

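We'll create synthetic football tactics data representing:
- **Game State**: Formation, ball zone (defense, midfield, attack), and score status
- **Tactics**: Sequence of actions (pass, move, press, etc.)

Note that the generator below samples tactics independently of the game state, so this dataset exercises the pipeline end to end but contains no state-to-tactic relationship for the model to learn.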

In [None]:
# Define football vocabulary

# Player positions
POSITIONS = ['GK', 'LB', 'CB', 'RB', 'LWB', 'RWB', 'CDM', 'CM', 'CAM', 'LM', 'RM', 'LW', 'RW', 'ST', 'CF']

# Tactical actions
ACTIONS = ['pass', 'dribble', 'shoot', 'cross', 'tackle', 'intercept', 'press', 'fallback', 'support', 'move_forward']

# Directions/targets
DIRECTIONS = ['left', 'right', 'center', 'forward', 'back', 'wide']

# Formations
FORMATIONS = ['4-4-2', '4-3-3', '3-5-2', '4-2-3-1', '5-3-2']

print(f"Positions: {len(POSITIONS)}")
print(f"Actions: {len(ACTIONS)}")
print(f"Directions: {len(DIRECTIONS)}")
print(f"Formations: {len(FORMATIONS)}")

In [None]:
# Generate synthetic football tactics data
def generate_game_state():
    """Generate a random game state description"""
    formation = random.choice(FORMATIONS)
    ball_position = random.choice(['defense', 'midfield', 'attack'])
    score_diff = random.choice(['losing', 'drawing', 'winning'])
    return f"formation {formation} ball {ball_position} status {score_diff}"

def generate_tactic_sequence():
    """Generate a tactical sequence"""
    num_actions = random.randint(3, 6)
    tactics = []
    
    for _ in range(num_actions):
        position = random.choice(POSITIONS)
        action = random.choice(ACTIONS)
        direction = random.choice(DIRECTIONS)
        tactics.append(f"{position} {action} {direction}")
    
    return " , ".join(tactics)

# Generate training data
num_samples = 500
game_states = [generate_game_state() for _ in range(num_samples)]
tactic_sequences = [generate_tactic_sequence() for _ in range(num_samples)]

# Add start and end tokens to tactics
tactic_sequences = ["[start] " + seq + " [end]" for seq in tactic_sequences]

print(f"Generated {len(game_states)} training samples")
print(f"\nExample game state: {game_states[0]}")
print(f"Example tactic: {tactic_sequences[0]}")

## 4.0 Data Preprocessing and Vectorization

Using techniques from **Notebook 8B**: Text vectorization with custom vocabulary
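
To make the vectorization step concrete, here is a minimal pure-Python sketch of what integer vectorization does: rank tokens by frequency, reserve index 0 for padding and index 1 for unknown tokens, then truncate or pad to a fixed length. The helper names (`build_vocab`, `vectorize`) and the toy strings are illustrative assumptions; the real `TextVectorization` layer also lowercases and strips punctuation by default, which this sketch omits.

```python
from collections import Counter

PAD, OOV = "", "[UNK]"  # indices 0 and 1, as in Keras' integer output mode

def build_vocab(texts, max_tokens):
    """Rank whitespace-separated tokens by frequency, after the two reserved slots."""
    counts = Counter(tok for text in texts for tok in text.split())
    ranked = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return [PAD, OOV] + ranked

def vectorize(text, vocab, sequence_length):
    """Map tokens to indices, then truncate or zero-pad to a fixed length."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in text.split()][:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

# Toy, already-standardized game states (hypothetical examples)
toy_states = [
    "formation 442 ball midfield status drawing",
    "formation 433 ball attack status winning",
]
toy_vocab = build_vocab(toy_states, max_tokens=500)
toy_ids = vectorize("formation 352 ball attack status losing", toy_vocab, sequence_length=8)

print(toy_vocab[:5])
print(toy_ids)  # unseen tokens map to 1, trailing positions are 0
```

Tokens that never appeared during adaptation ("352" and "losing" here) collapse to the unknown index, which is why the vectorizers are adapted on the training split only.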

In [None]:
# Create text pairs for training
text_pairs = list(zip(game_states, tactic_sequences))

# Shuffle and split
random.shuffle(text_pairs)
val_samples = int(0.15 * len(text_pairs))
train_samples = len(text_pairs) - val_samples

train_pairs = text_pairs[:train_samples]
val_pairs = text_pairs[train_samples:]

print(f"Training samples: {len(train_pairs)}")
print(f"Validation samples: {len(val_pairs)}")
print(f"\nSample pair:")
print(f"Game State: {train_pairs[0][0]}")
print(f"Tactics: {train_pairs[0][1]}")

In [None]:
from tensorflow.keras.layers import TextVectorization

# Configuration
vocab_size = 500
state_sequence_length = 15
tactic_sequence_length = 40

# Create vectorization layers (technique from Notebook 8B)
state_vectorization = TextVectorization(
    max_tokens=vocab_size,
    output_mode='int',
    output_sequence_length=state_sequence_length,
)

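# The default standardization strips punctuation, which would turn "[start]" and
# "[end]" into "start" and "end" and break the "[end]" check during decoding.
# standardize=None keeps the tokens exactly as written.
tactic_vectorization = TextVectorization(
    max_tokens=vocab_size,
    output_mode='int',
    output_sequence_length=tactic_sequence_length + 1,
    standardize=None,
)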

# Adapt to training data
train_states = [pair[0] for pair in train_pairs]
train_tactics = [pair[1] for pair in train_pairs]

state_vectorization.adapt(train_states)
tactic_vectorization.adapt(train_tactics)

print(f"State vocabulary size: {state_vectorization.vocabulary_size()}")
print(f"Tactic vocabulary size: {tactic_vectorization.vocabulary_size()}")
print(f"\nSample vocabulary (states): {state_vectorization.get_vocabulary()[:15]}")
print(f"\nSample vocabulary (tactics): {tactic_vectorization.get_vocabulary()[:15]}")

## 5.0 Create Training Datasets

Following **Notebook 8B** approach for efficient data pipeline
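
The dataset formatting relies on teacher forcing: the decoder receives the target sequence without its last token, and the labels are the same sequence shifted one step ahead. A minimal sketch on a plain Python list, using hypothetical token ids:

```python
def shift_for_teacher_forcing(token_ids):
    """Split one padded target sequence into decoder inputs and next-token labels."""
    decoder_inputs = token_ids[:-1]  # everything except the last position
    labels = token_ids[1:]           # the same sequence, one step ahead
    return decoder_inputs, labels

# Hypothetical token ids: 2 = [start], 3 = [end], 0 = padding
toy_target = [2, 7, 9, 4, 3, 0, 0]
toy_inputs, toy_labels = shift_for_teacher_forcing(toy_target)

for step, (seen, expected) in enumerate(zip(toy_inputs, toy_labels)):
    print(f"step {step}: decoder sees {seen}, should predict {expected}")
```

This is also why the tactic vectorizer is configured with `tactic_sequence_length + 1` output positions: after the shift, both halves have exactly `tactic_sequence_length` positions.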

In [None]:
def format_dataset(states, tactics):
    """Format data for encoder-decoder training"""
    states = state_vectorization(states)
    tactics = tactic_vectorization(tactics)
    return ({
        "encoder_inputs": states,
        "decoder_inputs": tactics[:, :-1],
    }, tactics[:, 1:])

def make_dataset(pairs, batch_size=32):
    """Create tf.data.Dataset with prefetching"""
    states_list = [pair[0] for pair in pairs]
    tactics_list = [pair[1] for pair in pairs]
    dataset = tf.data.Dataset.from_tensor_slices((states_list, tactics_list))
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(format_dataset, num_parallel_calls=tf.data.AUTOTUNE)
    # Cache the vectorized batches first so that shuffling is redone every epoch
    return dataset.cache().shuffle(2048).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset(train_pairs)
val_ds = make_dataset(val_pairs)

print("Training dataset created successfully!")
print(f"Dataset structure: {train_ds.element_spec}")

## 6.0 Build Transformer Components (DLA Architecture)

### 6.1 PositionalEmbedding Layer

From **Notebook 8B**: Combines token embeddings with learned positional embeddings
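
As a rough illustration of the idea, the sketch below adds a per-token vector to a per-position vector using plain Python lists. The tables are random stand-ins for trainable weights, and all names and sizes are assumptions chosen for readability.

```python
import random

def make_table(rows, dim, rng):
    """Stand-in for a trainable embedding matrix (random here, learned in Keras)."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(rows)]

def positional_embedding(token_ids, token_table, position_table):
    """Add the row for each token id to the row for its position."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]

rng = random.Random(0)
toy_token_table = make_table(rows=10, dim=4, rng=rng)    # vocabulary of 10 tokens
toy_position_table = make_table(rows=6, dim=4, rng=rng)  # sequences up to 6 long

toy_ids = [5, 2, 5, 0, 0, 0]  # token 5 appears twice; 0 is padding
toy_embedded = positional_embedding(toy_ids, toy_token_table, toy_position_table)
toy_mask = [tok != 0 for tok in toy_ids]  # per-position padding mask

print(toy_embedded[0] != toy_embedded[2])  # same token, different positions
print(toy_mask)
```

Note that the position table has one row per allowed position, so a sequence longer than the table has no row to look up for its final positions.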

In [None]:
class PositionalEmbedding(layers.Layer):
    """Positional Embedding from DLA Notebook 8B"""
    def __init__(self, sequence_length, input_dim, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.token_embeddings = layers.Embedding(
            input_dim=input_dim, output_dim=output_dim
        )
        self.position_embeddings = layers.Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )
        self.sequence_length = sequence_length
        self.input_dim = input_dim
        self.output_dim = output_dim

    def call(self, inputs):
        length = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=length, delta=1)
        embedded_tokens = self.token_embeddings(inputs)
        embedded_positions = self.position_embeddings(positions)
        return embedded_tokens + embedded_positions

    def compute_mask(self, inputs, mask=None):
        return tf.math.not_equal(inputs, 0)

    def get_config(self):
        config = super().get_config()
        config.update({
            "sequence_length": self.sequence_length,
            "input_dim": self.input_dim,
            "output_dim": self.output_dim,
        })
        return config

print("✓ PositionalEmbedding defined")

### 6.2 TransformerEncoder

From **Notebook 8B**: Self-attention encoder for understanding game state
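
The core operation inside the encoder can be sketched as single-head scaled dot-product attention with a padding mask. This simplified version omits the learned query/key/value projections and the multiple heads that `MultiHeadAttention` adds; function names and inputs are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors, key_mask):
    """Single-head scaled dot-product attention without learned projections.

    vectors: list of equal-length float lists, one per position.
    key_mask: list of bools; False marks padding positions that get no weight.
    """
    dim = len(vectors[0])
    outputs, weights_per_query = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) if keep else -1e9
            for key, keep in zip(vectors, key_mask)
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ])
        weights_per_query.append(weights)
    return outputs, weights_per_query

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_key_mask = [True, True, False]  # third position is padding
toy_out, toy_weights = self_attention(toy_vectors, toy_key_mask)

for row in toy_weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and assigns zero weight to the padded position, which is the effect of the `attention_mask` argument in the encoder.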

In [None]:
class TransformerEncoder(layers.Layer):
    """Transformer Encoder from DLA Notebook 8B"""
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.dense_proj = keras.Sequential([
            layers.Dense(dense_dim, activation='relu'),
            layers.Dense(embed_dim),
        ])
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = mask[:, tf.newaxis, :]
        attention_output = self.attention(inputs, inputs, attention_mask=mask)
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "dense_dim": self.dense_dim,
            "num_heads": self.num_heads,
        })
        return config

print("✓ TransformerEncoder defined")

### 6.3 TransformerDecoder

From **Notebook 8B**: Decoder with causal self-attention and cross-attention
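
The causal mask is a lower-triangular matrix: query position `i` may attend to key position `j` only when `j <= i`. The sketch below builds it with plain lists and combines it with a padding mask by elementwise minimum; the helper names are illustrative.

```python
def causal_mask(sequence_length):
    """mask[i][j] is 1 when query position i may attend to key position j (j <= i)."""
    return [[1 if i >= j else 0 for j in range(sequence_length)]
            for i in range(sequence_length)]

def combine_with_padding(mask, padding):
    """Elementwise minimum: a key is visible only if it is not in the future and not padding."""
    return [[min(cell, keep) for cell, keep in zip(row, padding)] for row in mask]

toy_causal = causal_mask(4)
toy_padding = [1, 1, 1, 0]  # last position is padding
toy_combined = combine_with_padding(toy_causal, toy_padding)

for row in toy_combined:
    print(row)
```

Because the first target token is always `[start]`, every row keeps at least one visible key, so no query ends up with an all-zero mask.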

In [None]:
class TransformerDecoder(layers.Layer):
    """Transformer Decoder from DLA Notebook 8B"""
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention_1 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.attention_2 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.dense_proj = keras.Sequential([
            layers.Dense(dense_dim, activation='relu'),
            layers.Dense(embed_dim),
        ])
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()
        self.supports_masking = True

    def get_causal_attention_mask(self, inputs):
        """Create causal mask for autoregressive generation"""
        input_shape = tf.shape(inputs)
        batch_size, sequence_length = input_shape[0], input_shape[1]
        i = tf.range(sequence_length)[:, tf.newaxis]
        j = tf.range(sequence_length)
        mask = tf.cast(i >= j, dtype='int32')
        mask = tf.reshape(mask, (1, input_shape[1], input_shape[1]))
        mult = tf.concat(
            [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)],
            axis=0,
        )
        return tf.tile(mask, mult)

    def call(self, inputs, encoder_outputs, mask=None):
        causal_mask = self.get_causal_attention_mask(inputs)
        if mask is not None:
            # Combine the decoder padding mask with the causal mask
            padding_mask = tf.cast(mask[:, tf.newaxis, :], dtype='int32')
            self_attention_mask = tf.minimum(padding_mask, causal_mask)
        else:
            self_attention_mask = causal_mask
        
        # Self-attention on decoder (causal, ignoring padded target positions)
        attention_output_1 = self.attention_1(
            query=inputs,
            value=inputs,
            key=inputs,
            attention_mask=self_attention_mask,
        )
        attention_output_1 = self.layernorm_1(inputs + attention_output_1)
        
        # Cross-attention to encoder. `mask` describes decoder positions, so its
        # length does not match the encoder keys; no mask is applied here.
        attention_output_2 = self.attention_2(
            query=attention_output_1,
            value=encoder_outputs,
            key=encoder_outputs,
        )
        attention_output_2 = self.layernorm_2(attention_output_1 + attention_output_2)
        
        # Feed-forward
        proj_output = self.dense_proj(attention_output_2)
        return self.layernorm_3(attention_output_2 + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "dense_dim": self.dense_dim,
            "num_heads": self.num_heads,
        })
        return config

print("✓ TransformerDecoder defined")

## 7.0 Build Complete Football Tactics Model

Encoder-decoder architecture combining techniques from both notebooks

In [None]:
# Model hyperparameters
embed_dim = 128
dense_dim = 512
num_heads = 4

# Encoder: Process game state
encoder_inputs = keras.Input(shape=(None,), dtype='int64', name='encoder_inputs')
x = PositionalEmbedding(state_sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)
encoder = keras.Model(encoder_inputs, encoder_outputs, name='encoder')

# Decoder: Generate tactics
decoder_inputs = keras.Input(shape=(None,), dtype='int64', name='decoder_inputs')
encoded_seq_inputs = keras.Input(shape=(None, embed_dim), name='encoder_outputs')
x = PositionalEmbedding(tactic_sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoded_seq_inputs)
x = layers.Dropout(0.3)(x)
decoder_outputs = layers.Dense(vocab_size, activation='softmax')(x)
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs, name='decoder')

# Full model
decoder_outputs = decoder([decoder_inputs, encoder_outputs])
football_model = keras.Model(
    [encoder_inputs, decoder_inputs],
    decoder_outputs,
    name='football_tactics_model'
)

print("✓ Football Tactics Model built successfully!")
football_model.summary()

## 8.0 Compile and Train

Using techniques from **Notebook 8B** for seq2seq training
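
The loss used below, sparse categorical crossentropy, takes integer targets and per-position probability rows and averages the negative log-probability assigned to each correct token. A minimal pure-Python sketch with made-up numbers (the library version also guards against `log(0)`, which this sketch does not):

```python
import math

def sparse_categorical_crossentropy(probabilities, target_ids):
    """Mean of -log p[target] over positions; rows are post-softmax probabilities."""
    losses = [-math.log(row[target]) for row, target in zip(probabilities, target_ids)]
    return sum(losses) / len(losses)

# Hypothetical predictions over a 3-token vocabulary for two positions
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]
toy_targets = [0, 1]

toy_loss = sparse_categorical_crossentropy(toy_probs, toy_targets)
uniform_loss = sparse_categorical_crossentropy([[1 / 3, 1 / 3, 1 / 3]] * 2, toy_targets)

print(f"confident and correct: {toy_loss:.4f}")
print(f"uniform guess:         {uniform_loss:.4f}")
```

A uniform guess over `V` tokens gives a loss of `log(V)`, which is a useful reference point when reading the training curves.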

In [None]:
# Compile model
football_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

print("✓ Model compiled successfully!")

In [None]:
# Train the model
history = football_model.fit(
    train_ds,
    epochs=30,
    validation_data=val_ds,
)

print("✓ Training complete!")

## 9.0 Inference with Advanced Sampling (from Notebook 9)

### 9.1 Greedy Decoding

Simple argmax at each step
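
The control flow of greedy decoding can be shown without the trained model. In this sketch, `toy_model` is a hypothetical stand-in for the decoder that returns next-token probabilities based only on the last token; the transition table is invented for illustration.

```python
def greedy_decode(next_token_probs, start_token, end_token, max_steps):
    """Repeatedly append the most probable next token until end_token or max_steps."""
    sequence = [start_token]
    for _ in range(max_steps):
        probs = next_token_probs(sequence)  # dict: token -> probability
        best = max(probs, key=probs.get)    # argmax over the vocabulary
        sequence.append(best)
        if best == end_token:
            break
    return sequence

# Hypothetical stand-in for the decoder: depends only on the last token
TOY_TRANSITIONS = {
    "[start]": {"CM": 0.6, "ST": 0.4},
    "CM": {"pass": 0.7, "press": 0.3},
    "ST": {"shoot": 0.9, "pass": 0.1},
    "pass": {"forward": 0.55, "[end]": 0.45},
    "press": {"[end]": 1.0},
    "shoot": {"[end]": 1.0},
    "forward": {"[end]": 1.0},
}

def toy_model(sequence):
    return TOY_TRANSITIONS[sequence[-1]]

toy_result = greedy_decode(toy_model, "[start]", "[end]", max_steps=10)
print(" ".join(toy_result))
```

Greedy decoding is deterministic: calling it twice on the same input returns the same sequence, which makes it a convenient baseline for the sampling methods that follow.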

In [None]:
def decode_sequence_greedy(input_state):
    """Greedy decoding: always pick most likely next token"""
    # Encode game state
    tokenized_input = state_vectorization([input_state])
    encoder_output = encoder(tokenized_input)
    
    # Start with [start] token
    decoded_tactic = "[start]"
    
    for i in range(tactic_sequence_length):
        # Drop the extra position so the length matches the decoder's position table
        tokenized_target = tactic_vectorization([decoded_tactic])[:, :-1]
        predictions = decoder([tokenized_target, encoder_output])
        
        # Greedy: take argmax
        sampled_token_index = np.argmax(predictions[0, i, :])
        sampled_token = tactic_vectorization.get_vocabulary()[sampled_token_index]
        
        decoded_tactic += " " + sampled_token
        
        if sampled_token == "[end]":
            break
    
    # Clean up
    decoded_tactic = decoded_tactic.replace("[start] ", "")
    decoded_tactic = decoded_tactic.replace(" [end]", "")
    
    return decoded_tactic

print("✓ Greedy decoding function defined")

### 9.2 Temperature Sampling (from Notebook 9)

Control randomness: lower temperature = more deterministic, higher = more diverse
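
Temperature is defined on logits, while the decoder's final layer applies softmax and therefore returns probabilities. One way to reconcile the two, sketched here in pure Python, is to take the log of the probabilities before dividing by the temperature. The function name and the example distribution are illustrative, and the sketch assumes all probabilities are strictly positive.

```python
import math

def apply_temperature(probabilities, temperature):
    """Rescale a probability distribution: softmax(log(p) / T). Requires p > 0."""
    scaled = [math.log(p) / temperature for p in probabilities]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_probs = [0.6, 0.3, 0.1]
sharper = apply_temperature(toy_probs, 0.5)    # T < 1 concentrates mass on the top token
unchanged = apply_temperature(toy_probs, 1.0)  # T = 1 leaves the distribution as is
flatter = apply_temperature(toy_probs, 2.0)    # T > 1 moves toward uniform

for name, dist in [("T=0.5", sharper), ("T=1.0", unchanged), ("T=2.0", flatter)]:
    print(name, [round(p, 3) for p in dist])
```

Applying softmax directly to probabilities divided by `T`, without the log, does not give this behavior: the inputs all lie between 0 and 1, so the result is close to uniform regardless of the temperature.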

In [None]:
def decode_sequence_temperature(input_state, temperature=1.0):
    """Temperature sampling from DLA Notebook 9"""
    # Encode game state
    tokenized_input = state_vectorization([input_state])
    encoder_output = encoder(tokenized_input)
    
    decoded_tactic = "[start]"
    
    for i in range(tactic_sequence_length):
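        # Drop the extra position so the length matches the decoder's position table
        tokenized_target = tactic_vectorization([decoded_tactic])[:, :-1]
        predictions = decoder([tokenized_target, encoder_output])
        
        # Temperature sampling: the decoder outputs softmax probabilities,
        # so recover log-probabilities before scaling by temperature
        probs = predictions[0, i, :]
        logits = tf.math.log(probs + 1e-9) / temperature
        probabilities = tf.nn.softmax(logits).numpy().astype('float64')
        probabilities /= probabilities.sum()  # guard against float32 rounding
        
        # Sample from distribution
        sampled_token_index = np.random.choice(len(probabilities), p=probabilities)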
        sampled_token = tactic_vectorization.get_vocabulary()[sampled_token_index]
        
        decoded_tactic += " " + sampled_token
        
        if sampled_token == "[end]":
            break
    
    decoded_tactic = decoded_tactic.replace("[start] ", "")
    decoded_tactic = decoded_tactic.replace(" [end]", "")
    
    return decoded_tactic

print("✓ Temperature sampling function defined")

### 9.3 Top-K Sampling (from Notebook 9)

Only sample from top K most likely tokens for better quality
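
A minimal sketch of the filtering step: keep the `k` most probable tokens, renormalize them, and sample only from that set. The helper names and the example distribution are assumptions for illustration.

```python
import random

def top_k_filter(probabilities, k):
    """Keep the k largest probabilities and renormalize them to sum to 1."""
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    kept = ranked[:k]
    total = sum(probabilities[i] for i in kept)
    return {i: probabilities[i] / total for i in kept}

def sample_top_k(probabilities, k, rng):
    """Draw one token index from the renormalized top-k distribution."""
    filtered = top_k_filter(probabilities, k)
    indices = list(filtered)
    return rng.choices(indices, weights=[filtered[i] for i in indices], k=1)[0]

toy_probs = [0.05, 0.40, 0.10, 0.30, 0.15]
toy_filtered = top_k_filter(toy_probs, k=2)

rng = random.Random(0)
toy_draws = {sample_top_k(toy_probs, 2, rng) for _ in range(200)}

print(toy_filtered)  # only indices 1 and 3 survive
print(toy_draws)     # sampled indices never leave the top-k set
```

With `k=1` this reduces to greedy decoding, and with `k` equal to the vocabulary size it reduces to plain sampling.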

In [None]:
def decode_sequence_topk(input_state, k=5, temperature=1.0):
    """Top-K sampling from DLA Notebook 9"""
    # Encode game state
    tokenized_input = state_vectorization([input_state])
    encoder_output = encoder(tokenized_input)
    
    decoded_tactic = "[start]"
    
    for i in range(tactic_sequence_length):
        # Drop the extra position so the length matches the decoder's position table
        tokenized_target = tactic_vectorization([decoded_tactic])[:, :-1]
        predictions = decoder([tokenized_target, encoder_output])
        
        # The decoder outputs softmax probabilities; take the log before
        # applying temperature
        logits = tf.math.log(predictions[0, i, :] + 1e-9) / temperature
        
        # Top-K filtering
        top_k_indices = tf.argsort(logits, direction='DESCENDING')[:k]
        top_k_logits = tf.gather(logits, top_k_indices)
        top_k_probs = tf.nn.softmax(top_k_logits).numpy().astype('float64')
        top_k_probs /= top_k_probs.sum()  # guard against float32 rounding
        
        # Sample from top K
        sampled_index = np.random.choice(k, p=top_k_probs)
        sampled_token_index = top_k_indices[sampled_index].numpy()
        sampled_token = tactic_vectorization.get_vocabulary()[sampled_token_index]
        
        decoded_tactic += " " + sampled_token
        
        if sampled_token == "[end]":
            break
    
    decoded_tactic = decoded_tactic.replace("[start] ", "")
    decoded_tactic = decoded_tactic.replace(" [end]", "")
    
    return decoded_tactic

print("✓ Top-K sampling function defined")

## 10.0 Test Tactics Generation

Generate tactics for various game situations using all three sampling methods

In [None]:
# Test scenarios
test_scenarios = [
    "formation 4-4-2 ball midfield status drawing",
    "formation 4-3-3 ball attack status winning",
    "formation 3-5-2 ball defense status losing",
    "formation 4-2-3-1 ball attack status drawing",
]

print("=" * 100)
print("FOOTBALL TACTICS GENERATION - DLA TRANSFORMER MODEL")
print("=" * 100)

for scenario in test_scenarios:
    print(f"\n{'='*100}")
    print(f"GAME STATE: {scenario}")
    print(f"{'='*100}")
    
    # Greedy
    greedy_tactic = decode_sequence_greedy(scenario)
    print(f"\n[GREEDY] Predicted Tactics:")
    print(f"  {greedy_tactic}")
    
    # Temperature sampling
    print(f"\n[TEMPERATURE SAMPLING (T=0.7)] Predicted Tactics:")
    for j in range(2):
        temp_tactic = decode_sequence_temperature(scenario, temperature=0.7)
        print(f"  Variant {j+1}: {temp_tactic}")
    
    # Top-K sampling
    print(f"\n[TOP-K SAMPLING (K=5, T=0.8)] Predicted Tactics:")
    for j in range(2):
        topk_tactic = decode_sequence_topk(scenario, k=5, temperature=0.8)
        print(f"  Variant {j+1}: {topk_tactic}")
    
    print()

## 11.0 Summary and Key DLA Techniques Used

### Architecture Summary

```
Game State (formation, ball position, score)
    ↓
Encoder (TransformerEncoder)
    - Self-attention on game state
    - Positional embeddings
    - Feed-forward networks
    ↓
Context Representation
    ↓
Decoder (TransformerDecoder)
    - Causal self-attention (autoregressive)
    - Cross-attention to encoder
    - Feed-forward networks
    ↓
Tactical Sequence (player actions)
```

### DLA Techniques Implemented

#### From Notebook 8B (Translation Model):
✅ **TransformerEncoder**: Self-attention for game state understanding  
✅ **TransformerDecoder**: Causal attention + cross-attention  
✅ **PositionalEmbedding**: Token + position embeddings  
✅ **Causal Masking**: Prevents future information leakage  
✅ **TextVectorization**: Custom vocabulary adaptation  
✅ **Efficient Data Pipeline**: tf.data with prefetching  

#### From Notebook 9 (LLM Building):
✅ **Temperature Sampling**: Control generation diversity  
✅ **Top-K Sampling**: Quality control by filtering low-probability tokens  
✅ **Greedy Decoding**: Deterministic generation  
✅ **Batch Processing**: Efficient training with shuffling/caching  

### Model Specifications
- **Embedding Dimension**: 128
- **Feed-forward Dimension**: 512
- **Attention Heads**: 4
- **Vocabulary Size**: up to 500 (`max_tokens`; the adapted vocabularies are much smaller)
- **State Sequence Length**: 15
- **Tactic Sequence Length**: 40

### Next Steps

1. **Enhanced Data**: Use real match data from football APIs
2. **Reward Signal**: Integrate match outcome as reward for RL fine-tuning
3. **Beam Search**: Implement beam search for multiple candidate tactics
4. **Attention Visualization**: Visualize what game aspects the model focuses on
5. **Multi-task Learning**: Predict both tactics and expected outcomes
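
As a starting point for the beam search item above, here is a small sketch that keeps the `beam_width` highest-scoring partial sequences at each step. It is not part of this notebook's model code: `toy_model` and its transition table are hypothetical stand-ins for the decoder, and the sketch applies no length normalization.

```python
import math

def beam_search(next_token_probs, start_token, end_token, beam_width, max_steps):
    """Keep the beam_width highest log-probability partial sequences at each step."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_steps):
        candidates = []
        for sequence, score in beams:
            if sequence[-1] == end_token:  # finished hypotheses carry over unchanged
                candidates.append((sequence, score))
                continue
            for token, prob in next_token_probs(sequence).items():
                candidates.append((sequence + [token], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams

# Hypothetical stand-in for the decoder: depends only on the last token
TOY_TRANSITIONS = {
    "[start]": {"CM": 0.6, "ST": 0.4},
    "CM": {"pass": 0.5, "press": 0.5},
    "ST": {"shoot": 0.9, "pass": 0.1},
    "pass": {"[end]": 1.0},
    "press": {"[end]": 1.0},
    "shoot": {"[end]": 1.0},
}

def toy_model(sequence):
    return TOY_TRANSITIONS[sequence[-1]]

toy_beams = beam_search(toy_model, "[start]", "[end]", beam_width=2, max_steps=10)
for sequence, score in toy_beams:
    print(f"{math.exp(score):.2f}  {' '.join(sequence)}")
```

In this toy table the greedy choice ("CM", probability 0.6) leads to a sequence with total probability 0.30, while the beam also keeps "ST" and finds the higher-probability sequence ending in "shoot" (0.36).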

### Key Innovation

This model combines:
- **Seq2seq understanding** (game state → tactics) from Notebook 8B
- **Advanced sampling strategies** for diverse tactical generation from Notebook 9
- **Autoregressive generation** for sequential tactical planning

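The result is a football tactics pipeline that can:
1. Encode a game-state description (formation, ball zone, score status)
2. Generate tactical sequences conditioned on that encoding
3. Provide multiple tactical options with controlled diversity

Because the synthetic tactics are sampled independently of the game state, contextually appropriate output requires training on data in which the two are actually related, such as the real match data listed under Next Steps.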