# CE - 652
## Artificial Intelligence for Autonomous Driving
## Application Assignments 
##### Week: 8
##### Instructor: Dr. Juan D. Gomez

# **Request: Enhancing This Jupyter Notebook with Key Transformer Concepts**

## **Objective**
Please enrich this Jupyter Notebook with a structured, interactive, and visually engaging **table of contents** covering the following key topics:

---

## **1. Transformer Architecture Overview**
- Provide a concise yet comprehensive explanation of the Transformer model.
- Utilize well-structured diagrams and animations to illustrate its key components, including:
  - **Encoder-Decoder Structure**
  - **Self-Attention Mechanisms**
  - **Feedforward Layers**

---

## **2. Positional Encodings**
- Provide your own explanation of **positional encodings**.
- Include **visual representations** to facilitate intuitive understanding.
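
As a starting point for such a visualization, here is a minimal sketch of the fixed sinusoidal encoding described in "Attention Is All You Need". The function name `sinusoidal_positional_encoding` and the sizes used below are illustrative assumptions, not part of the assignment template.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_length, d_model):
    """Return a (seq_length, d_model) matrix of sinusoidal position codes.

    Hypothetical helper for illustration: even columns hold sines and odd
    columns hold cosines of pos / 10000^(2i / d_model).
    """
    positions = np.arange(seq_length)[:, np.newaxis]          # (seq_length, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]       # the "2i" indices
    angles = positions / np.power(10000.0, even_dims / d_model)

    encoding = np.zeros((seq_length, d_model))
    encoding[:, 0::2] = np.sin(angles)                        # even columns
    encoding[:, 1::2] = np.cos(angles[:, : d_model // 2])     # odd columns
    return encoding

# Assumed sizes, chosen only to make the pattern easy to plot
pe = sinusoidal_positional_encoding(seq_length=50, d_model=16)
print(pe.shape)
```

Passing `pe` to a heatmap plot (for example matplotlib's `imshow`) should show fast-varying stripes in the low columns and slowly varying ones in the high columns, which is one way to visualize how each position receives a distinct code.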

---

## **3. Attention Mechanisms**
- Break down the following mechanisms step by step:
  - **Scaled Dot-Product Attention**
  - **Self-Attention**
  - **Multi-Head Attention**
- Incorporate **visualizations and real-world analogies** to demonstrate their role in context learning.
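
To make the relationship between these three mechanisms concrete, the sketch below runs multi-head self-attention on a random input. It assumes random matrices as stand-ins for learned projection weights, and the names `multi_head_self_attention` and `split_heads` are hypothetical. Each head applies scaled dot-product attention to its own slice of the projected input.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_self_attention(X, num_heads, rng):
    """Illustrative multi-head self-attention with random (untrained) weights."""
    seq_length, d_model = X.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_k = d_model // num_heads

    # Random stand-ins for the learned projection matrices (assumption)
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))

    def split_heads(M):
        # (seq_length, d_model) -> (num_heads, seq_length, d_k)
        return M.reshape(seq_length, num_heads, d_k).transpose(1, 0, 2)

    # Self-attention: queries, keys, and values all come from the same X
    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)

    # Scaled dot-product attention, computed for every head at once
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (num_heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                # (num_heads, seq, d_k)

    # Concatenate the heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_length, d_model)
    return concat @ W_o, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))          # 5 tokens, d_model = 8 (assumed sizes)
output, weights = multi_head_self_attention(X, num_heads=2, rng=rng)
print(output.shape, weights.shape)
```

The returned `weights` array holds one attention matrix per head, so each slice `weights[h]` can be drawn as its own heatmap to compare what different heads attend to.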

---

## **Guidelines**
✅ **Minimize extensive text** — prioritize **figures, animations, and practical examples** to enhance comprehension.  
✅ **Optionally, use interactive elements** where applicable (e.g., sliders, heatmaps, or attention visualizations) to allow dynamic exploration of concepts.  
✅ Ensure **clarity and an intuitive flow** that aligns with how Transformers are taught in modern deep learning curricula.


# **Coding assignment:**
---


Please create a Python function (using the template below) that simulates the Attention mechanism described in the paper "Attention Is All You Need". The function takes randomly generated inputs (queries, keys, and values), projects them with hardcoded (also randomly generated) weight matrices, and computes the output of the Attention mechanism step by step.

It follows the formula:

$$
\text{Attention}(Q, K, V) = \text{softmax} \left( \frac{Q K^T}{\sqrt{d_k}} \right) V
$$

where:

- $Q$ (queries), $K$ (keys), and $V$ (values) are randomly generated input matrices.
- The weight matrices for the projections of $Q$, $K$, and $V$ are also randomly generated.
- $d_k$ is the dimension of the projected key vectors.

The function then computes scaled dot-product attention and outputs the results.
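
The formula alone can be sketched in a few lines of NumPy. This is a minimal illustration that deliberately omits the weight projections required by the template below. The name `scaled_dot_product_attention` is an assumption, and the max subtraction inside the softmax is a common numerical-stability choice rather than something the formula itself requires.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, without any learned projections."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (len_q, len_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # stability only
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.random((5, 4))
K = rng.random((5, 4))
V = rng.random((5, 4))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)          # one output row per query
print(weights.sum(axis=-1))  # sanity check on the softmax rows
```

Each output row is a weighted average of the rows of `V`, with the weights determined by how strongly the corresponding query matches each key.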

In [8]:
import numpy as np

def simulate_attention(Q, K, V):
    """
    Simulates the Attention mechanism from "Attention Is All You Need"
    with hardcoded (randomly initialized) parameters.

    Parameters:
    Q (numpy.ndarray): Query matrix (seq_length, d_model)
    K (numpy.ndarray): Key matrix (seq_length, d_model)
    V (numpy.ndarray): Value matrix (seq_length, d_model)

    Returns:
    numpy.ndarray: Attention output matrix
    """
    np.random.seed(42)  # For reproducibility
    
    # Validate input dimensions: "Q, K, and V must have the same shape"
    # Your code here

    # Extract sequence length and embedding dimension
    # Your code here

    # Hardcoded (randomly initialized) weight matrices
    # Your code here

    # Apply learned transformations
    # Your code here

    # Compute scaled dot-product attention
    # Use the scaling factor sqrt(d_k)
    # Your code here

    # Apply softmax
    # Your code here

    # Compute the final attention output
    # Your code here

    return attention_output



In [9]:
# Example usage with random inputs
seq_length = 5
d_model = 4
Q = np.random.rand(seq_length, d_model)
K = np.random.rand(seq_length, d_model)
V = np.random.rand(seq_length, d_model)

output = simulate_attention(Q, K, V)
print("Simulated Attention Output:")
print(output)


Simulated Attention Output:
[[0.39262948 1.35442195 1.22699499 1.53990615]
 [0.40230947 1.38003129 1.25121885 1.56692975]
 [0.40299057 1.37916101 1.25199227 1.56584286]
 [0.39617445 1.36497352 1.23375705 1.55021601]
 [0.40423846 1.38312263 1.25186312 1.56910907]]


Now, can you use the same "simulate_attention" function to perform self-attention?

In [11]:
# Example usage for self-attention with random inputs
# Your code here


Simulated Self-Attention Output:
[[0.49855453 1.53473981 1.30312152 1.80045753]
 [0.52012488 1.59878081 1.37676646 1.87154636]
 [0.41162108 1.24101302 1.02000712 1.46154345]
 [0.47253647 1.45413035 1.21602026 1.7099343 ]
 [0.45663862 1.39503364 1.16383683 1.64030628]]
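
For intuition about what self-attention does with non-random inputs, the sketch below feeds the same matrix in as queries, keys, and values. The three tokens and their 4-dimensional vectors are made up for illustration (they are not real embeddings), no projection weights are applied, and the output is not expected to match the numbers printed above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, without learned projections."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Hypothetical toy "embeddings": one row per token (values are invented)
tokens = ["the", "car", "brakes"]
X = np.array([
    [1.0, 0.0, 0.0, 0.0],   # "the"
    [0.0, 1.0, 0.5, 0.0],   # "car"
    [0.0, 0.5, 1.0, 0.0],   # "brakes"
])

# Self-attention: the same sequence supplies queries, keys, and values
output, weights = scaled_dot_product_attention(X, X, X)

for token, row in zip(tokens, weights):
    print(token, np.round(row, 2))
```

Row `i` of `weights` shows how much token `i` draws on every token in the sequence. With these invented vectors, "car" and "brakes" weight each other more heavily than either weights "the", because their vectors overlap.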


##### Please explain what the inputs and outputs (random values here) would represent in a real use of attention, and illustrate your answer with an example.

Your answer here