# Setup

To set up an Anaconda environment for implementing the Transformer model in PyTorch, follow these steps:

---

### **1. Create a New Conda Environment**
Open a terminal and run:
```bash
conda create --name attention-is-all-you-need python=3.12
```

---

### **2. Activate the Environment**
```bash
conda activate attention-is-all-you-need
```

---

### **3. Install PyTorch**
For an NVIDIA GPU (CUDA 11.8 build):
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
For CPU (if you don’t have a compatible GPU):
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
Check that PyTorch is installed correctly:
```bash
python -c "import torch; print(torch.__version__)"
```

---

### **4. Install Essential Libraries**
```bash
pip install numpy pandas matplotlib tqdm
```
- `numpy`: Array and numerical operations
- `pandas`: Data handling (optional, useful for datasets)
- `matplotlib`: Visualization
- `tqdm`: Progress bars for training

---

### **5. Install NLP Libraries (If Needed)**
```bash
pip install transformers datasets tokenizers sentencepiece
```
- `transformers`: Pretrained models from Hugging Face (optional)
- `datasets`: NLP datasets from Hugging Face
- `tokenizers`: Efficient tokenization
- `sentencepiece`: Subword tokenization (BPE and unigram models); the original Transformer paper used byte-pair encoding and word-piece vocabularies

---

### **6. Install Jupyter Notebook (Optional)**
If you want to develop in Jupyter:
```bash
conda install jupyter
```
Then start Jupyter:
```bash
jupyter notebook
```

---

### **7. Verify Everything**
Run the following to ensure your environment is properly set up:
```python
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```
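
Because `torch.cuda.is_available()` can return `False` on a CPU-only install, it may help to choose the device once and reuse it. The snippet below is a minimal sketch of that pattern; the variable names `device` and `x` are illustrative and not part of the original steps.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device as a quick smoke test.
x = torch.randn(2, 3, device=device)
print("Using device:", device.type)
print("Tensor shape:", tuple(x.shape))
```

Models and input batches can later be moved to the same device with `.to(device)`.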

---

### **8. Save the Environment (Optional)**
To export your environment for reproducibility:
```bash
conda env export > environment.yml
```
To recreate it later:
```bash
conda env create -f environment.yml
```

---

# Start

In [5]:
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

PyTorch version: 2.5.1
CUDA available: False


In [6]:
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Optional, Tuple

In [8]:
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        """
        Initializes the embedding layer.

        Args:
            vocab_size (int): Number of unique tokens in the vocabulary.
            d_model (int): Dimension of the embedding vectors.
        """
        super().__init__()
        
        # Embedding layer that maps token indices to dense d_model-dimensional vectors.
        self.embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass for token embedding.

        Args:
            x (torch.Tensor): Tensor of shape (batch_size, seq_len) containing token indices.

        Returns:
            torch.Tensor: Tensor of shape (batch_size, seq_len, d_model) containing embedded representations.
        """
        # Look up the embedding vector for each token index.
        embedded = self.embedding(x)

        return embedded
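
In "Attention Is All You Need", the embedding output is additionally multiplied by the square root of `d_model`. The class above does not do this, so the following is only a sketch of how that scaling could be added. `ScaledTokenEmbedding` is a hypothetical name introduced here for illustration.

```python
import math

import torch
import torch.nn as nn


class ScaledTokenEmbedding(nn.Module):
    """Hypothetical variant that multiplies embeddings by sqrt(d_model)."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same lookup as before, followed by the constant scale factor.
        return self.embedding(x) * math.sqrt(self.d_model)


layer = ScaledTokenEmbedding(vocab_size=100, d_model=16)
tokens = torch.randint(0, 100, (4, 10))
scaled = layer(tokens)
print(scaled.shape)  # torch.Size([4, 10, 16])
```

Whether to apply the scaling is a design choice; it changes the relative magnitude of token embeddings and the positional encodings added afterwards.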


In [18]:
def run_tests():
    # Test Parameters
    vocab_size = 100
    d_model = 16
    batch_size = 4
    seq_len = 10

    # Create a sample input tensor
    test_input = torch.randint(0, vocab_size, (batch_size, seq_len))

    # Initialize TokenEmbedding
    embedding_layer = TokenEmbedding(vocab_size, d_model)

    # Test 1: Check Output Shape
    output = embedding_layer(test_input)
    assert output.shape == (batch_size, seq_len, d_model), f"Unexpected shape: {output.shape}"
    
    # Test 2: Ensure Output is a Tensor of Correct Type
    assert isinstance(output, torch.Tensor), "Output is not a tensor"
    assert output.dtype == torch.float32, f"Unexpected dtype: {output.dtype}"
    
    # Test 3: Check if the Same Token Index Maps to the Same Embedding
    index = torch.tensor([[5]])
    embedding_1 = embedding_layer(index)
    embedding_2 = embedding_layer(index)
    assert torch.allclose(embedding_1, embedding_2), "Embeddings should be identical for the same index"
    
    # Test 4: Check if Different Indices Give Different Embeddings
    index1 = torch.tensor([[5]])
    index2 = torch.tensor([[8]])
    embedding_1 = embedding_layer(index1)
    embedding_2 = embedding_layer(index2)
    assert not torch.allclose(embedding_1, embedding_2), "Different indices should have different embeddings"
    
    # Test 5: Check if Gradients are Computed
    loss = output.sum()
    loss.backward()
    assert embedding_layer.embedding.weight.grad is not None, "Gradients should not be None"
    assert embedding_layer.embedding.weight.grad.shape == (vocab_size, d_model), "Gradient shape mismatch"
    
    print("✅ All tests passed successfully!")

# Run all tests
run_tests()


✅ All tests passed successfully!


In [17]:
embedding_layer = TokenEmbedding(vocab_size=10, d_model=3)
embedding_layer(torch.tensor(5))

tensor([ 0.0233,  0.3579, -0.8141], grad_fn=<EmbeddingBackward0>)
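
The lookup above is equivalent to selecting rows of the embedding weight matrix. The short check below illustrates this using `nn.Embedding` directly so that it runs on its own; the seed and index values are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

idx = torch.tensor([[5, 2, 5]])  # shape (batch_size=1, seq_len=3)
out = embedding(idx)             # shape (1, 3, 3)

# Indexing the weight matrix with the same indices gives the same values.
print(torch.equal(out, embedding.weight[idx]))  # True
# Repeated token indices map to identical vectors.
print(torch.equal(out[0, 0], out[0, 2]))        # True
```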

In [None]:
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        """
        Initializes positional encoding.

        Args:
            d_model (int): Dimension of the embedding vectors.
            max_len (int): Maximum sequence length.
        """
        super().__init__()

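        # Build the (max_len, d_model) positional encoding matrix.
        # Assumption: the fixed sinusoidal scheme from "Attention Is All You Need".
        positions = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
        # Inverse frequencies 1 / 10000^(2i / d_model), one per sine/cosine pair.
        inv_freq = torch.pow(10000.0, -torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(positions * inv_freq)  # even dimensions
        pe[:, 1::2] = torch.cos(positions * inv_freq[: d_model // 2])  # odd dimensions
        # Register as a buffer: it follows .to(device) but is not a trainable parameter.
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Adds positional encoding to the input embeddings.

        Args:
            x (torch.Tensor): Tensor of shape (batch_size, seq_len, d_model) containing input embeddings.

        Returns:
            torch.Tensor: Tensor of shape (batch_size, seq_len, d_model) with positional encodings added.
        """
        # Add the first seq_len rows; broadcasting covers the batch dimension.
        return x + self.pe[: x.size(1)]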
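
The `PositionalEncoding` docstring asks for a matrix of shape `(max_len, d_model)` but does not name a scheme. The sketch below assumes the fixed sine/cosine encoding from "Attention Is All You Need" and builds the table as a standalone function so that its values can be inspected. `sinusoidal_table` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import math

import torch


def sinusoidal_table(max_len: int, d_model: int) -> torch.Tensor:
    """Hypothetical helper: returns a (max_len, d_model) sinusoidal table."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    # Inverse frequencies 1 / 10000^(2i / d_model), one per sine/cosine pair.
    exponent = torch.arange(0, d_model, 2, dtype=torch.float32) / d_model
    inv_freq = torch.exp(-math.log(10000.0) * exponent)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * inv_freq)                   # even dimensions
    pe[:, 1::2] = torch.cos(position * inv_freq[: d_model // 2])   # odd dimensions
    return pe


pe = sinusoidal_table(max_len=50, d_model=16)

# Adding the table to a batch of embeddings relies on broadcasting over the batch.
x = torch.zeros(4, 10, 16)        # (batch_size, seq_len, d_model)
out = x + pe[: x.size(1)]         # (4, 10, 16)
print(pe.shape, out.shape)
```

At position 0 every sine entry is 0 and every cosine entry is 1, which gives a quick sanity check on the table.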
