<a href="https://colab.research.google.com/github/kapilsh/gpt-oss-scratch/blob/main/gpt_oss_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GPT OSS Architecture

![GPT OSS Architecture](./resources/AI%20Knowledge%20Bank.jpg)

Let's build these components one by one. For some of them we only sketch how the module works; where PyTorch already provides an implementation, we will use that version in the end.

## RMS Norm

Paper: [Root Mean Square Layer Normalization](https://arxiv.org/pdf/1910.07467)

![RMS Norm](./resources/rms_norm.png)

PyTorch already provides RMSNorm as `torch.nn.RMSNorm`, but we can write a sample implementation and check it against the PyTorch one.
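For reference, here is the computation written out, as a sketch that matches the sample implementation in this notebook. The symbols are assumptions for illustration: $d$ is the embedding dimension, $g$ is the learnable gain (`self.weight`), and $\epsilon$ is the small stability constant (`eps`), which implementations add under the square root.

$$
\mathrm{RMS}(x) = \sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon},
\qquad
y_i = \frac{x_i}{\mathrm{RMS}(x)}\, g_i
$$

Unlike LayerNorm, there is no mean subtraction and no bias term; the input is only rescaled.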

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F


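class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales the last dimension, no mean-centering."""

    def __init__(self, embedding_dimension: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps  # added under the square root for numerical stability
        self.embedding_dimension = embedding_dimension
        self.weight = nn.Parameter(torch.ones(embedding_dimension))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mean of squares over the embedding (last) dimension
        mean_square = x.pow(2).mean(dim=-1, keepdim=True)
        # Divide by the RMS (via rsqrt), then apply the gain
        return (x * torch.rsqrt(mean_square + self.eps)) * self.weight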

In [3]:
torch.manual_seed(123)

example_batch = torch.randn(2, 3, 4)

rms_norm = RMSNorm(embedding_dimension=example_batch.shape[-1])
rmsnorm_pytorch = torch.nn.RMSNorm(example_batch.shape[-1], eps=1e-5)

assert torch.allclose(rms_norm(example_batch), rmsnorm_pytorch(example_batch))
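Beyond matching PyTorch's output, two properties of RMSNorm are easy to check by hand. The sketch below is only illustrative: `manual_rms_norm` is a hypothetical helper (not part of the notebook or of PyTorch) that applies the same formula with a unit gain. It checks that each normalized vector has an RMS of roughly 1, and that rescaling the input by a positive constant leaves the output essentially unchanged.

```python
import torch


def manual_rms_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Hypothetical helper: RMSNorm with the gain fixed to 1 (no learnable weight)
    mean_square = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(mean_square + eps)


torch.manual_seed(123)
x = torch.randn(2, 3, 4)

y = manual_rms_norm(x)

# Property 1: every vector along the last dimension has RMS close to 1
out_rms = y.pow(2).mean(dim=-1).sqrt()
unit_rms_ok = torch.allclose(out_rms, torch.ones_like(out_rms), atol=1e-2)

# Property 2: scaling the input by a positive constant barely changes the output
# (it is exact only when eps is zero, so we compare with a tolerance)
y_scaled = manual_rms_norm(10.0 * x)
scale_invariant_ok = torch.allclose(y, y_scaled, atol=1e-2)

print(unit_rms_ok, scale_invariant_ok)
```

Both checks use a loose tolerance because `eps` shifts the result slightly away from the ideal values.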