# **Understanding AI Systems for Cybersecurity: How Does a Large Language Model (LLM) Work?**

*Disclaimer: This article is made for educational purposes in the context of cybersecurity and does not replace a real training or course on deep learning. What's more, I've sometimes used ChatGPT or Claude to briefly explain a specific concept because I consider them to be better at it than I am.*

This notebook follows on from the first part of an article, which you can read here: [Medium Article](https://medium.com/@loicmartins/understanding-ai-systems-for-cybersecurity-how-a-large-language-model-llm-works-part-1-c9bbac1728c2)

The aim of these two articles is to understand the overall architecture of an LLM, rather than the complete infrastructure, in order to identify potential vulnerabilities. In the last article, I said that the key is to understand that an LLM encodes representations of the world and that its outputs are based on these representations. If you use ChatGPT, Claude or Le Chat, you'll see that they rely on (almost) the same technology, but that the three models behave differently.

In the context of cybersecurity, the threat can be different: of course, we can find vulnerabilities in the overall infrastructure, but also within the model itself. If you're trying to exploit vulnerabilities in a web application or network, the aim may be to shut down the entire system or steal data and information. But for an LLM, it can be completely different: you can modify the output of the model or “spy on the input”.

Modifying the output means modifying the model's internal representations. This can be compared to manipulating a person: by talking to them, you try to change their state of mind so that they change their behavior. And that can be very dangerous, because a manipulated model could be released on the market and influence people for various purposes: politics, scams, obtaining information...

That's why, in the context of cybersecurity, it's important to understand how LLMs work. The level of knowledge of their inner workings you need will depend on your objectives (see part 1).

In this article, I will show you the code behind the GPT-2 model and comment on every step.

The code in this section is inspired by three main resources:
- Video - Andrej Karpathy - Let's build GPT: from scratch, in code, spelled out - https://www.youtube.com/watch?v=kCc8FmEb1nY - Jan 17, 2023
- Book - Sebastian Raschka - Build a Large Language Model (From Scratch) - October 29, 2024
- Course - Mike X Cohen - A deep understanding of deep learning - https://www.udemy.com/course/deeplearning_x/



---



## **The spine of a GPT-2 Model**

Here is the `GPTModel` class that represents the spine of the model. When you see a number, like `(1)`, you can read the explanations below the code.

Let's take a deep dive into the code, written in Python using the PyTorch library.

*Disclaimer: we won’t talk about the programming or software engineering part, just the architecture of the model*.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Note: TransformerBlock (see (9)) and the custom LayerNorm class must be
# defined before a GPTModel object can be created.
class GPTModel(nn.Module):

    # (1) 2 Parts

    def __init__(self, config):
        super().__init__()

        # (2) Embedding layer
        self.tok_emb = nn.Embedding(config["vocab_size"], config["emb_dim"])
        self.pos_emb = nn.Embedding(config["context_length"], config["emb_dim"])

        # (3) Dropout
        self.drop_emb = nn.Dropout(config["drop_rate"])

        # (4) Transformer Blocks
        self.trf_blocks = nn.Sequential(
            *[TransformerBlock(config) for _ in range(config["n_layers"])])

        # (5) Normalization Layer
        self.final_norm = LayerNorm(config["emb_dim"])

        # (6) Output layer
        self.out_head = nn.Linear(config["emb_dim"], config["vocab_size"], bias=False)

    def forward(self, in_idx):

        # in_idx: tensor of token IDs, shape (batch_size, seq_len)
        batch_size, seq_len = in_idx.shape

        # (7) Embedding Layer
        tok_embeds = self.tok_emb(in_idx) #1 -> (batch_size, seq_len, emb_dim)
        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device)) #2 -> (seq_len, emb_dim)
        x = tok_embeds + pos_embeds #3 -> (batch_size, seq_len, emb_dim)

        # (8) Regularization Layer
        x = self.drop_emb(x)

        # (9) Transformer Block
        x = self.trf_blocks(x)

        # (8) Normalization Layer
        x = self.final_norm(x)

        # (10) Output Layer
        logits = self.out_head(x)  # (batch_size, seq_len, vocab_size)

        return logits
```
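The constructor reads its hyperparameters from a `config` dictionary that the article does not show. Here is a sketch of such a dictionary, assuming the sizes of the smallest GPT-2 model (124M parameters) and the key names used in the code; the `GPT_CONFIG_124M` name, `drop_rate` and `qkv_bias` values are assumed choices, not values from this article. It also computes the size of the two embedding matrices created by the constructor.

```python
# Hypothetical configuration: the sizes match the smallest GPT-2 model (124M parameters);
# drop_rate and qkv_bias are assumed training choices, not values from this article
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # number of tokens in the GPT-2 vocabulary
    "context_length": 1024,  # maximum number of input tokens
    "emb_dim": 768,          # size of each embedding vector
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout probability (assumed)
    "qkv_bias": False,       # bias in the query/key/value projections (assumed)
}

# Number of values stored in the two embedding matrices created in __init__
tok_params = GPT_CONFIG_124M["vocab_size"] * GPT_CONFIG_124M["emb_dim"]
pos_params = GPT_CONFIG_124M["context_length"] * GPT_CONFIG_124M["emb_dim"]
print(f"{tok_params:,} + {pos_params:,} parameters")  # 38,597,376 + 786,432 parameters
```

Once `TransformerBlock`, `LayerNorm` and the other helper classes are defined, the model could then be created with `model = GPTModel(GPT_CONFIG_124M)`.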


### **(1) 2 Parts**

In our GPT model, we have two parts:

- `__init__`
    * The constructor of the model class.
    * Defines the layers and components of the model.
    * Used to initialize all the model components.

- `forward`
    * Defines the forward pass of the model.
    * Describes how the input data flows through the layers that were defined in `__init__`.

The first part corresponds to the instantiation of a `GPTModel` object. When you run `model = GPTModel(config)`, here's what happens:

## **First Part - Constructor**

### **(2) Embedding layer**

As a reminder, this layer transforms the input (for example, a question in human language that has already been split into tokens and converted into token IDs) into an embedding matrix: each token ID is represented as an n-dimensional vector.

This layer initializes two matrices with random values (specific initialization methods exist):

*   `tok_emb`
  *   Captures the various features of each token.
  *   Rows: one row per token ID, i.e., one row per token in the model's vocabulary.
  *   Columns: `emb_dim` columns of numbers, so that each token is represented by an n-dimensional vector.

*   `pos_emb`
  *   Encodes the position of the token in the sentence.
  *   Rows: one row per position, up to the context length (the maximum number of tokens the model can process at once).
  *   Columns: the same number of columns as `tok_emb`, so that the two embeddings can be added.

More information here: [Less Wrong - Article - LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space](https://www.lesswrong.com/posts/pHPmMGEMYefk9jLeh/llm-basics-embedding-spaces-transformer-token-vectors-are#1_2_The_Input_Embedding_Matrix)
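To make the two matrices concrete, here is a minimal sketch with hypothetical toy sizes (the smallest GPT-2 model uses `vocab_size=50257`, `context_length=1024` and `emb_dim=768`). It shows that an `nn.Embedding` layer is just a weight matrix whose rows are looked up by index.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes, much smaller than GPT-2's
vocab_size, context_length, emb_dim = 10, 6, 4

torch.manual_seed(123)
tok_emb = nn.Embedding(vocab_size, emb_dim)      # one row per token ID in the vocabulary
pos_emb = nn.Embedding(context_length, emb_dim)  # one row per position in the context

print(tok_emb.weight.shape)  # torch.Size([10, 4])
print(pos_emb.weight.shape)  # torch.Size([6, 4])

# Looking up token ID 3 simply returns row 3 of the (randomly initialized) weight matrix
row = tok_emb(torch.tensor([3]))[0]
print(torch.equal(row, tok_emb.weight[3]))  # True
```

During training, these rows are updated like any other weights, which is how the vectors end up encoding what the model has learned about each token and position.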


### **(3) Dropout**

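It's a regularization method used to prevent overfitting and improve generalization: during training, each value is set to zero with a given probability (the drop rate), and the remaining values are scaled up to compensate. We initialize the method with a specific drop rate. Dropout is only active during training; it is disabled when the model is used for inference.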

More information here: [DeepLearning AI - Video - Understanding Dropout (C2W1L07)](https://www.youtube.com/watch?v=ARq74QuavAo)
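A minimal sketch of this behavior with PyTorch's `nn.Dropout`, assuming a hypothetical drop rate of 50% to make the effect easy to see: in training mode, kept values are rescaled by `1 / (1 - p)`, and in evaluation mode the layer returns its input unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)
drop = nn.Dropout(p=0.5)  # hypothetical drop rate of 50%
x = torch.ones(2, 6)

drop.train()              # training mode: dropout is active
y_train = drop(x)
print(y_train)            # each value is either 0 (dropped) or 1 / (1 - 0.5) = 2 (kept and rescaled)

drop.eval()               # evaluation/inference mode: dropout does nothing
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True
```

Calling `model.train()` or `model.eval()` on a whole `GPTModel` switches every dropout layer inside it at once.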

### **(4) Transformer Blocks**

This layer initializes the transformer blocks. A list comprehension (a compact for loop) creates `n_layers` `TransformerBlock` objects, and `nn.Sequential` chains them so that the output of one block becomes the input of the next.

As you know, it's the most important part of the model. This layer allows the model to:
*   Transform input elements into enhanced context vector representations that incorporate information about the other inputs (for GPT-2, the preceding ones).
*   Identify and analyze relationships between elements in the input sequence.

This part of the model is very complex, and we have several other classes hidden behind the `TransformerBlock()` class. To summarize:


*   Inside this class we have 3 main layers:
  *   MultiHeadAttention
  *   FeedForward
  *   LayerNorm
*   When the `self.trf_blocks` object is instantiated using the `TransformerBlock()` class, the model initializes different weight matrices with random values (specific methods exist) inside these different layers.


If you want to see the code of the `TransformerBlock()` class, see section (9) in the second part of this article.

More information here:


*   [3Blue1Brown - Video - Transformers (how LLMs work) explained visually | DL5](https://www.youtube.com/watch?v=wjZofJX0v4M)
*   [3Blue1Brown - Video - How might LLMs store facts | DL7](https://www.youtube.com/watch?v=9-Jl0dxWQs8)
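The stacking itself can be sketched without the full transformer code. In this minimal example, `nn.Linear` is only a hypothetical stand-in for `TransformerBlock` (like a transformer block, it maps a `(batch, tokens, emb_dim)` tensor to a tensor of the same shape); it shows that `nn.Sequential` applies the blocks one after the other and that each block has its own weights.

```python
import torch
import torch.nn as nn

n_layers, emb_dim = 3, 8  # hypothetical toy values

torch.manual_seed(123)
# nn.Linear is a stand-in for TransformerBlock: both keep the (batch, tokens, emb_dim) shape
blocks = nn.Sequential(*[nn.Linear(emb_dim, emb_dim) for _ in range(n_layers)])

x = torch.randn(2, 5, emb_dim)
out = blocks(x)  # same as blocks[2](blocks[1](blocks[0](x)))
print(len(blocks), out.shape)  # 3 torch.Size([2, 5, 8])

# Each block is initialized independently, so the weights differ from block to block
print(torch.equal(blocks[0].weight, blocks[1].weight))  # False
```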



### **(5) Normalization Layer**

It's a normalization method that keeps the activations stable across different layers.
We initialize a `final_norm` object using the custom `LayerNorm()` class (code not in this article).

More information here: [PyTorch - Documentation - LayerNorm](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)
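The author's `LayerNorm` class is not shown in this article. As an illustration only, here is a minimal sketch assuming it follows the standard formulation: normalize each token vector over its embedding dimensions, then apply a learnable scale and shift. With its default initialization, it gives the same result as PyTorch's built-in `nn.LayerNorm`.

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Minimal layer normalization over the last (embedding) dimension."""

    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps                                   # avoids division by zero
        self.scale = nn.Parameter(torch.ones(emb_dim))   # learnable, starts at 1
        self.shift = nn.Parameter(torch.zeros(emb_dim))  # learnable, starts at 0

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        norm_x = (x - mean) / torch.sqrt(var + self.eps)
        return self.scale * norm_x + self.shift

torch.manual_seed(123)
x = torch.randn(2, 3, 6) * 10 + 5       # activations with a large mean and spread
out = LayerNorm(6)(x)
print(out.mean(dim=-1))                  # close to 0 for every token
print(out.var(dim=-1, unbiased=False))   # close to 1 for every token
print(torch.allclose(out, nn.LayerNorm(6)(x), atol=1e-5))  # True
```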

### **(6) Output layer**

A linear layer that converts the transformer output (a sequence of vectors) into logits: for each position, one value per token in the vocabulary. The logits are the unnormalized scores of the next token; a softmax turns them into probabilities.

**In the context of cybersecurity, the embedding layer and the transformer blocks seem to be the most critical, as they contain the model's representations.**



---



## **Second Part - Forward Method**

We'll now analyze the second part of the GPT model class, the `forward` method.
When you run the model, whether for training or for generating text, you pass the input as a parameter (token IDs, because the text has been tokenized beforehand), and here's what happens next:

### **(7) Embedding Layer**

This first step is divided into three sub-steps:

1.   We take the `tok_emb` matrix (one row per token in the vocabulary) and select the rows corresponding to the token IDs of the input. We obtain `tok_embeds`, a matrix with one row per input token (the sequence length, which is at most the context length).
2.   We take the `pos_emb` matrix (one row per position, up to the context length) and select the rows 0 to `seq_len - 1`: row 0 for the first token ID in the sequence, row 1 for the second, and so on.
3.   We add these two matrices together and obtain a matrix with one row per input token and one column per embedding dimension (the number of dimensions we've chosen). When the input is a batch of sequences, the same positional embeddings are added to every sequence of the batch.
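These three sub-steps can be reproduced in isolation. A minimal sketch with hypothetical toy sizes and token IDs, mirroring lines `#1`, `#2` and `#3` of the `forward` method; note how PyTorch broadcasting adds the `(seq_len, emb_dim)` positional matrix to every sequence of the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)
vocab_size, context_length, emb_dim = 10, 6, 4   # hypothetical toy sizes
tok_emb = nn.Embedding(vocab_size, emb_dim)
pos_emb = nn.Embedding(context_length, emb_dim)

in_idx = torch.tensor([[1, 5, 2],                 # batch of 2 sequences,
                       [7, 7, 0]])                # 3 token IDs each
batch_size, seq_len = in_idx.shape

tok_embeds = tok_emb(in_idx)                 # #1 -> (2, 3, 4): one vector per token
pos_embeds = pos_emb(torch.arange(seq_len))  # #2 -> (3, 4): rows 0, 1 and 2 of pos_emb
x = tok_embeds + pos_embeds                  # #3 -> (2, 3, 4): broadcast over the batch
print(tok_embeds.shape, pos_embeds.shape, x.shape)

# Both occurrences of token 7 share a token embedding but get different positions
print(torch.equal(tok_embeds[1, 0], tok_embeds[1, 1]))  # True
print(torch.equal(x[1, 0], x[1, 1]))                    # False
```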


### **(8) Regularization and Normalization**

These two layers do different things, but both rely on simple mathematical operations that make training more stable and improve results.

*   **Normalization:** for each token, subtract the mean and divide by the standard deviation (square root of the variance), both computed across the embedding dimensions. Then, the result is scaled and shifted using two learnable parameter vectors.
*   **Regularization (dropout):** at training time, drop units (along with their connections) with a specified probability p.
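For reference, the two operations can be written out as follows, using their standard formulations. The symbols are introduced here for illustration: $x$ is a token vector with $d$ dimensions, $\gamma$ and $\beta$ are the learnable scale and shift, $\epsilon$ is a small constant for numerical stability, and $p$ is the drop rate.

$$
\mu = \frac{1}{d}\sum_{j=1}^{d} x_j,\qquad
\sigma^2 = \frac{1}{d}\sum_{j=1}^{d} (x_j - \mu)^2,\qquad
\mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta
$$

$$
\mathrm{Dropout}(x)_j = \frac{m_j\, x_j}{1 - p},\qquad m_j \sim \mathrm{Bernoulli}(1 - p)\quad \text{(training only)}
$$

At inference time, dropout simply returns its input unchanged.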

### **(9) Transformer Block**

In the `GPTModel` class, each element of `self.trf_blocks` is an instance of the `TransformerBlock()` class, whose code looks like this:

```python
class TransformerBlock(nn.Module):

    def __init__(self, cfg):

        super().__init__()

        # (9.1) Multi-Head Attention Layer
        self.att = MultiHeadAttention(
            d_in=cfg["emb_dim"],
            d_out=cfg["emb_dim"],
            context_length=cfg["context_length"],
            num_heads=cfg["n_heads"],
            dropout=cfg["drop_rate"],
            qkv_bias=cfg["qkv_bias"])

        # (9.2) Feed Forward Layer
        self.ff = FeedForward(cfg)

        # (9.3) Regularization and Normalization
        self.norm1 = LayerNorm(cfg["emb_dim"])
        self.norm2 = LayerNorm(cfg["emb_dim"])
        self.drop_shortcut = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):

        # (9.5) Forward Pass

        # (9.4) Shortcut connection around the attention sub-block
        shortcut = x
        x = self.norm1(x)
        x = self.att(x)
        x = self.drop_shortcut(x)
        x = x + shortcut

        # (9.4) Shortcut connection around the feed forward sub-block
        shortcut = x
        x = self.norm2(x)
        x = self.ff(x)
        x = self.drop_shortcut(x)
        x = x + shortcut

        return x  # same shape as the input: (batch_size, seq_len, emb_dim)
```



The transformer is a specific architecture and is not only used in LLMs. It's important to note that, in the case of GPT-2, we only have the decoder part of the Transformer architecture (not the encoder part).

This block is divided into three main parts:

1.   Multi-Head Attention Layer
2.   Feed Forward Layer
3.   Regularization and Normalization

**(9.1) Multi-Head Attention Layer**

This layer is a custom class represented by the code below:

```python
class MultiHeadAttention(nn.Module):

    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):

        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"

        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads

        # (a) Query, key and value weight matrices
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)

        # (b) Output projection and dropout
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)

        # (d) Causal mask: 1s above the diagonal mark the "future" tokens
        self.register_buffer("mask", torch.triu(torch.ones(context_length, context_length), diagonal=1))

    def forward(self, x):

        b, num_tokens, d_in = x.shape

        # (a) Project the input, then split the last dimension into num_heads heads
        keys = self.W_key(x)
        queries = self.W_query(x)
        values = self.W_value(x)

        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)

        # (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
        keys = keys.transpose(1, 2)
        queries = queries.transpose(1, 2)
        values = values.transpose(1, 2)

        # (c) Attention scores: (b, num_heads, num_tokens, num_tokens)
        attn_scores = queries @ keys.transpose(2, 3)

        # (d) Hide the future tokens
        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
        attn_scores.masked_fill_(mask_bool, -torch.inf)

        # (e) Scaled softmax -> attention weights
        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        # (f) Context vectors, with the heads merged back: (b, num_tokens, d_out)
        context_vec = (attn_weights @ values).transpose(1, 2)
        context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
        context_vec = self.out_proj(context_vec)  # optional projection

        return context_vec
```

As you can see, the code is longer and more complex. I can't go into all the details in this blog post, but the attention mechanism is fundamental to the LLM architecture.
The attention mechanism allows the model to assess and learn the relationships and dependencies between the various parts of the input itself. In other words, it lets the model take the context into account, not just each word on its own.

To sum up, here are the main parts of the code:

*   (a) = three weight matrices (query, key and value) representing the foundations of the attention layer. Each matrix has a specific role, enabling the model to identify specific features. Their outputs are then split into `num_heads` heads of `head_dim` dimensions each, so that each head can focus on different relationships.
*   (b) = output and dropout layers, as we saw previously ((6) and (3)).
*   (c) = compute the attention scores (dot products between queries and keys), which measure how relevant a given token (a word, a subword...) in the input sequence is to another token.
*   (d) = causal attention mask. We want the self-attention mechanism to consider only the tokens that appear before the current position when predicting the next token in a sequence. We don't want later words to influence earlier words, so we hide the next word(s) by setting their scores to `-inf`, which become 0 after the softmax.
*   (e) = compute the attention weights: the attention scores, divided by the square root of the head dimension and normalized with a softmax so that each row sums to 1.
*   (f) = calculate the context vectors, which are the output of the self-attention layer: weighted sums of the value vectors. Each one is an embedding vector enriched with information from the other elements of the sequence (for GPT-2, the preceding ones).
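Steps (c) to (f) can be followed on a single attention head. This is a minimal sketch with hypothetical toy sizes and random queries, keys and values (no batch dimension and no dropout, unlike the class above); it shows that the mask produces zeros above the diagonal and that every row of attention weights sums to 1.

```python
import torch

torch.manual_seed(123)
num_tokens, head_dim = 4, 8                      # hypothetical toy sizes
queries = torch.randn(num_tokens, head_dim)
keys = torch.randn(num_tokens, head_dim)
values = torch.randn(num_tokens, head_dim)

# (c) attention scores: one score for every (query token, key token) pair
attn_scores = queries @ keys.T                   # (4, 4)

# (d) causal mask: True above the diagonal = "future" tokens to hide
mask = torch.triu(torch.ones(num_tokens, num_tokens), diagonal=1).bool()
attn_scores = attn_scores.masked_fill(mask, -torch.inf)

# (e) scale by sqrt(head_dim), then softmax so that each row sums to 1
attn_weights = torch.softmax(attn_scores / head_dim**0.5, dim=-1)
print(attn_weights)                              # zeros above the diagonal

# (f) context vectors: weighted sums of the value vectors
context_vec = attn_weights @ values              # (4, 8)
```

The first token can only attend to itself, so its context vector is simply its own value vector.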

To illustrate the role of this layer, we can say that before the attention layer, the model represents the different words separately, and after this layer, it can encode the relationships between the different words in the sequence.

More information here: [3Blue1Brown - Video - Attention in transformers, step-by-step | DL6](https://www.youtube.com/watch?v=eMlx5fFNoYc)
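The `view` and `transpose` calls in the `forward` method are what turn one large attention into several heads. A minimal sketch with hypothetical toy sizes, where a random tensor stands in for the output of `self.W_key(x)`: each head receives its own slice of `head_dim` features, and merging the heads back as in step (f) restores the original layout.

```python
import torch

b, num_tokens, d_out, num_heads = 2, 5, 12, 3    # hypothetical toy sizes
head_dim = d_out // num_heads                    # 4 dimensions per head

torch.manual_seed(123)
keys_flat = torch.randn(b, num_tokens, d_out)    # stands in for self.W_key(x)

# Split the last dimension into heads, then move the head axis forward
keys = keys_flat.view(b, num_tokens, num_heads, head_dim).transpose(1, 2)
print(keys.shape)  # torch.Size([2, 3, 5, 4]) = (b, num_heads, num_tokens, head_dim)

# Head 1 sees features 4 to 7 of every token
print(torch.equal(keys[:, 1], keys_flat[..., head_dim:2 * head_dim]))  # True

# Merging the heads back, as in step (f), restores the original layout
merged = keys.transpose(1, 2).contiguous().view(b, num_tokens, d_out)
print(torch.equal(merged, keys_flat))  # True
```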

**(9.2) Feed Forward Layer**

After the Multi-Head Attention Layer, we have a Feed Forward Layer. This layer can be thought of as a “basic” neural network layer (multilayer perceptron). In short, it takes the output of the multi-head attention layer and projects it into a space four times as large (`4 * emb_dim`), enabling exploration of a richer representation space, before projecting it back to the embedding dimension.

```python
class FeedForward(nn.Module):

    def __init__(self, cfg):
        super().__init__()

        self.layers = nn.Sequential(

            # (a) Expand from emb_dim to 4 * emb_dim features
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            # (b) Non-linear activation
            GELU(),
            # (a) Project back to emb_dim
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )

    def forward(self, x):
        return self.layers(x)
```

We have two kinds of layers:
*   (a) = Linear layers: the first expands the embedding dimension to increase the number of features; the second projects the result back to the embedding dimension.
*   (b) = Activation function (GELU): a mathematical function applied to the output of each neuron. It introduces non-linearity into the model.

More information here: [NVIDIA - Article - Linear/Fully-Connected Layers User's Guide](https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html)
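The `FeedForward` class uses a `GELU()` class that the article does not show. Here is a minimal sketch, assuming it implements the tanh approximation of GELU used by the original GPT-2 (PyTorch's built-in `nn.GELU(approximate="tanh")` computes the same function), followed by a check of the shapes through a feed forward layer built like `FeedForward`, with a hypothetical `emb_dim` of 8.

```python
import math
import torch
import torch.nn as nn

class GELU(nn.Module):
    """GELU activation, tanh approximation."""

    def forward(self, x):
        return 0.5 * x * (1 + torch.tanh(
            math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))

x = torch.linspace(-3, 3, 7)
print(GELU()(x))  # negative inputs are pushed towards 0, large positive inputs pass almost unchanged
print(torch.allclose(GELU()(x), nn.GELU(approximate="tanh")(x), atol=1e-5))  # True

# Shape flow through a feed forward layer built like the FeedForward class
emb_dim = 8  # hypothetical toy value
ff = nn.Sequential(nn.Linear(emb_dim, 4 * emb_dim), GELU(), nn.Linear(4 * emb_dim, emb_dim))
tokens = torch.randn(2, 5, emb_dim)
hidden = ff[1](ff[0](tokens))  # after the expansion and the activation
print(hidden.shape)            # torch.Size([2, 5, 32])
print(ff(tokens).shape)        # torch.Size([2, 5, 8])
```

Because the output has the same shape as the input, the feed forward result can be added back to its input by the shortcut connection.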

**(9.3) Regularization and Normalization**

Same layers as those we saw previously (see (3), (5) and (8)).

**(9.4) Shortcut connection**

Like regularization and normalization, the shortcut connection is a method that improves training. It adds the input of a sub-block (here, the attention part or the feed forward part) to that sub-block's output (`x = x + shortcut` in the code). The information can therefore skip the sub-block, which allows information and gradients to flow more easily through the many layers of the network.

More information here: [Analytics Vidhya - Article - What are Skip Connections in Deep Learning?](https://www.analyticsvidhya.com/blog/2021/08/all-you-need-to-know-about-skip-connections/)
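A minimal sketch of why this helps gradients, using an exaggerated hypothetical case: a linear layer whose weights are all zero, so that no gradient can flow back through it. With a shortcut, the identity path still carries a gradient of 1 to the input.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
nn.init.zeros_(layer.weight)  # hypothetical "dead" layer: its output ignores the input
nn.init.zeros_(layer.bias)

torch.manual_seed(123)
x = torch.randn(1, 4, requires_grad=True)

# Without a shortcut: the gradient cannot flow back through the dead layer
layer(x).sum().backward()
grad_plain = x.grad.clone()
x.grad = None

# With a shortcut: x + layer(x) keeps an identity path for the gradient
(x + layer(x)).sum().backward()
grad_shortcut = x.grad.clone()

print(grad_plain)     # tensor([[0., 0., 0., 0.]])
print(grad_shortcut)  # tensor([[1., 1., 1., 1.]])
```

In a real network, the weights are not zero, but gradients can still shrink as they pass through many layers; the shortcut guarantees a direct path.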

**(9.5) Forward Pass**

As in our `GPTModel` class, the forward pass function:

1.   Takes the input matrix.
2.   Normalizes it (in GPT-2, the normalization is applied before each sub-block).
3.   Passes it through the Multi-Head Attention Layer, which outputs context vectors.
4.   Applies dropout.
5.   Adds the input of the first part of the function (shortcut connection).
6.   Normalizes the result.
7.   Passes it through the Feed Forward Layer.
8.   Applies dropout again.
9.   Adds the input of the second part of the function (second shortcut connection).

*“The output of a Transformer block in an LLM is a sequence of contextualized embeddings, where each token’s representation incorporates information from other tokens in the sequence, allowing the model to capture long-range dependencies and complex relationships between words or subwords.” ChatGPT.*


The output of the transformer block is either:
*   used as input to the next transformer block, or
*   (for the last block) passed through the final normalization layer and then used as input to the output layer.


### **(10) Output Layer**

This linear layer is similar to the other linear layers, but in this case, it transforms the contextualized embeddings into logits: for each position, one logit per vocabulary token, i.e., an unnormalized score for that token being the next token in the sequence. A softmax turns these scores into probabilities.
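A minimal sketch of how these logits can be turned into a next-token prediction, using hypothetical hand-written logits with the same shape as the output of `self.out_head(x)`. It uses greedy decoding (always pick the most likely token), which is only one possible strategy; sampling with a temperature or top-k is also common.

```python
import torch

# Hypothetical logits for a batch of 1 sequence of 3 tokens and a 5-token vocabulary,
# i.e. the shape of self.out_head(x): (batch_size, seq_len, vocab_size)
logits = torch.tensor([[[0.1, 0.2, 0.3, 0.4, 0.5],
                        [1.0, 0.0, 0.0, 0.0, 0.0],
                        [0.5, 2.0, 0.1, -1.0, 0.3]]])

last_logits = logits[:, -1, :]               # the last position predicts the *next* token
probs = torch.softmax(last_logits, dim=-1)   # unnormalized scores -> probabilities
next_token_id = torch.argmax(probs, dim=-1)  # greedy decoding: most likely token
print(next_token_id)  # tensor([1])
```

To generate text, the predicted token ID is appended to the input and the model is run again, one token at a time.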


It’s the end of the code!

## What to keep in mind?

The aim of this post was to give a first glimpse of what an LLM looks like. As I said, it's only a first look because:
*   GPT-2 is “old” now, and more complex and powerful models exist.
*   We didn't talk about training with backpropagation, fine-tuning or hyperparameters.

The second aim of this post is to show that LLMs hide a mysterious part: the embedding and weight matrices, which enclose the "mind" of LLMs. It sounds “crazy”, but LLMs represent artificial cognition. It's not human cognition, but it's still a type of cognition. So, if we look at history and the world today, we can see how dangerous it can be to manipulate human cognition. That's why, in cybersecurity, we need people who understand AI, because malicious actors can, for example:
*   Train their own model to talk like Hitler.
*   Access and try to modify the inner workings of an LLM to change its outputs. Think of a mental health app, a financial chatbot or a programming co-pilot.

This is just the beginning, but a new form of hacking could emerge in the years to come. It won't just be about finding vulnerabilities in infrastructures, but perhaps also hacking into the mind of LLMs and AGI.