# LoRA for Sentiment Analysis
📗 You can find an interactive Colab version of this tutorial [here](https://colab.research.google.com/github/ndif-team/nnsight/blob/main/docs/source/notebooks/tutorials/LoRA_tutorial.ipynb).


[Low-Rank Adaptation (LoRA)](https://github.com/microsoft/LoRA) is a technique for fine-tuning large language models efficiently. Rather than updating all of the model's weights, LoRA freezes the pretrained weights and trains two small, low-rank matrices. Multiplying the two matrices yields an update with the same shape as the original pretrained weight matrix. This update is added to the frozen weights (or, equivalently, to the layer's output) to produce the fine-tuned behavior.
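In the notation of the LoRA paper (these symbols are an illustration and are not used elsewhere in this tutorial), the idea can be written as:

$$
W' = W_0 + \Delta W, \qquad \Delta W = BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Only $A$ and $B$ are trained while $W_0$ stays frozen, so the number of trainable parameters for this weight drops from $dk$ to $r(d + k)$.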

![TRAIN FIGURE](images/LoRA_tutorial_figure_1.png)

<br>
<br>


Fine-tuning with LoRA belongs to the [Parameter-Efficient Fine-Tuning (PEFT)](https://github.com/huggingface/) family because it keeps the original model unchanged and introduces only a small number of new parameters. In this tutorial, the LoRA is attached to the last multilayer perceptron (MLP) layer of the model, and only the LoRA's parameters are trained on a task-specific dataset.


![TEST FIGURE](images/LoRA_tutorial_figure_2.png)

# Setup

Make sure you have obtained your [NDIF API key](https://login.ndif.us/) and configured your workspace for [remote execution](https://nnsight.net/notebooks/features/remote_execution/).

In [None]:
from IPython.display import clear_output
!pip install nnsight

clear_output()

In [None]:
!pip install --upgrade transformers torch
!pip install pyarrow==15.0.2
!pip install datasets
!pip install -U bitsandbytes

clear_output()

In [None]:
from nnsight import CONFIG

CONFIG.set_default_api_key('YOUR API KEY HERE')

!huggingface-cli login --token YOUR_HF_TOKEN_HERE # <- Copy your Hugging Face token here
clear_output()



Here are the imports needed for this tutorial.

In [None]:
import torch
import torch.nn as nn
import pandas as pd
from nnsight import LanguageModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModelForCausalLM
from transformers import TrainingArguments, Trainer
from torch.utils.data import DataLoader, Subset
from datasets import load_dataset

# Prepare Data

For this tutorial we will use the Stanford Sentiment Treebank (SST-2). It consists of sentences from movie reviews with human annotations of their sentiment. The task is to predict whether a given sentence is positive or negative. In the dataset, each negative sentence is labeled 0 and each positive sentence is labeled 1.


In [None]:
# GLUE is a standard Natural Language Processing (NLP) benchmark made up of several
# language-understanding tasks. SST-2 is its sentiment analysis task.
dataset = load_dataset("glue", "sst2")

# 0 = neg, 1 = pos
def label_to_str(example):
    """Return the string label ('positive' or 'negative') for a single example."""
    return 'positive' if example['label'] == 1 else 'negative'

# Build (sentence, label string) pairs for the train and validation splits
train_data = [(example['sentence'], label_to_str(example)) for example in dataset['train']]
validation_data = [(example['sentence'], label_to_str(example)) for example in dataset['validation']]


Next, we need to tokenize our data. Tokenizing converts text into a sequence of integer token IDs, which is the numerical representation a language model takes as input.
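To make padding and truncation concrete, here is a minimal, dependency-free sketch. The whitespace tokenizer, the `toy_vocab` mapping, and the `pad_id`/`unk_id` values are all hypothetical stand-ins for illustration; a real tokenizer uses subword units and its own vocabulary. The sketch only shows how every sentence ends up as a fixed-length list of IDs together with an attention mask.

```python
def toy_tokenize(sentence, vocab, max_length, pad_id=0, unk_id=1):
    """Whitespace tokenizer with truncation and right-padding (illustrative only)."""
    ids = [vocab.get(word, unk_id) for word in sentence.lower().split()]
    ids = ids[:max_length]               # truncation: drop tokens past max_length
    attention_mask = [1] * len(ids)      # 1 marks a real token
    n_pad = max_length - len(ids)
    ids = ids + [pad_id] * n_pad         # padding: fill up to max_length
    attention_mask = attention_mask + [0] * n_pad  # 0 marks padding
    return {"input_ids": ids, "attention_mask": attention_mask}

# Hypothetical vocabulary; IDs 0 and 1 are reserved for padding and unknown words.
toy_vocab = {"a": 2, "charming": 3, "and": 4, "often": 5, "affecting": 6, "journey": 7}
sentence = "a charming and often affecting journey"

padded = toy_tokenize(sentence, toy_vocab, max_length=10)
truncated = toy_tokenize(sentence, toy_vocab, max_length=3)

print(padded["input_ids"])       # [2, 3, 4, 5, 6, 7, 0, 0, 0, 0]
print(padded["attention_mask"])  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(truncated["input_ids"])    # [2, 3, 4]
```

The attention mask lets the model ignore the padded positions, which is why it is carried alongside the token IDs.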

In [None]:
tokenizer = AutoTokenizer.from_pretrained('openai-community/gpt2', add_prefix_space=True)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize a batch of sentences with the tokenizer loaded above, padding or truncating each to 10 tokens
def tokenize_function(examples):
    return tokenizer(examples['sentence'], padding='max_length', truncation=True, max_length=10, return_tensors='pt')

# Use .map() to apply the tokenization function to all of the training data.
tokenized_train_dataset = dataset['train'].map(tokenize_function, batched=True, batch_size=10)
# Copy the 'label' column into a 'labels' column alongside the tokenized inputs
tokenized_train_dataset = tokenized_train_dataset.map(lambda x: {'input_ids': x['input_ids'], 'attention_mask': x['attention_mask'], 'labels': x['label']})



# Prepare our Model

For this tutorial we will be using the [Llama 3.1 70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) language model.


In [None]:
# Use the LanguageModel wrapper class to load in the Llama model
model_name = "meta-llama/Meta-Llama-3.1-70B"
model = LanguageModel(model_name, device_map='auto')


At this point the model still has its original architecture. During fine-tuning, the LoRA will be attached to the last MLP layer of the model, where it adds a low-rank correction to that layer's output.

<br>
<br>

We're going to train a very simple LoRA that, when applied, will make our model determine whether a sentence expresses a positive or a negative sentiment.

In [None]:
from nnsight.envoy import Envoy

# We will define a LORA class.
# The operations in the LORA's __call__ method are traced just like any other operations inside a .trace.
class LORA(nn.Module):
    def __init__(self, module: Envoy, dim: int, r: int) -> None:
        """Init.

        Args:
            module (Envoy): Which model Module we are adding the LORA to.
            dim (int): Dimension of the layer we are adding to (This could potentially be auto populated if the user scanned first so we know the shape)
            r (int): Inner dimension of the LORA
        """
        super(LORA, self).__init__()
        self.r = r
        self.module = module
        # WB starts at zero, so the LORA initially leaves the module's output unchanged.
        self.WA = torch.nn.Parameter(torch.randn(dim, self.r), requires_grad=True).save()
        self.WB = torch.nn.Parameter(torch.zeros(self.r, dim), requires_grad=True).save()

    # The __call__ method defines how to actually apply the LORA.
    # It adds the LORA's contribution to the output the module produces in its forward pass.
    def __call__(self, alpha: float = 1.0):
        """Call.

        Args:
            alpha (float, optional): How much to apply the LORA. Can be altered after training for inference. Defaults to 1.0.
        """

        # Project the module's first positional arg (the hidden states) down to r dimensions with WA
        A_x = torch.matmul(self.module.input, self.WA)
        # Project back up to dim dimensions with WB
        BA_x = torch.matmul(A_x, self.WB)

        # LORA is additive: scale its contribution by alpha and add it to the module's output
        # Could also have been self.module.output[:] = ..., for in-place
        self.module.output = self.module.output + alpha * BA_x

    def parameters(self):
        # Return the trainable LORA parameters for the optimizer.
        return [self.WA, self.WB]
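The arithmetic inside the class can be checked outside of nnsight. Below is a minimal sketch in plain Python with tiny, made-up matrices (the `matmul` and `lora_output` helpers and all numeric values are hypothetical, not part of nnsight or of this tutorial's model). It shows that with `WB` initialized to zero the module's output is unchanged, and that a non-zero `WB` shifts it by the low-rank term `x @ WA @ WB`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_output(x, module_out, WA, WB):
    """Return module_out + x @ WA @ WB, computed element by element."""
    delta = matmul(matmul(x, WA), WB)
    return [[o + d for o, d in zip(o_row, d_row)]
            for o_row, d_row in zip(module_out, delta)]

x = [[1.0, 2.0, 3.0]]             # one hidden state, dim = 3
module_out = [[0.5, -0.5, 1.0]]   # stand-in for the MLP's own output
WA = [[0.5], [0.25], [0.0]]       # dim x r, with r = 1
WB_zero = [[0.0, 0.0, 0.0]]       # r x dim, zero initialization
WB_trained = [[1.0, 0.0, -1.0]]   # hypothetical values after training

unchanged = lora_output(x, module_out, WA, WB_zero)
shifted = lora_output(x, module_out, WA, WB_trained)

print(unchanged)  # [[0.5, -0.5, 1.0]]
print(shifted)    # [[1.5, -0.5, 0.0]]
```

Starting from a zero `WB` means training begins from the pretrained model's exact behavior and only moves away from it as the LoRA learns.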

# LLM Fine Tuning


In [None]:
# Inner LORA dimension
lora_dim = 4

# Module to train LORA on
# Accesses the last mlp layer of the model
module = model.model.layers[-1].mlp

We can use the `.scan()` method to get the shape of the module without having to fully run the model.

In [None]:
with model.scan(" "):
    dim = module.output.shape[-1]

print(dim)


8192
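As a rough sense of scale, the following sketch compares the number of trainable LoRA parameters with the number of weights in the MLP it is attached to. The value `intermediate = 28672` is read from the `gate_proj`, `up_proj`, and `down_proj` shapes in the model printout shown later in this tutorial; the variable names are chosen here for illustration.

```python
dim = 8192            # hidden size reported by the scan
lora_dim = 4          # inner LoRA dimension chosen above
intermediate = 28672  # MLP inner width, from the model printout

# WA is (dim x lora_dim) and WB is (lora_dim x dim)
lora_params = dim * lora_dim + lora_dim * dim

# gate_proj, up_proj and down_proj each hold dim * intermediate weights (no bias)
mlp_params = 3 * dim * intermediate

print(lora_params)                # 65536
print(mlp_params)                 # 704643072
print(mlp_params // lora_params)  # 10752
```

Under these numbers the LoRA trains roughly one parameter for every ten thousand weights in the layer it modifies.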


In [None]:
# The LORA object itself isn't transmitted to the server. Only the forward / call method.
# The parameters are created remotely and are never sent, only retrieved.
with model.session(remote=True) as session:

    dataset = tokenized_train_dataset

    # Smaller chunks to run faster, feel free to increase
    indices = list(range(0, 5000))
    subset = Subset(dataset, indices)


    # Create a dataloader from it.
    dataloader = DataLoader(subset, batch_size=10)

    # Create our LORA on the last mlp and apply it to the model
    lora = LORA(module, dim, lora_dim)

    # Create an optimizer. Use the parameters from LORA
    optimizer = torch.optim.AdamW(lora.parameters(), lr=3)

    # Iterate over dataloader using .iter.
    with session.iter(dataloader, return_context=True) as (batch, iterator):

        # Access the sentences, each of which expresses a positive or negative sentiment
        prompt = batch['sentence']

        # Access the sentiment labels (0 = negative, 1 = positive), used as the target token ids
        correct_token = batch['label']


        # Run .trace with prompt
        with model.trace(prompt) as tracer:


            # Apply LORA to intervention graph just by calling it within .trace
            # This invokes the __call__() method of the LORA class defined above
            lora()


            # Get logits
            # Logits are the unnormalized scores the model assigns to each
            # vocabulary token, before softmax is applied.
            logits = model.lm_head.output


            # Do cross entropy on last predicted token and correct_token
            loss = torch.nn.functional.cross_entropy(logits[:, -1], correct_token)

            # Call backward
            loss.backward()


        # Call methods on optimizer. Graphs that aren't from .trace (so in this case session and iterator both have their own graph) are executed sequentially.
        # The Graph of Iterator here will be:
        # 1.) Index batch at 'sentence' for prompt
        # 2.) Index batch at 'label' for correct_token
        # 3.) Execute the .trace using the prompt
        # 4.) Call .step() on optimizer
        optimizer.step()
        # 5.) Call .zero_grad() on optimizer
        optimizer.zero_grad()
        # 6.) Print out the lora WA weights to show they are indeed changing
        iterator.log(lora.WA)
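The loss used in the loop above can be illustrated without a model. The sketch below implements softmax cross-entropy in plain Python for a single position; the four-token vocabulary and the logit values are hypothetical. Because the labels 0 and 1 are used directly as target token ids, the loss is low when the last position's logits favor the token id that matches the label.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    return log_sum_exp - logits[target]

# Hypothetical last-position logits over a 4-token vocabulary.
# Token id 0 has the highest score, i.e. the model leans "negative".
last_token_logits = [2.0, 0.5, -1.0, 0.0]

loss_if_negative = cross_entropy(last_token_logits, target=0)  # label 0 = negative
loss_if_positive = cross_entropy(last_token_logits, target=1)  # label 1 = positive

print(loss_if_negative < loss_if_positive)  # True
```

Minimizing this loss over many batches pushes the LoRA to raise the logit of token id 0 for negative sentences and token id 1 for positive ones.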



INFO:nnsight_remote:a61583b9-75eb-4f84-b103-dd6c1b3ba32a - LOG: Parameter containing:
tensor([[-1.4609,  0.8828,  0.3320,  0.0106],
        [ 3.3281, -0.1050,  1.3281,  2.9062],
        [ 1.9844, -0.1611,  0.3496,  1.0938],
        ...,
        [-1.7109,  0.3262, -1.0625, -2.0469],
        [ 1.5391,  0.9219,  0.8750,  1.9531],
        [ 2.0156,  1.1953,  1.9453,  2.1562]], requires_grad=True)
2024-10-08 15:05:50,269 a61583b9-75eb-4f84-b103-dd6c1b3ba32a - LOG: Parameter containing:
tensor([[-1.4688,  0.9023,  0.3398,  0.0243],
        [ 3.2969, -0.0067,  1.3750,  2.9219],
        [ 1.9453, -0.1621,  0.3398,  1.0781],
        ...,
        [-1.6562,  0.2432, -1.1016, -2.0469],
        [ 1.4844,  0.9180,  0.8398,  1.9453],
        [ 1.9766,  1.0859,  1.8203,  2.1406]], requires_grad=True)

In [None]:
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 8192)
    (layers): ModuleList(
      (0-79): 80 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=8192, out_features=8192, bias=False)
          (k_proj): Linear(in_features=8192, out_features=1024, bias=False)
          (v_proj): Linear(in_features=8192, out_features=1024, bias=False)
          (o_proj): Linear(in_features=8192, out_features=8192, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=8192, out_features=28672, bias=False)
          (up_proj): Linear(in_features=8192, out_features=28672, bias=False)
          (down_proj): Linear(in_features=28672, out_features=8192, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((8192,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((8192,), eps=1e-05)
      )
    )
    (n

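The printed architecture is the same as the original model's: the LoRA is not a new module inside the model but an intervention applied to the output of the last MLP layer (`model.model.layers[-1].mlp`) whenever `lora()` is called inside a trace. The evidence that the LoRA is being trained is the logged `WA` weights above, which change from one step to the next.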


Now it is time to test out whether our fine tuned model is able to predict the sentiment of a given sentence.

In [None]:
# With the LoRA applied. Will output "negative".
with model.generate("I'm upset", remote=True) as generator:
    lora()
    out = model.lm_head.output.save()

# Take the most likely token id at each position.
token_ids = out.argmax(dim=-1)

# During training the labels 0 (negative) and 1 (positive) were used as target token ids,
# so count how often each of them is predicted.
count_positive = (token_ids == 1).sum().item()
count_negative = (token_ids == 0).sum().item()

# Determine the overall sentiment of the entire sentence
if count_positive > count_negative:
    print("\nPrediction with LoRA: Positive\n")
else:
    print("\nPrediction with LoRA: Negative\n")

# Then without the LoRA. The model will try to complete the sentence rather than
# output the sentiment.

with model.generate("I'm upset", remote=True) as generator:
    out = model.lm_head.output.save()

print("\nPrediction without LoRA:", model.tokenizer.decode(out.argmax(dim=-1)[0]))
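The counting rule in the cell above can be tried on plain lists. The sketch below is a variation rather than the tutorial's exact logic: the helper name `sentiment_from_token_ids` and the sample ID lists are hypothetical, and it reports ties (including the case where neither label token appears) as "undetermined", whereas the cell above falls back to "Negative".

```python
def sentiment_from_token_ids(token_ids):
    """Map predicted token ids to a sentiment by counting label tokens 0 and 1."""
    count_positive = sum(1 for t in token_ids if t == 1)
    count_negative = sum(1 for t in token_ids if t == 0)
    if count_positive == count_negative:
        return "undetermined"  # assumption: make ties explicit instead of defaulting
    return "positive" if count_positive > count_negative else "negative"

print(sentiment_from_token_ids([0, 0, 1]))  # negative
print(sentiment_from_token_ids([1, 1, 0]))  # positive
print(sentiment_from_token_ids([5, 9]))     # undetermined
```

Making the tie case explicit helps distinguish a model that really predicts the negative label from one that never predicts either label token.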

Prediction with LoRA: Negative



Prediction without LoRA: Question have a that



