<a href="https://colab.research.google.com/github/githubpradeep/notebooks/blob/main/LLM_TinyStories_Compression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install rouge
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git
# Additional dependencies (datasets is already installed above)
!pip install rouge-score tensorboard py7zr

In [None]:
!pip install einops

Collecting einops
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Installing collected packages: einops
Successfully installed einops-0.6.1


In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
torch.set_default_device('cuda')


In [5]:
torch.set_default_device('cuda')
model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M", trust_remote_code=True, torch_dtype="auto")
# torch_dtype applies to model weights, not to the tokenizer, so it is omitted here
tokenizer = AutoTokenizer.from_pretrained("roneneldan/TinyStories-33M", trust_remote_code=True)
inputs = tokenizer('''once upon a time''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


once upon a time, there was a little girl named Lily. She loved to play outside in the sunshine. One day, she saw a big, scary dog. The dog barked and growled at her. Lily was scared and ran away.

Later that day, Lily's mom asked her to help with the laundry. Lily was happy to help and started to sort the clothes. She found a shirt that was too small for her, but she was proud to find it.

As the sun started to set, Lily's mom called her inside. She took off her wet clothes and put them in the dryer. Lily was happy to be inside where it was warm and cozy. She went to bed that night feeling proud of herself for helping with the laundry.
<|endoftext|>


In [6]:
from dataclasses import dataclass

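@dataclass
class LowRankConfig:
    # number of singular values kept for each decomposed weight matrix
    rank: int
    # name suffixes of the linear layers to replace (e.g. "k_proj")
    target_modules: list[str]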

In [7]:
model

GPTNeoForCausalLM(
  (transformer): GPTNeoModel(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(2048, 768)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-3): 4 x GPTNeoBlock(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPTNeoAttention(
          (attention): GPTNeoSelfAttention(
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (resid_dropout): Dropout(p=0.0, inplace=False)
            (k_proj): Linear(in_features=768, out_features=768, bias=False)
            (v_proj): Linear(in_features=768, out_features=768, bias=False)
            (q_proj): Linear(in_features=768, out_features=768, bias=False)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPTNeoMLP(
          (c_fc): Linear(in_features=768, out_features=3072, bias=True)
          (c_proj): Linear(in_feat

In [43]:
# Low-rank decomposition of the self-attention key, value, and query projection matrices
config = LowRankConfig(
    rank=384,
    target_modules=["k_proj", "v_proj", "q_proj"]
)
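For a sense of what rank 384 means for storage, the sketch below compares the number of values in a dense 768 x 768 projection with the number needed to store a rank-r factorization as two matrices. It assumes the singular values are folded into one of the two factors; the helper names `dense_params`, `factored_params`, and `break_even_rank` are hypothetical and not part of the notebook.

```python
def dense_params(out_features: int, in_features: int) -> int:
    """Values stored by a dense weight matrix (bias ignored)."""
    return out_features * in_features


def factored_params(out_features: int, in_features: int, rank: int) -> int:
    """Values stored by two factors of shape (out, rank) and (rank, in)."""
    return rank * (out_features + in_features)


def break_even_rank(out_features: int, in_features: int) -> float:
    """Rank at which the factored form stores as many values as the dense one."""
    return out_features * in_features / (out_features + in_features)


# Shapes taken from the printed model: Linear(768, 768, bias=False)
for rank in (64, 128, 256, 384):
    print(rank, factored_params(768, 768, rank), dense_params(768, 768))
print("break-even rank:", break_even_rank(768, 768))
```

Under these assumptions the two-factor form only saves storage when the rank is below 384 for a 768 x 768 matrix; at rank 384 the two counts are equal.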

In [44]:
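from torch import nn
from torch.nn import functional as F


class LowRankLayer(nn.Module):
    """Approximate a linear layer's weight with a truncated SVD of the given rank."""

    def __init__(self, rank, full_rank_layer):
        super().__init__()
        self.rank = rank

        # W = U @ diag(S) @ Vh; keep only the top `rank` singular triplets.
        U, S, Vh = torch.linalg.svd(full_rank_layer.weight.detach().float())
        # Note: these are plain tensors, not nn.Parameter or buffers, so they are
        # not returned by named_parameters() and are not part of state_dict().
        self.U = U[:, :self.rank]
        self.S = torch.diag(S[:self.rank])
        self.Vh = Vh[:self.rank, :]

    def forward(self, x):
        # Rebuild the dense (out_features x in_features) approximation on each call.
        # Any bias on the original layer is ignored; the targeted k/v/q projections have none.
        approx_weight_matrix = self.U @ self.S @ self.Vh
        return F.linear(x, approx_weight_matrix)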
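The layer above rebuilds the full dense matrix on every forward pass. As a point of comparison, here is a minimal sketch of a variant that keeps the two factors as registered parameters and applies them as two smaller matrix multiplications. The class name `FactoredLowRankLinear` and its attributes `A` and `B` are hypothetical; this is one possible design under those assumptions, not the notebook's method.

```python
import torch
from torch import nn
from torch.nn import functional as F


class FactoredLowRankLinear(nn.Module):
    """Hypothetical variant: stores W ~= A @ B as two registered parameters."""

    def __init__(self, rank: int, full_rank_layer: nn.Linear):
        super().__init__()
        weight = full_rank_layer.weight.detach().float()  # (out, in)
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Fold the singular values into the left factor: A = U_r @ diag(S_r)
        self.A = nn.Parameter(U[:, :rank] * S[:rank])   # (out, rank)
        self.B = nn.Parameter(Vh[:rank, :].clone())     # (rank, in)
        if full_rank_layer.bias is None:
            self.bias = None
        else:
            self.bias = nn.Parameter(full_rank_layer.bias.detach().clone().float())

    def forward(self, x):
        # x @ B^T gives (..., rank); the second linear maps it to (..., out)
        return F.linear(F.linear(x, self.B), self.A, self.bias)


# Toy check on a small layer: at full rank the output matches the dense layer.
torch.manual_seed(0)
dense = nn.Linear(8, 6, bias=False)
layer = FactoredLowRankLinear(6, dense)
x = torch.randn(3, 8)
print(torch.allclose(layer(x), dense(x), atol=1e-4))
print(sum(p.numel() for p in layer.parameters()))
```

Because `A` and `B` are `nn.Parameter` objects, they appear in `named_parameters()` and `state_dict()`, so parameter counts and saved checkpoints reflect the factored storage.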


In [45]:
# Return the parent module, the target module, and the target's attribute name for a dotted module path
def get_submodules(model, key):
    parent = model.get_submodule(".".join(key.split(".")[:-1]))
    target_name = key.split(".")[-1]
    target = model.get_submodule(key)
    return parent, target, target_name

# Set a (possibly nested) attribute given a dotted path; used to swap a target layer for its low-rank replacement
def recursive_setattr(obj, attr, value):
    attr = attr.split('.', 1)
    if len(attr) == 1:
        setattr(obj, attr[0], value)
    else:
        recursive_setattr(getattr(obj, attr[0]), attr[1], value)
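To see how dotted module paths resolve, the sketch below applies the same idea to a small stand-in model. `ToyBlock` and `ToyModel` are hypothetical classes made up for illustration, and `recursive_setattr` is re-implemented here so the block is self-contained.

```python
from torch import nn


def recursive_setattr(obj, attr, value):
    """Set obj.a.b.c = value given the dotted path "a.b.c"."""
    head, _, rest = attr.partition(".")
    if rest:
        recursive_setattr(getattr(obj, head), rest, value)
    else:
        setattr(obj, head, value)


class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(4, 4, bias=False)
        self.out_proj = nn.Linear(4, 4)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.h = nn.ModuleList([ToyBlock(), ToyBlock()])


toy = ToyModel()
# named_modules() yields dotted names such as "h.0.q_proj"
targets = [name for name, _ in toy.named_modules() if name.endswith(".q_proj")]
for name in targets:
    recursive_setattr(toy, name, nn.Identity())

print(targets)
print(toy)
```

The names are collected into a list before any replacement so that the module tree is not modified while it is being traversed.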


In [46]:
import copy
model_lr = copy.deepcopy(model)


In [47]:
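# Walk the original model and write the replacements into the copy (model_lr),
# so the module tree being iterated is never modified.
for key, module in model.named_modules():
    target_module_found = any(key.endswith("." + target_key) for target_key in config.target_modules)
    if target_module_found:
        low_rank_layer = LowRankLayer(config.rank, module)
        # replace the target layer in the copy with its low-rank version
        recursive_setattr(model_lr, key, low_rank_layer)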
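Before comparing generations, it can help to quantify how far a truncated matrix is from the original. The sketch below computes the relative Frobenius error of a rank-r truncation. It uses a random matrix as a stand-in, so the numbers say nothing about the actual model weights, and `truncation_error` is a hypothetical helper name.

```python
import torch


def truncation_error(weight: torch.Tensor, rank: int) -> float:
    """Relative Frobenius error of the rank-`rank` truncated SVD of `weight`."""
    W = weight.float()
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    approx = (U[:, :rank] * S[:rank]) @ Vh[:rank, :]
    return (torch.linalg.norm(W - approx) / torch.linalg.norm(W)).item()


torch.manual_seed(0)
W = torch.randn(64, 64)  # stand-in for a projection weight
for rank in (8, 32, 64):
    print(rank, truncation_error(W, rank))
```

The same function could be called on a layer's `weight` tensor to see how much of each targeted projection a given rank discards; the error equals the norm of the discarded singular values divided by the norm of all of them.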

In [48]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [49]:
print_trainable_parameters(model)

trainable params: 68514048 || all params: 68514048 || trainable%: 100.0


In [50]:
print_trainable_parameters(model_lr)

trainable params: 61436160 || all params: 61436160 || trainable%: 100.0
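When reading these counts, note that `named_parameters()` only reports tensors registered as `nn.Parameter`. The sketch below illustrates the difference with two hypothetical toy modules (`PlainTensorLayer` and `ParameterLayer`), one holding a plain tensor attribute, as `LowRankLayer` does for `U`, `S`, and `Vh`, and one holding a registered parameter.

```python
import torch
from torch import nn


class PlainTensorLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = torch.zeros(4, 4)  # plain attribute, not registered


class ParameterLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.zeros(4, 4))  # registered parameter


def count_parameters(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


print(count_parameters(PlainTensorLayer()))         # 0
print(count_parameters(ParameterLayer()))           # 16
print(list(PlainTensorLayer().state_dict().keys())) # []
print(list(ParameterLayer().state_dict().keys()))   # ['W']
```

A plain tensor attribute therefore contributes nothing to the parameter count or to `state_dict()`, even though it still occupies memory.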


In [55]:
# Fraction by which the reported parameter count dropped after the replacement
1 - 61436160 / 68514048

0.10330564616471061
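The difference between the two counts can be checked against the layer shapes printed earlier. The sketch below uses only numbers that appear in the outputs above; the variable names are made up for illustration.

```python
n_blocks = 4                 # "(0-3): 4 x GPTNeoBlock" in the printed model
n_targets = 3                # k_proj, v_proj, q_proj
per_projection = 768 * 768   # Linear(768, 768, bias=False)

removed = 68514048 - 61436160
dense_targets = n_blocks * n_targets * per_projection

print(removed, dense_targets)
print(removed == dense_targets)
```

The drop matches the dense size of the twelve replaced projections exactly, which is consistent with the replaced layers contributing no registered parameters at all rather than a smaller number of them.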

In [52]:
# from_pt is a loading option, not a save_pretrained argument, so it is dropped here
model.save_pretrained("model")


In [53]:
# Note: LowRankLayer keeps U, S, and Vh as plain tensors, so they are not written to this checkpoint
model_lr.save_pretrained("model_lr")


In [54]:
!ls -lh model/pytorch_model.bin

NotImplementedError: ignored

In [None]:
!ls -lh model_lr/pytorch_model.bin
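The `ls` calls above depend on a specific weights file name, which may differ between transformers versions (for example, `model.safetensors` instead of `pytorch_model.bin`). A minimal sketch of a file-name-independent comparison is to sum the sizes of everything in each output directory. `dir_size_bytes` and `relative_reduction` are hypothetical helper names.

```python
from pathlib import Path


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def relative_reduction(original: int, compressed: int) -> float:
    """Fraction of the original size that was saved."""
    return 1 - compressed / original


# Only report directories that exist, so the cell is safe to run at any point.
sizes = {name: dir_size_bytes(name) for name in ("model", "model_lr") if Path(name).is_dir()}
print(sizes)
if len(sizes) == 2 and sizes["model"] > 0:
    print(relative_reduction(sizes["model"], sizes["model_lr"]))
```

The totals include small config files as well as the weights, but those are present in both directories, so the comparison remains meaningful.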

In [19]:
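# Relative reduction from 2.7 to 2.1; the source of these two values is not shown in the notebook output
1 - 2.1 / 2.7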

0.2222222222222222

In [58]:
# Generate from the low-rank model with the same prompt used for the original model
# (torch, the tokenizer, and the default device were set up in earlier cells)
inputs = tokenizer('''once upon a time''', return_tensors="pt", return_attention_mask=False)

outputs = model_lr.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


once upon a time there was a little girl called Lucy. She was very curious and loved to explore.

One day, Lucy went to the park. She saw a big tree with a big, long rope. She wanted to climb it, so she did. She felt very brave and happy.

Suddenly, she heard a voice. It was a little bird. The bird said, "Be careful, Lucy! You must be careful when you climb the rope."


The bird said, "You're welcome. I'm glad you're careful. Now, let's go explore the park together."

Lucy and the bird had lots of fun playing together. They had a lot of fun.

The end.
<|endoftext|>


In [60]:
# Generate from the low-rank model with a second prompt
# (torch, the tokenizer, and the default device were set up in earlier cells)
inputs = tokenizer('''A goat was''', return_tensors="pt", return_attention_mask=False)

outputs = model_lr.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A goat was eating the grass. It was very hungry. It saw the children and started to run towards them. The children were scared and ran away. The goat was left alone. The goat was still hungry and looked for something else to eat. It was very hungry and sad.

The goat saw a small house with a garden. It went inside and saw a big bowl of food. It smelled something delicious inside the house. The goat was very happy and started to eat the food. It ate and ate until it was full. The goat ate all the food in the garden. It was very full and happy.

But then, the goat heard a loud noise. It turned around and saw a big storm coming. The goat was scared and tried to run away. But it was too late. The storm was too strong and the goat got hurt. The goat was very sad and wished it had never come to the house. It wished it had never left it alone.

