<a href="https://colab.research.google.com/github/syedwaseemjan/aim_exercises/blob/main/exercise_5/ai_makerspace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Task: Fine-tuning a GPT-style model using peft, transformers, and bitsandbytes

**❓Question #1:**
What makes Llama 3 8B Instruct a good model to use for a summarization task?

1. Pre-training on Diverse Data
      Like most large language models, Llama 3 8B is pre-trained on a broad and diverse corpus. This exposure to many kinds of content and text structures helps it summarize a wide range of topics.

2. Model Size and Capabilities
      With 8 billion parameters, Llama 3 8B has enough capacity to capture the context and nuance that effective summarization needs, and the Instruct variant is already tuned to follow prompts such as "summarize this".

3. Efficiency
      At 8 billion parameters the model is small enough to load in 4-bit precision on a single GPU, as this notebook does, while still producing summaries far faster than writing them by hand.

## Setting Up Dependencies

In [1]:
!pip install -qU bitsandbytes datasets accelerate loralib peft transformers trl


In [2]:
import torch
torch.cuda.is_available()

True

Time to import some dependencies!

In [3]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

# Task #1: Loading the Model

## Block-wise k-bit Quantization

In [5]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
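To make the heading concrete, here is a minimal sketch of block-wise absmax quantization in plain Python. It is an illustration under simplifying assumptions, not what bitsandbytes does internally: it uses evenly spaced signed integer levels instead of the NF4 code book, a toy block size of 4, and the hypothetical helper names `quantize_blockwise` and `dequantize_blockwise`.

```python
def quantize_blockwise(weights, block_size=4, bits=4):
    """Quantize each block independently with its own absmax scale."""
    levels = 2 ** (bits - 1) - 1  # 7 levels on each side of zero for 4 bits
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
        scales.append(scale)
        codes.extend(round(w / scale * levels) for w in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4, bits=4):
    """Map integer codes back to floats using the per-block scales."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scales[i // block_size] for i, c in enumerate(codes)]


# toy weights: one block of small values, one block containing an outlier
weights = [0.10, -0.20, 0.05, 0.40, 8.0, -4.0, 2.0, 1.0]
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
print(scales)
print(codes)
print(restored)
```

Each block gets its own scale, so the outlier `8.0` in the second block does not reduce the resolution available to the small weights in the first block. The per-block scales are the "quantization constants" that double quantization later compresses.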

## ❓Question #2:
What exactly is happening in the double quantization step?

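> Double Quantization, a method that quantizes the quantization constants, saving an average of about 0.37 bits per parameter (approximately 3 GB for a 65B model).

In other words, block-wise quantization stores one scaling constant per block of weights, and those constants take up memory of their own. With `bnb_4bit_use_double_quant=True`, the constants are themselves quantized to a lower precision, so the per-parameter overhead shrinks.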
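The 0.37 bits figure can be reproduced with a short calculation. The sketch below assumes the block sizes described in the QLoRA paper (64 weights per first-level block, FP32 constants, then 8-bit constants in second-level blocks of 256 with one FP32 constant each); these numbers are not stated in this notebook.

```python
block_size = 64       # weights sharing one first-level quantization constant (assumed)
second_block = 256    # first-level constants sharing one second-level constant (assumed)

# overhead per weight when every block stores one 32-bit constant
bits_without_dq = 32 / block_size

# overhead per weight when constants are stored in 8 bits,
# plus one 32-bit constant per group of 256 first-level constants
bits_with_dq = 8 / block_size + 32 / (block_size * second_block)

saved_bits = bits_without_dq - bits_with_dq
saved_gb_65b = saved_bits * 65e9 / 8 / 1e9  # bits -> bytes -> GB for 65B weights

print(f"without double quantization: {bits_without_dq:.3f} bits/param")
print(f"with double quantization:    {bits_with_dq:.3f} bits/param")
print(f"saved: {saved_bits:.3f} bits/param, about {saved_gb_65b:.2f} GB for a 65B model")
```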

In [6]:
model_id = "NousResearch/Meta-Llama-3-8B-Instruct"
# model_id = "syedwaseemjan/llama38binstruct_summarize"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map='auto',
)


In [7]:
tokenizer = AutoTokenizer.from_pretrained(model_id)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [8]:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Model Architecture

In [9]:
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): Ll

In [10]:
model.config

LlamaConfig {
  "_name_or_path": "NousResearch/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
 

## ❓Question #3:

![image](https://i.imgur.com/N8y2crZ.png)

Label the image with the appropriate layer from `NousResearch/Meta-Llama-3-8B-Instruct`'s architecture.

- Layer Norm:

  Normalization is applied at several points in the model. Llama uses RMSNorm in place of classic LayerNorm, so the following components fill this role:

  - `(input_layernorm): LlamaRMSNorm()`
  - `(post_attention_layernorm): LlamaRMSNorm()`
  - `(norm): LlamaRMSNorm()`

- Feed Forward:

   The feed-forward network is part of the LlamaMLP module, which consists of linear layers and activation functions:

  - `(mlp): LlamaMLP( (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False) (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False) (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False) (act_fn): SiLU() )`

- Masked Multi Self-Attention:

   This refers to the self-attention mechanism within the model:

   - `(self_attn): LlamaSdpaAttention( (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False) (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False) (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False) (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False) (rotary_emb): LlamaRotaryEmbedding() )`

- Text & Position Embed:

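   Input tokens are embedded by `embed_tokens`. Llama has no separate position-embedding table; position information is injected inside each attention block through rotary embeddings (`rotary_emb`):

  - `(embed_tokens): Embedding(128256, 4096)`
  - `(rotary_emb): LlamaRotaryEmbedding()`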

- Text Prediction:

   The final layer that makes the predictions based on the processed data is the language model head:

   - `(lm_head): Linear(in_features=4096, out_features=128256, bias=False)`
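The printed shapes also show that `k_proj` and `v_proj` are narrower (1024) than `q_proj` (4096). A small sketch, using only values from `model.config` above, shows where those widths come from; reading this as grouped-query attention is an interpretation that the notebook itself does not state.

```python
# values copied from the LlamaConfig printed above
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8

head_dim = hidden_size // num_attention_heads       # width of one attention head
q_out = num_attention_heads * head_dim              # q_proj out_features
kv_out = num_key_value_heads * head_dim             # k_proj / v_proj out_features
queries_per_kv_head = num_attention_heads // num_key_value_heads

print(head_dim, q_out, kv_out, queries_per_kv_head)
```

With 8 key/value heads shared across 32 query heads, each key/value head serves 4 query heads, which is why the key and value projections are a quarter of the query width.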

# Task #2: Data and Data Preparation

In [11]:
!git clone https://github.com/lauramanor/legal_summarization

Cloning into 'legal_summarization'...
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 31 (delta 2), reused 0 (delta 0), pack-reused 25
Receiving objects: 100% (31/31), 136.60 KiB | 752.00 KiB/s, done.
Resolving deltas: 100% (10/10), done.


In [12]:
import json

jsonl_array = []

with open('legal_summarization/tldrlegal_v1.json') as f:
  data = json.load(f)
  for key, value in data.items():
    jsonl_array.append(value)

In [13]:
from datasets import Dataset, load_dataset

legal_dataset = Dataset.from_list(jsonl_array)

In [14]:
legal_dataset

Dataset({
    features: ['doc', 'id', 'original_text', 'reference_summary', 'title', 'uid'],
    num_rows: 85
})

In [15]:
legal_dataset = legal_dataset.train_test_split(test_size=0.2)

In [16]:
legal_dataset_test_valid = legal_dataset["test"].train_test_split(test_size=0.5)

In [17]:
from datasets import DatasetDict

legal_dataset = DatasetDict({
    "train" : legal_dataset["train"],
    "test" : legal_dataset_test_valid["test"],
    "validation" : legal_dataset_test_valid["train"]
})

In [18]:
legal_dataset

DatasetDict({
    train: Dataset({
        features: ['doc', 'id', 'original_text', 'reference_summary', 'title', 'uid'],
        num_rows: 68
    })
    test: Dataset({
        features: ['doc', 'id', 'original_text', 'reference_summary', 'title', 'uid'],
        num_rows: 9
    })
    validation: Dataset({
        features: ['doc', 'id', 'original_text', 'reference_summary', 'title', 'uid'],
        num_rows: 8
    })
})
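The 68 / 9 / 8 row counts follow from how the two fractional splits round. The sketch below assumes `train_test_split` rounds the test share up and gives the remainder to the train side (the same convention scikit-learn uses); `split_sizes` is a hypothetical helper, not part of `datasets`.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (train_rows, test_rows) for a fractional test_size."""
    n_test = math.ceil(test_size * n_rows)  # test share is rounded up (assumed)
    return n_rows - n_test, n_test


# first split: 85 rows, 20% held out
train_rows, holdout_rows = split_sizes(85, 0.2)

# second split: the held-out rows are halved into validation ("train") and test
validation_rows, test_rows = split_sizes(holdout_rows, 0.5)

print(train_rows, validation_rows, test_rows)
```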

In [19]:
legal_dataset["train"][0]

{'doc': 'Google Play Game Services (May 15th, 2013)',
 'id': '546a72bb98d9d5a17e00040f',
 'original_text': 'for api clients that use their own avatar naming system in place of the user s google identity then you must make clear to users that their gameplay information will still be submitted to google and associated with their google identity and viewable within different google products.',
 'reference_summary': 'if using avatars usernames tell the user that their g identity will still be used by google.',
 'title': 'Privacy',
 'uid': 'legalsum67'}

In [41]:
legal_dataset['train'].to_pandas()

| | doc | id | original_text | reference_summary | title | uid |
|---|---|---|---|---|---|---|
| 0 | Google Play Game Services (May 15th, 2013) | 546a72bb98d9d5a17e00040f | for api clients that use their own avatar nami... | if using avatars usernames tell the user that ... | Privacy | legalsum67 |
| 1 | Minecraft End User Licence Agreement | 53cc5d0a09cc3f9e24000071 | contentif you make any content available on or... | if you make stuff available on or through the ... | | legalsum32 |
| 2 | Pokemon GO Terms of Service | 5786730a6cca83a54c0035b3 | the services and app may contain links to thir... | we might link to other people s websites but w... | Links to Third Party Websites or Resources | legalsum12 |
| 3 | Google Play Game Services (May 15th, 2013) | 546a72bb98d9d5a17e000410 | you shall not permit your api client to submit... | don t allow users to fake scores. no multiplay... | Gameplay Information | legalsum66 |
| 4 | Google Play Game Services (May 15th, 2013) | 546a72bb98d9d5a17e000414 | you agree to comply with the google platform d... | you must also abide by the developer content a... | Developer Content Policies | legalsum62 |
| ... | ... | ... | ... | ... | ... | ... |
| 63 | Minecraft End User Licence Agreement | 53cc5d0a09cc3f9e2400006f | any content you make available on our game mus... | anything you make available on our game must b... | | legalsum34 |
| 64 | Android SDK License Agreement (June 2014) | 543ed49a98d9d5a17e000267 | you agree that if you use the sdk to develop a... | protect users sensitive data and have an adequ... | 4.3 | legalsum52 |
| 65 | YouTube Terms of Service | 56f6efd267eca599140045bd | 13. assignmentthese terms of service and any r... | the jurisdiction is california. | | legalsum46 |
| 66 | YouTube Terms of Service | 56f6efd267eca599140045c6 | content is provided to you as is. you may acce... | you are not allowed to download videos. | | legalsum40 |


# Instruction Templating

## Activity #1: Creating the create_prompt function

In [20]:
INSTRUCTION_PROMPT_TEMPLATE = """\
YOUR PROMPT HERE"""

RESPONSE_TEMPLATE = """\
YOUR PROMPT HERE"""

In [21]:
def create_prompt(sample, include_response = True):
  """
  Parameters:
    - sample: dict representing row of dataset
    - include_response: bool

  Functionality:
    This function should build the Python str `full_prompt`.

    If `include_response` is true, it should include the summary -
    else it should not contain the summary (useful for prompting and testing)

  Returns:
    - full_prompt: str
  """

  full_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Please convert the following legal content into a human-readable summary<|eot_id|><|start_header_id|>user<|end_header_id|>

[LEGAL_DOC]
{sample["original_text"]}
[END_LEGAL_DOC]<|eot_id|>"""

  if include_response:
    full_prompt += f"""<|start_header_id|>assistant<|end_header_id|>

{sample["reference_summary"]}<|eot_id|>"""

  return full_prompt

In [22]:
print(create_prompt(legal_dataset["test"][1]))

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Please convert the following legal content into a human-readable summary<|eot_id|><|start_header_id|>user<|end_header_id|>

[LEGAL_DOC]
you agree that you will not remove obscure or alter any proprietary rights notices including copyright and trademark notices that may be affixed to or contained within the sdk.
[END_LEGAL_DOC]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

keep copyright and trademark notices intact.<|eot_id|>


In [23]:
print(create_prompt(legal_dataset["test"][1], include_response=False))

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Please convert the following legal content into a human-readable summary<|eot_id|><|start_header_id|>user<|end_header_id|>

[LEGAL_DOC]
you agree that you will not remove obscure or alter any proprietary rights notices including copyright and trademark notices that may be affixed to or contained within the sdk.
[END_LEGAL_DOC]<|eot_id|>
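The prompt layout printed above can also be built from a list of role/content messages. The following is a minimal sketch that mirrors the string format used in `create_prompt`; `format_llama3_chat` and its `add_generation_prompt` flag are hypothetical names introduced here for illustration.

```python
def format_llama3_chat(messages, add_generation_prompt=False):
    """Join role/content messages using the same special tokens as create_prompt."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # open the assistant turn so the model starts directly with its answer
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "Please convert the following legal content into a human-readable summary"},
    {"role": "user", "content": "[LEGAL_DOC]\nkeep all notices in place.\n[END_LEGAL_DOC]"},
]
print(format_llama3_chat(messages))
print(format_llama3_chat(messages, add_generation_prompt=True))
```

Without the generation prompt the output matches the `include_response=False` layout, where the model has to produce the assistant header itself. Opening the assistant turn explicitly is an alternative design that some inference setups prefer.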


In [24]:
def generate_response(prompt, model, tokenizer):
  """
  Parameters:
    - prompt: str representing formatted prompt
    - model: model object
    - tokenizer: tokenizer object

  Functionality:
    This will allow our model to generate a response to a prompt!

  Returns:
    - str response of the model
  """

  # convert str input into tokenized input
  encoded_input = tokenizer(prompt,  return_tensors="pt")

  # send the tokenized inputs to our GPU
  model_inputs = encoded_input.to('cuda')

  # generate response and set desired generation parameters
  generated_ids = model.generate(
      **model_inputs,
      max_new_tokens=256,
      do_sample=True,
      pad_token_id=tokenizer.eos_token_id
  )

  # decode output from tokenized output to str output
  decoded_output = tokenizer.batch_decode(generated_ids)

  # return only the generated response (not the prompt) as output
  return decoded_output[0].split("<|end_header_id|>")[-1]

In [25]:
legal_dataset["test"][1]

{'doc': 'Android SDK License Agreement (June 2014)',
 'id': '543ed49a98d9d5a17e000269',
 'original_text': 'you agree that you will not remove obscure or alter any proprietary rights notices including copyright and trademark notices that may be affixed to or contained within the sdk.',
 'reference_summary': 'keep copyright and trademark notices intact.',
 'title': '\u200b3.8',
 'uid': 'legalsum50'}

In [26]:
generate_response(create_prompt(legal_dataset["test"][1], include_response=False),
                  model,
                  tokenizer)

'\n\nHere is a human-readable summary:\n\nWhen using the software development kit (SDK), you agree to not remove or alter any notices that indicate the intellectual property rights, such as copyright and trademark notices, that are included in the SDK.<|eot_id|>'
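The returned string still carries leading newlines and the trailing `<|eot_id|>` token. If a clean summary is wanted for comparison with `reference_summary`, a small post-processing step along these lines could be used; `extract_summary` is a hypothetical helper and not part of the notebook.

```python
def extract_summary(decoded_text):
    """Keep the text after the last header and drop the end-of-turn token."""
    answer = decoded_text.split("<|end_header_id|>")[-1]
    return answer.replace("<|eot_id|>", "").strip()


# shortened stand-in for a decoded generation
raw = (
    "<|start_header_id|>assistant<|end_header_id|>"
    "\n\nkeep copyright and trademark notices intact.<|eot_id|>"
)
print(extract_summary(raw))
```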

### Let's try another just to see how the model responds to a different prompt.

In [27]:
legal_dataset["test"][3]

{'doc': 'Android SDK License Agreement (June 2014)',
 'id': '543ed49a98d9d5a17e00025f',
 'original_text': 'to the maximum extent permitted by law you agree to defend indemnify and hold harmless google its affiliates and their respective directors officers employees and agents from and against any and all claims actions suits or proceedings as well as any and all losses liabilities damages costs and expenses including reasonable attorneys fees arising out of or accruing from a your use of the sdk b any application you develop on the sdk that infringes any intellectual property rights of any person or defames any person or violates their rights of publicity or privacy and c any non compliance by you of the license agreement.',
 'reference_summary': 'don t blame google.',
 'title': '\u200b12.1',
 'uid': 'legalsum60'}

In [28]:
generate_response(create_prompt(legal_dataset["test"][3], include_response=False),
                  model,
                  tokenizer)

"\n\nHere's a human-readable summary:\n\nWhen you use the software development kit (SDK) or develop an application using the SDK, you agree to protect Google and its affiliates from any legal issues that may arise from your use. This includes:\n\n* Defending Google and its affiliates in any lawsuits or claims that result from your use of the SDK or your application, if it infringes on someone's intellectual property rights, defames someone, or violates their privacy or publicity rights.\n* Paying for any damages, losses, or expenses, including attorney fees, that Google and its affiliates may incur as a result of your use.\n\nIn other words, you're agreeing to take responsibility for any legal issues that may arise from your use of the SDK, and to protect Google and its affiliates from any resulting losses or damages.<|eot_id|>"

# Required Post Processing

In [29]:
from peft import prepare_model_for_kbit_training
model.config.use_cache = False
model = prepare_model_for_kbit_training(model)

# Task #3: Setting up PEFT LoRA

## Helper Function to Print Parameter %age

In [30]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [31]:
print_trainable_parameters(model)

trainable params: 0 || all params: 4540600320 || trainable%: 0.0
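The reported total of 4,540,600,320 is well below 8 billion. One explanation, offered here as an assumption about how the 4-bit weights are stored (two 4-bit values packed into each `uint8` element, consistent with `bnb_4bit_quant_storage: uint8` in the config), is that `numel()` counts only half of each quantized linear layer. The arithmetic below uses the shapes from the model summary.

```python
VOCAB, HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 128256, 4096, 14336, 1024, 32

# (in_features, out_features) of the Linear4bit layers in one decoder layer
linear_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

linear_total = LAYERS * sum(i * o for i, o in linear_shapes.values())
embeddings = 2 * VOCAB * HIDDEN          # embed_tokens + lm_head (not quantized)
norms = LAYERS * 2 * HIDDEN + HIDDEN     # two RMSNorm weights per layer + final norm

full_precision_total = linear_total + embeddings + norms
packed_total = linear_total // 2 + embeddings + norms  # two 4-bit values per element

print(f"unquantized parameter count: {full_precision_total:,}")
print(f"count with packed 4-bit linears: {packed_total:,}")
```

The packed figure matches the `all params` value printed above, which suggests the helper undercounts the quantized layers rather than the model having lost parameters.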


## Initializing LoRA Config

In [32]:
from peft import LoraConfig, get_peft_model

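# set our rank (higher rank means more trainable parameters and memory, and often, but not always, better quality)
lora_r = 16

# set our dropout for the LoRA layers (0.1 here; PEFT's own default is 0.0)
lora_dropout = 0.1

# rule of thumb: alpha should be (lora_r * 2), giving a scaling factor alpha / r of 2
lora_alpha = 32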

# construct our LoraConfig with the above hyperparameters
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM"
)

In [33]:
model = get_peft_model(
    model,
    peft_config
)

print_trainable_parameters(model)

trainable params: 41943040 || all params: 4582543360 || trainable%: 0.9152786281546499
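The trainable count can be checked by hand: each adapted linear layer gains a `lora_A` matrix of shape `(r, in_features)` and a `lora_B` matrix of shape `(out_features, r)`. The sketch below assumes LoRA is attached to the seven linear layers in every decoder layer and to nothing else.

```python
r = 16
LAYERS = 32

# (in_features, out_features) of the adapted layers in one decoder layer
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

# lora_A holds r * in_features values, lora_B holds out_features * r values
per_layer = sum(r * (i + o) for i, o in shapes.values())
trainable = LAYERS * per_layer

base_params = 4540600320  # "all params" reported before adding LoRA
percent = 100 * trainable / (base_params + trainable)

print(f"trainable params: {trainable:,} ({percent:.4f}%)")
```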


In [34]:
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4

## ❓Question #4:
What modules (or groupings of layers) did we apply LoRA to - and how can we tell from the model summary?

### Modules with LoRA Applied
1. Self-Attention Mechanism

 - Query Projection (q_proj)
 - Key Projection (k_proj)
 - Value Projection (v_proj)
 - Output Projection (o_proj)

LoRA has been applied to each of these projections. In the model summary each one appears as `lora.Linear4bit`, which keeps the original layer as `base_layer` and adds the LoRA-specific `lora_A` and `lora_B` matrices.

2. MLP (Feed-Forward Network)

 - Gate Projection (gate_proj)
 - Up Projection (up_proj)
 - Down Projection (down_proj)

As with self-attention, LoRA has been applied to these linear layers in the MLP. This is again indicated by `lora.Linear4bit` and its `lora_A` and `lora_B` components.

# Task #4: Training the Model

## Setting up Training

In [35]:
from trl import SFTConfig

max_seq_length = 1024

args = SFTConfig(
  output_dir = "llama38binstruct_summarize",
  #num_train_epochs=5,
  max_steps = 100, # comment out this line if you want to train in epochs
  per_device_train_batch_size = 1,
  warmup_steps = 3,
  logging_steps=10,
  #evaluation_strategy="epoch",
  eval_strategy="steps",
  eval_steps=25, # comment out this line if you want to evaluate at the end of each epoch
  learning_rate=2e-4,
  lr_scheduler_type='constant',
  dataset_kwargs={
        "add_special_tokens" : False,
        "append_concat_token" : False,
  },
  max_seq_length=max_seq_length,
  packing=True,
)

## ❓Question #5:
Describe what the following parameters are doing:

1. warmup_steps

  The **warmup_steps** parameter sets the number of training steps over which the learning rate ramps up linearly from 0 to the value given by learning_rate. Warmup stabilizes training by avoiding large updates at the very start. It only takes effect with schedulers that include a warmup phase (for example `constant_with_warmup`); the plain `constant` scheduler used in this configuration ignores it.

2. learning_rate

  The **learning_rate** parameter specifies the initial step size for updating the model weights. It determines how quickly or slowly a model learns. A smaller learning rate might lead to a slower but more stable convergence, whereas a larger learning rate might speed up training but risk overshooting the optimal values.

  In the given configuration:

  **learning_rate=2e-4**: The learning rate is set to 0.0002, which is a common choice for fine-tuning large language models, balancing between stability and convergence speed.

3. lr_scheduler_type

  The **lr_scheduler_type** parameter specifies the type of learning rate scheduler to be used during training. The scheduler adjusts the learning rate at specified intervals or according to a specific schedule, which can help improve training stability and performance.

  In the given configuration:

  **lr_scheduler_type='constant'**: The learning rate stays fixed at the value given by learning_rate for the whole run. The plain `constant` schedule has no warmup phase, so `warmup_steps=3` has no effect here; `constant_with_warmup` would be needed to combine the two.
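The interaction between these three parameters is easier to see as a function of the step number. The sketch below is a plain-Python approximation of two schedules, written under the assumption that `constant` applies no warmup while `constant_with_warmup` ramps linearly; the function names are hypothetical and this is not the library's own code.

```python
def constant_lr(step, base_lr):
    """Same learning rate at every step; warmup is not applied."""
    return base_lr


def constant_with_warmup_lr(step, base_lr, warmup_steps):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


base_lr, warmup_steps = 2e-4, 3
for step in range(5):
    print(
        step,
        constant_lr(step, base_lr),
        constant_with_warmup_lr(step, base_lr, warmup_steps),
    )
```

With only 3 warmup steps out of 100, the two schedules differ for the first three updates and are identical afterwards.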



In [36]:
from trl import SFTTrainer

trainer = SFTTrainer(
  model=model,
  peft_config=peft_config,
  tokenizer=tokenizer,
  formatting_func=create_prompt,
  args=args,
  train_dataset=legal_dataset["train"],
  eval_dataset=legal_dataset["validation"]
)




max_steps is given, it will override any value given in num_train_epochs


In [37]:
trainer.train()



| Step | Training Loss | Validation Loss |
|---|---|---|
| 25 | 1.465 | 1.687563 |
| 50 | 0.527 | 1.898018 |
| 75 | 0.2272 | 2.12255 |
| 100 | 0.0977 | 2.173345 |


TrainOutput(global_step=100, training_loss=0.687916431427002, metrics={'train_runtime': 540.1373, 'train_samples_per_second': 0.185, 'train_steps_per_second': 0.185, 'total_flos': 4636795522252800.0, 'train_loss': 0.687916431427002, 'epoch': 5.0})
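One way to read the logged losses is to compare the training and validation curves step by step. The sketch below copies the four logged rows and finds the evaluation step with the lowest validation loss; the variable names are illustrative, and treating a widening gap as a sign of overfitting is an interpretation, not something the notebook states.

```python
# (step, training loss, validation loss) copied from the training log above
eval_log = [
    (25, 1.465, 1.687563),
    (50, 0.527, 1.898018),
    (75, 0.2272, 2.12255),
    (100, 0.0977, 2.173345),
]

# evaluation step with the lowest validation loss
best_step, _, best_val = min(eval_log, key=lambda row: row[2])

# gap between validation and training loss at each evaluation
gaps = [(step, val - train) for step, train, val in eval_log]

print(f"lowest validation loss {best_val} at step {best_step}")
for step, gap in gaps:
    print(f"step {step}: validation - training = {gap:.3f}")
```

On this small dataset the validation loss is lowest at the first evaluation and the gap grows at every later step, so a shorter run or an earlier checkpoint may generalize better than the final one.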

# Task #5: Share Your Model!

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
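# Note: the first positional argument of `push_to_hub` is the commit message
# (see `commit_message` in the output below); the repo name comes from the training args.
trainer.push_to_hub("ai-maker-space/llama38binstruct-summary-100s")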


CommitInfo(commit_url='https://huggingface.co/syedwaseemjan/llama38binstruct_summarize/commit/a062cb60f450f15f5d121ae6fad16e372b6ee394', commit_message='ai-maker-space/llama38binstruct-summary-100s', commit_description='', oid='a062cb60f450f15f5d121ae6fad16e372b6ee394', pr_url=None, pr_revision=None, pr_num=None)

# Compare Outputs

In [38]:
merged_model = model.merge_and_unload()



## ❓Question #6:
What does the merge_and_unload() method do?

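**merge_and_unload()** folds the learned LoRA update into the base model's weights, so each adapted weight W becomes W + (lora_alpha / r) · B·A, and then removes the LoRA wrapper modules. The result is a plain model with no separate adapter branch to evaluate at inference time. Because the base weights here are 4-bit quantized, the merge may introduce small rounding differences.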
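The idea behind merging can be shown on a toy layer. The sketch below uses plain Python lists, a rank of 1, and made-up numbers; it assumes the usual LoRA scaling of `lora_alpha / r`, ignores dropout (which is inactive at inference), and is not PEFT's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[x * s for x in row] for row in a]


lora_alpha, r = 2, 1                 # toy values; scaling = lora_alpha / r = 2.0
scaling = lora_alpha / r

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # frozen base weight (out=2, in=3)
A = [[0.5, -1.0, 0.25]]                  # lora_A (r x in)
B = [[2.0], [-1.0]]                      # lora_B (out x r)
x = [[1.0], [2.0], [4.0]]                # one input column vector

# adapter kept separate: base output plus the scaled low-rank branch
adapter_output = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# adapter merged: fold the update into the weight once, then one matmul
W_merged = add(W, scale(matmul(B, A), scaling))
merged_output = matmul(W_merged, x)

print(adapter_output, merged_output)
```

Both paths give the same output, which is why the merged model can drop the `lora_A` and `lora_B` modules entirely.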



In [39]:
generate_response(create_prompt(legal_dataset["test"][1], include_response=False),
                  merged_model,
                  tokenizer)

'\n\nHere is a human-readable summary of the legal content:\n\n"When using the SDK, you agree not to remove or alter any copyright or trademark notices that are included with the software.<|eot_id|>'

In [40]:
generate_response(create_prompt(legal_dataset["test"][3], include_response=False),
                  merged_model,
                  tokenizer)

"\n\nHere's a human-readable summary:\n\nBy using the software development kit (SDK), you agree to protect Google and its affiliates from any legal claims, lawsuits, or proceedings that might arise from your use of the SDK or any app you develop using it. This includes:\n\n* If your app infringes on someone else's intellectual property rights, defames someone, or violates their privacy or publicity rights.\n* If you don't comply with the terms of the license agreement.\n\nYou'll be responsible for any losses, damages, or expenses, including legal fees, that arise from these situations.<|eot_id|>"