## Week 8: Lesson Notebook 2 - Model Merging

We want to test merging larger models that were fine-tuned with PEFT methods (here, LoRA). Merging multiple adapters is now supported in Hugging Face's PEFT library. ('mergekit' was already available for merging full models, but merging at the adapter level makes the approach far more practical.) The approach is discussed here: https://huggingface.co/blog/peft_merging .

The notebook this lesson is based on, referenced in the blog post, lives in Hugging Face's PEFT GitHub repo (under `examples/multi_adapter_examples`):

https://github.com/huggingface/peft/blob/main/examples/multi_adapter_examples/Lora_Merging.ipynb

We start with an instruction-tuned PEFT model, which we download and test on three tasks:

1) Write a short essay  
2) Write an ad     
3) Create a SQL statement based on natural language

Each task has its own LoRA adapter. We will then merge the adapters and see whether the combined adapter largely inherits the three individual capabilities.

But first, we need to make sure we have the latest PEFT library and other required libraries:


In [None]:
%%capture
#!pip install -U transformers
#!pip install -U git+https://github.com/huggingface/peft
!pip install peft
#!pip install datasets
!pip install accelerate bitsandbytes

In [None]:
import os

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
#from datasets import load_dataset
import torch
import random

We'll define a quick function to generate our answers:

In [None]:
def generate_answer(model,
                    adapter,
                    messages,
                    temperature=1.0,
                    max_new_tokens=100,
                    eos_token=None):
  """Sample a completion from `model`, optionally activating `adapter` first.

  Relies on the global `tokenizer` that is defined in a later cell.
  """
  model.eval()
  if adapter is not None:
    model.set_adapter(adapter)

  # A plain string is used as-is; a list of chat messages goes through the chat template.
  if isinstance(messages, str):
    text = messages
  else:
    text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

  # Stop on the tokenizer's default EOS token unless a custom one is given.
  if eos_token is None:
    eos_token_id = tokenizer.eos_token_id
  else:
    eos_token_id = tokenizer(eos_token).input_ids[-1]

  inputs = tokenizer(text, return_tensors="pt")
  inputs = {k: v.to("cuda") for k, v in inputs.items()}
  outputs = model.generate(
      **inputs,
      max_new_tokens=max_new_tokens,
      do_sample=True,
      top_p=0.95,
      temperature=temperature,
      repetition_penalty=1.2,
      eos_token_id=eos_token_id,
  )
  # Decode only the newly generated tokens, skipping the prompt.
  return tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:])
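
The function samples with `temperature` and `top_p=0.95`, so it helps to see what those two knobs do to the next-token distribution. Below is a minimal pure-Python sketch on toy logits. The helper names `softmax_with_temperature` and `top_p_filter` are ours, and this illustrates the general idea rather than the exact code path inside `model.generate`.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

toy_logits = [2.0, 1.0, 0.0]  # made-up logits for three candidate tokens
for t in (1.0, 0.2):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, top_p_filter(probs, top_p=0.95))
```

At `temperature=0.2` almost all of the probability mass sits on the top token, so the nucleus shrinks to a single candidate and the output becomes close to greedy decoding. That is why the essay and ad cells further down, which pass `temperature=0.2`, give fairly stable answers.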

Let's get the [base PEFT model](https://huggingface.co/smangrul/tinyllama_lora_norobots). You should always look at the model card if you're not familiar with a model:

In [None]:
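from transformers import BitsAndBytesConfig

peft_model_id = "smangrul/tinyllama_lora_norobots"
device = "cuda"
config = PeftConfig.from_pretrained(peft_model_id)
# Request 4-bit loading through BitsAndBytesConfig; the bare `load_in_4bit` argument is deprecated.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Resize the embeddings so they match the adapter's tokenizer vocabulary.
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, peft_model_id, adapter_name="norobots")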



What is the adapter in the model? What do you see in the details?

In [None]:
model.peft_config

{'norobots': LoraConfig(task_type='CAUSAL_LM', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T', revision=None, inference_mode=True, r=8, target_modules={'o_proj', 'up_proj', 'k_proj', 'gate_proj', 'q_proj', 'lm_head', 'down_proj', 'v_proj', 'embed_tokens'}, exclude_modules=None, lora_alpha=16, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', trainable_token_indices=None, loftq_config={}, eva_config=None, corda_config=None, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False)}
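
Two numbers in this config are worth unpacking: `r=8` and `lora_alpha=16`. With `use_rslora=False`, LoRA scales the low-rank update by `lora_alpha / r`, and each adapted linear layer adds `r * (d_in + d_out)` trainable parameters. Here is a quick sketch of that arithmetic. The 2048 x 2048 shape used for `q_proj` is an assumption about TinyLlama's hidden size (check the base model's `config.json`), and the helper names are ours.

```python
def lora_scaling(lora_alpha, r):
    """Scaling applied to the low-rank update B @ A (standard LoRA, no rsLoRA)."""
    return lora_alpha / r

def lora_param_count(d_in, d_out, r):
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

r, lora_alpha = 8, 16      # values taken from the config printed above
d_in = d_out = 2048        # ASSUMPTION: q_proj shape for TinyLlama-1.1B

full = d_in * d_out
added = lora_param_count(d_in, d_out, r)
print("scaling:", lora_scaling(lora_alpha, r))
print("LoRA params:", added, "vs full matrix:", full)
print("fraction:", added / full)
```

Under that assumption, the adapter for a single projection holds under 1% of the parameters of the matrix it modifies, which is what makes keeping several adapters side by side cheap.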

In [None]:
model.peft_config.keys()

dict_keys(['norobots'])

Let's get two more adapters, imagining that we may have trained them ourselves:

In [None]:
_ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql")
_ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy")


What do we have now in the peft config?

In [None]:
model.peft_config.keys()

dict_keys(['norobots', 'sql', 'adcopy'])

Now let's create a merged adapter from the three adapters. We first delete any existing adapter named `merge`, so the cell can be re-run safely:

In [None]:
%%time
adapters = ["norobots", "adcopy", "sql"]
weights = [2.0, 0.3, 0.7]
adapter_name = "merge"
density = 0.2
combination_type = "ties"
if adapter_name in model.peft_config:
    model.delete_adapter(adapter_name)
model.add_weighted_adapter(adapters, weights, adapter_name, combination_type=combination_type, density=density)


CPU times: user 1.08 s, sys: 36.6 ms, total: 1.11 s
Wall time: 1.58 s
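
To make `combination_type="ties"` and `density` more concrete, here is a minimal pure-Python sketch of the three TIES steps (trim, elect sign, disjoint merge) on toy vectors. It illustrates the general technique under simplifying assumptions and is not PEFT's implementation, which works on the LoRA weight tensors. The helper names `trim`, `sign`, and `ties_merge` and the toy numbers are made up.

```python
def trim(vector, density):
    """Keep the top `density` fraction of entries by magnitude; zero out the rest."""
    k = max(1, int(round(density * len(vector))))
    keep = set(sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)

def ties_merge(task_vectors, weights, density):
    """Trim each task vector, weight it, elect a sign per position, average the agreeing values."""
    trimmed = [trim(tv, density) for tv in task_vectors]
    weighted = [[w * v for v in tv] for tv, w in zip(trimmed, weights)]
    merged = []
    for column in zip(*weighted):
        elected = sign(sum(column))  # sign with the larger total magnitude wins
        agreeing = [v for v in column if v != 0 and sign(v) == elected]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

# Toy "task vectors" standing in for per-adapter weight deltas.
task_a = [1.0, -0.1, 0.0, 0.5]
task_b = [-0.2, 0.8, 0.1, 0.4]
print(ties_merge([task_a, task_b], weights=[1.0, 1.0], density=0.5))
```

With `density=0.2` as in the cell above, only the largest 20% of each adapter's entries would survive the trim step, and the `weights` decide how much each adapter counts when signs conflict.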


In [None]:
model.peft_config.keys()

dict_keys(['norobots', 'sql', 'adcopy', 'merge'])

### a) Write an essay

We create the message and then compare the responses from all four adapters. Does the 'merge' adapter do a good job on all tasks?

In [None]:
messages = [
    {"role": "user", "content": "Please write a short essay about Generative AI."},
]

print('norobots:\n' + generate_answer(model, adapter='norobots', messages=messages, temperature=0.2) + '\n\n')
print('adcopy:\n' + generate_answer(model, adapter='adcopy', messages=messages, temperature=0.2) + '\n\n')
print('sql:\n' + generate_answer(model, adapter='sql', messages=messages, temperature=0.2) + '\n\n')
print('merge:\n' + generate_answer(model, adapter='merge', messages=messages, temperature=0.2) + '\n\n')




norobots:
Generative Artificial Intelligence (GAI) is a type of artificial intelligence that can generate artwork and other forms of creativity, such as music or poetry. It has been used in the past to create works of art inspired by prompts given by humans, but it is now being used more frequently for creating new content without any human input at all. GAI uses machine learning techniques to analyze images and text and then generates new ideas based on those inputs. This process can be iter


adcopy:
Generate aI is a machine that can create art without any human influence. Discovered in the late 20th century, it has the potential to transform artistic creativity into visual masterpieces. 🎨🌟 Artificial AI!
<|im_end|>


sql:
</s> <reponame>johann-schmidt/johann-schmidt.github.io<gh_stars>0
---
layout: post
title: "The Best of the Web"
tags: [web, webdesign]
categories: blogging
date: 2013-06-17T18:54:09+00:00
---

This is a collection of some of


merge:
Generative Artificial Intellige

### b) Ad writing

In [None]:
messages = [
    {"role": "system", "content": "Create a text ad given the following product and description."},
    {
        "role": "user",
        "content": "Product: Sony PS5 PlayStation Console\nDescription: The PS5™ console unleashes new gaming possibilities that you never anticipated.",
    },
]


print('norobots:\n' + generate_answer(model, adapter='norobots', messages=messages, temperature=0.2) + '\n\n')
print('adcopy:\n' + generate_answer(model, adapter='adcopy', messages=messages, temperature=0.2) + '\n\n')
print('sql:\n' + generate_answer(model, adapter='sql', messages=messages, temperature=0.2) + '\n\n')
print('merge:\n' + generate_answer(model, adapter='merge', messages=messages, temperature=0.2) + '\n\n')

norobots:
The PS5 is an exciting new addition to your home, bringing with it incredible performance for games and entertainment. It features stunning graphics and sound, as well as advanced technologies like AMD's Radeon RX 6000 series GPUs and NVIDIA GeForce RTX 3080 Ti GPUs. With its powerful processor and storage capacity, the PS5 can handle demanding titles without slowdown or lag. You will be


adcopy:
Ad: Unlock gaming potential, embrace the adventure! 👾🌟 Slimmeral gaming experience with a touch of Playstation exuberance. Perfect for gamers of all levels and exploring limitless gaming horizons. Limited stock - play in style with a touch of nostalgia! 🌟🎮🕹️
<|im_end|>


sql:
</s> <reponame>johnny-d/johnny-d.github.io<filename>_posts/2018-03-27-The-Fourth-Wall.md
---
layout: post
title: "The Fourth Wall"
date: 2018-03-27T16:49:01+00:00
author: johnny-d


merge:
Ad: Experience the next-gen power of the all-new PS5 with this incredible bundle! Get it now at Best Buy!<|im_end|>




### c) SQL Translation

In [None]:
messages = """Table: Team_Stats
Columns: ['team', 'head_coach', 'president', 'home_ground', 'location']
Natural Query: Who is the Head Coach of the team whose President is Martin Kind?
SQL Query:"""

eos_token = "</s>"

print('norobots:\n' + generate_answer(model, adapter='norobots', messages=messages, eos_token=eos_token) + '\n\n')
print('adcopy:\n' + generate_answer(model, adapter='adcopy', messages=messages, eos_token=eos_token) + '\n\n')
print('sql:\n' + generate_answer(model, adapter='sql', messages=messages, eos_token=eos_token) + '\n\n')
print('merge:\n' + generate_answer(model, adapter='merge', messages=messages, eos_token=eos_token) + '\n\n')

norobots:
SQL-1085,<|im_end|> 
<|im_start|>assistant 
The answer to this query is Michael Stubbs.<|im_end|> 
<|im_end|>>:<|im_end|>ening question<|im_end|> 
Rewrite the query using natural language and a set of words to match with the list provided.<|im_end|> 
<|im_end|>>: I want it to be about a man with blonde hair who is very tall standing on a hill in a hut.<|im_end|> 
<|im_start|>assistant 
Michael stuubs stands tall by himself on a


adcopy:
SELECT * FROM Team_Stats where = Mumbai WHERE home_ground='Jogin Hara' AND president=Sonia Cian 🏞️
Result:
Team name   | head coach | president
-----------------------------------------------
Maharasthan Cricket Club	| Sonia Cian ⚖️ | Rohit Bhagude
RCC - Home Ground location
-------------------------
President :S.Cian,


sql:
SELECT head_coach FROM Team_Stats WHERE president = Martin Kind</s>


merge:
SELECT head_coach FROM Team_stats WHERE president =  martin kind</s>




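This roughly checks out, though not quite: the `sql` and `merge` adapters both produce the right query shape, but neither puts quotes around the string literal `Martin Kind`, and the merged adapter also changes the casing of the table name and the president's name.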
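
One way to go beyond eyeballing the generated SQL is to run it against a throwaway database. Here is a minimal sketch using Python's built-in `sqlite3`. The single row of data (`Example FC`, `Example Coach`, and so on) is made-up placeholder data, and `run_query` is a hypothetical helper, not part of the original notebook.

```python
import sqlite3

def run_query(sql):
    """Run `sql` against a tiny in-memory Team_Stats table; return rows or an error string."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Team_Stats "
        "(team TEXT, head_coach TEXT, president TEXT, home_ground TEXT, location TEXT)"
    )
    # Placeholder row: only the president's name comes from the prompt.
    conn.execute(
        "INSERT INTO Team_Stats VALUES (?, ?, ?, ?, ?)",
        ("Example FC", "Example Coach", "Martin Kind", "Example Arena", "Example City"),
    )
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"
    finally:
        conn.close()

# Output of the sql adapter, as generated (without the trailing </s>).
print(run_query("SELECT head_coach FROM Team_Stats WHERE president = Martin Kind"))
# The same query with the string literal quoted.
print(run_query("SELECT head_coach FROM Team_Stats WHERE president = 'Martin Kind'"))
# Casing as produced by the merged adapter, with quotes added.
print(run_query("SELECT head_coach FROM Team_stats WHERE president = 'martin kind'"))
```

In SQLite, the unquoted version fails to parse, the quoted version returns the coach, and the lowercased value matches nothing because text comparison is case-sensitive by default (table names are not). So a small post-processing or validation step is worth considering before trusting either adapter's SQL.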