## Merging Weights & Pushing to Hugging Face

After saving the fine-tuned adapter weights, we can build a standalone fine-tuned model by merging them into the base model's weights and saving the result to a new directory together with its tokenizer. The merged model no longer needs the PEFT adapter to be loaded separately, so it can be used for inference like any ordinary checkpoint. We will also push the merged model and its tokenizer to the Hugging Face Hub for public use.
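To make the merge step concrete: for a LoRA adapter, merging folds the low-rank update into each adapted weight matrix, `W_merged = W + (alpha / r) * (B @ A)`. The sketch below checks on a tiny example that the merged matrix gives the same output as running the base weights and the adapter side by side. The shapes and values are made up for illustration, and the helper names are hypothetical; this shows the arithmetic only, not PEFT's actual implementation.

```python
# Toy illustration of a LoRA merge: W_merged = W + (alpha / r) * (B @ A).
# Pure Python so it runs anywhere; all numbers are invented for the example.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def merge_lora(w, a, b, alpha, r):
    """Fold the scaled low-rank update B @ A into the base weight matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]        # base weight, shape (out=2, in=3)
A = [[0.5, -1.0, 0.0]]        # LoRA "down" matrix, shape (r=1, in=3)
B = [[2.0],
     [1.0]]                   # LoRA "up" matrix, shape (out=2, r=1)
alpha, r = 2, 1
x = [1.0, 2.0, 3.0]

# Unmerged path: base output plus the scaled adapter output.
base_out = matvec(W, x)
adapter_delta = matvec(B, matvec(A, x))
adapter_out = [b + (alpha / r) * d for b, d in zip(base_out, adapter_delta)]

# Merged path: a single matrix multiply with the folded weights.
W_merged = merge_lora(W, A, B, alpha, r)
merged_out = matvec(W_merged, x)

print(adapter_out)  # [1.0, -4.0]
print(merged_out)   # [1.0, -4.0]
```

Because both paths agree, the adapter matrices can be discarded after merging, which is why the merged checkpoint can be loaded without PEFT.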


In [1]:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
# Hub ID of the open-source base LLM that was fine-tuned
model_name = "HuggingFaceH4/zephyr-7b-beta"

# Path to the LoRA adapter weights
peft_name = "trl/output/adapter-weights"

# Name of the new fine-tuned model, used as the local save directory and the HF Hub repo name
new_model_name = "autocomplete-rogers-zephyr"

In [3]:
# Load the pretrained tokenizer and model with the transformers library
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# Add new tokens. This is specific to the use case; comment out if not needed.
# Keep this list identical to the one used during fine-tuning so token ids line up with the adapter.
tokenizer.add_tokens(
    ["<AGENT_NAME>", "<PERSON>", "<URL>", "PHONE_NUMBER", "EMAIL_ADDRESS", "<CREDIT_CARD>"],
    special_tokens=False,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Resize the token embeddings to cover the tokens added to the tokenizer
model.resize_token_embeddings(len(tokenizer))

# Load the PEFT adapter weights and merge them into the pretrained model weights
model = PeftModel.from_pretrained(model, peft_name)
model = model.merge_and_unload()

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
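The cell above resizes the embeddings before loading the adapter because every token id needs its own row in the embedding matrix. The toy sketch below shows that bookkeeping with a plain dictionary and a list of lists. The helper names are hypothetical, and the new rows are filled with zeros only for simplicity; the real `resize_token_embeddings` initializes new rows differently.

```python
# Toy stand-ins for a tokenizer vocabulary and an embedding matrix.
vocab = {"hello": 0, "world": 1, "[PAD]": 2}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]  # one row per token id

def add_tokens(vocab, new_tokens):
    """Assign the next free id to each unseen token; return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added

def resize_embeddings(embeddings, new_size):
    """Return a copy of the matrix with exactly new_size rows (zero-filled if grown)."""
    dim = len(embeddings[0])
    rows = [row[:] for row in embeddings[:new_size]]
    rows += [[0.0] * dim for _ in range(new_size - len(rows))]
    return rows

added = add_tokens(vocab, ["<PERSON>", "<URL>", "hello"])  # "hello" already exists
needs_resize = len(vocab) > len(embeddings)  # ids 3 and 4 have no row yet

embeddings = resize_embeddings(embeddings, len(vocab))
print(added, needs_resize, len(embeddings))  # 2 True 5
```

In the notebook, the same alignment is what lets the adapter weights, which were trained against the enlarged vocabulary, load onto the base model without a shape mismatch.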

In [4]:
# Save the merged model weights and tokenizer
model.save_pretrained(new_model_name, safe_serialization=True)
tokenizer.save_pretrained(new_model_name)

('autocomplete-rogers-zephyr/tokenizer_config.json',
 'autocomplete-rogers-zephyr/special_tokens_map.json',
 'autocomplete-rogers-zephyr/tokenizer.model',
 'autocomplete-rogers-zephyr/added_tokens.json',
 'autocomplete-rogers-zephyr/tokenizer.json')
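Once the merged weights and tokenizer are saved, the directory can be loaded like any other checkpoint, with no PEFT import required. The following is a minimal sketch of reloading it for inference, assuming the directory name used above; the helper names `load_merged_model` and `complete`, the example prompt, and the `max_new_tokens` value are illustrative choices, not part of the original notebook.

```python
def load_merged_model(model_dir="autocomplete-rogers-zephyr"):
    """Load the merged model and tokenizer from a local directory."""
    # Imported here so the helpers can be defined before the heavy libraries load.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    model.eval()
    return model, tokenizer

def complete(model, tokenizer, prompt, max_new_tokens=32):
    """Generate a continuation for prompt and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.pad_token_id,
    )
    # Drop the prompt tokens so only the completion is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example usage (hypothetical prompt):
# model, tokenizer = load_merged_model()
# print(complete(model, tokenizer, "Thank you for contacting"))
```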

### Push the merged model to Hugging Face Hub

In [6]:
model.push_to_hub(new_model_name, use_temp_dir=False)
tokenizer.push_to_hub(new_model_name, use_temp_dir=False)

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/skshreyas714/autocomplete-rogers-zephyr/commit/55294faed3cbfa69d5e495678b47879335b47627', commit_message='Upload tokenizer', commit_description='', oid='55294faed3cbfa69d5e495678b47879335b47627', pr_url=None, pr_revision=None, pr_num=None)
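The push above only succeeds from a session that is authenticated with write access to the target namespace. Below is a minimal sketch of one way to set that up before running the push cell, assuming the access token is stored in the `HF_TOKEN` environment variable. The helper names `read_hub_token` and `login_to_hub` are hypothetical; `huggingface_hub.login` is the library's own login function.

```python
import os

def read_hub_token(env_var="HF_TOKEN"):
    """Return the Hub access token from the environment, or fail with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to a Hugging Face token with write access.")
    return token

def login_to_hub():
    """Authenticate this session so that push_to_hub can create commits."""
    # Imported here so read_hub_token stays usable without the package installed.
    from huggingface_hub import login

    login(token=read_hub_token())

# Example usage, run once before the push cell:
# login_to_hub()
```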