# Merging and Converting a Hugging Face Model to GGUF Format

In this notebook, we'll load a fine-tuned adapter from Hugging Face, merge it into its base model, and then convert the merged model to GGUF format, which can be used directly in Ollama.

For this notebook, we'll load the [inclinedadarsh/gemma-3-1b-nl-to-regex](https://huggingface.co/inclinedadarsh/gemma-3-1b-nl-to-regex) model, merge it, and finally convert it to GGUF format using [llama.cpp](https://github.com/ggml-org/llama.cpp/).

> Make sure to change the Colab runtime type to **T4 GPU**.

<a target="_blank" href="https://colab.research.google.com/github/inclinedadarsh/gemma-finetune-ui/blob/main/notebooks/merging_model.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
%pip install -U peft transformers accelerate

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig
import torch

In [None]:
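from google.colab import userdata
from huggingface_hub import login
HF_TOKEN = userdata.get('HF_TOKEN')
login(token=HF_TOKEN)  # Gemma models are gated, so authenticate before downloading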

## Loading and merging the model

In [None]:
model_name = "inclinedadarsh/gemma-3-1b-nl-to-regex"

In [None]:
# Read the adapter's config to find out which base model it was trained on.
peft_config = PeftConfig.from_pretrained(model_name)

In [None]:
base_model_name = peft_config.base_model_name_or_path
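`PeftConfig.from_pretrained` reads the adapter's `adapter_config.json`, which records the model the adapter was trained on under `base_model_name_or_path`. As a minimal sketch of that lookup, the snippet below parses such a file with the standard `json` module. The JSON values are made-up placeholders, not the real config of `inclinedadarsh/gemma-3-1b-nl-to-regex`, and `read_base_model` is a hypothetical helper.

```python
import json

# Placeholder adapter_config.json contents (hypothetical values, for illustration only).
SAMPLE_ADAPTER_CONFIG = """
{
  "peft_type": "LORA",
  "base_model_name_or_path": "example-org/example-base-model",
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "v_proj"]
}
"""


def read_base_model(config_text: str) -> str:
    """Return the base model id stored in an adapter_config.json string (hypothetical helper)."""
    config = json.loads(config_text)
    return config["base_model_name_or_path"]


print(read_base_model(SAMPLE_ADAPTER_CONFIG))
```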

In [None]:
# Load the tokenizer and the base model (in float16) onto the GPU.
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map='auto',
    torch_dtype=torch.float16,
    attn_implementation='eager'
)
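Loading with `torch_dtype=torch.float16` stores each weight in 2 bytes instead of 4, which keeps a 1B-parameter model well within a T4's 16 GB of memory. The arithmetic below is a back-of-the-envelope sketch: the parameter count of 1.0 billion is an assumed round figure, and real GPU usage is higher because of activations, buffers, and CUDA overhead.

```python
# Bytes per parameter for common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumed parameter count: roughly 1 billion for a "1b" model.
approx_params = 1.0e9
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(approx_params, dtype):.1f} GB")
```

The same 2-bytes-per-weight figure also gives a rough idea of how large a 16-bit GGUF file will be.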

In [None]:
# Attach the fine-tuned adapter weights to the base model.
model = PeftModel.from_pretrained(base_model, model_name)

In [None]:
# Merge the adapter weights into the base model and remove the PEFT wrappers.

merged_model = model.merge_and_unload()
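For a LoRA adapter, `merge_and_unload()` folds the low-rank update into each targeted weight matrix, so the merged model no longer needs PEFT at inference time. The pure-Python sketch below illustrates the idea on tiny matrices, assuming the standard LoRA scaling of `lora_alpha / r`: it checks that the base weight plus the adapter path gives the same output as the single merged weight `W + (lora_alpha / r) * B @ A`. The matrices and the `matmul`/`add` helpers are hypothetical toy values, not PEFT internals.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, scale=1.0):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Toy LoRA layer: W is (out=2, in=3), A is (r=1, in=3), B is (out=2, r=1).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
A = [[1.0, 2.0, 0.0]]
B = [[1.0], [3.0]]
r, lora_alpha = 1, 2
scaling = lora_alpha / r

x = [[1.0], [1.0], [1.0]]  # input as a column vector (in=3, 1)

# Unmerged: base output plus the scaled low-rank path B(Ax).
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Merged: fold scaling * (B @ A) into W once, then do a single matmul.
W_merged = add(W, matmul(B, A), scale=scaling)
merged = matmul(W_merged, x)

print(unmerged, merged)
```

Because the update is folded in once, the merged layer costs exactly one matrix multiply, the same as the original base layer.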

In [None]:
# We're going to save the model and the tokenizer in the `merged_model` directory.

merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")
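Before converting, it can help to confirm that `./merged_model` contains what `convert_hf_to_gguf.py` needs: a `config.json`, the weights, and the tokenizer files. The helper below is a hypothetical sanity check, not part of llama.cpp. It only looks for `config.json`, `tokenizer_config.json`, and at least one `*.safetensors` or `*.bin` weight file, which is a minimal assumption about the saved layout rather than an exhaustive list.

```python
from pathlib import Path


def missing_conversion_files(model_dir: str) -> list[str]:
    """Return descriptions of expected files missing from model_dir (hypothetical check)."""
    path = Path(model_dir)
    missing = []
    if not (path / "config.json").is_file():
        missing.append("config.json")
    if not (path / "tokenizer_config.json").is_file():
        missing.append("tokenizer_config.json")
    # Weights may be a single file or several shards.
    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
    if not has_weights:
        missing.append("weights (*.safetensors or *.bin)")
    return missing


problems = missing_conversion_files("./merged_model")
print("Missing:", problems if problems else "nothing")
```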

## Converting the Model to GGUF Format (for Ollama)

In [None]:
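# Shallow clone of llama.cpp (only the latest commit is needed for the conversion script).
!git clone --depth 1 https://github.com/ggml-org/llama.cpp.git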

In [None]:
# These paths assume Google Colab, where files live under /content. Adjust them if you're using Kaggle or running locally.
# --outtype f16 keeps 16-bit weights; other choices such as q8_0 produce a smaller file.

%cd /content/llama.cpp/
!python convert_hf_to_gguf.py /content/merged_model --outfile /content/merged_model/merged_model.gguf --outtype f16
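A quick way to confirm that the conversion produced a valid file is to read its fixed-size header. Per the GGUF format description in the ggml/llama.cpp repositories, a file starts with the ASCII magic `GGUF`, followed by a little-endian `uint32` version and two `uint64` counts (tensors, then metadata key/value pairs). The reader below is a minimal sketch based on that layout; `read_gguf_header` is a hypothetical helper, and the `gguf` Python package that ships with llama.cpp provides a fuller reader.

```python
import struct


def read_gguf_header(path: str) -> dict:
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (uint32 version) + 8 + 8 (uint64 counts)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}
```

In the notebook, `read_gguf_header("/content/merged_model/merged_model.gguf")` should return a non-zero tensor count if the conversion succeeded.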

### Downloading the model for Ollama

In [None]:
from google.colab import files
files.download('/content/merged_model/merged_model.gguf')
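Once the file is on your machine, Ollama can import a local GGUF through a Modelfile whose `FROM` line points at the file, followed by `ollama create`. The sketch below writes such a minimal Modelfile. The helper name `write_modelfile`, the model name `nl-to-regex`, and the output directory are hypothetical, and depending on your Ollama version you may also need a `TEMPLATE` line matching Gemma 3's chat format.

```python
from pathlib import Path


def write_modelfile(gguf_path: str, out_dir: str = ".") -> Path:
    """Write a minimal Ollama Modelfile that imports a local GGUF file (hypothetical helper)."""
    modelfile = Path(out_dir) / "Modelfile"
    modelfile.write_text(f"FROM {gguf_path}\n")
    return modelfile
```

For example, call `write_modelfile("./merged_model.gguf")` in the folder containing the downloaded file, then run `ollama create nl-to-regex -f Modelfile` and `ollama run nl-to-regex` in a terminal.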