## Mounting Your Drive

Mount your Google Drive so that the downloaded model and output GGUF file are stored persistently.

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Create a directory for the LLM models if it doesn't exist
os.makedirs('/content/drive/My Drive/llm', exist_ok=True)

# Change working directory to that folder
os.chdir('/content/drive/My Drive/llm')
print('Current working directory:', os.getcwd())

## Introduction

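This notebook converts the Huginn-0125 model into a quantized output file intended for Ollama, using a two-step q8 conversion. It loads the model’s F32 weights, casts them to F16, and then quantizes them to int8 with a per-tensor scale factor (the tensor's maximum absolute value divided by 127). It also writes a header with metadata fields (gguf_version, model_type, n_vocab, n_embd, and n_layer). Note that this header is plain text, whereas the official GGUF specification defines a binary layout (magic bytes, version, key-value metadata, and tensor descriptors), so verify that Ollama accepts the resulting file before relying on it. Also make sure the Huginn-0125 repository includes its configuration file: without one the conversion stops with an error, and any field missing from the config falls back to a default such as n_layer=32.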
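
The two-step quantization described above can be illustrated on a tiny array. This is a minimal sketch, assuming a single per-tensor scale as in this notebook; the helper names `quantize_q8` and `dequantize_q8` are hypothetical and are not part of any library.

```python
import numpy as np

def quantize_q8(tensor_f32):
    """Cast to F16, then map to int8 with a single per-tensor scale."""
    # Step 1: round the values to F16 precision (kept in float32 for the math)
    values = tensor_f32.astype(np.float16).astype(np.float32)
    # Step 2: scale so that the largest magnitude maps to 127
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize_q8(quantized, scale):
    """Approximate reconstruction: multiply the int8 codes by the scale."""
    return quantized.astype(np.float32) * scale

weights = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
codes, scale = quantize_q8(weights)
restored = dequantize_q8(codes, scale)

print("int8 codes:", codes)
print("scale:", scale)
print("max abs error:", float(np.max(np.abs(restored - weights))))
```

Because each value is rounded to the nearest multiple of the scale, the reconstruction error per element should stay within about half the scale. A tensor with one very large outlier therefore loses precision on all of its smaller values.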

## Installing Dependencies

We install NumPy, the Hugging Face Hub, and safetensors to handle the model weights and configuration.

In [None]:
!pip install numpy huggingface_hub safetensors

## Downloading the Model

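We download the Huginn-0125 model from its Hugging Face repository (tomg-group-umd/huginn-0125). The files are stored under the current working directory, which is the Drive folder set up earlier. If the repository includes a config file (config.json or params.json), the conversion uses its hyperparameters; if neither file is present, the loading step raises an error.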

In [None]:
import os
from huggingface_hub import snapshot_download

# Set the HF repository ID for Huginn-0125
model_repo = "tomg-group-umd/huginn-0125"
cache_dir = os.getcwd()

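# Download into a predictable folder (repo ID with '/' replaced by '-') so the
# existence check below can find it on later runs. With only cache_dir set,
# snapshot_download stores files under models--<org>--<name>/snapshots/<revision>,
# so this folder would never exist and the model would be re-resolved every time.
expected_model_dir = os.path.join(cache_dir, model_repo.replace('/', '-'))

if os.path.isdir(expected_model_dir) and os.listdir(expected_model_dir):
    print(f"Model already downloaded at: {expected_model_dir}")
    model_path = expected_model_dir
else:
    print(f"Downloading model {model_repo}...")
    model_path = snapshot_download(repo_id=model_repo, local_dir=expected_model_dir)
    print(f"Model downloaded to: {model_path}")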

print('Final model path:', model_path)
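
Before converting, it can help to confirm that the download actually contains a config file and at least one .safetensors file. The following is a small sketch using only the standard library; `summarize_model_dir` is a hypothetical helper name, and the commented usage assumes `model_path` from the cell above.

```python
import os

def summarize_model_dir(model_dir):
    """Walk model_dir and collect config files and .safetensors shards."""
    configs, shards = [], []
    for root, _dirs, files in os.walk(model_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            if name in ("config.json", "params.json"):
                configs.append(path)
            elif name.endswith(".safetensors"):
                shards.append(path)
    return {"configs": configs, "shards": shards}

# Example usage in this notebook (model_path is set by the download cell):
# summary = summarize_model_dir(model_path)
# print(len(summary["shards"]), "shard(s);", "config files:", summary["configs"])
```

Note that the hyperparameter loader later in this notebook only looks for the config file directly inside the model directory, not in subfolders, so a config found deeper in the tree would still need to be moved or passed explicitly.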

## Loading and Converting the Model

We load the model’s tensors from the .safetensors files and its hyperparameters from the configuration file. For q8 output, each tensor is first cast to F16 and then quantized to int8 using a per-tensor scale factor. Finally, we write the output file: a text header with the metadata fields, a --TENSORS-- marker, and then each tensor's description line followed by its raw bytes.

In [None]:
import os
import json
import numpy as np
from safetensors import safe_open

########################################
# Functions to load model and hyperparameters
########################################

def load_hf_model(model_dir):
    """
    Recursively scan the model directory for .safetensors files and load all tensors.
    Returns a dictionary mapping tensor names to NumPy arrays.
    """
    model = {}
    for root, dirs, files in os.walk(model_dir):
        for file in files:
            if file.endswith(".safetensors"):
                file_path = os.path.join(root, file)
                print(f"Loading tensors from {file_path}")
                with safe_open(file_path, framework="np") as f:
                    for key in f.keys():
                        if key in model:
                            print(f"Warning: key {key} already exists. Overwriting.")
                        model[key] = f.get_tensor(key)
    return model

def load_hf_hparams(model_dir):
    """
    Load hyperparameters from a config file in the model directory.
    Tries config.json first, then params.json.
    """
    for fname in ["config.json", "params.json"]:
        config_path = os.path.join(model_dir, fname)
        if os.path.exists(config_path):
            print(f"Loading hyperparameters from {config_path}")
            with open(config_path, "r") as f:
                return json.load(f)
    raise ValueError("No config.json or params.json found in the model directory.")

########################################
# GGUF Writer with header metadata
########################################

class GGUFWriter:
    def __init__(self, outfile, hparams, outtype):
        self.outfile = outfile
        self.hparams = hparams
        self.outtype = outtype  # e.g., "q8", "f16", or "f32"
        self.tensors = []

    def add_tensor(self, name, tensor, scale=None):
        self.tensors.append((name, tensor, scale))
        if scale is not None:
            print(f"Added tensor: {name}, shape: {tensor.shape}, dtype: {tensor.dtype}, scale: {scale}")
        else:
            print(f"Added tensor: {name}, shape: {tensor.shape}, dtype: {tensor.dtype}")

    def finalize(self):
        with open(self.outfile, "wb") as f:
            # Write header with required metadata
            header = "gguf_version: 1\n"
            header += f"model_type: {self.hparams.get('model_type', 'llama')}\n"
            header += f"n_vocab: {self.hparams.get('vocab_size', 32000)}\n"
            header += f"n_embd: {self.hparams.get('n_embd', 4096)}\n"
            header += f"n_layer: {self.hparams.get('n_layer', 32)}\n"
            header += f"outtype: {self.outtype}\n"
            f.write(header.encode('utf-8'))
            f.write(b"--TENSORS--\n")

            # Write each tensor's metadata and raw data
            for name, tensor, scale in self.tensors:
                meta = f"{name} | shape: {tensor.shape} | dtype: {tensor.dtype}"
                if scale is not None:
                    meta += f" | scale: {scale}"
                meta += "\n"
                f.write(meta.encode('utf-8'))
                f.write(tensor.tobytes())
                f.write(b"\n")

        print(f"Finalized GGUF file at {self.outfile}")

########################################
# Quantization from F16 to q8
########################################

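def quantize_from_f16(tensor_f16):
    # Work in float32 so the division and rounding are not limited by F16 precision
    values = tensor_f16.astype(np.float32)
    # Compute scale factor: maximum absolute value divided by 127 (0 for an empty tensor)
    max_abs = float(np.max(np.abs(values))) if values.size else 0.0
    scale = max_abs / 127.0
    if scale == 0:
        scale = 1.0
    # Clip before casting so rounding can never wrap around the int8 range
    quantized = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return quantized, scale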

########################################
# Conversion function for different output types
########################################

def convert_tensor(tensor, outtype):
    if outtype == "q8":
        # First cast to F16, then quantize from F16 to q8
        tensor_f16 = tensor.astype(np.float16)
        quantized, scale = quantize_from_f16(tensor_f16)
        return quantized, scale
    elif outtype == "f16":
        return tensor.astype(np.float16), None
    elif outtype == "f32":
        return tensor.astype(np.float32), None
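    else:
        # Fail loudly instead of silently writing tensors in their original dtype
        raise ValueError(f"Unsupported outtype: {outtype!r} (expected 'q8', 'f16', or 'f32')")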

def convert_model_to_gguf(model, hparams, outfile, outtype):
    writer = GGUFWriter(outfile, hparams, outtype)
    for name, tensor in model.items():
        converted, scale = convert_tensor(tensor, outtype)
        writer.add_tensor(name, converted, scale=scale)
    writer.finalize()
    print(f"GGUF conversion complete: {outfile}")

########################################
# End conversion functions
########################################

# Load the HF model from the downloaded directory
print("Loading HF model from:", model_path)
real_model = load_hf_model(model_path)
print(f"Loaded {len(real_model)} tensors from the model.")

# Load hyperparameters from config.json or params.json
real_hparams = load_hf_hparams(model_path)
print("Hyperparameters loaded.")

# Define output GGUF file and conversion type
output_filename = "output_model.gguf"
output_type = "q8"

# Convert the model to GGUF using two-step quantization from F16
convert_model_to_gguf(real_model, real_hparams, output_filename, outtype=output_type)
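
One caveat of the intermediate F16 cast is range: the largest finite float16 value is 65504, and larger F32 values become inf when cast, which would in turn make the computed scale infinite. The sketch below counts such values before conversion; `count_f16_overflow` is a hypothetical helper, and whether any Huginn-0125 tensor actually contains values this large is not something this notebook establishes.

```python
import numpy as np

# Largest finite value representable in float16
F16_MAX = float(np.finfo(np.float16).max)

def count_f16_overflow(tensor):
    """Count finite values whose magnitude exceeds the largest finite float16."""
    values = np.asarray(tensor, dtype=np.float32)
    finite = np.isfinite(values)
    return int(np.count_nonzero(finite & (np.abs(values) > F16_MAX)))

sample = np.array([1.0, -70000.0, 65504.0, 1e6], dtype=np.float32)
print("values outside F16 range:", count_f16_overflow(sample))

# Example usage in this notebook (real_model is the dict loaded above):
# for name, tensor in real_model.items():
#     n = count_f16_overflow(tensor)
#     if n:
#         print(f"{name}: {n} value(s) exceed the F16 range")
```

If the count is nonzero for a tensor, one option is to keep that tensor in F32 or to clip it before the cast; which choice is appropriate depends on the model.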

## Verifying the Output

Finally, list the contents of the working directory to verify that the GGUF file was created successfully.

In [None]:
!ls -lh
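
Listing the directory only shows that a file exists. A slightly stronger check is to read the header back. The sketch below does two things: it tests whether the file starts with the four ASCII bytes GGUF, which is how binary GGUF files begin, and it parses the text header written by this notebook's GGUFWriter up to the --TENSORS-- marker. `inspect_output` is a hypothetical helper name.

```python
def inspect_output(path, max_header_lines=32):
    """Report whether the file starts with the GGUF magic and parse the text header."""
    fields = {}
    with open(path, "rb") as f:
        # Binary GGUF files begin with the magic bytes b"GGUF"
        has_magic = f.read(4) == b"GGUF"
        f.seek(0)
        # Read a bounded number of short lines so a binary file is not slurped whole
        for _ in range(max_header_lines):
            raw = f.readline(4096)
            if not raw:
                break
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if line == "--TENSORS--":
                break
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value
    return {"has_gguf_magic": has_magic, "header": fields}

# Example usage in this notebook:
# print(inspect_output("output_model.gguf"))
```

For a file produced by the writer in this notebook, the header fields should come back as strings (for example n_layer and outtype), while the magic check is expected to be false because the file starts with the text gguf_version rather than the binary magic.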