<a href="https://colab.research.google.com/github/rhodanankabirwa/Data-Visualization/blob/main/blender_usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLM-Blender Usage examples

Requirements for running the following Jupyter examples (the default requirements support inference only, with minimal dependencies):
```bash
pip install -e .[example]
```

## Loading blender (quick start)
You can find more custom configurations in:
- PairRanker: [./llm_blender/pair_ranker/config.py](./llm_blender/pair_ranker/config.py)
- GenFuser: [./llm_blender/gen_fuser/config.py](./llm_blender/gen_fuser/config.py)
- Blender: [./llm_blender/blender/config.py](./llm_blender/blender/config.py)

In [1]:
import psutil

ram_gb = psutil.virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 54.8 gigabytes of available RAM

You are using a high-RAM runtime!


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# Step 2: Set paths for models
ranker_path_drive = '/content/drive/MyDrive/models--llm-blender--PairRM'
ranker_path_cache = '/root/.cache/huggingface/hub/models--llm-blender--PairRM'

In [4]:
# Step 3: Copy model folder from Drive to cache if it exists
import os
if os.path.exists(ranker_path_drive):
    print("Copying ranker model from Drive to cache...")
    !cp -r "$ranker_path_drive" "/root/.cache/huggingface/hub/"
else:
    print("Ranker model not found in Drive at:", ranker_path_drive)

Copying ranker model from Drive to cache...
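The `!cp -r` call above only works inside IPython. A minimal pure-Python sketch of the same restore step follows, assuming the folder saved on Drive mirrors the Hugging Face cache layout. `restore_model_from_drive` is a hypothetical helper written for illustration, not part of LLM-Blender.

```python
import shutil
from pathlib import Path


def restore_model_from_drive(drive_dir: str, cache_root: str) -> bool:
    """Copy a saved model folder into the cache root; return True if copied.

    Hypothetical helper for illustration; not part of LLM-Blender.
    """
    src = Path(drive_dir)
    if not src.is_dir():
        # Nothing to restore: the folder is missing on Drive
        return False
    dst = Path(cache_root) / src.name
    # dirs_exist_ok (Python 3.8+) avoids an error if the cache folder already exists
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return True
```

In this notebook it could be called as `restore_model_from_drive(ranker_path_drive, "/root/.cache/huggingface/hub")`.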


In [5]:
# Step 4: Import and initialize Blender, then load the ranker model
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import llm_blender
blender = llm_blender.Blender()

# Load the PairRM ranker checkpoint with Blender.loadranker (no underscore),
# the same call used in the quick-start cell of this notebook.
blender.loadranker("llm-blender/PairRM")
print("Ranker loaded successfully!")





In [6]:
from llm_blender.blender.ranker import RankerConfig
from llm_blender.blender.blender_utils import load_ranker, load_fuser

# Define where your models are cached locally
ranker_cache_dir = "/root/.cache/huggingface/hub/models--llm-blender--PairRM"
fuser_cache_dir = "/root/.cache/huggingface/hub/models--llm-blender--gen_fuser_3b"  # if you have fuser

# Create RankerConfig (make sure model_name and cache_dir are correct)
ranker_config = RankerConfig(
    model_name="llm-blender/PairRM",  # the model id or local path
    cache_dir=ranker_cache_dir,
    # add other required fields if any (check RankerConfig definition)
)

# Load ranker, this returns a Blender instance
blender = load_ranker(ranker_config)

# If you want to load fuser as well (optional)
try:
    fuser_config = ... # create similarly if needed
    blender = load_fuser(blender, fuser_config)
except:
    print("No fuser loaded or fuser config missing.")

# Now you can use blender.rank() or other methods


ModuleNotFoundError: No module named 'llm_blender.blender.ranker'

In [38]:
import os

# Path to the model folder on your Google Drive
drive_model_path = '/content/drive/MyDrive/models--llm-blender--PairRM'

# Path to the Hugging Face cache folder on Colab
cache_path = '/root/.cache/huggingface/hub/'

# Check if the model exists on Drive, then copy it to the cache folder
if os.path.exists(drive_model_path):
    !cp -r "$drive_model_path" "$cache_path"
    print("Model copied from Drive to cache folder successfully.")
else:
    print("Model folder not found on Drive.")

Model copied from Drive to cache folder successfully.


In [34]:
!git clone https://github.com/yuchenlin/LLM-Blender.git

Cloning into 'LLM-Blender'...
remote: Enumerating objects: 853, done.
remote: Counting objects: 100% (188/188), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 853 (delta 166), reused 152 (delta 151), pack-reused 665 (from 1)
Receiving objects: 100% (853/853), 76.42 MiB | 40.71 MiB/s, done.
Resolving deltas: 100% (499/499), done.


In [2]:
!pip install git+https://github.com/yuchenlin/LLM-Blender.git

Collecting git+https://github.com/yuchenlin/LLM-Blender.git
  Cloning https://github.com/yuchenlin/LLM-Blender.git to /tmp/pip-req-build-6ii_hzo5
  Running command git clone --filter=blob:none --quiet https://github.com/yuchenlin/LLM-Blender.git /tmp/pip-req-build-6ii_hzo5
  Resolved https://github.com/yuchenlin/LLM-Blender.git to commit 33204d2712944b6b17996f7c079e74cd963ccc7c
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done


In [4]:
from huggingface_hub import login
login()


In [5]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import llm_blender
blender = llm_blender.Blender()
ranker_path_drive = '/content/drive/MyDrive/models--llm-blender--PairRM'
ranker_path_cache = '/root/.cache/huggingface/hub/models--llm-blender--PairRM'
if os.path.exists(ranker_path_drive):
    !cp -r "$ranker_path_drive" "/root/.cache/huggingface/hub/"

# blender.loadranker("llm-blender/PairRM")




In [37]:
from llm_blender.blender.blender_utils import load_ranker

# load_ranker expects a config object (with attributes such as model_name),
# not a model id string, so the call below raises the AttributeError shown.
blender = load_ranker("llm-blender/PairRM")


AttributeError: 'str' object has no attribute 'model_name'

In [36]:
from llm_blender.blender.ranker import RankerConfig
from llm_blender.blender.blender_utils import load_ranker

ranker_config = RankerConfig(
    model_name="llm-blender/PairRM",
    cache_dir="/root/.cache/huggingface/hub/models--llm-blender--PairRM",
    ranker_type="pairranker",
    source_maxlength=512,
    candidate_maxlength=128,
)

blender = load_ranker(ranker_config)

ModuleNotFoundError: No module named 'llm_blender.blender.ranker'

In [29]:
!apt-get install ripgrep

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  ripgrep
0 upgraded, 1 newly installed, 0 to remove and 34 not upgraded.
Need to get 1,300 kB of archives.
After this operation, 4,247 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 ripgrep amd64 13.0.0-2ubuntu0.1 [1,300 kB]
Fetched 1,300 kB in 0s (10.6 MB/s)
Selecting previously unselected package ripgrep.
(Reading database ... 126102 files and directories currently installed.)
Preparing to unpack .../ripgrep_13.0.0-2ubuntu0.1_amd64.deb ...
Unpacking ripgrep (13.0.0-2ubuntu0.1) ...
Setting up ripgrep (13.0.0-2ubuntu0.1) ...
Processing triggers for man-db (2.10.2-1) ...


In [31]:
!rg --files-with-matches RankerConfig llm_blender/

llm_blender/: No such file or directory (os error 2)


In [16]:
# Restore fuser
#fuser_path_drive = '/content/drive/MyDrive/models--llm-blender--gen_fuser_3b'
#fuser_path_cache = '/root/.cache/huggingface/hub/models--llm-blender--gen_fuser_3b'
#if os.path.exists(fuser_path_drive):
    #!cp -r "$fuser_path_drive" "/root/.cache/huggingface/hub/"

In [None]:
# Restore fuser
fuser_path_drive = '/content/drive/MyDrive/models--llm-blender--gen_fuser_3b'
fuser_path_cache = '/root/.cache/huggingface/hub/models--llm-blender--gen_fuser_3b'
if os.path.exists(fuser_path_drive):
    !cp -r "$fuser_path_drive" "/root/.cache/huggingface/hub/"

# Load the fuser from cache (the Blender method is loadfuser, no underscore)
blender.loadfuser("llm-blender/gen_fuser_3b")

In [6]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import llm_blender
blender = llm_blender.Blender()
# Load Ranker
blender.loadranker("llm-blender/PairRM") # load ranker checkpoint
# blender.loadranker("OpenAssistant/reward-model-deberta-v3-large-v2") # load ranker checkpoint
# Load Fuser
blender.loadfuser("llm-blender/gen_fuser_3b") # load fuser checkpoint if you want to use pre-trained fuser; or you can use ranker only

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Successfully loaded ranker from  /root/.cache/huggingface/hub/llm-blender/PairRM


In [11]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [15]:
!cp -r /root/.cache/huggingface/hub/models--llm-blender--gen_fuser_3b /content/drive/MyDrive/ # save gen-fuser model to google drive

In [14]:
!cp -r /root/.cache/huggingface/hub/models--llm-blender--PairRM /content/drive/MyDrive/ # save pair ranker to google drive
print("✅ Ranker model saved to Google Drive.")

✅ Ranker model saved to Google Drive.


In [7]:
!pip install datasets pandas



In [9]:
!pip install -U datasets fsspec

Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec
  Downloading fsspec-2025.5.0-py3-none-any.whl.metadata (11 kB)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 491.5/491.5 kB 23.0 MB/s eta 0:00:00
Downloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.6/193.6 kB 18.0 MB/s eta 0:00:00
Installing collected packages: fsspec, datasets
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.2
    Uninstalling fsspec-2025.3.2:
      Successfully uninstalled fsspec-2025.3.2
  Attempting uninstall: datasets
    Found existing installation: datasets 2.14.4
    Uninstalling datasets-2.14.4:
      Successfully uninstalled datasets-2.14.4
ERROR: pip's dependency resolver 

In [17]:
!pip install --upgrade datasets fsspec

Collecting fsspec
  Using cached fsspec-2025.5.0-py3-none-any.whl.metadata (11 kB)


In [19]:
from datasets import load_dataset

dataset = load_dataset("llm-blender/mix-instruct", split="test")
dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/15.1k [00:00<?, ?B/s]

train_data_prepared.jsonl:   0%|          | 0.00/1.38G [00:00<?, ?B/s]

val_data_prepared.jsonl:   0%|          | 0.00/68.7M [00:00<?, ?B/s]

test_data_prepared.jsonl:   0%|          | 0.00/86.3M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/100000 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Dataset({
    features: ['id', 'instruction', 'input', 'output', 'candidates', 'cmp_results'],
    num_rows: 5000
})

## Load the MixInstruct dataset for the following examples

In [7]:
import datasets
import json
from llm_blender.gpt_eval.cor_eval import COR_MAPS
from llm_blender.gpt_eval.utils import get_ranks_from_chatgpt_cmps
mixinstruct_test = datasets.load_dataset("llm-blender/mix-instruct", split="test", streaming=True)
few_examples = list(mixinstruct_test.take(8))
# Parse cmp_results from JSON, then drop examples that have no comparison results
for ex in few_examples:
    ex['cmp_results'] = json.loads(ex['cmp_results'])
few_examples = [x for x in few_examples if x['cmp_results']]
insts = [x['instruction'] for x in few_examples]
inputs = [x['input'] for x in few_examples]
candidates_texts = [[cand['text'] for cand in x['candidates']] for x in few_examples]
print("Example:")
print("Instruction 1:\n", insts[0])
print("Input 1:\n", inputs[0])
print("Candidate 1 for input 1:\n")
print(candidates_texts[0][0])

Example:
Instruction 1:
 
Input 1:
 I've always wondered what the difference is between a skeptic and a denier.
Candidate 1 for input 1:

A skeptic is someone who doubts or expresses doubt about a claim or idea without being dismissive of it. They are open-minded and approach evidence with an open mind, searching for reasonable explanations and evidence to support their beliefs.

A denier, on the other hand, is someone who actively works to deny or ignore evidence that contradicts their beliefs. They are often characterized by a closed mind and an unwillingness to consider alternative perspectives. They may also use rhetoric or false claims to try to discredit the evidence.


## Use case 1: Using LLM-Blender for ranking
With the `rank` function, LLM-Blender ranks the candidates for each input through pairwise comparisons and returns their ranks. We then check how well the ranker's ranks correlate with ChatGPT's ranks.
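To make "ranking through pairwise comparisons" concrete, here is a toy sketch that turns pairwise outcomes into ranks by counting wins. The win-count aggregation and the `ranks_from_pairwise` helper are assumptions made for illustration; the library's ranker may aggregate its comparisons differently.

```python
from typing import Callable, List


def ranks_from_pairwise(
    candidates: List[str],
    is_better: Callable[[str, str], bool],
) -> List[int]:
    """Rank candidates by number of pairwise wins (rank 1 = most wins).

    Hypothetical helper; a stand-in for a learned pairwise ranker.
    """
    n = len(candidates)
    wins = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and is_better(candidates[i], candidates[j]):
                wins[i] += 1
    # Sort indices by descending wins; the stable sort keeps input order on ties
    order = sorted(range(n), key=lambda i: -wins[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Stand-in comparator: prefer the longer text (purely illustrative)
toy_ranks = ranks_from_pairwise(
    ["aa", "a", "aaaa", "aaa"], lambda a, b: len(a) > len(b)
)
print(toy_ranks)
```

The output has the same shape as one row of `ranks`: one integer per candidate, where a smaller number means a better candidate.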

In [8]:
ranks = blender.rank(inputs, candidates_texts, instructions=insts, return_scores=False, batch_size=2)

Ranking candidates: 100%|██████████| 4/4 [00:46<00:00, 11.72s/it]


In [None]:
print("Ranks for input 1:", ranks[0]) # ranks of candidates for input 1
# Ranks for input 1: [ 1 11  4  9 12  5  2  8  6  3 10  7]

In [9]:
import numpy as np
llm_ranks_map, gpt_cmp_results = get_ranks_from_chatgpt_cmps(few_examples)
gpt_ranks = np.array(list(llm_ranks_map.values())).T
print("Correlation with ChatGPT")
print("------------------------")
for cor_name, cor_func in COR_MAPS.items():
    print(cor_name, cor_func(ranks, gpt_ranks))

Correlation with ChatGPT
------------------------
pearson 0.24613052623424228
spearman 0.2567394020779344
spearman_footrule 32.0
set_based 0.5858743686868686


## Use case 2: Using LLM-Blender to directly compare two candidates

In [10]:
candidates_A = [x['candidates'][0]['text'] for x in few_examples]
candidates_B = [x['candidates'][1]['text'] for x in few_examples]
comparison_results = blender.compare(
    inputs, candidates_A, candidates_B, instructions=insts,
    batch_size=2, return_logits=False)
print("Comparison results for inputs:", comparison_results) # one boolean per input

Ranking candidates: 100%|██████████| 4/4 [00:00<00:00,  7.48it/s]

Comparison results for inputs: [ True  True  True  True False  True  True  True]
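A short sketch of how such a boolean result might be used to keep the preferred candidate of each pair. It assumes that `True` means the candidate from the first list is preferred; `pick_preferred` is a hypothetical helper, not a library function.

```python
from typing import List, Sequence


def pick_preferred(
    cands_a: Sequence[str],
    cands_b: Sequence[str],
    a_is_better: Sequence[bool],
) -> List[str]:
    """Keep the A candidate where the flag is true, otherwise the B candidate.

    Assumption: a true flag means the A candidate won the comparison.
    """
    if not (len(cands_a) == len(cands_b) == len(a_is_better)):
        raise ValueError("all three sequences must have the same length")
    return [a if flag else b for a, b, flag in zip(cands_a, cands_b, a_is_better)]


print(pick_preferred(["a1", "a2"], ["b1", "b2"], [True, False]))
```

With the variables from the cell above, this would be `pick_preferred(candidates_A, candidates_B, comparison_results)`.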





## Use case 3: Using LLM-Blender for fused generation
Fusing the top-ranked candidates selected by the ranker can produce outputs of higher quality.

In [11]:
from llm_blender.blender.blender_utils import get_topk_candidates_from_ranks
topk_candidates = get_topk_candidates_from_ranks(ranks, candidates_texts, top_k=3)
fuse_generations = blender.fuse(inputs, topk_candidates, instructions=insts, batch_size=2)
print("fuse_generations for input 1:", fuse_generations[0])

Fusing candidates: 100%|██████████| 4/4 [00:45<00:00, 11.30s/it]

fuse_generations for input 1: A skeptic is someone who is open to questioning and evaluating claims, while a denier is someone who actively refuses to accept evidence that contradicts their beliefs. So, a skeptic is someone who is open to questioning and evaluating claims, while a denier is someone who actively refuses to accept evidence that contradicts their beliefs.
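The top-k selection step can be pictured with the sketch below. It assumes that rank 1 is the best candidate, which matches the 1-based ranks shown in the ranking example; `topk_from_ranks` is a hypothetical stand-in and not the implementation of `get_topk_candidates_from_ranks`.

```python
from typing import List, Sequence


def topk_from_ranks(
    ranks: Sequence[int],
    candidates: Sequence[str],
    top_k: int,
) -> List[str]:
    """Return the top_k candidates for one input, best first.

    Assumption: a smaller rank value means a better candidate.
    """
    if len(ranks) != len(candidates):
        raise ValueError("ranks and candidates must have the same length")
    # Order candidate indices by their rank, then keep the first top_k
    order = sorted(range(len(candidates)), key=lambda i: ranks[i])
    return [candidates[i] for i in order[:top_k]]


print(topk_from_ranks([3, 1, 2], ["c", "a", "b"], top_k=2))
```

Only these selected candidates are passed to the fuser, which keeps the fusion input short.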





In [12]:
# Or rank and fuse in a single call
fuse_generations, ranks = blender.rank_and_fuse(inputs, candidates_texts, instructions=insts, return_scores=False, batch_size=2, top_k=3)

Ranking candidates: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it]
Fusing candidates: 100%|██████████| 4/4 [00:44<00:00, 11.05s/it]


In [13]:
from llm_blender.common.evaluation import overall_eval
metrics = ['bartscore']
targets = [x['output'] for x in few_examples]
scores = overall_eval(fuse_generations, targets, metrics)

print("Fusion Scores")
for key, value in scores.items():
    print("  ", key+":", np.mean(value))

print("LLM Scores")
llms = [x['model'] for x in few_examples[0]['candidates']]
llm_scores_map = {llm: {metric: [] for metric in metrics} for llm in llms}
for ex in few_examples:
    for cand in ex['candidates']:
        for metric in metrics:
            llm_scores_map[cand['model']][metric].append(cand['scores'][metric])
for i, (llm, scores_map) in enumerate(llm_scores_map.items()):
    print(f"{i} {llm}")
    for metric, llm_scores in scores_map.items():
        print("  ", metric+":", "{:.4f}".format(np.mean(llm_scores)))


ModuleNotFoundError: No module named 'bert_score'

## Use case 4: Use LLM-Blender for decoding enhancement (best-of-n sampling)


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", device_map="auto")

system_message = {
    "role": "system",
    "content": "You are a friendly chatbot who always responds in the style of a pirate",
}
messages = [
    [
        system_message,
        {"role": "user", "content": _inst + "\n" + _input},
    ]
    for _inst, _input in zip(insts, inputs)
]
prompts = [tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True) for m in messages]
outputs = blender.best_of_n_generate(model, tokenizer, prompts, n=10)
print("### Prompt:")
print(prompts[0])
print("### best-of-n generations:")
print(outputs[0])
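The idea behind best-of-n sampling is to draw several generations for a prompt and keep the one the ranker prefers. The sketch below shows that control flow with stand-in `generate` and `score` callables; both are hypothetical placeholders for a language model and a ranker, and the scalar scorer is an assumption made to keep the example small.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 10,
) -> str:
    """Sample n generations for a prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    samples = [generate(prompt) for _ in range(n)]
    # max() keeps the first sample among equally scored ones
    return max(samples, key=lambda s: score(prompt, s))


# Stand-ins: a canned "model" and a length-based "ranker" (illustrative only)
canned = iter(["Arr.", "Arr, matey!", "Ahoy there, matey!"])
best = best_of_n("Greet me", lambda p: next(canned), lambda p, s: len(s), n=3)
print(best)
```

Larger `n` raises the chance that a strong sample is among the candidates, at the cost of `n` times more generation work.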


## Use case 5: Use PairRM for RLHF tuning

To get scalar rewards, you can use the `blender.rank_with_ref` method (see the example below).

In [None]:
rewards = blender.rank_with_ref(inputs, candidates_texts, return_scores=True, batch_size=2, mode="longest")
print("Rewards for input 1:", rewards[0]) # rewards of candidates for input 1
"""
rewards is a List[List[float]] of shape (len(inputs), len(candidates_texts[0])),
representing the reward of each candidate for each input.
By default, the rewards are computed by comparing each candidate with the longest generation as the reference (mode="longest").
Other supported modes are "shortest", "median_length", "first", and "last".
"""
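The mode names above describe which candidate serves as the fixed reference. A minimal sketch of that selection follows, under stated assumptions: `pick_reference` is a hypothetical helper, length is measured in characters, and `"median_length"` takes the middle element after sorting by length. The library may measure length or break ties differently.

```python
from typing import Sequence


def pick_reference(candidates: Sequence[str], mode: str = "longest") -> str:
    """Choose one candidate as the reference according to the mode name."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    if mode == "first":
        return candidates[0]
    if mode == "last":
        return candidates[-1]
    # Stable sort by character length; ties keep their original order
    by_len = sorted(candidates, key=len)
    if mode == "shortest":
        return by_len[0]
    if mode == "longest":
        return by_len[-1]
    if mode == "median_length":
        return by_len[len(by_len) // 2]
    raise ValueError(f"unsupported mode: {mode}")


cands = ["bb", "a", "dddd", "ccc"]
for m in ("longest", "shortest", "median_length", "first", "last"):
    print(m, "->", pick_reference(cands, m))
```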

You can also pass a list of references to compare with, instead of automatically selecting one from the candidates as the fixed reference.


In [None]:
ref_candidates = [_c[0] for _c in candidates_texts] # use the first candidate as the reference, same as mode="first"
rewards = blender.rank_with_ref(inputs, candidates_texts, return_scores=True, batch_size=2, ref_candidates=ref_candidates)
"""
ref_candidates = [ref1, ref2, ref3, ...] # ref_candidates is a List[str], shape (len(inputs),)
This parameter overrides the mode parameter and uses ref_candidates as the references for reward calculation.
rewards is a List[List[float]] of shape (len(inputs), len(candidates_texts[0])).
"""
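One way such rewards might feed preference-based tuning is to turn each input into a chosen/rejected pair. The sketch below assumes that a higher reward means a better candidate and uses the `prompt`/`chosen`/`rejected` record layout common in preference-tuning datasets; `preference_pairs` is a hypothetical helper, not part of LLM-Blender.

```python
from typing import Dict, List, Sequence


def preference_pairs(
    prompts: Sequence[str],
    candidates: Sequence[Sequence[str]],
    rewards: Sequence[Sequence[float]],
) -> List[Dict[str, str]]:
    """Build one (chosen, rejected) pair per prompt from per-candidate rewards.

    Assumption: a higher reward means a better candidate.
    """
    pairs = []
    for prompt, cands, rs in zip(prompts, candidates, rewards):
        if len(cands) != len(rs):
            raise ValueError("each candidate needs exactly one reward")
        best = max(range(len(cands)), key=lambda i: rs[i])
        worst = min(range(len(cands)), key=lambda i: rs[i])
        # Skip prompts where no preference can be formed
        if best != worst:
            pairs.append(
                {"prompt": prompt, "chosen": cands[best], "rejected": cands[worst]}
            )
    return pairs


demo = preference_pairs(["q"], [["a", "b", "c"]], [[0.1, 0.9, -0.3]])
print(demo)
```

With the variables from this notebook, the call would be `preference_pairs(inputs, candidates_texts, rewards)`.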