<a href="https://colab.research.google.com/github/godhal/godhal/blob/main/Mosleh_Turning_Llama_3_1_into_a_Text_Embedding_Model_DEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to transform an LLM into a text embedding model using LLM2Vec. For a 7B/8B LLM, it can run on a 24 GB GPU.

We need to install the following packages:

In [None]:
#pip install llm2vec
!pip install --upgrade transformers
#!pip install --upgrade llm2vec==0.2.2 transformers
!pip install flash-attn --no-build-isolation

In [None]:
import transformers
print(transformers.__version__)

4.44.0


In [None]:
from huggingface_hub import notebook_login

notebook_login()

# Simple Transformation


The following cells clone the LLM2Vec repository, install its dependencies, then transform Llama 3.1 8B Instruct into an embedding model and serialize it to a directory named "Llama-3.1-8B-Emb".

In [None]:
!git clone https://github.com/McGill-NLP/llm2vec.git

Cloning into 'llm2vec'...
remote: Enumerating objects: 853, done.
remote: Counting objects: 100% (306/306), done.
remote: Compressing objects: 100% (152/152), done.
remote: Total 853 (delta 187), reused 202 (delta 151), pack-reused 547
Receiving objects: 100% (853/853), 1.40 MiB | 29.21 MiB/s, done.
Resolving deltas: 100% (478/478), done.


In [None]:
!pip install numpy tqdm torch peft transformers datasets evaluate scikit-learn

In [None]:
import torch
from llm2vec.llm2vec.llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)
l2v.save("Llama-3.1-8B-Emb")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

## Code Addition found here:

https://github.com/McGill-NLP/llm2vec/commit/03382c358494a4e2f07222455b366fb75d625ab7


In [None]:
# Encoding queries using instructions
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)

queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


In [None]:
q_reps

tensor([[-0.1924, -4.0000,  0.2891,  ..., -1.6562,  1.1641,  1.1172],
        [ 1.9531, -2.0312,  0.0352,  ..., -2.8281,  0.9180, -0.6641]])

In [None]:
# Encoding documents. Instructions are not required for documents
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1  the highest point of a mountain : the top of a mountain. : 2  the highest level. : 3  a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)
d_reps

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

tensor([[ 1.5156, -3.4688, -0.4297,  ..., -0.9531, -0.7109,  1.0703],
        [ 0.7852, -2.9688, -2.4688,  ...,  2.0312, -1.0156, -0.1221]])

In [None]:

# Compute cosine similarity
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)

tensor([[0.8161, 0.5067],
        [0.4729, 0.5262]])
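
Each row of the similarity matrix corresponds to a query and each column to a document, so retrieval comes down to picking the highest-scoring column in each row. The following is a minimal sketch of that step in plain Python, using the scores printed above copied in by hand; the names `cos_sim_rows` and `best_match` are hypothetical helpers introduced only for this illustration and are not part of LLM2Vec.

```python
# Scores copied by hand from the printed cos_sim above
# (rows: queries, columns: documents).
cos_sim_rows = [
    [0.8161, 0.5067],  # "how much protein should a female eat"
    [0.4729, 0.5262],  # "summit define"
]


def best_match(scores):
    """Return the index of the highest score in one row (hypothetical helper)."""
    return max(range(len(scores)), key=lambda j: scores[j])


# For each query, the index of the most similar document.
best_docs = [best_match(row) for row in cos_sim_rows]
print(best_docs)  # [0, 1]
```

Each query is matched to its intended document: the protein query to the first passage and the "summit" query to the second. On the tensor itself, `cos_sim.argmax(dim=1)` should give the same indices without leaving PyTorch.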
