# Fine-tuned RAG with Gradient


Instruct-tuning Llama2-7b-chat with Gradient's services, using the [MosaicML Instruct-v3](https://huggingface.co/datasets/mosaicml/instruct-v3) dataset.


In [1]:
!pip install llama-index gradientai cohere langchain -qU


In [2]:
import getpass
import os

os.environ["GRADIENT_ACCESS_TOKEN"] = getpass.getpass("Gradient Access Token: ")

Gradient Access Token: ··········


In [3]:
os.environ["GRADIENT_WORKSPACE_ID"] = getpass.getpass("Gradient Workspace ID: ")

Gradient Workspace ID: ··········


In [4]:
!pip install datasets -qU


#### Load HF Dataset


In [None]:
from datasets import load_dataset

instruct_tune_dataset = load_dataset("mosaicml/instruct-v3")

In [7]:
instruct_tune_dataset

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 56167
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 6807
    })
})

#### Create Formatted Prompt


```
<s>### Instruction:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
{USER MESSAGE}

### Response:
{RESPONSE}</s>
```


In [8]:
def create_prompt(sample):
  """Convert a raw sample into the single-string prompt format shown above."""
  bos_token = "<s>"
  eos_token = "</s>"
  system_message = "Below is an instruction that describes a task. Write a response that appropriately completes the request."

  # Strip the dataset's own system message and section markers,
  # keeping only the user's request.
  user_message = (
      sample["prompt"]
      .replace(system_message, "")
      .replace("\n\n### Instruction\n", "")
      .replace("\n### Response\n", "")
      .strip()
  )
  response = sample["response"]

  full_prompt = ""
  full_prompt += bos_token
  full_prompt += "### Instruction:"
  full_prompt += "\n" + system_message
  full_prompt += "\n" + user_message
  full_prompt += "\n\n### Response:"
  full_prompt += "\n" + response
  full_prompt += eos_token

  # Stored under "inputs", the column that is later written to JSONL.
  return {"inputs": full_prompt}

In [9]:
create_prompt(instruct_tune_dataset["train"][1])["inputs"]

'<s>### Instruction:\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\nWhat are different types of grass?\n\n### Response:\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.</s>'

#### Map to Dataset

In [10]:
instruct_tune_dataset = instruct_tune_dataset.map(create_prompt)


In [11]:
instruct_tune_dataset

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source', 'inputs'],
        num_rows: 56167
    })
    test: Dataset({
        features: ['prompt', 'response', 'source', 'inputs'],
        num_rows: 6807
    })
})

In [12]:
instruct_tune_dataset["train"][1]["inputs"]

'<s>### Instruction:\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\nWhat are different types of grass?\n\n### Response:\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.</s>'

#### Filtering Dataset


In [13]:
pruned_dataset = instruct_tune_dataset.filter(lambda x: len(x["inputs"]) <= 2000)


In [14]:
pruned_dataset

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source', 'inputs'],
        num_rows: 40723
    })
    test: Dataset({
        features: ['prompt', 'response', 'source', 'inputs'],
        num_rows: 5511
    })
})

#### Saving to JSONL



In [15]:
for split, dataset in pruned_dataset.items():
  dataset.to_json(f"instruct_tune_{split}.jsonl")
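
Before handing the file to the fine-tune engine, it can help to confirm that each line is a standalone JSON object carrying the `inputs` field produced by `create_prompt`. The following is a minimal sketch of such a check; `peek_jsonl` is a hypothetical helper name introduced here, not part of `datasets` or Gradient.

```python
import json


def peek_jsonl(path, n=2):
    """Return the first n records of a JSONL file as dicts.

    Hypothetical helper for a quick sanity check; not a library function.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if len(records) >= n:
                break
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Example usage (assumes the file written by the cell above exists):
# for record in peek_jsonl("instruct_tune_train.jsonl"):
#     assert "inputs" in record
#     print(record["inputs"][:80])
```

If a line fails to parse or lacks `inputs`, it is probably easier to catch that here than in the middle of a fine-tuning run.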


### Instruct-tuning!

#### Initializing a Base Model: `llama2-7b-chat`

In [16]:
from llama_index.llms import GradientBaseModelLLM

base_model_slug = "llama2-7b-chat"
base_llm = GradientBaseModelLLM(
    base_model_slug=base_model_slug, max_tokens=300
)

#### Initializing the Fine-tune Engine


- `base_model_slug` - the base model's `Slug ID`; you can find these IDs [here](https://docs.gradient.ai/docs/models-1#%EF%B8%8F-gradient-hosted-llms) in the "Model IDs for reference in the API and CLI" table.
- `name` - the name given to your fine-tuned model.
- `data_path` - the path to the formatted `jsonl` file from which the `GradientFinetuneEngine` pulls training examples.
- `verbose` - lets us know what's going on by printing progress during fine-tuning.
- `max_steps` - the number of steps the model will be fine-tuned for.
- `batch_size` - the number of examples used to train at a time.

In [17]:
from llama_index.finetuning.gradient.base import GradientFinetuneEngine

finetune_engine = GradientFinetuneEngine(
    base_model_slug=base_model_slug,
    name="instruct_tune",
    data_path="/content/instruct_tune_train.jsonl",
    verbose=True,
    max_steps=100,
    batch_size=4,
)

In [None]:
finetune_engine.model_adapter_id

#### Instruct-tuning Llama 2 7B Chat


In [19]:
epochs = 1
for i in range(epochs):
    print(f"** EPOCH {i} **")
    finetune_engine.finetune()

** EPOCH 0 **
fine-tuning step 4: loss=1938.7327, trainable tokens=971
fine-tuning step 8: loss=1072.271, trainable tokens=648
fine-tuning step 12: loss=920.4726, trainable tokens=646
fine-tuning step 16: loss=1320.2537, trainable tokens=908
fine-tuning step 20: loss=1445.1958, trainable tokens=913
fine-tuning step 24: loss=536.31165, trainable tokens=422
fine-tuning step 28: loss=1730.758, trainable tokens=1319
fine-tuning step 32: loss=1050.8328, trainable tokens=1093
fine-tuning step 36: loss=1474.1715, trainable tokens=893
fine-tuning step 40: loss=738.13275, trainable tokens=605
fine-tuning step 44: loss=1662.3542, trainable tokens=1329
fine-tuning step 48: loss=970.9984, trainable tokens=618
fine-tuning step 52: loss=811.68225, trainable tokens=604
fine-tuning step 56: loss=1521.3733, trainable tokens=931
fine-tuning step 60: loss=1036.5712, trainable tokens=874
fine-tuning step 64: loss=813.6956, trainable tokens=571
fine-tuning step 68: loss=1207.4124, trainable tokens=960
fine

## Hosting An Embedding Model with Gradient


In [20]:
from getpass import getpass
import os

if not os.environ.get("GRADIENT_ACCESS_TOKEN", None):
    os.environ["GRADIENT_ACCESS_TOKEN"] = getpass("gradient.ai access token:")
if not os.environ.get("GRADIENT_WORKSPACE_ID", None):
    os.environ["GRADIENT_WORKSPACE_ID"] = getpass("gradient.ai workspace id:")

In [44]:
from langchain.embeddings import GradientEmbeddings

embeddings = GradientEmbeddings(model="bge-large")

In [45]:
len(embeddings.embed_query("Hello, is it me you're looking for?"))

1024

## Creating a RAG Pipeline Powered by Gradient and LangChain




In [None]:
import gradientai

client = gradientai.Gradient()

models = client.list_models(only_base=False)
for model in models:
  if "adapter" in model.id:
    print(model.id, model.name)

In [24]:
from langchain.llms import GradientLLM

llm = GradientLLM(
    model=models[-1].id,
    model_kwargs=dict(max_generated_token_count=128),
)

In [25]:
from langchain.prompts import PromptTemplate

template = """\
### Instruction:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
{input}

### Response:
"""

prompt = PromptTemplate(template=template, input_variables=["input"])

In [26]:
from langchain.chains import LLMChain

llm_chain = LLMChain(prompt=prompt, llm=llm)

In [27]:
# Named `question` rather than `input` to avoid shadowing the built-in.
question = "What is the opposite of Gradient Descent?"

llm_chain.run(input=question)


' The opposite of Gradient Descent is probably Gradient Ascent.'

In [28]:
template = """\
### Instruction:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

Based on the provided context, please answer the provided question. You can only use the provided context to answer the question.
If you do not know the answer - please respond with "I don't know".

Context:
{context}

Question:
{question}

### Response:
"""

rag_prompt = PromptTemplate(template=template, input_variables=["context", "question"])

In [29]:
llm_chain = rag_prompt | llm

In [30]:
question = "What is the opposite of Gradient Descent?"
context = "In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent. It is particularly useful in machine learning for minimizing the cost or loss function.[1] Gradient descent should not be confused with local search algorithms, although both are iterative methods for optimization."

llm_chain.invoke({"question": question, "context": context})

' The opposite of gradient descent is gradient ascent.'

In [31]:
question = "What is the maximum airspeed velocity of an unladen swallow?"
context = "In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent. It is particularly useful in machine learning for minimizing the cost or loss function.[1] Gradient descent should not be confused with local search algorithms, although both are iterative methods for optimization."

llm_chain.invoke({"question": question, "context": context})

" I don't know the answer to your question."

### Creating a RAG Chain in LangChain


In [32]:
!pip install faiss-cpu arxiv pymupdf -qU



In [33]:
from langchain.document_loaders import ArxivLoader

docs = ArxivLoader(query="Gradient Descent", load_max_docs=5).load()

In [34]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1250,
    chunk_overlap=100,
    length_function=len,
    is_separator_regex=False,
)

In [35]:
split_docs = text_splitter.split_documents(docs)

In [36]:
len(split_docs)

219

In [37]:
from langchain.vectorstores import FAISS

# batch embeddings as gradient embeddings API can currently only handle 100 items at a time
vectorstore = FAISS.from_documents(split_docs[:100], embedding=embeddings)
vectorstore.add_documents(split_docs[100:200])
vectorstore.add_documents(split_docs[200:])

print("Completed")

Completed
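
The hard-coded slices above only work for a document count of up to 300. A more general way to respect the 100-item limit is to chunk the list in a loop. The sketch below assumes the same limit; `batched` is a hypothetical helper name, and the commented usage reuses `split_docs`, `embeddings`, and `FAISS` from the cells above.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example usage with the objects defined earlier in this notebook:
# batches = batched(split_docs, batch_size=100)
# vectorstore = FAISS.from_documents(next(batches), embedding=embeddings)
# for batch in batches:
#     vectorstore.add_documents(batch)
```

The first batch creates the index and the remaining batches are appended, which mirrors the three explicit calls above without depending on the number of chunks.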


In [38]:
retriever = vectorstore.as_retriever()

In [39]:
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

rag_chain = (
    {
        "context" : retriever, "question" : RunnablePassthrough()
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)
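
In the chain above, the retriever's output (a list of documents) goes straight into the `{context}` slot, so the prompt most likely receives the string representation of that list, metadata included. A common variation is to join only the document texts first. The sketch below illustrates that idea; `format_docs` is an illustrative helper name and `Doc` is a hypothetical stand-in for LangChain's document class so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a `page_content` field."""
    page_content: str


def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


# Possible use in the chain (an assumption, not taken from the original notebook):
# rag_chain = (
#     {"context": retriever | format_docs, "question": RunnablePassthrough()}
#     | rag_prompt
#     | llm
#     | StrOutputParser()
# )
```

Passing plain text keeps the prompt shorter and avoids showing the model Python object syntax, which may matter given the 128-token generation limit set earlier.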

In [40]:
rag_chain.invoke("What is Gradient Descent?")

' Gradient descent is an optimization algorithm that iteratively updates the parameters of a function to minimize the function. It is a simple and effective algorithm that is widely used in machine learning and other fields.'

In [41]:
rag_chain.invoke("Is it mandatory to learn gradient descent in detail to build large language model applications?")

' No, it is not mandatory to learn gradient descent in detail to build large language model applications. However, it is important to have a basic understanding of gradient descent and its variants, such as stochastic gradient descent, to build large language models. This is because gradient descent is a fundamental optimization algorithm used in deep learning, and many deep learning models use gradient descent as their optimization algorithm.\n\nIn particular, gradient descent is used in many deep learning models to optimize the parameters of the model. For example, in a neural network, the weights and biases of the network are typically optimized using gradient descent. Similarly, in a language'

In [42]:
rag_chain.invoke("What do I need to learn about gradient descent to build large language model applications?")

' Gradient descent is a popular optimization algorithm used in machine learning to minimize a cost function. It is a first-order optimization algorithm that iteratively updates the parameters of a model in the direction of the negative gradient of the cost function. The gradient descent algorithm is widely used in machine learning to train neural networks, and it is also used in other machine learning algorithms such as logistic regression and linear regression.\n\nTo build large language model applications, you will need to learn about the following topics related to gradient descent:\n\n1. Cost functions: The cost function is a mathematical function that measures the distance between the parameters of the'