<a href="https://colab.research.google.com/github/switch527/applied-ml/blob/main/Hands_on_with_Retrieval_Augmented_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-augmented generation (RAG)


## Use case
Suppose you have some text documents (PDF, blog, Notion pages, etc.) and want to ask questions related to the contents of those documents.

LLMs, given their proficiency in understanding text, are a great tool for this.

In this walkthrough, we'll build a question-answering application over documents using LLMs.

Two closely related use cases, covered elsewhere, are:
- [QA over structured data](https://python.langchain.com/docs/use_cases/qa_structured/sql) (e.g., SQL)
- [QA over code](https://python.langchain.com/docs/use_cases/question_answering/code_understanding) (e.g., Python)

![intro.png](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/qa_intro.png?raw=true)

## Overview
The pipeline for converting raw unstructured data into a QA chain looks like this:
1. `Loading`: First, we need to load our data. Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders.
2. `Splitting`: [Text splitters](/docs/modules/data_connection/document_transformers/) break `Documents` into splits of a specified size.
3. `Storage`: Storage (often a [vectorstore](/docs/modules/data_connection/vectorstores/)) houses, [and often embeds](https://www.pinecone.io/learn/vector-embeddings/), the splits.

![rag_indexing](https://python.langchain.com/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png)

4. `Retrieval`: The app retrieves splits from storage (often those [with embeddings similar](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question).
5. `Generation`: An [LLM](/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved data.

![reg](https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png)
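
Steps 4 and 5 come down to assembling one prompt from the retrieved splits and the user's question. The sketch below is only an illustration: `build_prompt`, its wording, and the sample chunks are hypothetical and are not part of LangChain.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Join retrieved chunks into a context block and place the question after it."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}"
    )


# Made-up chunks standing in for what a retriever would return.
prompt = build_prompt(
    "What does RLHF stand for?",
    ["RLHF is a technique for aligning LLMs.", "It uses a reward model."],
)
print(prompt)
```

The LLM only ever sees this single string, which is why the quality of the retrieved chunks matters as much as the model itself.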



## Quickstart

Suppose we want a QA app over this [blog post](https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives).

We can create this in a few lines of code.

First, install the required packages:

In [None]:
!pip install langchain chromadb sentence-transformers InstructorEmbedding transformers torch accelerate bitsandbytes ragas

## Step 1. Load

Specify a `DocumentLoader` to load in your unstructured data as `Documents`.

A `Document` is an object with text (`page_content`) and `metadata`.
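
As a rough mental model (not LangChain's actual class definition), a `Document` can be pictured as a small record that pairs text with metadata. The `SimpleDocument` name and the sample values below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document, for illustration only."""
    page_content: str                             # the raw text of the document
    metadata: dict = field(default_factory=dict)  # e.g. source URL or title


doc = SimpleDocument(
    page_content="RLHF aligns a language model with human preferences.",
    metadata={"source": "https://example.com/post"},
)
print(doc.page_content)
print(doc.metadata["source"])
```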

In [None]:
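# Load documents
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives")
data = loader.load()  # fetch the page and return a list of Documents

### Go deeper
- Browse the > 160 data loader integrations [here](https://integrations.langchain.com/).
- See further documentation on loaders [here](/docs/modules/data_connection/document_loaders/).

## Step 2. Split

Split the `Document` into chunks for embedding and vector storage.

In [None]:
# Split documents
from langchain.text_splitter import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap are illustrative values, not tuned for this post
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)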

### Go deeper

- `DocumentSplitters` are just one type of the more generic `DocumentTransformers`.
- See further documentation on transformers [here](/docs/modules/data_connection/document_transformers/).
- `Context-aware splitters` keep the location ("context") of each split in the original `Document`:
    - [Markdown files](/docs/use_cases/question_answering/document-context-aware-QA)
    - [Code (py or js)](/docs/integrations/document_loaders/source_code)
    - [Documents](/docs/integrations/document_loaders/grobid)
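
To make the effect of chunk size and overlap concrete, here is a deliberately simplified character-based splitter. It is a sketch under stated assumptions: `split_text` is a hypothetical helper, and unlike LangChain's splitters it cuts at fixed offsets instead of looking for separators such as paragraph breaks.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role overlap plays: content near a boundary appears in both neighbouring chunks.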

## Step 3. Store

To look up our document splits later, we first need to store them somewhere searchable.

The most common way to do this is to embed the contents of each document split.

We store the embeddings and the splits together in a vectorstore.
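
The idea of keeping each split next to its embedding can be sketched without any external library. Everything below is hypothetical and for intuition only: `toy_embed` is a hashed bag-of-words, not a learned embedding model, and `ToyVectorStore` is not the Chroma API.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hypothetical embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # A stable hash maps each token to one of `dim` buckets.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class ToyVectorStore:
    """Hypothetical in-memory store keeping each split next to its embedding."""

    def __init__(self):
        self.records = []  # list of (embedding, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.records.append((toy_embed(text), text))


store = ToyVectorStore()
store.add_texts(["reward model training", "supervised finetuning"])
print(len(store.records), len(store.records[0][0]))  # prints 2 16
```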

In [None]:
# Embed and store splits
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceInstructEmbeddings

# Download the embedding model (this cell assumes a CUDA GPU is available)
embedding_model = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    embed_instruction="Represent the document for retrieval: ",
    query_instruction="Represent the question for retrieving supporting documents: ",
    model_kwargs={"device": "cuda"},
)

# Embed every split and index it in a Chroma collection
vectorstore = Chroma.from_documents(documents=splits, embedding=embedding_model)
# Expose the vectorstore through the retriever interface used by the QA chain
retriever = vectorstore.as_retriever()

### Go deeper
- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).
- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).
- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).
- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/).

Here are Steps 1-3:

![lc.png](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/qa_data_load.png?raw=true)

## Step 4. Retrieve

Retrieve relevant splits for any question using [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/).

This is simply "top K" retrieval where we select documents based on embedding similarity to the query.
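
A minimal sketch of "top K" selection by cosine similarity follows. The helper names and the hand-made 3-dimensional vectors are assumptions for illustration; a real vectorstore works with much higher-dimensional embeddings and typically uses an approximate nearest-neighbor index instead of scoring every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Hand-made vectors standing in for real embeddings.
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, doc_vecs, k=2))  # prints [0, 2]
```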

In [None]:
# LLM
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1",
                                             load_in_4bit=True,
                                             device_map='auto')
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

In [None]:
# Test the model
test_prompt = """### Instruction: Tell me what you know about the SR-71 Blackbird.

 ### Answer:
 """

encoded_instruction = tokenizer(test_prompt,
                                return_tensors="pt",
                                add_special_tokens=True)

model_inputs = encoded_instruction.to(device)

generated_ids = model.generate(**model_inputs,
                               max_new_tokens=1000,
                               do_sample=True,
                               pad_token_id=tokenizer.eos_token_id)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

In [None]:
import transformers

text_generation_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=True,
    temperature=0.2,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=1000,
)
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

In [None]:
from langchain.chains import RetrievalQA

# Prompt template
qa_template = """<s>[INST] You are a helpful assistant.
Use the following context to answer the question below.
If you cannot answer the question based on the provided context, respond with "Unable to answer the question based on the provided context".

Context:
{context}

Question:
{question} [/INST]
"""

# Create a prompt instance
QA_PROMPT = PromptTemplate.from_template(qa_template)

# Custom QA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=mistral_llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_PROMPT},
    return_source_documents=True,  # keep retrieved splits for the evaluation step
)


In [None]:
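# Enter a question (an example; replace it with your own)
question = "What does RLHF stand for?"

# Query Mistral 7B Instruct model
result = qa_chain({"query": question})

# Print your result
print(result["result"])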


## Step 5. Evaluate

We'll start by defining some sample question-and-answer pairs:

In [None]:
eval_questions = [
    "What are the three main steps in the canonical training pipeline for modern transformer-based LLMs?",
    "What is the purpose of the pretraining phase in the LLM training pipeline?",
    "What does RLHF stand for, and how does it contribute to the LLM training pipeline?",
    "How does RLHF Step 2 create a reward model in the RLHF pipeline?",
    "What are some alternatives to RLHF discussed in the article, and how do they differ?",
]

eval_answers = [
    "The three main steps are Pretraining, Supervised finetuning, and Alignment.",
    "The pretraining phase trains the model on a large unlabeled text corpus via next-word prediction so that it absorbs general knowledge about language.",
    "RLHF stands for Reinforcement Learning with Human Feedback. It contributes by aligning the LLM with human preferences, improving its helpfulness and safety.",
    "In RLHF Step 2, for each prompt, multiple responses are generated from the finetuned LLM, and individuals rank these responses based on their preference, forming a dataset for creating a reward model.",
    "Some alternatives include Constitutional AI, Hindsight Instruction Labeling, Direct Preference Optimization, Contrastive Preference Learning, Reinforced Self-Training, and Reinforcement Learning with AI Feedback (RLAIF). Each alternative has distinct approaches, such as self-training based on rules, supervised finetuning with relabeling, direct use of cross entropy loss, contrastive loss learning, self-training with offline dataset generation, and reinforcement learning with AI-generated feedback.",
]

examples = [
    {"query": q, "ground_truths": [eval_answers[i]]}
    for i, q in enumerate(eval_questions)
]

In [None]:
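# Make sure you have your OpenAI API key ready
import os
from getpass import getpass

# Prompt for the key instead of hard-coding it in the notebook
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")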

In [None]:
from ragas.langchain.evalchain import RagasEvaluatorChain
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_relevancy,
    context_recall,
)

# create evaluation chains
faithfulness_chain = RagasEvaluatorChain(metric=faithfulness)
answer_rel_chain = RagasEvaluatorChain(metric=answer_relevancy)
context_rel_chain = RagasEvaluatorChain(metric=context_relevancy)
context_recall_chain = RagasEvaluatorChain(metric=context_recall)
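
For intuition about what a ground-truth-based metric checks, here is a crude lexical stand-in. It is explicitly not how ragas computes `context_recall` (ragas relies on an LLM to judge attribution); `token_recall` is a hypothetical helper that only measures word overlap between a ground-truth answer and the retrieved context.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_recall(ground_truth: str, context: str) -> float:
    """Fraction of distinct ground-truth tokens that also occur in the context."""
    truth = _tokens(ground_truth)
    if not truth:
        return 0.0
    return len(truth & _tokens(context)) / len(truth)


context = "The pipeline consists of pretraining, supervised finetuning, and alignment."
print(token_recall("Pretraining, supervised finetuning, alignment", context))  # prints 1.0
print(token_recall("An unrelated sports headline", context))  # prints 0.0
```

A ground truth that the retrieved context cannot support scores low, which is the same signal the LLM-based metric is designed to capture in a far more robust way.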

In [None]:
result = qa_chain({"query": eval_questions[0]})
print(result["result"])

In [None]:
eval_result = faithfulness_chain(result)
eval_result["faithfulness_score"]

In [None]:
# run the queries as a batch for efficiency
predictions = qa_chain.batch(eval_questions)

In [None]:
predictions