# Simple RAG Pipeline for Question Answering

## RAG Explained

**Retrieval-Augmented Generation (RAG)** is a technique that combines the strengths of retrieval-based and generative models to improve text generation. It is commonly used to improve response quality in question-answering scenarios. Before a generative model is prompted to answer a question, the user's input *(1)* is encoded as an embedding *(2)* and used to retrieve relevant information from a database *(3)* of documents. The retrieved data is then included in the prompt *(4)*, where it informs and improves the response produced by the generative model *(5)*. This approach is especially useful because it sidesteps the limitations of fine-tuning, which is not always feasible because of constraints such as data availability or computational resources. For example, placing rich, academically informed content directly in the input sequence can make the model's answers more detailed and relevant. Here are some resources with more information on the topic:

* [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)
* [Improving language models by retrieving from trillions of tokens](https://arxiv.org/abs/2112.04426)
* [Retrieval-Augmented Generation for Large Language Models: A Survey](https://arxiv.org/abs/2312.10997)
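
The five steps described above can be sketched with the standard library alone. This is a toy illustration, not the pipeline built in the rest of this notebook: the bag-of-words `embed` function stands in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Steps (2) and (3): embed the query and return the k most similar documents."""
    query_vec = embed(query)
    return sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)[:k]


def build_prompt(query, retrieved):
    """Step (4): place the retrieved text in front of the question."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n---\nQuestion: {query}"


docs = [
    "conda is a package and environment manager",
    "git is a distributed version control system",
]
query = "what is conda"  # step (1): the user's input
top_docs = retrieve(query, docs, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)  # step (5) would send this prompt to a generative model
```

In a real setup the counts would be replaced by dense vectors from an embedding model and the sorted list by a vector index, but the flow of data stays the same.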




## Load Markdown Files

See: https://python.langchain.com/v0.2/docs/how_to/document_loader_markdown/

In [1]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader

In [2]:
# Path to your directory containing markdown files
directory_path = "../data/raw/"

# Load all markdown files from the directory
loader = DirectoryLoader(directory_path, glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()

print(f"Loaded {len(documents)} documents")

Loaded 3 documents


In [3]:
print("Document Metadata:", documents[2].metadata)
print("Document Content:", documents[2].page_content[:100])

Document Metadata: {'source': '../data/raw/git-tutorial.md'}
Document Content: 
## Git Basics

If you can read only one chapter to get going with Git, this is it. This chapter cov


## Split Documents

In [4]:
# Splitting the text into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

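# Custom separators for Markdown documents. Separators are tried in order:
# because "#" is listed first, the text is split at every "#" before the
# splitter falls back to ".", " ", and single characters for oversized pieces.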
markdown_separators = ["#", "\n#", "##", ".", " ", ""]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=128, separators=markdown_separators)
texts = text_splitter.split_documents(documents)

In [5]:
print("Number of chunks:", len(texts))

Number of chunks: 1453


In [6]:
print(texts[3])

page_content='.

For full usage of each command, including abbreviations, see *Command reference*. You can see the same information at the command line by *viewing the command-line help*.' metadata={'source': '../data/raw/conda-tutorial.md'}
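
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window. It is a deliberate simplification and not how `RecursiveCharacterTextSplitter` works internally: the real splitter also tries to cut at the configured separators. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij"
chunks = chunk_with_overlap(sample, chunk_size=4, chunk_overlap=2)
print(chunks)
```

Each chunk repeats the last two characters of the previous one, which is the purpose of the overlap: a sentence that straddles a boundary still appears whole in at least one chunk.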


## Save Documents in Vector Store

In [7]:
EMBEDDING_MODEL_NAME = "thenlper/gte-small"

In [8]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy

embedding_model = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    multi_process=False,
    model_kwargs={"device": "mps"},  # "mps" targets Apple Silicon GPUs; use "cuda" or "cpu" on other hardware
    encode_kwargs={"normalize_embeddings": True},  # Set `True` for cosine similarity
)




In [9]:
KNOWLEDGE_VECTOR_DATABASE = FAISS.from_documents(
    texts, embedding_model, distance_strategy=DistanceStrategy.COSINE
)
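
The `normalize_embeddings=True` setting matters for cosine similarity because, once vectors have unit length, their dot product equals their cosine similarity. The following sketch checks this with two small hand-written vectors; the helper names are hypothetical and the vectors are not real embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

cosine_raw = cosine(a, b)
dot_normalized = dot(normalize(a), normalize(b))
print(cosine_raw, dot_normalized)
```

Both values agree up to floating-point rounding, which is why an index over normalized vectors can rank by inner product and still return cosine-ordered results.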

## Querying Vector Store

In [10]:
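# Embed a user query in the same space as the documents.
# query_vector is shown for illustration; similarity_search below embeds the query itself.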
user_query = "How to start conda?"
query_vector = embedding_model.embed_query(user_query)

print(f"\nStarting retrieval for {user_query=}...")
retrieved_docs = KNOWLEDGE_VECTOR_DATABASE.similarity_search(query=user_query, k=5)

print("\n============================== Retrieved Documents ==============================")
for i, doc in enumerate(retrieved_docs):
    print(f"\n============================== Document {i+1} ==============================")
    print(doc.page_content)
    print(doc.metadata)



Starting retrieval for user_query='How to start conda?'...


# 1.3 **Getting Started With Conda**

Conda is a powerful package manager and environment manager that you use with command line commands at the Anaconda Prompt for Windows, or in a Terminal window for macOS or Linux.

This 20-minute guide to getting started with conda lets you try out the major features of conda. You should understand how conda works when you finish this guide
{'source': '../data/raw/conda-tutorial.md'}

# 1.1 **Overview**

This page provides an overview of how to use conda. For an overview of what conda is and what it does, please see the *front page*.

The quickest way to start using conda is to go through the 20-minute *Getting started with conda* guide.

The conda command is the primary interface for managing installations of various packages. It can:
- Query and search the Anaconda package index and current Anaconda installation.

- Create new conda environments
{'source': '../data/raw/conda-tutorial.m

## Load Generative Language Model

In [11]:
# Only the tokenizer is needed locally; generation runs on the Hugging Face endpoint
from transformers import AutoTokenizer

In [12]:
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# ---------------------------- Load Model ---------------------------

# Parameters for generation (you can adjust these as needed)
generation_params = {
    "temperature": 0.7,
    "max_new_tokens": 2048,
    "top_p": 0.9,
    "repetition_penalty": 1.2,
    "do_sample": True,
}

# Create the Hugging Face Endpoint using the specified parameters
endpoint = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    **generation_params,
)

TOKENIZER = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# Wrap the endpoint in a LangChain chat model
READER_LLM = ChatHuggingFace(llm=endpoint)
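
To give some intuition for the `temperature` and `top_p` values above, here is a minimal sketch of temperature scaling followed by nucleus (top-p) filtering over a toy distribution. It illustrates the general idea only; the endpoint's actual sampling code may differ in detail, and both function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature; lower values sharpen the distribution."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(probs, top_p=0.9)
print(filtered)  # the least likely token is dropped before sampling

sharp = apply_temperature([1.0, 2.0], temperature=0.7)
neutral = apply_temperature([1.0, 2.0], temperature=1.0)
print(sharp, neutral)
```

With `top_p=0.9` the rarest token can no longer be sampled, and a temperature below 1 shifts more probability toward the most likely token.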


In [13]:
READER_LLM.invoke("What is conda in the context of the Python programming language?")

AIMessage(content='Conda is an open-source package management and environment management system that is commonly used with the Python programming language. Conda enables users to manage dependencies, install software packages efficiently and consistently, and create and manage isolated computing environments. It works by creating self-contained packages called environments, each of which contains all the necessary libraries and packages for a specific project or analysis task. This allows developers and data scientists to avoid version control issues, ensure software compatibility, and simplify collaboration in multi-user', additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=100, prompt_tokens=36, total_tokens=136), 'model': '', 'finish_reason': 'length'}, id='run-9fbfabaf-8a2e-4712-8a79-7d6f4a4b5636-0')

## Question Answering

### Prompt for QA

In [14]:
prompt_in_chat_format = [
    {
        "role": "system",
        "content": """Using the information contained in the context,
give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Provide the number of the source document when relevant.
If the answer cannot be deduced from the context, do not give an answer.""",
    },
    {
        "role": "user",
        "content": """Context:
{context}
---
Now here is the question you need to answer.

Question: {question}""",
    },
]
RAG_PROMPT_TEMPLATE = TOKENIZER.apply_chat_template(
    prompt_in_chat_format, tokenize=False, add_generation_prompt=True
)
print(RAG_PROMPT_TEMPLATE)

<|system|>
Using the information contained in the context,
give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Provide the number of the source document when relevant.
If the answer cannot be deduced from the context, do not give an answer.</s>
<|user|>
Context:
{context}
---
Now here is the question you need to answer.

Question: {question}</s>
<|assistant|>
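
The printed template keeps `{context}` and `{question}` as literal placeholders, so it can be filled later with `str.format`. The sketch below reproduces that step with a shortened, hand-written template in the same shape as the output above; the system text is abbreviated and `fill_prompt` is a hypothetical helper.

```python
# Hand-written stand-in for the printed template (system message shortened)
TEMPLATE = (
    "<|system|>\n"
    "Using the information contained in the context, answer the question.</s>\n"
    "<|user|>\n"
    "Context:\n{context}\n---\n"
    "Question: {question}</s>\n"
    "<|assistant|>\n"
)


def fill_prompt(template, question, context):
    """Substitute the two placeholders; other text is left untouched."""
    return template.format(question=question, context=context)


prompt = fill_prompt(
    TEMPLATE,
    question="What is conda?",
    context="Document 0:::\nConda is a package manager.",
)
print(prompt)
```

Only the template is parsed for placeholders, so braces inside the retrieved context are inserted as-is. Literal braces in the template itself would need to be doubled (`{{` and `}}`).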



**What is conda in the context of the Python programming language?**

In [15]:
user_query = "What is conda in the context of the Python programming language?"
query_vector = embedding_model.embed_query(user_query)

print(f"\nStarting retrieval for {user_query=}...")
retrieved_docs = KNOWLEDGE_VECTOR_DATABASE.similarity_search(query=user_query, k=2)

retrieved_docs_text = [doc.page_content for doc in retrieved_docs]
context = "\nExtracted documents:\n"
# Separate documents with a newline so one does not run into the next
context += "\n".join(f"Document {i}:::\n{doc}" for i, doc in enumerate(retrieved_docs_text))

final_prompt = RAG_PROMPT_TEMPLATE.format(
    question=user_query, context=context
)

# Generate an answer
answer = READER_LLM.invoke(final_prompt)


Starting retrieval for user_query='What is conda in the context of the Python programming language?'...


In [16]:
print("Answer:", answer)

Answer: content='Condra is a package and environment management tool in the context of the Python programming language that can also be used with other languages. It allows for the easy installation, management, and updates of packages and their dependencies in a consistent environment. Conda packages contain system-level libraries, modules, and executable programs that are compressed and downloaded from remote channels, with dependencies automatically tracked and updated from the default channel at http://repo.continuum.io/pkgs/. Overall, Cond' additional_kwargs={} response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=100, prompt_tokens=299, total_tokens=399), 'model': '', 'finish_reason': 'length'} id='run-2214b124-11fa-4a2a-a748-849dea333248-0'
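
Both responses in this notebook stop mid-sentence and report `'finish_reason': 'length'`, which suggests generation ended at a token limit rather than at a natural stopping point. A small check like the following can flag such cases. It is a sketch that uses a plain dictionary shaped like the metadata printed above, with the token usage simplified; `was_truncated` is a hypothetical helper.

```python
def was_truncated(response_metadata):
    """Return True when generation stopped because a token limit was reached."""
    return response_metadata.get("finish_reason") == "length"


# Metadata shaped like the output above (token usage simplified to a plain dict)
metadata = {
    "token_usage": {"completion_tokens": 100, "prompt_tokens": 299, "total_tokens": 399},
    "model": "",
    "finish_reason": "length",
}

if was_truncated(metadata):
    print("Answer was cut off; consider raising the token limit.")
```

With the objects from this notebook, the same check would be called as `was_truncated(answer.response_metadata)`.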
