# RAG Fundamentals & Workflow

- What is RAG? & Workflow
- Why RAG matters? Overcoming LLM limitations
- RAG Architecture
- Hands-On RAG demo using a pre-built tool - LangChain

## What is RAG? & Workflow

<img src='rag basic.png' />

In [None]:
RAG (Retrieval-Augmented Generation) retrieves relevant external data from a knowledge base that the LLM was not trained on and passes it to the model as context for its answer.

In [None]:
workflow

- query = 'What are the latest advancements in renewable energy?'
- retrieval = a retriever searches a knowledge base (documents, articles, databases) to find relevant documents
- augmentation (context) = the retrieved documents are combined with the user's query to form a richer prompt
- generation = the LLM uses the query and the retrieved context to generate a response
- output = the generated response is returned to the user
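
The workflow steps can be sketched in plain Python. This is a toy illustration, not a real RAG system: the knowledge base is a hypothetical three-sentence list, the retriever ranks by simple word overlap rather than embeddings, and `generate` is a placeholder standing in for an actual LLM call.

```python
import re

# Toy knowledge base (hypothetical documents, for illustration only).
KNOWLEDGE_BASE = [
    "Perovskite solar cells are a recent advancement in renewable energy.",
    "Offshore wind turbines are growing in size and capacity.",
    "Sourdough bread needs a long fermentation time.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, documents, k=2):
    """Retrieval: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]

def augment(query, retrieved_docs):
    """Augmentation: combine the retrieved documents and the query into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Generation: placeholder for an LLM call; it only reports the prompt size."""
    return f"[LLM response based on a prompt of {len(prompt)} characters]"

query = "What are the latest advancements in renewable energy?"
retrieved = retrieve(query, KNOWLEDGE_BASE)   # retrieval
prompt = augment(query, retrieved)            # augmentation
answer = generate(prompt)                     # generation
print(prompt)
print(answer)                                 # output
```

In the hands-on section, the same three roles are played by a FAISS retriever, a prompt-stuffing chain, and a Hugging Face model.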

## Why RAG matters? Overcoming LLM limitations

In [None]:
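limitations of LLMs

- limited knowledge: the model only knows what was in its training data, up to the training cutoff
- hallucination: the model can produce fluent but incorrect answers
- context window constraints: only a limited amount of text fits into a single prompt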


In [None]:
how RAG helps

- grounding answers in external, factual data reduces hallucination
- up-to-date information, without retraining the model
- domain-specific knowledge, by pointing the retriever at domain documents

## RAG Architecture

<img src='rag architecture.png' />

### Retriever Type

- Sparse Retriever (BM25, "Best Matching 25")
keyword-based; fast, but captures less of the semantics (context and intent) of a query


- Dense Retriever (DPR, Dense Passage Retrieval)
uses embeddings for semantic similarity; usually more accurate but computationally heavier
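
To make the "keyword-based" point concrete, here is a minimal BM25 scorer in plain Python. It is a sketch under stated assumptions: the three-sentence corpus is made up, `k1=1.5` and `b=0.75` are commonly used default values, and the IDF formula is one common variant (several exist). The second query shows the weakness noted above: with no shared keywords, every document scores zero even though the topic matches.

```python
import math
from collections import Counter

# Made-up corpus, tokenized by whitespace for simplicity.
corpus = [
    "solar panels convert sunlight into electricity",
    "wind turbines generate electricity from wind",
    "the car would not start this morning",
]
tokenized = [doc.split() for doc in corpus]
avg_len = sum(len(d) for d in tokenized) / len(tokenized)

def bm25_score(query, doc_tokens, k1=1.5, b=0.75):
    """Score one document against the query with BM25."""
    term_freq = Counter(doc_tokens)
    score = 0.0
    for term in query.split():
        # Number of documents in the corpus that contain the term.
        n = sum(1 for d in tokenized if term in d)
        # Rare terms get a higher inverse document frequency.
        idf = math.log((len(tokenized) - n + 0.5) / (n + 0.5) + 1)
        f = term_freq[term]
        # Term frequency saturates (k1) and is normalized by document length (b).
        denom = f + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * f * (k1 + 1) / denom
    return score

keyword_scores = [bm25_score("solar electricity", d) for d in tokenized]
synonym_scores = [bm25_score("photovoltaic power", d) for d in tokenized]
print(keyword_scores)  # the first document ranks highest
print(synonym_scores)  # all zeros: no keyword overlap
```

A dense retriever would instead compare embedding vectors, so a query like "photovoltaic power" could still land near the solar document.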


### Knowledge base formats
- text files, databases, APIs (e.g. Wikipedia)

### Embedding storage
- vector databases and similarity-search libraries such as FAISS, Pinecone, Weaviate
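
What a vector store does can be illustrated with a brute-force version in plain Python. This is only a sketch: `TinyVectorStore` is a hypothetical class, and the three-dimensional vectors are hand-made stand-ins for real embeddings. Production stores such as FAISS typically use optimized (often approximate) nearest-neighbor indexes rather than comparing against every vector.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory store: keeps vectors and returns the nearest ones."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        """Store a vector together with the text (or ID) it represents."""
        self.vectors.append(vector)
        self.payloads.append(payload)

    @staticmethod
    def _cosine(a, b):
        """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def search(self, query_vector, k=1):
        """Compare the query against every stored vector and return the top k."""
        scored = [
            (self._cosine(query_vector, v), p)
            for v, p in zip(self.vectors, self.payloads)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "solar energy notes")   # made-up embedding
store.add([0.1, 0.9, 0.0], "cooking recipes")      # made-up embedding
store.add([0.8, 0.2, 0.1], "wind power report")    # made-up embedding

results = store.search([1.0, 0.0, 0.0], k=2)
for score, payload in results:
    print(f"{score:.3f}  {payload}")
```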

### Fine-tuning
optional; we will only do this for domain-specific tasks

### Types of RAG

- Vector RAG
  - store: a vector DB such as FAISS or Pinecone
  - data: unstructured data such as text, video, images

- Graph RAG
  - store: a knowledge graph such as Neo4j
  - data: structured data such as relational databases, CSV files, etc.

- Hybrid RAG
  - combines vector store lookups with graph queries, SQL, etc.
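
One simple way to picture the hybrid case is a function that asks several retrievers and merges their results. The sketch below is purely illustrative: the three retriever functions are hypothetical stubs returning made-up strings, and round-robin interleaving with de-duplication is just one possible merge strategy, chosen here for brevity.

```python
def vector_retriever(query):
    """Hypothetical stand-in for a vector-store similarity search."""
    return ["doc: solar farm proposal", "doc: wind turbine study"]

def graph_retriever(query):
    """Hypothetical stand-in for a knowledge-graph query."""
    return ["graph: SolarFarm -> uses -> PhotovoltaicPanel", "doc: solar farm proposal"]

def sql_retriever(query):
    """Hypothetical stand-in for a SQL lookup."""
    return ["sql: row for project 'solar farm'"]

def hybrid_retrieve(query, retrievers, k=4):
    """Interleave results from each retriever round-robin, dropping duplicates."""
    result_lists = [retrieve(query) for retrieve in retrievers]
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged[:k]

context = hybrid_retrieve(
    "solar project ideas",
    [vector_retriever, graph_retriever, sql_retriever],
)
for item in context:
    print(item)
```

The merged list would then be passed to the augmentation step in the same way as results from a single retriever.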


## Hands-On - RAG demo using a pre-built tool - LangChain

In [None]:
build a RAG system that answers questions about a document of project ideas (the code below loads it from the text file projects.txt)

In [None]:
!pip install langchain
!pip install transformers
!pip install faiss-cpu
!pip install sentence-transformers
!pip install -U langchain-community
!pip install huggingface_hub


In [None]:
1. load document
2. create embeddings
3. setup retriever
4. integrate LLM
5. build RAG chain
6. run query

In [8]:
from langchain_community.document_loaders import TextLoader

# Load the project-ideas file; TextLoader returns a list containing a single Document
loader = TextLoader("projects.txt")

documents = loader.load()

In [13]:
from huggingface_hub import login
# Optional: only needed for gated or private models; pass your own access token
# login(token="")

In [14]:
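from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Sentence-transformers model that maps each document to a 384-dimensional vector
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

# Embed the documents and index the vectors in an in-memory FAISS index
vector_store = FAISS.from_documents(documents, embeddings)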

In [15]:
vector_store

<langchain_community.vectorstores.faiss.FAISS at 0x168de08f0>

In [16]:
retriever = vector_store.as_retriever()

In [None]:
# Alternative: a causal model such as gpt2
# llm = HuggingFacePipeline.from_model_id(model_id='gpt2', task='text-generation')

from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Larger options such as "tiiuae/falcon-7b-instruct" or "mistralai/Mistral-7B-Instruct-v0.1"
# are causal models: they need AutoModelForCausalLM and task="text-generation" instead
model_id = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    do_sample=False,  # greedy decoding for deterministic output (temperature=0 is not a valid sampling temperature)
)

# Wrap the transformers pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=pipe)


In [27]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type="stuff")


In [None]:
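query = "What are some ideas from the energy sector?"

# Inspect what the retriever returns for the query before running the full chain
docs = retriever.invoke(query)
print(f"Number of documents retrieved: {len(docs)}")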

In [25]:
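# Number of vectors in the FAISS index; should be > 0
# (1 here, because the whole file was indexed as a single, unsplit document)
print(retriever.vectorstore.index.ntotal)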

1


In [None]:
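query = "What are some ideas from the energy sector?"

# RetrievalQA takes the question under "query" and returns the answer under "result"
result = qa_chain.invoke({"query": query})["result"]

print(result)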