# RAG Fundamentals & Workflow

- What is RAG? & Workflow
- Why RAG matters? Overcoming LLM limitations
- RAG Architecture
- Hands-On RAG demo using a pre-built tool - LangChain

## What is RAG? & Workflow

<img src='rag basic.png' />

In [None]:
RAG (Retrieval-Augmented Generation) retrieves relevant external data from a knowledge source that the LLM was not trained on and passes it to the model as context for its answer.

In [None]:
workflow

- query = 'what are the latest advancements in renewable energy?'
- retrieval = a retriever searches a knowledge base (documents, articles, databases) to find relevant documents
- augmentation (context) = the retrieved documents are combined with the user's query to form a richer prompt
- generation = the LLM uses the query and the retrieved context to generate a response
- output = the generated response is returned to the user
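
The workflow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the knowledge base is three toy sentences, `retrieve` is a naive word-overlap ranker standing in for a real retriever, and `generate` is a placeholder where an LLM call would go. All function names are hypothetical.

```python
# Toy knowledge base; in practice this would be documents, articles, or database rows.
KNOWLEDGE_BASE = [
    "Solar panels convert sunlight into electricity.",
    "Wind turbines convert moving air into electricity.",
    "Sourdough bread needs a long fermentation time.",
]


def retrieve(query, documents, k=2):
    """Retrieval: rank documents by shared words (a stand-in for a real retriever)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the top-k documents that share at least one word with the query
    return [doc for score, doc in scored[:k] if score > 0]


def augment(query, context_docs):
    """Augmentation: combine the retrieved documents with the user's query."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    """Generation: placeholder for an LLM call; it only reports the prompt size."""
    return f"[LLM response based on a prompt of {len(prompt)} characters]"


query = "what are the latest advancements in solar and wind energy?"
docs = retrieve(query, KNOWLEDGE_BASE)
prompt = augment(query, docs)
print(prompt)
print(generate(prompt))
```

The unrelated sentence about bread never reaches the prompt, which is the point of the retrieval step: the model only sees context that matches the query.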

## Why RAG matters? Overcoming LLM limitations

In [None]:
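limitations of LLMs

- limited knowledge: nothing after the training cutoff, and no private or internal data
- hallucination: fluent but incorrect answers when the model lacks the facts
- context window constraints: only a limited number of tokens fit into one prompt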


In [None]:
RAG

- integrates external, factual data; grounding answers in retrieved context reduces hallucination
- up-to-date information without retraining the model
- domain-specific knowledge taken from your own documents

## RAG Architecture

<img src='rag architecture.png' />

### Retriever Type

- Sparse Retriever (BM25, "Best Matching 25"):
  keyword-based and fast, but captures less of the semantics (context and intent)

- Dense Retriever (DPR, Dense Passage Retrieval):
  uses embeddings for semantic similarity; often more accurate on paraphrased queries, but computationally heavier
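
To make the keyword-based nature of a sparse retriever concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization, the commonly used defaults `k1=1.5` and `b=0.75`, and the IDF variant with `log(1 + ...)`; production libraries differ in such details, so treat it as an illustration rather than a reference implementation.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # BM25 only rewards exact term matches
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


corpus = [
    "solar panels convert sunlight",
    "wind turbines convert wind",
    "solar energy storage with batteries",
]
keyword_scores = bm25_scores("solar energy", corpus)
synonym_scores = bm25_scores("photovoltaic", corpus)
print(keyword_scores)
print(synonym_scores)
```

The second query scores zero everywhere because "photovoltaic" never appears literally in the corpus, even though it is closely related to "solar". That gap is what a dense, embedding-based retriever is meant to close.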


### Knowledge base formats
- text files, databases, APIs (e.g. Wikipedia)

### Embedding storage
- vector indexes and databases such as FAISS (a similarity-search library), Pinecone, Weaviate
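
Conceptually, embedding storage boils down to keeping vectors next to their texts and returning the nearest ones for a query vector. The sketch below is a brute-force, in-memory version of that idea. The class name `TinyVectorIndex` is hypothetical, the three-dimensional vectors are made up by hand rather than produced by an embedding model, and cosine similarity is an assumed choice of metric.

```python
import math


class TinyVectorIndex:
    """Brute-force nearest-neighbour search over (text, vector) pairs."""

    def __init__(self):
        self._texts = []
        self._vectors = []

    def add(self, text, vector):
        self._texts.append(text)
        self._vectors.append(vector)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def search(self, query_vector, k=1):
        """Return the k most similar entries as (similarity, text) pairs."""
        scored = [
            (self._cosine(query_vector, vector), text)
            for vector, text in zip(self._vectors, self._texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = TinyVectorIndex()
# Hand-made vectors for illustration only; real embeddings have hundreds of dimensions.
index.add("solar power", [0.9, 0.1, 0.0])
index.add("wind power", [0.1, 0.9, 0.0])
index.add("bread recipes", [0.0, 0.1, 0.9])

top = index.search([0.8, 0.2, 0.0], k=1)
print(top)
```

Comparing the query against every stored vector is exact but scales linearly with the collection size, which is the main reason dedicated vector stores exist.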

### fine-tuning
Fine-tuning is optional; we only do it for domain-specific tasks.

### types of RAG

- Vector RAG
  - store: a vector index or database such as FAISS or Pinecone
  - data: unstructured data such as text, video, images

- Graph RAG
  - store: a knowledge graph such as Neo4j
  - data: structured data such as relational databases, CSV files, etc.

- Hybrid RAG
  - combines a vector store with graph queries, SQL, etc.
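
One way a hybrid setup might merge results from several backends is reciprocal rank fusion (RRF). RRF is not mentioned in these notes; it is brought in here only as one common, simple fusion method. The document ids and the three hit lists are hypothetical, and `k=60` is a conventional default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more when it appears near the top of a list
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from three different retrieval backends
vector_hits = ["doc_a", "doc_b", "doc_c"]
graph_hits = ["doc_b", "doc_d"]
sql_hits = ["doc_b", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits, sql_hits])
print(fused)
```

`doc_b` ends up first because all three backends return it, while documents found by only one backend fall to the bottom. The fused list would then feed the same augmentation step as in plain vector RAG.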


## Hands-On - RAG demo using a pre-built tool - LangChain

In [None]:
Build a RAG system that answers questions about a document of project ideas (loaded here from the text file projects.txt).

In [1]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.3.26-py3-none-any.whl.metadata (7.8 kB)
Collecting langchain-core<1.0.0,>=0.3.66 (from langchain)
  Downloading langchain_core-0.3.68-py3-none-any.whl.metadata (5.8 kB)
Collecting langchain-text-splitters<1.0.0,>=0.3.8 (from langchain)
  Using cached langchain_text_splitters-0.3.8-py3-none-any.whl.metadata (1.9 kB)
Collecting langsmith>=0.1.17 (from langchain)
  Downloading langsmith-0.4.4-py3-none-any.whl.metadata (15 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain)
  Downloading sqlalchemy-2.0.41-cp312-cp312-macosx_11_0_arm64.whl.metadata (9.6 kB)
Collecting tenacity!=8.4.0,<10.0.0,>=8.1.0 (from langchain-core<1.0.0,>=0.3.66->langchain)
  Using cached tenacity-9.1.2-py3-none-any.whl.metadata (1.2 kB)
Collecting orjson<4.0.0,>=3.9.14 (from langsmith>=0.1.17->langchain)
  Downloading orjson-3.10.18-cp312-cp312-macosx_15_0_arm64.whl.metadata (41 kB)
Collecting requests-toolbelt<2.0.0,>=1.0.0 (from langsmith>=0.1.17->langchain)
  Using cac

In [2]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.53.1-py3-none-any.whl.metadata (40 kB)
Collecting regex!=2019.12.17 (from transformers)
  Using cached regex-2024.11.6-cp312-cp312-macosx_11_0_arm64.whl.metadata (40 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Using cached tokenizers-0.21.2-cp39-abi3-macosx_11_0_arm64.whl.metadata (6.8 kB)
Collecting safetensors>=0.4.3 (from transformers)
  Using cached safetensors-0.5.3-cp38-abi3-macosx_11_0_arm64.whl.metadata (3.8 kB)
Downloading transformers-4.53.1-py3-none-any.whl (10.8 MB)
Using cached regex-2024.11.6-cp312-cp312-macosx_11_0_arm64.whl (284 kB)
Using cached safetensors-0.5.3-cp38-abi3-macosx_11_0_arm64.whl (418 kB)
Using cached tokenizers-0.21.2-cp39-abi3-macosx_11_0_arm64.whl (2.7 MB)
Installing collected packages: safetensors, regex, tokenizers, transformers
Successfully install

In [3]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp312-cp312-macosx_14_0_arm64.whl.metadata (4.8 kB)
Downloading faiss_cpu-1.11.0-cp312-cp312-macosx_14_0_arm64.whl (3.3 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.11.0


In [4]:
!pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-5.0.0-py3-none-any.whl.metadata (16 kB)
Collecting torch>=1.11.0 (from sentence-transformers)
  Downloading torch-2.7.1-cp312-none-macosx_11_0_arm64.whl.metadata (29 kB)
Collecting sympy>=1.13.3 (from torch>=1.11.0->sentence-transformers)
  Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch>=1.11.0->sentence-transformers)
  Downloading networkx-3.5-py3-none-any.whl.metadata (6.3 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch>=1.11.0->sentence-transformers)
  Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Downloading sentence_transformers-5.0.0-py3-none-any.whl (470 kB)
Downloading torch-2.7.1-cp312-none-macosx_11_0_arm64.whl (68.6 MB)
Using cached sympy-1.14.0-py3-none-any.whl (6.3 MB)
Downloading networkx-3.5-py3-no

In [6]:
pip install -U langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting aiohttp<4.0.0,>=3.8.3 (from langchain-community)
  Downloading aiohttp-3.12.13-cp312-cp312-macosx_11_0_arm64.whl.metadata (7.6 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Using cached dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata (3.4 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting aiohappyeyeballs>=2.5.0 (from aiohttp<4.0.0,>=3.8.3->langchain-community)
  Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiosignal>=1.1.2 (from aiohttp<4.0.0,>=3.8.3->langchain-community)
  Downloading aiosignal-1.4.0-py3-none-any.whl.metadata (3.7 kB)
Collecting frozenlist>=1.1.1 (from aioht

In [12]:
!pip install huggingface_hub



In [None]:
1. load document
2. create embeddings
3. setup retriever
4. integrate LLM
5. build RAG chain
6. run query

In [8]:
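# TextLoader lives in langchain_community; the old langchain.document_loaders path is deprecated
from langchain_community.document_loaders import TextLoader

# Load the whole text file as a single Document
loader = TextLoader("projects.txt")

documents = loader.load()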

In [13]:
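import os
from huggingface_hub import login

# Never hard-code an access token in a notebook; read it from the environment instead.
# HF_TOKEN is an assumed variable name and must be set before Jupyter is started.
login(token=os.environ["HF_TOKEN"])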

In [14]:
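from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Sentence-transformers model used to embed the documents
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

# Embed the documents and store the vectors in an in-memory FAISS index
vector_store = FAISS.from_documents(documents, embeddings)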

In [15]:
vector_store

<langchain_community.vectorstores.faiss.FAISS at 0x168de08f0>

In [16]:
retriever = vector_store.as_retriever()

In [26]:
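# Earlier attempt with GPT-2, kept for reference:
# llm = HuggingFacePipeline.from_model_id(model_id='gpt2', task='text-generation')

from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# FLAN-T5 is an encoder-decoder (seq2seq) model. Decoder-only models such as
# "tiiuae/falcon-7b-instruct" or "mistralai/Mistral-7B-Instruct-v0.1" would instead
# need AutoModelForCausalLM and task="text-generation".
model_id = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# temperature has no effect here: sampling is off by default (greedy decoding),
# which is why transformers reports it as an invalid generation flag
pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    temperature=0
)

# Wrap the transformers pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=pipe)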


Device set to use mps:0
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [27]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type="stuff")
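
With `chain_type="stuff"`, every retrieved document is placed ("stuffed") into a single prompt together with the question. The sketch below imitates that idea in plain Python to show why long documents become a problem. The prompt wording, the helper names, and the whitespace-based token estimate are all assumptions for illustration; LangChain's actual template and the model's real tokenizer differ. The 512 limit is the maximum sequence length the tokenizer reports for this model in the warning at the end of this notebook.

```python
MAX_INPUT_TOKENS = 512  # limit reported by the tokenizer warning in this notebook


def build_stuff_prompt(question, documents):
    """Concatenate all retrieved documents into one prompt, as a "stuff" chain does."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def rough_token_count(text):
    """Very rough estimate: one token per whitespace-separated word."""
    return len(text.split())


# Two hypothetical retrieved documents of 400 words each
retrieved = ["word " * 400, "word " * 400]
stuffed_prompt = build_stuff_prompt("What are some ideas from energy sector?", retrieved)

estimated_tokens = rough_token_count(stuffed_prompt)
fits = estimated_tokens <= MAX_INPUT_TOKENS
print(f"estimated tokens: {estimated_tokens}, fits into the model input: {fits}")
```

Stuffing is the simplest strategy and works well as long as the retrieved text is short; once the combined prompt exceeds the model's input limit, smaller documents or fewer retrieved documents are needed.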


In [21]:
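# Same question that is sent to the chain further down
query = "What are some ideas from energy sector?"

# Retrievers are runnables; invoke() replaces the deprecated get_relevant_documents()
docs = retriever.invoke(query)
print(f"Number of documents retrieved: {len(docs)}")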


Number of documents retrieved: 1


In [25]:
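# Number of vectors in the FAISS index (should be > 0); it is 1 because the whole file was embedded as one document
print(retriever.vectorstore.index.ntotal)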

1


In [None]:
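query = "What are some ideas from energy sector?"

# invoke() replaces the deprecated run(); RetrievalQA returns its answer under the "result" key
result = qa_chain.invoke({"query": query})["result"]

print(result)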

Token indices sequence length is longer than the specified maximum sequence length for this model (1727 > 512). Running this sequence through the model will result in indexing errors
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
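
The two observations above fit together: the index holds a single vector because the whole file was loaded as one document, and stuffing that one document into the prompt produces 1727 tokens for a model that accepts 512. Splitting the text into smaller chunks before building the index is the usual remedy. The plain-Python splitter below is a simplified stand-in written for this note, with an assumed chunk size and overlap measured in characters. LangChain ships its own splitters (for example `RecursiveCharacterTextSplitter`, which takes `chunk_size` and `chunk_overlap`), and those would normally be used instead.

```python
def split_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into character chunks that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Stand-in for the contents of projects.txt: 1200 characters of dummy text
sample_text = "".join(str(i % 10) for i in range(1200))
chunks = split_text(sample_text, chunk_size=500, chunk_overlap=50)

print(f"{len(chunks)} chunks with lengths {[len(chunk) for chunk in chunks]}")
```

Each chunk would then become its own document in the vector store, so the retriever can return only the few passages that match the question. The overlap keeps sentences that straddle a chunk boundary retrievable from either side.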
