<a href="https://colab.research.google.com/github/rodiwaa/learnings-pocs/blob/main/notebooks/yt_rag_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YT Rag System
Learning RAG components from the excellent CampusX Playlist on YT.

Scope
- RAG
  - Doc Loaders
  - Text Splitters
  - Vector Stores
  - Retrievers
- Advanced RAG (Future)
  - UI (streamlit, chainlit, gradio, react)
  - Evaluations
    - Ragas (4xmetrics)
    - Langsmith (traces, tags)
  - Indexing
  - Retrieval
    - Pre R
      - LLM Q Rewrite
      - Multi query
      - Domain Aware Routing
    - During R (search strategy)
      - MMR
      - Hybrid (Semantic, BM25, Keyword)
      - Reranking (algo, LLMs)
    - Post R
      - contextual compression
  - Augmentation
    - Prompt templating
    - Grounding (use context only, else say IDK)
    - Context window optimisation
  - Generation
    - Citations
    - Guardrails
  - System Design
    - Multimodal
    - Agentic (web search, routers)
    - Memory based (from last time convos)



In [1]:
!pip install langchain langgraph langsmith langchain.community wikipedia langchain_openai chromadb python-dotenv



## Ingestion Module

In [10]:
from google.colab import drive
from dotenv import load_dotenv
import os

MOUNT_PATH="/content/drive"
drive.mount(MOUNT_PATH, force_remount=True)

ENV_PATH=f"{MOUNT_PATH}/MyDrive/Projects/.env/.env"
print(ENV_PATH)

load_dotenv(dotenv_path=ENV_PATH)

# FIXME: get creds, API KEYS from external .env
# fetch .env from gdrive
# dotenv the .env


Mounted at /content/drive
/content/drive/MyDrive/Projects/.env/.env


True

In [None]:
from langchain_core.documents import Document

# custom docs
docs = [
    Document(page_content="Rohit is software engineer"),
    Document(page_content="Rohit is AI engineer"),
    Document(page_content="Rohit is data engineer")
]
print(f"3 docs added")

3 docs added


In [None]:
# from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()

vectorstore = Chroma.from_documents(
    documents = docs,
    embedding = embedding_model,
    collection_name = "temp-3-documents"
)
print("vs created")

vs created


In [None]:
# retriave documents from VS

retriever = vectorstore.as_retriever(
    search_type="mmr",
    kwargs=2
)
print("retriever created 2")

vectorstore.add_documents([Document(page_content="Adam is astronaut", id=1234)])
print("new doc added")

retriever created 2
new doc added


In [None]:
query = "who all are astronauts?"
result = retriever.invoke(query)
result

[Document(metadata={}, page_content='Adam is astronaut'),
 Document(metadata={}, page_content='Rohit is AI engineer'),
 Document(metadata={}, page_content='Rohit is software engineer'),
 Document(metadata={}, page_content='Rohit is data engineer')]

In [None]:
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever(top_k_results=2, lang="en")
query = "2024 IPL"
docs = retriever.invoke(query)
len(docs)
docs

# for i, doc in enumerate(docs):
#   print(doc.page_content)

[Document(metadata={'title': '2024 Indian Premier League', 'summary': 'The 2024 Indian Premier League (also known as IPL 17 and branded as TATA IPL 2024) was the 17th edition of the Indian Premier League. The tournament featured ten teams competing in 74 matches from 22 March to 26 May 2024. It was held across 13 cities in India, with Chennai hosting the opening ceremony and the final as the defending champions.\nIn the final, Kolkata Knight Riders defeated Sunrisers Hyderabad by 8 wickets to win their third IPL title.', 'source': 'https://en.wikipedia.org/wiki/2024_Indian_Premier_League'}, page_content='The 2024 Indian Premier League (also known as IPL 17 and branded as TATA IPL 2024) was the 17th edition of the Indian Premier League. The tournament featured ten teams competing in 74 matches from 22 March to 26 May 2024. It was held across 13 cities in India, with Chennai hosting the opening ceremony and the final as the defending champions.\nIn the final, Kolkata Knight Riders defeat