<a href="https://colab.research.google.com/github/lakhanrajpatlolla/aiml-learning/blob/master/Cohort24RAGSession.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain openai faiss-cpu tiktoken python-dotenv pypdf langchain-community langchain_openai



In [None]:
!pip show langchain

Name: langchain
Version: 0.3.23
Summary: Building applications with LLMs through composability
Home-page: 
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires: langchain-core, langchain-text-splitters, langsmith, pydantic, PyYAML, requests, SQLAlchemy
Required-by: langchain-community


In [None]:
import os
from dotenv import load_dotenv
load_dotenv()
OPEN_AI_KEY = os.getenv("OPENAI_API_KEY")

In [None]:
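# Loaders and vector stores live in langchain_community in LangChain 0.3.x;
# the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS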

In [None]:
pdf_path = "/content/doc202517482201.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load()
print(f"🤷‍♂️loaded {len(pages)} Pages")
splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=100)
chunks = splitter.split_documents(pages)
print(f"🤷‍♂️split into {len(chunks)} chunks")

🤷‍♂️loaded 10 Pages
🤷‍♂️split into 47 chunks
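
The `chunk_size=500, chunk_overlap=100` settings mean that neighbouring chunks share some text, so a sentence cut at a chunk boundary still appears whole in one of them. The sketch below illustrates only that overlap idea with a fixed-width sliding window. `sliding_chunks` is a hypothetical helper, not part of LangChain, and `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its real chunks will differ.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-width character windows where neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

With these toy values each window starts 6 characters after the previous one, so the last 4 characters of one chunk are the first 4 of the next.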


In [None]:
embeddings = OpenAIEmbeddings(openai_api_key=OPEN_AI_KEY)
vectorstore = FAISS.from_documents(chunks, embeddings)
print("Vector store created")

Vector store created


In [None]:
!pip install openai



In [None]:
from openai import OpenAI
client = OpenAI()
query = "what is GDP growth rate of india for 2024-25 ?"
direct_response=client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": query}
    ]
)
print("Gdp without RAG")
print(direct_response.choices[0].message.content.strip())

Gdp without RAG
It is not possible to accurately predict the GDP growth rate for India in 2024-25 as it depends on various economic factors and events that may influence the economy in the future. Projections and estimates for future GDP growth rates are typically made closer to the actual time period based on current economic conditions and trends.


In [None]:
results = vectorstore.similarity_search(query,k=3)
context = "\n\n".join([doc.page_content for doc in results])
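
`similarity_search(query, k=3)` embeds the query and returns the three stored chunks whose vectors are closest to it. A rough sketch of that ranking step, using made-up 3-dimensional vectors and cosine similarity purely for illustration; `cosine`, `top_k`, and the toy index are assumptions, and the real store may rank by a different distance measure over much larger embedding vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k):
    """Return the names of the k documents most similar to the query vector."""
    scored = sorted(doc_vecs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Toy "embeddings": invented numbers, not real model output.
toy_index = {
    "gdp_growth": [0.9, 0.1, 0.0],
    "inflation": [0.6, 0.6, 0.1],
    "monsoon": [0.0, 0.2, 0.9],
}
hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(hits)
```

The text of the top-ranked chunks is what gets joined into `context` and passed to the model.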

In [None]:
#Build Augmentation
prompt = f"""
use the following document context to answer the question.
context:
{context}
Question:{query}
Answer:"""
rag_response = client.chat.completions.create(
   model = "gpt-3.5-turbo",
   messages=[
       {"role": "system", "content": "You answer based only on the given document context."},
       {"role": "user", "content": prompt}
   ]

)
print("GPT with RAG Answer")
print(rag_response.choices[0].message.content.strip())

GPT with RAG Answer
The GDP growth rate of India for 2024-25 is estimated at 6.4%.


In [None]:
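# Loaders and vector stores live in langchain_community in LangChain 0.3.x;
# the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings,ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA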


In [None]:
pdf_path = "/content/ens_d2.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load()
print(f"🤷‍♂️loaded {len(pages)} Pages")

🤷‍♂️loaded 2 Pages


In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=50)
chunks = splitter.split_documents(pages)
print(f"🤷‍♂️split into {len(chunks)} chunks")

🤷‍♂️split into 7 chunks


In [None]:
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)
print("Vector store created")

Vector store created


In [None]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0)

In [None]:
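def get_agent_retriever(keyword):
  # Each "agent" only sees the chunks that mention its keyword (case-insensitive).
  relevant_docs = [doc for doc in chunks if keyword.lower() in doc.page_content.lower()]
  if not relevant_docs:
    # An index cannot be built from an empty list, so fail with a clear message.
    raise ValueError(f"No chunks contain the keyword {keyword!r}")
  local_vectorstore = FAISS.from_documents(relevant_docs, embeddings)
  return local_vectorstore.as_retriever(search_type="similarity",search_kwargs={"k":2})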
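
Each agent's retriever is built from a keyword-filtered subset of the chunks, so it helps to see what a given keyword selects before creating an index. A minimal sketch of the same filter on plain strings; `filter_by_keyword` and the toy sentences are hypothetical stand-ins for the `Document` objects used in the notebook.

```python
def filter_by_keyword(texts: list[str], keyword: str) -> list[str]:
    """Case-insensitive substring filter over plain strings."""
    needle = keyword.lower()
    return [t for t in texts if needle in t.lower()]

# Invented example sentences standing in for PDF chunks.
toy_chunks = [
    "Bagging trains models on bootstrap samples.",
    "Boosting reweights misclassified points.",
    "High variance models benefit from BAGGING.",
]
bagging_docs = filter_by_keyword(toy_chunks, "bagging")
stacking_docs = filter_by_keyword(toy_chunks, "stacking")
print(len(bagging_docs), len(stacking_docs))
```

A keyword that matches nothing yields an empty list, and building a vector store from an empty list is likely to fail, so it is worth checking the filtered count first.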


In [None]:
agentA = RetrievalQA.from_chain_type(llm=llm,retriever=get_agent_retriever("bagging"))
agentB = RetrievalQA.from_chain_type(llm=llm,retriever=get_agent_retriever("boosting"))
agentC = RetrievalQA.from_chain_type(llm=llm,retriever=get_agent_retriever("variance"))

In [None]:
# Coordinator
def coordinator(query):
  print('Query',query)
  # invoke() returns a dict; the answer text is under "result".
  # (Chain.run is deprecated in recent LangChain releases.)
  responseA = agentA.invoke(query)["result"]
  responseB = agentB.invoke(query)["result"]
  responseC = agentC.invoke(query)["result"]
  final = llm.invoke([
      {'role': 'system', 'content': 'You are a helpful coordinator combining answers from multiple expert agents.'},

      {'role': 'user', 'content': f"""
        Given the following agent responses, merge them into a clear and unified answer to the question:
        Bagging Expert:\n{responseA}\n
        Boosting Expert:\n{responseB}\n
        Concept Expert:\n{responseC}\n
        Question: {query}
        Answer:
        """}
  ])
  return final.content





In [None]:
# Test CO-RAG
query="why do ensemble methods improve prediction accuracy?"
result = coordinator(query)
print("Final answer from all three agents:",result)

Query why do ensemble methods improve prediction accuracy?
Final answer from all three agents: Ensemble methods improve prediction accuracy by combining multiple models to reduce variance, decrease bias, and ultimately enhance predictions. Techniques like bagging, boosting, and stacking are utilized within ensemble methods to achieve these objectives. Bagging reduces variance by sampling and replacing data, boosting decreases bias by adjusting observation weights, and stacking combines multiple models to enhance overall prediction accuracy. By leveraging the strengths of different models and compensating for weaknesses, ensemble methods help reduce overfitting, increase generalization, and improve the overall accuracy of predictions.


In [None]:
# Using the LangGraph framework for an agentic AI implementation
!pip install langgraph

Collecting langgraph
  Downloading langgraph-0.3.31-py3-none-any.whl.metadata (7.9 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.0.10 (from langgraph)
  Downloading langgraph_checkpoint-2.0.24-py3-none-any.whl.metadata (4.6 kB)
Collecting langgraph-prebuilt<0.2,>=0.1.8 (from langgraph)
  Downloading langgraph_prebuilt-0.1.8-py3-none-any.whl.metadata (5.0 kB)
Collecting langgraph-sdk<0.2.0,>=0.1.42 (from langgraph)
  Downloading langgraph_sdk-0.1.61-py3-none-any.whl.metadata (1.8 kB)
Collecting xxhash<4.0.0,>=3.5.0 (from langgraph)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting ormsgpack<2.0.0,>=1.8.0 (from langgraph-checkpoint<3.0.0,>=2.0.10->langgraph)
  Downloading ormsgpack-1.9.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (43 kB)
Downloading langgraph-0.3.31-py3-none-any.whl

In [None]:
# Loaders and vector stores live in langchain_community in LangChain 0.3.x;
# the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings,ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langgraph.graph import END,StateGraph

In [None]:
pdf_path = "/content/ens_d2.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load()
print(f"🤷‍♂️loaded {len(pages)} Pages")

🤷‍♂️loaded 2 Pages


In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=50)
docs = splitter.split_documents(pages)
print(f"🤷‍♂️split into {len(docs)} chunks")

🤷‍♂️split into 7 chunks


In [None]:
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
print("Vector store created")

Vector store created


In [None]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0)

In [None]:
def create_retriever(keyword):
  # Keep only the chunks that mention the keyword (case-insensitive), then index them.
  filtered = [doc for doc in docs if keyword.lower() in doc.page_content.lower()]
  store = FAISS.from_documents(filtered, embeddings)
  return store.as_retriever(search_type="similarity",search_kwargs={"k":2})

In [None]:
#Agent chains
BaggingQA = RetrievalQA.from_chain_type(llm=llm,retriever=create_retriever("bagging"))
BoostingQA = RetrievalQA.from_chain_type(llm=llm,retriever=create_retriever("boosting"))
ConceptQA = RetrievalQA.from_chain_type(llm=llm,retriever=create_retriever("variance"))

In [None]:
# LangGraph nodes: each node returns only the keys it updates; LangGraph merges them into the state.
def bagging_node(state):
  query = state["query"]
  return {"bagging": BaggingQA.invoke(query)["result"]}
def boosting_node(state):
  query = state["query"]
  return {"boosting": BoostingQA.invoke(query)["result"]}
def concept_node(state):
  query = state["query"]
  return {"concept": ConceptQA.invoke(query)["result"]}

def coordinator_node(state):
  query = state["query"]
  bagging = state.get("bagging","")
  boosting = state.get("boosting","")
  concept = state.get("concept","")
  prompt = f"""
  You are a helpful coordinator AI. Merge the insights from the experts into a unified answer.
  Bagging Expert: {bagging}
  Boosting Expert: {boosting}
  Concept Expert: {concept}
  Question: {query}
  """
  response = llm.invoke(prompt).content
  return {"final_answer": response}


In [None]:
# Build the LangGraph graph
from typing import TypedDict
class CoRAGstate(TypedDict):
  query : str
  bagging:str
  boosting:str
  concept:str
  final_answer:str

graph = StateGraph(state_schema=CoRAGstate)
graph.add_node("BaggingExpert",bagging_node)
graph.add_node("BoostingExpert",boosting_node)
graph.add_node("ConceptExpert",concept_node)
graph.add_node("Coordinator",coordinator_node)

graph.set_entry_point("BaggingExpert")
graph.add_edge("BaggingExpert","BoostingExpert")
graph.add_edge("BoostingExpert","ConceptExpert")
graph.add_edge("ConceptExpert","Coordinator")
graph.add_edge("Coordinator",END)

coRAG_app = graph.compile()
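
The graph runs its nodes in a fixed order and folds whatever each node returns into the shared state, which is how the coordinator ends up seeing every expert's answer. A simplified model of that flow in plain Python, assuming later keys simply overwrite earlier ones; `run_sequence` and the stub nodes are hypothetical and stand in for the compiled LangGraph app and the RetrievalQA-backed nodes.

```python
from typing import Callable

State = dict[str, str]

def run_sequence(nodes: list[Callable[[State], State]], initial: State) -> State:
    """Call each node in order and merge its returned keys into the running state."""
    state = dict(initial)
    for node in nodes:
        state = {**state, **node(state)}  # keys returned by the node win
    return state

# Stub nodes with canned answers instead of LLM calls.
def bagging_stub(state: State) -> State:
    return {"bagging": "reduces variance"}

def boosting_stub(state: State) -> State:
    return {"boosting": "reduces bias"}

def coordinator_stub(state: State) -> State:
    return {"final_answer": f"{state['bagging']}; {state['boosting']}"}

final_state = run_sequence(
    [bagging_stub, boosting_stub, coordinator_stub],
    {"query": "why ensembles?"},
)
print(final_state["final_answer"])
```

Because the experts run one after another here, total latency is the sum of all calls; the order only matters for the coordinator, which must come last.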

In [None]:
query = "why do ensemble methods improve prediction accuracy?"
result = coRAG_app.invoke({'query':query})
print("Final Answer",result["final_answer"])

Final Answer Ensemble methods improve prediction accuracy by combining multiple models to reduce variance, decrease bias, and ultimately improve predictions. Bagging reduces variance by sampling and replacing data, boosting decreases bias by giving more weight to misclassified data points, and stacking combines multiple models to learn from their strengths and weaknesses. By leveraging the strengths of different models and reducing their weaknesses, ensemble methods can capture more complex patterns in the data and make more accurate predictions than any single model could achieve on its own.


In [None]:
print(coRAG_app)

<langgraph.graph.state.CompiledStateGraph object at 0x7e8c02d77850>


In [None]:
!pip install graphviz



In [None]:
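# get_graph() lives on the compiled app; calling graph.get_graph() on the
# StateGraph builder raises AttributeError: 'StateGraph' object has no attribute 'get_graph'.
# draw_mermaid_png() renders through the Mermaid web service by default, so it needs network access.
png_bytes = coRAG_app.get_graph().draw_mermaid_png()
with open("coRAG_graph.png", "wb") as f:
  f.write(png_bytes)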