RAG apps have two big failure modes:
1. Retriever errors - wrong/irrelevant docs retrieved.
2. Generator errors - model hallucinates or misuses context.

In production, it's often unclear where the failure happened. Was the retriever bad, or did the LLM ignore
the docs?

LangSmith helps here by automatically recording each step of a run:
- User query
- Retrieved documents
- LLM prompt (with inserted docs)
- LLM response
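With those four fields in hand, a first-pass triage can be sketched in plain Python. This is only an illustrative heuristic, not something LangSmith provides: the `TraceRecord` class, the `triage` function, the `expected_fact` argument, the substring matching, and the sample data are all assumptions made up for this sketch. It checks whether a fact that should have been retrieved shows up in the retrieved documents and in the answer.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """Hypothetical container mirroring the four recorded fields."""
    query: str
    retrieved_docs: list[str]
    prompt: str
    response: str


def triage(record: TraceRecord, expected_fact: str) -> str:
    """Roughly attribute a failure using case-insensitive substring checks."""
    fact = expected_fact.lower()
    in_docs = any(fact in doc.lower() for doc in record.retrieved_docs)
    in_answer = fact in record.response.lower()

    if in_answer:
        return "ok"
    if not in_docs:
        return "retriever"  # the fact never reached the prompt
    return "generator"      # the fact was in the context but the answer missed it


# Invented sample data, for illustration only
record = TraceRecord(
    query="How long is the warranty?",
    retrieved_docs=["The warranty period is 24 months from purchase."],
    prompt="(prompt text with the inserted docs)",
    response="I don't know.",
)
print(triage(record, expected_fact="24 months"))  # generator
```

Substring matching is crude; in practice the same comparison would more likely be done by reading the trace in the LangSmith UI or with a proper evaluator.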

In [2]:
# Libraries Installation
# pip install langchain_google_genai --quiet --exists-action i --no-input
# pip install langchain --quiet --exists-action i --no-input
# pip install langchain-community faiss-cpu pypdf
# pip install -qU langchain

# Authenticate User
from google.colab import auth
auth.authenticate_user()

# Import libraries
import os
from google.colab import userdata
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.tracers.langchain import LangChainTracer # <-- Import the specific Tracer class
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

# Configure Environment Variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Optional: set the tracing project through an environment variable. Colab keeps
# variables from the first run, so a later change may be ignored and an older
# project used; we therefore pass the project name explicitly at invoke time.
# os.environ["LANGCHAIN_PROJECT"] = "Sequential LLM App"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY")
os.environ["GEMINI_API_KEY"] = userdata.get("GEMINI_API_KEY")

# DEFINE MODEL
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-lite",
    project=userdata.get("GOOGLE_CLOUD_PROJECT"),
    location="global",
    temperature=0,
    vertexai=True
)

# Create a tracer for the specific project
tracer = LangChainTracer(project_name="RAG_Chatbot_v1")

# Create a custom configuration
custom_configurations = {
    "callbacks": [tracer],  # <--- This forces the LangSmith project name
}


# Locate the PDF
PDF_PATH = "/content/islr.pdf"  # <-- change to your PDF filename

# 1) Load PDF
loader = PyPDFLoader(PDF_PATH)
docs = loader.load()  # one Document per page

# 2) Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = splitter.split_documents(docs)

# 3) Embed + index
emb = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
vs = FAISS.from_documents(splits, emb)
retriever = vs.as_retriever(search_type="similarity", search_kwargs={"k": 4})

# 4) Prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer ONLY from the provided context. If not found, say you don't know."),
    ("human", "Question: {question}\n\nContext:\n{context}")
])

# 5) Chain
def format_docs(docs):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(d.page_content for d in docs)

parallel = RunnableParallel(
    {
        "context": retriever | RunnableLambda(format_docs),
        "question": RunnablePassthrough()
    }
)

chain = parallel | prompt | llm | StrOutputParser()

# 6) Ask questions
print("PDF RAG ready. Ask a question (or type `exit`, `quit`, `q` to exit).")
while True:
    q = input("Q: ")

    if q.strip().lower() in ["exit", "quit", "q"]:
        print(f"Exiting... as received {q.strip().lower()}")
        break

    else:
        print("Thinking...")
        ans = chain.invoke(q.strip(), config=custom_configurations)
        print("A:", ans)
        print("")

PDF RAG ready. Ask a question (or type `exit`, `quit`, `q` to exit).
Q: summarize the book in 50 words
Thinking...
A: This book offers an accessible introduction to statistical learning, covering key modeling and prediction techniques like linear regression, classification, and support vector machines. It uses real-world examples and R tutorials to help practitioners apply these methods in various fields.

Q: who wrote this book?
Thinking...
A: The book was written by Gareth James, Daniela Witten, and Robert Tibshirani.

Q: q
Exiting... as received q


This setup still has some limitations:
- Plain Python steps such as loading the PDF, chunking, and embedding are not traced, because they run outside the chain and are not Runnables.
- More importantly, every run of the code reloads the PDF and regenerates the embeddings before any query can be answered, which is wasteful.
    - To work around this, we kept the user in a loop, but that is not a practical approach.
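Both issues above can be approached in a few lines; the following is a rough sketch, not the notebook's actual fix. For tracing, the `langsmith` package provides a `traceable` decorator that records a plain function call as a run; the fallback below is a no-op stand-in so the sketch still runs where `langsmith` is not installed. For the rebuild problem, the sketch caches an expensive result on disk, keyed by a hash of the source file. `build_or_load`, `file_fingerprint`, the `.cache` directory, and the pickle format are hypothetical choices for illustration; with the FAISS store used above, `vs.save_local(...)` and `FAISS.load_local(...)` would take the place of the pickle calls.

```python
import hashlib
import pickle
from pathlib import Path

try:
    from langsmith import traceable  # records decorated calls as LangSmith runs
except ImportError:
    def traceable(*args, **kwargs):
        """No-op stand-in so the sketch runs without langsmith installed."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]        # used as @traceable
        return lambda fn: fn      # used as @traceable(name=...)


def file_fingerprint(path: str) -> str:
    """Hash the file contents so a changed source invalidates the cache."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:16]


@traceable(name="build_or_load")
def build_or_load(source_path: str, build_fn, cache_dir: str = ".cache"):
    """Return the cached result for this file, building it only on a miss."""
    cache_file = Path(cache_dir) / f"{file_fingerprint(source_path)}.pkl"
    if cache_file.exists():
        return pickle.loads(cache_file.read_bytes())

    result = build_fn(source_path)  # expensive step: load, chunk, embed
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(pickle.dumps(result))
    return result
```

A second call with the same, unchanged file returns the cached object without calling `build_fn` again, so the question loop is no longer needed just to avoid recomputing embeddings.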