## LangChain RAG Expert with LangGraph State Management (using HuggingFace)
This Jupyter Notebook demonstrates how to build a sophisticated AI agent using LangChain and LangGraph, now integrated with HuggingFace Platform for the Large Language Model. The agent will act as a "Helpful Historical Expert" with the following key enhancements:

- **Retrieval-Augmented Generation (RAG):** The agent will query a local knowledge base (a text file) to retrieve relevant information before generating a response, ensuring factual accuracy and reducing hallucinations.
- **LangGraph for State Management:** We will use LangGraph to define a stateful workflow, allowing the agent to manage its internal state (e.g., current question, retrieved context) and execute steps like retrieval and response generation conditionally.
- **Enhanced Guardrails:** The detailed system prompt from prompt.txt will continue to guide the AI's persona, tone, and adherence to safety and scope constraints.
- **Ollama Integration:** The core language model for generation will be a local Ollama model (e.g., llama2).

We will test the system with both an on-topic historical question (which should leverage RAG) and an off-topic question (which should trigger the guardrails).

---

### 1. Setup and Installation
First, we need to install all the necessary libraries. This includes LangChain components, langchain-google-genai for Gemini integration, langchain-openai for embeddings (as Gemini's native embeddings might require re-indexing the vector store, keeping OpenAI for consistency here), LangGraph for state management, FAISS for vector storage, and Tiktoken for tokenization.

In [1]:
# Install necessary libraries
%pip install -U langchain langchain-community langchain-huggingface langchain-openai langgraph faiss-cpu tiktoken huggingface_hub transformers
%pip install protobuf==3.20.3 --force-reinstall

Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting protobuf==3.20.3
  Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (679 bytes)
Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
[2K   [38;5;70m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m17.0 MB/s[0m eta [36m0:00:00[0m[36m0:00:01[0m
[?25hInstalling collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This 

In [2]:
import os
os.environ["HF_TOKEN"] = "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"
os.environ["OPENAI_API_KEY"]  = "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"

In [1]:
import os
import sys
from openai import OpenAI
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline # Changed to ChatHuggingFace for HuggingFace
from langchain_openai import OpenAIEmbeddings # Still using OpenAI for embeddings
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, List
import operator
# Removed specific transformers imports as we will use HuggingFaceHub directly
from langchain_community.llms import HuggingFaceHub # Import HuggingFaceHub for direct LLM access
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# --- Set your HuggingFace Hub API Token ---
# It's highly recommended to set this as an environment variable for security.
# You can do this in your terminal before starting Jupyter:
# export HF_TOKEN='your_huggingface_token_here' (Linux/macOS)
# $env:HF_TOKEN='your_huggingface_token_here' (PowerShell)
#
# You will also need your OpenAI API key for embeddings:
# export OPENAI_API_KEY='your_openai_api_key_here'
#
# If you must set them directly in the notebook (NOT recommended for production):
# os.environ["HF_TOKEN"] = "YOUR_ACTUAL_HUGGINGFACE_TOKEN"
# os.environ["OPENAI_API_KEY"] = "YOUR_ACTUAL_OPENAI_API_KEY"
os.environ["HF_TOKEN"] = "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"
os.environ["OPENAI_API_KEY"]  = "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"


# Verify API keys/tokens are set
if "HF_TOKEN" not in os.environ:
    print("WARNING: HF_TOKEN environment variable not set.")
    print("Please set it before proceeding, or uncomment the line above to set it directly (not recommended).")
else:
    print("HF_TOKEN is set.")

if "OPENAI_API_KEY" not in os.environ:
    print("WARNING: OPENAI_API_KEY environment variable not set (needed for embeddings).")
    print("Please set it before proceeding, or uncomment the line above to set it directly (not recommended).")
else:
    print("OPENAI_API_KEY is set.")


# Print the Python executable path to help debug environment issues
print(f"Python executable: {sys.executable}")

# Initialize the HuggingFace model for generation
# --- IMPORTANT: MODEL ACCESS AND TOKEN PERMISSIONS ---
# If you encounter a "403 Forbidden" or "GatedRepoError", it means your HuggingFace token
# does not have the necessary permissions or you haven't accepted the model's terms of use.
#
# **ACTION REQUIRED TO RESOLVE 403 FORBIDDEN / GATEDREPOERROR:**
# 1.  **Visit your HuggingFace token settings:** Go to **https://huggingface.co/settings/tokens**
# 2.  **Log in** to your HuggingFace account.
# 3.  **Review and Edit your Token:**
#     * Click on the token you are using (or create a new one).
#     * Under "Repository access" or "Fine-grained permissions", ensure that **"Public gated repositories"**
#         access is **enabled**. This is crucial for models that require accepting terms.
#     * For broader access to all types of repositories (public, private, and gated), you can select "All repositories".
# 4.  **Accept Model Terms (if applicable):** If you are trying to use a specific model (e.g., `google/gemma-3-1b-pt` or `mistralai/Mistral-7B-Instruct-v0.3`), visit its model page on HuggingFace Hub (e.g., https://huggingface.co/google/gemma-3-1b-pt or https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3). Look for a section or button to **"Request access" or "Accept license"** and follow any instructions.
# 5.  **Restart Jupyter Kernel:** After updating your token's permissions or accepting terms, **RESTART YOUR JUPYTER KERNEL** (Kernel -> Restart Kernel...) to ensure the new permissions are loaded.

# We will use "HuggingFaceH4/zephyr-7b-beta" as a default, as it's a commonly used and generally accessible instruct model.
# If you have requested and been granted access to other models (and updated your token permissions), you can try them.
# HF_MODEL_ID = "HuggingFaceH4/zephyr-7b-beta" # Publicly available instruct model
# Example of a gated model (uncomment if you have access and permissions):
# HF_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"
# HF_MODEL_ID = "google/gemma-3-1b-pt"
HF_MODEL_ID = "google/gemma-3-1b-it" # Publicly available instruct model

HUGGINGFACE_TOKEN = os.environ.get("HF_TOKEN")
if not HUGGINGFACE_TOKEN:
    raise ValueError("HF_TOKEN environment variable not set.")

tokenizer = AutoTokenizer.from_pretrained(HF_MODEL_ID, token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(HF_MODEL_ID, token=HUGGINGFACE_TOKEN)

# Create a text generation pipeline
hf_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer, # Explicitly pass the tokenizer
    max_new_tokens=500,
    temperature=0.7,
    do_sample=True,
    token=HUGGINGFACE_TOKEN,
    device=0 # Uncomment if you have a GPU and want to use it
)

# Wrap the HuggingFace pipeline with ChatHuggingFace
hf_llm_pipeline = HuggingFacePipeline(pipeline=hf_pipeline)
llm = ChatHuggingFace(llm=hf_llm_pipeline)

# --- FIX APPLIED HERE: Explicitly set the tokenizer on the llm object ---
# This addresses the 'NoneType' object has no attribute 'apply_chat_template' error
# by ensuring the tokenizer is available for chat templating within ChatHuggingFace.
if not hasattr(llm, 'tokenizer') or llm.tokenizer is None:
    llm.tokenizer = tokenizer
    print("Explicitly set tokenizer on ChatHuggingFace instance to resolve apply_chat_template error.")


# Initialize OpenAIEmbeddings for RAG (keeping consistent with previous notebooks)
embeddings = OpenAIEmbeddings()

print("LangChain, LangGraph, HuggingFace, and OpenAI setup complete.")
print("\nIf you still encounter 'ModuleNotFoundError' after running this cell, please try:")
print("1. Restarting your Jupyter kernel (Kernel -> Restart Kernel...)")
print("2. Running this setup cell again.")
print("3. Ensure you have pulled the required HuggingFace model or it's accessible via your HF_TOKEN.")



2025-07-16 08:41:20.053420: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


HF_TOKEN is set.
OPENAI_API_KEY is set.
Python executable: /opt/conda/envs/anaconda-ai-2024.04-py310/bin/python


Device set to use cpu


LangChain, LangGraph, HuggingFace, and OpenAI setup complete.

If you still encounter 'ModuleNotFoundError' after running this cell, please try:
1. Restarting your Jupyter kernel (Kernel -> Restart Kernel...)
2. Running this setup cell again.
3. Ensure you have pulled the required HuggingFace model or it's accessible via your HF_TOKEN.


### 2. Create the prompt.txt File
Create or update a file named prompt.txt in the same directory as this Jupyter Notebook. This file will contain the detailed system prompt with all the guardrails for your Historical Expert.

prompt.txt content

### 3. Create Knowledge Base File (pisa_history.txt)
Create a new file named pisa_history.txt in the same directory as this Jupyter Notebook. This file will serve as our knowledge base for RAG.

pisa_history.txt content (example, feel free to expand):



### 4. RAG Setup: Create Retriever
Here, we'll load our pisa_history.txt file, split it into manageable chunks, create embeddings for these chunks, and then store them in a FAISS vector store to enable efficient retrieval.

In [2]:
# --- RAG Setup ---
rag_file_path = "pisa_history.txt"

# 1. Load the document
try:
    loader = TextLoader(rag_file_path, encoding="utf-8")
    documents = loader.load()
    print(f"Successfully loaded RAG document from '{rag_file_path}'")
except FileNotFoundError:
    print(f"Error: The RAG file '{rag_file_path}' was not found. Please create it.")
    exit()
except Exception as e:
    print(f"An error occurred while loading the RAG document: {e}")
    exit()

# 2. Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)
print(f"Split document into {len(splits)} chunks.")

# 3. Create a FAISS vector store from the chunks and embeddings
vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)
print("FAISS vector store created.")

# 4. Create a retriever
retriever = vectorstore.as_retriever()
print("Retriever created.")


Successfully loaded RAG document from 'pisa_history.txt'
Split document into 4 chunks.
FAISS vector store created.
Retriever created.


---
### 5. LangGraph Setup: Define Graph State and Nodes
We will define the state of our graph and the individual nodes (functions) that represent the steps in our agent's workflow.

In [3]:
# --- LangGraph Setup ---

# 1. Define Graph State
# This defines the object that is passed between nodes in the graph.
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: The user's question.
        context: Retrieved context from the RAG system.
        generation: The final generated answer from the LLM.
        is_historical_query: A flag to determine if the query is historical.
    """
    question: str
    context: Annotated[List[str], operator.add] # Context will be accumulated
    generation: str
    is_historical_query: bool # New field to control flow

# 2. Define Nodes (Functions)

# Node 1: Query Classifier
# This node determines if the incoming query is within the historical expert's scope.
def query_classifier(state: GraphState):
    """
    Determines if the incoming query is a historical question.
    This helps in deciding whether to perform RAG or directly apply guardrails.
    """
    print("---CLASSIFYING QUERY---")
    question = state["question"]

    # Use a simpler LLM call for classification to save tokens/latency
    # Note: Using the same LLM for classification, but with a specific prompt.
    classifier_prompt = ChatPromptTemplate.from_messages([
        SystemMessage(content="You are a helpful assistant. Your task is to classify if a given user question is related to 'history', 'architecture', or 'engineering' of historical structures. Respond with 'YES' if it is, and 'NO' if it is not. Be strict with your classification. Examples of 'NO': current events, personal opinions, finance, medical advice, fictional scenarios."),
        HumanMessage(content=f"Is the following question historical/architectural/engineering-related? '{question}'")
    ])
    classifier_chain = classifier_prompt | llm | StrOutputParser()

    classification_result = classifier_chain.invoke({"question": question})
    is_historical = "YES" in classification_result.upper()

    print(f"Query Classification: {classification_result.strip()} (Is Historical: {is_historical})")
    return {"is_historical_query": is_historical}


# Node 2: Retrieve
def retrieve(state: GraphState):
    """
    Retrieves documents from the vector store based on the user's question.
    """
    print("---RETRIEVING CONTEXT---")
    question = state["question"]
    docs = retriever.invoke(question)
    context = [doc.page_content for doc in docs]
    print(f"Retrieved {len(context)} documents.")
    return {"context": context}

# Node 3: Generate
def generate(state: GraphState):
    """
    Generates a response using the LLM, incorporating retrieved context if available,
    and adhering to the system prompt with guardrails.
    """
    print("---GENERATING RESPONSE---")
    question = state["question"]
    context = state["context"]

    # Load system prompt content from file
    try:
        with open("prompt.txt", "r", encoding="utf-8") as file:
            system_prompt_content = file.read()
    except Exception as e:
        print(f"Error loading prompt.txt in generate node: {e}")
        system_prompt_content = "You are a helpful assistant." # Fallback

    # Construct the messages list directly for ChatOllama
    # Using triple-quoted f-strings for multiline content
    messages = [SystemMessage(content=system_prompt_content)]

    if context:
        human_message_content = f"""Use the following retrieved context to answer the question. \
If the question cannot be answered from the provided context, state that you do not have sufficient information, \
but still adhere to your historical expert persona and guardrails.

Context:
{'  '.join(context)}

Question: {question}

Answer:"""
        messages.append(HumanMessage(content=human_message_content))
    else:
        human_message_content = f"""Question: {question}

Answer:"""
        messages.append(HumanMessage(content=human_message_content))

    # Create the generation chain:
    # Directly invoke `llm` with the `messages` and then pipe to `StrOutputParser`.
    rag_chain = (lambda x: llm.invoke(x["messages"])) | StrOutputParser()

    # Prepare input for the chain.
    input_data = {"messages": messages}

    generation_result = rag_chain.invoke(input_data)
    print("Response generated.")
    return {"generation": generation_result}

# 3. Define Conditional Edge
def decide_to_retrieve(state: GraphState):
    """
    Decides whether to retrieve context based on the query classification.
    """
    print("---DECIDING TO RETRIEVE---")
    if state["is_historical_query"]:
        print("Decision: Query is historical, proceeding to retrieve.")
        return "retrieve"
    else:
        print("Decision: Query is not historical, skipping retrieval and directly generating (applying general guardrails).")
        return "generate" # Skip retrieval for non-historical questions

print("Graph state and nodes defined.")

Graph state and nodes defined.


---
### 6. Build and Compile the LangGraph Workflow
Now we assemble our nodes into a graph, defining the flow of execution based on the state.

In [4]:
# --- Build the Graph ---
workflow = StateGraph(GraphState)

# Add nodes
workflow.add_node("classify_query", query_classifier)
workflow.add_node("retrieve_context", retrieve)
workflow.add_node("generate_response", generate)

# Set entry point
workflow.set_entry_point("classify_query")

# Add edges
workflow.add_conditional_edges(
    "classify_query",
    decide_to_retrieve,
    {
        "retrieve": "retrieve_context",
        "generate": "generate_response",
    },
)

# Add edge from retrieve to generate
workflow.add_edge("retrieve_context", "generate_response")

# Set end point
workflow.add_edge("generate_response", END)

# Compile the graph
app = workflow.compile()

print("LangGraph workflow compiled.")


LangGraph workflow compiled.


---
### 7. Run the Agent and Test Guardrails
Let's test our RAG-enabled, stateful Historical Expert with both an on-topic and an off-topic question.

#### Test Case 1: On-Topic Historical Question (RAG should activate)

In [5]:
# Test Case 1: On-Topic Historical Question (RAG should activate)

print("\n--- Test Case 1: Asking an on-topic historical question (RAG expected) ---")
historical_question = "Why does the Leaning Tower of Pisa lean, and what was done to fix it?"
print(f"User Question: {historical_question}")

try:
    inputs = {"question": historical_question, "context": [], "generation": "", "is_historical_query": False}
    for s in app.stream(inputs):
        print(s)
        print("---")
    final_state = app.invoke(inputs)
    print("\nFinal Historical Expert's Response:")
    print(final_state["generation"])

except Exception as e:
    print(f"An error occurred during the historical question API call: {e}")



--- Test Case 1: Asking an on-topic historical question (RAG expected) ---
User Question: Why does the Leaning Tower of Pisa lean, and what was done to fix it?
---CLASSIFYING QUERY---
Query Classification: <bos><start_of_turn>user
You are a helpful assistant. Your task is to classify if a given user question is related to 'history', 'architecture', or 'engineering' of historical structures. Respond with 'YES' if it is, and 'NO' if it is not. Be strict with your classification. Examples of 'NO': current events, personal opinions, finance, medical advice, fictional scenarios.

Is the following question historical/architectural/engineering-related? 'Why does the Leaning Tower of Pisa lean, and what was done to fix it?'<end_of_turn>
<start_of_turn>model
YES (Is Historical: True)
---DECIDING TO RETRIEVE---
Decision: Query is historical, proceeding to retrieve.
{'classify_query': {'is_historical_query': True}}
---
---RETRIEVING CONTEXT---
Retrieved 4 documents.
{'retrieve_context': {'contex

---
### Test Case 2: Off-Topic Question (Guardrails should activate)

In [None]:
# Test Case 2: Off-Topic Question (Guardrails should activate)
print("\n--- Test Case 2: Asking an off-topic question (Guardrails expected) ---")
off_topic_question = "Can you give me a detailed analysis of the current stock market trends for tech companies?"
print(f"User Question: {off_topic_question}")

try:
    inputs = {"question": off_topic_question, "context": [], "generation": "", "is_historical_query": False}
    for s in app.stream(inputs):
        print(s)
        print("---")
    final_state = app.invoke(inputs)
    print("\nFinal Historical Expert's Response:")
    print(final_state["generation"])

except Exception as e:
    print(f"An error occurred during the off-topic question API call: {e}")