## LangChain RAG Expert with LangGraph State Management (using XAI Grok)
This Jupyter Notebook demonstrates how to build a sophisticated AI agent using LangChain and LangGraph, now integrated with XAI Grok for the Large Language Model. The agent will act as a "Helpful Historical Expert" with the following key enhancements:

Retrieval-Augmented Generation (RAG): The agent will query a local knowledge base (a text file) to retrieve relevant information before generating a response, ensuring factual accuracy and reducing hallucinations.

LangGraph for State Management: We will use LangGraph to define a stateful workflow, allowing the agent to manage its internal state (e.g., current question, retrieved context) and execute steps like retrieval and response generation conditionally.

Enhanced Guardrails: The detailed system prompt from prompt.txt will continue to guide the AI's persona, tone, and adherence to safety and scope constraints.

XAI Grok Integration: The core language model for generation will be XAI Grok.

We will test the system with both an on-topic historical question (which should leverage RAG) and an off-topic question (which should trigger the guardrails).

### 1. Setup and Installation
First, we need to install all the necessary libraries. This includes LangChain components, langchain-xai for Grok integration, langchain-openai for embeddings (as XAI does not provide embeddings directly), LangGraph for state management, FAISS for vector storage, and Tiktoken for tokenization.



In [1]:
# Install necessary libraries
%pip install -U langchain langchain-xai langchain-openai langgraph faiss-cpu tiktoken

Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting langchain-xai
  Downloading langchain_xai-0.2.4-py3-none-any.whl.metadata (1.1 kB)
Downloading langchain_xai-0.2.4-py3-none-any.whl (8.0 kB)
Installing collected packages: langchain-xai
Successfully installed langchain-xai-0.2.4
Note: you may need to restart the kernel to use updated packages.


In [None]:
import os
import sys
from openai import OpenAI
from langchain_xai import ChatXAI # Changed to ChatXAI for Grok
from langchain_openai import OpenAIEmbeddings # Still using OpenAI for embeddings
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, List
import operator


# --- Set your XAI API Key ---
# It's highly recommended to set this as an environment variable for security.
# You can do this in your terminal before starting Jupyter:
# export XAI_API_KEY='your_xai_api_key_here' (Linux/macOS)
# $env:XAI_API_KEY='your_xai_api_key_here' (PowerShell)
#
# You will also need your OpenAI API key for embeddings:
# export OPENAI_API_KEY='your_openai_api_key_here'
#
# If you must set them directly in the notebook (NOT recommended for production):
# os.environ["XAI_API_KEY"] = "YOUR_ACTUAL_XAI_API_KEY"
# os.environ["OPENAI_API_KEY"] = "YOUR_ACTUAL_OPENAI_API_KEY"

# Verify API keys are set
if "XAI_API_KEY" not in os.environ:
    print("WARNING: XAI_API_KEY environment variable not set.")
    print("Please set it before proceeding, or uncomment the line above to set it directly (not recommended).")
else:
    print("XAI_API_KEY is set.")

if "OPENAI_API_KEY" not in os.environ:
    print("WARNING: OPENAI_API_KEY environment variable not set (needed for embeddings).")
    print("Please set it before proceeding, or uncomment the line above to set it directly (not recommended).")
else:
    print("OPENAI_API_KEY is set.")


# Print the Python executable path to help debug environment issues
print(f"Python executable: {sys.executable}")

# Initialize the ChatXAI model for generation
# Using Grok-1. You might choose other models like "grok-beta", "grok-2", "grok-3-latest","grok-3-mini", "grok-4"
model_name = "grok-3-mini"
llm = ChatXAI(model=model_name, temperature=0.7) # Adjust temperature for creativity (0.0 for deterministic)

# Initialize OpenAIEmbeddings for RAG (XAI does not provide embeddings directly)
embeddings = OpenAIEmbeddings()

print("LangChain, LangGraph, XAI Grok, and OpenAI setup complete.")
print("\nIf you still encounter 'ModuleNotFoundError' after running this cell, please try:")
print("1. Restarting your Jupyter kernel (Kernel -> Restart Kernel...)")
print("2. Running this setup cell again.")

XAI_API_KEY is set.
OPENAI_API_KEY is set.
Python executable: /opt/conda/envs/anaconda-ai-2024.04-py310/bin/python
LangChain, LangGraph, XAI Grok, and OpenAI setup complete.

If you still encounter 'ModuleNotFoundError' after running this cell, please try:
1. Restarting your Jupyter kernel (Kernel -> Restart Kernel...)
2. Running this setup cell again.


### 2. Create the prompt.txt File
Create or update a file named prompt.txt in the same directory as this Jupyter Notebook. This file will contain the detailed system prompt with all the guardrails for your Historical Expert.

**rag_prompt.txt** content:
```# SYSTEM ROLE DEFINITION
You are a highly knowledgeable and ethical Historical Expert. Your primary function is to provide accurate, detailed, and contextually rich information regarding historical events, architectural marvels, and engineering feats.

## Expertise:
- Deep understanding of world history, particularly ancient and medieval periods.
- Specialized knowledge in architecture, construction techniques, and engineering challenges of historical structures.
- Ability to synthesize complex historical data into clear, accessible explanations.

## Personality & Tone:
- **Helpful and Informative:** Always aim to assist the user in understanding history.
- **Objective and Neutral:** Present facts without personal bias or speculation.
- **Respectful:** Maintain a professional and respectful tone, even when discussing sensitive historical topics.
- **Concise but Comprehensive:** Provide enough detail to be informative without being overly verbose.

## CONSTRAINTS & GUARDRAILS:

1.  **Scope Adherence:** Only answer questions directly related to history, architecture, engineering, or related cultural contexts. If a question falls outside this scope (e.g., current events, personal opinions, future predictions, medical advice, legal advice, fictional scenarios), politely decline to answer and redirect the user back to your area of expertise.
2.  **Factuality:** All information provided MUST be historically accurate and verifiable. If you are unsure or the information is ambiguous/disputed by historians, state this clearly. Do not hallucinate facts or dates.
3.  **No Harmful Content:** Absolutely do not generate content that is hateful, discriminatory, violent, sexually explicit, or promotes illegal activities. If a user attempts to solicit such content, refuse directly and explain that you cannot assist with harmful requests.
4.  **No Personal Information:** Do not ask for or reveal any Personally Identifiable Information (PII) of yourself or others.
5.  **Ethical Considerations:** When discussing potentially sensitive historical events (e.g., wars, conflicts, societal inequalities), do so with an emphasis on historical facts and their impact, avoiding glorification of violence or prejudice.
6.  **"I don't know" Policy:** If you genuinely do not have the information or cannot confidently answer a question without speculating, state "As a historical expert, I do not have sufficient information to provide an accurate answer to that specific query." Do not invent an answer.
7.  **Language:** Respond only in English.

## RESPONSE FORMAT:

-   Start your response by acknowledging the user's question.
-   Provide the answer in clear, well-structured paragraphs.
-   Use bullet points or numbered lists for complex information or multiple points where appropriate.
-   Conclude your answer by offering further assistance within your domain.
```


### 3. Create Knowledge Base File (pisa_history.txt)
Create a new file named pisa_history.txt in the same directory as this Jupyter Notebook. This file will serve as our knowledge base for RAG.

pisa_history.txt content (example, feel free to expand):
```
The Leaning Tower of Pisa is the campanile, or freestanding bell tower, of the cathedral of Pisa, Italy. It is situated behind the Pisa Cathedral and is the third oldest structure in Pisa's Cathedral Square (Piazza del Duomo), after the cathedral and the Pisa Baptistry.

The tower is known for its unintended tilt to one side, which began during construction in the 12th century due to soft ground that could not properly support the structure's weight. Construction began in August 1173. The tower was designed to be perfectly vertical, but it began to lean by the time the third floor was built in 1178. The soft soil, composed of clay, fine sand, and shells, was unstable. This initial lean was towards the north.

Construction was halted for almost a century, which allowed the underlying soil to settle and compact, preventing the tower from toppling. When construction resumed in 1272, under Giovanni di Simone, the engineers tried to compensate for the tilt by building the upper floors with one side taller than the other. This caused the tower to start leaning in the opposite direction (south), creating a slight curve. Construction was again halted in 1284.

The seventh floor was completed in 1319, and the bell-chamber was finally added in 1372 by Tommaso di Andrea Pisano, who incorporated Gothic elements. The tower has 294 steps on the north side and 296 steps on the south side, as some steps had to be removed due to the lean.

Over the centuries, various efforts were made to correct or prevent the collapse of the tower. In 1838, architect Alessandro Gherardesca dug a pathway around the base to make the base visible, which caused the tower to lean even more. Benito Mussolini ordered that the tower be returned to a vertical position, and concrete was poured into the foundations in 1934, which also worsened the lean.

The most significant stabilization efforts took place from 1990 to 2001. An international committee of experts, led by Michele Jamiolkowski, undertook a major project. They used counterweights (600 tonnes of lead ingots) on the north side of the base and, more effectively, soil extraction. Soil was carefully removed from underneath the north side of the foundations, causing the tower to slowly settle back towards the north, reducing its tilt by about 45 cm (17.7 inches). This brought the lean back to what it was in 1838. The tower was reopened to the public in December 2001.

Today, the Leaning Tower of Pisa is considered stable for at least another 200 years. Its unique tilt continues to attract millions of tourists worldwide.
```

### 4. RAG Setup: Create Retriever
Here, we'll load our pisa_history.txt file, split it into manageable chunks, create embeddings for these chunks, and then store them in a FAISS vector store to enable efficient retrieval.


In [2]:
# --- RAG Setup ---
rag_file_path = "pisa_history.txt"

# 1. Load the document
try:
    loader = TextLoader(rag_file_path, encoding="utf-8")
    documents = loader.load()
    print(f"Successfully loaded RAG document from '{rag_file_path}'")
except FileNotFoundError:
    print(f"Error: The RAG file '{rag_file_path}' was not found. Please create it.")
    exit()
except Exception as e:
    print(f"An error occurred while loading the RAG document: {e}")
    exit()

# 2. Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)
print(f"Split document into {len(splits)} chunks.")

# 3. Create a FAISS vector store from the chunks and embeddings
vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)
print("FAISS vector store created.")

# 4. Create a retriever
retriever = vectorstore.as_retriever()
print("Retriever created.")


Successfully loaded RAG document from 'pisa_history.txt'
Split document into 4 chunks.
FAISS vector store created.
Retriever created.


### 5. LangGraph Setup: Define Graph State and Nodes
We will define the state of our graph and the individual nodes (functions) that represent the steps in our agent's workflow.



In [7]:
# --- LangGraph Setup ---

# 1. Define Graph State
# This defines the object that is passed between nodes in the graph.
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: The user's question.
        context: Retrieved context from the RAG system.
        generation: The final generated answer from the LLM.
        is_historical_query: A flag to determine if the query is historical.
    """
    question: str
    context: Annotated[List[str], operator.add] # Context will be accumulated
    generation: str
    is_historical_query: bool # New field to control flow

# 2. Define Nodes (Functions)

# Node 1: Query Classifier
# This node determines if the incoming query is within the historical expert's scope.
def query_classifier(state: GraphState):
    """
    Determines if the incoming query is a historical question.
    This helps in deciding whether to perform RAG or directly apply guardrails.
    """
    print("---CLASSIFYING QUERY---")
    question = state["question"]

    # Use a simpler LLM call for classification to save tokens/latency
    # Note: Using the same LLM for classification, but with a specific prompt.
    classifier_prompt = ChatPromptTemplate.from_messages([
        SystemMessage(content="You are a helpful assistant. Your task is to classify if a given user question is related to 'history', 'architecture', or 'engineering' of historical structures. Respond with 'YES' if it is, and 'NO' if it is not. Be strict with your classification. Examples of 'NO': current events, personal opinions, finance, medical advice, fictional scenarios."),
        HumanMessage(content=f"Is the following question historical/architectural/engineering-related? '{question}'")
    ])
    classifier_chain = classifier_prompt | llm | StrOutputParser()

    classification_result = classifier_chain.invoke({"question": question})
    is_historical = "YES" in classification_result.upper()

    print(f"Query Classification: {classification_result.strip()} (Is Historical: {is_historical})")
    return {"is_historical_query": is_historical}


# Node 2: Retrieve
def retrieve(state: GraphState):
    """
    Retrieves documents from the vector store based on the user's question.
    """
    print("---RETRIEVING CONTEXT---")
    question = state["question"]
    docs = retriever.invoke(question)
    context = [doc.page_content for doc in docs]
    print(f"Retrieved {len(context)} documents.")
    return {"context": context}

# Node 3: Generate
def generate(state: GraphState):
    """
    Generates a response using the LLM, incorporating retrieved context if available,
    and adhering to the system prompt with guardrails.
    """
    print("---GENERATING RESPONSE---")
    question = state["question"]
    context = state["context"]

    system_promt_file = "rag_prompt.txt"
    # Load system prompt content from file
    try:
        with open(system_promt_file, "r", encoding="utf-8") as file:
            system_prompt_content = file.read()
    except Exception as e:
        print(f"Error loading prompt.txt in generate node: {e}")
        system_prompt_content = "You are a helpful assistant." # Fallback

    # Construct the prompt for generation
    # If context is available, include it for RAG. Otherwise, the LLM will rely solely on its knowledge + system prompt.
    if context:
        template = (
            system_prompt_content + "\n\n"
            "Use the following retrieved context to answer the question. If the question cannot be answered from the provided context, state that you do not have sufficient information, but still adhere to your historical expert persona and guardrails.\n\n"
            "Context:\n{context}\n\n"
            "Question: {question}\n\n"
            "Answer:"
        )
    else:
        # For non-historical or general questions, rely only on the system prompt
        template = (
            system_prompt_content + "\n\n"
            "Question: {question}\n\n"
            "Answer:"
        )

    prompt = ChatPromptTemplate.from_messages([
        SystemMessage(content=template),
        HumanMessage(content="{question}")
    ])

    # Create the generation chain
    rag_chain = prompt | llm | StrOutputParser()

    # Prepare input for the chain
    input_data = {"question": question}
    if context:
        input_data["context"] = "\n".join(context)

    generation_result = rag_chain.invoke(input_data)
    print("Response generated.")
    return {"generation": generation_result}

# 3. Define Conditional Edge
def decide_to_retrieve(state: GraphState):
    """
    Decides whether to retrieve context based on the query classification.
    """
    print("---DECIDING TO RETRIEVE---")
    if state["is_historical_query"]:
        print("Decision: Query is historical, proceeding to retrieve.")
        return "retrieve"
    else:
        print("Decision: Query is not historical, skipping retrieval and directly generating (applying general guardrails).")
        return "generate" # Skip retrieval for non-historical questions

print("Graph state and nodes defined.")

Graph state and nodes defined.


### 6. Build and Compile the LangGraph Workflow
Now we assemble our nodes into a graph, defining the flow of execution based on the state.

In [8]:
# --- Build the Graph ---
workflow = StateGraph(GraphState)

# Add nodes
workflow.add_node("classify_query", query_classifier)
workflow.add_node("retrieve_context", retrieve)
workflow.add_node("generate_response", generate)

# Set entry point
workflow.set_entry_point("classify_query")

# Add edges
workflow.add_conditional_edges(
    "classify_query",
    decide_to_retrieve,
    {
        "retrieve": "retrieve_context",
        "generate": "generate_response",
    },
)

# Add edge from retrieve to generate
workflow.add_edge("retrieve_context", "generate_response")

# Set end point
workflow.add_edge("generate_response", END)

# Compile the graph
app = workflow.compile()

print("LangGraph workflow compiled.")

LangGraph workflow compiled.


### 7. Run the Agent and Test Guardrails
Let's test our RAG-enabled, stateful Historical Expert with both an on-topic and an off-topic question.

#### Test Case 1: On-Topic Historical Question (RAG should activate)

In [9]:
print("\n--- Test Case 1: Asking an on-topic historical question (RAG expected) ---")
historical_question = "Why does the Leaning Tower of Pisa lean, and what was done to fix it?"
print(f"User Question: {historical_question}")

try:
    inputs = {"question": historical_question, "context": [], "generation": "", "is_historical_query": False}
    for s in app.stream(inputs):
        print(s)
        print("---")
    final_state = app.invoke(inputs)
    print("\nFinal Historical Expert's Response:")
    print(final_state["generation"])

except Exception as e:
    print(f"An error occurred during the historical question API call: {e}")


--- Test Case 1: Asking an on-topic historical question (RAG expected) ---
User Question: Why does the Leaning Tower of Pisa lean, and what was done to fix it?
---CLASSIFYING QUERY---
Query Classification: YES (Is Historical: True)
---DECIDING TO RETRIEVE---
Decision: Query is historical, proceeding to retrieve.
{'classify_query': {'is_historical_query': True}}
---
---RETRIEVING CONTEXT---
Retrieved 4 documents.
{'retrieve_context': {'context': ['Over the centuries, various efforts were made to correct or prevent the collapse of the tower. In 1838, architect Alessandro Gherardesca dug a pathway around the base to make the base visible, which caused the tower to lean even more. Benito Mussolini ordered that the tower be returned to a vertical position, and concrete was poured into the foundations in 1934, which also worsened the lean.\n\nThe most significant stabilization efforts took place from 1990 to 2001. An international committee of experts, led by Michele Jamiolkowski, undertook

#### Test Case 2: Off-Topic Question (Guardrails should activate)


In [10]:
print("\n--- Test Case 2: Asking an off-topic question (Guardrails expected) ---")
off_topic_question = "Can you give me a detailed analysis of the current stock market trends for tech companies?"
print(f"User Question: {off_topic_question}")

try:
    inputs = {"question": off_topic_question, "context": [], "generation": "", "is_historical_query": False}
    for s in app.stream(inputs):
        print(s)
        print("---")
    final_state = app.invoke(inputs)
    print("\nFinal Historical Expert's Response:")
    print(final_state["generation"])

except Exception as e:
    print(f"An error occurred during the off-topic question API call: {e}")



--- Test Case 2: Asking an off-topic question (Guardrails expected) ---
User Question: Can you give me a detailed analysis of the current stock market trends for tech companies?
---CLASSIFYING QUERY---
Query Classification: NO (Is Historical: False)
---DECIDING TO RETRIEVE---
Decision: Query is not historical, skipping retrieval and directly generating (applying general guardrails).
{'classify_query': {'is_historical_query': False}}
---
---GENERATING RESPONSE---
Response generated.
{'generate_response': {'generation': "I'd be happy to discuss the Great Wall of China, one of the most iconic engineering and architectural feats in human history. As a historical expert, I'll provide an overview of its background, construction, and significance based on established historical records.\n\nThe Great Wall of China is not a single continuous wall but a series of fortifications built across northern China to protect against invasions from nomadic groups, such as the Mongols. Its origins date ba

### 8. Conclusion
This notebook demonstrates a more advanced AI agent architecture using LangChain for RAG and LangGraph for stateful workflow management, now powered by XAI Grok. You have successfully:

- Integrated a RAG pipeline to ground your AI's responses in a specific knowledge base.
- Implemented state management with LangGraph, allowing for conditional logic (e.g., classifying queries before deciding on retrieval).
- Maintained and reinforced the "Historical Expert" persona and guardrails defined in your prompt.txt, ensuring safe and on-topic interactions.
- Switched the core LLM to XAI Grok, demonstrating flexibility in model choice.

This setup provides a robust foundation for building more complex and reliable AI applications that can dynamically adapt their behavior based on user input and available information. You can expand this by adding more knowledge bases, more complex decision nodes, or integrating other tools.

#### What is FAISS?
FAISS (Facebook AI Similarity Search) is an open-source library focused on efficient similarity search and clustering of dense vectors. It's designed to handle large datasets of vectors, even billions, and excels at finding similar vectors quickly. FAISS is particularly useful for tasks like semantic search, recommendation systems, and image retrieval.

Here's a more detailed explanation:

##### Core Functionality:

- Similarity Search: FAISS helps find vectors that are most similar to a given query vector.
- Clustering: It can also group similar vectors into clusters, which can be useful for organizing data.
- Indexing: FAISS creates efficient indexes to speed up the search process, even with massive datasets.
- Scalability: It's designed to handle large numbers of vectors, and its performance scales well, including on GPUs.

##### How it Works:

- Vector Embeddings: Documents or data are converted into numerical representations called embeddings (vectors) using techniques like Sentence Transformers or other embedding models.
- Indexing: FAISS creates an index of these vectors, allowing for faster searching.
- Similarity Search: When you search for a vector, FAISS uses the index to quickly find the most similar vectors.

##### Key Advantages:

- Speed and Efficiency: FAISS is known for its speed in finding similar vectors.
- Scalability: It can handle large datasets and scale to billions of vectors.
- Flexibility: It supports various indexing methods and can be used with GPUs for even faster performance.
- Open Source: FAISS is freely available and widely used in research and industry.

##### Use Cases:

- Semantic Search: Finding documents or information that are semantically similar to a query.
- Recommendation Systems: Suggesting products or content based on user preferences.
- Image Retrieval: Finding similar images based on visual features.
- Anomaly Detection: Identifying unusual or outlier vectors in a dataset.
- Clustering: Grouping similar data points for analysis and organization.

In simpler terms: Imagine you have a vast library of songs, each represented by a vector of features (like tempo, mood, key). FAISS allows you to quickly find songs that are most similar to a target song by comparing their feature vectors.