### LangChain Vectorstore RAG Implementation
---
This notebook demonstrates a Retrieval-Augmented Generation (RAG) system using LangChain with local models via Ollama. The implementation follows a multi-step reasoning process:

1. **Setup**: Loads two Ollama models (phi4-mini for reasoning and Gemma3:1b for synthesis) to handle different parts of the process.

2. **Question Analysis**: Uses the reasoning model to break down complex questions into logical sub-steps that can be individually researched.

3. **Document Processing**: Loads a local text corpus about space exploration, splits it into manageable chunks, and creates vector embeddings using the nomic-embed-text model.

4. **Knowledge Retrieval**: For each identified reasoning step, performs a similarity search in the Chroma vectorstore to find the most relevant information from the knowledge base.

5. **Answer Synthesis**: Feeds the original question and all retrieved contextual information to the synthesis model, which generates a cohesive, factual response.

Combining structured reasoning with targeted retrieval from a domain-specific knowledge base grounds the final answer in retrieved source text, which can make responses more accurate and contextually relevant than prompting an LLM alone.
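
The five steps above can be read as a small pipeline with three pluggable stages. The sketch below is a library-free outline under stated assumptions: `answer_with_rag` and its parameters `decompose`, `retrieve`, and `synthesize` are hypothetical names, not part of LangChain, and the stub stages only stand in for the model and vectorstore calls made in the cells that follow.

```python
from typing import Callable, List


def answer_with_rag(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], str],
    synthesize: Callable[[str, str], str],
) -> str:
    """Hypothetical outline of the notebook's flow: decompose, retrieve, synthesize."""
    # Question analysis: break the question into sub-steps.
    steps = decompose(question)
    # Knowledge retrieval: look up one passage per sub-step.
    facts = [f"- {step}: {retrieve(step)}" for step in steps]
    # Answer synthesis: hand the question and all facts to the second model.
    return synthesize(question, "\n".join(facts))


# Stub stages so the outline runs without Ollama or Chroma.
demo = answer_with_rag(
    "Who was president during the moon landing?",
    decompose=lambda q: ["Find the landing date", "Find the president on that date"],
    retrieve=lambda step: f"(passage for '{step}')",
    synthesize=lambda q, facts: f"Q: {q}\n{facts}",
)
print(demo)
```

Keeping the stages as plain callables makes it easy to swap one model or retriever for another without touching the rest of the flow.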

In [1]:
# Install the required libraries from requirements.txt 
# pip install -r requirements.txt

In [2]:
try:
    # Built-in regular expressions module, used later to parse the model's numbered steps
    import re

    # PromptTemplate defines and structures prompts for LLMs
    from langchain_core.prompts import PromptTemplate

    # Chroma vector store, used for storing and searching vector embeddings (RAG retrieval)
    from langchain_community.vectorstores import Chroma

    # Ollama-specific embeddings and LLM classes for use with LangChain
    from langchain_ollama import OllamaEmbeddings, OllamaLLM

    # TextLoader loads plain text documents from files for processing
    from langchain_community.document_loaders import TextLoader

    # CharacterTextSplitter splits large documents into smaller chunks based on character count
    from langchain_text_splitters import CharacterTextSplitter

except ImportError as e:
    # Report the missing package, then re-raise so the notebook stops here
    # instead of failing later with a NameError.
    print(f"Import error: {e}")
    raise


In [3]:
# --- Step 1: Ensure Ollama models are available ---
# Create the reasoning and synthesis LLM wrappers for the local Ollama models.
# If this fails, print setup instructions and stop with the original error.
try:
    reasoning_llm = OllamaLLM(model="phi4-mini")
    synthesis_llm = OllamaLLM(model="Gemma3:1b")
except Exception as e:
    print("❌ Failed to connect to Ollama or load models 'phi4-mini' / 'Gemma3:1b'.")
    print("💡 Make sure Ollama is running and both models are available:")
    print("    ollama pull phi4-mini")
    print("    ollama pull gemma3:1b")
    print(f"Error details: {e}")
    raise
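
As far as I can tell, constructing `OllamaLLM` does not by default send a request to the Ollama server, so the `try`/`except` above may not catch a missing model; a failure would then surface only on the first `invoke`. One way to fail early is to send a tiny probe prompt to each model. A minimal sketch, where `probe_models` and the `FakeLLM` stand-in are hypothetical names used so the example runs without Ollama:

```python
from typing import Dict, Optional


def probe_models(llms: Dict[str, object], prompt: str = "Reply with OK.") -> Dict[str, Optional[str]]:
    """Call .invoke() once per model; map model name -> error message (None if it worked)."""
    results: Dict[str, Optional[str]] = {}
    for name, llm in llms.items():
        try:
            llm.invoke(prompt)
            results[name] = None
        except Exception as exc:  # e.g. connection refused or model not pulled
            results[name] = str(exc)
    return results


class FakeLLM:
    """Hypothetical stand-in for an LLM wrapper; real objects would be OllamaLLM instances."""

    def __init__(self, available: bool) -> None:
        self.available = available

    def invoke(self, prompt: str) -> str:
        if not self.available:
            raise ConnectionError("model not found")
        return "OK"


status = probe_models({"phi4-mini": FakeLLM(True), "Gemma3:1b": FakeLLM(False)})
for name, error in status.items():
    print(f"{name}: {'ready' if error is None else 'unavailable (' + error + ')'}")
```

In the notebook, the same helper could be called as `probe_models({"phi4-mini": reasoning_llm, "Gemma3:1b": synthesis_llm})` right after the models are created.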

In [4]:
# --- Step 2: Prompt to break down the question ---
# Define a prompt template to break down a question into logical steps.
# This uses the reasoning LLM to generate a step-by-step breakdown.
reasoning_prompt = PromptTemplate.from_template("""
You are a reasoning assistant. Break the following question into logical steps to help answer it:

Question: {question}

Step-by-step breakdown:
""")
step_chain = reasoning_prompt | reasoning_llm
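
The expression `reasoning_prompt | reasoning_llm` builds a chain in which the filled-in prompt is passed to the model. The sketch below mimics that idea with a tiny hypothetical `Step` class; it illustrates the composition pattern only and is not LangChain's actual implementation.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b produces a new step that runs a, then passes its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for the prompt template and the model.
fill_prompt = Step(lambda inputs: f"Break this question into steps: {inputs['question']}")
fake_llm = Step(lambda prompt: f"[model received] {prompt}")

chain = fill_prompt | fake_llm
result = chain.invoke({"question": "Why is the sky blue?"})
print(result)
```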

In [5]:
# --- Step 3: Set up vectorstore with Chroma ---
# Load a local text corpus using TextLoader.
loader = TextLoader(r"C:\Python\Agent-School\docs\Space.txt")  # Load your local corpus (raw string keeps the backslashes literal)
documents = loader.load()

# Split the documents into smaller chunks for better processing.
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = text_splitter.split_documents(documents)

# Set up embeddings using the Ollama 'nomic-embed-text' model.
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Create a vectorstore (Chroma) using the split documents and embeddings.
vectorstore = Chroma.from_documents(split_docs, embeddings)

Created a chunk of size 505, which is longer than the specified 500
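
The warning above appears because `CharacterTextSplitter` first splits on a separator (a blank line, `"\n\n"`, by default) and only then merges the pieces up to `chunk_size`; a single paragraph longer than the limit is kept whole. The simplified sketch below reproduces that behavior without overlap handling. `split_on_paragraphs` is a hypothetical helper written for illustration, not the library's code.

```python
from typing import List


def split_on_paragraphs(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A piece that is itself longer than chunk_size is emitted whole, which is
    why a chunk can exceed the requested size.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may already be longer than chunk_size
    if current:
        chunks.append(current)
    return chunks


sample = "short one\n\n" + "x" * 30 + "\n\nshort two\n\nshort three"
chunks = split_on_paragraphs(sample, chunk_size=25)
for chunk in chunks:
    flag = " (over limit)" if len(chunk) > 25 else ""
    print(len(chunk), repr(chunk[:12]) + flag)
```

If a strict size limit matters, `RecursiveCharacterTextSplitter` is a common alternative because it falls back to finer separators when a piece is too long.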


In [6]:
# --- Step 4: Prompt for final synthesis ---
# Define a prompt template for synthesizing a final answer.
# This uses the synthesis LLM to generate a complete and informative response.
# Note: this is only a template; the {question} and {facts} placeholders are filled in when the chain is invoked.

synthesis_prompt = PromptTemplate.from_template("""
Based on the following question and information, write a complete, informative answer.

Question: {question}

Information:
{facts}

Answer:
""")
synthesis_chain = synthesis_prompt | synthesis_llm
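
To make the placeholder mechanism concrete, the sketch below fills the same template text with Python's built-in `str.format`, used here as a stand-in for `PromptTemplate` (an assumption made so the example runs without LangChain). The question and fact strings are invented for illustration.

```python
# Same wording as the synthesis template, held in a plain string for illustration.
TEMPLATE = """
Based on the following question and information, write a complete, informative answer.

Question: {question}

Information:
{facts}

Answer:
"""

# Illustrative inputs only; in the notebook these come from the other cells.
filled = TEMPLATE.format(
    question="Which mission first landed people on the moon?",
    facts="- Identify the mission: Apollo 11 landed in July 1969.",
)
print(filled)
```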

In [7]:
# --- Step 5: Ask a question ---
# Define the question to be answered and invoke the reasoning chain to get step-by-step reasoning.
question = "Who was the U.S. president during the moon landing, and what was his policy on space exploration?"
steps_text = step_chain.invoke({"question": question})

# Print the reasoning steps generated by the LLM.
print("\n🧠 Reasoning steps:\n")
print(steps_text)


🧠 Reasoning steps:

To break down this question logically for answering "Who was the U.S. president during the Moon Landing?" Follow these step-by-step reasoning actions.

1. **Identify the event's date**: The first manned mission to land on the moon, Apollo 11, occurred in July 1969.
2. **Determine who is associated with this historical milestone as President of the United States (POTUS) during that time period.**

Given these steps:

- John F. Kennedy was not president when Apollo 11 landed; he served from January 20, 1961 to November 22, 1963 and died in office.
- Lyndon B. Johnson succeeded him on October 27, 1963 but resigned later that year.

Therefore,

- Richard Nixon became POTUS after winning the election in September 1968 (he was inaugurated as president just over two months before Apollo 11).

2. **Identify his stance or policy regarding space exploration**:

To find out what President Richard Nixon's policies on Space Exploration were, we can look at some of key historica

In [8]:
# --- Step 6: Parse reasoning steps ---
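# Extract individual reasoning steps from the generated text using regex.
# Anchoring to line starts avoids matching numbers that end a sentence mid-line (e.g. "in 1969. ").
step_lines = re.findall(r"^[ \t]*\d+\.[ \t]+(.+)$", steps_text, flags=re.MULTILINE)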
facts = []
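
Because the model's output is free-form, the extracted steps can carry Markdown bold markers and can repeat. The sketch below shows one possible post-processing pass on an invented sample; `extract_steps` is a hypothetical helper and its cleanup rules are assumptions, not part of the original notebook.

```python
import re
from typing import List


def extract_steps(text: str) -> List[str]:
    """Return numbered-list items from text, stripped of bold markers and duplicates."""
    # Match lines that start with "<number>. " and capture the rest of the line.
    raw = re.findall(r"^[ \t]*\d+\.[ \t]+(.+)$", text, flags=re.MULTILINE)
    steps: List[str] = []
    for item in raw:
        cleaned = item.replace("**", "").strip().rstrip(":").strip()
        if cleaned and cleaned not in steps:  # keep first occurrence, preserve order
            steps.append(cleaned)
    return steps


sample_output = """Here is a breakdown.

1. **Identify the event's date**: Apollo 11 landed in July 1969.
2. **Determine who was president at that time.**

Some commentary between the steps.

2. **Determine who was president at that time.**
3. **Identify his stance on space exploration**:
"""

for step in extract_steps(sample_output):
    print(step)
```

Removing duplicates before retrieval avoids running the same similarity search twice and repeating the same passage in the synthesis prompt.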

In [9]:
# --- Step 7: Lookup each reasoning step with vectorstore ---
# For each reasoning step, perform a similarity search in the vectorstore.
for step in step_lines:
    print(f"\n🔍 Looking up: {step}")
    
    # Retrieve the most relevant document from the vectorstore.
    docs = vectorstore.similarity_search(step, k=1)
    result = docs[0].page_content if docs else "No relevant info found in local knowledge base."
    
    # Append the result to the list of facts.
    facts.append(f"- {step.strip()}: {result}")

# Combine all retrieved facts into a single string.
combined_facts = "\n".join(facts)

# Print the retrieved facts.
print("\n📚 Retrieved facts from Vectorstore:\n")
print(combined_facts)


🔍 Looking up: **Identify the event's date**: The first manned mission to land on the moon, Apollo 11, occurred in July 1969.

🔍 Looking up: **Determine who is associated with this historical milestone as President of the United States (POTUS) during that time period.**

🔍 Looking up: **Determine who is associated with this historical milestone as President of the United States (POTUS) during that time period.**

🔍 Looking up: **Identify his stance or policy regarding space exploration**:

🔍 Looking up: **Identify Nixon's role as POTUS during the period of significant decisions related to U.S. space programs**:

🔍 Looking up: **Look for major policies or directives issued by President Nixon concerning Space Exploration post-moon landing:**

📚 Retrieved facts from Vectorstore:

- **Identify the event's date**: The first manned mission to land on the moon, Apollo 11, occurred in July 1969.: ### Nixon's Role in the Moon Landing
When Nixon took office in January 1969, the Apollo program wa
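
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that ranking with cosine similarity over hand-made three-dimensional vectors. The vectors, texts, and the `top_k` helper are invented for illustration, and Chroma's actual index and distance metric may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: List[Tuple[str, Sequence[float]]], k: int = 1) -> List[str]:
    """Return the texts of the k stored items most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": real ones come from the embedding model and have many more dimensions.
toy_store = [
    ("Nixon and the Apollo program", [0.9, 0.1, 0.0]),
    ("Mars rover missions", [0.1, 0.9, 0.1]),
    ("Space Shuttle development", [0.3, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.1]

best = top_k(toy_query, toy_store, k=1)
print(best)
```

With `k=1`, as in the loop above, each reasoning step contributes only its single closest chunk; a larger `k` trades a longer synthesis prompt for broader context.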

In [10]:
# --- Step 8: Summarize using second LLM ---
# Use the synthesis chain to generate a final answer based on the question and retrieved facts.
final_answer = synthesis_chain.invoke({
    "question": question,
    "facts": combined_facts
})

# Print the final synthesized answer.
print("\n✅ Final synthesized answer:\n")
print(final_answer)


✅ Final synthesized answer:

During the U.S. presidency of Richard Nixon, the U.S. president was **Richard Nixon**. He oversaw the Apollo 11 mission, which occurred on July 20, 1969, marking the first manned landing on the moon. While Kennedy initially spearheaded the space race, Nixon prioritized a more pragmatic approach to space exploration, significantly reducing the Apollo program’s funding and scale after Apollo 17 in 1972, reflecting a balance between scientific advancements and other national priorities.
