# Vector Store & Reasoning Query 

This notebook fetches the text of a Wikipedia page and stores it in a ChromaDB vector store. It then chains two LLMs, one for reasoning and one for summarizing, to illustrate how an agent queries the vector store for information.

#### Installing Required Libraries

The notebook depends on two Python libraries:

1. **ChromaDB:**
   - A vector-based database used for managing embeddings and performing similarity searches.
   - Enables efficient storage and querying of vectorized data.

2. **Wikipedia-API:**
   - A Python wrapper for accessing Wikipedia's content.
   - Facilitates fetching pages, summaries, and text directly from Wikipedia.

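To install both, run the following in a notebook cell. The leading `!` tells the notebook to pass the command to the shell; omit it when running in a regular terminal:
```bash
!pip install chromadb wikipedia-api
```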

Ensure your environment has internet access and permissions to install packages. Once installed, you can utilize these libraries for tasks like querying vector databases and retrieving content from Wikipedia.
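
Before running the later cells, it can help to confirm that both libraries are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the original notebook. Note that the PyPI package `wikipedia-api` is imported as `wikipediaapi`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names used by this notebook; an empty list means both are installed
print(missing_packages(["chromadb", "wikipediaapi"]))
```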

In [1]:
# !pip install chromadb wikipedia-api

#### Fetching and Displaying Wikipedia Page Content

This code utilizes the `wikipediaapi` library to interact with Wikipedia and retrieve information about a specific page.

**Key Steps:**
1. **Initialize Wikipedia API Client:**
   - Sets the desired language (e.g., English) and specifies a user agent for identification purposes.

2. **Retrieve a Wikipedia Page:**
   - Fetches the content of the specified Wikipedia page using its title, in this case, "Climate change."

3. **Check Page Existence:**
   - Verifies if the page exists to avoid errors and proceed safely.

4. **Display Page Information:**
   - Prints key information about the page:
     - **Title:** The name of the page.
     - **Summary:** A brief overview of the page content.
     - **Text:** The full text of the page.

5. **Handle Missing Pages:**
   - Outputs a message if the page does not exist to inform the user gracefully.

In [2]:
import wikipediaapi

# Initialize Wikipedia API
wiki_wiki = wikipediaapi.Wikipedia(
    language='en',  # Specify language (e.g., 'en' for English)
    user_agent='Agent-School LangChain Test Project'  # Add your user agent
)

# Fetch a page
page = wiki_wiki.page("Climate change")
if page.exists():
    print("Page Title:", page.title)
    print("Page Summary:", page.summary)
    print("Page Text:", page.text)
else:
    print("Page does not exist.")

Page Title: Climate change
Page Summary: Present-day climate change includes both global warming—the ongoing increase in global average temperature—and its wider effects on Earth’s climate system. Climate change in a broader sense also includes previous long-term changes to Earth's climate. The current rise in global temperatures is driven by human activities, especially fossil fuel burning since the Industrial Revolution. Fossil fuel use, deforestation, and some agricultural and industrial practices release greenhouse gases. These gases absorb some of the heat that the Earth radiates after it warms from sunlight, warming the lower atmosphere. Carbon dioxide, the primary gas driving global warming, has increased in concentration by about 50% since the pre-industrial era to levels not seen for millions of years.
Climate change has an increasingly large impact on the environment. Deserts are expanding, while heat waves and wildfires are becoming more common. Amplified warming in the Arct

#### Adding Wikipedia Content to a ChromaDB Collection

This code utilizes ChromaDB, a vector-based database, to store and manage embeddings of Wikipedia content for efficient querying.

**Key Steps:**
1. **Import ChromaDB Module:**
   - Load the `chromadb` library to interact with the database.

2. **Initialize ChromaDB Client:**
   - Create a client instance to connect to ChromaDB and perform database operations.

3. **Create a New Collection:**
   - Define a collection named `Wikipedia_Collection` to group related documents and their embeddings.

4. **Add Wikipedia Content:**
   - Store the full text of the Wikipedia page fetched in the previous cell (`page.text`) as a single document. Because no embeddings are passed, ChromaDB computes them with its default embedding function.
   - Assign a unique identifier (`ids=["climate_change"]`) to this document for reference and retrieval.

5. **Print Confirmation:**
   - Display a message to confirm that the data has been successfully added to ChromaDB.

In [3]:
# Import the ChromaDB module for managing vector databases
import chromadb

# Initialize ChromaDB client
# chromadb.Client() creates an in-memory client, so stored data is lost when the kernel restarts.
chroma_client = chromadb.Client()

# Create the collection, or reuse it if this cell has already been run
# Collections are logical groups that store related documents and their embeddings for efficient retrieval.
collection = chroma_client.get_or_create_collection(name="Wikipedia_Collection")

# Add Wikipedia content to the collection
# ChromaDB embeds each document with its default embedding function and stores it under the given ID.
collection.add(
    documents=[page.text],  # Full text of the Wikipedia page fetched in the previous cell
    ids=["climate_change"]  # Unique identifier for the document
)

# Print a confirmation message after adding data to the collection
print("Data added to ChromaDB!")

Data added to ChromaDB!
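
Because the whole page is stored as one document, every similarity search can only return that same document. A common alternative is to split the text into smaller chunks and store each under its own ID. The sketch below is one simple way to do that, not the notebook's method; the function `chunk_text`, the `max_chars` limit, and the ID pattern are assumptions chosen for illustration.

```python
def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, merging short paragraphs where they fit."""
    chunks, current = [], ""
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any paragraph that is itself longer than the limit
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_chars:
                # Adding this piece would exceed the limit, so close the current chunk
                chunks.append(current)
                current = piece
            else:
                current = f"{current}\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


# Small stand-in text; in the notebook this would be page.text
sample = "First paragraph about causes.\n\nSecond paragraph about effects.\nThird paragraph about policy."
chunks = chunk_text(sample, max_chars=40)
ids = [f"climate_change_{i}" for i in range(len(chunks))]
print(list(zip(ids, chunks)))
```

With the real page, passing `documents=chunks` and `ids=ids` to `collection.add` would store one embedding per chunk, so a query can return the passages closest to the question rather than the entire article.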


#### Querying a ChromaDB Collection for Relevant Results

This code demonstrates how to perform a similarity search in ChromaDB to retrieve the most relevant documents based on a given query.

**Key Steps:**
1. **Query ChromaDB Collection:**
   - The `collection.query` method searches the database using the query text `"What are the effects of climate change?"`.
   - The search leverages embeddings to find documents most similar to the query.

2. **Specify the Number of Results:**
   - The `n_results` parameter determines how many top-ranking documents are returned (in this case, 1).

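3. **Retrieve Query Results:**
   - The query returns a dictionary whose keys include:
     - `ids` and `documents`: the IDs and texts of the matching documents.
     - `metadatas` and `distances`: any stored metadata and the distance of each match from the query.
   - Each value holds one inner list per query text.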

4. **Display Results:**
   - Prints the query results, allowing the user to review the retrieved information.

In [4]:
# Query the ChromaDB collection
# This step queries the specified collection within ChromaDB to retrieve documents that match the input text or query.
# The database performs a similarity search based on embeddings to return the most relevant results.

results = collection.query(
    query_texts=["What are the effects of climate change?"],  # The query text to search for relevant documents.
    n_results=1  # Specifies the number of top results to return based on similarity ranking.
)

# Print the results of the query
# The query returns a dictionary containing relevant documents and metadata, which are displayed here for review.
print("Query Results:", results)
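
The printed dictionary is nested: each key maps to a list with one inner list per query text. The sketch below shows one way to unpack it. It uses a hand-written dictionary with the same nesting instead of a live query, so the values (including the distance) are illustrative placeholders, and the helper name `top_hits` is hypothetical.

```python
def top_hits(results, query_index=0):
    """Pair each returned ID with its document and distance for one query text."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))


# Hand-written stand-in with the same nesting as a query result (values are illustrative only)
example_results = {
    "ids": [["climate_change"]],
    "documents": [["Present-day climate change includes both global warming..."]],
    "distances": [[0.42]],
}

for doc_id, document, distance in top_hits(example_results):
    # Show the ID, the distance, and only the first characters of the document
    print(doc_id, round(distance, 2), document[:40])
```

In the notebook, calling `top_hits(results)` on the real query result would avoid printing the full article text.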




#### Workflow to Process Questions with Reasoning, Database Queries, and Summarization

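This code integrates LangChain and ChromaDB into a workflow that answers a question in three stages: reasoning, database querying, and summarization. In addition to the libraries installed above, it requires the `langchain-ollama` package and a running Ollama server with both models available locally.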

**Key Steps:**
1. **Initialize ChromaDB:**
   - Sets up a ChromaDB client. The workflow reuses the `collection` object populated in the earlier cell, so that cell must be run first.

2. **Define Language Models (LLMs):**
   - Utilizes two `OllamaLLM` models:
     - `phi4-mini`: Generates reasoning to query the database.
     - `Gemma3:1b`: Summarizes the findings from database results.

3. **Create Prompt Templates:**
   - `reasoning_prompt`: Guides the reasoning LLM to explain the thought process behind the database query.
   - `summarizing_prompt`: Directs the summarization LLM to condense database results into concise findings.

4. **ChromaDB Query Function:**
   - Implements a function (`query_chromadb`) to search the database with the generated reasoning and return relevant documents.

5. **Define Reasoning and Summarization Chains:**
   - Combines the LLMs and prompt templates into two processing chains:
     - Reasoning Chain: Generates reasoning text from the input question.
     - Summarization Chain: Produces a summary from the database results.

6. **Custom Workflow:**
   - The `workflow` function orchestrates the entire process:
     - Step 1: Generate reasoning for querying the database.
     - Step 2: Query ChromaDB using the generated reasoning.
     - Step 3: Summarize the query results into a final output.

7. **Example Execution:**
   - Executes the workflow with the question *"What are the implications of climate change on biodiversity?"* and outputs the summarized findings.
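
The three-stage structure described above can be sketched independently of any LLM or database by passing each stage in as a plain callable. This is only an illustration of the control flow: `run_workflow` and the `fake_*` stand-ins are hypothetical names, not part of the notebook.

```python
def run_workflow(question, reason, retrieve, summarize):
    """Chain the three stages: reasoning text -> retrieved documents -> summary."""
    reasoning = reason(question)        # Step 1: turn the question into reasoning text
    db_results = retrieve(reasoning)    # Step 2: look up documents using that text
    return summarize(db_results)        # Step 3: condense the retrieved documents


# Hypothetical stand-ins so the sketch runs without Ollama or ChromaDB
def fake_reason(question):
    return f"Search for passages about: {question}"


def fake_retrieve(text):
    return f"[documents matching '{text}']"


def fake_summarize(docs):
    return f"Summary of {docs}"


print(run_workflow("climate change and biodiversity", fake_reason, fake_retrieve, fake_summarize))
```

In the notebook, the reasoning chain, `query_chromadb`, and the summarization chain play these three roles. Keeping the stages as separate callables makes it easy to test the flow with stubs before involving the models.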

In [5]:
# Import necessary modules
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM

# Import ChromaDB
import chromadb

# Initialize ChromaDB
chroma_client = chromadb.Client()
# collection = chroma_client.create_collection(name="Wikipedia_Collection")  # Not needed: `collection` from the earlier cell is reused, so run that cell first

# Define the LLMs
reasoning_llm = OllamaLLM(model="phi4-mini")
summarizing_llm = OllamaLLM(model="Gemma3:1b")

# Create a reasoning prompt template
reasoning_prompt = PromptTemplate(
    template="Based on the question: '{question}', explain the reasoning process to query the database.",
    input_variables=["question"]
)

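# Define the ChromaDB query function
def query_chromadb(reasoning):
    """Return the documents most similar to `reasoning`, joined into one string."""
    try:
        results = collection.query(
            query_texts=[reasoning],
            n_results=2  # The collection holds a single document, so at most one is returned
        )
        # results['documents'] holds one list of documents per query text; flatten and join them
        return " ".join(doc for docs in results['documents'] for doc in docs)
    except Exception as e:
        return f"Error querying ChromaDB: {e}"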


# Create a summarizing prompt template
summarizing_prompt = PromptTemplate(
    template="Summarize the findings from the database results: '{db_results}'",
    input_variables=["db_results"]
)

# Define the reasoning chain
reasoning_chain = reasoning_prompt | reasoning_llm

# Define the summarization chain
summarizing_chain = summarizing_prompt | summarizing_llm

# Combine everything in a custom workflow
def workflow(question):
    """
    Executes a custom workflow to reason, query a database, and summarize results.
    """
    # Step 1: Use the reasoning LLM to generate reasoning text
    reasoning = reasoning_chain.invoke({"question": question})
    
    # Step 2: Query ChromaDB with the reasoning text
    db_results = query_chromadb(reasoning)
    
    # Step 3: Summarize the results from ChromaDB using the summarization LLM
    summary = summarizing_chain.invoke({"db_results": db_results})
    
    return summary

# Example execution
question = "What are the implications of climate change on biodiversity?"
final_summary = workflow(question)
print("Final Summary:", final_summary)

Final Summary: Okay, here’s a breakdown of the key points from the text, organized into categories for clarity:

**1. Historical Origins of the Climate Concern:**

* **Early Observations:** The idea of Earth's temperature fluctuations dates back to ancient times. Early scientists observed the Sun’s influence on Earth’s temperature.
* **Fourier's Contributions:** Joseph Fourier proposed that the Earth's temperature is higher than the Sun’s energy due to atmospheric heat.
* **Newton Foote's Discovery:** Eunice Newton Foote demonstrated that the warming effect of the sun is greater for air with water vapour than for dry air.
* **Tyndall’s Work:** John Tyndall proposed that changes in atmospheric gases could influence climate patterns, including ice ages.

**2. The Rise of Scientific Understanding & Modeling:**

* **Early 19th Century:**  Scientists like Alexander von Humboldt began to recognize the effects of climate change.
* **Fourier’s Model:** Fourier’s model of greenhouse gases helpe