# Vector Store RAG - A Local AI-Powered Search System

This notebook builds a local AI-powered search system. You ask a question about a collection of documents, and the model answers using only those documents.

The result is a document-grounded answer produced entirely on your own machine, using your own files and models.

Enter your question in the `user_query` variable.

The final output can vary from run to run, likely because the models must be small enough to run locally.

In [None]:
# Install dependencies

# Installs the `unstructured` package, which provides tools for parsing and extracting data from unstructured documents such as PDFs, text files, and images.
!pip install unstructured

# Installs `faiss-cpu`, a CPU-only version of FAISS (Facebook AI Similarity Search). FAISS is used for efficient similarity search and clustering of high-dimensional vectors, commonly used for tasks like information retrieval and recommendation systems.
!pip install faiss-cpu

# Installs `langchain`, an open-source framework for building applications that integrate with language models (such as GPT). It provides a unified interface for working with AI models, document loaders, and other tools in your workflows.
!pip install langchain

# Installs the `langchain_community` package, which contains additional community-contributed components for LangChain, including custom document loaders, retrievers, and other enhancements that are not part of the core LangChain library.
!pip install langchain_community

# Installs `langchain_ollama`, a package that integrates LangChain with models served locally by Ollama. It lets you run models like `phi4-mini` on your own machine without calling a remote API.
!pip install langchain_ollama

# Installs `python-magic-bin`, a precompiled binary version of `python-magic` for Windows. `python-magic` is a Python interface to the `libmagic` file type detection library, which helps automatically detect file types based on file content. This is useful when working with mixed document types (e.g., PDFs, text files) in tasks like document loading.
!pip install python-magic-bin

# Installs `PyMuPDF`, a Python binding for MuPDF, a lightweight PDF and XPS viewer. It provides tools for working with PDF files, including text extraction, rendering, and manipulation. This is useful for parsing and extracting data from PDF documents in the context of unstructured data processing.
!pip install pymupdf


### Part 1: Prepare the documents and set up the vectorstore

Load files from a folder - read the documents from a local folder on your computer.

Break them into smaller pieces - since documents can be long, chop them into smaller, overlapping chunks for easier processing.

Convert text into numbers (embeddings) - use a local model (via Ollama) to turn each chunk into a special set of numbers that represent the meaning of the text. This is called an embedding.

Store these embeddings in a searchable index - save everything into a database (called a vector store, using FAISS) so it can later find similar chunks when you ask a question.

Save it for later use - the index is saved to your computer so you don’t have to rebuild it each time.
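
The chunking step above can be illustrated with a minimal sketch in plain Python. The function name `split_with_overlap` and the sample string are hypothetical, and this is a simplification: the `RecursiveCharacterTextSplitter` used later in this notebook prefers to cut at natural boundaries such as paragraphs and spaces, whereas this sketch cuts at fixed character offsets. It only shows how `chunk_size` and `chunk_overlap` interact.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence
    cut at a boundary still appears whole in one of the two chunks
    (as long as it is shorter than the overlap).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Tiny sizes so the overlap is easy to see by eye.
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

With these toy settings each chunk starts with the last three characters of the previous one, which is the same idea as the 50-character overlap used in the notebook.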

In [2]:
import os
from langchain_community.document_loaders import CSVLoader, PyMuPDFLoader
from langchain_core.documents import Document

# Define a custom loader for mixed file types (txt, csv, pdf)
# This custom loader class allows us to handle different types of document formats in a single unified way.
class CustomDocumentLoader:
    def __init__(self, path, glob="**/*.{txt,csv,pdf}"):
        # `path`: The directory path where the files are located.
        # `glob`: Describes the intended file types (.txt, .csv, .pdf). It is stored but not used:
        # `load()` filters by file extension, and Python's glob does not expand `{...}` braces.
        self.path = path
        self.glob = glob

    def load(self):
        # Walk the directory tree and load every supported file.
        # We initialize an empty list `documents` to store the loaded content.
        documents = []

        # Recursively traverse the directory and check each file's extension.
        # The `os.walk()` function iterates over all directories and files within `self.path`.
        for root, _, files in os.walk(self.path):
            for file in files:
                # Construct the full file path.
                file_path = os.path.join(root, file)

                # Check file extension to determine how to load the file.
                if file.endswith('.txt'):
                    # If it's a text file, read it and wrap the text in a Document so that
                    # every entry in `documents` has the same type as the CSV and PDF loader output.
                    with open(file_path, 'r') as f:
                        documents.append(Document(page_content=f.read(), metadata={"source": file_path}))
                elif file.endswith('.csv'):
                    # If it's a CSV file, use the CSVLoader to load the CSV content.
                    # CSVLoader will read the CSV and split it into rows or columns as needed.
                    csv_loader = CSVLoader(file_path=file_path)
                    # Extend the documents list with the content from the CSV loader.
                    documents.extend(csv_loader.load())
                elif file.endswith('.pdf'):
                    # If it's a PDF file, use PyMuPDFLoader to extract text from the PDF.
                    # PyMuPDFLoader reads the PDF content, extracting text from each page.
                    pdf_loader = PyMuPDFLoader(file_path=file_path)
                    # Extend the documents list with the content from the PDF loader.
                    documents.extend(pdf_loader.load())
        
        # Return the list of loaded documents (text from .txt, .csv, and .pdf files).
        return documents

# Use the CustomDocumentLoader to load .txt, .csv, and .pdf files
# Specify the directory where the documents are stored.
loader = CustomDocumentLoader(r"C:\Python\Agent-School\docs\docs2")

# Load the documents using the custom loader. This will load all .txt, .csv, and .pdf files from the specified directory.
docs = loader.load()

# Now `docs` contains all the documents (text, CSV, and PDF content) that were loaded.
# Print the number of documents that were loaded to give feedback on how many files were processed.
print(f"📄 Loaded {len(docs)} documents.")


📄 Loaded 51 documents.


In [3]:
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Load local text files
# Use the DirectoryLoader to load documents from a specified folder.
# In this case, we are loading text files (with `.txt` extension) from the directory path "C:\Python\Agent-School\docs\docs2".
# The `glob="**/*.txt"` pattern ensures that the loader will pick up all `.txt` files in the directory and its subdirectories.
# Note: this reassigns `docs`, so the CSV and PDF content loaded by the previous cell is not indexed.
loader = DirectoryLoader(r"C:\Python\Agent-School\docs\docs2", glob="**/*.txt")

# Load the documents into the `docs` variable. This will return a list of documents, 
# each of which will contain the text content of the corresponding file.
docs = loader.load()

# 2. Split into manageable chunks
# We use the `RecursiveCharacterTextSplitter` to split the loaded documents into smaller chunks.
# The `chunk_size=500` argument specifies that each chunk should contain at most 500 characters.
# The `chunk_overlap=50` argument ensures that there will be an overlap of 50 characters between consecutive chunks, 
# which can help preserve context when processing the document.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Split the documents into chunks and store the result in `chunks`.
# Each element in `chunks` will be a smaller piece of text from the original documents.
chunks = splitter.split_documents(docs)

# 3. Create embeddings using Ollama's local model
# Now we create an embedding model using Ollama's local "nomic-embed-text" model.
# Embeddings are vector representations of text that capture semantic meaning, which are used for similarity searches.
embedding_model = OllamaEmbeddings(model="nomic-embed-text")

# 4. Create FAISS index
# FAISS is used to create an index for efficient similarity search. We use the `from_documents` method to generate embeddings for each document chunk.
# The `embedding_model` generates embeddings for the chunks, which are then stored in the FAISS vector store.
vectorstore = FAISS.from_documents(chunks, embedding_model)

# 5. Save locally
# Save the created FAISS vector store to the local disk so it can be reused in the future.
# The vector store will be saved in the directory "faiss_index_docs2".
vectorstore.save_local("faiss_index_docs2")

# Print confirmation that the vector store has been created and saved successfully.
print("✅ Vector store created and saved.")


✅ Vector store created and saved.
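
Because the index is saved so that it does not have to be rebuilt each time, it can be handy to check whether a saved copy already exists before running the build cell again. The sketch below is one way to do that; the helper name `faiss_index_exists` is hypothetical, and it assumes `save_local` was called with its default index name, in which case the folder holds `index.faiss` and `index.pkl`.

```python
import os


def faiss_index_exists(folder: str, index_name: str = "index") -> bool:
    """Return True if both files of a saved index are present in `folder`.

    Assumes the default naming used by `save_local`:
    `<index_name>.faiss` (the vectors) and `<index_name>.pkl` (the stored chunks).
    """
    required = (f"{index_name}.faiss", f"{index_name}.pkl")
    return all(os.path.isfile(os.path.join(folder, name)) for name in required)


if faiss_index_exists("faiss_index_docs2"):
    print("Index found; it can be loaded instead of rebuilt.")
else:
    print("No saved index yet; run the build cell above.")
```

Remember to rebuild the index whenever the source documents change, since a saved index only reflects the files that existed when it was created.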


### Part 2: Ask a question and get an answer (rag_phi4_query)
Load the saved vector store - load the document database created earlier.

Find the most relevant chunks - when you ask a question (like "Describe the Apollo programme"), it searches the index and pulls out the top 5 chunks most related to your question.

Create a grounded prompt - it puts those chunks into a prompt that tells the AI:
“Only use this information to answer — don’t make things up.”

Ask a local language model (phi4-mini) - it sends the prompt to a small AI model running locally (no internet needed), and the model generates a thoughtful answer based only on the context it was given.
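
The retrieval and prompt-building steps above can be sketched without any model at all. Everything in this example is illustrative: the three-number vectors are made up by hand rather than produced by `nomic-embed-text`, the function names are hypothetical, and cosine similarity is used for clarity even though the FAISS index in this notebook may rank chunks by a different distance measure.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=5):
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(chunks, question):
    """Place the retrieved chunks in a prompt that restricts the model to that context."""
    context = "\n\n".join(chunks)
    return (
        "Use only the information in the context below to answer the question.\n"
        "If the answer is not in the context, say:\n"
        "\"I don't have enough information to answer that.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy index: (chunk text, hand-made vector). Real embeddings have hundreds of dimensions.
toy_index = [
    ("Apollo 11 landed on the Moon in 1969.", [0.9, 0.1, 0.0]),
    ("FAISS is a similarity search library.", [0.0, 0.2, 0.9]),
    ("The Saturn V launched the Apollo Moon missions.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded question

selected = top_k(query_vec, toy_index, k=2)
prompt = build_prompt(selected, "Describe the Apollo programme")
print(prompt)
```

The unrelated chunk is left out of the prompt because its vector points in a different direction from the query, which is the whole point of retrieving before asking.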

In [4]:
# Perform RAG query

from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
import ollama

# 1. Load FAISS vector store
# Create an embedding model using Ollama's "nomic-embed-text" model.
# This model will be used to convert documents and queries into embeddings (vector representations).
embedding_model = OllamaEmbeddings(model="nomic-embed-text")

# Load the FAISS vector store that was saved to local disk in Part 1.
# The vector store is expected in the folder "faiss_index_docs2".
# The `embedding_model` must be the same model used to build the index, so that queries are embedded into the same vector space.
# `allow_dangerous_deserialization=True` is required because part of the saved index is a pickle file.
# Unpickling can execute arbitrary code, so only load indexes you created yourself or otherwise trust.
vectorstore = FAISS.load_local(
    "faiss_index_docs2",
    embedding_model,
    allow_dangerous_deserialization=True
)

# 2. Retrieve top-k relevant chunks
# Create a retriever from the vector store that will search for the most similar document chunks.
# The retriever is configured to return the top 5 most relevant chunks (`k=5`).
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Define the user query (question) we want to ask the system.
user_query = "Describe the Apollo programme"

# Use the retriever to search for the top 5 most relevant documents from the vector store based on the user query.
# The result is stored in the `docs` variable, which contains the retrieved document chunks.
docs = retriever.invoke(user_query)

# 3. Construct grounded prompt
# Combine the content of the retrieved documents into a single string, with each document separated by a double newline (`\n\n`).
# This string will serve as the context for the language model.
context = "\n\n".join([doc.page_content for doc in docs])

# Create the final prompt to send to the language model.
# The prompt includes instructions for the model to use only the information from the context to answer the question.
# If the answer cannot be found in the context, the model is instructed to say "I don't have enough information to answer that."
prompt = f"""
You are a helpful assistant.

Use only the information in the context below to answer the question.
If the answer is not in the context, say:
"I don't have enough information to answer that."

Context:
{context}

Question: {user_query}
Answer:
"""

# 4. Ask phi4-mini via Ollama
# Send the constructed prompt to the phi4-mini model using the Ollama Python client.
# The `messages` parameter contains the role of the sender (in this case, "user") and the content of the message (the prompt).
# The response from the model will be stored in the `response` variable.
response = ollama.chat(
    model="phi4-mini",
    messages=[{"role": "user", "content": prompt}]
)

# 5. Output the Answer
# Extract and print the answer from the model's response.
# The answer is located in the `message["content"]` field of the response.
print("\n🧠 Answer:")
print(response["message"]["content"])



🧠 Answer:
The Apollo program was an American initiative aimed at landing humans on the Moon as part of President John F. Kennedy's 1961 goal for human exploration beyond Earth orbit (HEO). Announced by JFK, its ambitious objective culminated with astronauts Neil Armstrong and Buzz Aldrin stepping onto lunar soil during NASA’s historic mission in July-August 1969.

Subsequent Apollo missions expanded upon the initial success of landing humans on a celestial body. They conducted extended trips to explore Earth-orbiting satellites while also collecting samples from various locations both around our planet as well as others throughout interstellar space that were previously unreachable by man, with many being returned for study and analysis back here at home.

The Apollo program had three significant goals: 1) demonstrate human capability in achieving HEO; 2) test new spacecraft technologies (particularly the Saturn V rocket which carried humans beyond Earth’s atmosphere); & 3) retrieve s