# Vector Search at Scale: Cloud vs Local DBs - Build a RAG App with Pinecone, LangChain & OpenAI

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) application on top of vector search, comparing a managed cloud service (Pinecone) with two local alternatives (FAISS and ChromaDB).

## Table of Contents
1. [Introduction to Vector Search](#introduction)
2. [Cloud vs Local Vector Databases](#comparison)
3. [Setting Up the Environment](#setup)
4. Configuration and API Keys
5. Sample Data Preparation
6. [Building with Pinecone (Cloud Solution)](#pinecone)
7. [Building with Local Vector Database](#local)
8. [RAG Application Implementation](#rag)
9. [Performance Comparison](#performance)
10. Advanced Features and Optimizations
11. Interactive RAG Demo
12. [Conclusion](#conclusion)

## 1. Introduction to Vector Search {#introduction}

Vector search is a fundamental component of modern AI applications, enabling semantic similarity search across large datasets. Unlike traditional keyword-based search, vector search uses high-dimensional embeddings to find semantically similar content.

### Key Concepts:
- **Embeddings**: Dense vector representations of text, images, or other data
- **Similarity Metrics**: Methods to measure similarity between vectors (cosine, euclidean, dot product)
- **Vector Databases**: Specialized databases optimized for storing and querying high-dimensional vectors
- **RAG**: Retrieval-Augmented Generation combines retrieval of relevant information with generative AI
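
To make the three similarity metrics above concrete, here is a minimal NumPy sketch. The three-dimensional vectors are made-up toy values chosen for illustration; real embeddings have hundreds or thousands of dimensions, and the function names are our own rather than part of any library used in this notebook.

```python
import numpy as np

def dot_product(a, b):
    """Unnormalized similarity: larger means more aligned (and/or longer vectors)."""
    return float(np.dot(a, b))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores vector length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (hypothetical values, not produced by a real model)
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.8, 0.2, 0.1])
car = np.array([0.0, 0.1, 0.9])

for name, other in [("kitten", kitten), ("car", car)]:
    print(
        f"cat vs {name}: "
        f"cosine={cosine_similarity(cat, other):.3f}, "
        f"euclidean={euclidean_distance(cat, other):.3f}, "
        f"dot={dot_product(cat, other):.3f}"
    )
```

Note that cosine and dot product are similarities (higher is closer), while Euclidean is a distance (lower is closer). This matters later when reading the scores a vector store returns.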

## 2. Cloud vs Local Vector Databases {#comparison}

| Aspect | Cloud (Pinecone) | Local (FAISS/Chroma) |
|--------|------------------|----------------------|
| **Scalability** | Highly scalable, managed | Limited by hardware |
| **Cost** | Pay-per-use, subscription | Hardware + maintenance |
| **Latency** | Network dependent | Very low latency |
| **Setup Complexity** | Minimal setup | Requires configuration |
| **Data Privacy** | Data leaves premises | Complete data control |
| **Maintenance** | Fully managed | Self-maintained |
| **Performance** | Tuned for large indexes and many concurrent queries | Fast for datasets that fit on a single machine |

### When to Choose Cloud:
- Large-scale applications
- Variable workloads
- Limited infrastructure resources
- Need for high availability

### When to Choose Local:
- Data privacy requirements
- Low-latency needs
- Cost optimization for consistent workloads
- Full control over infrastructure

## 3. Setting Up the Environment {#setup}

First, let's install the required packages for both cloud and local implementations.

In [None]:
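# Install required packages
# The code below uses the legacy pinecone-client v2 API (pinecone.init) and the
# pre-0.1 LangChain import paths, so both packages are pinned accordingly.
# tiktoken is required by LangChain's OpenAIEmbeddings.
!pip install "pinecone-client<3" "langchain<0.1" openai tiktoken chromadb faiss-cpu sentence-transformers python-dotenv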

In [None]:
# Import necessary libraries
import os
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone, Chroma, FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
import numpy as np
from dotenv import load_dotenv
import time

# Load environment variables
load_dotenv()

## 4. Configuration and API Keys

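Set up your API keys and configuration. Create a `.env` file in the notebook's working directory that defines `OPENAI_API_KEY`, `PINECONE_API_KEY`, and `PINECONE_ENVIRONMENT`; the cell below reads them: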

In [None]:
# Configuration
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
PINECONE_ENVIRONMENT = os.getenv('PINECONE_ENVIRONMENT')  # e.g., 'us-west1-gcp'

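# Verify API keys are loaded (report every missing key, succeed only if none are missing)
required_keys = {
    "OPENAI_API_KEY": OPENAI_API_KEY,
    "PINECONE_API_KEY": PINECONE_API_KEY,
    "PINECONE_ENVIRONMENT": PINECONE_ENVIRONMENT,
}
missing_keys = [name for name, value in required_keys.items() if not value]
for name in missing_keys:
    print(f"⚠️ {name} not found. Please set it in your .env file.")
if not missing_keys:
    print("✅ API keys loaded successfully")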
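
For reference, a `.env` file is just a plain-text list of `KEY=value` lines. The sketch below shows the expected layout and a deliberately simplified parser, to illustrate roughly what `load_dotenv()` does for us. It is not python-dotenv's actual implementation (which also handles quoting rules, `export` prefixes, and variable expansion), `parse_dotenv` is a hypothetical helper, and the key values are placeholders.

```python
def parse_dotenv(text):
    """Parse KEY=value lines, skipping blanks and # comments (simplified)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values

# Example .env contents (placeholder values; replace with your own credentials)
example_env = """
# Credentials for the RAG demo
OPENAI_API_KEY=sk-your-key-here
PINECONE_API_KEY=your-pinecone-key
PINECONE_ENVIRONMENT=us-west1-gcp
"""

config = parse_dotenv(example_env)
print(sorted(config))
```

Keep the real `.env` file out of version control so the keys are not committed by accident.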

## 5. Sample Data Preparation

Let's create some sample documents to work with:

In [None]:
# Sample documents for testing
sample_documents = [
    "Vector databases are specialized databases designed to store and query high-dimensional vectors efficiently. They use advanced indexing techniques like HNSW, IVF, and LSH to enable fast similarity search.",
    "Pinecone is a cloud-native vector database that provides managed vector search capabilities. It offers features like real-time updates, filtering, and horizontal scaling.",
    "FAISS (Facebook AI Similarity Search) is an open-source library for efficient similarity search and clustering of dense vectors. It's optimized for both CPU and GPU execution.",
    "ChromaDB is an open-source embedding database that focuses on simplicity and developer experience. It supports multiple embedding models and provides easy-to-use APIs.",
    "Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval. It helps reduce hallucinations and provides more accurate, up-to-date responses.",
    "Embeddings are dense vector representations of data that capture semantic meaning. OpenAI's text-embedding-ada-002 model produces 1536-dimensional embeddings for text.",
    "LangChain is a framework for developing applications powered by language models. It provides abstractions for vector stores, document loaders, and retrieval chains.",
    "Semantic search goes beyond keyword matching to understand the intent and contextual meaning of search queries. It uses embeddings to find semantically similar content."
]

print(f"Prepared {len(sample_documents)} sample documents for testing")
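
The sample documents above are short enough to embed whole. Longer documents are usually split into overlapping chunks before embedding, which is what the `RecursiveCharacterTextSplitter` imported earlier is for. The sketch below shows the basic idea with a plain fixed-size character splitter. It is a simplified stand-in, not LangChain's algorithm (which prefers to break on paragraph and sentence separators), and `chunk_text` with its default sizes is a hypothetical helper.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

long_document = "Vector search finds semantically similar content. " * 10
chunks = chunk_text(long_document, chunk_size=120, overlap=20)
print(f"{len(long_document)} characters split into {len(chunks)} chunks")
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.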

## 6. Building with Pinecone (Cloud Solution) {#pinecone}

Let's implement a vector search solution using Pinecone as our cloud-based vector database.

In [None]:
# Initialize Pinecone
def setup_pinecone():
    try:
        pinecone.init(
            api_key=PINECONE_API_KEY,
            environment=PINECONE_ENVIRONMENT
        )
        print("✅ Pinecone initialized successfully")
        return True
    except Exception as e:
        print(f"❌ Error initializing Pinecone: {e}")
        return False

pinecone_ready = setup_pinecone()

In [None]:
# Create or connect to Pinecone index
INDEX_NAME = "rag-demo-index"
DIMENSION = 1536  # OpenAI ada-002 embedding dimension

def create_pinecone_index():
    if not pinecone_ready:
        print("❌ Pinecone not ready")
        return None
    
    try:
        # Check if index exists
        if INDEX_NAME not in pinecone.list_indexes():
            # Create index
            pinecone.create_index(
                name=INDEX_NAME,
                dimension=DIMENSION,
                metric="cosine"
            )
            print(f"✅ Created new Pinecone index: {INDEX_NAME}")
        else:
            print(f"✅ Using existing Pinecone index: {INDEX_NAME}")
        
        # Connect to index
        index = pinecone.Index(INDEX_NAME)
        return index
    except Exception as e:
        print(f"❌ Error with Pinecone index: {e}")
        return None

pinecone_index = create_pinecone_index()

In [None]:
# Initialize embeddings and create Pinecone vector store
def create_pinecone_vectorstore():
    if not pinecone_index:
        print("❌ Pinecone index not available")
        # Return a pair so the caller's tuple unpacking does not fail
        return None, None
    
    try:
        # Initialize OpenAI embeddings
        embeddings = OpenAIEmbeddings(
            openai_api_key=OPENAI_API_KEY,
            model="text-embedding-ada-002"
        )
        
        # Create vector store from documents
        vectorstore = Pinecone.from_texts(
            texts=sample_documents,
            embedding=embeddings,
            index_name=INDEX_NAME
        )
        
        print(f"✅ Created Pinecone vector store with {len(sample_documents)} documents")
        return vectorstore, embeddings
    except Exception as e:
        print(f"❌ Error creating Pinecone vector store: {e}")
        return None, None

pinecone_vectorstore, openai_embeddings = create_pinecone_vectorstore()

## 7. Building with Local Vector Database {#local}

Now let's implement the same functionality using local vector databases (FAISS and ChromaDB).
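
Before using the libraries, it may help to see what a local vector store does at its core. The sketch below is a toy in-memory store that performs exact (brute-force) search by squared L2 distance. It illustrates the concept only and is not how FAISS or ChromaDB are implemented; both add optimized index structures and persistence. `TinyVectorStore` and the two-dimensional vectors are hypothetical.

```python
import numpy as np

class TinyVectorStore:
    """Brute-force store: keeps every vector in memory and scans them all per query."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.vectors = np.empty((0, dimension), dtype=np.float32)
        self.texts = []

    def add(self, texts, vectors):
        vectors = np.asarray(vectors, dtype=np.float32).reshape(-1, self.dimension)
        if len(texts) != len(vectors):
            raise ValueError("texts and vectors must have the same length")
        self.vectors = np.vstack([self.vectors, vectors])
        self.texts.extend(texts)

    def search(self, query, k=3):
        """Return the k nearest (text, squared L2 distance) pairs, closest first."""
        query = np.asarray(query, dtype=np.float32)
        distances = np.sum((self.vectors - query) ** 2, axis=1)
        nearest = np.argsort(distances)[:k]
        return [(self.texts[i], float(distances[i])) for i in nearest]

# Toy 2-dimensional vectors (made up for illustration)
store = TinyVectorStore(dimension=2)
store.add(["origin", "east", "far away"], [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
for text, distance in store.search([0.9, 0.0], k=2):
    print(f"{text}: {distance:.2f}")
```

A full scan like this costs time proportional to the number of stored vectors, which is why dedicated libraries offer approximate indexes (such as the HNSW and IVF techniques mentioned in the sample documents) for larger collections.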

In [None]:
# Create FAISS vector store (local)
def create_faiss_vectorstore():
    try:
        if not openai_embeddings:
            embeddings = OpenAIEmbeddings(
                openai_api_key=OPENAI_API_KEY,
                model="text-embedding-ada-002"
            )
        else:
            embeddings = openai_embeddings
        
        # Create FAISS vector store
        faiss_vectorstore = FAISS.from_texts(
            texts=sample_documents,
            embedding=embeddings
        )
        
        # Save the vector store locally
        faiss_vectorstore.save_local("faiss_index")
        
        print(f"✅ Created FAISS vector store with {len(sample_documents)} documents")
        return faiss_vectorstore
    except Exception as e:
        print(f"❌ Error creating FAISS vector store: {e}")
        return None

faiss_vectorstore = create_faiss_vectorstore()

In [None]:
# Create ChromaDB vector store (local)
def create_chroma_vectorstore():
    try:
        if not openai_embeddings:
            embeddings = OpenAIEmbeddings(
                openai_api_key=OPENAI_API_KEY,
                model="text-embedding-ada-002"
            )
        else:
            embeddings = openai_embeddings
        
        # Create ChromaDB vector store
        chroma_vectorstore = Chroma.from_texts(
            texts=sample_documents,
            embedding=embeddings,
            persist_directory="./chroma_db"
        )
        
        # Persist the database
        chroma_vectorstore.persist()
        
        print(f"✅ Created ChromaDB vector store with {len(sample_documents)} documents")
        return chroma_vectorstore
    except Exception as e:
        print(f"❌ Error creating ChromaDB vector store: {e}")
        return None

chroma_vectorstore = create_chroma_vectorstore()

## 8. RAG Application Implementation {#rag}

Now let's build RAG applications using each vector database.
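
The chains below use `chain_type="stuff"`, which means the retrieved documents are simply concatenated ("stuffed") into a single prompt together with the question. A rough sketch of that step follows. The prompt wording here is our own assumption for illustration; LangChain ships its own template, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question, documents):
    """Concatenate retrieved documents and the question into one prompt string."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

retrieved = [
    "FAISS is an open-source library for efficient similarity search.",
    "Pinecone is a cloud-native vector database.",
]
print(build_stuff_prompt("What is FAISS?", retrieved))
```

Because every retrieved document goes into one prompt, the value of `k` in `search_kwargs` directly controls prompt length, and therefore cost and the risk of exceeding the model's context window.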

In [None]:
# Initialize OpenAI LLM
llm = OpenAI(
    openai_api_key=OPENAI_API_KEY,
    temperature=0.1,
    model_name="gpt-3.5-turbo-instruct"
)

print("✅ OpenAI LLM initialized")

In [None]:
# Create RAG chains for each vector database
def create_rag_chains():
    chains = {}
    
    # Pinecone RAG chain
    if pinecone_vectorstore:
        chains['pinecone'] = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=pinecone_vectorstore.as_retriever(search_kwargs={"k": 3}),
            return_source_documents=True
        )
        print("✅ Pinecone RAG chain created")
    
    # FAISS RAG chain
    if faiss_vectorstore:
        chains['faiss'] = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=faiss_vectorstore.as_retriever(search_kwargs={"k": 3}),
            return_source_documents=True
        )
        print("✅ FAISS RAG chain created")
    
    # ChromaDB RAG chain
    if chroma_vectorstore:
        chains['chroma'] = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=chroma_vectorstore.as_retriever(search_kwargs={"k": 3}),
            return_source_documents=True
        )
        print("✅ ChromaDB RAG chain created")
    
    return chains

rag_chains = create_rag_chains()

## 9. Performance Comparison {#performance}

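Let's compare end-to-end response times across the vector stores. Each measurement covers the query-embedding call and the LLM generation as well as retrieval, so the differences reflect more than the vector database alone.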
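
When timing a full RAG chain, the LLM call typically dominates, so it can be useful to time the retrieval step separately. Below is a small, library-independent timing helper, sketched under the assumption that the function being timed takes a single query argument. `time_calls`, `summarize`, and `fake_retrieve` are hypothetical names; `fake_retrieve` is only a stand-in so the example runs without any API keys.

```python
import statistics
import time

def time_calls(fn, inputs, warmup=1):
    """Call fn on each input and return the per-call durations in seconds."""
    # Warm-up calls are not recorded (first calls often pay one-off setup costs)
    for item in inputs[:warmup]:
        fn(item)
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Mean, median, and worst-case latency for a list of durations."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }

def fake_retrieve(query):
    # Stand-in for a real retrieval call; does trivial work
    return [word for word in query.split() if len(word) > 3]

durations = time_calls(fake_retrieve, ["what is a vector database", "how does rag work"])
print(summarize(durations))
```

In this notebook, `fn` could be something like `lambda q: faiss_vectorstore.similarity_search(q, k=3)`. Keep in mind that every store here embeds the query through the OpenAI API first, so even retrieval-only timings include one network round trip.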

In [None]:
# Test queries
test_queries = [
    "What is a vector database?",
    "How does RAG work?",
    "Compare Pinecone and FAISS",
    "What are embeddings used for?"
]

def test_rag_performance(chains, queries):
    results = {}
    
    for chain_name, chain in chains.items():
        print(f"\n🔍 Testing {chain_name.upper()} RAG Chain")
        print("=" * 50)
        
        chain_results = []
        
        for i, query in enumerate(queries, 1):
            print(f"\nQuery {i}: {query}")
            
            try:
                # Measure end-to-end response time (query embedding + retrieval + generation)
                start_time = time.perf_counter()
                response = chain({"query": query})
                end_time = time.perf_counter()
                
                response_time = end_time - start_time
                
                print(f"Response Time: {response_time:.2f} seconds")
                print(f"Answer: {response['result'][:200]}...")
                print(f"Sources: {len(response['source_documents'])} documents retrieved")
                
                chain_results.append({
                    'query': query,
                    'response_time': response_time,
                    'answer': response['result'],
                    'sources_count': len(response['source_documents'])
                })
                
            except Exception as e:
                print(f"❌ Error: {e}")
                chain_results.append({
                    'query': query,
                    'error': str(e)
                })
        
        results[chain_name] = chain_results
    
    return results

# Run performance tests
performance_results = test_rag_performance(rag_chains, test_queries)

In [None]:
# Analyze and display performance metrics
def analyze_performance(results):
    print("\n📊 PERFORMANCE ANALYSIS")
    print("=" * 60)
    
    for chain_name, chain_results in results.items():
        # Calculate average response time
        response_times = [r['response_time'] for r in chain_results if 'response_time' in r]
        
        if response_times:
            avg_time = np.mean(response_times)
            min_time = np.min(response_times)
            max_time = np.max(response_times)
            
            print(f"\n{chain_name.upper()} Performance:")
            print(f"  Average Response Time: {avg_time:.3f}s")
            print(f"  Min Response Time: {min_time:.3f}s")
            print(f"  Max Response Time: {max_time:.3f}s")
            print(f"  Successful Queries: {len(response_times)}/{len(chain_results)}")
        else:
            print(f"\n{chain_name.upper()}: No successful queries")

analyze_performance(performance_results)

## 10. Advanced Features and Optimizations

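Let's look more closely at retrieval itself: how the number of requested results (`k`) affects what comes back, and how to inspect the scores returned alongside each document.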

In [None]:
# Similarity search with different values of k
def advanced_similarity_search():
    query = "What is vector search?"
    
    print("🔍 Advanced Similarity Search Comparison")
    print("=" * 50)
    
    # Test different search parameters
    search_params = [
        {"k": 1, "description": "Top 1 result"},
        {"k": 3, "description": "Top 3 results"},
        {"k": 5, "description": "Top 5 results"}
    ]
    
    for vectorstore_name, vectorstore in [("FAISS", faiss_vectorstore), ("ChromaDB", chroma_vectorstore)]:
        if vectorstore:
            print(f"\n{vectorstore_name} Results:")
            for params in search_params:
                try:
                    results = vectorstore.similarity_search(query, k=params["k"])
                    print(f"  {params['description']}: {len(results)} documents")
                    if results:
                        print(f"    Most similar: {results[0].page_content[:100]}...")
                except Exception as e:
                    print(f"    Error: {e}")

advanced_similarity_search()

In [None]:
# Similarity search with scores
def similarity_search_with_scores():
    query = "How does semantic search work?"
    
    print("\n📈 Similarity Search with Scores")
    print("=" * 45)
    
    for vectorstore_name, vectorstore in [("FAISS", faiss_vectorstore)]:
        if vectorstore:
            try:
                # FAISS returns a distance with each document: lower means more similar
                results_with_scores = vectorstore.similarity_search_with_score(query, k=3)
                
                print(f"\n{vectorstore_name} Results with Scores:")
                for i, (doc, score) in enumerate(results_with_scores, 1):
                    print(f"  {i}. Distance: {score:.4f} (lower is closer)")
                    print(f"     Content: {doc.page_content[:80]}...")
                    print()
            except Exception as e:
                print(f"  Error: {e}")

similarity_search_with_scores()
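
The scores printed above are distances, so they are not directly comparable with the cosine similarities a Pinecone index with `metric="cosine"` reports. Under two assumptions, the two can be converted: that the store returns *squared* L2 distances (as a flat L2 FAISS index does) and that the embeddings are normalized to unit length (as OpenAI documents for its embedding models). For unit vectors, the squared distance equals 2 minus twice the cosine similarity. The sketch below checks that identity numerically on random unit vectors; `cosine_from_squared_l2` is a hypothetical helper.

```python
import numpy as np

def cosine_from_squared_l2(squared_distance):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - squared_distance / 2.0

# Two random unit vectors standing in for normalized embeddings
rng = np.random.default_rng(0)
a = rng.normal(size=8)
a /= np.linalg.norm(a)
b = rng.normal(size=8)
b /= np.linalg.norm(b)

squared_l2 = float(np.sum((a - b) ** 2))
cosine = float(np.dot(a, b))

print(f"cosine computed directly:       {cosine:.6f}")
print(f"cosine derived from distance:   {cosine_from_squared_l2(squared_l2):.6f}")
```

If either assumption does not hold for your setup (for example, a different distance strategy or unnormalized embeddings), the conversion does not apply.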

## 11. Interactive RAG Demo

Define a demo function that runs a fixed set of example queries through every available RAG chain and prints the answers side by side.

In [None]:
def interactive_rag_demo():
    """
    Run a fixed set of demo queries against each available RAG chain
    and print a truncated answer with its response time.
    """
    print("🤖 Interactive RAG Demo")
    print("=" * 30)
    print("Available databases:", list(rag_chains.keys()))
    
    # Example queries for demonstration
    demo_queries = [
        "What are the advantages of cloud vector databases?",
        "Explain the difference between FAISS and Pinecone",
        "How do embeddings work in vector search?"
    ]
    
    for query in demo_queries:
        print(f"\n❓ Query: {query}")
        print("-" * 50)
        
        for db_name, chain in rag_chains.items():
            try:
                start_time = time.perf_counter()
                response = chain({"query": query})
                end_time = time.perf_counter()
                
                print(f"\n🔹 {db_name.upper()} ({end_time-start_time:.2f}s):")
                print(f"   {response['result'][:150]}...")
                
            except Exception as e:
                print(f"\n🔹 {db_name.upper()}: Error - {e}")

# Run the interactive demo
interactive_rag_demo()
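
The demo above runs a fixed list of queries. For a genuinely interactive session, a small read-answer loop is enough. The sketch below is one possible shape, assuming the answering function takes a question string and returns a string. `run_repl` is a hypothetical helper, and the `read` and `write` parameters exist only so the loop can be driven by something other than the keyboard.

```python
def run_repl(answer_fn, read=input, write=print):
    """Read questions until the user enters a blank line, 'quit', or 'exit'."""
    while True:
        try:
            question = read("Ask a question (blank line to exit): ").strip()
        except EOFError:
            break
        if not question or question.lower() in {"quit", "exit"}:
            break
        try:
            write(answer_fn(question))
        except Exception as exc:
            # Keep the session alive if a single query fails
            write(f"Error: {exc}")

# Example wiring with a chain from this notebook (left commented out because
# it needs API keys and blocks waiting for keyboard input):
# run_repl(lambda q: rag_chains["faiss"]({"query": q})["result"])
```

Swapping `"faiss"` for another key in `rag_chains` lets you query a different vector store with the same loop.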