# Free RAG Implementation
## Using HuggingFace Embeddings + Pinecone Free Tier + Free LLM

This notebook implements a completely free RAG system for development purposes:
- **Embeddings**: HuggingFace (all-MiniLM-L6-v2) - Free
- **Vector Store**: Pinecone Free Tier (2GB, 2M writes, 1M reads/month)
- **LLM**: HuggingFace Transformers (Flan-T5) - Free local inference
- **Document Processing**: Same PDF loading and chunking as the original OpenAI-based implementation

In [2]:
# Import Libraries for Free RAG Implementation
import pinecone 
from langchain.document_loaders.pdf import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import HuggingFacePipeline
from langchain.vectorstores import Pinecone
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from dotenv import load_dotenv
import os
import warnings
warnings.filterwarnings('ignore')

# Load environment variables
load_dotenv()
print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


In [3]:
# API Keys (Only Pinecone needed for free tier)
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_ENVIRONMENT = os.getenv("PINECONE_ENVIRONMENT")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")

# Verify Pinecone credentials
if not all([PINECONE_API_KEY, PINECONE_ENVIRONMENT, PINECONE_INDEX_NAME]):
    print("⚠️ Please set PINECONE_API_KEY, PINECONE_ENVIRONMENT, and PINECONE_INDEX_NAME in your .env file")
else:
    print("✅ Pinecone credentials loaded!")

✅ Pinecone credentials loaded!
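
The check above only prints a warning, so later cells still run with missing values. A minimal sketch of a stricter variant follows. `require_env` is a hypothetical helper, not part of the notebook or of `python-dotenv`; it raises as soon as a variable is missing or empty.

```python
import os


def require_env(names, env=None):
    """Return the requested variables as a dict; raise if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

Called as `require_env(["PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "PINECONE_INDEX_NAME"])` right after `load_dotenv()`, it would stop the notebook at the first misconfigured cell rather than at the first Pinecone call.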


In [4]:
# Initialize Pinecone (Free Tier)
# Keep a reference to the client so it can be reused in later cells
pc = pinecone.Pinecone(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENVIRONMENT
)
index_name = PINECONE_INDEX_NAME
print(f"✅ Pinecone initialized with index: {index_name}")

✅ Pinecone initialized with index: ai-chatbot


In [8]:
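## Free HuggingFace Embeddings
# Using sentence-transformers all-MiniLM-L6-v2 model
# This model is free, fast, and produces good quality embeddings
# Requires the sentence-transformers package: pip install sentence-transformers
# (without it, HuggingFaceEmbeddings raises ImportError)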
print("📥 Loading HuggingFace embeddings model...")
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},  # Use 'cuda' if you have GPU
    encode_kwargs={'normalize_embeddings': True},
)
print("✅ HuggingFace embeddings loaded!")
print(f"📊 Embedding dimension: {len(embeddings.embed_query('test'))}")

📥 Loading HuggingFace embeddings model...


ImportError: Could not import sentence_transformers python package. Please install it with `pip install sentence-transformers`.
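
The error above means the `sentence_transformers` package is not installed in the kernel's environment. One way to catch this before loading any model is a small pre-flight check. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and it is meant for top-level module names only.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_packages(["sentence_transformers"])
if missing:
    print(f"Install before continuing: {missing}")
```

After installing the package, restart the kernel and rerun the embeddings cell.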

In [None]:
## Document Loading (Same as original)
def read_doc(directory):
    """Load PDF documents from directory"""
    file_loader = PyPDFDirectoryLoader(directory)
    documents = file_loader.load()
    return documents

print("📚 Loading documents...")
doc = read_doc('documents/')
print(f"✅ Loaded {len(doc)} documents")

In [None]:
## Text Chunking (Same as original)
def chunk_data(docs, chunk_size=800, chunk_overlap=50):
    """Split documents into overlapping chunks for better retrieval"""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_documents(docs)
    return chunks

print("✂️ Splitting documents into chunks...")
documents = chunk_data(docs=doc)
print(f"✅ Created {len(documents)} document chunks")
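
To see what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch. It cuts fixed-size character windows and ignores separators, whereas `RecursiveCharacterTextSplitter` tries to break on paragraph, line, and word boundaries first, so the real chunks will differ. `sliding_chunks` is a hypothetical name used only for illustration.

```python
def sliding_chunks(text, chunk_size=800, chunk_overlap=50):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

With the notebook's defaults, each window starts 750 characters after the previous one.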

In [None]:
# Test embedding generation
print("🧪 Testing embedding generation...")
test_vectors = embeddings.embed_query("How are you?")
print(f"✅ Generated embedding vector of length: {len(test_vectors)}")
print(f"📈 Sample values: {test_vectors[:5]}")
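
Because the embeddings are created with `normalize_embeddings=True`, each vector should have an L2 norm close to 1. The pure-Python sketch below shows what that normalization does and why the dot product of two normalized vectors equals their cosine similarity. `l2_normalize` and `dot` are hypothetical helpers, not LangChain functions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


unit = l2_normalize([3.0, 4.0])
# For unit vectors, dot(a, b) is the cosine of the angle between them.
print(unit, dot(unit, unit))
```

A quick sanity check in the notebook would be `math.sqrt(dot(test_vectors, test_vectors))`, which should be close to 1.0.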

In [None]:
# Create Pinecone vector index using free HuggingFace embeddings
print("🔄 Creating Pinecone vector index with free embeddings...")
print("⏳ This may take a few minutes for large documents...")

index = Pinecone.from_documents(
    documents, 
    embeddings, 
    index_name=index_name
)

print("✅ Vector index created successfully!")
print(f"📊 Indexed {len(documents)} document chunks")
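
A common failure at this step is a dimension mismatch: `all-MiniLM-L6-v2` produces 384-dimensional vectors, so the Pinecone index must have been created with dimension 384. The sketch below is a guard that could run before indexing. `EXPECTED_DIMENSION` and `check_dimension` are hypothetical names, and the value has to match whatever the index was created with.

```python
EXPECTED_DIMENSION = 384  # assumption: the Pinecone index was created with this dimension


def check_dimension(vector, index_dimension=EXPECTED_DIMENSION):
    """Raise ValueError if the embedding length differs from the index dimension."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, index expects {index_dimension}"
        )
    return True
```

In the notebook this could be called as `check_dimension(embeddings.embed_query("test"))` before `Pinecone.from_documents`.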

In [None]:
## Free Local LLM Setup
# Using Google's Flan-T5 model - free and runs locally
print("🤖 Loading free LLM (Flan-T5)...")
print("⏳ First time loading may take a few minutes...")

model_name = "google/flan-t5-base"  # ~250M parameters
# For better quality but larger size, use: "google/flan-t5-large" (~780M parameters)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Create pipeline for text generation
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    temperature=0.3,
    do_sample=True,
    device=-1  # Use CPU, change to 0 for GPU
)

# Create LangChain LLM
llm = HuggingFacePipeline(pipeline=pipe)
print("✅ Free LLM loaded successfully!")

In [None]:
# Create QA chain with free LLM
chain = load_qa_chain(llm, chain_type="stuff")
print("🔗 QA chain created with free LLM!")
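
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below illustrates the idea only; the template wording is an assumption and is not LangChain's actual prompt, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate all chunks into one context block followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_stuff_prompt(["First chunk.", "Second chunk."], "What is in the chunks?"))
```

Since everything goes into one prompt, the prompt length grows with both the chunk size and the number of retrieved chunks, which matters for a small model such as Flan-T5.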

In [None]:
## Retrieval Functions (Same as original)
def retrieve_query(query, k=2):
    """Retrieve the k most similar chunks (the similarity metric is set on the Pinecone index)"""
    matching_results = index.similarity_search(query, k=k)
    return matching_results

def retrieve_answers(query):
    """Get answers using free RAG pipeline"""
    print(f"🔍 Query: {query}")
    print("📚 Retrieving relevant documents...")
    
    doc_search = retrieve_query(query)
    print(f"✅ Found {len(doc_search)} relevant documents")
    
    print("🤖 Generating answer with free LLM...")
    response = chain.run(input_documents=doc_search, question=query)
    
    return response, doc_search

print("🛠️ Retrieval functions ready!")
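
Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors score highest against it. Pinecone does this server-side with its own index structures; the in-memory sketch below only illustrates the ranking step, assuming unit-length vectors so that the dot product acts as cosine similarity. `top_k` is a hypothetical helper.

```python
def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors with the highest dot product."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    # Sort by score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


doc_vecs = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```

Raising `k` in `retrieve_query` gives the LLM more context at the cost of a longer prompt.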

In [None]:
## Test the Free RAG System
our_query = "By how many crore will the agriculture target be increased?"

print("🚀 Testing Free RAG System")
print("=" * 50)

answer, source_docs = retrieve_answers(our_query)

print("\n📋 ANSWER:")
print("-" * 20)
print(answer)

print("\n📚 SOURCE DOCUMENTS:")
print("-" * 30)
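# Use a distinct loop variable so the global `doc` (the loaded PDFs) is not overwritten
for i, source_doc in enumerate(source_docs, 1):
    print(f"\n📄 Document {i}:")
    print(f"Source: {source_doc.metadata.get('source', 'Unknown')}")
    print(f"Page: {source_doc.metadata.get('page', 'Unknown')}")
    print(f"Content: {source_doc.page_content[:200]}...")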

In [None]:
## Interactive Query Function
def ask_question(question):
    """Interactive function to ask questions"""
    print(f"\n❓ Question: {question}")
    print("=" * 60)
    
    answer, docs = retrieve_answers(question)
    
    print(f"\n💡 Answer: {answer}")
    print(f"\n📊 Based on {len(docs)} relevant document(s)")
    
    return answer

# Test with different questions
questions = [
    "What is the agriculture credit target?",
    "What initiatives are mentioned for farmers?",
    "Tell me about Shree Anna"
]

print("🎯 Testing multiple queries:")
for q in questions:
    ask_question(q)
    print("\n" + "="*80 + "\n")

## 🎉 Free RAG System Summary

### ✅ What We Achieved:
- **100% Free RAG Implementation** for development
- **HuggingFace Embeddings**: High-quality, free embeddings
- **Pinecone Free Tier**: Cloud vector storage (2GB limit)
- **Local LLM**: Flan-T5 running without API costs
- **Same Functionality**: Document processing, chunking, and Q&A

### 💰 Cost Comparison:
- **Original**: OpenAI embeddings + OpenAI LLM = paid API usage on every query
- **Free Version**: $0 per query, since embeddings and generation run locally
- **Only Limit**: Pinecone free-tier quotas (sufficient for development)

### 🚀 Next Steps:
1. **Production**: Can easily switch to paid services when needed
2. **Scaling**: Upgrade Pinecone or use local vector stores
3. **Performance**: Use larger models or GPU acceleration
4. **Integration**: Connect with Django backend using these components

### 🔧 Environment Variables Needed:
```env
PINECONE_API_KEY=your_free_pinecone_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX_NAME=your_index_name
```