# EcoHome Energy Advisor - RAG Setup

In this notebook, we set up the Retrieval-Augmented Generation (RAG) pipeline for the EcoHome Energy Advisor so that the agent can retrieve and cite relevant energy-saving tips and best practices. The steps are:

- Set up ChromaDB vector store
- Load and process energy-saving documents
- Create embeddings for document chunks
- Implement semantic search functionality
- Test the RAG pipeline

## Documents Available
- `tip_device_best_practices.txt` 
- `tip_energy_savings.txt` 
- `tip_seasonal_energy_management.txt` 
- `tip_hvac_optimization_strategies.txt`
- `tip_energy_storage_optimization.txt`
- `tip_renewable_energy_integration.txt`
- `tip_smart_home_automation.txt`


## 1. Import Required Libraries


In [8]:
# Import the necessary libraries for RAG setup
import os
import glob
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv

In [4]:
load_dotenv()

True

## 2. Load and Process Documents


In [10]:
# Load all energy-saving tip documents (*.txt) from data/documents/
# Use TextLoader to load each file as a LangChain Document

documents = []
document_paths = glob.glob("data/documents/*.txt")
if not document_paths:
    print("No text files found in data/documents/")
else:
    for doc_path in document_paths:
        # Explicit encoding avoids platform-dependent defaults when reading the files
        loader = TextLoader(doc_path, encoding="utf-8")
        docs = loader.load()
        documents.extend(docs)
        print(f"Loaded {len(docs)} documents from {doc_path}")

print(f"Total documents loaded: {len(documents)}")


Loaded 1 documents from data/documents/tip_renewable_energy_integration.txt
Loaded 1 documents from data/documents/tip_device_best_practices.txt
Loaded 1 documents from data/documents/tip_hvac_optimization_strategies.txt
Loaded 1 documents from data/documents/tip_energy_storage_optimization.txt
Loaded 1 documents from data/documents/tip_energy_savings.txt
Loaded 1 documents from data/documents/tip_smart_home_automation.txt
Loaded 1 documents from data/documents/tip_seasonal_energy_management.txt
Total documents loaded: 7


## 3. Split Documents into Chunks


In [11]:
# Split documents into smaller chunks for better retrieval
# Use RecursiveCharacterTextSplitter with appropriate chunk_size and chunk_overlap
# Experiment with different chunk sizes (e.g., 500, 1000, 1500 characters)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split the documents
splits = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(splits)} chunks")

# Show sample chunk
if splits:
    print("\nSample chunk (first 200 characters):")
    print(splits[0].page_content[:200] + "...")


Split 7 documents into 19 chunks

Sample chunk (first 200 characters):
As more households adopt rooftop solar and other renewable technologies, maximizing self-generated clean energy becomes increasingly important. Smart scheduling, flexible loads, and awareness of forec...
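
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of a fixed-size sliding-window splitter. It is a simplified illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to cut at the configured separators); the function name `sliding_window_chunks` and the synthetic `sample` text are assumptions made for this example.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators and cuts at exact character offsets.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 2,500-character text standing in for a document (hypothetical data).
sample = "".join(str(i % 10) for i in range(2500))
for size in (500, 1000, 1500):
    chunks = sliding_window_chunks(sample, chunk_size=size, chunk_overlap=200)
    print(f"chunk_size={size}: {len(chunks)} chunks")
```

Smaller chunks give more, narrower retrieval targets; larger chunks keep more context per result. The overlap means the tail of one chunk reappears at the head of the next, so a tip that straddles a boundary is still retrievable in full from at least one chunk.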


## 4. Create Vector Store


In [12]:
# Create a ChromaDB vector store
# Initialize OpenAIEmbeddings
# Create the vector store with the document chunks
# Persist the vector store to disk for future use

# Set up the persist directory
persist_directory = "data/vectorstore"
os.makedirs(persist_directory, exist_ok=True)

# Initialize embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

# Create the vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_directory
)

print(f"Vector store created and persisted to {persist_directory}")
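# Note: this reports the chunks added in this run. Re-running the cell against an
# existing persist_directory may add the same chunks again.
print(f"Total vectors stored: {len(splits)}")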

Vector store created and persisted to data/vectorstore
Total vectors stored: 19
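
Because the store is persisted to disk, running the creation cell a second time can add the same chunks again. One possible guard, sketched below, is to derive a stable ID for each chunk from its source path and text. The helper `chunk_id` and the `example_chunks` data are hypothetical; with the real `splits`, the pairs would come from `doc.metadata["source"]` and `doc.page_content`.

```python
import hashlib


def chunk_id(source: str, content: str) -> str:
    """Return a stable ID derived from a chunk's source path and text."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8"))
    return digest.hexdigest()[:32]


# Hypothetical stand-ins for (metadata["source"], page_content) pairs from `splits`.
example_chunks = [
    ("data/documents/tip_energy_savings.txt", "Turn off lights when not in use."),
    ("data/documents/tip_energy_savings.txt", "Use power strips."),
]
ids = [chunk_id(source, content) for source, content in example_chunks]
print(ids)
```

The resulting list could then be supplied through the `ids` argument of `Chroma.from_documents`, so that a repeated run should overwrite existing entries rather than append new ones. That behavior is worth verifying against the installed `langchain_chroma` version before relying on it.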


## 5. Test the RAG Pipeline


In [13]:
# Test the search functionality
# Try different queries related to energy optimization
# Test queries like:
# - "electric vehicle charging tips"
# - "thermostat optimization"
# - "dishwasher energy saving"
# - "solar power maximization"

test_queries = [
    "electric vehicle charging tips",
    "thermostat optimization",
    "dishwasher energy saving",
    "solar power maximization",
    "HVAC system efficiency",
    "pool pump scheduling"
]

print("=== Testing Vector Search ===")
for query in test_queries:
    print(f"\nQuery: '{query}'")
    docs = vectorstore.similarity_search(query, k=2)
    for i, doc in enumerate(docs):
        print(f"  Result {i+1}: {doc.page_content[:100]}...")


=== Testing Vector Search ===

Query: 'electric vehicle charging tips'
  Result 1: Align EV charging with solar and tariffs:
- Charge EVs primarily between late morning and mid-aftern...
  Result 2: Home batteries and electric vehicles offer powerful opportunities to store and strategically use ene...

Query: 'thermostat optimization'
  Result 1: Efficient HVAC operation is one of the most impactful ways to reduce home energy costs while staying...
  Result 2: Use ceiling fans and natural ventilation:
- In summer, use ceiling fans to feel cooler at slightly h...

Query: 'dishwasher energy saving'
  Result 1: Dishwasher Best Practices:
- Only run when completely full
- Use the energy-saving or eco mode when ...
  Result 2: Saving energy at home can be simple and effective. Turn off lights when not in use and unplug device...

Query: 'solar power maximization'
  Result 1: As more households adopt rooftop solar and other renewable technologies, maximizing self-generated c...
  Result 2: P
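
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea with cosine similarity over tiny hand-made vectors. The three-dimensional vectors, `toy_corpus`, and the helper names are assumptions for illustration only; real embeddings from `text-embedding-3-small` have far more dimensions, and the distance metric Chroma actually applies is configurable and may differ. For unit-length embeddings, ranking by cosine similarity and ranking by Euclidean distance give the same order.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made toy vectors (hypothetical), one per "chunk".
toy_corpus = {
    "EV charging": [0.9, 0.1, 0.0],
    "Thermostat setpoints": [0.1, 0.9, 0.1],
    "Dishwasher eco mode": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # a query vector pointing mostly toward "EV charging"
for score, text in top_k(toy_query, toy_corpus, k=2):
    print(f"{score:.3f}  {text}")
```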

## 6. Test the Search Tool


In [14]:
# Test the search_energy_tips tool from tools.py
# Import and test the tool with various queries
# Verify that it returns relevant results

from tools import search_energy_tips

# Test the search_energy_tips function
print("=== Testing search_energy_tips Tool ===")

test_queries = [
    "electric vehicle charging",
    "thermostat settings",
    "dishwasher optimization",
    "solar power tips"
]

for query in test_queries:
    print(f"\nQuery: '{query}'")
    # Invoke the tool with the same arguments the agent would pass
    result = search_energy_tips.invoke({"query": query, "max_results": 3})

    if "error" in result:
        print(f"  Error: {result['error']}")
    else:
        print(f"  Found {result['total_results']} results")
        for i, tip in enumerate(result['tips']):
            print(f"    {i+1}. {tip['content'][:100]}...")
            print(f"       Source: {tip['source']}")
            print(f"       Relevance: {tip['relevance_score']}")


=== Testing search_energy_tips Tool ===

Query: 'electric vehicle charging'
  Found 3 results
    1. Use power strips to easily turn off multiple devices at once. Many electronics continue to draw powe...
       Source: data/documents/tip_energy_savings.txt
       Relevance: high
    2. Plan loads based on forecasted solar:
- Use next-day solar irradiance and weather forecasts to plan ...
       Source: data/documents/tip_renewable_energy_integration.txt
       Relevance: high
    3. Protect battery health:
- Avoid keeping batteries at 100% state of charge for long periods unless ne...
       Source: data/documents/tip_energy_storage_optimization.txt
       Relevance: medium

Query: 'thermostat settings'
  Found 3 results
    1. Use power strips to easily turn off multiple devices at once. Many electronics continue to draw powe...
       Source: data/documents/tip_energy_savings.txt
       Relevance: high
    2. Saving energy at home can be simple and effective. Turn off lights when no