# Personalized Real Estate Agent
The goal of this project is to create a personalized experience for each buyer, making the property search more engaging and tailored to individual preferences. The application, **HomeMatch**, has the following core components:
- Understanding buyer preferences
- Integrating with a vector database
- Personalized listing description generation
- Listing presentation

## 1. Configuration
Uncomment the code below to install libraries.

In [None]:
# !pip show pandas

In [None]:
# !pip install pandas --upgrade

In [None]:
# !pip install -r requirements.txt

In [57]:
import os
from typing import List

import pandas as pd
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel, Field, NonNegativeInt

from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import Document, HumanMessage
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma  # vector database

Set the environment variables and initialize the model.

In [32]:
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

# load the model
model_name = "gpt-3.5-turbo"
temperature = 0.0
llm = ChatOpenAI(model_name=model_name, temperature=temperature)

## 2. Generate Synthetic Data for Real Estate Listings
### 2.1 Set up instructions and template for listings

In [34]:
instruction = "Generate a CSV file with at least 10 real estate listings."
listing_template = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

### 2.2 Create Class objects
Define the structure of the data for the real estate listings.

In [35]:
class RealEstateListing(BaseModel):
    """
    A real estate listing.
    
    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    
    neighborhood: str = Field(description="The neighborhood where the property is located")
    price: NonNegativeInt = Field(description="The price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The size of the house in square feet")
    description: str = Field(description="A description of the property")
    neighborhood_description: str = Field(description="A description of the neighborhood.")  

class ListingCollection(BaseModel):
    """
    A collection of real estate listings.
    
    Attributes:
    - listings: List[RealEstateListing]
    """
    
    listings: List[RealEstateListing] = Field(description = "A list of real estate listings")

In [36]:
parser = PydanticOutputParser(pydantic_object = ListingCollection)
# print(parser.get_format_instructions())

### 2.3 Set up the Prompt Template

In [37]:
# Define the prompt template
prompt = PromptTemplate(
    input_variables=["instruction", "template"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
    template="{instruction}\n{template}\n{format_instructions}\n"
)

# Format the prompt
query = prompt.format(
    instruction=instruction,
    template=listing_template,
)

print(query)

Generate a CSV file with at least 10 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bi

### 2.4 Generate the response and parse the results

In [38]:
# Generate a response using the LLM
response = llm.predict(query)

In [42]:
# print("Response from LLM:", response)

In [41]:
result = parser.parse(response)

In [43]:
# create a dataframe from the response
df = pd.DataFrame(jsonable_encoder(result.listings))
df.head()

| | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | Downtown Loft | 1200000 | 2 | 2 | 1500 | Luxury living in the heart of the city! This m... | Downtown Loft is a vibrant urban neighborhood ... |
| 2 | Suburban Retreat | 600000 | 4 | 3 | 2500 | Escape to this peaceful suburban retreat surro... | Suburban Retreat is a family-friendly neighbor... |
| 3 | Beachfront Paradise | 1500000 | 5 | 4 | 3000 | Live the beachfront dream in this stunning 5-b... | Beachfront Paradise is a sought-after coastal ... |
| 4 | Mountain Retreat | 900000 | 3 | 2 | 2200 | Escape to this serene mountain retreat surroun... | Mountain Retreat is a nature lover's paradise ... |


In [44]:
df.to_csv('listings.csv', index_label = 'id')

## 3. Store Listings in a Vector Database

In [52]:
# Define the path for Chroma
CHROMA_PATH = "chroma"

# Initialize the embeddings
embeddings = OpenAIEmbeddings()

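# Initialize the text splitter.
# CharacterTextSplitter splits on "\n\n" by default, so a single-paragraph
# description stays as one chunk regardless of chunk_size.
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=100)  # Adjust chunk_size and chunk_overlap as needed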

# Create a list of Document objects from the DataFrame
documents = []
for index, row in df.iterrows():
    # Split the description into chunks
    chunks = splitter.split_text(row['description'])
    for chunk in chunks:
        documents.append(Document(page_content=chunk, metadata={'id': str(index)}))

# Create the Chroma vector store from the documents
db = Chroma.from_documents(
    documents,  # Document chunks built above
    embeddings,  # Embedding model used to vectorize each chunk
    persist_directory=CHROMA_PATH  # Directory where the store is written
)

# Persist the vector store
db.persist()
print(f"Saved {len(documents)} documents to {CHROMA_PATH}.")

Saved 10 documents to chroma.
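
The loop above embeds only the `description` column, so fields such as price or bedroom count never reach the vector store. One possible variation, sketched below, is to fold the structured fields into the text before embedding and to carry them along as metadata. The helper names `listing_to_text` and `listing_metadata` are hypothetical and not part of the original notebook. The sample row reuses the values of the first listing, with the descriptions shortened to their first sentence.

```python
from typing import Any, Dict


def listing_to_text(row: Dict[str, Any]) -> str:
    """Combine structured fields and free text into one string for embedding."""
    return (
        f"Neighborhood: {row['neighborhood']}\n"
        f"Price: ${row['price']:,}\n"
        f"Bedrooms: {row['bedrooms']}\n"
        f"Bathrooms: {row['bathrooms']}\n"
        f"House Size: {row['house_size']:,} sqft\n\n"
        f"Description: {row['description']}\n\n"
        f"Neighborhood Description: {row['neighborhood_description']}"
    )


def listing_metadata(index: int, row: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the numeric fields as metadata so they can be filtered on later."""
    return {
        "id": str(index),
        "neighborhood": row["neighborhood"],
        "price": int(row["price"]),
        "bedrooms": int(row["bedrooms"]),
        "bathrooms": int(row["bathrooms"]),
        "house_size": int(row["house_size"]),
    }


# Sample row mirroring the first listing (descriptions shortened).
sample_row = {
    "neighborhood": "Green Oaks",
    "price": 800000,
    "bedrooms": 3,
    "bathrooms": 2,
    "house_size": 2000,
    "description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks.",
    "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community.",
}

text = listing_to_text(sample_row)
meta = listing_metadata(0, sample_row)
print(text)
print(meta)
```

With pandas, `row.to_dict()` inside `df.iterrows()` would supply the `row` argument, and the resulting text and metadata could be passed to `Document(page_content=..., metadata=...)` in place of the description-only chunks.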


## 4. Building the User Preference Interface

In [60]:
def collect_buyer_preferences():
    print("Please enter your preferences for the property:")
    
    # Collecting preferences
    bedrooms = input("Number of bedrooms: ")
    bathrooms = input("Number of bathrooms: ")
    location = input("Preferred location: ")
    price_range = input("Price range (e.g., 500000-800000): ")
    house_size = input("Minimum house size (in sqft): ")
    
    # Store preferences in a dictionary
    preferences = {
        "bedrooms": bedrooms,
        "bathrooms": bathrooms,
        "location": location,
        "price_range": price_range,
        "house_size": house_size
    }
    
    return preferences

# Example usage
buyer_preferences = collect_buyer_preferences()
print("Collected Buyer Preferences:", buyer_preferences)

Please enter your preferences for the property:
Number of bedrooms: 3
Number of bathrooms: 2
Preferred location: irvine, california
Price range (e.g., 500000-800000): 700000
Minimum house size (in sqft): 1800
Collected Buyer Preferences: {'bedrooms': '3', 'bathrooms': '2', 'location': 'irvine, california', 'price_range': '700000', 'house_size': '1800'}
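
`input()` returns strings, so every value in `buyer_preferences` is text, and the price entered in the run above (`700000`) does not follow the suggested `min-max` form. A minimal sketch of one way to normalize these values before using them follows. The function `parse_preferences`, its output keys, and the rule that a single price is treated as an upper bound are assumptions for illustration, not part of the original notebook.

```python
import re
from typing import Any, Dict, Optional, Tuple


def _to_int(text: str) -> Optional[int]:
    """Return the first integer found in text (commas allowed), or None."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def parse_price_range(text: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse 'low-high' into (low, high); a single number becomes (None, max)."""
    values = [v for v in (_to_int(part) for part in text.split("-")) if v is not None]
    if not values:
        return (None, None)
    if len(values) == 1:
        # Assumption: a lone number is the buyer's maximum budget.
        return (None, values[0])
    low, high = sorted(values[:2])
    return (low, high)


def parse_preferences(raw: Dict[str, str]) -> Dict[str, Any]:
    """Convert raw input strings into typed preference values."""
    min_price, max_price = parse_price_range(raw.get("price_range", ""))
    return {
        "bedrooms": _to_int(raw.get("bedrooms", "")),
        "bathrooms": _to_int(raw.get("bathrooms", "")),
        "location": raw.get("location", "").strip(),
        "min_price": min_price,
        "max_price": max_price,
        "min_house_size": _to_int(raw.get("house_size", "")),
    }


# Raw values copied from the example run above.
raw_preferences = {
    "bedrooms": "3",
    "bathrooms": "2",
    "location": "irvine, california",
    "price_range": "700000",
    "house_size": "1800",
}
typed_preferences = parse_preferences(raw_preferences)
print(typed_preferences)
```

Fields that cannot be parsed come back as `None`, which leaves it to the caller to decide whether to re-prompt the buyer or to skip that constraint.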


## 5. Semantic Search Implementation
- Semantic Search Implementation: Turn the structured buyer preferences into a natural-language query, embed it, and run a similarity search on the vector database to retrieve the listings that most closely match the buyer's requirements.
- Listing Retrieval Logic: Tune the retrieval step, for example the number of results `k`, so that the most relevant listings are selected based on their semantic closeness to the buyer’s preferences.
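
Similarity search ranks results by text closeness and does not enforce numeric requirements such as a bedroom count or a price ceiling. One way to tighten the retrieval logic is to fetch more candidates than needed and then drop those that violate hard constraints. The sketch below works on plain dictionaries. The names `meets_constraints` and `filter_candidates`, the candidate layout, and the similarity scores are hypothetical stand-ins for search results joined with listing metadata; only the listing fields come from the data shown earlier.

```python
from typing import Any, Dict, List, Optional


def meets_constraints(
    listing: Dict[str, Any],
    *,
    min_bedrooms: Optional[int] = None,
    min_bathrooms: Optional[int] = None,
    max_price: Optional[int] = None,
    min_house_size: Optional[int] = None,
) -> bool:
    """Return True if the listing satisfies every constraint that is set."""
    return all([
        min_bedrooms is None or listing["bedrooms"] >= min_bedrooms,
        min_bathrooms is None or listing["bathrooms"] >= min_bathrooms,
        max_price is None or listing["price"] <= max_price,
        min_house_size is None or listing["house_size"] >= min_house_size,
    ])


def filter_candidates(
    candidates: List[Dict[str, Any]], k: int = 3, **constraints: Optional[int]
) -> List[Dict[str, Any]]:
    """Drop candidates that break a constraint, then keep the k best by score."""
    kept = [c for c in candidates if meets_constraints(c, **constraints)]
    # Assumption: a higher score means a closer semantic match.
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:k]


# Listing fields from the generated data; the scores are made up for illustration.
candidates = [
    {"id": "0", "price": 800000, "bedrooms": 3, "bathrooms": 2, "house_size": 2000, "score": 0.81},
    {"id": "1", "price": 1200000, "bedrooms": 2, "bathrooms": 2, "house_size": 1500, "score": 0.78},
    {"id": "2", "price": 600000, "bedrooms": 4, "bathrooms": 3, "house_size": 2500, "score": 0.84},
]

matches = filter_candidates(
    candidates, k=3, min_bedrooms=3, min_bathrooms=2, max_price=700000, min_house_size=1800
)
print([m["id"] for m in matches])
```

Filtering after the search keeps the vector query simple, at the cost of needing a larger initial `k` so that enough candidates survive the constraints.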

In [68]:
def construct_query(preferences):
    """Construct a natural language query from buyer preferences."""
    return (
        f"Find properties with {preferences['bedrooms']} bedrooms, {preferences['bathrooms']} bathrooms, "
        f"located in {preferences['location']} within the price range of {preferences['price_range']} "
        f"and at least {preferences['house_size']} sqft."
    )

def query_vector_database(preferences, db, embeddings, k=3):
    """
    Query the vector database using buyer preferences.

    :param preferences: Dictionary containing buyer preferences.
    :param db: Vector database instance.
    :param embeddings: Embedding model to convert the query into a vector.
    :param k: Number of results to retrieve.
    :return: List of retrieved documents.
    """
    # Construct the query
    query_text = construct_query(preferences)

    # Convert query into an embedding vector
    query_vector = embeddings.embed_query(query_text)

    # Search the database using similarity search
    try:
        results = db.similarity_search_by_vector(query_vector, k=k)
        if not results:
            print("No matching properties found.")
        return results
    except Exception as e:
        print(f"An error occurred while querying the database: {e}")
        return []

In [69]:
# Query the database with collected preferences
results = query_vector_database(buyer_preferences, db, embeddings)
if results:
    for doc in results:
        print(f"ID: {doc.metadata.get('id', 'N/A')}")
        print(f"Description: {doc.page_content}")
        print("-" * 40)  # Separator for clarity


ID: 2
Description: Escape to this peaceful suburban retreat surrounded by lush greenery. This spacious 4-bedroom, 3-bathroom home offers a cozy fireplace, a gourmet kitchen, and a private backyard oasis with a pool and spa. The master suite features a luxurious en-suite bathroom and a walk-in closet. Enjoy the tranquility of suburban living while still being close to top-rated schools, parks, and shopping.
----------------------------------------
ID: 8
Description: Discover your own urban oasis in the heart of the city. This stylish 3-bedroom, 2-bathroom loft features high ceilings, exposed brick walls, and a private rooftop deck with city views. The open-concept living area is perfect for entertaining, while the master suite offers a peaceful retreat. Enjoy the convenience of city living with top restaurants, shops, and entertainment just steps away.
----------------------------------------
ID: 3
Description: Live the beachfront dream in this stunning 5-bedroom, 4-bathroom paradise. W

## 6. LLM Augmented Response Generation
- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
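
Factual integrity can be spot-checked mechanically. The sketch below compares the bedroom and bathroom counts stated in an original and an augmented description and reports any that changed. The function names and the regular expression are assumptions for illustration. Such a check only covers counts written in forms like `4-bedroom` or `3 bathrooms` and does not replace a review of the generated text. The two sample strings are excerpts from the original and augmented descriptions of listing ID 2 as printed in this notebook.

```python
import re
from typing import Dict, List

# Matches forms such as "4-bedroom", "3 bathrooms", "2-Bathroom".
_COUNT_PATTERN = re.compile(r"(\d+)[-\s](bedroom|bathroom)", re.IGNORECASE)


def extract_counts(text: str) -> Dict[str, int]:
    """Return the first bedroom and bathroom counts stated in the text."""
    counts: Dict[str, int] = {}
    for number, kind in _COUNT_PATTERN.findall(text):
        counts.setdefault(kind.lower(), int(number))
    return counts


def find_changed_facts(original: str, augmented: str) -> List[str]:
    """List counts that appear in both texts but with different values."""
    before = extract_counts(original)
    after = extract_counts(augmented)
    return [
        f"{kind}: {before[kind]} -> {after[kind]}"
        for kind in sorted(before)
        if kind in after and after[kind] != before[kind]
    ]


original = "This spacious 4-bedroom, 3-bathroom home offers a cozy fireplace"
augmented = "this charming 3-bedroom, 2-bathroom home is the perfect sanctuary"

changes = find_changed_facts(original, augmented)
print(changes)
```

A non-empty result is a signal to fall back to the original description or to regenerate with a stricter prompt.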

In [76]:
def augment_listing(description, preferences):
    """
    Augment the property description based on buyer preferences.

    :param description: Original property description.
    :param preferences: Dictionary containing buyer preferences.
    :return: Augmented description.
    """
    prompt = (
        f"Buyer Preferences:\n"
        f" - Bedrooms: {preferences['bedrooms']}\n"
        f" - Bathrooms: {preferences['bathrooms']}\n"
        f" - Location: {preferences['location']}\n"
        f" - Price Range: {preferences['price_range']}\n"
        f" - Minimum House Size: {preferences['house_size']} sqft\n\n"
        f"Property Description:\n"
        f"{description}\n\n"
        f"Task: Rewrite the property description to subtly emphasize features that align with the buyer's preferences. "
        f"Ensure factual accuracy while making the description more appealing to the buyer."
    )

    try:
        # print(f"DEBUG: Sending prompt to LLM:\n{prompt[:500]}...")  # Log truncated prompt
        response = llm([HumanMessage(content=prompt)])
        augmented_description = response.content
        # print(f"DEBUG: Received augmented description:\n{augmented_description[:500]}...")
        return augmented_description.strip()
    except Exception as e:
        print(f"An error occurred during LLM augmentation: {e}")
        return description  # Return the original description if augmentation fails

In [79]:
# Process each retrieved listing
augmented_results = []
for doc in results:
    original_description = doc.page_content
    augmented_description = augment_listing(original_description, buyer_preferences)
    augmented_results.append({
        "id": doc.metadata.get('id', 'N/A'),
        "original_description": original_description,
        "augmented_description": augmented_description
    })

# Output the augmented listings
for result in augmented_results:
    print(f"ID: {result['id']}")
    # print(f"Original Description: {result['original_description']}")
    print(f"Augmented Description: {result['augmented_description']}")
    print("-" * 40)  # Separator for clarity

ID: 2
Augmented Description: Nestled in the desirable city of Irvine, California, this charming 3-bedroom, 2-bathroom home is the perfect sanctuary for those seeking a peaceful retreat. Boasting a spacious layout of over 1800 sqft, this home features a cozy fireplace, a gourmet kitchen, and a private backyard oasis with a pool and spa. The master suite offers a luxurious en-suite bathroom and a walk-in closet, providing the perfect blend of comfort and style. Located in a serene neighborhood close to top-rated schools, parks, and shopping, this home offers the ideal balance of suburban living and convenience. Don't miss out on this opportunity to make this dream home yours for under $700,000.
----------------------------------------
ID: 8
Augmented Description: Welcome to your dream home in Irvine, California! This stunning 3-bedroom, 2-bathroom loft boasts a spacious 1800 sqft layout, perfect for your family. With high ceilings and exposed brick walls, this urban oasis exudes charm an