Project Introduction
Imagine you're a talented developer at "Future Homes Realty", a forward-thinking real estate company. In an industry where personalization is key to customer satisfaction, your company wants to revolutionize how clients interact with real estate listings. The goal is to create a personalized experience for each buyer, making the property search process more engaging and tailored to individual preferences.

In [1]:
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable


Section: Synthetic Data Generation
1. Generating Real Estate Listings with an LLM: The submission must demonstrate using a Large Language Model (LLM) to generate at least 10 diverse and realistic real estate listings, each containing facts about the property.

In [19]:
import os
import pandas as pd
import re
from dataclasses import dataclass
from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

os.environ["OPENAI_API_KEY"] = "voc-200544551126677187877867151067946181.86931116"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

model_name = "gpt-3.5-turbo"
temperature = 0.7
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens=4000)

In [26]:
# Define Pydantic model for the real estate listing
class RealEstateListing(BaseModel):
    neighborhood: str = Field(..., description="Neighborhood where the house is located")
    price: str = Field(..., description="Listing price of the house")
    bedrooms: int = Field(..., description="Number of bedrooms")
    bathrooms: int = Field(..., description="Number of bathrooms")
    house_size: str = Field(..., description="Size of the house in square feet")
    description: str = Field(..., description="Detailed description of the house")
    neighborhood_description: str = Field(..., description="Description of the neighborhood")

# Initialize PydanticOutputParser
output_parser = PydanticOutputParser(pydantic_object=RealEstateListing)

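# Define the prompt template for generating listings.
# The explicit "Label: value" layout matches what parse_listing expects.
prompt_template = PromptTemplate(
    input_variables=[],
    template="""
    Generate a detailed real estate listing. Put each field on its own line
    in the form "Label: value", using exactly these labels in this order:
    - Neighborhood
    - Price
    - Bedrooms
    - Bathrooms
    - House Size
    - Description
    - Neighborhood Description

    Give Bedrooms and Bathrooms as whole numbers.
    Each listing should include realistic information that would appear in a real estate ad.
    """
)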

# Extract data using regular expressions
def parse_listing(text):
    try:
        return RealEstateListing(
            neighborhood=re.search(r"Neighborhood:\s*(.+)", text).group(1),
            price=re.search(r"Price:\s*(.+)", text).group(1),
            bedrooms=int(re.search(r"Bedrooms:\s*(\d+)", text).group(1)),
            bathrooms=int(re.search(r"Bathrooms:\s*(\d+)", text).group(1)),
            house_size=re.search(r"House Size:\s*(.+)", text).group(1),
            description=re.search(r"Description:\s*(.+?)(?=\nNeighborhood Description:)", text, re.DOTALL).group(1).strip(),
            neighborhood_description=re.search(r"Neighborhood Description:\s*(.+)", text, re.DOTALL).group(1).strip()
        )
    except AttributeError:
        print(f"Failed to parse listing:\n{text}")
        return None

# Function to generate listings
def generate_real_estate_listings(n=10):
    listings = []
    for _ in range(n):
        prompt = prompt_template.format()
        response = llm(prompt)
        parsed_listing = parse_listing(response)
        if parsed_listing:
            listings.append(parsed_listing.dict())
    return listings

# Generate listings and save them to a CSV file
def save_listings_to_csv(filename="listings.csv"):
    listings = generate_real_estate_listings(n=10)
    df = pd.DataFrame(listings)
    df.to_csv(filename, index=False)
    print(f"Real estate listings saved to {filename}")


In [27]:
# Run the function to save listings
save_listings_to_csv()

Real estate listings saved to listings.csv
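
The regular expressions in parse_listing only succeed when the model's answer follows a "Label: value" layout. As a rough illustration of that parsing step, the sketch below runs equivalent patterns against a hand-written sample. The sample text and the helper name parse_fields are assumptions made for this example; they are not LLM output and not part of the notebook above.

```python
import re

# Hypothetical sample in the "Label: value" layout the regexes expect
SAMPLE = """Neighborhood: Maple Heights
Price: $500,000
Bedrooms: 3
Bathrooms: 2
House Size: 1,500 sqft
Description: A bright home with a large yard.
It has a renovated kitchen.
Neighborhood Description: Quiet streets close to parks."""

# One (pattern, flags) pair per field; only multi-line fields use DOTALL
PATTERNS = {
    "neighborhood": (r"^Neighborhood:\s*(.+)", re.MULTILINE),
    "price": (r"^Price:\s*(.+)", re.MULTILINE),
    "bedrooms": (r"^Bedrooms:\s*(\d+)", re.MULTILINE),
    "bathrooms": (r"^Bathrooms:\s*(\d+)", re.MULTILINE),
    "house_size": (r"^House Size:\s*(.+)", re.MULTILINE),
    "description": (
        r"^Description:\s*(.+?)(?=\nNeighborhood Description:)",
        re.MULTILINE | re.DOTALL,
    ),
    "neighborhood_description": (
        r"^Neighborhood Description:\s*(.+)",
        re.MULTILINE | re.DOTALL,
    ),
}

def parse_fields(text):
    """Return a dict of listing fields, or None if any label is missing."""
    fields = {}
    for name, (pattern, flags) in PATTERNS.items():
        match = re.search(pattern, text, flags)
        if match is None:
            return None  # a missing label makes the listing unusable
        fields[name] = match.group(1).strip()
    fields["bedrooms"] = int(fields["bedrooms"])
    fields["bathrooms"] = int(fields["bathrooms"])
    return fields

listing = parse_fields(SAMPLE)
print(listing)
```

Because a response that fails to parse is skipped, generate_real_estate_listings(n=10) can return fewer than 10 listings, so it is worth checking the row count of listings.csv after a run.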


Section: Semantic Search
1. Creating a Vector Database and Storing Listings: The project must demonstrate the creation of a vector database and successfully storing real estate listing embeddings within it. The database should effectively store and organize the embeddings generated from the LLM-created listings.
2. Semantic Search of Listings Based on Buyer Preferences: The application must include functionality for semantically searching listings based on given buyer preferences. The search should return listings that closely match the input preferences.
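
To make the idea of semantic search concrete, here is a minimal sketch of ranking by vector similarity. It uses made-up 3-dimensional vectors instead of real embeddings, and cosine similarity as one common similarity measure; the distance function a given Chroma collection actually uses depends on its configuration. The names cosine_similarity, toy_listings, and top_k are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only
toy_listings = {
    "quiet suburb near schools": [0.9, 0.1, 0.2],
    "downtown loft near nightlife": [0.1, 0.9, 0.3],
    "rural farmhouse": [0.3, 0.0, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded buyer preferences

def top_k(query_vec, vectors, k=2):
    """Return the names of the k vectors most similar to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

print(top_k(query, toy_listings))
```

A vector database performs the same kind of ranking, but over high-dimensional embeddings and with an index that avoids comparing the query against every stored vector.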

In [30]:
import os
import pandas as pd
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import CSVLoader
from langchain.llms import OpenAI

In [31]:
# Step 1: Configure the OpenAI API and embeddings model
os.environ["OPENAI_API_KEY"] = "voc-200544551126677187877867151067946181.86931116"
model_name = "gpt-3.5-turbo"
embeddings = OpenAIEmbeddings()

# Load listings from CSV
loader = CSVLoader(file_path='./listings.csv')
documents = loader.load()

# Step 2: Set up ChromaDB (or another vector database)
chroma_db_path = "./chroma_db"  # Path to store ChromaDB data
vector_db = Chroma(collection_name="real_estate_listings", embedding_function=embeddings, persist_directory=chroma_db_path)


# Step 3: Embed each listing and store it in ChromaDB
# CSVLoader puts every column into page_content and only "source" and "row"
# into metadata, so the listing fields are read from the CSV with pandas.
listings_df = pd.read_csv('./listings.csv', dtype=str)

# Drop missing values so no None/NaN ends up in the metadata
def filter_metadata(metadata):
    return {key: value for key, value in metadata.items() if pd.notna(value)}

texts = [document.page_content for document in documents]
metadatas = [filter_metadata(row) for row in listings_df.to_dict(orient="records")]

# add_texts embeds each text with the collection's embedding_function,
# so no separate embed_query call is needed
vector_db.add_texts(texts, metadatas=metadatas)

# Save vector database to disk for persistence
vector_db.persist()
print("Real estate listings have been stored in the vector database.")

Real estate listings have been stored in the vector database.
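
Because the collection is persisted to disk, running the cell above more than once stores the same listings again. One possible guard, sketched here under the assumption that identical listing text should map to a single entry, is to derive a deterministic id from each text and pass those ids along when adding. The helper name stable_id and the sample texts are hypothetical.

```python
import hashlib

def stable_id(text):
    """Derive a deterministic id from the listing text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

# Made-up listing texts; the third repeats the first
texts = [
    "neighborhood: A\nprice: $1",
    "neighborhood: B\nprice: $2",
    "neighborhood: A\nprice: $1",
]

# Keying by id collapses exact duplicates while keeping insertion order
unique = {stable_id(text): text for text in texts}
ids = list(unique.keys())
unique_texts = list(unique.values())

print(len(texts), "texts ->", len(unique_texts), "unique")
# In the notebook these could be passed as:
#   vector_db.add_texts(unique_texts, ids=ids)
```

How the store treats an id that already exists (overwrite, skip, or error) depends on the installed LangChain and Chroma versions, so that behavior should be checked before relying on it.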


Sales questions:
What are the 3 most important things for you in choosing this property?
How urban do you want your neighborhood to be?
User answers:
A quiet neighborhood, good local schools, and convenient shopping options.
A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

In [46]:
# Define the questions and collect answers
questions = [
    "What are the 3 most important things for you in choosing this property?",
    "How urban do you want your neighborhood to be?",
]

# Example answers (these would come from the user in a real application)
answers = [
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
]

# Combine answers into a preference text to use for semantic search
buyer_preferences_text = " ".join(answers)
print("Buyer Preferences for Search:\n", buyer_preferences_text)


Buyer Preferences for Search:
 A quiet neighborhood, good local schools, and convenient shopping options. A balance between suburban tranquility and access to urban amenities like restaurants and theaters.
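
The search text above is built from the answers alone. An alternative that is not used in this notebook is to keep each question next to its answer, so that a short answer such as "Very urban" still carries its context. The sketch below shows one way to do that; the function name build_preference_text is hypothetical, and whether this improves retrieval would need to be tested.

```python
def build_preference_text(questions, answers):
    """Pair each question with its answer, one pair per line."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    return "\n".join(f"{q} {a}" for q, a in zip(questions, answers))

questions = [
    "What are the 3 most important things for you in choosing this property?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A balance between suburban tranquility and access to urban amenities.",
]

preference_text = build_preference_text(questions, answers)
print(preference_text)
```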


In [40]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Set up the embeddings model
embeddings = OpenAIEmbeddings()

# Load the existing Chroma database
chroma_db_path = "./chroma_db"
vector_db = Chroma(collection_name="real_estate_listings", embedding_function=embeddings, persist_directory=chroma_db_path)

# Perform the semantic search directly with the preference text
def search_listings_by_preferences(preference_text, top_k=5):
    # Pass the preference text directly instead of an embedding
    results = vector_db.similarity_search(preference_text, k=top_k)
    return results

# Retrieve top listings based on preferences
top_k = 3  # Number of top listings to retrieve
matching_listings = search_listings_by_preferences(buyer_preferences_text, top_k=top_k)

# Display matching listings
for i, listing in enumerate(matching_listings):
    print(f"Listing {i + 1}:\n{listing}\n")


Listing 1:
page_content="neighborhood: Westwood Village\nprice: $750,000\nbedrooms: 3\nbathrooms: 2\nhouse_size: 1,800 square feet\ndescription: Welcome to this charming 3 bedroom, 2 bathroom home in the desirable Westwood Village neighborhood. This beautifully maintained property features a spacious living room with a cozy fireplace, a bright and airy kitchen with granite countertops and stainless steel appliances, and a dining area perfect for entertaining. The master bedroom boasts an en-suite bathroom and walk-in closet. Outside, you'll find a well-manicured backyard with a patio area for outdoor dining and a detached garage for parking or storage. Don't miss the opportunity to own this lovely home in a prime location.\nneighborhood_description: Westwood Village is known for its tree-lined streets, friendly community atmosphere, and convenient location near shopping, dining, and entertainment options. Residents enjoy easy access to parks, schools, and major freeways for commuting. 

Section: Augmented Response Generation
1. Logic for Searching and Augmenting Listing Descriptions: The project must demonstrate a logical flow where buyer preferences are used to search and then augment the description of real estate listings. The augmentation should personalize the listing without changing factual information.
2. Use of LLM for Generating Personalized Descriptions: The submission must utilize an LLM to generate personalized descriptions for the real estate listings based on buyer preferences. The descriptions should be unique, appealing, and tailored to the preferences provided.
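
The requirement that augmentation must not change factual information can be spot-checked mechanically. The sketch below is one rough heuristic, not a complete check: it flags any number (price, room count, size) that appears in a personalized description but not in the original listing. The helper names and the sample texts are assumptions made for illustration, and numbers written as words ("three bedrooms") are not detected.

```python
import re

# Matches figures such as 3, 1,800, 2.5 or $750,000
NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")

def extract_numbers(text):
    """Return the set of numeric tokens found in the text."""
    return {token.rstrip(",") for token in NUMBER.findall(text)}

def invented_numbers(original, personalized):
    """Numbers present in the personalized text but absent from the original."""
    return extract_numbers(personalized) - extract_numbers(original)

# Hypothetical texts for demonstration
original = (
    "price: $750,000\nbedrooms: 3\nbathrooms: 2\n"
    "house_size: 1,800 square feet"
)
faithful = "This 3 bedroom, 2 bathroom home sits on a quiet street."
altered = "This 4 bedroom home is listed at $700,000."

print(invented_numbers(original, faithful))
print(invented_numbers(original, altered))
```

A non-empty result does not prove the description is wrong, but it marks listings whose personalized text deserves a manual look before being shown to a buyer.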

In [47]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Set up the LLM
llm = OpenAI(model_name="gpt-3.5-turbo")

# Function to personalize the listing description
def personalize_listing_description(listing_content, buyer_preferences):
    # Define the prompt template for LLM personalization
    prompt_template = PromptTemplate(
        input_variables=["listing_content", "preferences"],
        template="""
        You are a real estate agent helping a buyer find their dream home. Personalize the following property description
        to highlight features that resonate with the buyer's preferences. Maintain factual accuracy and only emphasize aspects
        that align with the buyer's preferences.

        Property Description:
        {listing_content}

        Buyer Preferences:
        {preferences}

        Personalized Description:
        """
    )

    # Format the prompt with the listing content and buyer preferences
    prompt = prompt_template.format(
        listing_content=listing_content,
        preferences=buyer_preferences
    )

    # Generate the personalized description
    personalized_description = llm(prompt).strip()
    return personalized_description

# Example usage: Apply personalization to each retrieved listing
personalized_listings = []
for listing in matching_listings:
    # Use the page_content of each listing as input for personalization
    listing_content = listing.page_content  # Raw listing details
    personalized_description = personalize_listing_description(listing_content, buyer_preferences_text)
    
    # Add the personalized description to the listing metadata
    listing.metadata["personalized_description"] = personalized_description
    personalized_listings.append(listing)

# Display the personalized descriptions
for i, listing in enumerate(personalized_listings):
    print(f"Listing {i + 1} - Enhanced Description:\n{listing.metadata['personalized_description']}\n")


Listing 1 - Enhanced Description:
Welcome to your dream home in the serene Westwood Village neighborhood! This 3 bedroom, 2 bathroom property offers everything you're looking for, including a cozy fireplace in the spacious living room, a bright kitchen with granite countertops and stainless steel appliances, and a dining area perfect for hosting gatherings. The master bedroom features an en-suite bathroom and walk-in closet for added convenience. Outside, the well-manicured backyard with a patio area is perfect for outdoor dining and relaxation. Plus, the detached garage provides ample parking or storage space.

Westwood Village is a quiet, tree-lined community known for its friendly atmosphere and proximity to top-rated schools. With easy access to shopping, dining, parks, and major freeways, you'll enjoy the perfect blend of suburban charm and urban convenience in this vibrant neighborhood. Don't miss out on the opportunity to make this lovely home yours!

Listing 2 - Enhanced Descri