# HomeMatch

HomeMatch is a property recommendation system that helps users find their ideal home based on their preferences and requirements. The system uses a combination of language models and vector embeddings to match user-defined criteria with real estate listings. It ranks the listings based on relevance to the user's preferences and generates personalized descriptions for the top matches.

In [1]:
# Initialization

import re
from typing import List

from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from pydantic import BaseModel

load_dotenv(override=True)

vocareum_url = "https://openai.vocareum.com/v1"
llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url=vocareum_url,
)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    base_url=vocareum_url,
)

listings_path = "Listings.txt"
db_path = "chroma_db"

## Prepare listings

Listings are loaded from the text file and parsed into structured fields. An LLM then extracts search keywords from each property description and neighborhood description. These keywords, together with the basic facts (neighborhood, price, rooms, size), are what gets embedded for the similarity search.
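
The parsing half of this step can be tried in isolation. The sketch below assumes that `Listings.txt` stores one `Field: value` pair per line and separates listings with a blank line, which is what the regular expressions in `parse_listings` imply. The two sample listings are hypothetical, and the LLM keyword extraction is left out.

```python
import re

# Hypothetical listing blocks. The layout (one "Field: value" per line, listings
# separated by a blank line) is assumed from the regexes in parse_listings.
SAMPLE_FILE = """Listing 1
Neighborhood: Example Park
Price: $500,000
Bedrooms: 3
Bathrooms: 2.5
House Size: 1,900 sqft
Description: Bright family home with a fenced backyard.
Neighborhood Description: Quiet streets close to a public library.

Listing 2
Neighborhood: Example Heights
Price: $720,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,600 sqft
Description: Renovated home with a two-car garage.
Neighborhood Description: Walkable area near a commuter rail station.
"""

# Anchoring with ^...$ and re.MULTILINE matches whole lines only, so the
# "Description" pattern cannot match inside "Neighborhood Description".
PATTERNS = {
    "listing_number": (r"^Listing (\d+)$", int),
    "neighborhood": (r"^Neighborhood: (.+)$", str),
    "price": (r"^Price: (.+)$", str),
    "bedrooms": (r"^Bedrooms: (\d+)$", int),
    "bathrooms": (r"^Bathrooms: ([\d.]+)$", float),
    "house_size": (r"^House Size: (.+)$", str),
    "description": (r"^Description: (.+)$", str),
    "neighborhood_description": (r"^Neighborhood Description: (.+)$", str),
}


def parse_block(block: str) -> dict:
    """Extract the structured fields of one listing block (no LLM call)."""
    record = {}
    for field, (pattern, cast) in PATTERNS.items():
        match = re.search(pattern, block, re.MULTILINE)
        if match is None:
            raise ValueError(f"Missing field {field!r} in block:\n{block}")
        record[field] = cast(match.group(1).strip())
    return record


# Skip empty chunks, e.g. from extra blank lines at the end of the file.
blocks = [b for b in SAMPLE_FILE.split("\n\n") if b.strip()]
records = [parse_block(b) for b in blocks]
print(len(records), records[1]["bathrooms"], records[0]["description"])
```

Raising a `ValueError` that names the missing field makes a malformed block easier to locate than the `AttributeError` that `.group()` on a failed `re.search` would produce.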

In [2]:
class Listing(BaseModel):
    listing_number: int
    neighborhood: str
    price: str
    bedrooms: int
    bathrooms: float
    house_size: str
    description: str
    neighborhood_description: str
    description_keywords: str
    neighborhood_keywords: str
    score: float | None = None

In [3]:


def parse_listings(file_content: str) -> List[Listing]:
    """Parse the content of the listings file and construct a list of Listing objects. Keywords are generated using the LLM."""
    # Split the content into individual listings, skipping empty chunks (e.g. from trailing blank lines)
    raw_listings = [block for block in file_content.split("\n\n") if block.strip()]

    listings = []
    for raw_listing in raw_listings:
        # Extract information using regex
        listing_number = int(re.search(r"Listing (\d+)", raw_listing).group(1))
        neighborhood = re.search(r"Neighborhood: (.+)", raw_listing).group(1)
        price = re.search(r"Price: (.+)", raw_listing).group(1)
        bedrooms = int(re.search(r"Bedrooms: (\d+)", raw_listing).group(1))
        bathrooms = float(re.search(r"Bathrooms: ([\d.]+)", raw_listing).group(1))
        house_size = re.search(r"House Size: (.+)", raw_listing).group(1)
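        # The lookbehind prevents a match inside "Neighborhood Description:" regardless of field order
        description = re.search(r"(?<!Neighborhood )Description: (.+)", raw_listing).group(1).strip()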
        neighborhood_description = re.search(r"Neighborhood Description: (.+)", raw_listing).group(1).strip()

        description_keywords = extract_keywords(
            "Generate a comma-separated list of search keywords that a potential buyer might use to find this property, based on its description:\n{description}",
            description)
        neighborhood_keywords = extract_keywords(
            "Generate a comma-separated list of search keywords that would help someone find this type of neighborhood, based on the following description:\n{description}",
            neighborhood_description)

        listing = Listing(
            listing_number=listing_number,
            neighborhood=neighborhood,
            price=price,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            house_size=house_size,
            description=description,
            neighborhood_description=neighborhood_description,
            description_keywords=description_keywords,
            neighborhood_keywords=neighborhood_keywords,
        )
        listings.append(listing)

    return listings


def extract_keywords(prompt: str, description: str) -> str:
    """Extract search keywords from a description using the LLM.

    `prompt` must already contain a {description} placeholder; appending another
    one would send the description to the model twice.
    """
    prompt_template = PromptTemplate.from_template(prompt)
    chain = prompt_template | llm | StrOutputParser()
    return chain.invoke({"description": description})


def load_listings(file_path: str) -> List[Listing]:
    """Load the listings from the specified file and return a list of Listing objects."""
    with open(file_path, encoding="utf-8") as file:
        content = file.read()

    listings = parse_listings(content)

    return listings


processed_listings = load_listings(listings_path)
print(f"Loaded {len(processed_listings)} listings.")

# Print one of the listings:
listing = processed_listings[10]
print(f"Listing {listing.listing_number}")
print(f"Neighborhood: {listing.neighborhood}")
print(f"Price: {listing.price}")
print(f"Bedrooms: {listing.bedrooms}")
print(f"Bathrooms: {listing.bathrooms}")
print(f"House Size: {listing.house_size}")
print(f"Description: {listing.description}")
print(f"Description Keywords: {listing.description_keywords}")
print(f"Neighborhood Description: {listing.neighborhood_description}")
print(f"Neighborhood Keywords: {listing.neighborhood_keywords}")
print("\n")



Loaded 13 listings.
Listing 11
Neighborhood: Riverside Heights
Price: $1,250,000
Bedrooms: 4
Bathrooms: 3.5
House Size: 3,200 sqft
Description: Experience luxury living in this stunning waterfront property in Riverside Heights. This contemporary 4-bedroom, 3.5-bathroom home offers breathtaking views of the river from floor-to-ceiling windows. The gourmet kitchen features top-of-the-line appliances and a large island, perfect for entertaining. Relax in the master suite with a private balcony and spa-like bathroom. The lower level includes a home theater and wine cellar. Step outside to your private dock for easy access to water activities. This modern masterpiece combines elegance with the tranquility of waterfront living.
Description Keywords: waterfront property, luxury home for sale, Riverside Heights real estate, contemporary house, 4-bedroom home, 3.5-bathroom property, river view house, gourmet kitchen, home theater, wine cellar, private dock, modern luxury living, spa-like master

## Create Embeddings

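The listings are embedded into a Chroma vector database, which is persisted locally in `chroma_db` so that later runs reuse the stored embeddings instead of calling the embedding API again.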
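
Conceptually, the vector store turns each document into an embedding vector and answers a query by returning the stored vectors closest to the query's embedding. The toy sketch below illustrates that idea with hand-made 3-dimensional vectors and cosine similarity. Real `text-embedding-3-small` vectors are much longer, and the distance metric Chroma actually applies depends on the collection's configuration.

```python
import math

# Toy, hand-made "embeddings". Each axis loosely stands for a theme:
# [waterfront/luxury, family/suburban, urban/transit]
DOCS = {
    "listing_a": [0.9, 0.1, 0.2],
    "listing_b": [0.1, 0.9, 0.1],
    "listing_c": [0.2, 0.2, 0.9],
    "listing_d": [0.6, 0.6, 0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k(query, docs, k):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]), reverse=True)
    return ranked[:k]


query = [0.8, 0.3, 0.1]  # an "ideal listing" that leans toward waterfront/luxury
print(top_k(query, DOCS, k=2))  # → ['listing_a', 'listing_d']
```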

In [4]:
def get_documents_for_embedding(listings: List[Listing]) -> List[Document]:
    """Convert the listings into documents for embedding. The documents contain only the content that is optimal for the similarity search."""
    documents = []
    for listing in listings:
        text = f"""
        Neighborhood: {listing.neighborhood}
        Price: {listing.price}
        Bedrooms: {listing.bedrooms}
        Bathrooms: {listing.bathrooms}
        House Size: {listing.house_size}
        Description Keywords: {listing.description_keywords}
        Neighborhood Description Keywords: {listing.neighborhood_keywords}
        """
        document = Document(
            page_content=text,
            metadata=listing.model_dump(exclude_none=True, exclude_unset=True),
        )
        documents.append(document)

    return documents


# Load the persisted database from disk; populate it only on the first run
db = Chroma(persist_directory=db_path, embedding_function=embeddings)
if db._collection.count() == 0:
    print("Database is empty. Adding documents...")

    documents_for_embedding = get_documents_for_embedding(processed_listings)

    # Add to the already-open store instead of creating a second Chroma client
    db.add_documents(documents_for_embedding)
    print(f"Database now contains {db._collection.count()} documents.")
else:
    print(f"Database already contains {db._collection.count()} documents.")


Database is empty. Adding documents...
Database now contains 13 documents.


## User Search

The user answers a series of questions that define their property search criteria. From these answers, the LLM generates a summary of an ideal, personalized property listing, which serves as the query for a similarity search over the real listings in the vector database. The 5 most similar listings are retrieved and re-ranked by the LLM according to the user's preferences, and the three highest-scoring listings are shown to the user with personalized descriptions.
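
The retrieve-then-rerank flow can be sketched without any API calls. In the hypothetical example below, `bedroom_scorer` is a crude stand-in for the LLM rating, and the five candidates play the role of the k=5 similarity-search results; only the control flow (score every candidate, sort descending, keep three) mirrors the notebook.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    listing_number: int
    bedrooms: int
    score: Optional[float] = None


def rerank(candidates: List[Candidate], scorer: Callable[[Candidate], float], top_n: int = 3) -> List[Candidate]:
    """Score every retrieved candidate, then keep the top_n by descending score."""
    for candidate in candidates:
        candidate.score = scorer(candidate)
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_n]


def bedroom_scorer(wanted: int) -> Callable[[Candidate], float]:
    """Hypothetical heuristic: 10 for an exact bedroom match, minus 2 per bedroom of difference, floor 1."""
    return lambda c: max(1.0, 10.0 - 2.0 * abs(c.bedrooms - wanted))


# Pretend these five came back from the vector search, in similarity order.
retrieved = [Candidate(7, 3), Candidate(2, 2), Candidate(11, 4), Candidate(5, 2), Candidate(9, 5)]
best = rerank(retrieved, bedroom_scorer(wanted=2))
print([(c.listing_number, c.score) for c in best])  # → [(2, 10.0), (5, 10.0), (7, 8.0)]
```

Python's `sorted` is stable, so listings with equal scores keep their vector-search order, which acts as a reasonable tiebreaker.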

In [5]:
question_1 = "How big do you want your house to be?"
question_2 = "What are the 3 most important things for you in choosing this property?"
question_3 = "Which amenities would you like?"
question_4 = "Which transportation options are important to you?"
question_5 = "How urban do you want your neighborhood to be?"

In [6]:
# Create an ideal listing based on user's answers

ideal_listing_prompt_template = PromptTemplate.from_template(f"""
Based on the user's answers to the following questions, create a property listing summary in the format provided below. Ensure the document reflects the user's preferences and requirements for their ideal home. The summary should be detailed and cohesive, capturing the user's desired house size, important features, amenities, transportation options, and urban preference.

# Format for the Property Listing Summary:
Neighborhood: [Include only if explicitly specified by the user]
Price: [Estimate a price range based on house size and amenities]
Bedrooms: [Match the number of bedrooms specified by the user]
Bathrooms: [Estimate the number of bathrooms based on house size or user preference]
House Size: [Match the house size specified by the user]
Description Keywords: [Generate a comma-separated list of search keywords based on the user's important features and amenities. Mention the house layout, special features, and overall appeal]
Neighborhood Description Keywords: [Generate a comma-separated list of search keywords based on the user's urban preference, transportation needs, and lifestyle]

# Example User's Answers:
{question_1}
"A comfortable three-bedroom house with a spacious kitchen and a cozy living room."

{question_2}
"A quiet neighborhood, good local schools, and convenient shopping options."

{question_3}
"A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system."

{question_4}
"Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads."

{question_5}
"A balance between suburban tranquility and access to urban amenities like restaurants and theaters."

# Example Summary:
Price: $750,000
Bedrooms: 3
Bathrooms: 2
House Size: 1,800 sqft
Description Keywords: three-bedroom, spacious kitchen, cozy living room, backyard for gardening, two-car garage, modern heating system, energy-efficient, comfortable home
Neighborhood Description Keywords: quiet neighborhood, good local schools, convenient shopping, suburban tranquility, urban amenities, restaurants, theaters, reliable bus line, major highway, bike-friendly roads

# User's Answers:
{question_1}
"{{user_search_1}}"

{question_2}
"{{user_search_2}}"

{question_3}
"{{user_search_3}}"

{question_4}
"{{user_search_4}}"

{question_5}
"{{user_search_5}}"

Now create the property listing summary based on the user's answers in the format provided above.
""")


def create_ideal_listing(user_search: List[str]) -> str:
    """Create an ideal property listing based on the user's answers to the questions using the LLM."""

    chain = ideal_listing_prompt_template | llm | StrOutputParser()
    return chain.invoke({
        "user_search_1": user_search[0],
        "user_search_2": user_search[1],
        "user_search_3": user_search[2],
        "user_search_4": user_search[3],
        "user_search_5": user_search[4],
    })
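
The prompt above is built in two stages: a Python f-string fills in the question texts immediately, while the doubled braces `{{user_search_1}}` survive as single-brace placeholders that `PromptTemplate` (which uses f-string-style, i.e. Python format-string, placeholders by default) fills at `invoke` time. The sketch below reproduces both stages with `str.format` as a stand-in for the template's substitution step. It also shows a pitfall: `rate_listing` and `create_personalized_description` interpolate the answers and listing text directly into their f-strings, so a literal brace in any of that text would be read as a placeholder. The brace-containing answer here is hypothetical.

```python
question_1 = "How big do you want your house to be?"

# Stage 1: the f-string fills {question_1} right away, and the doubled braces
# {{user_search_1}} collapse into a single-brace placeholder.
template = f"""{question_1}
"{{user_search_1}}"
"""

# Stage 2: str.format stands in for the prompt template's substitution step and
# fills the placeholder that survived stage 1.
filled = template.format(user_search_1="A cozy two-bedroom condo.")
print(filled)

# Pitfall: text interpolated in stage 1 that contains literal braces turns into an
# unexpected placeholder in stage 2.
user_text = "Near a park {and} a school"
risky_template = f'Answer: "{user_text}"'
try:
    risky_template.format()
    missing_field = None
except KeyError as error:
    missing_field = error.args[0]
print("Unfilled placeholder:", missing_field)
```

Passing user text as a template variable, as `create_ideal_listing` does, avoids this: substituted values are inserted verbatim and are not scanned for further placeholders.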

In [7]:
# Search for similar listings based on the ideal listing

def search_listings(ideal_listing) -> List[Listing]:
    """Execute a similarity search in the vector database to find listings similar to the ideal listing."""

    retriever = db.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5},
    )
    documents = retriever.invoke(ideal_listing)
    return [Listing(**doc.metadata) for doc in documents]



In [8]:
# Rank the listings based on the user's preferences

class LLMRating(BaseModel):
    reasoning: str
    score: float


@tool(return_direct=True)
def create_rating(reasoning: str, score: float) -> LLMRating:
    """Creates a rating based on the reasoning and score provided."""

    return LLMRating(reasoning=reasoning, score=score)


def rate_listing(listing: Listing, user_search) -> Listing:
    """Rate the listing based on the user's preferences and requirements using the LLM."""

    listing_text = f"""
    Neighborhood: {listing.neighborhood}
    Price: {listing.price}
    Bedrooms: {listing.bedrooms}
    Bathrooms: {listing.bathrooms}
    House Size: {listing.house_size}
    Description: {listing.description}
    Neighborhood Description: {listing.neighborhood_description}
    """

    rating_template = PromptTemplate.from_template(f"""
    Based on the user's preferences and requirements, rate the following property listing on a scale of 1 to 10, with 10 being the most relevant and 1 being the least relevant. Consider the user's ideal house size, important features, amenities, transportation options, and urban preference.

    # User's Answers:
    {question_1}
    "{user_search[0]}"

    {question_2}
    "{user_search[1]}"

    {question_3}
    "{user_search[2]}"

    {question_4}
    "{user_search[3]}"

    {question_5}
    "{user_search[4]}"

    # Listing:
    {listing_text}
    
    Please provide detailed reasoning for your rating based on the user's preferences and the listing details. After the reasoning, give a final score on a scale of 1 to 10. Use the tool 'create_rating' to submit the results.
    """)

    llm_with_tools = llm.bind_tools([create_rating], tool_choice="create_rating")

    chain = rating_template | llm_with_tools

    # Invoke the chain and extract the score
    response = chain.invoke({})

    listing.score = response.tool_calls[0]["args"]["score"]

    return listing


def rank_listings(listings: List[Listing], user_search) -> List[Listing]:
    """Rate and rank the listings based on the user's preferences and requirements. The best matches will be at the top."""

    for listing in listings:
        rate_listing(listing, user_search)
    return sorted(listings, key=lambda x: x.score, reverse=True)
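
`rate_listing` reads `response.tool_calls[0]["args"]["score"]` directly. A slightly more defensive variant could look up the call by tool name and clamp the score to the 1 to 10 range that the prompt requests, so an out-of-range answer cannot distort the ranking. The sketch below is a hypothetical helper (`extract_score` is not part of the notebook) that works on plain dicts shaped like those tool calls, so it can be tested without the API.

```python
from typing import Any, Dict, List


def extract_score(tool_calls: List[Dict[str, Any]], tool_name: str = "create_rating",
                  low: float = 1.0, high: float = 10.0) -> float:
    """Return the score from the first call to `tool_name`, clamped to [low, high]."""
    for call in tool_calls:
        if call.get("name") == tool_name:
            score = float(call["args"]["score"])
            return min(high, max(low, score))
    raise ValueError(f"The model did not call {tool_name!r}")


# Hypothetical payloads with the "name"/"args" keys that rate_listing relies on.
valid_call = [{"name": "create_rating", "args": {"reasoning": "Good transit, right size.", "score": 8}, "id": "call_1"}]
inflated_call = [{"name": "create_rating", "args": {"reasoning": "Perfect fit.", "score": 12}, "id": "call_2"}]

print(extract_score(valid_call))     # → 8.0
print(extract_score(inflated_call))  # → 10.0
```

Because `bind_tools` is called with `tool_choice="create_rating"`, the name lookup is mostly a guard; the clamping is the part that changes behavior.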

In [9]:
# Generate personalized descriptions

def create_personalized_description(original_listing: Listing, user_search) -> str:
    """Generate a personalized property and neighborhood description based on the user's answers and the original listing."""

    description_template = PromptTemplate.from_template(f"""
    Generate a personalized description for the following property listing.

    # User's Answers:
    {question_1}
    "{user_search[0]}"

    {question_2}
    "{user_search[1]}"

    {question_3}
    "{user_search[2]}"

    {question_4}
    "{user_search[3]}"

    {question_5}
    "{user_search[4]}"

    # Original Listing:
    Neighborhood: {original_listing.neighborhood}
    Price: {original_listing.price}
    Bedrooms: {original_listing.bedrooms}
    Bathrooms: {original_listing.bathrooms}
    House Size: {original_listing.house_size}
    Property Description: {original_listing.description}
    Neighborhood Description: {original_listing.neighborhood_description}
    
    The generated description should subtly highlight the key features and amenities that align with the user's answers. DO NOT INVENT FEATURES THAT DO NOT EXIST IN THE ORIGINAL DESCRIPTION. The output should consist of two parts: the personalized property and neighborhood descriptions.
    """)

    chain = description_template | llm | StrOutputParser()
    return chain.invoke({})

In [14]:
# Print results

def search_and_print_results(user_answers: List[str]):
    """Search for listings based on the user's answers and print the top 3 matches with personalized descriptions."""

    ideal_listing = create_ideal_listing(user_answers)
    found_listings = search_listings(ideal_listing)
    ranked_listings = rank_listings(found_listings, user_answers)

    print("We selected the top 3 listings that most closely match your preferences:\n\n")
    for i, listing in enumerate(ranked_listings[:3]):
        # Use this search's answers, not the global answer_* variables
        personalized_description = create_personalized_description(listing, user_answers)

        print(f"Listing {i + 1}\n\n",

              f"HomeMatch Score: {listing.score}\n",
              f"Internal listing id: {listing.listing_number}\n\n",

              f"Original Listing:\n",
              f"Neighborhood: {listing.neighborhood}\n",
              f"Price: {listing.price}\n",
              f"Bedrooms: {listing.bedrooms}\n",
              f"Bathrooms: {listing.bathrooms}\n",
              f"House Size: {listing.house_size}\n",
              f"Description: {listing.description}\n",
              f"Neighborhood Description: {listing.neighborhood_description}\n\n",
              
              f"{personalized_description}\n",
              )

### Simulate User Search 1

In [15]:
# "How big do you want your house to be?"
answer_1 = "A spacious four-bedroom house with an open-concept layout and a home office."

# "What are the 3 most important things for you in choosing this property?"
answer_2 = "Proximity to nature, a vibrant community with social activities, and a well-maintained property."

# "Which amenities would you like?"
answer_3 = "A swimming pool, a gourmet kitchen, and a home theater."

# "Which transportation options are important to you?"
answer_4 = "Access to a reliable train station, easy highway access, and walkable streets."

# "How urban do you want your neighborhood to be?"
answer_5 = "Suburban, with a quiet atmosphere but within a short drive to urban conveniences like shopping centers and restaurants."

search_and_print_results([answer_1, answer_2, answer_3, answer_4, answer_5])


We selected the top 3 listings that most closely match your preferences:


Listing 1

 HomeMatch Score: 10
 Internal listing id: 1

 Original Listing:
 Neighborhood: Sunset Valley
 Price: $1,200,000
 Bedrooms: 4
 Bathrooms: 3.0
 House Size: 3,500 sqft
 Description: Discover luxury living in this modern 4-bedroom, 3-bathroom home in Sunset Valley. This high-end property features smart home technology, a gourmet kitchen with state-of-the-art appliances, and a spacious home theater. The expansive living area opens up to a large backyard with a swimming pool and an outdoor kitchen, perfect for entertaining. The master suite includes a private balcony with stunning mountain views.
 Neighborhood Description: Sunset Valley is an upscale community known for its serene mountain views and proximity to Sunset Park. Residents enjoy the area's exclusive shopping and dining options, as well as easy access to hiking trails and top-rated schools.

 **Personalized Property Description:**

Welcome to yo

### Simulate User Search 2

In [16]:
# How big do you want your house to be?
answer_1 = "A cozy two-bedroom condo with a modern design and lots of natural light."

# What are the 3 most important things for you in choosing this property?
answer_2 = "Close to public transportation, within walking distance to cafes and shops, and in a safe neighborhood."

# Which amenities would you like?
answer_3 = "A fitness center in the building, a balcony with a view, and high-speed internet."

# Which transportation options are important to you?
answer_4 = "Proximity to a metro station, bike lanes, and ride-sharing services."

# How urban do you want your neighborhood to be?
answer_5 = "Urban, with a bustling city feel and lots of cultural and entertainment options nearby."

search_and_print_results([answer_1, answer_2, answer_3, answer_4, answer_5])



We selected the top 3 listings that most closely match your preferences:


Listing 1

 HomeMatch Score: 7
 Internal listing id: 7

 Original Listing:
 Neighborhood: Riverstone Ridge
 Price: $950,000
 Bedrooms: 3
 Bathrooms: 2.5
 House Size: 2,300 sqft
 Description: Contemporary 3-bedroom, 2.5-bathroom townhouse in Riverstone Ridge. This property offers a sleek design with an open-concept living area, high ceilings, and large windows. The rooftop terrace provides panoramic views of the river and the city.
 Neighborhood Description: Riverstone Ridge is a trendy, up-and-coming neighborhood with a mix of modern living and natural beauty. Enjoy walking along the river, visiting local boutiques, and dining at hip cafes and restaurants. Excellent public transportation options make commuting easy.

 **Personalized Property Description:**

Welcome to your dream retreat in the heart of Riverstone Ridge! This contemporary townhouse, featuring three spacious bedrooms and 2.5 bathrooms, embodies mo

### Simulate User Search 3

In [17]:
# How big do you want your house to be?
answer_1 = "A three-bedroom townhouse with a finished basement and plenty of storage space."

# What are the 3 most important things for you in choosing this property?
answer_2 = "A family-friendly neighborhood, close to good schools, and a sense of community."

# Which amenities would you like?
answer_3 = "A playground for kids, a two-car garage, and energy-efficient appliances."

# Which transportation options are important to you?
answer_4 = "Access to a bus route, proximity to major highways, and carpooling options."

# How urban do you want your neighborhood to be?
answer_5 = "Suburban, with a mix of tranquility and convenient access to essential services and family-oriented amenities."

search_and_print_results([answer_1, answer_2, answer_3, answer_4, answer_5])

We selected the top 3 listings that most closely match your preferences:


Listing 1

 HomeMatch Score: 8
 Internal listing id: 6

 Original Listing:
 Neighborhood: Pine Hill
 Price: $800,000
 Bedrooms: 4
 Bathrooms: 3.5
 House Size: 2,800 sqft
 Description: Elegant 4-bedroom, 3.5-bathroom Colonial home in Pine Hill. This property features a spacious traditional layout, hardwood floors, and a finished basement with a home theater. The large backyard includes a swimming pool and a play area for kids.
 Neighborhood Description: Pine Hill is a picturesque, family-friendly neighborhood with tree-lined streets and top-rated schools. Residents enjoy the proximity to Pine Hill Park and the local community center, which offers various activities and programs.

 **Personalized Property Description:**

Welcome to your dream home in Pine Hill! This elegant 4-bedroom, 3.5-bathroom Colonial residence offers a spacious 2,800 sqft layout, perfect for families seeking both comfort and style. The finis