## HomeMatch - Personalized home search
This is a notebook for "Future Homes Realty". In an industry where personalization is key to customer satisfaction, we want to change how clients interact with real estate listings. The goal is to give each buyer a personalized experience, making the property search more engaging and tailored to individual preferences.

### Core Components of "HomeMatch"
* Understanding Buyer Preferences
* Integrating with a Vector Database
* Personalized Listing Description Generation
* Final Listing Presentation/Output

### Step 1: Setting Up the Python Application
Initialize a Python Project: Create a new Python project, setting up a virtual environment and installing necessary packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB).
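
Before running the cells below, it can help to confirm that the required packages are importable in the active virtual environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list covers only the top-level modules this notebook imports directly (the vector store and OpenAI client packages that LangChain depends on are not checked here).

```python
from importlib.util import find_spec

def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]

# Top-level modules imported directly by this notebook.
REQUIRED = ["pandas", "requests", "langchain", "pydantic", "fastapi"]

missing = missing_packages(REQUIRED)
if missing:
    print("Install these packages before continuing:", ", ".join(missing))
else:
    print("All directly imported packages are available.")
```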

In [1]:
#!pip install pandas

In [2]:
import pandas as pd
import requests
import os
import shutil #Shell utility to remove chroma directory if it exists

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI # this is the new import statement
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.evaluation import load_evaluator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain.vectorstores.chroma import Chroma
from langchain.prompts import ChatPromptTemplate
from fastapi.encoders import jsonable_encoder

from pydantic import BaseModel, Field, NonNegativeInt
from typing import List
from random import sample 

In [3]:


os.environ["OPENAI_API_KEY"] = "voc-xxx.96694845"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"
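
Writing a key directly into a cell makes it easy to share by accident when the notebook is published. One hedged alternative is to set the variable in the shell before starting Jupyter and have the notebook stop early if it is missing. `require_env` below is a hypothetical helper, not part of any library.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value

# Example use inside the notebook (assumes the variable was exported beforehand):
# api_key = require_env("OPENAI_API_KEY")
```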


### Step 2: Generating Real Estate Listings
Generate at least 10 real estate listings using a Large Language Model. Use these listings to populate the database for testing and development of "HomeMatch".

An example of a listing:

Neighborhood: Green Oaks

Price: $800,000

Bedrooms: 3

Bathrooms: 2

House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.

In [4]:
# Define LLM model
model_name = 'gpt-3.5-turbo'

In [6]:
# load the model
llm = OpenAI(model_name=model_name, temperature=0, max_tokens=2000)

INSTRUCTION = "Generate a CSV file with at least 10 real estate listings."
SAMPLE_LISTING = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [7]:
class RealEstateListing(BaseModel):
    """
    A real estate listing.
    
    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    neighborhood: str = Field(description="Neighborhood where the house is located")
    price: NonNegativeInt = Field(description="Price of house in USD")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms in the house")
    bathrooms: NonNegativeInt = Field(description="Number of bathrooms in the house")
    house_size: NonNegativeInt = Field(description="Size of the house in square feet")
    description: str = Field(description="Description of the house")
    neighborhood_description: str = Field(description="Description of the neighborhood.")  

class RealEstateListingCollection(BaseModel):
    """
    A collection of real estate listings.
    
    Attributes:
    - listings: List[RealEstateListing]
    """
    listings: List[RealEstateListing] = Field(description="A list of real estate listings")

In [8]:
# Generate output parser
# Print it just to study and debug
parser = PydanticOutputParser(pydantic_object=RealEstateListingCollection)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "A collection of real estate listings.\n\nAttributes:\n- listings: List[RealEstateListing]", "properties": {"listings": {"title": "Listings", "description": "A list of real estate listings", "type": "array", "items": {"$ref": "#/definitions/RealEstateListing"}}}, "required": ["listings"], "definitions": {"RealEstateListing": {"title": "RealEstateListing", "description": "A real estate listing.\n\nAttributes:\n- neighborhood: str\n- price: NonNegativeInt\n- bedrooms: NonNegativeInt\n- bathrooms: NonNegativeInt\n- house_size: Non

In [9]:
# Create prompt query and print prompt
# Attribution - See sentiment-analysis class from Udacity Course 4
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

query = prompt.format(
    instruction=INSTRUCTION,
    sample=SAMPLE_LISTING,
)
print(query)

Generate a CSV file with at least 10 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bi

In [10]:
# Response from the model
response = llm(query)

In [11]:
result = parser.parse(response)

In [12]:
# Create a DataFrame from the LLM's response and save it to a CSV file called listings.csv
df = pd.DataFrame(jsonable_encoder(result.listings))
df.head()
df.to_csv('listings.csv', index_label = 'id')
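
Because the model is not guaranteed to return the requested number of listings, a quick check of the saved file can catch problems before indexing. This sketch uses only the standard library; `validate_listings` is a hypothetical helper, and the column names are assumed to match the `RealEstateListing` fields defined earlier.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "neighborhood", "price", "bedrooms", "bathrooms",
    "house_size", "description", "neighborhood_description",
]

def validate_listings(csv_text, min_count=10):
    """Return a list of human-readable problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []
    # Check that every expected column is present in the header row.
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Check that the file holds at least the requested number of listings.
    if len(rows) < min_count:
        problems.append(f"expected at least {min_count} listings, found {len(rows)}")
    # The description is what gets embedded, so it must not be empty.
    for i, row in enumerate(rows):
        if not (row.get("description") or "").strip():
            problems.append(f"row {i} has an empty description")
    return problems
```

It could be called on the saved file with `validate_listings(open("listings.csv", newline="").read())`; an empty list means no problems were detected by these particular checks.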

### Step 3: Storing Listings in a Vector Database
* Vector Database Setup: Initialize and configure ChromaDB to store real estate listings.

* Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
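
To make the idea of semantic closeness concrete, the sketch below ranks a few hand-made vectors by cosine similarity to a query vector. The three-dimensional vectors and listing names are invented for illustration; real embeddings have far more dimensions, and the vector database performs the search internally with a distance metric that depends on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings": the axes loosely stand for (garden, city, size).
toy_vectors = {
    "suburban house": [0.9, 0.1, 0.6],
    "downtown condo": [0.1, 0.9, 0.3],
    "country estate": [0.8, 0.0, 0.9],
}
query = [1.0, 0.0, 0.5]  # a buyer who wants a garden and some space
print(top_k(query, toy_vectors))
```

With these toy values the suburban house ranks first and the country estate second, while the downtown condo scores much lower.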

In [13]:
CHROMA_PATH = "chroma"
CSV_PATH = "listings.csv" 

In [14]:
# Initialize and configure ChromaDB. Store real estate listings.
# Attribution - See rag.py coursework from Udacity class Course 4

embedding_function = OpenAIEmbeddings()

df = pd.read_csv(CSV_PATH)
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['description'], metadata={'id': str(index)}))


# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks.")

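# Print one sample chunk; guard the index so a short chunk list does not raise IndexError.
if len(chunks) > 10:
    document = chunks[10]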
    print(document.page_content)
    print(document.metadata)

# Save to Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

db = Chroma.from_documents(
    chunks, embedding_function, persist_directory=CHROMA_PATH
)
db.persist()
print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

Split 10 documents into 25 chunks.
Stunning 5-bedroom, 4-bathroom brownstone in the historic neighborhood of Brooklyn Heights. This beautifully renovated home features original details, modern amenities, and a private garden. With
{'id': '3', 'start_index': 0}
Saved 25 chunks to chroma.
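
The splitter settings above (`chunk_size=200`, `chunk_overlap=100`) mean that neighboring chunks share text, so a chunk can begin in the middle of a sentence. The simplified sketch below shows fixed-size windows with overlap. It is an illustration only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and spaces, so it will generally produce different chunks. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=200, chunk_overlap=100):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

for chunk in sliding_chunks("abcdefghijklmno", chunk_size=10, chunk_overlap=5):
    print(chunk)
```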


### Step 4: Building the User Preference Interface
- Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, from a set of questions. Example:

questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
]

- Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
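
One simple way to structure the answers for querying is to pair each question with its answer and join the non-empty answers into a single query string. This is a sketch under that assumption; `build_preference_query` is a hypothetical helper, and the cells later in this notebook instead pass individual answers as separate queries.

```python
def build_preference_query(questions, answers):
    """Pair questions with answers and return (pairs, combined query text)."""
    if len(questions) != len(answers):
        raise ValueError("each question needs exactly one answer")
    pairs = [
        {"question": q.strip(), "answer": a.strip()}
        for q, a in zip(questions, answers)
        if a.strip()  # skip questions the buyer left unanswered
    ]
    query_text = " ".join(p["answer"] for p in pairs)
    return pairs, query_text

sample_questions = [
    "How big do you want your house to be?",
    "Which amenities would you like?",
]
sample_answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
]
pairs, query_text = build_preference_query(sample_questions, sample_answers)
print(query_text)
```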

In [16]:
PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

Answer : {question}
"""

### Step 5: Searching Based on Preferences
* Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
* Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
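
One way to tune retrieval is to drop every result whose relevance score falls below a threshold, rather than checking only the best one. The sketch below works on plain `(item, score)` pairs, shaped like the tuples a scored similarity search returns. The helper name `filter_by_relevance`, the sample scores, and the default values are illustrative assumptions.

```python
def filter_by_relevance(results, min_score=0.7, k=3):
    """Keep at most k (item, score) pairs with score >= min_score, best first."""
    kept = [(item, score) for item, score in results if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

# Made-up scores for illustration only.
sample_results = [
    ("listing 3", 0.82),
    ("listing 1", 0.65),
    ("listing 5", 0.74),
    ("listing 7", 0.91),
]
print(filter_by_relevance(sample_results))
# [('listing 7', 0.91), ('listing 3', 0.82), ('listing 5', 0.74)]
```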

In [17]:
def predict_response(query_text, prompt_template_text):
    """Retrieve the most relevant chunks for query_text and print the model's answer."""
    embeddings = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embeddings)

    # Search the DB for the three most relevant chunks.
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    # Stop if nothing was found or the best match scores below 0.7.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
    else:
        # Join the retrieved chunks into one context block for the prompt.
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = ChatPromptTemplate.from_template(prompt_template_text)
        prompt = prompt_template.format(context=context_text, question=query_text)
        print(f"Generated Prompt:\n{prompt}")

        model = ChatOpenAI()
        response_text = model.predict(prompt)
        # Report which listing ids the context chunks came from.
        sources = [doc.metadata.get("id", None) for doc, _score in results]
        formatted_response = f"Response: {response_text}\nSources: {sources}"
        print(formatted_response)

In [18]:
query_text1 =  "A comfortable three-bedroom house with a spacious kitchen and a cozy living room."
predict_response(query_text1, PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

beautifully renovated home features original details, modern amenities, and a private garden. With spacious living areas and a chef's kitchen, this home is perfect for families and entertaining.

---

Perfect for entertaining or relaxing with family, this home offers the best of indoor and outdoor living.

---

Modern 3-bedroom, 3-bathroom condo in the trendy neighborhood of South Beach. This sleek and stylish home features floor-to-ceiling windows, a gourmet kitchen, and a private balcony with city views.

---

Answer : A comfortable three-bedroom house with a spacious kitchen and a cozy living room.

Response: This cozy and inviting home is perfect for families and entertaining. With modern amenities and a private garden, it offers the best of indoor and outdoor living.
Sources: ['3', '1', '5']
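
Because listings are split into chunks, several retrieved chunks can point to the same listing. The sketch below collapses chunk-level hits into listing-level hits, keeping the best score per `id`. The input shape (a list of `(id, score)` pairs), the scores, and the helper name `best_score_per_listing` are assumptions for illustration.

```python
def best_score_per_listing(chunk_hits):
    """Collapse (listing_id, score) chunk hits to one entry per listing, best first."""
    best = {}
    for listing_id, score in chunk_hits:
        if listing_id not in best or score > best[listing_id]:
            best[listing_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)

# Made-up hits: listing '3' is matched by two different chunks.
hits = [("3", 0.81), ("1", 0.78), ("3", 0.74), ("5", 0.72)]
print(best_score_per_listing(hits))
# [('3', 0.81), ('1', 0.78), ('5', 0.72)]
```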


### Step 6: Personalizing Listing Descriptions

* LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
* Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
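
A crude way to spot-check factual integrity is to verify that the augmented text introduces no numbers that are absent from the original listing. The sketch below is a heuristic under that assumption, with hypothetical helper names. It only looks at digits, so it cannot catch spelled-out numbers (such as "three-bedroom") or invented qualitative claims.

```python
import re

def numbers_in(text):
    """Return the set of numbers in text, with thousands separators removed."""
    matches = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    return {match.replace(",", "") for match in matches}

def introduces_new_numbers(original, augmented):
    """Return numbers that appear in the augmented text but not in the original, sorted."""
    return sorted(numbers_in(augmented) - numbers_in(original))

original = "This 3-bedroom, 2-bathroom home is 2,000 sqft and costs $800,000."
faithful = "A 3-bedroom home with 2000 sqft of space."
altered = "A spacious 4-bedroom home with 2,000 sqft."

print(introduces_new_numbers(original, faithful))  # []
print(introduces_new_numbers(original, altered))   # ['4']
```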

In [20]:
AUGMENTED_PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

craft a response that not only answers the question {question}, but also ensures that your explanation is distinct, appealing, factual, and aligns with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
"""

In [21]:
predict_response(query_text1, AUGMENTED_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

beautifully renovated home features original details, modern amenities, and a private garden. With spacious living areas and a chef's kitchen, this home is perfect for families and entertaining.

---

Perfect for entertaining or relaxing with family, this home offers the best of indoor and outdoor living.

---

Modern 3-bedroom, 3-bathroom condo in the trendy neighborhood of South Beach. This sleek and stylish home features floor-to-ceiling windows, a gourmet kitchen, and a private balcony with city views.

---

craft a response that not only answers the question A comfortable three-bedroom house with a spacious kitchen and a cozy living room., but also ensures that your explanation is distinct, appealing, factual, and aligns with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Response: This charming three-bedroom house is the perfect blend of 

### Step 7: Deliverables and Testing

In [22]:
query_text2 =  "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
predict_response(query_text2, AUGMENTED_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

stunning city views, high-end finishes, and top-of-the-line appliances. Enjoy the convenience of living in the bustling city center with access to world-class dining, shopping, and entertainment.

---

home features floor-to-ceiling windows, a gourmet kitchen, and a private balcony with city views. Live the urban lifestyle with access to world-class dining, shopping, and nightlife.

---

home features panoramic city views, high-end finishes, and a private rooftop terrace. Live in style with access to fine dining, shopping, and cultural attractions.

---

craft a response that not only answers the question A balance between suburban tranquility and access to urban amenities like restaurants and theaters., but also ensures that your explanation is distinct, appealing, factual, and aligns with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Respons

In [23]:
query_text3 =  "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads."
predict_response(query_text3, AUGMENTED_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

stunning city views, high-end finishes, and top-of-the-line appliances. Enjoy the convenience of living in the bustling city center with access to world-class dining, shopping, and entertainment.

---

home features floor-to-ceiling windows, a gourmet kitchen, and a private balcony with city views. Live the urban lifestyle with access to world-class dining, shopping, and nightlife.

---

home features panoramic city views, high-end finishes, and a private rooftop terrace. Live in style with access to fine dining, shopping, and cultural attractions.

---

craft a response that not only answers the question Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads., but also ensures that your explanation is distinct, appealing, factual, and aligns with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Response: This s

In [24]:
query_text4 =  "A quiet neighborhood, good local schools, and convenient shopping options."
predict_response(query_text4, AUGMENTED_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

stunning city views, high-end finishes, and top-of-the-line appliances. Enjoy the convenience of living in the bustling city center with access to world-class dining, shopping, and entertainment.

---

Perfect for entertaining or relaxing with family, this home offers the best of indoor and outdoor living.

---

features original details, modern updates, and a private courtyard. With multiple living spaces and a gourmet kitchen, this home is perfect for families and entertaining.

---

craft a response that not only answers the question A quiet neighborhood, good local schools, and convenient shopping options., but also ensures that your explanation is distinct, appealing, factual, and aligns with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Response: This charming home is nestled in a quiet neighborhood with top-rated local schools and conve

## Conclusion
In this project we:

1. Generated 10 diverse and realistic real estate listings using an LLM.
2. Created a vector database and stored the real estate listing embeddings in it. The database stores and organizes the embeddings generated from the LLM-created listings.
3. Added functionality to semantically search the listings based on given buyer preferences. The search returned listings that closely match the input preferences.
4. Demonstrated a logical flow in which buyer preferences are used to search for listings and then augment their descriptions. The augmentation personalized the listing without changing factual information.
5. Used an LLM to generate personalized descriptions of the real estate listings based on buyer preferences. The descriptions were unique, appealing, and tailored to the preferences provided.
6. Tested the pipeline with four different buyer-preference queries.