# Project: Personalized Real Estate Agent

## Rubric

### Synthetic Data Generation

- [x] Generating Real Estate Listings with an LLM
    - The submission must demonstrate using a Large Language Model (LLM) to generate at least 10 diverse and realistic real estate listings containing facts about the real estate.

### Semantic Search

- [x] Creating a Vector Database and Storing Listings
    - The project must demonstrate the creation of a vector database and successfully storing real estate listing embeddings within it. The database should effectively store and organize the embeddings generated from the LLM-created listings.

- [x] Semantic Search of Listings Based on Buyer Preferences
    - The application must include a functionality where listings are semantically searched based on given buyer preferences. The search should return listings that closely match the input preferences.

### Augmented Response Generation

- [x] Logic for Searching and Augmenting Listing Descriptions
    - The project must demonstrate a logical flow where buyer preferences are used to search and then augment the description of real estate listings. The augmentation should personalize the listing without changing factual information.

- [x] Use of LLM for Generating Personalized Descriptions
    - The submission must utilize an LLM to generate personalized descriptions for the real estate listings based on buyer preferences. The descriptions should be unique, appealing, and tailored to the preferences provided.

# Step 1: Setup
## Install dependencies

In [None]:
%pip install -r requirements.txt

## Load environment variables
The `.env` file holds the environment variables we don't want to state explicitly in the code, including the Vocareum OpenAI API key.

In [None]:
%load_ext dotenv
%dotenv
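
Before calling the API, it can help to confirm that the variables were actually loaded. The following is a minimal sketch of such a check. The names `OPENAI_API_KEY` and `OPENAI_API_BASE` are assumptions about what the environment file contains, and `missing_env_vars` is a hypothetical helper, not part of this project.

```python
import os

# Assumed variable names; adjust them to match the entries in your environment file.
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "OPENAI_API_BASE"]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.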

## Import dependencies

In [None]:
import json
import os

from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryMemory, ChatMessageHistory
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.schema.document import Document
from langchain.vectorstores import Chroma
from pydantic import BaseModel, Field, NonNegativeInt

## Setup some constants

In [None]:
REAL_ESTATE_LISTINGS_JSON_FILE = './real_estate_listings.json'

## Initialize OpenAI Models

In [None]:
# Completion model
completion_model_name = "gpt-3.5-turbo-instruct"
completion_temperature = 0.5
completion_llm = OpenAI(
    model_name=completion_model_name,
    temperature=completion_temperature,
    max_tokens=1000
)

# Chat model (gpt-3.5-turbo is a chat model, so it needs ChatOpenAI rather than OpenAI)
chat_model_name = "gpt-3.5-turbo"
chat_temperature = 0.7
chat_llm = ChatOpenAI(
    model_name=chat_model_name,
    temperature=chat_temperature,
    max_tokens=500
)

# Step 2: Generate Real Estate Listings
Use the completion model to generate at least 10 real estate listings.

An example listing might be:

```
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
```

## Define desired output schema

In [None]:
class RealEstateListing(BaseModel):
    neighborhood: str = Field(description="the neighborhood the home is in")
    price: NonNegativeInt = Field(description="the current home listing price")
    bedrooms: NonNegativeInt = Field(description="the number of bedrooms the home has")
    bathrooms: NonNegativeInt = Field(description="the number of bathrooms the home has")
    size: NonNegativeInt = Field(description="the size of the home in square feet")
    house_description: str = Field(description="a brief description of the home itself")
    neighborhood_description: str = Field(description="a brief description of the neighborhood the home is in")

# Create a parser to instruct LLM to output a response that conforms to our schema
parser = PydanticOutputParser(pydantic_object=RealEstateListing)
print(parser.get_format_instructions())

## Create a prompt template for the LLM

In [None]:
prompt = PromptTemplate(
    template="{instruction}\n{format_instructions}",
    input_variables=["instruction"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

instruction = """
Generate a real estate listing that lists its price, the neighborhood, how many bedrooms and bathrooms, the size in
square footage, and its house and neighborhood descriptions.
"""

query = prompt.format(instruction=instruction)
query

## Generate real estate listings

In [None]:
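def generate_real_estate_listings(max_listings=10) -> list[dict]:
    """Generate listings with the completion model and cache them as JSON."""
    listings = []
    for _ in range(max_listings):
        resp = completion_llm.predict(query)
        # Validate the response against the RealEstateListing schema
        listing = parser.parse(resp)
        listings.append(listing.dict())
    with open(REAL_ESTATE_LISTINGS_JSON_FILE, 'w') as f:
        json.dump(listings, f, indent=2)
    return listings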


def get_real_estate_listings(max_listings=10) -> list[dict]:
    if os.path.exists(REAL_ESTATE_LISTINGS_JSON_FILE):
        print(f'Loading real estate listings data from {REAL_ESTATE_LISTINGS_JSON_FILE}')
        with open(REAL_ESTATE_LISTINGS_JSON_FILE, 'r') as f:
            listings = json.loads(f.read())
    else:
        print('Generating real estate listings data from OpenAI')
        listings = generate_real_estate_listings(max_listings)
    return listings[:max_listings]

In [None]:
real_estate_listings = get_real_estate_listings()
real_estate_listings
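
The generation step assumes every completion is clean JSON. In practice a model may wrap the object in extra text, so a more defensive parser can be useful. The sketch below uses only the standard library. `parse_listing` is a hypothetical helper, and the field names are taken from the `RealEstateListing` schema.

```python
import json

# Field names taken from the RealEstateListing schema.
REQUIRED_FIELDS = {
    "neighborhood", "price", "bedrooms", "bathrooms",
    "size", "house_description", "neighborhood_description",
}


def parse_listing(raw_text):
    """Extract one JSON object from raw LLM text and check its fields.

    Returns the parsed dict, or None if no valid listing is found.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject anything that is not an object with all required fields
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        return None
    return data
```

A caller could skip or retry any completion for which `parse_listing` returns `None`, rather than letting one malformed response abort the whole run.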

# Step 3: Store Listings in a Vector Database

First, convert each JSON listing into text so it can be embedded and stored in a vector database.

In [None]:
content_template = """
Neighborhood: {neighborhood}
Price: ${price}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {size} sqft

Description: {house_description}

Neighborhood Description: {neighborhood_description}
"""
docs = []
for idx, listing in enumerate(real_estate_listings):
    # Keep the structured fields as metadata so they can be shown verbatim later
    doc = Document(
        page_content=content_template.format(**listing),
        metadata=listing,
        id=str(idx)
    )
    docs.append(doc)
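
To see what text ends up being embedded, the template can be rendered on its own. The sketch below uses the Green Oaks example from Step 2 with shortened descriptions; `sample_listing` is illustrative data, not output from the model.

```python
# Same layout as the content template used for the documents
template = """
Neighborhood: {neighborhood}
Price: ${price}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {size} sqft

Description: {house_description}

Neighborhood Description: {neighborhood_description}
"""

# Illustrative listing based on the Green Oaks example (descriptions shortened)
sample_listing = {
    "neighborhood": "Green Oaks",
    "price": 800000,
    "bedrooms": 3,
    "bathrooms": 2,
    "size": 2000,
    "house_description": "An eco-friendly home with solar panels.",
    "neighborhood_description": "A close-knit community with bike paths.",
}

page_content = template.format(**sample_listing)
print(page_content)
```

Because the dictionary keys match the placeholder names, `**sample_listing` fills every field in one call. A listing with a missing key would raise a `KeyError`.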


In [None]:
# Get OpenAI embeddings
embeddings = OpenAIEmbeddings()

# Create vector db client
db = Chroma.from_documents(documents=docs, embedding=embeddings)

# Step 4: Build the User Preference Interface
Collect buyer preferences, such as the number of bedrooms and bathrooms, location, and other specific requirements, either by asking a set of questions or by letting the buyer describe their preferences in natural language.

In [None]:
questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
]

Create a real estate recommender persona for the chatbot and have it learn the user's home-buying preferences.

In [None]:
# Create chat history
chat_history = ChatMessageHistory()
system_prompt = f"""
You are an AI that will recommend a home listing to the user based on their
answers to personal questions. Ask the user {len(questions)} questions.
"""
chat_history.add_user_message(system_prompt)
# Replay the scripted interview as alternating AI questions and user answers
for question, answer in zip(questions, answers):
    chat_history.add_ai_message(question)
    chat_history.add_user_message(answer)

chat_history.messages

In [None]:
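# Summarize chat history to create a user persona.
# return_messages=False so the summary is injected into the string
# prompt template as plain text rather than as a list of message objects.
user_persona_memory = ConversationSummaryMemory.from_messages(
    llm=ChatOpenAI(model_name=chat_model_name, temperature=0),
    chat_memory=chat_history,
    return_messages=False,
    memory_key='user_persona'
)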
user_persona_memory.buffer
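
To inspect the raw preferences without running the summarization model, the question and answer pairs can also be joined into plain text. This is only a sketch of an alternative view of the same data. `preferences_to_text` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def preferences_to_text(questions, answers):
    """Join question/answer pairs into a single plain-text block."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))


# Two of the interview questions from above, used as sample input
sample_questions = [
    "How big do you want your house to be?",
    "Which amenities would you like?",
]
sample_answers = [
    "A comfortable three-bedroom house.",
    "A backyard for gardening and a two-car garage.",
]
print(preferences_to_text(sample_questions, sample_answers))
```

Comparing this text with the generated summary is a quick way to check that the persona did not drop or distort a stated preference.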

# Step 5: Search Based on Preferences

In [None]:
# Search vector database for top 5 docs based on
# user persona built off preferences
similar_docs = db.similarity_search(user_persona_memory.buffer, k=5)
similar_docs
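
Conceptually, a similarity search embeds the query and ranks stored documents by how close their vectors are to it. The sketch below illustrates that idea with hand-written three-dimensional vectors and cosine similarity. The vectors and document names are made up for illustration, and the distance metric and index that Chroma actually uses are configurable and may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the names of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy "embeddings"; real embeddings have hundreds or thousands of dimensions
toy_docs = {
    "quiet suburb": [0.9, 0.1, 0.2],
    "downtown loft": [0.1, 0.9, 0.3],
    "family home": [0.8, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.1]
print(top_k(toy_query, toy_docs, k=2))
```

The `k` argument plays the same role as `k=5` in the search above: it caps how many of the closest documents are returned.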

# Step 6: Personalize Listing Descriptions

Build the recommender prompt and conversation chain.

In [None]:
recommender_template = """
The following is a friendly conversation between a human and an AI Real Estate
Recommender. The AI asks the human a series of questions related to the human's
home buying preferences and builds a persona derived from their answers to
these questions.

User persona built from a summary of the conversation with the human:
{user_persona}

Human: {input}
AI:
"""
recommender_prompt = PromptTemplate(
    input_variables=["user_persona", "input"],
    template=recommender_template,
)
recommender = ConversationChain(
    llm=chat_llm,
    memory=user_persona_memory,
    prompt=recommender_prompt,
)

For each real estate listing that matches the user's preferences, summarize
it based on the user's persona.

In [None]:
summarization_template = """
SUMMARIZATION INSTRUCTIONS THAT MUST BE STRICTLY FOLLOWED:
Please summarize the following real estate listing based on my own preferences.
You should be very sensitive to my personal preferences and should not be
influenced by anything else.

OUTPUT FORMAT:
DO NOT add anything superfluous such as "Based on your preferences".
Simply summarize the listing.

Listing:
{listing}
"""

summarizations = []
for doc in similar_docs:
    listing = doc.page_content
    summarization_instruction = summarization_template.format(listing=listing)
    # The chain injects the user persona from memory into the prompt
    summary = recommender.predict(input=summarization_instruction)
    summarizations.append(summary)

### Now, output a response to show to the user

In [None]:
response_template = """
Neighborhood: {neighborhood}
Price: ${price:,}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {size} sqft

{summary}
"""

# Hard code friendly start to response
print('Based on your preferences, here are some listings I think you might like:')
for listing, summary in zip(similar_docs, summarizations):
    data = listing.metadata
    response = response_template.format(
        neighborhood=data['neighborhood'],
        price=data['price'],
        bedrooms=data['bedrooms'],
        bathrooms=data['bathrooms'],
        size=data['size'],
        summary=summary
    )
    print(response)
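
The rubric requires that personalization leave factual information unchanged. The structured fields printed above come straight from the metadata, but the generated summary is free text and could restate a fact incorrectly. The sketch below is one possible spot check for prices only. `mismatched_prices` is a hypothetical helper, and it handles plain integer dollar amounts such as `$800,000`, not forms like `$1.2 million`.

```python
import re


def mismatched_prices(summary, listing_price):
    """Return dollar amounts quoted in `summary` that differ from `listing_price`."""
    # Capture digits and thousands separators following a dollar sign
    amounts = re.findall(r"\$\s?(\d[\d,]*)", summary)
    values = [int(amount.replace(",", "")) for amount in amounts]
    return [value for value in values if value != listing_price]


# Made-up summaries used only to exercise the check
print(mismatched_prices("Priced at $800,000, this home suits gardeners.", 800000))
print(mismatched_prices("A bargain at $750,000.", 800000))
```

An empty list means no conflicting price was found. Any returned value is a candidate for regenerating or manually reviewing that summary.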