This is a starter notebook for the project, you'll have to import the libraries you'll need, you can find a list of the ones available in this workspace in the requirements.txt file in this workspace. 


# Project: Personalized Real Estate Agent

An AI Real Estate which can provide personalized house list based on individual preference. 
Built with Python, Langchain, Vector Database and OpenAI's API.


## Step 1: Synthetic Data Generation

Generate a list of at least 10 real estates/houses using LLM, 
This list will be served as the data source to store into the vector database.


In [1]:
# Import Python Packages

from langchain.llms import OpenAI
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List
from random import sample 
from langchain.document_loaders.csv_loader import CSVLoader 




In [74]:

# Step 1.1: Initialize OpenAI

from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
import os

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

model_name = "gpt-3.5-turbo"
llm = ChatOpenAI(model=model_name, temperature=0, api_key=OPENAI_API_KEY)


In [75]:

# Step 1.2: Define data model for parser

class RealEstate(BaseModel):
    title: str = Field(description="The name or title of a house")
    bedroom: int = Field(description="Number of bedroom for a house")
    bathroom: int = Field(description="Number of bathroom for a house")
    garage: int = Field(description="Number of garage for a house")
    price_usd: int = Field(description="The price of a house in USD")
    size_sqft: int = Field(description="The size of a house in square feet") 
    description: str = Field(description="The 200-word description of a house")
    neighborhood: str = Field(description="The brief summary or name of the neighborhood for the house")
    neighborhood_details: str = Field(description="The 200-word description of the neighborhood")

# parser to get structured data from LLM response
parser = PydanticOutputParser(pydantic_object=RealEstate)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"title": {"description": "The name or title of a house", "title": "Title", "type": "string"}, "bedroom": {"description": "Number of bedroom for a house", "title": "Bedroom", "type": "integer"}, "bathroom": {"description": "Number of bathroom for a house", "title": "Bathroom", "type": "integer"}, "garage": {"description": "Number of garage for a house", "title": "Garage", "type": "integer"}, "price_usd": {"description": "The price of a house in USD", "title": "Price Usd", "type": "integer"}, "size_sqft": {"description": "The siz

In [76]:

# Step 1.3: Ask LLM to generate a list of real estate and save to a text file 

from langchain_core.prompts import ChatPromptTemplate
import pandas as pd
import os.path

HOUSE_FILE_NAME_CSV = "Listings.csv"
HOUSE_FILE_NAME_TXT = "Listings.txt"

def generate_house_list():
    question = """
    Generate 11 houses which are currently for sale in the US market, 
    earch house should include these properties: 
    title, 
    number of bedrooms,
    number of bathrooms,
    number of garadges
    price (integer in USD), 
    size (integer in squre feet), 
    description of the house with at least 200 words, 
    neighborhood, 
    description of neighborhood with at least 100 words
    """
    
    structured_llm = llm.with_structured_output(RealEstate, method="json_mode")
    listings = structured_llm.invoke(question + "\n\n" + parser.get_format_instructions())
    return listings

def save_house_list_to(file_name: str, list_dic): 
    df = pd.DataFrame.from_dict(list_dic)
    df.to_csv(file_name)


# if the house list file is not there, generate it from LLM
if not os.path.isfile(HOUSE_FILE_NAME_CSV):
    houses = generate_house_list()
    save_house_list_to(HOUSE_FILE_NAME_CSV, houses["houses"])


print("House data source is ready!")


House data source is ready!



## Step 2: Semantic Search

In this section the AI agent will search the vector database and return the best match based on user's preference.

In [40]:

# Step 2.1: Create a vector database from the house list

from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings

# Name for the database
DATABASE_FILE = "listings_chroma_db"

# Create database from the datasource if not exists
if len(os.listdir(DATABASE_FILE)) == 0:
    loader = CSVLoader(file_path = HOUSE_FILE_NAME_CSV)
    docs = loader.load()

    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    split_docs = splitter.split_documents(docs)

    db = Chroma.from_documents(split_docs, OpenAIEmbeddings(), persist_directory= DATABASE_FILE)

# Database already exist, just load it 
else:
    db = Chroma(persist_directory=DATABASE_FILE, embedding_function = OpenAIEmbeddings())

print(db)


<langchain_community.vectorstores.chroma.Chroma object at 0x123e43460>


In [77]:

# Step 2.2: Prepare User preferences

# questions for the user
agent_questions = [
    "How many bedrooms would you like?", 
    "How many bathrooms do you prefer?",
    "How many garage do you prefer?",
    "What is your budget?", 
    "How big do you want your house to be?",
    "Which amenities would you like?",
    "How urban do you want your neighborhood to be?",
    "Which transportation options are important to you?",
]

# user's anwsers 
user_anwsers = [
    "between 3 and 5",
    "At least two",
    "At least one", 
    "Between 500,000 and 800,000 USD",
    "At least 2000 sqare feet", 
    "A backyard for gardening and a modern, energy-efficient heating system.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.", 
    "Easy access to a reliable public transport will be a plus but not essential, away from busy vehicle roads and close to bike-friendly roads are preferrable",
]

assert (len(agent_questions) == len(user_anwsers)) 

# Combine all the questions and anwsers
seperator = "\n\n###\n\n"
qa = seperator
for i in range(len(agent_questions)):
    question = agent_questions[i]
    anwser = user_anwsers[i]
    qa += "Question: " + question + "\nAnwser: " + anwser + seperator

print(qa)

print("User preferences are ready!")




###

Question: How many bedrooms would you like?
Anwser: between 3 and 5

###

Question: How many bathrooms do you prefer?
Anwser: At least two

###

Question: How many garage do you prefer?
Anwser: At least one

###

Question: What is your budget?
Anwser: Between 500,000 and 800,000 USD

###

Question: How big do you want your house to be?
Anwser: At least 2000 sqare feet

###

Question: Which amenities would you like?
Anwser: A backyard for gardening and a modern, energy-efficient heating system.

###

Question: How urban do you want your neighborhood to be?
Anwser: A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

###

Question: Which transportation options are important to you?
Anwser: Easy access to a reliable public transport will be a plus but not essential, away from busy vehicle roads and close to bike-friendly roads are preferrable

###


User preferences are ready!


In [78]:

# Step 2.3: Find the best house based from the vector database based on user's anwsers

from langchain.chains import RetrievalQA

# prepare the prompts
query = """

A user is looking for a dream house to buy and has anwsered some questions related to his preference.

Based on the following anwsers from the user, please suggest the best house for this user and explain why.

Here are the questions and anwsers:

"""

prompt = query + qa

# Ask LLM to find the best match
rag = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
result = rag.invoke(prompt)


In [79]:

# Inspect the LLM response

print("Origin Query: ")
print(result['query'])

print("AI suggested the best house: ")
print(result['result'])


Origin Query: 


A user is looking for a dream house to buy and has anwsered some questions related to his preference.

Based on the following anwsers from the user, please suggest the best house for this user and explain why.

Here are the questions and anwsers:



###

Question: How many bedrooms would you like?
Anwser: between 3 and 5

###

Question: How many bathrooms do you prefer?
Anwser: At least two

###

Question: How many garage do you prefer?
Anwser: At least one

###

Question: What is your budget?
Anwser: Between 500,000 and 800,000 USD

###

Question: How big do you want your house to be?
Anwser: At least 2000 sqare feet

###

Question: Which amenities would you like?
Anwser: A backyard for gardening and a modern, energy-efficient heating system.

###

Question: How urban do you want your neighborhood to be?
Anwser: A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

###

Question: Which transportation options are important


## Step 3: Augmented Response 

In this section, we will first create a conversation history from the chat above between human and the AI Agent. 
Then ask LLM to generate a better description for the selected house.


In [71]:

# Step 3.1: Create a chat history to include all context information 

from typing import Any, Dict, Optional, Tuple
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory

# Summary of the conversations
summary_memory = ConversationSummaryMemory(
    llm=llm, 
    memory_key="recommendation_summary",
    input_key= "input",
    buffer="The AI has found the best house based on the human's preference, with explainations of why that was the best selection.", 
    return_messages=True)

# History of the conversation
history = ChatMessageHistory()
history.add_ai_message("You are a Real Estate agent who is familiar with the housing market in the US.")
history.add_user_message(result['query'])
history.add_ai_message(result['result'])

class MementoBufferMemory(ConversationBufferMemory):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        input_str, output_str = self._get_input_output(inputs, outputs)
        self.chat_memory.add_ai_message(output_str)

conversational_memory = MementoBufferMemory(
    chat_memory = history, 
    memory_key = "questions_and_answers",
    input_key= "input"
)

# Create memories for the new conversation
memory = CombinedMemory(memories=[conversational_memory, summary_memory])



In [72]:

# Step 3.2: Ask LLM to write a new description for the selected house

from langchain.chains import ConversationChain

RECOMMENDER_TEMPLATE = """The following is a friendly conversation between a human (the user) and an AI Read Estate Agent. 
                          The AI Agent has found the best house based on the user's preference. 

Summary of Recommendations:
{recommendation_summary}
Personal Questions and Answers:
{questions_and_answers}
User: {input}
AI:"""

PROMPT = PromptTemplate(
    input_variables=["recommendation_summary", "input", "questions_and_answers"],
    template=RECOMMENDER_TEMPLATE
)

recommender = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

user_input = """
Please write a new description for the selected house to make it more appeal to the user. 
The AI must use the existing factors of the selected house and not to add or makeup any new properties for the house.
"""

# Fire LLM to get the result
prediction = recommender.predict(input = user_input)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human (the user) and an AI Read Estate Agent. 
                          The AI Agent has found the best house based on the user's preference. 

Summary of Recommendations:
[SystemMessage(content="The AI has found the best house to meet the human's preference, with explainations of why that was the best selection.")]
Personal Questions and Answers:
AI: You are a Real Estate agent who is familiar with the housing market in the US.
Human: 

A user is looking for a dream house to buy and has anwsered some questions related to his preference.

Based on the following anwsers from the user, please suggest the best house for this user and explain why.

Here are the questions and anwsers:



###

Question: How many bedrooms would you like?
Anwser: between 3 and 5

###

Question: How many bathrooms do you prefer?
Anwser: At least two

###

Question: How ma

In [80]:
# Show the new description
prediction


'Introducing the "Spacious Family Home" in the Family-Friendly Neighborhood, a perfect blend of comfort and convenience for your dream living experience. This 5-bedroom, 4-bathroom haven boasts a generous 3500 square feet of space, ensuring ample room for your family to grow and thrive. With a 3-car garage and a backyard play area for gardening enthusiasts, this home offers both practicality and leisure in one package.\n\nStep into the modern kitchen with an island, perfect for culinary adventures, and enjoy the benefits of an energy-efficient heating system that keeps you cozy all year round. Priced at 500,000 USD, this home fits comfortably within your budget range of 500,000 to 800,000 USD, making it a smart investment for your future.\n\nLocated in a Family-Friendly Neighborhood renowned for its excellent schools, parks, and community events, this home offers the ideal balance between suburban tranquility and urban amenities. Safe streets and recreational facilities make it a bike-