This is a starter notebook for the project. You'll need to import the libraries you use; the ones available in this workspace are listed in its requirements.txt file.

In [1]:
! pip install jq

Defaulting to user installation because normal site-packages is not writeable


In [11]:
import os

os.environ["OPENAI_API_KEY"] = "API_KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
import json

from langchain.document_loaders import JSONLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory
from typing import Any, Dict, Optional, Tuple
from langchain.chains import ConversationChain


In [12]:
model_name = "gpt-3.5-turbo"
temperature = 0.0
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens = 3000)

instruction = """
Generate at least 10 real estate listings for various types of properties. Each listing should include the following details:
Neighborhood: A well-known or fictional neighborhood where the property is located.
Price: The listing price of the property in USD.
Bedrooms: The number of bedrooms in the property.
Bathrooms: The number of bathrooms in the property.
Size: The total square footage of the property.
Description: A compelling and engaging description of the property, highlighting its key features, amenities, and unique selling points.
Neighborhood Description: A brief overview of the neighborhood, including notable landmarks, nearby attractions, schools, and the general atmosphere.
Ensure that each listing varies in property type (e.g., single-family homes, apartments, condos, townhouses, luxury estates, vacation homes) and location (urban, suburban, rural).
return the response as a JSON array.
"""

## Synthetic Data Generation
Criteria	
- Generating Real Estate Listings with an LLM

Submission Requirements
- The submission must demonstrate using a Large Language Model (LLM) to generate at least 10 diverse and realistic real estate listings containing facts about the real estate.
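
Because the model's reply is free text, it can help to check it before relying on it. The following is a minimal sketch, not part of the original notebook: `validate_listings` and `REQUIRED_FIELDS` are hypothetical names, and the field list assumes the model uses the keys named in the instruction prompt verbatim.

```python
import json

# Assumed field names, taken from the keys requested in the generation prompt.
REQUIRED_FIELDS = [
    "Neighborhood", "Price", "Bedrooms", "Bathrooms",
    "Size", "Description", "Neighborhood Description",
]

def validate_listings(raw_json, min_count=10):
    """Parse the model's reply and return a list of problems (empty if none)."""
    try:
        listings = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(listings, list):
        return ["top-level value is not a JSON array"]
    problems = []
    if len(listings) < min_count:
        problems.append(f"expected at least {min_count} listings, got {len(listings)}")
    for index, listing in enumerate(listings):
        if not isinstance(listing, dict):
            problems.append(f"listing {index}: not an object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in listing]
        if missing:
            problems.append(f"listing {index}: missing {missing}")
    return problems
```

Calling `validate_listings(res)` on the raw reply before saving it would surface a malformed or short response early, when regenerating is still cheap.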

In [None]:
neighborhood1 = "Green Oaks"
price1 = "$800,000"
bedrooms1 = 3
bathrooms1 = 2
size1 = "2,000 sqft"
description1 = """Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem."""
neighborhood_description1 = """Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."""

example_prompt = PromptTemplate(input_variables=["neighborhood", "price", "bedrooms", "bathrooms", "size", "description","neighborhood_description"], 
                                template="{neighborhood}\n{price}\n{bedrooms}\n{bathrooms}\n{size}\n{description}\n{neighborhood_description}")
examples = [ 
    {
        "neighborhood":neighborhood1,
        "price":price1,
        "bedrooms":bedrooms1,
        "bathrooms":bathrooms1,
        "size":size1,
        "description":description1,
        "neighborhood_description": neighborhood_description1,
    }
]

cot_prompt = FewShotPromptTemplate(
    examples=examples, 
    example_prompt=example_prompt, 
    suffix="Use this example to generate the correct set of real estate listings: {input}", 
    input_variables=["input"]
)

cot_text = cot_prompt.format(input=instruction)
print("=== Few-Shot Prompt ===")
print(cot_text)

In [None]:
print("=== Few-Shot Answer ===")

res=llm(cot_text)

In [None]:
print(res)

Save the results to a file so I don't have to regenerate the data every time I work on this notebook, which saves on API costs.

In [None]:

with open("listings.json", "w") as file:
    json.dump(json.loads(res), file, indent=4)

print("File saved successfully as listings.json")
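
One way to make the cost saving automatic is a load-or-generate helper. This is only a sketch under assumptions: `load_or_generate` is a hypothetical name, and `generate` is assumed to be a zero-argument callable that returns the model's reply as a JSON string.

```python
import json
import os

def load_or_generate(path, generate):
    """Return cached listings from `path`; call `generate()` only on a cache miss."""
    if os.path.exists(path):
        with open(path, "r") as file:
            return json.load(file)
    # generate() is assumed to return a JSON string, e.g. the raw LLM reply.
    listings = json.loads(generate())
    with open(path, "w") as file:
        json.dump(listings, file, indent=4)
    return listings
```

In this notebook it could be called as `listings = load_or_generate("listings.json", lambda: llm(cot_text))`, so the API is hit only when the file is missing.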

## Semantic Search p1
Criteria	
- Creating a Vector Database and Storing Listings

Submission Requirements
- The project must demonstrate the creation of a vector database and successfully storing real estate listing embeddings within it. The database should effectively store and organize the embeddings generated from the LLM-created listings.
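
The cells below embed each listing as its raw JSON string. As a possible alternative, a listing could first be rendered as plain "Key: value" text before embedding. This is a sketch of that idea only; `listing_to_text` is a hypothetical helper, and whether it improves retrieval here is untested.

```python
def listing_to_text(listing):
    """Render one listing dict as 'Key: value' lines, in the dict's own order."""
    return "\n".join(f"{key}: {value}" for key, value in listing.items())

example = {"Neighborhood": "Brooklyn Heights", "Price": "$1,200,000", "Bedrooms": 4}
print(listing_to_text(example))
```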

In [4]:
# Load the JSON file
with open("listings.json", "r") as file:
    listings = json.load(file)

# Print the loaded data
print(json.dumps(listings, indent=4))  # Pretty print the JSON data

[
    {
        "Neighborhood": "Brooklyn Heights",
        "Price": "$1,200,000",
        "Bedrooms": 4,
        "Bathrooms": 3,
        "Size": "2,500 sqft",
        "Description": "Stunning brownstone in the heart of Brooklyn Heights with original details and modern upgrades. This 4-bedroom, 3-bathroom home features a chef's kitchen, rooftop terrace, and spacious backyard.",
        "Neighborhood Description": "Brooklyn Heights is a historic neighborhood known for its tree-lined streets, charming brownstones, and proximity to Brooklyn Bridge Park."
    },
    {
        "Neighborhood": "Beverly Hills",
        "Price": "$5,500,000",
        "Bedrooms": 6,
        "Bathrooms": 7,
        "Size": "8,000 sqft",
        "Description": "Luxurious estate in Beverly Hills with a pool, tennis court, and guest house. This 6-bedroom, 7-bathroom home offers privacy and elegance in a prestigious neighborhood.",
        "Neighborhood Description": "Beverly Hills is synonymous with luxury living, 

In [13]:
loader = JSONLoader(file_path='./listings.json', jq_schema='.[]', text_content=False)  # Adjust jq_schema as needed
docs = loader.load()

print(docs)

[Document(page_content='{"Neighborhood": "Brooklyn Heights", "Price": "$1,200,000", "Bedrooms": 4, "Bathrooms": 3, "Size": "2,500 sqft", "Description": "Stunning brownstone in the heart of Brooklyn Heights with original details and modern upgrades. This 4-bedroom, 3-bathroom home features a chef\'s kitchen, rooftop terrace, and spacious backyard.", "Neighborhood Description": "Brooklyn Heights is a historic neighborhood known for its tree-lined streets, charming brownstones, and proximity to Brooklyn Bridge Park."}', metadata={'source': '/workspace/listings.json', 'seq_num': 1}), Document(page_content='{"Neighborhood": "Beverly Hills", "Price": "$5,500,000", "Bedrooms": 6, "Bathrooms": 7, "Size": "8,000 sqft", "Description": "Luxurious estate in Beverly Hills with a pool, tennis court, and guest house. This 6-bedroom, 7-bathroom home offers privacy and elegance in a prestigious neighborhood.", "Neighborhood Description": "Beverly Hills is synonymous with luxury living, upscale shopping

In [14]:
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings()

db = Chroma.from_documents(split_docs, embeddings)

## Semantic Search p2
Criteria
- Semantic Search of Listings Based on Buyer Preferences

Submission Requirements
- The application must include a functionality where listings are semantically searched based on given buyer preferences. The search should return listings that closely match the input preferences.
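
To make "semantically searched" concrete, here is a toy sketch of ranking stored vectors against a query vector by cosine similarity. The three-dimensional vectors and labels are invented for illustration; real embeddings have far more dimensions, and the vector store may use a different distance metric internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    """stored: list of (name, vector) pairs. Returns the k most similar names."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy vectors, made up for this example only.
stored = [
    ("suburban family home", [0.9, 0.1, 0.2]),
    ("downtown loft", [0.1, 0.9, 0.3]),
    ("rural cabin", [0.3, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], stored, k=2))
```

The query vector points mostly along the first axis, so the listing whose vector does the same ranks first.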

### Buyer Preferences

In [15]:
summary_prompt = PromptTemplate(
    input_variables=["questions", "answers"],
    template="""
Given the following user responses to house preference questions, generate a concise summary.

Questions:
{questions}

Answers:
{answers}

Summary:
""",
)

# Create LLMChain
summary_chain = LLMChain(llm=llm, prompt=summary_prompt)

# Prepare input
questions = [
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]

answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

# Generate summary
summary = summary_chain.run({"questions": questions, "answers": answers})

# Output result
print(summary)

The user prefers a comfortable three-bedroom house with a spacious kitchen and cozy living room. Their top priorities are a quiet neighborhood, good local schools, and convenient shopping options. Desired amenities include a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system. Important transportation options include easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads. They prefer a neighborhood that offers a balance between suburban tranquility and access to urban amenities like restaurants and theaters.
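
Note that the cell above passes Python lists into the template, so the prompt contains their list representation (brackets and quotes). A possible refinement, sketched here with a hypothetical `format_qa` helper, is to pair each question with its answer as plain text first. Using it would also require a template with a single placeholder for the combined text, which is an assumption and not what the original cell does.

```python
def format_qa(questions, answers):
    """Pair each question with its answer as numbered 'Q/A' lines."""
    lines = []
    for number, (question, answer) in enumerate(zip(questions, answers), start=1):
        lines.append(f"Q{number}: {question}")
        lines.append(f"A{number}: {answer}")
    return "\n".join(lines)

print(format_qa(["How big do you want your house to be?"], ["Three bedrooms."]))
```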


##  Personalizing Listing Descriptions

In [16]:
query = """You are a real estate agent. For each listing, augment the description, tailoring it to resonate with the buyer’s specific preferences by subtly emphasizing aspects of the property that align with what the buyer is looking for, based on these preferences: {} 
""".format(summary)

task = """
- Personalize each listing description based on the buyer's preferences.
- Highlight features that align with their needs while keeping the information factual.
- Ensure the tone is engaging and appealing without misrepresenting the property."""

similar_docs = db.similarity_search(query, k=5)
prompt = PromptTemplate(
    template="{query}\nContext: {context}\nTask:{task}",
    input_variables=["query", "context","task"],)
chain = load_qa_chain(llm, prompt = prompt, chain_type="stuff")
llm_resp=chain.run(input_documents=similar_docs, query = query, task=task)

In [17]:
# print("These are the top 5 listings similar to what you are looking for:")
# for doc in similar_docs:
#     print(doc.page_content)

print("\n__________________________________________________________________\n")
print("Our suggestion:")
print(llm_resp)


__________________________________________________________________

Our suggestion:
1. **Georgetown Townhouse:**
   Historic townhouse in Georgetown with a private garden and original details. This 3-bedroom, 2-bathroom home features a gourmet kitchen, hardwood floors, and a cozy fireplace. Perfect for those seeking a comfortable three-bedroom house with a spacious kitchen and cozy living room. Nestled in the quiet neighborhood of Georgetown, known for its upscale shops and historic architecture. Enjoy the convenience of nearby shopping options and good local schools. The backyard is ideal for gardening, and the two-car garage provides ample storage. Stay cozy with the modern, energy-efficient heating system. Easy access to a reliable bus line and major highway, with bike-friendly roads for your convenience. Experience suburban tranquility with access to urban amenities like restaurants and theaters in this prestigious enclave.

2. **Soho Loft-Style Apartment:**
   Loft-style apartmen

In [18]:
max_rating = 100

history = ChatMessageHistory()
history.add_user_message(f"""You are an AI Real Estate Agent that will recommend listings to the user based on their answers to personal questions. Ask the user {len(questions)} questions.""")
for i in range(len(questions)):
    history.add_ai_message(questions[i])
    history.add_user_message(answers[i])
    
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="recommendation_summary", 
    input_key="input",
    buffer=f"The human answered {len(questions)} personal questions. Use them to rate, from 1 to {max_rating}, how much they like a listing they describe to you.",
    return_messages=True)

# you could choose to store some of the q/a in memory as well, in addition to original questions
class MementoBufferMemory(ConversationBufferMemory):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        input_str, output_str = self._get_input_output(inputs, outputs)
        self.chat_memory.add_ai_message(output_str)
    
conversational_memory = MementoBufferMemory(
    chat_memory=history,
    memory_key="questions_and_answers", 
    input_key="input"
)

# Combined
memory = CombinedMemory(memories=[conversational_memory, summary_memory])
RECOMMENDER_TEMPLATE = """The following is a friendly conversation between a human and an AI Real Estate Agent. 
                        The AI follows human instructions and provides real estate ratings for a human based on the listing information
                        and the human's answers to questions. 

Summary of Recommendations:
{recommendation_summary}
Personal Questions and Answers:
{questions_and_answers}
Human: {input}
AI:"""
PROMPT = PromptTemplate(
    input_variables=["recommendation_summary", "input", "questions_and_answers"],
    template=RECOMMENDER_TEMPLATE
)
recommender = ConversationChain(llm=llm, verbose=False, memory=memory, prompt=PROMPT)
 
for doc in similar_docs:
    # Embed only the listing text (not the Document repr with its metadata) in the prompt.
    plot_rating_instructions = f"""
        === START listing ===
        {doc.page_content}
        === END listing ===
        =====================================

        RATING INSTRUCTIONS THAT MUST BE STRICTLY FOLLOWED:
        AI will provide a highly personalized rating based only on the listing information and the summary provided by the user, as well as the user's answers to questions included within the context.
        AI should be highly sensitive to the user's personal preferences captured in their responses and should not be influenced by any other factors.
        AI will also construct a persona for the user based on their answers to questions and use this persona to rate the listing.
        OUTPUT FORMAT:
            First, include the persona you have developed in the explanation for the rating. Describe the persona in a few sentences.
            Explain how the user's preferences, as captured in their answers to personal questions, influenced the creation of this persona.
            Additionally, consider other ratings for this user that you might have, as they might provide more information about the user's preferences.
            Your goal is to provide a rating that closely aligns with the rating the user would give to this listing.
            Remember that the user has limited time and wants to see listings they will appreciate, so your rating should be as accurate as possible.
            Ratings will range from 1 to {max_rating}, with {max_rating} meaning the user will love it, and 1 meaning the user will dislike it.
            Include a logical explanation for your rating based on the persona you've built and the user's responses to questions.
            - Personalize each listing description based on the buyer's preferences.
            - Highlight features that align with their needs while keeping the information factual.
            - Ensure the tone is engaging and appealing without misrepresenting the property.

            YOUR REVIEW MUST END WITH THE TEXT: "RATING for this Listing is " FOLLOWED BY THE RATING.
            FOLLOW THE INSTRUCTIONS STRICTLY, OTHERWISE, THE USER MAY NOT UNDERSTAND YOUR REVIEW.
        """
    prediction = recommender.predict(input=plot_rating_instructions)
    print(prediction)
    print("___________________________________________________________________________________")
final_recommendation = """Now that AI has rated all the listings, AI will recommend to the human the one that the human will like the most. 
                            AI will respond with a property recommendation and a short explanation of why the human will like it over all other properties. 
                            AI will not include any ratings in the explanation, only the reasons why the human will like it the most.
                            However, the property you pick must be one of the properties you rated the highest.
                            For example, if you rated one property 65 and another 60, you will recommend the property rated 65 because a rating of 65 
                            is greater than a rating of 60."""
prediction = recommender.predict(input=final_recommendation)
print(prediction)

Based on the user's preferences, I have developed a persona of someone who values a comfortable and spacious home in a quiet neighborhood with good local schools and convenient shopping options. They also prioritize amenities such as a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system. Transportation options are important to them, including easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads. They prefer a balance between suburban tranquility and access to urban amenities like restaurants and theaters.

Considering the listing in Georgetown, which features a historic townhouse with a private garden, a gourmet kitchen, hardwood floors, and a cozy fireplace, I believe this property aligns well with the user's preferences. The neighborhood of Georgetown offers charm, historic architecture, and upscale shops, providing a mix of tranquility and access to urban amenities.

Overall, based on the user's persona and pre
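
Since each review is instructed to end with "RATING for this Listing is " followed by a number, the rating could also be extracted in code rather than relying on the model to remember its own scores. A rough sketch, assuming the model follows that output format; `extract_rating` and `best_listing` are hypothetical helpers, not part of the notebook.

```python
import re

# Matches the closing phrase required by the rating instructions.
RATING_PATTERN = re.compile(r"RATING for this Listing is\s*(\d+)", re.IGNORECASE)

def extract_rating(review):
    """Return the last rating stated in a review as an int, or None if absent."""
    matches = RATING_PATTERN.findall(review)
    return int(matches[-1]) if matches else None

def best_listing(reviews):
    """reviews: dict mapping a listing name to its review text."""
    rated = {name: extract_rating(text) for name, text in reviews.items()}
    rated = {name: score for name, score in rated.items() if score is not None}
    return max(rated, key=rated.get) if rated else None
```

Collecting each `prediction` in a dict keyed by neighborhood and passing it to `best_listing` would give a deterministic pick to compare against the model's final recommendation.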

In [19]:
! tar -czvf HomeMatch.tar.gz *


HomeMatch.ipynb
HomeMatch.py
HomeMatch.tar.gz
README.md
listings.json
requirements.txt
