# Step 1: Setting Up the Python Application.

In [80]:
#Install libraries.
!pip install pandas
!pip install chromadb
!pip install -U --quiet sentence-transformers==2.5.1 transformers==4.36.0 
!pip freeze | grep transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable


sentence-transformers==2.5.1
transformers==4.36.0


In [81]:
#Import libraries.
import os
import pandas as pd
import numpy as np
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain.chains import LLMChain
from pydantic import BaseModel
from typing import List
from typing import Union
from fastapi.encoders import jsonable_encoder
import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer

In [82]:
#Access the OpenAI API. Replace the empty string with your own API key, and avoid sharing a notebook that contains a real key.
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"
os.environ["OPENAI_API_KEY"] = ""
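
Pasting a key into a notebook makes it easy to leak when the notebook is shared. As an alternative, here is a minimal sketch that reads the key from the environment and fails early if it is absent. It assumes the variable was exported in the shell before the notebook was started; the helper name `require_env` is hypothetical and not part of any library.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook.")
    return value

# Hypothetical usage: fail fast if the key was never exported.
# api_key = require_env("OPENAI_API_KEY")
```

Failing at this point gives a clearer error than an authentication failure several cells later.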

# Step 2: Generating Real Estate Listings.

In [83]:
#Apply few-shot prompting to generate real estate listings with an LLM.
#This is the template for one example.
example_template = PromptTemplate(input_variables=["listing", "description", "neighborhood"], 
                                template="{listing}\n{description}\n{neighborhood}")

#Example one.
listing1 = """
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft
"""

description1 = """
Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.
"""

neighborhood1 = """
Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

#Example two.
listing2 = """
Neighborhood: Newland
Price: $150,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,500 sqft
"""

description2 = """
Description: You'd love this spacious (2,500 sqft) three-bedroom house in the Bohemian neighborhood of Newland. For the low price of $150,000, you can purchase this stylish house with two modern bathrooms. Multiple residents have died violently in the kitchen, including a father who was stabbed by his own daughter. If you die next, please replace the washing machine.
"""

neighborhood2 = """
Neighborhood Description: Newland is a Bohemian, vibrant, and ethnically diverse region. The community is full of stupid and violent people. Easy access to drugs, gangs, and toilets. Take a piss in the nearby cemetery or vandalise in the Tesco. Crime rates are high. You should move here so that you can get into trouble.
"""

#Combine the two examples.
examples = [ 
    {
        "listing": listing1,
        "description": description1,
        "neighborhood": neighborhood1
    },
    {
        "listing": listing2,
        "description": description2,
        "neighborhood": neighborhood2
    }    
]
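
To make the mechanics concrete, here is a plain-Python approximation of what a few-shot prompt amounts to: each example is rendered with the per-example template, and the rendered examples are joined with a separator and followed by a suffix. This is only a sketch. The function `render_few_shot` and the toy strings are illustrative, and LangChain's own implementation may differ in details.

```python
from typing import Dict, List

def render_few_shot(examples: List[Dict[str, str]],
                    example_template: str,
                    suffix: str,
                    separator: str = "\n\n") -> str:
    """Render each example with the template, then join the examples and the suffix."""
    rendered = [example_template.format(**example) for example in examples]
    return separator.join(rendered + [suffix])

# Toy data standing in for the two listings above.
toy_examples = [
    {"listing": "Price: $1", "description": "Description: A", "neighborhood": "Neighborhood Description: X"},
    {"listing": "Price: $2", "description": "Description: B", "neighborhood": "Neighborhood Description: Y"},
]
toy_prompt = render_few_shot(
    toy_examples,
    example_template="{listing}\n{description}\n{neighborhood}",
    suffix="Generate 10 more listings.",
)
print(toy_prompt)
```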

In [84]:
#This is the query that will become the final part of the prompt.
instruction = """
Generate 10 diverse and realistic real estate listings containing facts about the real estate.
"""

In [85]:
#Pydantic model that defines the schema for a single RE listing.
class Listing(BaseModel):
    neighborhood: str
    price: int
    bedrooms: int
    bathrooms: int
    size: int
    description: str
    neigh_description: str

#Pydantic model that defines a list of RE listings (instances of class Listing).
class ListingCollection(BaseModel):
    listings: List[Listing]
        
#The parser parses the LLM's output (10 RE listings) into a structured Python object (instance of class ListingCollection).
parser = PydanticOutputParser(pydantic_object=ListingCollection)
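
The parser's job can be illustrated without LangChain: take the JSON text the model returns, check that every listing has the expected fields, and coerce the numeric ones to integers. The sketch below uses only the standard library as a stand-in for the Pydantic validation. The names `ListingRecord` and `parse_listings`, as well as the sample JSON, are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List

TEXT_FIELDS = ("neighborhood", "description", "neigh_description")
INT_FIELDS = ("price", "bedrooms", "bathrooms", "size")

@dataclass
class ListingRecord:
    neighborhood: str
    price: int
    bedrooms: int
    bathrooms: int
    size: int
    description: str
    neigh_description: str

def parse_listings(raw_json: str) -> List[ListingRecord]:
    """Parse {"listings": [...]} and fail loudly on missing or non-numeric fields."""
    payload = json.loads(raw_json)
    records = []
    for item in payload["listings"]:
        missing = [name for name in TEXT_FIELDS + INT_FIELDS if name not in item]
        if missing:
            raise ValueError(f"Listing is missing fields: {missing}")
        values = {name: str(item[name]) for name in TEXT_FIELDS}
        # int() raises ValueError for values such as "$800,000".
        values.update({name: int(item[name]) for name in INT_FIELDS})
        records.append(ListingRecord(**values))
    return records

sample = json.dumps({"listings": [{
    "neighborhood": "Sunnyvale", "price": "500000", "bedrooms": 4, "bathrooms": 3,
    "size": 2200, "description": "d", "neigh_description": "n",
}]})
parsed = parse_listings(sample)
print(parsed[0])
```

Declaring `price` and `size` as integers is what forces the model to return bare numbers rather than strings such as "$800,000" or "2,000 sqft".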

In [86]:
#Combine different parts into a single prompt.
prompt = FewShotPromptTemplate(
    examples=examples, 
    example_prompt=example_template, 
    suffix="Use these examples to generate a response to the problem below: \n\n{input}\n\n{format_instructions}", 
    input_variables=["input"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
prompt_text = prompt.format(input=instruction)
print("=== Real Estate Listings Prompt ===")
print(prompt_text)

=== Real Estate Listings Prompt ===

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft


Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.


Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting i

In [87]:
#Pass the prompt to an LLM to get a response.
model_name = "gpt-3.5-turbo"
temperature = 0.3
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens = 2000)
RElistings_LLM = llm(prompt_text)



In [88]:
#Parse the LLM's responses into the schema defined in Listing and store parsed responses in a list.
result = parser.parse(RElistings_LLM)
result.listings

[Listing(neighborhood='Sunnyvale', price=500000, bedrooms=4, bathrooms=3, size=2200, description='Beautiful 4-bedroom, 3-bathroom home in the family-friendly neighborhood of Sunnyvale. This spacious house features a large backyard perfect for entertaining, as well as a cozy fireplace in the living room.', neigh_description='Sunnyvale is known for its top-rated schools, safe streets, and friendly neighbors. Enjoy easy access to parks, shopping centers, and restaurants in this desirable community.'),
 Listing(neighborhood='Downtown Loft District', price=700000, bedrooms=2, bathrooms=2, size=1800, description='Modern 2-bedroom, 2-bathroom loft in the heart of the Downtown Loft District. This industrial-style unit features exposed brick walls, high ceilings, and a rooftop deck with city views.', neigh_description='Live in the center of the action with easy access to trendy restaurants, art galleries, and nightlife. Walk to work or take public transportation from this vibrant urban neighbor

In [89]:
#Store the LLM-generated listings in a dataframe.
df = pd.DataFrame(jsonable_encoder(result.listings))
df.to_csv('RElistings_LLM.csv')
df.head()

Unnamed: 0,neighborhood,price,bedrooms,bathrooms,size,description,neigh_description
0,Sunnyvale,500000,4,3,2200,"Beautiful 4-bedroom, 3-bathroom home in the fa...","Sunnyvale is known for its top-rated schools, ..."
1,Downtown Loft District,700000,2,2,1800,"Modern 2-bedroom, 2-bathroom loft in the heart...",Live in the center of the action with easy acc...
2,Lakefront Estates,1200000,5,4,3500,"Luxurious 5-bedroom, 4-bathroom estate on the ...",Experience waterfront living at its finest in ...
3,Historic Old Town,300000,2,1,1500,"Charming 2-bedroom, 1-bathroom cottage in the ...",Step back in time in this picturesque neighbor...
4,Mountain View Heights,850000,3,2,2000,"Inviting 3-bedroom, 2-bathroom home in the des...","Live close to nature with hiking trails, parks..."
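
One detail matters when the CSV is read back later: by default `to_csv` writes the DataFrame index as a header-less first column, and `read_csv` then loads that column under the name `Unnamed: 0`. A small sketch with made-up toy data shows the effect and how `index=False` avoids it.

```python
import io
import pandas as pd

toy = pd.DataFrame({"neighborhood": ["A", "B"], "price": [1, 2]})

# Default: the index is written as an extra, header-less column.
buffer_with_index = io.StringIO()
toy.to_csv(buffer_with_index)
buffer_with_index.seek(0)
with_index = pd.read_csv(buffer_with_index)

# index=False: only the real columns are written.
buffer_without_index = io.StringIO()
toy.to_csv(buffer_without_index, index=False)
buffer_without_index.seek(0)
without_index = pd.read_csv(buffer_without_index)

print(list(with_index.columns))
print(list(without_index.columns))
```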


# Step 3: Storing Listings in a Vector Database.

In [92]:
#Create a vector database.
client = chromadb.Client(Settings(persist_directory="./database"))
collection = client.create_collection("RElistings_db")

In [93]:
#Select an embedding model and load the dataframe with 10 listings.
MODEL_NAME = 'paraphrase-MiniLM-L6-v2'
df = pd.read_csv("RElistings_LLM.csv")

In [94]:
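#Load the embedding model once so that it is not re-initialized for every listing.
embedding_model = SentenceTransformer(MODEL_NAME)

#This function creates embeddings for textual data (a single string or a list of strings).
def generate_embeddings(input_data: Union[str, List[str]]) -> np.ndarray:
    return embedding_model.encode(input_data)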

for index, row in df.iterrows():
    listing_id = str(index)  #Turn a listing's ID into a string.
    description = row['description'] #Extract a listing's description.
    embedding = generate_embeddings(description) #Embed the extracted description.
    #Add the listing to the vector database.
    collection.add(
        documents=[description],
        metadatas=[row.to_dict()],
        ids=[listing_id],
        embeddings=[embedding.tolist()]
    )

collection.get(ids=["7", "8", "9"])



{'ids': ['7', '8', '9'],
 'embeddings': None,
 'metadatas': [{'Unnamed: 0': 7,
   'bathrooms': 1,
   'bedrooms': 1,
   'description': 'Artistic 1-bedroom, 1-bathroom loft in the vibrant Downtown Arts District. This creative space features exposed ductwork, concrete floors, and a gallery wall for showcasing artwork.',
   'neigh_description': 'Immerse yourself in the arts scene with galleries, theaters, and studios just steps away in the Downtown Arts District. Live among fellow artists and creatives in this dynamic urban neighborhood.',
   'neighborhood': 'Downtown Arts District',
   'price': 400000,
   'size': 1000},
  {'Unnamed: 0': 8,
   'bathrooms': 2,
   'bedrooms': 3,
   'description': 'Cozy 3-bedroom, 2-bathroom home in the family-friendly Suburban Meadows neighborhood. This well-maintained house features a fenced backyard, updated appliances, and a two-car garage.',
   'neigh_description': 'Settle down in this quiet suburban community with parks, schools, and shopping centers ne

# Step 4: Building the User Preference Interface.

In [95]:
#Simulate the process of collecting user preferences by hard-coding.

questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?"
]

answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

#Combine the provided questions and answers into a simulated conversation (a list of dictionaries).
simulation = [{"question": q, "answer": a} for q, a in zip(questions, answers)]

#Convert the above list of dictionaries into one string.
preference = "\n".join([f"Q: {item['question']}\nA: {item['answer']}" for item in simulation])
print(preference)

Q: How big do you want your house to be?
A: A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
Q: What are the 3 most important things for you in choosing this property?
A: A quiet neighborhood, good local schools, and convenient shopping options.
Q: Which amenities would you like?
A: A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
Q: Which transportation options are important to you?
A: Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
Q: How urban do you want your neighborhood to be?
A: A balance between suburban tranquility and access to urban amenities like restaurants and theaters.
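
The answers above are hard-coded. For a non-simulated run, the same `Q:`/`A:` string could be built by asking each question in turn. The sketch below is one way to do that. The helper `collect_preferences` and its `ask` parameter are hypothetical: `ask` defaults to the built-in `input`, and a stand-in callable can be passed for testing.

```python
from typing import Callable, List

def collect_preferences(questions: List[str], ask: Callable[[str], str] = input) -> str:
    """Ask each question and return the Q/A pairs as one newline-separated string."""
    lines = []
    for question in questions:
        answer = ask(f"{question} ").strip()
        lines.append(f"Q: {question}\nA: {answer}")
    return "\n".join(lines)

# Non-interactive usage with canned answers, for example in a test.
canned = iter(["Three bedrooms.", "Quiet streets."])
demo = collect_preferences(
    ["How big do you want your house to be?", "How urban do you want your neighborhood to be?"],
    ask=lambda _prompt: next(canned),
)
print(demo)
```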


# Step 5: Searching Based on Preferences.

In [96]:
def retrieve(preference):
    #Embed the user preferences.
    embedding_of_preference = generate_embeddings(preference)

    #Find the listing in the database that is most relevant to the embedded user preferences.
    RElistings_retrieved = collection.query(
        query_embeddings=embedding_of_preference.tolist(),
        n_results=1  #Return only the single closest listing.
    )

    #Extract the key statistics from the retrieved listing.
    metadata = RElistings_retrieved['metadatas'][0][0]
    metadata_string = f"Neighborhood: {metadata['neighborhood']}\n" \
                      f"Price: ${metadata['price']}\n" \
                      f"Size: {metadata['size']} sqft\n" \
                      f"Number of Bedrooms: {metadata['bedrooms']}\n" \
                      f"Number of Bathrooms: {metadata['bathrooms']}\n"


    #Combine the key statistics with the property and neighborhood descriptions.
    description = RElistings_retrieved['documents'][0][0]
    neigh_description = metadata['neigh_description']
    listing = f"Description:\n{description}\n\nMetadata:\n{metadata_string}\n\nNeighborhood Description:\n{neigh_description}"
    
    return listing

listing = retrieve(preference)
print("=== Most Relevant Listing ===")
print(listing)

=== Most Relevant Listing ===
Description:
Cozy 3-bedroom, 2-bathroom home in the family-friendly Suburban Meadows neighborhood. This well-maintained house features a fenced backyard, updated appliances, and a two-car garage.

Metadata:
Neighborhood: Suburban Meadows
Price: $450000
Size: 1600 sqft
Number of Bedrooms: 3
Number of Bathrooms: 2


Neighborhood Description:
Settle down in this quiet suburban community with parks, schools, and shopping centers nearby. Enjoy a safe and welcoming environment for families in Suburban Meadows.
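
Conceptually, the query above is a nearest-neighbor search: the preference embedding is compared with every stored listing embedding and the closest one wins. The sketch below reproduces that idea with cosine similarity on toy 3-dimensional vectors. The vectors and the helper `most_similar` are made up for illustration, and Chroma's actual index and distance metric may differ.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query: List[float], stored: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the id and similarity of the stored vector closest to the query."""
    scored = ((item_id, cosine_similarity(query, vector)) for item_id, vector in stored.items())
    return max(scored, key=lambda pair: pair[1])

# Toy "embeddings" standing in for the listing descriptions.
toy_store = {
    "lakefront": [0.9, 0.1, 0.0],
    "suburban": [0.1, 0.9, 0.2],
    "loft": [0.0, 0.2, 0.9],
}
best_id, score = most_similar([0.2, 0.8, 0.1], toy_store)
print(best_id, round(score, 3))
```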


# Step 6: Personalizing Listing Descriptions.

In [97]:
#As in step 5, retrieve the listing that is most relevant to the user preferences.
#Then, embed the retrieved listing in a personalization prompt.
def create_prompt(preference):
    
    listing = retrieve(preference)
    
    prompt_template = """
For the retrieved listing below, combine the description with the metadata and neighborhood description.

---

Context: 

{}

---

Question:

Augment the result by tailoring it to resonate with the buyer’s specific preferences.
Subtly emphasize aspects of the property that align with what the buyer is looking for.
The augmentation should personalize the listing without changing factual information, especially the numbers.

---

Answer:
    """
    return prompt_template.format(listing)

prompt_personal = create_prompt(preference)

print("=== Most Relevant Listing and Instructions ===")
print(prompt_personal)

=== Most Relevant Listing and Instructions ===

For the retrieved listing below, combine the description with the metadata and neighborhood description.

---

Context: 

Description:
Cozy 3-bedroom, 2-bathroom home in the family-friendly Suburban Meadows neighborhood. This well-maintained house features a fenced backyard, updated appliances, and a two-car garage.

Metadata:
Neighborhood: Suburban Meadows
Price: $450000
Size: 1600 sqft
Number of Bedrooms: 3
Number of Bathrooms: 2


Neighborhood Description:
Settle down in this quiet suburban community with parks, schools, and shopping centers nearby. Enjoy a safe and welcoming environment for families in Suburban Meadows.

---

Question:

Augment the result by tailoring it to resonate with the buyer’s specific preferences.
Subtly emphasize aspects of the property that align with what the buyer is looking for.
The augmentation should personalize the listing without changing factual information, especially the numbers.

---

Answer:
    


In [98]:
#NOTE. This cell uses prompt_personal from the previous cell. After changing the preferences, re-run that cell first so that the chain is not built from a stale prompt.

#This is an instance of the LLMChain class. It connects an LLM with a prompt template to generate a response.
llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_personal))

#Run the LLM chain to augment the retrieved listing.
#The template appears to contain no {input} placeholder, so the preferences influence the result only through the listing that was retrieved for them.
personalized_listing = llm_chain.run(input=preference)

print("=== Most Relevant Listing (Augmented) ===")
print(personalized_listing)

=== Most Relevant Listing (Augmented) ===
Welcome to your future home in the peaceful Suburban Meadows neighborhood! This cozy 3-bedroom, 2-bathroom house is perfect for you and your family. Imagine coming home to a well-maintained property with a fenced backyard, ideal for outdoor gatherings and playtime. With updated appliances and a two-car garage, convenience is at your fingertips. 

Located in a family-friendly community with parks, schools, and shopping centers nearby, you'll have everything you need within reach. Embrace the safe and welcoming environment that Suburban Meadows offers, providing a sense of security and comfort for your loved ones. 

Priced at $450,000 for a 1600 sqft home, this property is a fantastic investment for your family's future. Don't miss out on the opportunity to make this house your own in the heart of Suburban Meadows.
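
As far as the printed prompt shows, the model receives the retrieved listing but not the buyer's answers themselves, so it can only tailor the text to what the listing already says. One possible variation, sketched here with plain string formatting, is to include the preferences in the prompt explicitly. The function name `build_personalization_prompt` and the section labels are illustrative and not part of the original code.

```python
def build_personalization_prompt(listing: str, preference: str) -> str:
    """Combine the retrieved listing and the buyer's preferences into one prompt."""
    return (
        "Rewrite the listing below so that it resonates with the buyer's preferences.\n"
        "Do not change factual information, especially the numbers.\n\n"
        "---\n\n"
        f"Listing:\n{listing}\n\n"
        "---\n\n"
        f"Buyer preferences:\n{preference}\n\n"
        "---\n\n"
        "Answer:"
    )

demo_prompt = build_personalization_prompt(
    listing="Cozy 3-bedroom home. Price: $450000",
    preference="A quiet neighborhood with good schools.",
)
print(demo_prompt)
```

A string built this way could be passed to the model directly, as with `llm(prompt_text)` in Step 2, which also avoids any template parsing of curly braces that might appear in a listing.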


# Step 7: Testing the Agent with New Preferences.

In [100]:
preference = "I am looking for a luxurious property with beach access."

#Rebuild the prompt for the new preference first. Creating the chain before this step would reuse the prompt from the previous cell.
prompt_personal = create_prompt(preference)
print("=== Most Relevant Listing and Instructions ===")
print(prompt_personal)

llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_personal))
personalized_listing = llm_chain.run(input=preference)
print("=== Most Relevant Listing (Augmented) ===")
print(personalized_listing)

=== Most Relevant Listing and Instructions ===

For the retrieved listing below, combine the description with the metadata and neighborhood description.

---

Context: 

Description:
Luxurious 5-bedroom, 4-bathroom estate on the shores of Lakefront Estates. This custom-built home features a gourmet kitchen, private dock, and breathtaking lake views.

Metadata:
Neighborhood: Lakefront Estates
Price: $1200000
Size: 3500 sqft
Number of Bedrooms: 5
Number of Bathrooms: 4


Neighborhood Description:
Experience waterfront living at its finest in this exclusive gated community. Enjoy boating, fishing, and relaxing by the lake in this prestigious neighborhood.

---

Question:

Augment the result by tailoring it to resonate with the buyer’s specific preferences.
Subtly emphasize aspects of the property that align with what the buyer is looking for.
The augmentation should personalize the listing without changing factual information, especially the numbers.

---

Answer:
    
=== Most Relevant L

In [104]:
preference = "I like fine art and want a property in a neighborhood full of galleries and museums."

#Rebuild the prompt for the new preference first. Creating the chain before this step would reuse the prompt from the previous cell.
prompt_personal = create_prompt(preference)
print("=== Most Relevant Listing and Instructions ===")
print(prompt_personal)

llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_personal))
personalized_listing = llm_chain.run(input=preference)
print("=== Most Relevant Listing (Augmented) ===")
print(personalized_listing)

=== Most Relevant Listing and Instructions ===

For the retrieved listing below, combine the description with the metadata and neighborhood description.

---

Context: 

Description:
Artistic 1-bedroom, 1-bathroom loft in the vibrant Downtown Arts District. This creative space features exposed ductwork, concrete floors, and a gallery wall for showcasing artwork.

Metadata:
Neighborhood: Downtown Arts District
Price: $400000
Size: 1000 sqft
Number of Bedrooms: 1
Number of Bathrooms: 1


Neighborhood Description:
Immerse yourself in the arts scene with galleries, theaters, and studios just steps away in the Downtown Arts District. Live among fellow artists and creatives in this dynamic urban neighborhood.

---

Question:

Augment the result by tailoring it to resonate with the buyer’s specific preferences.
Subtly emphasize aspects of the property that align with what the buyer is looking for.
The augmentation should personalize the listing without changing factual information, especially