This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in the `requirements.txt` file.

In [1]:
import os

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

from langchain.llms import OpenAI

### Generating Real Estate Listings with an LLM

The submission must demonstrate using a Large Language Model (LLM) to generate at least 10 diverse and realistic real estate listings containing facts about the real estate.

Instructions: run the entire cell. To change the number of listings generated, change the `N` variable. Generation is skipped if `listings.json` already exists.

In [2]:
import os
import json
from langchain.chains import ConversationChain, LLMChain
from langchain.memory import ConversationSummaryMemory
from langchain.prompts import PromptTemplate

# Instantiate the LLMs for the generator and for its summary memory
llm_generator = OpenAI(model_name="gpt-3.5-turbo", temperature=1.5, max_tokens=2000)
llm_memory = OpenAI(model_name="gpt-3.5-turbo", temperature=0, max_tokens=1000)

template = """
You are a real estate listing generator specializing in Lisbon, Portugal.

Your listing should contain:
- Neighborhood: (Neighborhood in Lisbon)
- Price: (A realistic price in Euros)
- Bedrooms: (Number of bedrooms)
- Bathrooms: (Number of bathrooms)
- Square Meters: (Size in m2)
- Description: (A short description, similar style to example)
- Neighborhood Description: (A short description that mentions amenities, transportation, etc.)

Requirements:
- The listing must be unique and not identical to the example or previously generated listings.
- Vary the neighborhood, price, number of rooms, and other details to ensure uniqueness.
- Keep the same overall structure and style as the example.

Example:
{{
    "neighborhood": "Saldanha",
    "price":"698000",
    "bedrooms":"3",
    "bathrooms":"2",
    "square_meters":"98",
    "description":"Excellent apartment of high-quality. The apartment is sold fully furnished! It's on the first floor in a building without lifts. It is equipped with a modern centralized air conditioning system for the entire apartment, water heating is guaranteed by 2 cylinders. The frames have triple glazing and electric shutters, guaranteeing extreme acoustic and thermal comfort. Armored entrance door. Excellent 2 bedroom apartment with high quality finishes comprising a living room, two bedrooms (one en suite), two complete bathrooms.",
    "neighborhood_description":"This is an excellent option for those looking for an urban lifestyle with comfort and convenience in the center of Lisbon. This apartment is located in the center of Lisbon, in a central and privileged area of Picoas in Saldanha, well served by shops, services and transport. Set back in a private condominium, it has parking spaces for residents. The best restaurants, parks, shopping centers, gyms, playgrounds just a few minutes' walk away. Picoas Metro Station (yellow line) is a 2-minute walk away, Saldanha is 5 minutes away (red line)."
}}

Below is a summary of previously generated listings and their key attributes. Please do not reuse the same neighborhoods or identical configurations already mentioned.
{summary}

Instruction: {input}
"""

instruction = "Based on the example listing above and the previously generated listings, generate a new, unique real estate listing in a similar style and format it as JSON."


prompt = PromptTemplate(
    input_variables=["summary", "input"],
    template=template
)

generator_memory = ConversationSummaryMemory(
    llm=llm_memory,
    memory_key="summary",
    return_messages=False  # insert the summary as plain text, since the prompt is a string template
)

generator = ConversationChain(
    llm=llm_generator,
    verbose=False,
    memory=generator_memory,
    prompt=prompt
)

listings_data = []
json_filename = 'listings.json'

if os.path.exists(json_filename):
    with open(json_filename, "r") as f:
        listings_data = json.load(f)
        print("listings.json already exists. Loaded existing data.")
else:
    N = 10
    for _ in range(N):
        result = generator.run(input=instruction)
        listing = json.loads(result)
        listings_data.append(listing)
        
    with open(json_filename, "w") as f:
        json.dump(listings_data, f, indent=4)
        print(f"Saved {len(listings_data)} listings to {json_filename}")    

listings.json already exists. Loaded existing data.
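The requirement above asks for at least 10 listings that each carry the same set of facts. A minimal sketch of a sanity check on the loaded data follows; the helper name `validate_listings` and the `min_count` parameter are hypothetical, and the required keys are taken from the example listing in the prompt.

```python
# Keys taken from the example listing in the generation prompt.
REQUIRED_KEYS = {
    "neighborhood", "price", "bedrooms", "bathrooms",
    "square_meters", "description", "neighborhood_description",
}
NUMERIC_KEYS = ("price", "bedrooms", "bathrooms", "square_meters")


def validate_listings(listings, min_count=10):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(listings) < min_count:
        problems.append(
            f"expected at least {min_count} listings, got {len(listings)}"
        )
    for i, listing in enumerate(listings):
        missing = REQUIRED_KEYS - set(listing)
        if missing:
            problems.append(f"listing {i}: missing keys {sorted(missing)}")
        # Numeric fields are later cast with int(), so check that the cast works.
        for key in NUMERIC_KEYS:
            if key in listing:
                try:
                    int(listing[key])
                except (TypeError, ValueError):
                    problems.append(
                        f"listing {i}: {key}={listing[key]!r} is not an integer"
                    )
    return problems
```

In the notebook this could be called as `validate_listings(listings_data)` right after loading or generating the listings, before anything is stored in the vector database.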




### Creating a Vector Database and Storing Listings
The project must demonstrate the creation of a vector database and successfully storing real estate listing embeddings within it. The database should effectively store and organize the embeddings generated from the LLM-created listings.

In [3]:
from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def prepare_documents(listings):
    documents = []
    for listing in listings:
        # Prepare metadata
        metadata = {
            'neighborhood': listing['neighborhood'],
            'price': int(listing['price']),
            'bedrooms': int(listing['bedrooms']),
            'bathrooms': int(listing['bathrooms']),
            'square_meters': int(listing['square_meters'])
        }
        # Create a document
        doc = Document(page_content=str(listing), metadata=metadata)
        documents.append(doc)
    return documents

docs = prepare_documents(listings_data)

embeddings = OpenAIEmbeddings()
vector_store = Chroma("real_estate_store", embeddings)
    
vector_store.add_documents(docs)
print(f"Stored {len(docs)} listings in Chroma.")

Stored 10 listings in Chroma.
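The cell above embeds `str(listing)`, that is, the Python dict representation of each listing. One possible alternative, shown here only as a sketch and not something the notebook does, is to render each listing as labelled plain text before embedding it. The helper name `listing_to_text` is hypothetical, and whether this changes retrieval quality for this dataset is untested.

```python
def listing_to_text(listing):
    """Render a listing dict as labelled plain text for embedding."""
    return "\n".join([
        f"Neighborhood: {listing['neighborhood']}",
        f"Price: {listing['price']} EUR",
        f"Bedrooms: {listing['bedrooms']}",
        f"Bathrooms: {listing['bathrooms']}",
        f"Size: {listing['square_meters']} m2",
        f"Description: {listing['description']}",
        f"Neighborhood description: {listing['neighborhood_description']}",
    ])
```

To try it, `page_content=str(listing)` in `prepare_documents` would become `page_content=listing_to_text(listing)`; the metadata stays the same.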


In [4]:
# Inspect the documents stored in the vector database
vector_store.get(
    limit=100)

{'ids': ['26dbdeee-bbd8-11ef-9267-0e4a8c245730',
  '26dbdf84-bbd8-11ef-9267-0e4a8c245730',
  '26dbdfb6-bbd8-11ef-9267-0e4a8c245730',
  '26dbdfd4-bbd8-11ef-9267-0e4a8c245730',
  '26dbdff2-bbd8-11ef-9267-0e4a8c245730',
  '26dbe006-bbd8-11ef-9267-0e4a8c245730',
  '26dbe024-bbd8-11ef-9267-0e4a8c245730',
  '26dbe042-bbd8-11ef-9267-0e4a8c245730',
  '26dbe056-bbd8-11ef-9267-0e4a8c245730',
  '26dbe074-bbd8-11ef-9267-0e4a8c245730'],
 'embeddings': None,
 'metadatas': [{'bathrooms': 1,
   'bedrooms': 2,
   'neighborhood': 'Graca',
   'price': 525000,
   'square_meters': 75},
  {'bathrooms': 1,
   'bedrooms': 2,
   'neighborhood': 'Alfama',
   'price': 550000,
   'square_meters': 80},
  {'bathrooms': 1,
   'bedrooms': 2,
   'neighborhood': 'Baixa',
   'price': 850000,
   'square_meters': 75},
  {'bathrooms': 3,
   'bedrooms': 4,
   'neighborhood': 'Campo de Ourique',
   'price': 825000,
   'square_meters': 142},
  {'bathrooms': 3,
   'bedrooms': 4,
   'neighborhood': 'Marqueas de Pombal',
   'pri

### Semantic Search of Listings Based on Buyer Preferences
The application must include a functionality where listings are semantically searched based on given buyer preferences. The search should return listings that closely match the input preferences.

In [11]:
# Building the User Preference Interface
from langchain.memory import ChatMessageHistory

questions = [   
                "How big do you want your house to be?", 
                "What are the 3 most important things for you in choosing this property?", 
                "What price range do you have a budget for?", 
                "Which transportation options are important to you?",
                "How urban do you want your neighborhood to be?",   
            ]

# Option 1
answers = [   
                "For me and my wife, 2 or 3 bedrooms is fine.", 
                "View, access to public transport and supermarkets close by.", 
                "From 500k to 900k Euros", 
                "Metro station",
                "City center preferably",   
            ]           
# Option 2
"""answers = [   
                "I have a big family, at least 4 bedrooms", 
                "Supermarkets close by, apartment new or renovated and good neighborhood.", 
                "Less than 1 million Euros", 
                "Car",
                "No preference",   
            ]"""

# For interactive questioning, start with answers = [] and uncomment:
#for question in questions:
#    answer = input(question)
#    answers.append(answer)
    

history = ChatMessageHistory()
history.add_user_message(f"""You are an AI that recommends property listings based on the user's answers to personal questions. Ask the user {len(questions)} questions.""")
for i in range(len(questions)):
    history.add_ai_message(questions[i])
    history.add_user_message(answers[i]) 

llm_recommender = OpenAI(model_name="gpt-3.5-turbo", temperature=0.7, max_tokens=2000)

combined_preferences_prompt = """
{history}

Based on the following questions and answers, generate a single paragraph summarizing the user's preferences for a property.

Additionally, identify whether there is clear evidence of the buyer's preference for each of the following property features, and fill in a field only if you are certain.
For example, fill in the neighborhood only if the user states specifically that they are looking for a specific neighborhood in Lisbon.

Answer in JSON format according to this structure:
{{
    "user_preference_summary": "", #string
    "min_bathrooms": "", #integer
    "max_bathrooms": "", #integer
    "min_bedrooms": "", #integer
    "max_bedrooms": "", #integer
    "neighborhood": "", #string
    "min_price": "", #integer
    "max_price": "", #integer
    "min_square_meters": "", #integer
    "max_square_meters": "", #integer
}}
"""

combined_preferences_template = PromptTemplate(
    input_variables=["history"],
    template=combined_preferences_prompt,
)    

combined_preferences_chain = LLMChain(llm=llm_recommender, prompt=combined_preferences_template)

buyer_pref = combined_preferences_chain.run(history=history)

buyer_pref_json = json.loads(buyer_pref)
print(buyer_pref_json)

{'user_preference_summary': 'The user is looking for a property with 2 or 3 bedrooms, a nice view, access to public transport, supermarkets nearby, close to a metro station, in the city center, and priced between 500k to 900k Euros.', 'min_bathrooms': '', 'max_bathrooms': '', 'min_bedrooms': 2, 'max_bedrooms': 3, 'neighborhood': '', 'min_price': 500000, 'max_price': 900000, 'min_square_meters': '', 'max_square_meters': ''}
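The output above mixes integers with empty strings, and `json.loads` will fail if the model wraps the JSON in extra prose. A minimal sketch of a more tolerant parsing step follows; the helper names `extract_json_object` and `normalize_preferences` are hypothetical, and the field names are the ones from the prompt structure above.

```python
import json

INT_FIELDS = (
    "min_bathrooms", "max_bathrooms", "min_bedrooms", "max_bedrooms",
    "min_price", "max_price", "min_square_meters", "max_square_meters",
)


def extract_json_object(text):
    """Parse the outermost {...} span in text, ignoring surrounding prose."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


def normalize_preferences(raw):
    """Convert numeric fields to int; blank or unparsable values become None."""
    prefs = dict(raw)
    for field in INT_FIELDS:
        try:
            prefs[field] = int(prefs.get(field))
        except (TypeError, ValueError):
            prefs[field] = None
    # An empty or whitespace-only neighborhood means "no preference".
    prefs["neighborhood"] = str(prefs.get("neighborhood") or "").strip() or None
    return prefs
```

With these, `buyer_pref_json = normalize_preferences(extract_json_object(buyer_pref))` would replace the bare `json.loads(buyer_pref)` call. Unset fields come back as `None`, which is falsy, so truthiness checks on the fields keep working.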


In [12]:
def filter_listings(filters, vector_store):
    where_conditions = []

    if filters['min_bathrooms']:
        where_conditions.append({"bathrooms": {"$gte": int(filters['min_bathrooms'])}})
    if filters['max_bathrooms']:
        where_conditions.append({"bathrooms": {"$lte": int(filters['max_bathrooms'])}})

    if filters['min_bedrooms']:
        where_conditions.append({"bedrooms": {"$gte": int(filters['min_bedrooms'])}})
    if filters['max_bedrooms']:
        where_conditions.append({"bedrooms": {"$lte": int(filters['max_bedrooms'])}})

    if filters['min_price']:
        where_conditions.append({"price": {"$gte": int(filters['min_price'])}})
    if filters['max_price']:
        where_conditions.append({"price": {"$lte": int(filters['max_price'])}})

    if filters['min_square_meters']:
        where_conditions.append({"square_meters": {"$gte": int(filters['min_square_meters'])}})
    if filters['max_square_meters']:
        where_conditions.append({"square_meters": {"$lte": int(filters['max_square_meters'])}})

    if filters['neighborhood']:
        where_conditions.append({"neighborhood": filters['neighborhood']})

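    # Combine the conditions into one filter. Chroma's `$and` expects at least
    # two conditions, so pass a single condition as-is and use None for no filter.
    if len(where_conditions) > 1:
        combined_where_conditions = {"$and": where_conditions}
    elif where_conditions:
        combined_where_conditions = where_conditions[0]
    else:
        combined_where_conditions = None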
        
    filtered_listings = vector_store.similarity_search_with_relevance_scores(
        query=filters['user_preference_summary'],
        k=3,
        filter=combined_where_conditions,
    )
    
    return filtered_listings

filtered_listings = filter_listings(buyer_pref_json, vector_store)

filtered_listings

[(Document(page_content="{'neighborhood': 'Parque das Nacoes', 'price': '850000', 'bedrooms': '2', 'bathrooms': '2', 'square_meters': '110', 'description': 'Newly renovated waterfront apartment with stunning views of the Tagus River. The apartment features two spacious bedrooms, two modern bathrooms, a fully equipped kitchen, and a bright living room. The large windows allow natural light to flow into the living space, creating a warm and inviting atmosphere. Enjoy beautiful sunsets from your private balcony overlooking the marina.', 'neighborhood_description': 'Located in the vibrant Parque das Nações, this apartment offers a unique lifestyle with easy access to restaurants, cafes, shopping centers, and recreational areas. The neighborhood is known for its modern architecture, extensive promenades, and riverside dining options. Residents can also enjoy the nearby park, cable car rides, and the Oceanarium. Convenient access to public transportation, including the metro and train statio
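To check that the metadata filter built in `filter_listings` selects the listings one expects, it can help to evaluate the same filter in plain Python against the stored metadata. The sketch below re-implements only the small subset of the filter syntax used above (`$and`, `$gte`, `$lte`, and plain equality); it is an illustration with a hypothetical helper name, not Chroma's own code.

```python
def matches_where(metadata, where):
    """Evaluate a small subset of Chroma-style filters on one metadata dict.

    Supported: {"$and": [...]}, {"field": value},
    and {"field": {"$gte": n}} / {"field": {"$lte": n}}.
    """
    if not where:
        return True  # no filter: everything matches
    if "$and" in where:
        return all(matches_where(metadata, cond) for cond in where["$and"])
    for field, expected in where.items():
        actual = metadata.get(field)
        if isinstance(expected, dict):
            for op, bound in expected.items():
                if op == "$gte":
                    if actual is None or actual < bound:
                        return False
                elif op == "$lte":
                    if actual is None or actual > bound:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif actual != expected:
            return False
    return True
```

For example, the metadata dicts returned by `vector_store.get(limit=100)["metadatas"]` could be passed through `matches_where` with the same filter to list every listing that satisfies the hard constraints, independent of the similarity ranking.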

### Augmenting & Personalizing Listing Descriptions
The augmentation should personalize the listing without changing factual information. The submission must utilize an LLM to generate personalized descriptions for the real estate listings based on buyer preferences. The descriptions should be unique, appealing, and tailored to the preferences provided.

In [7]:
def augment_descriptions(llm, listings, preferences):
    augmented_listings = []
    for listing in listings:
        prompt = f"""
        Questions and Answers to detect Buyer's Preferences:
        {preferences}
        
        Original Description: {listing[0].page_content}  
        
        Rewrite the description to highlight how this property aligns with the preferences without changing factual information.
        If you are not certain of specific features, do not invent them. 
        """
        augmented_description = llm(prompt)
        augmented_listings.append(augmented_description)
    return augmented_listings
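Because the augmentation must not change factual information, a coarse automated check can flag rewritten descriptions for manual review. The sketch below, with a hypothetical helper name `missing_facts`, only looks for the numeric facts as digit strings. A fact written out in words (for example "two bedrooms") is reported as missing, and a small number such as `2` can match unrelated text, so treat the result as a hint rather than a verdict.

```python
import re


def missing_facts(listing, augmented_text):
    """Return numeric facts from listing that do not appear in augmented_text."""
    # Drop thousands separators so "850,000" or "850.000" compares as "850000".
    compact = re.sub(r"(?<=\d)[,.](?=\d{3}\b)", "", augmented_text)
    numbers = set(re.findall(r"\d+", compact))
    missing = {}
    for key in ("price", "bedrooms", "bathrooms", "square_meters"):
        value = str(listing[key])
        if value not in numbers:
            missing[key] = value
    return missing
```

Since each search result is a `(Document, score)` pair whose metadata holds these four fields, a call such as `missing_facts(doc.metadata, text)` for each result and its rewritten text would be one way to apply it.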


In [13]:
augmented_listings = augment_descriptions(llm_recommender, filtered_listings, history)

RECOMMENDER_TEMPLATE ="""
Conversation History: {history}

Enhanced listing descriptions that match the user's preferences:
{augmented_listings}

Based on the enhanced real estate listing descriptions above, give a final recommendation
that presents the possible options to the user and compares them against the user's preferences.
"""

PROMPT = PromptTemplate(
    input_variables=["history", "augmented_listings"],
    template=RECOMMENDER_TEMPLATE
)

recommender_chain = LLMChain(llm=llm_recommender, prompt=PROMPT)

final_recommendation = recommender_chain.run(history=history, augmented_listings=augmented_listings)

In [14]:
print("Example using answers option 1 \n")
print(final_recommendation)

Example using answers option 1 

Final Recommendation:
Based on your preferences for a property with 2-3 bedrooms, a captivating view, access to public transport (specifically a metro station), proximity to supermarkets, and a preference for an urban neighborhood in the city center, here are three options that align with your criteria:

1. Newly renovated waterfront apartment in Parque das Nações:
- Features 2 bedrooms, 2 bathrooms
- Stunning views of the Tagus River
- Easy access to public transport, including a metro station
- Proximity to supermarkets and modern amenities
- Priced at 850,000 Euros

2. Luxurious refurbished apartment in Baixa:
- Offers 2 spacious bedrooms, 2 modern bathrooms
- Panoramic city views
- Located in a bustling urban neighborhood with access to public transport
- Close to supermarkets and trendy boutiques
- Priced within your budget range of 500k to 900k Euros

3. Renovated apartment in historic Baixa district:
- 2 bedrooms, 1 bathroom with views of Sao Jor

In [10]:
print("Example using answers option 2 \n")
print(final_recommendation)

Example using answers option 2 

Based on your preferences for a property, I have found three options that match your criteria:

1. Renovated Apartment in Marquês de Pombal:
- 4 bedrooms and 3 modern bathrooms
- Good neighborhood with supermarkets close by
- Easy access to public transportation
- Budget of less than 1 million Euros
- Urban neighborhood preference

2. Renovated Apartment in Campo de Ourique:
- 4 spacious bedrooms and 3 modern bathrooms
- Good neighborhood with supermarkets close by
- Easy access to public transportation
- Car-friendly neighborhood
- Budget of less than 1 million Euros
- Urban neighborhood preference

3. Renovated Apartment in Campo de Ourique with balcony:
- 4 bedrooms and 3 bathrooms
- Tranquil and vibrant neighborhood
- Supermarkets close by
- Easy access to public transportation
- Price of 825,000 Euros
- Urban neighborhood preference

Based on these options, all three properties meet your criteria for a spacious home in a good neighborhood with supe