This is a starter notebook for the project, you'll have to import the libraries you'll need, you can find a list of the ones available in this workspace in the requirements.txt file in this workspace.

## SETUP

In [65]:
# This command installs the following libraries for a Large Language Model (LLM) or AI-powered application:
# - lancedb: A fast, lightweight vector database for storing and querying embeddings.
# - langchain==0.0.305: A framework that streamlines the development of LLM-driven workflows (prompt engineering, chaining, etc.).
# - openai==0.28.1: The official OpenAI Python library for accessing models like GPT via the OpenAI API.
# - torch: PyTorch, a deep learning framework used for model training and inference.
# - chromadb: Another vector database often used for similarity search and storing embeddings.
!pip install -U lancedb langchain==0.0.305 openai==0.28.1 torch chromadb

Collecting lancedb
  Using cached lancedb-0.17.1-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (4.7 kB)
INFO: pip is looking at multiple versions of lancedb to determine which version is compatible with other requirements. This could take a while.


In [66]:
# lancedb: A fast, lightweight vector database for storing and querying embeddings.
import lancedb

# LanceModel, vector: Pydantic models/extensions tailored for defining and validating LanceDB schemas.
from lancedb.pydantic import LanceModel, vector

# shutil: Provides high-level file operations (copy, move, etc.).
import shutil

# pandas: A data analysis library for managing structured data (e.g., CSV files).
import pandas as pd

# OpenAI: LangChain’s interface for interacting with OpenAI LLMs.
from langchain.llms import OpenAI

# openai: The official OpenAI library for making calls to OpenAI’s API.
import openai

# PromptTemplate, FewShotPromptTemplate: Classes for creating and managing prompt templates in LangChain.
from langchain.prompts import PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate

# PydanticOutputParser: Helps parse AI model outputs into structured data using Pydantic.
from langchain.output_parsers import PydanticOutputParser

# BaseModel, Field, NonNegativeInt: Pydantic features for model validation and data type enforcement.
from langchain_core.pydantic_v1 import BaseModel, Field, NonNegativeInt

# OpenAIEmbeddings: Generates embeddings using OpenAI’s Embeddings API.
from langchain.embeddings import OpenAIEmbeddings

# Document: A standardized structure in LangChain for handling textual data.
from langchain.schema import Document

# RecursiveCharacterTextSplitter: Utility to split large texts into smaller chunks for LLM processing.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chroma: A vector store used to index and search embeddings efficiently.
from langchain.vectorstores.chroma import Chroma

# CSVLoader: Loads CSV data and converts rows into Documents.
from langchain.document_loaders import CSVLoader

# List: Generic typing support for function signatures and type hints.
from typing import List

# (Repeated) PromptTemplate: Used to build reusable prompt templates in LangChain.
from langchain.prompts import PromptTemplate

# DiffusionPipeline, AutoPipelineForText2Image: Pipelines from Diffusers for text-to-image generation.
from diffusers import DiffusionPipeline, AutoPipelineForText2Image

# load_image, make_image_grid: Utility methods for image handling and creating image grids.
from diffusers.utils import load_image, make_image_grid

# jsonable_encoder: Utility to convert Python objects to JSON-serializable format (commonly used in FastAPI).
from fastapi.encoders import jsonable_encoder

# PIL: Python Imaging Library for image manipulation.
import PIL

# torch: PyTorch, a deep learning framework for building and training neural networks.
import torch

# sys: Provides system-specific parameters and functions (e.g., sys.path).
import sys

# os: Operating system interfaces (e.g., for environment variables and directory manipulation).
import os


In [67]:
# Setting environment variables for the OpenAI API:
# - OPENAI_API_KEY: Your API key for authenticating requests to OpenAI.
# - OPENAI_API_BASE: The base URL for sending API calls (e.g., a custom or proxy endpoint).
# Note: In production settings, do not commit your API keys to version control.
#       Instead, store them securely (e.g., environment variables, vault, etc.).
os.environ["OPENAI_API_KEY"] = "" # Hidden for security purpose
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"


In [68]:
# Assign your OpenAI API key from the environment variable.
# This authenticates your requests to OpenAI.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Set the base URL for the OpenAI API from the environment variable.
# This can be used to point to a custom or proxy endpoint.
openai.api_base = os.environ["OPENAI_API_BASE"]

## Real Estate Listing Generation

In [69]:
# INSTRUCTION holds the prompt that tells the model to generate
# 20 different real estate listings for various Paris neighborhoods.
INSTRUCTION = "Generate 20 different and realistic real estate listings from multiple Neighborhoods of Paris."

# SAMPLE_LISTING provides an example of the desired listing format and details,
# including the neighborhood, price, and a brief description of the property.
# This helps guide the model on structure and content style.
SAMPLE_LISTING = \
"""
A sample listing is the following as shown below:

"Neighborhood": "Paris 4th District"
"Price (USD)": 773515
"Bedrooms": 3
"Bathrooms": 2
"House Size sqft": 549
"Description": "This south-facing 51 sqm split-level apartment is on the 1st floor of a late 18th century building. Overlooking a peaceful leafy courtyard, it comprises a spacious living room and dining room, and an equipped open-plan kitchen. A bedroom with a shower room and wc is upstairs."
"Neighborhood description": "Rue de Jarente, near Place du March Sainte Catherine in the heart of the historic Marais neighbourhood."

"""


In [70]:
# This Pydantic model represents a single real estate listing,
# defining fields for neighborhood, price, and more.
# Each field has a description to guide validation and documentation.
class REListing(BaseModel):
    neighborhood: str = Field(description="Name of the neighborhood")
    price: NonNegativeInt = Field(description="Price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="Number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="Size of the property in square feet")
    description: str = Field(description="Description of the property.")
    neighborhood_description: str = Field(description="A description of the neighborhood.")

# This Pydantic model serves as a container for multiple real estate listings.
class REListingCollection(BaseModel):
    listings: List[REListing] = Field(description="List of available real estate listings")

# The parser uses PydanticOutputParser to take raw text (from an LLM or other source)
# and parse it into the REListingCollection model defined above.
parser = PydanticOutputParser(pydantic_object=REListingCollection)

In [71]:
# The PromptTemplate configures how instructions, samples, and format instructions
# are combined into a prompt for the language model. The 'partial_variables' parameter
# injects the parser’s formatting instructions automatically.
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

# This generates the final prompt by filling in the placeholders
# with the instruction and sample listing, then prints it for debugging or review.
query = prompt.format(instruction=INSTRUCTION, sample=SAMPLE_LISTING)
print(query)

Generate 20 different and realistic real estate listings from multiple Neighborhoods of Paris.

A sample listing is the following as shown below:

"Neighborhood": "Paris 4th District"
"Price (USD)": 773515
"Bedrooms": 3
"Bathrooms": 2
"House Size sqft": 549
"Description": "This south-facing 51 sqm split-level apartment is on the 1st floor of a late 18th century building. Overlooking a peaceful leafy courtyard, it comprises a spacious living room and dining room, and an equipped open-plan kitchen. A bedroom with a shower room and wc is upstairs."
"Neighborhood description": "Rue de Jarente, near Place du March Sainte Catherine in the heart of the historic Marais neighbourhood."


The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-f

In [72]:
# Define the model configuration for OpenAI's GPT:
# - model_name: The name of the GPT model (e.g., gpt-3.5-turbo).
# - temperature: Controls the creativity/randomeness of the output.
#   A higher temperature (e.g., 1.0) results in more varied responses,
#   while a lower temperature (e.g., 0.0) makes the responses more deterministic.
# - max_tokens: The maximum number of tokens (words/pieces) in the response.
model_name = "gpt-3.5-turbo"
temperature = 1.0
max_tokens = 3500

# Instantiate the LLM object with the specified parameters.
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens=max_tokens)

# Get the response to the query using the configured LLM.
response = llm(query)



In [73]:
# Parse the LLM's response into the structured data model using the parser.
result = parser.parse(response)

# Convert the parsed listings into a JSON-serializable format, then create a DataFrame with that data.
df = pd.DataFrame(jsonable_encoder(result.listings))

# Display the first few rows of the DataFrame to verify and inspect the results.
df.head()


Unnamed: 0,neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description
0,Paris 5th District,950000,2,2,753,Beautiful apartment in a traditional Haussmann...,"Located in the Latin Quarter, close to famous ..."
1,Paris 7th District,2100000,4,3,1296,Luxurious penthouse with stunning views of the...,Situated in the prestigious 7th arrondissement...
2,Paris 16th District,1750000,3,2,1176,Charming townhouse with a private garden and g...,Located in a quiet residential neighborhood ne...
3,Paris 8th District,3200000,5,4,2152,Elegant mansion with a private courtyard and l...,Situated in the prestigious 8th arrondissement...
4,Paris 1st District,1350000,2,2,968,Modern loft-style apartment with high ceilings...,Located in the heart of Paris near the Louvre ...


In [74]:
# Display all the rows of the DataFrame to verify and inspect the results.
df.head(n=20)

Unnamed: 0,neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description
0,Paris 5th District,950000,2,2,753,Beautiful apartment in a traditional Haussmann...,"Located in the Latin Quarter, close to famous ..."
1,Paris 7th District,2100000,4,3,1296,Luxurious penthouse with stunning views of the...,Situated in the prestigious 7th arrondissement...
2,Paris 16th District,1750000,3,2,1176,Charming townhouse with a private garden and g...,Located in a quiet residential neighborhood ne...
3,Paris 8th District,3200000,5,4,2152,Elegant mansion with a private courtyard and l...,Situated in the prestigious 8th arrondissement...
4,Paris 1st District,1350000,2,2,968,Modern loft-style apartment with high ceilings...,Located in the heart of Paris near the Louvre ...
5,Paris 9th District,890000,3,2,1045,Spacious apartment with a private terrace and ...,Situated in a lively neighborhood near the Ope...
6,Paris 6th District,1875000,4,3,1527,Renovated duplex in a historic building with o...,"Located in Saint-Germain-des-Prés, known for i..."
7,Paris 11th District,775000,2,1,861,Characterful apartment with exposed brick wall...,Situated in a trendy neighborhood near the Pla...
8,Paris 3rd District,620000,1,1,624,Bright and airy loft-style apartment with a sk...,"Located in the trendy Marais district, known f..."
9,Paris 13th District,550000,2,1,732,Modern apartment with a balcony and panoramic ...,Situated in a diverse neighborhood near the Bi...


## Storing the Real Estate Listings in a Vector Database and Generating and Storing Embeddings



In [75]:
def load_listings_from_dataframe(df: pd.DataFrame) -> REListingCollection:
    """
    Convert a pandas DataFrame of real estate listings into an REListingCollection.

    :param df: A pandas DataFrame that contains columns corresponding to the REListing fields.
    :return: An REListingCollection object populated with the listings from the DataFrame.
    """
    listings = []

    # Iterate through each row in the DataFrame.
    for _, row in df.iterrows():
        # Create a single REListing object from the current row's data.
        listing = REListing(
            neighborhood=row["neighborhood"],
            price=int(row["price"]),
            bedrooms=int(row["bedrooms"]),
            bathrooms=int(row["bathrooms"]),
            house_size=int(row["house_size"]),
            description=row["description"],
            neighborhood_description=row["neighborhood_description"],
        )
        # Accumulate the listing in the list.
        listings.append(listing)

    # Wrap all listings into an REListingCollection and return.
    return REListingCollection(listings=listings)

# Example usage: Convert the DataFrame into an REListingCollection object.
csv_collection = load_listings_from_dataframe(df)
print(csv_collection)


listings=[REListing(neighborhood='Paris 5th District', price=950000, bedrooms=2, bathrooms=2, house_size=753, description='Beautiful apartment in a traditional Haussmann-style building with a balcony overlooking the Seine River. The property features a spacious living room, dining room, and fully equipped kitchen.', neighborhood_description='Located in the Latin Quarter, close to famous landmarks such as Notre Dame Cathedral and the Pantheon.'), REListing(neighborhood='Paris 7th District', price=2100000, bedrooms=4, bathrooms=3, house_size=1296, description='Luxurious penthouse with stunning views of the Eiffel Tower. Features include a large terrace, high ceilings, and an elegant master suite with a walk-in closet.', neighborhood_description='Situated in the prestigious 7th arrondissement, known for its upscale shops and gourmet restaurants.'), REListing(neighborhood='Paris 16th District', price=1750000, bedrooms=3, bathrooms=2, house_size=1176, description='Charming townhouse with a 

In [76]:
## Generate text to embed

def build_text_for_embedding(listing: REListing) -> str:
    """
    Create a string representation of an REListing object
    that can be used for embedding (e.g., for vector search).
    """
    return (
        f"Neighborhood: {listing.neighborhood}\n"
        f"Price: {listing.price}\n"
        f"Size: {listing.house_size}\n"
        f"Bedrooms: {listing.bedrooms}\n"
        f"Bathrooms: {listing.bathrooms}\n"
        f"Description: {listing.description}\n\n"
        f"Neighborhood Description: {listing.neighborhood_description}"
    )

# Example usage: Builds a text string from the first listing in csv_collection,
# which can be fed into an embedding model or stored as needed.
build_text_for_embedding(csv_collection.listings[0])

'Neighborhood: Paris 5th District\nPrice: 950000\nSize: 753\nBedrooms: 2\nBathrooms: 2\nDescription: Beautiful apartment in a traditional Haussmann-style building with a balcony overlooking the Seine River. The property features a spacious living room, dining room, and fully equipped kitchen.\n\nNeighborhood Description: Located in the Latin Quarter, close to famous landmarks such as Notre Dame Cathedral and the Pantheon.'

In [77]:
# Use Pydantic to define a data model that aligns with LanceDB’s requirements.
# This model includes a 'vector' field to store embeddings (size 1536),
# along with additional fields for real estate details.

from lancedb.pydantic import vector, LanceModel

class RE_listing(LanceModel):
    vector: vector(1536)  # Stores the embedding vector representation of the listing.
    Neighborhood: str
    Price: int
    Size: int
    Bedrooms: int
    Bathrooms: int
    Description: str
    Neighborhood_Description: str


In [78]:
## Generate Embeddings

import openai
import os

# Configure the OpenAI API key from environment variables.
openai.api_key = os.getenv("OPENAI_API_KEY")

def generate_embedding(text: str) -> list[float]:
    """
    Use OpenAI’s Embedding API to generate a vector representation (embedding)
    for the input text using the 'text-embedding-ada-002' model.
    """
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )

    # The embedding is found in the 'data' field of the response (index 0).
    return response["data"][0]["embedding"]

data = []
# Iterate over each listing in our collection.
for listing in csv_collection.listings:
    # Generate an embedding for the text representation of the listing.
    embed = generate_embedding(build_text_for_embedding(listing))

    # Print out the listing text and its corresponding embedding for debugging/inspection.
    print(build_text_for_embedding(listing))
    print("Embeddings:", embed)
    print("Embedding length:", len(embed))

    # Append a new RE_listing object to 'data', populating both fields for metadata
    # and the embedding vector.
    data.append(
        RE_listing(
            vector=embed,
            Neighborhood=listing.neighborhood,
            Size=listing.house_size,
            Price=listing.price,
            Bedrooms=listing.bedrooms,
            Bathrooms=listing.bathrooms,
            Description=listing.description,
            Neighborhood_Description=listing.neighborhood_description
        )
    )
    print("")

Neighborhood: Paris 5th District
Price: 950000
Size: 753
Bedrooms: 2
Bathrooms: 2
Description: Beautiful apartment in a traditional Haussmann-style building with a balcony overlooking the Seine River. The property features a spacious living room, dining room, and fully equipped kitchen.

Neighborhood Description: Located in the Latin Quarter, close to famous landmarks such as Notre Dame Cathedral and the Pantheon.
Embeddings: [-0.002843372756615281, 0.029174640774726868, 0.00289789866656065, 0.0020719897001981735, 0.007704209070652723, 0.024658601731061935, -0.013176057487726212, -0.0007709821802563965, -0.0336906798183918, -0.007004992570728064, -0.010026376694440842, 0.021682120859622955, -0.009936569258570671, -0.02030934765934944, -0.015408418141305447, -0.01001996174454689, 0.018230943009257317, -0.0053275153040885925, 0.011270853690803051, -0.011347831226885319, -0.008583040907979012, -0.008454743772745132, -0.0026493242476135492, -0.02528725378215313, -0.008935855701565742, 0.00

In [79]:
# Now connect to a local db at ~/.lancedb and create an empty LanceDB table

import lancedb

# Connect to LanceDB, specifying the local path where data will be stored.
db = lancedb.connect("~/.lancedb")

# Define the name of the table that will store real estate listings.
table_name = "RE_listing"

# Drop the table if it already exists, ignoring the error if it doesn’t.
db.drop_table(table_name, ignore_missing=True)

# Create a new table with the schema defined by the RE_listing Pydantic model.
table = db.create_table(table_name, schema=RE_listing)

In [80]:
# Insert the records into the LanceDB table.
# Each 'd' in 'data' is an RE_listing object, so we convert it into a dictionary
# before passing it to the 'table.add()' method for insertion.
table.add([dict(d) for d in data])


In [81]:
# Preview the first 10 rows of the LanceDB table by converting them to a Pandas DataFrame.
table.head(n=20).to_pandas()


Unnamed: 0,vector,Neighborhood,Price,Size,Bedrooms,Bathrooms,Description,Neighborhood_Description
0,"[-0.0028433728, 0.02917464, 0.0028978987, 0.00...",Paris 5th District,950000,753,2,2,Beautiful apartment in a traditional Haussmann...,"Located in the Latin Quarter, close to famous ..."
1,"[0.012825659, 0.01283844, -0.015682196, -0.008...",Paris 7th District,2100000,1296,4,3,Luxurious penthouse with stunning views of the...,Situated in the prestigious 7th arrondissement...
2,"[-0.015739147, 0.026733214, -0.009522573, 0.00...",Paris 16th District,1750000,1176,3,2,Charming townhouse with a private garden and g...,Located in a quiet residential neighborhood ne...
3,"[-0.008363645, 0.024450593, -0.007122166, -0.0...",Paris 8th District,3200000,2152,5,4,Elegant mansion with a private courtyard and l...,Situated in the prestigious 8th arrondissement...
4,"[-0.0030107452, 0.019868674, -0.011600742, -0....",Paris 1st District,1350000,968,2,2,Modern loft-style apartment with high ceilings...,Located in the heart of Paris near the Louvre ...
5,"[0.006230003, 0.023571946, 0.006140132, -0.011...",Paris 9th District,890000,1045,3,2,Spacious apartment with a private terrace and ...,Situated in a lively neighborhood near the Ope...
6,"[-0.006623478, 0.023901276, -0.012511731, 0.00...",Paris 6th District,1875000,1527,4,3,Renovated duplex in a historic building with o...,"Located in Saint-Germain-des-Prés, known for i..."
7,"[-0.007027798, 0.03351719, -0.013567737, -0.00...",Paris 11th District,775000,861,2,1,Characterful apartment with exposed brick wall...,Situated in a trendy neighborhood near the Pla...
8,"[0.008806096, 0.025151132, -0.004505445, -0.01...",Paris 3rd District,620000,624,1,1,Bright and airy loft-style apartment with a sk...,"Located in the trendy Marais district, known f..."
9,"[0.00031183995, 0.016904324, -0.008673049, -0....",Paris 13th District,550000,732,2,1,Modern apartment with a balcony and panoramic ...,Situated in a diverse neighborhood near the Bi...


## Testing Semantic Search of Listings Based on Buyer Preferences

In [82]:
# Define a user query that describes the type of property they want.
user_query = "Looking for a luxurious appartment in the 16th district"

# Generate an embedding vector for the user’s query using the same model as before.
query_embed = generate_embedding(user_query)

# Perform a similarity search in the LanceDB table using the query embedding.
# Limit the results to the top 3 most similar listings, and convert the results to a Pandas DataFrame.
results = table.search(query_embed).limit(3).to_pandas()

# Print out the search results for inspection.
print(results)

                                              vector         Neighborhood  \
0  [0.00031183995, 0.016904324, -0.008673049, -0....  Paris 13th District   
1  [-0.015739147, 0.026733214, -0.009522573, 0.00...  Paris 16th District   
2  [-0.0030068862, 0.01521905, -0.007956289, -0.0...  Paris 15th District   

     Price  Size  Bedrooms  Bathrooms  \
0   550000   732         2          1   
1  1750000  1176         3          2   
2   980000  1103         3          2   

                                         Description  \
0  Modern apartment with a balcony and panoramic ...   
1  Charming townhouse with a private garden and g...   
2  Contemporary apartment with a large balcony an...   

                            Neighborhood_Description  _distance  
0  Situated in a diverse neighborhood near the Bi...   0.282985  
1  Located in a quiet residential neighborhood ne...   0.285308  
2  Situated in a residential neighborhood near th...   0.285740  


In [83]:
# Define a user query that describes the type of property they want.
user_query = "Looking for a duplex for a big family"

# Generate an embedding vector for the user’s query using the same model as before.
query_embed = generate_embedding(user_query)

# Perform a similarity search in the LanceDB table using the query embedding.
# Limit the results to the top 3 most similar listings, and convert the results to a Pandas DataFrame.
results = table.search(query_embed).limit(3).to_pandas()

# Print out the search results for inspection.
print(results)

                                              vector         Neighborhood  \
0  [-0.006623478, 0.023901276, -0.012511731, 0.00...   Paris 6th District   
1  [0.0004836391, 0.023083521, -0.0019853795, -0....  Paris 10th District   
2  [-0.0022427451, 0.023879215, -0.0214314, -0.00...  Paris 12th District   

     Price  Size  Bedrooms  Bathrooms  \
0  1875000  1527         4          3   
1   695000   816         2          2   
2   820000   978         3          2   

                                         Description  \
0  Renovated duplex in a historic building with o...   
1  Stylish duplex with a private terrace and expo...   
2  Contemporary duplex with a private terrace and...   

                            Neighborhood_Description  _distance  
0  Located in Saint-Germain-des-Prés, known for i...   0.382872  
1  Located in a vibrant neighborhood near the Can...   0.387762  
2  Situated in a diverse neighborhood near the Bo...   0.402812  


In [84]:
# Define a user query that describes the type of property they want.
user_query = "Looking for a 1-bedroom apartment with a balcony"

# Generate an embedding vector for the user’s query using the same model as before.
query_embed = generate_embedding(user_query)

# Perform a similarity search in the LanceDB table using the query embedding.
# Limit the results to the top 3 most similar listings, and convert the results to a Pandas DataFrame.
results = table.search(query_embed).limit(3).to_pandas()

# Print out the search results for inspection.
print(results)


                                              vector         Neighborhood  \
0  [0.00031183995, 0.016904324, -0.008673049, -0....  Paris 13th District   
1  [0.0069224695, 0.013987403, 0.0012166128, -0.0...  Paris 20th District   
2  [0.0033905006, 0.02791173, 0.00473948, -0.0071...   Paris 2nd District   

    Price  Size  Bedrooms  Bathrooms  \
0  550000   732         2          1   
1  450000   594         1          1   
2  690000   721         1          1   

                                         Description  \
0  Modern apartment with a balcony and panoramic ...   
1  Light-filled apartment with a balcony and city...   
2  Chic apartment with a Juliet balcony and hardw...   

                            Neighborhood_Description  _distance  
0  Situated in a diverse neighborhood near the Bi...   0.335465  
1  Located in a vibrant neighborhood near the Pèr...   0.344219  
2  Situated in the bustling Opera district, near ...   0.347060  


## Building the User Preference Interface

In [85]:
# hard-code the buyer preferences in questions and answers
questions = [
                "How big do you want your house to be?"
                "What are 3 most important things for you in choosing this property?",
                "Which amenities would you like?",
                "Which transportation options are important to you?",
                "How urban do you want your neighborhood to be?",
            ]

In [86]:
# Helper function to get the top K listings based on a list of user preferences.
def get_top_listings(preferences: list[str], top_k: int=5) -> list[int]:
    '''
    Given a list of preference strings, this function will:
    1. Combine them into a single query string.
    2. Generate an embedding for that query.
    3. Use similarity search against the LanceDB table.
    4. Return the top K listings' data (neighborhood, price, etc.).
    '''
    top_listingsP = []
    top_listingsBed = []
    top_listingsBath = []
    top_listingsSize = []
    top_listingsDesc = []
    top_listingsNeigh = []
    top_listingsNeighDesc = []

    # Combine the user's multiple preferences into a single query string.
    combined_pref = '\n'.join(preferences)

    # Generate an embedding for the combined preferences.
    query_embed = generate_embedding(combined_pref)

    # Search for the top K most similar listings and convert results to a DataFrame.
    results = table.search(query_embed).limit(top_k).to_pandas()

    # Iterate through the top results, appending each relevant field to the lists.
    for index, row in results.iterrows():
        top_listingsNeigh.append(row['Neighborhood'])
        top_listingsP.append(row['Price'])
        top_listingsSize.append(row['Size'])
        top_listingsBed.append(row['Bedrooms'])
        top_listingsBath.append(row['Bathrooms'])
        top_listingsDesc.append(row['Description'])
        top_listingsNeighDesc.append(row['Neighborhood_Description'])

        # Stop if we've reached the requested top_k listings.
        if len(top_listingsNeigh) == top_k:
            break

    # Return each list in an organized format for further processing.
    return top_listingsNeigh, top_listingsP, top_listingsBed, top_listingsBath, top_listingsSize, top_listingsDesc, top_listingsNeighDesc

In [87]:
# Hardcode the buyer's preferences in ANSWERS. This list simulates the user's requirements.
ANSWERS = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "access to urban amenities like restaurants and theaters."
]

In [88]:
# Call the get_top_listings function with the user's preferences and capture the returned data.
resDesc, resP, resBed, resBath, resSize, resNeig, resNeigDesc = get_top_listings(ANSWERS)

In [89]:
# Print the description of the top listing.
print("Description:", resDesc[0])

# Print the price of the top listing.
print("Price:", resP[0])

# Print the number of bedrooms in the top listing.
print("#Bedrooms:", resBed[0])

# Print the number of bathrooms in the top listing.
print("#Bathrooms:", resBath[0])

# Print the size (in square feet) of the top listing.
print("Size:", resSize[0])

# Print the neighborhood where the top listing is located.
print("Neighborhood:", resNeig[0])

# Print a description of the neighborhood for the top listing.
print("Neighborhood Description:", resNeigDesc[0])

Description: Paris 16th District
Price: 1750000
#Bedrooms: 3
#Bathrooms: 2
Size: 1176
Neighborhood: Charming townhouse with a private garden and garage. The property boasts a spacious living room, dining area, and a modern kitchen with high-end appliances.
Neighborhood Description: Located in a quiet residential neighborhood near the Bois de Boulogne park and exclusive shops on rue de Passy.


## LLM Augmentation

In [90]:
# Construct a formatted string containing details of the top real estate listing.
# This context will be used to provide the language model with specific information
# about the property when generating an augmented description.
context = (
    "Description: {}\n"
    "Price: {}\n"
    "#Bedrooms: {}\n"
    "#Bathrooms: {}\n"
    "Size: {}\n"
    "Neighborhood: {}\n"
    "Neighborhood Description: {}"
).format(
    resDesc[0],          # Description of the property
    resP[0],             # Price in USD
    resBed[0],           # Number of bedrooms
    resBath[0],          # Number of bathrooms
    resSize[0],          # Size in square feet
    resNeig[0],          # Neighborhood name
    resNeigDesc[0]       # Description of the neighborhood
)

# Assign the user's query to the variable 'question'.
# This query represents what the user is asking about the property.
question = user_query

# Create an augmentation prompt template that combines the context and the user's question.
# The prompt instructs the language model to:
# 1. Provide an answer to the user's question based on the given context.
# 2. Augment the property's description to better align with the buyer’s specific preferences.
#    This involves highlighting features of the property that match the buyer's desires.
augmentation_prompt_template = (
    f"Based on the following context:\n{context}\n----\n"
    f"Provide an answer to the question '{question}' and also augment the description, "
    f"tailoring it to resonate with the buyer’s specific preferences. This involves "
    f"subtly emphasizing aspects of the property that align with what the buyer is looking for."
)

# Print the augmentation prompt template.
# This is useful for debugging purposes to ensure that the prompt is correctly formatted
# before sending it to the language model.
print(augmentation_prompt_template)

Based on the following context:
Description: Paris 16th District
Price: 1750000
#Bedrooms: 3
#Bathrooms: 2
Size: 1176
Neighborhood: Charming townhouse with a private garden and garage. The property boasts a spacious living room, dining area, and a modern kitchen with high-end appliances.
Neighborhood Description: Located in a quiet residential neighborhood near the Bois de Boulogne park and exclusive shops on rue de Passy.
----
Provide an answer to the question 'Looking for a 1-bedroom apartment with a balcony' and also augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.


In [91]:
# Function to call the OpenAI GPT-3.5 API
def generate_augmentation(prompt_template):
    """
    Generates an augmented description based on the provided prompt template
    by interacting with OpenAI's GPT-3.5-turbo model.

    :param prompt_template: A formatted string containing the context and instructions
                            for the language model.
    :return: The generated augmented description or an error message if the API call fails.
    """
    try:
        # Calling the OpenAI ChatCompletion API with a system message and the user prompt.
        # Note: The method to call may vary based on the OpenAI library version:
        # - Use openai.ChatCompletion.create for OpenAI library versions < 1.0
        # - Use openai.chat.completions.create for OpenAI library versions >= 1.0
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # Specifies the model to use.
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a real estate expert. You are writing the matching property "
                        "according to the buyer's criteria and you want to close the deal but "
                        "you want to be accurate, honest and factual."
                    )
                },
                {
                    "role": "user",
                    "content": prompt_template  # The user's prompt containing context and instructions.
                }
            ],
            temperature=1,          # Controls the randomness of the output. Higher values produce more creative responses.
            max_tokens=256,         # The maximum number of tokens to generate in the response.
            top_p=1,                # Nucleus sampling parameter. 1 means no restriction.
            frequency_penalty=0,    # Penalizes new tokens based on their existing frequency in the text so far.
            presence_penalty=0      # Penalizes new tokens based on whether they appear in the text so far.
        )

        # Extract and return the content of the generated message from the response.
        # The response structure contains a list of choices; we take the first one.
        return response.choices[0].message.content

    except Exception as e:
        # If an error occurs during the API call, return an error message.
        return f"An error occurred: {e}"

# Generating the response from the model using the augmentation prompt template.
ans = generate_augmentation(augmentation_prompt_template)

# Printing the augmented description received from the language model.
print("Augmented:")
print(ans)

Augmented:
Based on your preference for a 1-bedroom apartment with a balcony, I have a property in the Paris 16th District that might interest you. Even though it has 3 bedrooms, I believe the charming townhouse with a private garden and garage could accommodate your desire for outdoor space and tranquility.

This property features a spacious living room, dining area, and a modern kitchen with high-end appliances, offering a comfortable and stylish living space. Furthermore, the quiet residential neighborhood near the Bois de Boulogne park and exclusive shops on rue de Passy provides a perfect blend of nature and convenience. The peace and serenity of the area coupled with the private garden could offer you the sanctuary you are looking for.

Although it exceeds your bedroom requirement, the additional space can offer versatility if you desire a home office, guest room, or extra storage. I believe this property aligns with your taste for a peaceful setting with outdoor space, making it