#### Step 1: Setting Up the Python Application

Initialize a Python Project: Create a new Python project, set up a virtual environment, and install the necessary packages: LangChain, an LLM client library (e.g., the OpenAI Python package), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB). If you don't wish to create your files from scratch, starter files are available in the workspace on the next page as an application skeleton.
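Before running the cells that follow, it can help to confirm that the packages are actually installed in the active environment. The snippet below is a minimal sketch using only the standard library; the package list and the helper name `installed_versions` are assumptions for illustration and should be adjusted to match your own `requirements.txt`.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical package list for illustration; mirror your requirements.txt.
REQUIRED_PACKAGES = ["langchain", "openai", "chromadb"]


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report what is available in the current environment.
for name, installed in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {installed or 'NOT INSTALLED'}")
```

If a package is reported as missing, install it in the virtual environment before continuing.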

In [25]:
# Install necessary packages
# Ensure you have these installed in your environment.
# You might need to run these commands in your terminal if not already installed.
#!pip install --quiet -r requirements.txt

In [26]:
# Import necessary libraries
import os
import getpass
import warnings


from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.vectorstores import Chroma
from langchain.document_loaders.csv_loader import CSVLoader # Correct for this version

warnings.filterwarnings("ignore")
print("Libraries imported successfully.")

Libraries imported successfully.


In [None]:
def setup_environment_and_llm(openai_api_key_input=None, openai_api_base_url="https://openai.vocareum.com/v1", llm_temp=0.7, llm_model='gpt-3.5-turbo', llm_max_tokens=1000):
    """
    Sets up the OpenAI API key, base URL, and initializes the LLM and embeddings model.
    Uses openai==0.28.1 and langchain==0.0.305 conventions.
    Returns the LLM and embeddings model instances.
    """
    if openai_api_key_input:
        os.environ["OPENAI_API_KEY"] = openai_api_key_input
    elif "OPENAI_API_KEY" not in os.environ:
        # Fallback to getpass if no key is provided and not in env
        api_key_from_prompt = getpass.getpass("OpenAI API Key not found in environment. Please enter your OpenAI API Key: ")
        if api_key_from_prompt:
            os.environ["OPENAI_API_KEY"] = api_key_from_prompt
        else: 
            raise ValueError("OpenAI API Key is required. Please set it as an environment variable or pass it to the function.")

    # Set the OpenAI API base URL if provided
    if openai_api_base_url:
        os.environ["OPENAI_API_BASE"] = openai_api_base_url

    # Set the llm instance and embeddings model with default values
    # You can adjust these parameters as needed
    llm_instance = OpenAI(
        temperature=llm_temp,
        model_name=llm_model, 
        max_tokens=llm_max_tokens
    )
    embeddings_model_instance = OpenAIEmbeddings() 
    
    print(f"Environment configured. LLM: {llm_model}, Embeddings: OpenAI default.")
    return llm_instance, embeddings_model_instance

In [None]:
# --- Configuration for LLM Setup ---
# Set up OpenAI API Key if not already in environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key: ")

# Set up OpenAI API Base URL
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

# Set up LLM model name (default is 'gpt-3.5-turbo')
CONFIG_LLM_MODEL_NAME = 'gpt-3.5-turbo' 

# Set up LLM configuration (these can be adjusted as needed)
CONFIG_LLM_TEMPERATURE = 0.5
CONFIG_LLM_MAX_TOKENS = 1200

# --- Execute LLM Setup ---
llm, embeddings_model = setup_environment_and_llm(
    openai_api_key_input=os.environ["OPENAI_API_KEY"],
    openai_api_base_url=os.environ["OPENAI_API_BASE"],
    llm_temp=CONFIG_LLM_TEMPERATURE,
    llm_model=CONFIG_LLM_MODEL_NAME,
    llm_max_tokens=CONFIG_LLM_MAX_TOKENS
)

Environment configured. LLM: gpt-3.5-turbo, Embeddings: OpenAI default.


#### Step 2: Generating Real Estate Listings
Generate real estate listings using a Large Language Model. Generate at least 10 listings. This can involve creating prompts for the LLM to produce descriptions of various properties.

In [None]:
def generate_listings_csv_rows_raw(llm_instance, num_listings_to_gen: int = 10, listing_characteristics: list[str] | None = None) -> list[str]:
    """
    Generates a specified number of real estate listings as raw strings using an LLM.
    Returns the raw output string from the LLM for each listing, without CSV parsing or validation.
    Uses chain.run(characteristic_description=...) for langchain==0.0.305.

    If no listing_characteristics are provided or the list is empty, a generic
    characteristic "a typical listing" will be used for all generations.

    Args:
        llm_instance: An instance of a Langchain LLM.
        num_listings_to_gen: The number of listings to generate.
        listing_characteristics: An optional list of characteristics to cycle through for generating listings.

    Returns:
        A list of raw strings, where each string is the LLM's output for one listing.
    """

    # Define the prompt template for generating a single real estate listing
    listing_generation_prompt_template_csv = """
    Generate a single real estate listing as a single row of CSV data.
    The CSV should have the following columns, IN THIS EXACT ORDER AND CASE:
    neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description

    Make sure to include:
    1. Neighborhood name (e.g., 'Willow Creek Estates', 'Sunset Bluffs', you can be creative)
    2. Price (whole-dollar amount between $100,000 and $1,000,000, formatted like $500,000)
    3. Number of Bedrooms (integer, between 1 and 5, e.g., 3)
    4. Number of Bathrooms (integer, between 1 and 3, e.g., 2)
    5. House Size (whole number of square metres between 50 and 1500, formatted like 100 m²)
    6. A compelling property Description (between 30 and 50 words).
    7. A short Neighborhood Description (between 30 and 50 words).

    For the CSV output:
    - Use a comma (,) as the delimiter.
    - Enclose each field in double quotes (") to handle commas or newlines within the text.
    - If a double quote appears within a field, escape it by doubling it ("").
    - Do NOT include a header row in your response.
    - Provide details matching the characteristic if specified.

    Characteristic for this listing: {characteristic_description}

    Example CSV row:
    "Green Oaks","$800,000","3","2","185 m²","Welcome to this eco-friendly oasis...","Green Oaks is a close-knit community..."

    Generate ONLY the single CSV row for the listing.
    ---
    Output:
    """

    # Create a PromptTemplate instance with the defined template
    # and the input variable for the characteristic description
    listing_prompt_csv = PromptTemplate(
        input_variables=["characteristic_description"], # Matches variable in template
        template=listing_generation_prompt_template_csv
    )
    listing_generation_chain = LLMChain(llm=llm_instance, prompt=listing_prompt_csv)

    # Use the provided characteristics, or a list with one generic characteristic if none are provided or the list is empty
    characteristics_to_use = listing_characteristics if listing_characteristics and len(listing_characteristics) > 0 else ["a typical listing"]

    generated_raw_rows_list = []
    print(f"Generating {num_listings_to_gen} listings as raw LLM output strings...\n")

    # Loop to generate the specified number of listings
    for i in range(num_listings_to_gen):
        # Use modulo to cycle through characteristics
        current_characteristic = characteristics_to_use[i % len(characteristics_to_use)]
        print(f"Attempting to generate listing {i+1}/{num_listings_to_gen} with characteristic: '{current_characteristic}'")
        try:
            # Call the LLMChain to generate the raw string output
            response_csv_row_str = listing_generation_chain.run(characteristic_description=current_characteristic)

            # Append the raw string output directly to the list
            generated_raw_rows_list.append(response_csv_row_str)
            print(f"Listing {i+1} generated. Raw output captured.")

        except Exception as e:
            print(f"Error generating listing {i+1}: {e}")
        print("---")

    print(f"\nFinished generating {len(generated_raw_rows_list)} raw listing strings.")
    return generated_raw_rows_list

In [None]:
def save_listings_to_csv_file(csv_rows_data, target_file_path="listings.csv"):
    """
    Saves the generated CSV rows to a file.
    Returns the file_path if successful, None otherwise.
    """
    # CSV header line
    csv_header_line = "neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description"

    # Check if the CSV rows data is not empty
    if csv_rows_data:
        try:
            with open(target_file_path, "w", encoding="utf-8") as f:
                f.write(csv_header_line + "\n")
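                for row_str in csv_rows_data:
                    # Raw LLM output may carry leading/trailing whitespace or blank lines,
                    # which would break CSV quoting when the file is read back.
                    cleaned_row = row_str.strip()
                    if cleaned_row:
                        f.write(cleaned_row + "\n")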
            print(f"\nGenerated {len(csv_rows_data)} listings saved to {target_file_path}")
            return target_file_path
        except IOError as e:
            print(f"Error saving listings to CSV '{target_file_path}': {e}")
            return None
    else:
        print("\nNo valid CSV listings were provided to save.")
        return None
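The rows saved above are raw LLM output, so nothing guarantees that each one is a well-formed CSV record with the expected seven fields. The sketch below shows one way such rows could be checked before saving, using the standard `csv` module. The helper name `parse_listing_row` is hypothetical and not part of the notebook.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "neighborhood", "price", "bedrooms", "bathrooms",
    "house_size", "description", "neighborhood_description",
]


def parse_listing_row(raw_row):
    """Return a dict for a well-formed row, or None if the row is unusable."""
    text = raw_row.strip()
    if not text:
        return None
    rows = list(csv.reader(io.StringIO(text)))
    # Expect exactly one record with one value per expected column.
    if len(rows) != 1 or len(rows[0]) != len(EXPECTED_COLUMNS):
        return None
    return dict(zip(EXPECTED_COLUMNS, (value.strip() for value in rows[0])))
```

Rows for which the helper returns `None` could be logged and regenerated rather than written to `listings.csv`.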

In [31]:
# --- Configuration for Listing Generation ---
CONFIG_NUM_LISTINGS_TO_GENERATE = 10
CONFIG_CSV_OUTPUT_FILE = "listings.csv"
CONFIG_CUSTOM_LISTING_CHARACTERISTICS = [
    "A modern downtown apartment with city views",
    "A sprawling ranch-style home with a large backyard",
    "A cozy cottage perfect for a small family",
    "A luxurious penthouse with high-end amenities",
    "A historic Victorian home with original features",
    "A suburban family home with 4 bedrooms and a pool",
    "An eco-friendly house with solar panels",
    "A minimalist loft in an artistic neighborhood",
    "A lakefront property with a private dock",
    "A townhouse in a gated community",
]

In [None]:
# --- Execute Listing Generation and Saving ---

# Check if the number of listings to generate is greater than 0
if CONFIG_NUM_LISTINGS_TO_GENERATE > 0:
    print(f"\n--- Stage: Generating {CONFIG_NUM_LISTINGS_TO_GENERATE} New Listings ---")
    generated_listing_rows = generate_listings_csv_rows_raw(
        llm, # LLM instance from Step 1
        num_listings_to_gen=CONFIG_NUM_LISTINGS_TO_GENERATE, 
        listing_characteristics=CONFIG_CUSTOM_LISTING_CHARACTERISTICS
    )
    saved_csv_file_path = save_listings_to_csv_file(generated_listing_rows, target_file_path=CONFIG_CSV_OUTPUT_FILE)
    if not saved_csv_file_path:
        print(f"Critical: Failed to generate and save new listings to {CONFIG_CSV_OUTPUT_FILE}.")
else:
    print(f"\n--- Stage: Skipping listing generation. Using existing '{CONFIG_CSV_OUTPUT_FILE}' if available. ---")
    if not os.path.exists(CONFIG_CSV_OUTPUT_FILE):
        print(f"Warning: CSV file '{CONFIG_CSV_OUTPUT_FILE}' not found and generation was skipped. Subsequent steps might fail.")
    else:
        print(f"Confirmed: Using existing CSV file '{CONFIG_CSV_OUTPUT_FILE}'.")


--- Stage: Generating 10 New Listings ---
Generating 10 listings as raw LLM output strings...

Attempting to generate listing 1/10 with characteristic: 'A modern downtown apartment with city views'
Listing 1 generated. Raw output captured.
---
Attempting to generate listing 2/10 with characteristic: 'A sprawling ranch-style home with a large backyard'
Listing 2 generated. Raw output captured.
---
Attempting to generate listing 3/10 with characteristic: 'A cozy cottage perfect for a small family'
Listing 3 generated. Raw output captured.
---
Attempting to generate listing 4/10 with characteristic: 'A luxurious penthouse with high-end amenities'
Listing 4 generated. Raw output captured.
---
Attempting to generate listing 5/10 with characteristic: 'A historic Victorian home with original features'
Listing 5 generated. Raw output captured.
---
Attempting to generate listing 6/10 with characteristic: 'A suburban family home with 4 bedrooms and a pool'
Listing 6 generated. Raw output captur

#### Step 3: Storing Listings in a Vector Database
Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
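Whatever loader is used, each listing ends up as three things handed to the vector store: a text to embed, a metadata dictionary, and a unique ID. The sketch below builds those three parallel lists directly from CSV text with the standard `csv` module. It is an illustrative alternative, not the loader used in this notebook, and the helper name `rows_to_records` is hypothetical.

```python
import csv
import io

COLUMNS = [
    "neighborhood", "price", "bedrooms", "bathrooms",
    "house_size", "description", "neighborhood_description",
]


def rows_to_records(csv_text):
    """Split CSV text (with a header row) into parallel lists: texts, metadatas, ids."""
    texts, metadatas, ids = [], [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        description = (row.get("description") or "").strip()
        if not description:
            continue  # nothing meaningful to embed for this row
        # Keep only the expected columns; fill gaps with a placeholder.
        metadata = {key: (row.get(key) or "N/A") for key in COLUMNS}
        texts.append(
            f"Property in {metadata['neighborhood']}. "
            f"Features: {description}. "
            f"Neighborhood details: {metadata['neighborhood_description']}"
        )
        metadatas.append(metadata)
        ids.append(f"listing_{index}")
    return texts, metadatas, ids
```

Embedding only the descriptive fields while keeping price, bedrooms, and the other structured values in metadata keeps the semantic search focused on free text and leaves the exact facts available for later display.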

In [33]:
def load_documents_from_csv(file_path="listings.csv"):
    """
    Loads documents from a CSV file using CSVLoader.
    Returns a list of LangChain Document objects.
    """
    # Check if file exists and is not empty (beyond just header)
    header = "neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description"
    if not os.path.exists(file_path) or os.path.getsize(file_path) <= len(header) + 1: # +1 for newline
        print(f"CSV file '{file_path}' not found, is empty, or contains only the header. Cannot load documents.")
        return []
    try:
        # CSVLoader expects each row to be a document.
        # The 'page_content' will be a string representation of the row.
        # Metadata will include 'source' and 'row'.
        loader = CSVLoader(
            file_path=file_path,
            csv_args={'delimiter': ',', 'quotechar': '"'},
            encoding='utf-8'
        )
        loaded_documents = loader.load()
        print(f"{len(loaded_documents)} documents loaded from {file_path}.")
        return loaded_documents
    except Exception as e:
        print(f"Error loading CSV from '{file_path}': {e}")
        return []


In [None]:

def prepare_documents_for_vector_db(loaded_docs, csv_file_path_source="listings.csv"):
    """
    Prepares loaded documents for ChromaDB.
    The CSVLoader creates page_content like:
    'neighborhood: Sunnyvale\nprice: $1,200,000\nbedrooms: 3...'
    We need to extract metadata from this and decide what text to embed.
    For ChromaDB, we'll store the full CSV row data as metadata for later retrieval,
    and the combined description fields as the text to embed.
    """
    docs_for_embedding = []
    metadatas_for_db = []
    ids_for_db = []

    
    if not loaded_docs:
        print("No documents provided to prepare for vector DB.")
        return docs_for_embedding, metadatas_for_db, ids_for_db

    print(f"Preparing {len(loaded_docs)} loaded documents for ChromaDB...")
    expected_csv_keys = ["neighborhood", "price", "bedrooms", "bathrooms", "house_size", "description", "neighborhood_description"]

    # Ensure all expected keys are present in the metadata
    for i, doc in enumerate(loaded_docs):
        listing_metadata = {}
        # Parse the page_content (which is "key: value\nkey: value...")
        content_lines = doc.page_content.strip().split('\n')
        for line in content_lines:
            parts = line.split(':', 1)
            if len(parts) == 2:
                key = parts[0].strip()
                value = parts[1].strip()
                if key in expected_csv_keys:
                    listing_metadata[key] = value
        
        # Fill missing expected keys
        for key in expected_csv_keys:
            listing_metadata.setdefault(key, 'N/A') # Use setdefault to avoid overwriting

        # Create a unique ID and add source info from CSVLoader's metadata
        listing_id = f"listing_{doc.metadata.get('row', i) + 1}" # CSVLoader provides 'row'
        listing_metadata['id'] = listing_id
        listing_metadata['source_file'] = doc.metadata.get('source', csv_file_path_source)
        listing_metadata['csv_row_index'] = doc.metadata.get('row', i)

        # Text for embedding: Combine descriptive fields for richer semantics
        # Ensure all parts are strings before joining.
        text_to_embed = f"Property in {listing_metadata.get('neighborhood', '')}. " \
                        f"Features: {listing_metadata.get('description', '')}. " \
                        f"Neighborhood details: {listing_metadata.get('neighborhood_description', '')}"
        
        # Basic validation
        if not listing_metadata.get('description') or listing_metadata.get('description') == 'N/A':
            print(f"Warning: Document at CSV row {listing_metadata['csv_row_index']} is missing 'description'. Skipping.")
            continue

        docs_for_embedding.append(text_to_embed)
        metadatas_for_db.append(listing_metadata) # Store all parsed CSV fields as metadata
        ids_for_db.append(listing_id)
        
        if i < 2: # Print first few for verification
            print(f"Prepared for DB (doc {i+1}): ID='{listing_id}', Text to embed (start): '{text_to_embed[:100]}...', Metadata: {listing_metadata}")

    print(f"Successfully prepared {len(docs_for_embedding)} documents for ChromaDB.")
    return docs_for_embedding, metadatas_for_db, ids_for_db

In [None]:
def initialize_and_populate_vector_db(embed_model, documents, metadatas, ids, collection_name="home_match_listings", persist_dir=None):
    """
    Initializes ChromaDB, clears existing collection, populates with new data.
    Returns the vector_db instance.
    """
    print(f"Initializing ChromaDB collection: '{collection_name}'")
    if persist_dir:
        if not os.path.exists(persist_dir):
            os.makedirs(persist_dir)
        print(f"Persisting ChromaDB to directory: {persist_dir}")
    
    # This approach creates a client and then gets/creates a collection.
    import chromadb
    if persist_dir:
        client_settings = chromadb.Settings(persist_directory=persist_dir, is_persistent=True)
        chroma_client = chromadb.Client(client_settings)
    else:
        chroma_client = chromadb.Client() # In-memory client


    # Delete collection if it exists, to ensure a fresh start for the demo
    try:
        if collection_name in [col.name for col in chroma_client.list_collections()]:
            print(f"Clearing existing collection: '{collection_name}'")
            chroma_client.delete_collection(name=collection_name)
    except Exception as e:
        print(f"Notice: Could not check/delete existing collection (may be first run or client issue): {e}")

    # Create the LangChain Chroma object using the client and collection name
    vector_db = Chroma(
        client=chroma_client,
        collection_name=collection_name,
        embedding_function=embed_model,
        # persist_directory=persist_dir # Not needed here if client is already configured
    )

    # Add documents to the vector database
    if documents and metadatas and ids:
        vector_db.add_texts(texts=documents, metadatas=metadatas, ids=ids)
        # If persistent, ensure data is saved
        if persist_dir:
            # chroma_client.persist() # Deprecated. Persistence is handled by client settings for newer chromadb versions.
            # For very old versions of chromadb with Langchain, vector_db.persist() might be needed.
            # Current best practice is to configure client for persistence.
            print(f"Data added to persistent collection '{collection_name}'.")

        print(f"Added {len(documents)} listings to ChromaDB collection '{collection_name}'.")
        print(f"Total documents in collection: {vector_db._collection.count()}")
    else:
        print("No valid documents provided to populate the vector database.")
        if vector_db._collection.count() == 0:
            print(f"ChromaDB collection '{collection_name}' is empty.")
    
    return vector_db

In [36]:
# --- Configuration for Vector DB ---
#CONFIG_CSV_OUTPUT_FILE = "listings.csv"
VECTOR_DB_COLLECTION_NAME_CONFIG = "homematch_listings_db"
# Optional: Set a directory to persist the DB, e.g.,  "./chroma_persistent_db"
# Set to None for in-memory database (will be rebuilt on each run)
VECTOR_DB_PERSIST_DIRECTORY_CONFIG = None

In [None]:
# --- Execute Loading, Preparation, and DB Population ---
vector_db = None # Initialize to None
print("\n--- Loading, Preparing Documents, and Populating Vector Database ---")
loaded_documents = load_documents_from_csv(file_path=CONFIG_CSV_OUTPUT_FILE) # Use path from Step 2

if loaded_documents:
    docs_to_embed, metadatas, unique_ids = prepare_documents_for_vector_db(loaded_documents, csv_file_path_source=CONFIG_CSV_OUTPUT_FILE)
    
    # Check if any documents were prepared for embedding
    if docs_to_embed: 
        # Initialize and populate the vector database
        vector_db = initialize_and_populate_vector_db(
            embeddings_model, # Use the embeddings_model from Step 1
            documents=docs_to_embed,
            metadatas=metadatas,
            ids=unique_ids,
            collection_name=VECTOR_DB_COLLECTION_NAME_CONFIG,
            persist_dir=VECTOR_DB_PERSIST_DIRECTORY_CONFIG
        )
    else:
        print("No documents were successfully prepared for the vector database. Database will not be populated.")
else:
    print(f"Could not load documents from {CONFIG_CSV_OUTPUT_FILE}. Vector DB will not be populated.")

# Final check: Print the number of documents in the vector database
if vector_db:
    print(f"\nVector DB '{VECTOR_DB_COLLECTION_NAME_CONFIG}' ready. Contains {vector_db._collection.count()} documents.")
else:
    print(f"\nVector DB '{VECTOR_DB_COLLECTION_NAME_CONFIG}' initialization failed or was skipped due to missing data.")


--- Loading, Preparing Documents, and Populating Vector Database ---
10 documents loaded from listings.csv.
Preparing 10 loaded documents for ChromaDB...
Prepared for DB (doc 1): ID='listing_1', Text to embed (start): 'Property in Downtown Views. Features: Enjoy city living at its finest in this modern downtown apartm...', Metadata: {'neighborhood': 'Downtown Views', 'price': '$650,000', 'bedrooms': '2', 'bathrooms': '2', 'house_size': '120 m²', 'description': "Enjoy city living at its finest in this modern downtown apartment with stunning views of the city skyline. This sleek and stylish unit features an open floor plan, high-end finishes, and floor-to-ceiling windows. Don't miss out on this urban oasis!", 'neighborhood_description': 'Downtown Views is a vibrant neighborhood in the heart of the city, known for its bustling streets, trendy restaurants, and convenient access to public transportation. Experience the excitement of downtown living in this prime location.', 'id': 'listing_

#### Step 4: Building the User Preference Interface
Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to describe their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively however you'd like.
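For the interactive option, a small helper could ask each question and fall back to a default answer when the buyer just presses Enter. This is a minimal sketch; the function name `collect_preferences` and its `ask` parameter are assumptions introduced here so the prompt function can be swapped out in tests.

```python
def collect_preferences(questions, default_answers, ask=input):
    """Ask each question; use the matching default when the reply is blank."""
    if len(questions) != len(default_answers):
        raise ValueError("questions and default_answers must have equal length")
    answers = []
    for question, default in zip(questions, default_answers):
        reply = ask(f"{question} [press Enter for default] ").strip()
        answers.append(reply or default)
    return answers
```

The returned list has the same shape as a hard-coded list of answers, so it can be passed to the same downstream code.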

In [None]:
# For personalization, you can add your own questions and answers
# This is a default set of questions and answers for the demo
CONFIG_QUESTION_PREFERENCES = [
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?"
]
CONFIG_ANSWER_PREFERENCES = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

In [None]:
def get_buyer_preferences_string(questions, answers):
    """
    Combines answers into a single string representing buyer preferences.
    Prints the individual preferences and the combined string.
    """
    # Check if the number of questions and answers match
    if len(questions) != len(answers):
        # You could raise an error or handle this more gracefully
        print("Error: The number of questions and answers must be the same.")
        return ""
    
    # Combine the answers into a single string
    preferences_list = []
    print("\nCollecting Buyer Preferences:")
    for q_idx, (q, a) in enumerate(zip(questions, answers)):
        print(f"Q{q_idx+1}: {q}\nA{q_idx+1}: {a}\n")
        preferences_list.append(a)
        
    buyer_preferences_str = " ".join(preferences_list)
    print(f"--- Combined Buyer Preferences String for Searching ---\n{buyer_preferences_str}\n---")
    return buyer_preferences_str

In [40]:
# --- Execute Buyer Preference Collection ---
buyer_preferences_search_string = get_buyer_preferences_string(
    CONFIG_QUESTION_PREFERENCES, 
    CONFIG_ANSWER_PREFERENCES
)


Collecting Buyer Preferences:
Q1: How big do you want your house to be?
A1: A comfortable three-bedroom house with a spacious kitchen and a cozy living room.

Q2: What are 3 most important things for you in choosing this property?
A2: A quiet neighborhood, good local schools, and convenient shopping options.

Q3: Which amenities would you like?
A3: A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.

Q4: Which transportation options are important to you?
A4: Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.

Q5: How urban do you want your neighborhood to be?
A5: A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

--- Combined Buyer Preferences String for Searching ---
A comfortable three-bedroom house with a spacious kitchen and a cozy living room. A quiet neighborhood, good local schools, and convenient shopping options. A backyard for gardening, a two-car 

#### Step 5: Searching Based on Preferences
Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.

Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
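Semantic closeness here means that the embedding of the preference text lies near the embedding of a listing, so a smaller distance indicates a better match. The toy sketch below ranks hand-made three-dimensional vectors by squared Euclidean distance to illustrate the idea. The vectors, IDs, and helper names are invented for illustration; in the notebook the embeddings come from the embeddings model and the vector database performs the search.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vector, stored, k=3):
    """Return the k (distance, id) pairs closest to the query, nearest first."""
    scored = (
        (squared_l2(query_vector, vector), item_id)
        for item_id, vector in stored.items()
    )
    return heapq.nsmallest(k, scored)


# Made-up vectors standing in for real listing embeddings.
toy_store = {
    "listing_a": [0.9, 0.1, 0.0],
    "listing_b": [0.1, 0.8, 0.1],
    "listing_c": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
```

Because the ranking is by distance, results should be read in ascending order of score, with the first entry being the closest match.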

In [None]:
def search_listings(v_db, preferences_str, num_results=3):
    """
    Performs a semantic search in the vector_db based on buyer preferences.
    Returns a list of dictionaries, each containing 'metadata', 'score', and 'original_text_embedded'.
    """
    retrieved_listings_with_details = []
    # Check if the vector database is initialized and has documents
    if not v_db:
        print("\nVector database is not initialized. Cannot perform search.")
        return retrieved_listings_with_details
    if v_db._collection.count() == 0:
        print("\nVector database is empty. Cannot perform search.")
        return retrieved_listings_with_details

    print(f"\n--- Searching for Top {num_results} Listings Based on Preferences ---")
    print(f"(Database contains {v_db._collection.count()} listings)")
    
    # Ensure k is not greater than the number of documents in the collection
    actual_num_results = min(num_results, v_db._collection.count())
    if actual_num_results <= 0:
        print("Adjusted number of results to retrieve is 0 or less. No search will be performed.")
        return retrieved_listings_with_details
    
    # Perform similarity search with scores
    retrieved_docs_and_scores = v_db.similarity_search_with_score(
        preferences_str,
        k=actual_num_results
    )
    
    print(f"\n--- Top {len(retrieved_docs_and_scores)} Retrieved Listings (Lower score is more similar) ---")
    for doc_obj, score in retrieved_docs_and_scores:
        # doc_obj.metadata should contain the full parsed CSV data we stored earlier
        print(f"Listing ID: {doc_obj.metadata.get('id', 'N/A')}")
        print(f"  Neighborhood: {doc_obj.metadata.get('neighborhood', 'N/A')}")
        print(f"  Price: {doc_obj.metadata.get('price', 'N/A')}")
        print(f"  Bedrooms: {doc_obj.metadata.get('bedrooms', 'N/A')}, Bathrooms: {doc_obj.metadata.get('bathrooms', 'N/A')}")
        print(f"  Original Description (Snippet): {doc_obj.metadata.get('description', '')[:150]}...") # Show a snippet
        print(f"  Similarity Score (L2 Distance): {score:.4f}") # A distance, not a similarity: lower is closer (Chroma's default space is squared L2)
        print("------------------------------------")
        retrieved_listings_with_details.append({'metadata': doc_obj.metadata, 'score': score, 'original_text_embedded': doc_obj.page_content})
        
    return retrieved_listings_with_details

In [42]:
# --- Configuration for Search ---
NUM_RESULTS_TO_RETRIEVE_CONFIG = 3

# --- Execute Search ---
# Ensure buyer_preferences_search_string is available from Step 4
# Ensure vector_db is available from Step 3
retrieved_listings_for_personalization = [] # Initialize
if vector_db and buyer_preferences_search_string:
    retrieved_listings_for_personalization = search_listings(
        vector_db, 
        buyer_preferences_search_string, 
        num_results=NUM_RESULTS_TO_RETRIEVE_CONFIG
    )
else:
    print("\nSkipping search: Vector DB or buyer preferences not available.")

if not retrieved_listings_for_personalization:
    print("No listings were retrieved based on the current preferences and database state.")


--- Searching for Top 3 Listings Based on Preferences ---
(Database contains 10 listings)

--- Top 3 Retrieved Listings (Lower score is more similar) ---
Listing ID: listing_6
  Neighborhood: Maple Grove
  Price: $650,000
  Bedrooms: 4, Bathrooms: 2
  Original Description (Snippet): This charming suburban family home features 4 bedrooms, a spacious living area, a backyard pool perfect for entertaining, and a cozy fireplace. Don't ...
  Similarity Score (L2 Distance): 0.2547
------------------------------------
Listing ID: listing_7
  Neighborhood: Maple Grove
  Price: $650,000
  Bedrooms: 4, Bathrooms: 3
  Original Description (Snippet): Welcome to this stunning eco-friendly home featuring solar panels, energy-efficient appliances, and sustainable materials throughout. Enjoy the spacio...
  Similarity Score (L2 Distance): 0.2723
------------------------------------
Listing ID: listing_3
  Neighborhood: Maple Grove
  Price: $350,000
  Bedrooms: 2, Bathrooms: 1
  Original Description (S

#### Step 6: Personalizing Listing Descriptions
LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
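Prompt instructions alone cannot guarantee that the rewritten text keeps the facts intact, so a lightweight automated check can serve as a safety net. The sketch below is a crude heuristic, offered as an assumption rather than part of the notebook: it flags any number in the personalized text that does not appear in any field of the listing metadata. The helper name `find_unsupported_numbers` is hypothetical, and the check will not catch invented non-numeric features.

```python
import re

# Matches digit runs with optional thousands separators and decimals, e.g. 650,000 or 2.5.
NUMBER_PATTERN = re.compile(r"[0-9][0-9,]*(?:\.[0-9]+)?")


def _numbers_in(text):
    """Extract numbers from text, normalised by removing commas."""
    return {match.replace(",", "") for match in NUMBER_PATTERN.findall(text)}


def find_unsupported_numbers(personalized_text, listing_metadata):
    """Return numbers that appear in the rewrite but in none of the listing fields."""
    allowed = set()
    for value in listing_metadata.values():
        allowed |= _numbers_in(str(value))
    return sorted(_numbers_in(personalized_text) - allowed)
```

An empty result does not prove the rewrite is faithful, but a non-empty one is a strong hint that a price, room count, or size was altered and the description should be regenerated or reviewed.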

In [None]:
def personalize_listing_descriptions(llm_instance, retrieved_listings_data, buyer_prefs_str):
    """
    Personalizes the descriptions of retrieved listings using an LLM.
    'retrieved_listings_data' is a list of dicts, each with 'metadata' and 'score'.
    """
    # Create a prompt template for personalizing the property description
    personalization_prompt_template_str = """
    You are a helpful real estate assistant. Your task is to rewrite a property description to highlight aspects that match the buyer's specific preferences, while strictly adhering to the original property's facts.

    Buyer's Preferences:
    {buyer_preferences}

    Original Property Listing Details:
    Neighborhood: {neighborhood}
    Price: {price}
    Bedrooms: {bedrooms}
    Bathrooms: {bathrooms}
    House Size: {house_size}
    Original Full Description: {original_description}
    Original Neighborhood Description: {neighborhood_description}

    Instructions for Personalization:
    1. Carefully review the "Buyer's Preferences".
    2. Read the "Original Property Listing Details" thoroughly.
    3. Rewrite the "Original Full Description" to create a "Personalized Property Description".
    4. In the "Personalized Property Description", emphasize features and aspects of the property and neighborhood (from the original details provided) that directly align with the "Buyer's Preferences".
    5. DO NOT invent new facts, features, or amenities about the property or neighborhood that are not present in the "Original Property Listing Details".
    6. If the original description is very short or lacks details relevant to many preferences, acknowledge this subtly if necessary, but still try to highlight what IS relevant.
    7. The personalized description should be engaging and flow naturally.
    8. Maintain a professional and appealing tone.
    9. Directly address how the property meets specific preferences if possible (e.g., "You mentioned wanting a quiet neighborhood, and this home is located on a peaceful cul-de-sac...").

    Personalized Property Description:
    """
    # Create a PromptTemplate instance with the defined template
    # and the input variables for the buyer's preferences and listing details
    personalization_prompt = PromptTemplate(
        input_variables=["buyer_preferences", "neighborhood", "price", "bedrooms", "bathrooms", "house_size", "original_description", "neighborhood_description"],
        template=personalization_prompt_template_str
    )
    personalization_chain = LLMChain(llm=llm_instance, prompt=personalization_prompt)

    print("\n--- Personalizing Retrieved Listing Descriptions ---")
    if not retrieved_listings_data:
        print("No listings were retrieved to personalize.")
        return

    for listing_data_item in retrieved_listings_data:
        listing_metadata = listing_data_item['metadata'] # This is the dict of all CSV fields
        similarity_score = listing_data_item['score']
        
        print(f"\n--- Personalizing Listing ID: {listing_metadata.get('id', 'N/A')} (Similarity Score: {similarity_score:.4f}) ---")
        
        # Prepare payload for the LLM; any prompt variable missing from the metadata falls back to 'N/A'
        payload_for_llm = {
            "buyer_preferences": buyer_prefs_str,
            "neighborhood": listing_metadata.get('neighborhood', 'N/A'),
            "price": listing_metadata.get('price', 'N/A'),
            "bedrooms": listing_metadata.get('bedrooms', 'N/A'),
            "bathrooms": listing_metadata.get('bathrooms', 'N/A'),
            "house_size": listing_metadata.get('house_size', 'N/A'),
            "original_description": listing_metadata.get('description', 'N/A'), # This is the key for property description
            "neighborhood_description": listing_metadata.get('neighborhood_description', 'N/A') # Key for neighborhood info
        }
        
        # Validate essential data for personalization
        if payload_for_llm["original_description"] == 'N/A' or not payload_for_llm["original_description"].strip():
            print(f"Original Description is missing or empty for Listing ID: {listing_metadata.get('id', 'N/A')}. Cannot personalize effectively.")
            print(f"Displaying Original Description (if any):\n{payload_for_llm['original_description']}")
            print("------------------------------------")
            continue

        try:
            personalized_response = personalization_chain.invoke(payload_for_llm)
            personalized_description = personalized_response.get('text', '').strip()
            
            print(f"Original Description (Property):\n{payload_for_llm['original_description']}")
            if payload_for_llm['neighborhood_description'] and payload_for_llm['neighborhood_description'] != 'N/A':
                print(f"\nOriginal Description (Neighborhood):\n{payload_for_llm['neighborhood_description']}")
            
            print(f"\n>>> Personalized Description for Buyer <<<\n{personalized_description}")
            
        except Exception as e:
            print(f"Error personalizing listing {listing_metadata.get('id', 'N/A')}: {e}")
        print("------------------------------------")

In [44]:
# --- Execute Personalization ---
# Ensure llm, retrieved_listings_for_personalization, and buyer_preferences_search_string are available
if llm and retrieved_listings_for_personalization and buyer_preferences_search_string:
    personalize_listing_descriptions(
        llm, # LLM from Step 1
        retrieved_listings_for_personalization, # Results from Step 5
        buyer_preferences_search_string # Preferences from Step 4
    )
else:
    print("\nSkipping personalization: LLM, retrieved listings, or buyer preferences not available.")


--- Personalizing Retrieved Listing Descriptions ---

--- Personalizing Listing ID: listing_6 (Similarity Score: 0.2547) ---
Original Description (Property):
This charming suburban family home features 4 bedrooms, a spacious living area, a backyard pool perfect for entertaining, and a cozy fireplace. Don't miss out on this opportunity!

Original Description (Neighborhood):
Maple Grove is a peaceful neighborhood with tree-lined streets, great schools, and friendly neighbors. Close to parks, shopping, and dining options.

>>> Personalized Description for Buyer <<<
Located in the peaceful neighborhood of Maple Grove, this charming family home offers the perfect blend of suburban tranquility and urban convenience. With 4 bedrooms, including a comfortable three-bedroom setup ideal for your family, this spacious house boasts a cozy living area perfect for relaxing and entertaining guests.

The backyard features a pool for leisurely summer days and ample space for gardening, fulfilling your 

#### Step 7: Deliverables and Testing
Test your "HomeMatch" application and make sure it meets all of the requirements in the rubric. Your project code will be run when it's assessed, so enter several different sets of "buyer preferences" and confirm that each one produces retrieved listings and personalized descriptions without errors.
Jupyter Notebook/Python Program: Compile the application code in a Jupyter notebook or a standalone Python program. Ensure the code is well-commented and logically structured.
Example Outputs: Include example outputs showcasing how user preferences are processed and how the application generates personalized listing descriptions. You can include these in comments in your application or in a Jupyter notebook that's saved with outputs.
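
Because the notebook's functions report their results with `print`, one way to collect example outputs for a standalone program is to capture standard output as a string. This is a minimal sketch using only the standard library; `capture_printed_output` and `demo_step` are hypothetical names introduced for illustration, with `demo_step` standing in for any of the notebook's printing functions.

```python
import io
from contextlib import redirect_stdout

def capture_printed_output(func, *args, **kwargs):
    """Run `func` and return everything it printed as a single string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()

def demo_step(listing_id):
    # Stand-in for a notebook function that reports its results with print()
    print(f"--- Personalizing Listing ID: {listing_id} ---")

example_output = capture_printed_output(demo_step, "listing_1")
print(example_output, end="")
```

The captured string can then be written to a text file or pasted into a comment block alongside the code.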

**To test with different buyer preferences:**

Create a list of answers for the new buyer (`NEW_ANSWER_PREFERENCES` below), with one answer for each question in `CONFIG_QUESTION_PREFERENCES`, in the same order. Then run the following functions in order:

1. `get_buyer_preferences_string(...)`
2. `search_listings(...)`
3. `personalize_listing_descriptions(...)`
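
The three calls can also be wrapped in a single helper so that each new buyer is tested the same way. The sketch below is an assumption-laden illustration, not part of the notebook: `run_homematch_for_buyer` is a hypothetical name, and the three step functions are passed in as arguments (using the call signatures shown in the cells that follow) so the block runs on its own with simple stand-ins.

```python
def run_homematch_for_buyer(questions, answers, vector_db, llm,
                            build_prefs, search, personalize, num_results=3):
    """Run the three steps in order for one buyer and return the intermediate results."""
    # Each answer must line up with a question, so reject mismatched lists early
    if len(answers) != len(questions):
        raise ValueError(f"Expected {len(questions)} answers, got {len(answers)}.")
    prefs = build_prefs(questions, answers)                        # step 1
    listings = search(vector_db, prefs, num_results=num_results)   # step 2
    personalize(llm, listings, prefs)                              # step 3
    return prefs, listings

# Stand-ins for the notebook's functions, for demonstration only
def fake_build_prefs(questions, answers):
    return " ".join(answers)

def fake_search(vector_db, prefs, num_results=3):
    return [{"metadata": {"id": "listing_1"}, "score": 0.0}][:num_results]

def fake_personalize(llm, listings, prefs):
    print(f"Personalizing {len(listings)} listing(s) for: {prefs}")

prefs, listings = run_homematch_for_buyer(
    ["Q1", "Q2"], ["A small condo.", "Near parks."],
    None, None, fake_build_prefs, fake_search, fake_personalize,
)
```

In the notebook itself, the stand-ins would be replaced by `get_buyer_preferences_string`, `search_listings`, and `personalize_listing_descriptions`, together with the real `vector_db` and `llm` objects.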

In [None]:
# New answers for testing a different set of buyer preferences
NEW_ANSWER_PREFERENCES = [
    "A smaller, low-maintenance apartment or condo, perhaps a one or two-bedroom unit.",
    "Safety, proximity to parks, and good internet connectivity.",
    "Access to a swimming pool, in-unit laundry, and a dedicated parking spot.",
    "Walkability to shops and cafes, nearby bike trails, and easy access to public transport (train/subway).",
    "A quiet, suburban feel with minimal traffic and plenty of green space."
]

In [None]:
# Execute new buyer preferences string generation
buyer_preferences_search_string = get_buyer_preferences_string(
    CONFIG_QUESTION_PREFERENCES, 
    NEW_ANSWER_PREFERENCES
)


Collecting Buyer Preferences:
Q1: How big do you want your house to be?
A1: A smaller, low-maintenance apartment or condo, perhaps a one or two-bedroom unit.

Q2: What are 3 most important things for you in choosing this property?
A2: Safety, proximity to parks, and good internet connectivity.

Q3: Which amenities would you like?
A3: Access to a swimming pool, in-unit laundry, and a dedicated parking spot.

Q4: Which transportation options are important to you?
A4: Walkability to shops and cafes, nearby bike trails, and easy access to public transport (train/subway).

Q5: How urban do you want your neighborhood to be?
A5: A quiet, suburban feel with minimal traffic and plenty of green space.

--- Combined Buyer Preferences String for Searching ---
A smaller, low-maintenance apartment or condo, perhaps a one or two-bedroom unit. Safety, proximity to parks, and good internet connectivity. Access to a swimming pool, in-unit laundry, and a dedicated parking spot. Walkability to shops and 

In [None]:
# Retrieve listings again based on the new preferences
retrieved_listings_for_personalization = search_listings(
    vector_db, 
    buyer_preferences_search_string, 
    num_results=3
)


--- Searching for Top 3 Listings Based on Preferences ---
(Database contains 10 listings)

--- Top 3 Retrieved Listings (Lower score is more similar) ---
Listing ID: listing_1
  Neighborhood: Downtown Views
  Price: $650,000
  Bedrooms: 2, Bathrooms: 2
  Original Description (Snippet): Enjoy city living at its finest in this modern downtown apartment with stunning views of the city skyline. This sleek and stylish unit features an ope...
  Similarity Score (L2 Distance): 0.3394
------------------------------------
Listing ID: listing_8
  Neighborhood: Artisan Village
  Price: $600,000
  Bedrooms: 2, Bathrooms: 2
  Original Description (Snippet): Experience modern living in this sleek and stylish loft. With open spaces and natural light, this minimalist home is perfect for those seeking a conte...
  Similarity Score (L2 Distance): 0.3418
------------------------------------
Listing ID: listing_10
  Neighborhood: Maple Grove
  Price: $450,000
  Bedrooms: 2, Bathrooms: 2
  Original Descri

In [None]:
# Personalize the descriptions of the newly retrieved listings using the new buyer's answers
personalize_listing_descriptions(
    llm, 
    retrieved_listings_for_personalization, 
    buyer_preferences_search_string
)


--- Personalizing Retrieved Listing Descriptions ---

--- Personalizing Listing ID: listing_1 (Similarity Score: 0.3394) ---
Original Description (Property):
Enjoy city living at its finest in this modern downtown apartment with stunning views of the city skyline. This sleek and stylish unit features an open floor plan, high-end finishes, and floor-to-ceiling windows. Don't miss out on this urban oasis!

Original Description (Neighborhood):
Downtown Views is a vibrant neighborhood in the heart of the city, known for its bustling streets, trendy restaurants, and convenient access to public transportation. Experience the excitement of downtown living in this prime location.

>>> Personalized Description for Buyer <<<
Experience the perfect blend of city living and suburban tranquility in this 2-bedroom, 2-bathroom apartment located in the vibrant Downtown Views neighborhood. Situated in a modern building with stunning city skyline views, this sleek unit offers an open floor plan, high-e