This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

In [1]:
import os
import pandas as pd
import chromadb
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

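# Set the OpenAI API base to the Vocareum proxy endpoint.
# The API key is a secret: export OPENAI_API_KEY before starting the notebook
# instead of hard-coding it in a cell.
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY in the environment first"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"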


In [2]:
# Load the CSV file into pandas DataFrame, specifying the columns to load
csv_file = 'listing.csv'  # Make sure this is the correct path
listings_df = pd.read_csv(csv_file, encoding='utf-8', usecols=lambda column: column not in ['Unnamed: 7'])

# Display the first five rows of the DataFrame
listings_df.head()

Unnamed: 0,Neighborhood,Price,Bedrooms,Bathrooms,House Size (sqft),Description,Neighborhood Description
0,Willow Creek,"$650,000",4,3,2400,Nestled in the quiet Willow Creek neighborhood...,Willow Creek is known for its family-friendly ...
1,Oakwood Ridge,"$950,000",5,4,3200,"This luxurious 5-bedroom, 4-bathroom home in O...",Oakwood Ridge offers an upscale lifestyle with...
2,Rose Garden,"$425,000",2,1,1100,"This cozy 2-bedroom, 1-bathroom home in the Ro...",Rose Garden is a vibrant neighborhood with a m...
3,Lakeview Heights,"$1,200,000",6,5,4000,"Located in the prestigious Lakeview Heights, t...",Lakeview Heights offers exclusivity and breath...
4,Maple Grove,"$575,000",3,2,1800,"This charming 3-bedroom, 2-bathroom home is lo...",Maple Grove is known for its peaceful atmosphe...
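
The Price column above is loaded as display strings such as "$650,000", so it cannot be sorted or filtered numerically as is. A minimal sketch of one way to convert it, assuming every price follows the "$" plus thousands-separator pattern seen in the first five rows; `parse_price` is a hypothetical helper, not part of the notebook:

```python
def parse_price(price_text: str) -> int:
    """Convert a display price such as "$650,000" to whole dollars."""
    # Strip the currency symbol and thousands separators before parsing.
    cleaned = price_text.replace("$", "").replace(",", "").strip()
    return int(cleaned)


# The five prices shown in the head() output above.
sample_prices = ["$650,000", "$950,000", "$425,000", "$1,200,000", "$575,000"]
numeric_prices = [parse_price(p) for p in sample_prices]
print(numeric_prices)
```

On the DataFrame itself, something like `listings_df['Price'].apply(parse_price)` should give a numeric column, provided no row deviates from this format.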


In [3]:
# Initialize Chroma client
client = chromadb.Client()

# Define collection name
collection_name = "real_estate_listings"

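# Check if the collection already exists and delete it if necessary.
# list_collections() returns Collection objects in some chromadb versions and
# plain names in others, so compare against the names in either case.
existing_names = [getattr(c, "name", c) for c in client.list_collections()]
if collection_name in existing_names:
    print(f"Collection '{collection_name}' already exists. Deleting...")
    client.delete_collection(name=collection_name)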

# Create a new collection
collection = client.create_collection(name=collection_name)

# Confirm the collection is created
print(f"Collection '{collection_name}' created successfully.")


Collection 'real_estate_listings' created successfully.


In [4]:
# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings()

# Embed a single text with langchain's OpenAIEmbeddings.embed_query,
# which returns the embedding as a list of floats
def generate_embeddings(text):
    return embeddings.embed_query(text)

# Generate embeddings for each listing description
listings_df['embedding'] = listings_df['Description'].apply(generate_embeddings)

# Check the DataFrame with embeddings
listings_df.head()

Unnamed: 0,Neighborhood,Price,Bedrooms,Bathrooms,House Size (sqft),Description,Neighborhood Description,embedding
0,Willow Creek,"$650,000",4,3,2400,Nestled in the quiet Willow Creek neighborhood...,Willow Creek is known for its family-friendly ...,"[0.010881074604915782, 0.018381876703214218, -..."
1,Oakwood Ridge,"$950,000",5,4,3200,"This luxurious 5-bedroom, 4-bathroom home in O...",Oakwood Ridge offers an upscale lifestyle with...,"[0.0061523921517389925, 0.005893447678132679, ..."
2,Rose Garden,"$425,000",2,1,1100,"This cozy 2-bedroom, 1-bathroom home in the Ro...",Rose Garden is a vibrant neighborhood with a m...,"[0.00825472745460379, 0.015740057655477865, -0..."
3,Lakeview Heights,"$1,200,000",6,5,4000,"Located in the prestigious Lakeview Heights, t...",Lakeview Heights offers exclusivity and breath...,"[-0.005485554621941153, 0.0025812475615214826,..."
4,Maple Grove,"$575,000",3,2,1800,"This charming 3-bedroom, 2-bathroom home is lo...",Maple Grove is known for its peaceful atmosphe...,"[0.0011015332243873193, 0.01886712443542094, -..."
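
Each embedding above is a list of floats, and semantic search works by comparing such vectors. As a rough illustration of that comparison, here is a cosine-similarity sketch on hypothetical three-dimensional vectors (real OpenAI embeddings have far more dimensions). This is not necessarily the metric Chroma uses: its default distance is, to my knowledge, squared L2, which ranks unit-length vectors in the same order as cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for a query and two listings.
query_vec = [1.0, 0.0, 1.0]
similar_vec = [0.9, 0.1, 0.8]
unrelated_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, similar_vec))
print(cosine_similarity(query_vec, unrelated_vec))
```

The vector pointing in nearly the same direction as the query scores close to 1, while the orthogonal one scores 0.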


In [5]:
# Add data to ChromaDB (documents, embeddings, and metadata)
documents = listings_df['Description'].tolist()
embeddings_list = listings_df['embedding'].tolist()
metadatas = [{'description': desc} for desc in documents]  # Metadata as a simple dictionary

# Create a unique ID for each document, using the DataFrame index
ids = [str(i) for i in listings_df.index.tolist()]

# Add the documents, metadata, embeddings, and ids to the Chroma collection
collection.add(
    documents=documents,
    metadatas=metadatas,
    embeddings=embeddings_list,
    ids=ids  # Make sure to include the ids
)
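
The metadata above repeats the description that is already stored as the document. A sketch of a possibly more useful alternative, assuming the goal is to filter or display results by listing attributes: build each metadata dictionary from the other columns. `build_metadata` is a hypothetical helper, and the two sample rows are copied from the head() output earlier in the notebook.

```python
def build_metadata(row: dict) -> dict:
    """Pick the structured listing fields worth storing next to a document."""
    # Cast counts to plain int: Chroma metadata values are expected to be
    # simple scalars (str, int, float, bool), not NumPy types.
    return {
        "neighborhood": row["Neighborhood"],
        "price": row["Price"],
        "bedrooms": int(row["Bedrooms"]),
        "bathrooms": int(row["Bathrooms"]),
        "house_size_sqft": int(row["House Size (sqft)"]),
    }


# Two rows as they appear in the DataFrame preview.
sample_rows = [
    {"Neighborhood": "Willow Creek", "Price": "$650,000", "Bedrooms": 4,
     "Bathrooms": 3, "House Size (sqft)": 2400},
    {"Neighborhood": "Rose Garden", "Price": "$425,000", "Bedrooms": 2,
     "Bathrooms": 1, "House Size (sqft)": 1100},
]
rich_metadatas = [build_metadata(row) for row in sample_rows]
print(rich_metadatas[0])
```

With the real data, the rows could come from `listings_df.to_dict("records")`, and the resulting list would be passed as `metadatas` in place of the description-only dictionaries.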

In [6]:
# Confirm data has been added by checking the collection
print(collection.get())

{'ids': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], 'embeddings': None, 'metadatas': [{'description': 'Nestled in the quiet Willow Creek neighborhood, this spacious 4-bedroom, 3-bathroom home is perfect for a growing family. The large master suite comes with a walk-in closet and a luxurious en-suite bath with a soaking tub. The open-plan living area features a fireplace and large windows that bring in plenty of natural light. Enjoy outdoor living on the deck overlooking a landscaped backyard, ideal for BBQs and relaxation.'}, {'description': 'This luxurious 5-bedroom, 4-bathroom home in Oakwood Ridge is a must-see. The gourmet kitchen features high-end appliances, granite countertops, and a spacious island perfect for entertaining. The open-concept living area flows seamlessly into a formal dining room. Outside, a large patio area with a built-in grill and fireplace invites you to enjoy the outdoors year-round.'}, {'description': 'This cozy 2-bedroom, 1-bathroom home in the Ros
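
Printing the whole result of `collection.get()` produces a long dump like the one above, and `'embeddings'` is `None` because embeddings are not returned unless requested (as far as I know, via `include=["embeddings"]`). A sketch of a more compact check, using a hypothetical `summarize_get_result` helper on a hand-written dictionary shaped like that output:

```python
def summarize_get_result(result: dict, preview_chars: int = 40) -> list:
    """Return short, printable lines describing a get()-style result dict."""
    lines = [f"{len(result['ids'])} records"]
    documents = result.get("documents") or []
    for record_id, doc in zip(result["ids"], documents):
        # Truncate long documents so each record fits on one line.
        preview = doc if len(doc) <= preview_chars else doc[:preview_chars] + "..."
        lines.append(f"{record_id}: {preview}")
    return lines


# Hand-written stand-in for collection.get(); not real query output.
sample_result = {
    "ids": ["0", "1"],
    "embeddings": None,
    "documents": [
        "Nestled in the quiet Willow Creek neighborhood, this spacious home...",
        "short text",
    ],
}
summary = summarize_get_result(sample_result)
for line in summary:
    print(line)
```

If only the record count is needed, `collection.count()` is the simpler check.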

In [7]:
# Define the user preferences and perform the search (Step 5)
questions = [
    "How big do you want your house to be?", 
    "What are the 3 most important things for you in choosing this property?", 
    "Which amenities would you like?", 
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?"
]

# Example answers (buyer preferences)
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

# Combine the answers into a single string for semantic search
preferences = " ".join(answers)
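
Joining only the answers drops the questions, so the `questions` list is never used. One possible variant, shown as a sketch: pair each question with its answer before embedding. Whether the extra context improves retrieval is an untested assumption, and `build_preference_text` is a hypothetical helper.

```python
def build_preference_text(questions, answers):
    """Interleave questions and answers into one block of text."""
    if len(questions) != len(answers):
        raise ValueError("each question needs exactly one answer")
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))


# Two of the question/answer pairs from the cell above.
sample_questions = [
    "Which amenities would you like?",
    "How urban do you want your neighborhood to be?",
]
sample_answers = [
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
]
preference_text = build_preference_text(sample_questions, sample_answers)
print(preference_text)
```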

In [8]:
# Perform the semantic search based on the preferences

# Convert the preferences into an embedding using the embed_query method
preference_embedding = embeddings.embed_query(preferences)

# Perform the search with the embedding for the user's preferences
results = collection.query(
    query_embeddings=[preference_embedding],
    n_results=3  # Number of results to return
)

# Display the results
print("Top matching listings based on user preferences:")
for result in results['documents']:
    print(result)

Top matching listings based on user preferences:
['This charming 3-bedroom, 2-bathroom home offers the perfect blend of comfort and style. The open-concept living space includes vaulted ceilings, a cozy fireplace, and large windows that provide lots of natural light. The kitchen is updated with stainless steel appliances and ample storage, and the backyard is a peaceful retreat with a patio and lush greenery.', 'This charming 3-bedroom, 2-bathroom home is located in the heart of Maple Grove. The spacious living area includes a cozy fireplace, while the kitchen boasts modern appliances and a large breakfast bar. The backyard features a large deck perfect for entertaining or relaxing in the sun.', 'This cozy 2-bedroom, 1-bathroom home in the Rose Garden area is perfect for first-time buyers or downsizers. The updated kitchen boasts modern appliances, while the living room features beautiful hardwood floors and large windows. Enjoy the charm of this small but charming property, complete w
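
The output above is a single Python list because `collection.query` returns one inner list per query embedding, so `results['documents']` has the form `[[doc, doc, doc]]` and the loop runs only once. A sketch of unpacking that nested structure into one row per match, using a hypothetical `top_matches` helper and a hand-written result dictionary with made-up ids and distances:

```python
def top_matches(results: dict, query_index: int = 0) -> list:
    """Flatten a query()-style result into (id, distance, document) tuples."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distance_rows = results.get("distances")
    # Distances may be absent if they were not requested; fill with None.
    distances = distance_rows[query_index] if distance_rows else [None] * len(ids)
    return list(zip(ids, distances, documents))


# Hand-written stand-in for a query result; the values are illustrative only.
sample_results = {
    "ids": [["a", "b"]],
    "documents": [["doc A", "doc B"]],
    "distances": [[0.21, 0.35]],
}
for rank, (listing_id, distance, document) in enumerate(top_matches(sample_results), start=1):
    print(f"{rank}. id={listing_id} distance={distance} {document}")
```

In the notebook, `top_matches(results)` would yield the three listings one per line instead of as a single printed list.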