This is a starter notebook for the project. You'll need to import the libraries you use; the ones available in this workspace are listed in its requirements.txt file.

# Step 1: Setting Up the Python Application

In [None]:
!pip install pandas
!pip install chromadb
!pip install langchain
!pip install numpy
!pip install -U langchain-openai
!pip install pydantic
# shutil is part of the Python standard library, so it does not need to be installed with pip
!pip install openai==0.28

In [1]:
import os
import pandas as pd
import shutil
from dataclasses import dataclass

from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.evaluation import load_evaluator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain.vectorstores.chroma import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field, NonNegativeInt
from langchain.prompts import PromptTemplate
from fastapi.encoders import jsonable_encoder

# Step 2: Generating Real Estate Listings

## Define OpenAI model and API Key

In [2]:
# Environment variables
OPENAI_API_KEY = 'voc-xxx'
MODEL_NAME = 'gpt-3.5-turbo'

## Load LLM

In [None]:
# Loading the Model
llm = OpenAI(model_name=MODEL_NAME, temperature=0, api_key=OPENAI_API_KEY)

INSTRUCTION = "Generate a CSV file with at least 10 real estate listings."
SAMPLE_LISTING = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [None]:
# Define the Pydantic models for a single listing and for a collection of listings
class RealEstateListing(BaseModel):
    """
    A real estate listing.
    
    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    neighborhood: str = Field(description="The neighborhood where the property is located")
    price: NonNegativeInt = Field(description="The price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The size of the house in square feet")
    description: str = Field(description="A description of the property")
    neighborhood_description: str = Field(description="A description of the neighborhood.")  

class ListingCollection(BaseModel):
    """
    A collection of real estate listings.
    
    Attributes:
    - listings: List[RealEstateListing]
    """
    listings: List[RealEstateListing] = Field(description="A list of real estate listings")

In [None]:
# Initialize the parser that converts the LLM output into a ListingCollection
parser = PydanticOutputParser(pydantic_object=ListingCollection)

In [None]:
# Build the prompt from the instruction, the sample listing, and the parser's format instructions
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

query = prompt.format(
    instruction=INSTRUCTION,
    sample=SAMPLE_LISTING,
)
print(query)

In [None]:
# Send the prompt to the model and keep the raw response
response = llm(query)

In [None]:
# Parse the output and load the listings into a pandas DataFrame
result = parser.parse(response)
df = pd.DataFrame(jsonable_encoder(result.listings))
df.head()

In [None]:
# Output to the CSV file
df.to_csv('real_estate_listings.csv', index_label='id')

# Step 3: Storing Listings in a Vector Database

* Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

* Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
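
To make the idea behind the two bullets above concrete, here is a minimal sketch of what a vector store does conceptually: it maps each listing to a vector and ranks listings by similarity to a query vector. This is not how ChromaDB or OpenAI embeddings work internally. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the two listings and the `store` dictionary are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up listings, used only for this illustration.
listings: Dict[str, str] = {
    "0": "eco-friendly home with solar panels and a vegetable garden",
    "1": "downtown loft with exposed brick and city views",
}

# Tiny in-memory "vector store": listing id -> embedding.
store = {listing_id: toy_embed(text) for listing_id, text in listings.items()}


def search(query: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the k listing ids most similar to the query, best first."""
    query_vec = toy_embed(query)
    scored = [(lid, cosine_similarity(query_vec, vec)) for lid, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


best_id, best_score = search("home with a garden and solar panels")[0]
print(best_id, round(best_score, 3))
```

A real embedding model captures meaning rather than shared words, but the store-then-rank-by-similarity pattern is the same one the code cell below relies on.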

In [3]:
# Initialize and configure ChromaDB or a similar vector database to store real estate listings
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
CHROMA_PATH = "chroma"
CSV_PATH = "real_estate_listings.csv" 

embedding_function = OpenAIEmbeddings()

df = pd.read_csv(CSV_PATH)
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['description'], metadata={'id': str(index)}))


# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks.")

# Preview one chunk; guard the index in case the split produced fewer than 11 chunks
if len(chunks) > 10:
    document = chunks[10]
    print(document.page_content)
    print(document.metadata)

# Save to Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

db = Chroma.from_documents(
    chunks, embedding_function, persist_directory=CHROMA_PATH
)
db.persist()
print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

Split 10 documents into 23 chunks.
Situated in the historic neighborhood of Georgetown, this 4-bedroom, 3-bathroom townhouse offers a blend of modern amenities and classic charm. The open-concept living and dining area features exposed brick walls, hardwood floors, and a cozy fireplace. The gourmet kitchen is equipped with stainless
{'id': '4', 'start_index': 0}
Saved 23 chunks to chroma.


# Step 4: Building the User Preference Interface
* Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to describe their preferences in natural language. You can hard-code the preferences as questions and answers or collect them interactively. The first code cell below hard-codes one natural-language query as an example.

* Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
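
One possible way to structure hard-coded answers before querying is sketched below. The `BuyerPreferences` class, its fields, and the three questions are hypothetical and are not part of the notebook; the sketch only shows how question-and-answer pairs could be turned into a single query string for the vector database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BuyerPreferences:
    """Hypothetical container for structured buyer preferences."""
    bedrooms: Optional[int] = None
    bathrooms: Optional[int] = None
    location: Optional[str] = None
    extras: List[str] = field(default_factory=list)

    def to_query_text(self) -> str:
        """Render the preferences as one natural-language search query."""
        parts = []
        if self.bedrooms is not None:
            parts.append(f"{self.bedrooms}-bedroom")
        if self.bathrooms is not None:
            parts.append(f"{self.bathrooms}-bathroom")
        parts.append("home")
        if self.location:
            parts.append(f"in {self.location}")
        text = " ".join(parts)
        if self.extras:
            text += " with " + ", ".join(self.extras)
        return text + "."


# Hard-coded questions and answers standing in for an interactive questionnaire.
Q_BEDROOMS = "How many bedrooms do you need?"
Q_BATHROOMS = "How many bathrooms do you need?"
Q_FEATURES = "Which features matter most to you?"
answers = {
    Q_BEDROOMS: "3",
    Q_BATHROOMS: "2",
    Q_FEATURES: "a spacious kitchen; a cozy living room",
}

preferences = BuyerPreferences(
    bedrooms=int(answers[Q_BEDROOMS]),
    bathrooms=int(answers[Q_BATHROOMS]),
    extras=[item.strip() for item in answers[Q_FEATURES].split(";")],
)
structured_query = preferences.to_query_text()
print(structured_query)
```

The resulting string could be passed to the search step in place of a free-text query, and the numeric fields remain available if you later want to filter on metadata.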

In [41]:
query_text = "A comfortable three-bedroom house with a spacious kitchen and a cozy living room." 

In [42]:
BASIC_PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

Answer the question : {question}
"""

# Step 5: Searching Based on Preferences
* Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
* Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
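
Because the listings are stored as chunks, several search hits may belong to the same listing. The sketch below shows one way the retrieval logic could be refined: drop hits below a relevance threshold, then keep only the best-scoring chunk per listing. The function name `best_listing_per_id` and the example scores are made up for illustration; the 0.7 threshold mirrors the cut-off used in the search cell below.

```python
from typing import Dict, List, Tuple


def best_listing_per_id(
    scored_chunks: List[Tuple[str, float]],
    min_score: float = 0.7,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Collapse (listing_id, score) chunk hits into one ranked entry per listing."""
    best: Dict[str, float] = {}
    for listing_id, score in scored_chunks:
        if score < min_score:
            continue  # too dissimilar to the buyer's query
        if score > best.get(listing_id, float("-inf")):
            best[listing_id] = score  # keep the strongest chunk for this listing
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical chunk hits: listing "4" matches twice, listing "2" is below the threshold.
example_hits = [("4", 0.81), ("7", 0.78), ("4", 0.76), ("2", 0.65)]
ranked_listings = best_listing_per_id(example_hits)
print(ranked_listings)
```

In the notebook, the `(listing_id, score)` pairs would come from each result's `metadata["id"]` and its relevance score.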

In [43]:
def predict_response(query_text, PROMPT_TEMPLATE):
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB.
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
    else:
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
        prompt = prompt_template.format(context=context_text, question=query_text)
        print(f"Generated Prompt:\n{prompt}")
        
        model = ChatOpenAI()
        response_text = model.predict(prompt)
        sources = [doc.metadata.get("id", None) for doc, _score in results]
        formatted_response = f"Response: {response_text}\nSources: {sources}"
        print(formatted_response)

In [44]:
predict_response(query_text, BASIC_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

The gourmet kitchen is equipped with top-of-the-line appliances and a center island for casual dining. The spacious backyard offers a peaceful retreat from city life, with a deck and garden perfect for outdoor entertaining.

---

Welcome to this elegant 3-bedroom, 2-bathroom apartment on the Upper West Side of Manhattan. The spacious living room features hardwood floors, high ceilings, and oversized windows with city views. The chef's kitchen is equipped with stainless steel appliances, granite countertops, and a breakfast

---

Situated in the historic neighborhood of Georgetown, this 4-bedroom, 3-bathroom townhouse offers a blend of modern amenities and classic charm. The open-concept living and dining area features exposed brick walls, hardwood floors, and a cozy fireplace. The gourmet kitchen is equipped with stainless

---

Answer the question : A comfortable three-bedroom house with a spacious kitchen and a cozy living ro


# Step 6: Personalizing Listing Descriptions

* LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
* Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
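
A lightweight guard for factual integrity could compare the numeric facts of a listing against the rewritten description. The sketch below is only one possible check, under the assumption that listings carry the numeric fields defined in Step 2 (`price`, `bedrooms`, `bathrooms`, `house_size`). The helper names `build_augment_prompt` and `missing_facts`, the prompt wording, and the rewritten texts are hypothetical; the listing values come from the sample listing in Step 2.

```python
import re
from typing import Dict, List

NUMERIC_FIELDS = ("price", "bedrooms", "bathrooms", "house_size")


def build_augment_prompt(listing: Dict[str, object], preferences: str) -> str:
    """Build a prompt that lists the fixed facts the rewrite must preserve."""
    facts = "\n".join(f"- {name}: {listing[name]}" for name in NUMERIC_FIELDS)
    return (
        "Rewrite the description below so it appeals to this buyer: "
        f"{preferences}\n"
        "Do not change any of these facts:\n"
        f"{facts}\n\n"
        f"Description: {listing['description']}"
    )


def missing_facts(listing: Dict[str, object], rewritten: str) -> List[str]:
    """Return the numeric fields whose value no longer appears in the rewritten text."""
    # Collect every number in the text, ignoring thousands separators.
    numbers = {match.replace(",", "") for match in re.findall(r"\d[\d,]*", rewritten)}
    return [name for name in NUMERIC_FIELDS if str(listing[name]) not in numbers]


listing = {
    "price": 800000,
    "bedrooms": 3,
    "bathrooms": 2,
    "house_size": 2000,
    "description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks.",
}

# Hypothetical LLM outputs: the first keeps the facts, the second alters the bedroom count.
faithful = "This 3-bedroom, 2-bathroom Green Oaks home offers 2,000 sqft of sunlit space for $800,000."
altered = "This 4-bedroom, 2-bathroom Green Oaks home offers 2,000 sqft of sunlit space for $800,000."

print(missing_facts(listing, faithful))
print(missing_facts(listing, altered))
```

A check like this only catches changed or dropped numbers. It would not detect invented amenities, so it complements rather than replaces a careful prompt.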

In [45]:
AUGMENT_PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

craft a response that not only answers the question {question}, but also ensures that your explanation is distinct, captivating, and customized to align with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
"""

In [46]:
predict_response(query_text, AUGMENT_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

The gourmet kitchen is equipped with top-of-the-line appliances and a center island for casual dining. The spacious backyard offers a peaceful retreat from city life, with a deck and garden perfect for outdoor entertaining.

---

Welcome to this elegant 3-bedroom, 2-bathroom apartment on the Upper West Side of Manhattan. The spacious living room features hardwood floors, high ceilings, and oversized windows with city views. The chef's kitchen is equipped with stainless steel appliances, granite countertops, and a breakfast

---

Situated in the historic neighborhood of Georgetown, this 4-bedroom, 3-bathroom townhouse offers a blend of modern amenities and classic charm. The open-concept living and dining area features exposed brick walls, hardwood floors, and a cozy fireplace. The gourmet kitchen is equipped with stainless

---

craft a response that not only answers the question A comfortable three-bedroom house with a spaciou