# Step 1: Setting Up the Python Application

- Initialize a Python Project: Create a new Python project, set up a virtual environment, and install the necessary packages: LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (I will use ChromaDB).
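
Before running the cells below, it can help to confirm that the required packages are importable in the active environment. The following is a minimal sketch using only the standard library; the package list is an assumption inferred from the imports in this notebook, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Top-level module names assumed from this notebook's imports.
REQUIRED = ["pandas", "langchain", "pydantic", "fastapi", "chromadb", "openai"]

def missing_packages(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```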

In [1]:
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable


In [2]:
import shutil
from random import sample
from typing import List

import pandas as pd
from fastapi.encoders import jsonable_encoder
from langchain.chat_models import ChatOpenAI  # this is the new import statement
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from pydantic import BaseModel, Field, NonNegativeInt


In [3]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate




In [4]:
import os

OPENAI_API_KEY = 'voc-477078812126677339221666a2a250d1a8a2.96550613'
os.environ['OPENAI_API_KEY']= OPENAI_API_KEY
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"
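
Hardcoding a key in a notebook makes it easy to leak when the file is shared. One hedged alternative is to read the key from the environment and fail early if it is absent. `require_env` below is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import os

def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} before running this notebook.")
    return value

# Usage (assumes the key was exported in the shell that launched Jupyter):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```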



In [5]:
MODEL_NAME = 'gpt-3.5-turbo'

### Load LLM

In [6]:
llm = OpenAI(model_name=MODEL_NAME, temperature=0, api_key=OPENAI_API_KEY)




# Step 2: Generating Real Estate Listings

Generate at least 10 real estate listings using a Large Language Model. This can involve creating prompts for the LLM to produce descriptions of various properties. An example of a listing might be:



```

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.

```





You'll use these listings to populate the database for testing and development of "HomeMatch".

In [7]:
instruction = "Generate a CSV file with at least 10 real estate listings."
sample_listing= \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [8]:
class RealEstateListing(BaseModel):
    """
    A real estate listing.
    
    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    neighborhood: str = Field(description="The neighborhood where the property is located")
    price: NonNegativeInt = Field(description="The price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The size of the house in square feet")
    description: str = Field(description="A description of the property")
    neighborhood_description: str = Field(description="A description of the neighborhood.")  

class ListingCollection(BaseModel):
    """
    A collection of real estate listings.
    
    Attributes:
    - listings: List[RealEstateListing]
    """
    listings: List[RealEstateListing] = Field(description="A list of real estate listings")
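
The sample listing writes the price as `$800,000` and the size as `2,000 sqft`, while `RealEstateListing` expects non-negative integers. In this notebook the LLM returned integers directly, but if raw text like that ever needs to be coerced before validation, one possible approach is sketched below. `to_non_negative_int` is a hypothetical helper, not something the notebook uses.

```python
import re

def to_non_negative_int(text):
    """Extract the first integer from strings such as '$800,000' or '2,000 sqft'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"No number found in {text!r}")
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))

print(to_non_negative_int("$800,000"))    # 800000
print(to_non_negative_int("2,000 sqft"))  # 2000
```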

In [9]:
# Build a parser that validates the LLM output against ListingCollection
parser = PydanticOutputParser(pydantic_object=ListingCollection)

In [10]:
# Build the prompt and print it
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

query = prompt.format(
    instruction=instruction,
    sample=sample_listing,
)
print(query)

Generate a CSV file with at least 10 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bi

In [11]:
response = llm(query)

In [12]:
# Parse the response into the Pydantic schema, then build a DataFrame
result = parser.parse(response)
df = pd.DataFrame(jsonable_encoder(result.listings))
df

| | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
|---|---|---|---|---|---|---|---|
| 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | Sunnyvale | 950000 | 4 | 3 | 2500 | Located in the desirable neighborhood of Sunny... | Sunnyvale is known for its top-rated schools, ... |
| 2 | Downtown Loft District | 1200000 | 2 | 2 | 1800 | Welcome to urban living at its finest in the D... | The Downtown Loft District is a vibrant and ec... |
| 3 | Lakefront Estates | 1500000 | 5 | 4 | 4000 | Live the lakefront lifestyle in this stunning ... | Lakefront Estates is an exclusive waterfront c... |
| 4 | Mountain View | 1100000 | 3 | 2 | 2200 | Nestled in the scenic hills of Mountain View, ... | Mountain View is known for its picturesque lan... |
| 5 | Historic Old Town | 750000 | 2 | 1 | 1500 | Step back in time with this charming 2-bedroom... | Historic Old Town is a vibrant and historic ne... |
| 6 | Waterfront Condos | 1000000 | 2 | 2 | 1800 | Live the waterfront lifestyle in these luxurio... | Waterfront Condos offer a luxurious waterfront... |
| 7 | Suburban Meadows | 850000 | 4 | 3 | 2300 | Escape to the tranquility of Suburban Meadows ... | Suburban Meadows is a family-friendly neighbor... |
| 8 | Downtown Highrise | 1300000 | 3 | 2 | 2000 | Experience luxury living in the heart of the c... | Downtown Highrise offers a vibrant urban livin... |
| 9 | Gated Community | 1800000 | 5 | 4 | 3500 | Live in luxury in this exclusive 5-bedroom, 4-... | The Gated Community offers a secure and privat... |
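
The instruction asks for at least 10 listings, but nothing in the parsing step enforces that count. A small sanity check on the parsed rows could look like the sketch below. It works on plain dictionaries so it runs without LangChain; `check_listings` is a hypothetical helper, and the field names are copied from `RealEstateListing`.

```python
REQUIRED_FIELDS = {
    "neighborhood", "price", "bedrooms", "bathrooms",
    "house_size", "description", "neighborhood_description",
}

def check_listings(rows, minimum=10):
    """Return a list of problems found in `rows` (a list of dicts); empty means OK."""
    problems = []
    if len(rows) < minimum:
        problems.append(f"expected at least {minimum} listings, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = REQUIRED_FIELDS - set(row)
        if missing:
            problems.append(f"listing {i} is missing {sorted(missing)}")
    return problems

# With the notebook's data this could be called as:
# check_listings(df.to_dict(orient="records"))
```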


In [13]:
# save the dataframe to a csv file
df.to_csv('listings.csv', index_label = 'id')

# Step 3: Storing Listings in a Vector Database

- Vector Database Setup: Initialize and configure a ChromaDB database to store the real estate listings.



- Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
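
The idea behind semantic search is that texts with similar meaning map to vectors pointing in similar directions. The toy sketch below computes cosine similarity with the standard library. The three-dimensional vectors are made-up stand-ins for real embeddings, which have far more dimensions, and Chroma may use a different distance metric depending on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
beach_query = [0.9, 0.1, 0.0]
waterfront_listing = [0.8, 0.2, 0.1]
mountain_listing = [0.1, 0.2, 0.9]

print(cosine_similarity(beach_query, waterfront_listing) >
      cosine_similarity(beach_query, mountain_listing))  # True
```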


In [14]:
# Initialize and configure ChromaDB or a similar vector database to store real estate listings
CHROMA_PATH = "chroma"
CSV_PATH = "listings.csv" 

embedding_function = OpenAIEmbeddings()

df = pd.read_csv(CSV_PATH)
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['description'], metadata={'id': str(index)}))


# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks.")

# Preview one chunk (index 10) if the split produced that many
if len(chunks) > 10:
    document = chunks[10]
    print(document.page_content)
    print(document.metadata)

# Save to Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

# Reuse the embedding function created above instead of building a second one
db = Chroma.from_documents(
    chunks, embedding_function, persist_directory=CHROMA_PATH
)
db.persist()
print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

Split 10 documents into 21 chunks.
windows, allowing natural light to fill the space. The gourmet kitchen is a chef's dream, with high-end appliances and granite countertops. Relax in the private backyard oasis with a hot tub and outdoor dining area. Experience the beauty of Mountain View living in this peaceful retreat.
{'id': '4', 'start_index': 202}
Saved 21 chunks to chroma.
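
To build intuition for how `chunk_size` and `chunk_overlap` turn 10 descriptions into more than 10 chunks, it helps to picture a sliding window. The sketch below is a deliberately simplified fixed-width splitter, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and spaces); `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text, chunk_size=300, chunk_overlap=100):
    """Split `text` into fixed-width windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# A 500-character description yields windows starting at 0 and 200.
for start, chunk in sliding_chunks("x" * 500):
    print(start, len(chunk))
```

A larger overlap reduces the chance that a relevant sentence is cut in half, at the cost of storing and embedding more near-duplicate text.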


# Step 4: Building the User Preference Interface

- Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to describe their preferences in natural language.
Example:

```python
questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]
```

    
- Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
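
One simple way to structure the answers for a vector query is to concatenate them into a single natural-language string, since the listings were embedded as free text. The sketch below works under that assumption. `build_query` is a hypothetical helper, and including the question text alongside each answer is a design choice rather than something this notebook prescribes.

```python
def build_query(questions, answers):
    """Join question/answer pairs into one text block suitable for embedding."""
    if len(questions) != len(answers):
        raise ValueError("Each question needs exactly one answer.")
    return "\n".join(f"{q} {a.strip()}" for q, a in zip(questions, answers))

example_questions = [
    "How big do you want your house to be?",
    "Which amenities would you like?",
]
example_answers = [
    "A comfortable three-bedroom house.",
    "A backyard for gardening.",
]
print(build_query(example_questions, example_answers))
```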


In [15]:
PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

Answer the question : {question}
"""

In [16]:
query_text_1 = "A house close to hell" 

In [17]:
query_text_2 = "A house close to the beach" 

# Step 5: Searching Based on Preferences

- Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
- Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
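
Because each listing is split into several chunks, a top-k search can return more than one chunk from the same listing. The sketch below filters by a relevance threshold and keeps only the best-scoring chunk per listing id. It works on plain `(id, score)` pairs so it runs without a database; the pair layout, the default cutoff, and the scores in the example are assumptions chosen for illustration.

```python
def best_per_listing(hits, threshold=0.7):
    """Keep the highest-scoring hit per listing id, dropping hits below `threshold`.

    `hits` is an iterable of (listing_id, score) pairs; higher score = more relevant.
    """
    best = {}
    for listing_id, score in hits:
        if score < threshold:
            continue
        if listing_id not in best or score > best[listing_id]:
            best[listing_id] = score
    # Most relevant listings first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Made-up scores for illustration only.
hits = [("3", 0.81), ("6", 0.84), ("3", 0.78), ("5", 0.42)]
print(best_per_listing(hits))  # [('6', 0.84), ('3', 0.81)]
```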

In [18]:
def predict_response(query_text, prompt_template_text):
    """Retrieve the chunks most relevant to query_text and print the LLM's answer."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB for the three closest chunks.
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    # Give up when nothing is returned or the best match scores below 0.7.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
        return

    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(prompt_template_text)
    prompt = prompt_template.format(context=context_text, question=query_text)
    print(f"Generated Prompt:\n{prompt}")

    model = ChatOpenAI()
    response_text = model.predict(prompt)
    # Report which listing ids the context came from.
    sources = [doc.metadata.get("id", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)

In [19]:
predict_response(query_text_1, PROMPT_TEMPLATE)


Unable to find matching results.


In [20]:
predict_response(query_text_2, PROMPT_TEMPLATE)


Generated Prompt:
Human: 
Based on the following context:

Live the waterfront lifestyle in these luxurious 2-bedroom, 2-bathroom condos. Each unit features floor-to-ceiling windows, modern finishes, and a private balcony with stunning water views. The gourmet kitchen is perfect for entertaining, with high-end appliances and a spacious island. Enjoy

---

Live the lakefront lifestyle in this stunning 5-bedroom, 4-bathroom estate. This custom-built home features panoramic views of the lake, a gourmet kitchen with top-of-the-line appliances, and a luxurious master suite with a spa-like bathroom. The outdoor living space is perfect for entertaining,

---

master suite with a spa-like bathroom. The outdoor living space is perfect for entertaining, with a covered patio, infinity pool, and private dock. Experience luxury living at its finest in Lakefront Estates.

---

Answer the question : A house close to the beach

Response: The house close to the beach is a stunning 5-bedroom, 4-bathroom

# Step 6: Personalizing Listing Descriptions

- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
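
One lightweight guard for factual integrity is to check that every number in the augmented text also appears in the original listing. The sketch below is a heuristic, not a complete fact checker: it only compares digits, so a count written in words (such as "three-bedroom") is ignored. `unsupported_numbers` is a hypothetical helper, and the two example sentences are invented for illustration.

```python
import re

def unsupported_numbers(original, augmented):
    """Return numbers that appear in `augmented` but not in `original`."""
    def numbers(text):
        # Strip thousands separators so '2,000' and '2000' compare equal.
        return {n.replace(",", "") for n in re.findall(r"\d[\d,]*", text)}
    return sorted(numbers(augmented) - numbers(original))

original = "Stunning 5-bedroom, 4-bathroom estate with a private dock."
augmented = "This 6-bedroom, 4-bathroom estate sits steps from the water."
print(unsupported_numbers(original, augmented))  # ['6']
```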

In [21]:
AUGMENT_PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

craft a response that not only answers the question {question}, but also ensures that your explanation is distinct, captivating, and customized to align with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
"""

In [22]:
predict_response(query_text_1, AUGMENT_PROMPT_TEMPLATE)

Unable to find matching results.


In [23]:
predict_response(query_text_2, AUGMENT_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

Live the waterfront lifestyle in these luxurious 2-bedroom, 2-bathroom condos. Each unit features floor-to-ceiling windows, modern finishes, and a private balcony with stunning water views. The gourmet kitchen is perfect for entertaining, with high-end appliances and a spacious island. Enjoy

---

Live the lakefront lifestyle in this stunning 5-bedroom, 4-bathroom estate. This custom-built home features panoramic views of the lake, a gourmet kitchen with top-of-the-line appliances, and a luxurious master suite with a spa-like bathroom. The outdoor living space is perfect for entertaining,

---

master suite with a spa-like bathroom. The outdoor living space is perfect for entertaining, with a covered patio, infinity pool, and private dock. Experience luxury living at its finest in Lakefront Estates.

---

craft a response that not only answers the question A house close to the beach, but also ensures that your explanation is di


# Step 7: Tests Performed

- As seen above, two tests were performed with different user prompts: a realistic one ("A house close to the beach") and one designed to force a non-match ("A house close to hell").
- The results were satisfactory: a house close to the beach was recommended and, fortunately, there is no house close to hell... yet.
