This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

### Install and import Python dependencies

In [None]:
import os
import shutil
import pandas as pd
from typing import List
from langchain.llms import OpenAI
from langchain.schema import Document
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from fastapi.encoders import jsonable_encoder
from langchain.vectorstores.chroma import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.output_parsers import PydanticOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.pydantic_v1 import BaseModel, Field, NonNegativeInt

### Define OpenAI model and API Key

In [None]:
# Environment variables
OPENAI_API_KEY = ''  # paste your OpenAI API key here; do not commit a real key
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
MODEL_NAME = 'gpt-3.5-turbo'

### Load LLM

In [5]:
llm = OpenAI(model_name=MODEL_NAME, temperature=0, api_key=OPENAI_API_KEY)



## Step 1: Synthetic Data Generation - Generating Real Estate Listings with an LLM

In [None]:
instruction = "Generate a CSV file with at least 10 real estate listings."
sample_listing = """
                Neighborhood: Green Oaks
                Price: $800,000
                Bedrooms: 3
                Bathrooms: 2
                House Size: 2,000 sqft

                Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

                Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
                """

In [7]:
class RealEstateListing(BaseModel):
    """
    A real estate listing.
    
    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    neighborhood: str = Field(description="The neighborhood where the property is located")
    price: NonNegativeInt = Field(description="The price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The size of the house in square feet")
    description: str = Field(description="A description of the property")
    neighborhood_description: str = Field(description="A description of the neighborhood.")  

class ListingCollection(BaseModel):
    """
    A collection of real estate listings.
    
    Attributes:
    - listings: List[RealEstateListing]
    """
    listings: List[RealEstateListing] = Field(description="A list of real estate listings")

In [8]:
# create a parser that validates the LLM response against ListingCollection
parser = PydanticOutputParser(pydantic_object=ListingCollection)

In [9]:
# build the prompt from the instruction, the sample listing, and the parser's format instructions, then print it
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

query = prompt.format(
    instruction=instruction,
    sample=sample_listing,
)
print(query)

Generate a CSV file with at least 10 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bi

In [10]:
# get the response
response = llm(query)

In [11]:
# create a dataframe from the response
result = parser.parse(response)
df = pd.DataFrame(jsonable_encoder(result.listings))
df.head()

|   | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
|---|---|---|---|---|---|---|---|
| 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | Sunnyvale | 950000 | 4 | 3 | 2500 | Beautiful 4-bedroom, 3-bathroom home located i... | Sunnyvale is known for its family-friendly atm... |
| 2 | Downtown Loft District | 1200000 | 2 | 2 | 1800 | Luxurious loft living in the heart of the Down... | The Downtown Loft District is a vibrant urban ... |
| 3 | Lakefront Estates | 1500000 | 5 | 4 | 3500 | Stunning lakefront property in the prestigious... | Lakefront Estates is an exclusive waterfront c... |
| 4 | Mountain View | 1100000 | 3 | 2 | 2200 | Charming 3-bedroom, 2-bathroom home in the des... | Mountain View is a family-friendly community w... |


In [None]:
# save the dataframe to a csv file
df.to_csv('generated_listings.csv', index_label='id')

## Step 2: Semantic Search

### Creating a Vector Database and Storing Listings
- Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

- Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
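
To make "embeddings that capture the semantic content" more concrete, here is a minimal sketch of the kind of comparison a vector database performs when it ranks stored items against a query. The three-dimensional vectors are made-up stand-ins for real embeddings (which have far more dimensions), and `cosine_similarity` is an illustrative helper written for this example, not part of ChromaDB's API.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Toy "embeddings" with hypothetical values, keyed by listing text.
toy_embeddings = {
    "eco-friendly home with solar panels": [0.9, 0.1, 0.0],
    "luxury oceanfront property": [0.1, 0.9, 0.2],
}
# Hypothetical embedding of a buyer query about sustainable living.
query_vector = [0.8, 0.2, 0.1]

# Rank the listings from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda text: cosine_similarity(query_vector, toy_embeddings[text]),
    reverse=True,
)
print(ranked)
```

In the notebook itself, the embedding model and the vector store handle both steps; the sketch only illustrates why listings with similar meaning end up close together.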

In [None]:
# Initialize and configure ChromaDB or a similar vector database to store real estate listings
CHROMA_PATH = "chroma"
CSV_PATH = "generated_listings.csv" 

embedding_function = OpenAIEmbeddings()

df = pd.read_csv(CSV_PATH)
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['description'], metadata={'id': str(index)}))


# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks.")

# Print one chunk as a sanity check (only if at least 11 chunks exist)
if len(chunks) > 10:
    document = chunks[10]
    print(document.page_content)
    print(document.metadata)

# Save to Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

# Reuse the embedding function created above instead of instantiating a second one
db = Chroma.from_documents(
    chunks, embedding_function, persist_directory=CHROMA_PATH
)
db.persist()
print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

Split 10 documents into 12 chunks.
Luxury living in a gated community with resort-style amenities. This 4-bedroom, 3-bathroom home features a gourmet kitchen, private pool, and lush landscaping. Enjoy 24-hour security, clubhouse access, and proximity to top-rated schools.
{'id': '8', 'start_index': 0}
Saved 12 chunks to chroma.


### Semantic Search of Listings Based on Buyer Preferences
- Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to describe their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively however you'd like.
- Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
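
One possible way to structure preferences before querying is sketched below. The `BuyerPreferences` class, its fields, and `to_query_text` are hypothetical names introduced for this example; the idea is simply to hold hard-coded answers in one place and turn them into a natural-language query string for the semantic search.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BuyerPreferences:
    """Hypothetical container for structured buyer preferences."""
    bedrooms: Optional[int] = None
    bathrooms: Optional[int] = None
    location: Optional[str] = None
    extras: List[str] = field(default_factory=list)

    def to_query_text(self) -> str:
        """Render the preferences as one sentence for semantic search."""
        parts = []
        if self.bedrooms is not None:
            parts.append(f"{self.bedrooms}-bedroom")
        if self.bathrooms is not None:
            parts.append(f"{self.bathrooms}-bathroom")
        # Fall back to a generic phrase when no room counts were given.
        text = ("A " + ", ".join(parts) + " home") if parts else "A home"
        if self.location:
            text += f" in {self.location}"
        if self.extras:
            text += " with " + ", ".join(self.extras)
        return text + "."


# Hard-coded answers, as the task description allows.
preferences = BuyerPreferences(
    bedrooms=4,
    extras=["a spacious kitchen", "a beautiful living room"],
)
structured_query = preferences.to_query_text()
print(structured_query)
```

The resulting string could be passed to the search in place of a hand-written query; keeping the fields separate also leaves room for metadata filtering later, if the listings' metadata carried those fields.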

In [14]:
query_text = "A comfortable 4-bedroom house with a spacious kitchen and a beautiful living room." 

In [None]:
PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

Answer the question : {question}
"""

### Searching Based on Preferences
- Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
- Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
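
Because the listings are split into chunks before being stored, several of the top hits can belong to the same listing. The sketch below shows one way the retrieval logic might be tuned: keep only the best score per listing, drop weak matches, and return the top few. The function name `best_listings` and its parameters are assumptions made for this example; the `(listing_id, score)` pairs stand in for the `id` metadata and relevance score of each search hit, and the default cutoff of 0.7 mirrors the threshold used in this notebook.

```python
from typing import Dict, List, Tuple


def best_listings(
    hits: List[Tuple[str, float]],
    min_score: float = 0.7,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Collapse chunk-level hits into at most top_n listing-level results."""
    best: Dict[str, float] = {}
    for listing_id, score in hits:
        if score < min_score:
            continue  # skip chunks that are not relevant enough
        # Keep the highest score seen so far for each listing.
        if listing_id not in best or score > best[listing_id]:
            best[listing_id] = score
    # Sort listings from most to least relevant.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical hits: listing "8" matched twice, listing "5" is too weak.
example_hits = [("8", 0.82), ("1", 0.80), ("8", 0.78), ("5", 0.65)]
selected = best_listings(example_hits)
print(selected)
```

Unlike a check on the first hit alone, this variant filters every hit against the threshold, which is a design choice rather than a requirement.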

In [17]:
def predict_response(query_text, prompt_template_text):
    """Retrieve the chunks closest to query_text and print the LLM's answer with its sources."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB for the three chunks most similar to the query.
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    # Stop if nothing was found or the best match is below the relevance threshold.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
        return

    # Join the retrieved chunks into one context block and fill in the prompt.
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(prompt_template_text)
    prompt = prompt_template.format(context=context_text, question=query_text)
    print(f"Generated Prompt:\n{prompt}")

    # Ask the chat model, then report which listings the context came from.
    model = ChatOpenAI()
    response_text = model.predict(prompt)
    sources = [doc.metadata.get("id", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)

In [19]:
predict_response(query_text, PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

Experience luxury living in this oceanfront paradise. This 4-bedroom, 3-bathroom home features a gourmet kitchen, private beach access, and breathtaking views of the Pacific Ocean. Relax on the rooftop deck or entertain guests in the outdoor living space.

---

Beautiful 4-bedroom, 3-bathroom home located in the desirable neighborhood of Sunnyvale. This spacious property features a modern kitchen, luxurious bathrooms, and a large backyard perfect for entertaining. Enjoy the convenience of nearby shopping centers, parks, and top-rated schools.

---

Escape to this peaceful rural retreat surrounded by nature. This 3-bedroom, 2-bathroom home features a wrap-around porch, country kitchen, and expansive backyard. Enjoy gardening, birdwatching, and stargazing in this serene setting.

---

Answer the question : A comfortable 4-bedroom house with a spacious kitchen and a beautiful living room.

Response: The second option, the beautifu

## Step 3: Augmented Response Generation

### Personalizing Listing Descriptions
- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
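
A lightweight guard for factual integrity could compare the numeric facts in the augmented text against the original description. The sketch below is one hedged approach, not a complete validator: `extract_facts` and `facts_preserved` are hypothetical helpers, and the regular expression only recognizes bedroom and bathroom counts written like "4-bedroom" or "3 bathrooms".

```python
import re
from typing import Dict

# Matches counts such as "4-bedroom", "3 bathrooms" (case-insensitive).
FACT_PATTERN = re.compile(r"(\d+)[- ](bedroom|bathroom)s?", re.IGNORECASE)


def extract_facts(text: str) -> Dict[str, int]:
    """Return the first bedroom and bathroom counts mentioned in text."""
    facts: Dict[str, int] = {}
    for count, kind in FACT_PATTERN.findall(text):
        facts.setdefault(kind.lower(), int(count))
    return facts


def facts_preserved(original: str, augmented: str) -> bool:
    """True if every count the augmented text restates matches the original.

    The augmented text may omit a fact, but must not contradict one.
    """
    original_facts = extract_facts(original)
    augmented_facts = extract_facts(augmented)
    return all(
        augmented_facts.get(kind, count) == count
        for kind, count in original_facts.items()
    )


original_text = "Beautiful 4-bedroom, 3-bathroom home located in Sunnyvale."
faithful = "This 4-bedroom Sunnyvale home has a modern kitchen."
altered = "This 5-bedroom Sunnyvale home has a modern kitchen."

print(facts_preserved(original_text, faithful))
print(facts_preserved(original_text, altered))
```

A check like this could run on each augmented description before it is shown to the buyer; price and house size would need their own patterns.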

In [20]:
AUGMENT_PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

craft a response that not only answers the question {question}, but also ensures that your explanation is distinct, captivating, and customized to align with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
"""

In [21]:
predict_response(query_text, AUGMENT_PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Based on the following context:

Experience luxury living in this oceanfront paradise. This 4-bedroom, 3-bathroom home features a gourmet kitchen, private beach access, and breathtaking views of the Pacific Ocean. Relax on the rooftop deck or entertain guests in the outdoor living space.

---

Beautiful 4-bedroom, 3-bathroom home located in the desirable neighborhood of Sunnyvale. This spacious property features a modern kitchen, luxurious bathrooms, and a large backyard perfect for entertaining. Enjoy the convenience of nearby shopping centers, parks, and top-rated schools.

---

Escape to this peaceful rural retreat surrounded by nature. This 3-bedroom, 2-bathroom home features a wrap-around porch, country kitchen, and expansive backyard. Enjoy gardening, birdwatching, and stargazing in this serene setting.

---

craft a response that not only answers the question A comfortable 4-bedroom house with a spacious kitchen and a beautiful living room., but also en