In [1]:
!pip install -r /content/requirements.txt


In [2]:
# !pip install langchain==0.0.316
# !pip install -U langchain-openai
# !pip install openai==0.28.1

In [3]:
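# Set the OpenAI API key: reuse an existing OPENAI_API_KEY environment variable,
# or paste your key into the empty string below
import os
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY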

In [4]:
# imports
import os
import pandas as pd
import shutil
from dataclasses import dataclass

from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.evaluation import load_evaluator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain.vectorstores.chroma import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field, NonNegativeInt
from langchain.prompts import PromptTemplate
from fastapi.encoders import jsonable_encoder

In [5]:
# Load LLM
MODEL_NAME = 'gpt-3.5-turbo'
llm = OpenAI(model_name=MODEL_NAME, temperature=0, openai_api_key=OPENAI_API_KEY)



In [6]:
# Step 1: Synthetic Data Generation - Creating Real Estate Listings Using an LLM

instruction = "Produce a CSV file with no fewer than 10 real estate listings."
sample_listing = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Discover this eco-friendly haven situated in the heart of Green Oaks. This inviting 3-bedroom, 2-bathroom residence features energy-saving amenities including solar panels and excellent insulation. Sunlight illuminates the living areas, highlighting the attractive hardwood floors and environmentally-conscious finishes. The open-plan kitchen and dining space extend to a generous backyard with a vegetable garden, ideal for a family with a focus on sustainability. Experience stylish, eco-friendly living in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, eco-conscious neighborhood with access to organic grocery stores, community gardens, and cycling paths. Enjoy a walk in the nearby Green Oaks Park or visit the charming Green Bean Cafe for a coffee. With convenient access to public transportation and bike lanes, commuting is effortless.
"""

In [7]:
class RealEstateListing(BaseModel):
    """
    Represents a real estate listing.

    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    neighborhood: str = Field(description="The area where the property is situated")
    price: NonNegativeInt = Field(description="The property's price in USD")
    bedrooms: NonNegativeInt = Field(description="The total number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The total number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The property's size measured in square feet")
    description: str = Field(description="Detailed description of the property")
    neighborhood_description: str = Field(description="Description of the surrounding neighborhood")

class ListingCollection(BaseModel):
    """
    A group of real estate listings.

    Attributes:
    - listings: List[RealEstateListing]
    """
    listings: List[RealEstateListing] = Field(description="A collection of real estate listings")


In [8]:
# Create a parser that validates the LLM output against the ListingCollection schema
parser = PydanticOutputParser(pydantic_object=ListingCollection)
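Roughly speaking, `PydanticOutputParser.parse` finds the JSON object in the model's reply, loads it, and validates it against `ListingCollection`. To make that validation step concrete without an API key, here is a stdlib-only sketch (not LangChain's implementation) that mimics it with a dataclass: the hypothetical `Listing` class and `parse_listings` helper reject missing fields and negative numbers, much as `NonNegativeInt` does.

```python
import json
import re
from dataclasses import dataclass, fields
from typing import List

@dataclass
class Listing:
    neighborhood: str
    price: int
    bedrooms: int
    bathrooms: int
    house_size: int
    description: str
    neighborhood_description: str

    def __post_init__(self) -> None:
        # Mirror NonNegativeInt: integers >= 0 (bool is excluded because it subclasses int)
        for name in ("price", "bedrooms", "bathrooms", "house_size"):
            value = getattr(self, name)
            if not isinstance(value, int) or isinstance(value, bool) or value < 0:
                raise ValueError(f"{name} must be a non-negative integer, got {value!r}")

def parse_listings(text: str) -> List[Listing]:
    """Pull the outermost {...} span out of an LLM reply and validate each listing."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the reply")
    payload = json.loads(match.group(0))
    expected = {f.name for f in fields(Listing)}
    listings = []
    for item in payload["listings"]:
        missing = expected - set(item)
        if missing:
            raise ValueError(f"listing is missing fields: {sorted(missing)}")
        listings.append(Listing(**{k: item[k] for k in expected}))
    return listings

reply = ('Here you go: {"listings": [{"neighborhood": "Green Oaks", "price": 800000, '
         '"bedrooms": 3, "bathrooms": 2, "house_size": 2000, '
         '"description": "Eco-friendly home.", "neighborhood_description": "Close-knit area."}]}')
parsed = parse_listings(reply)
print(parsed[0].price)  # 800000
```

Any text the model wraps around the JSON is ignored, which is one reason a chatty reply can still parse.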

In [9]:
# Build the prompt template and print the formatted prompt
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

query = prompt.format(
    instruction=instruction,
    sample=sample_listing,
)
print(query)


Produce a CSV file with no fewer than 10 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Discover this eco-friendly haven situated in the heart of Green Oaks. This inviting 3-bedroom, 2-bathroom residence features energy-saving amenities including solar panels and excellent insulation. Sunlight illuminates the living areas, highlighting the attractive hardwood floors and environmentally-conscious finishes. The open-plan kitchen and dining space extend to a generous backyard with a vegetable garden, ideal for a family with a focus on sustainability. Experience stylish, eco-friendly living in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, eco-conscious neighborhood with access to organic grocery stores, community gardens, and cycling paths. Enjoy a walk in the nearby Green Oaks Park or visit the charming Green Bean Cafe for a coffee. With convenient access to public transportation a
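In the cell above, `format_instructions` is passed as the method `parser.get_format_instructions` rather than its return value. LangChain's `PromptTemplate` accepts callables as partial variables and calls them when the prompt is formatted. The following is a simplified, stdlib-only model of that behaviour; the helper `format_with_partials` and the stub `fake_format_instructions` are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, Union

PartialValue = Union[str, Callable[[], str]]

def format_with_partials(template: str, partials: Dict[str, PartialValue], **user_vars: str) -> str:
    # Strings are used as-is; callables are invoked now, at format time
    resolved = {k: v if isinstance(v, str) else v() for k, v in partials.items()}
    # In this sketch, user-supplied values override partials with the same name
    resolved.update(user_vars)
    return template.format(**resolved)

calls = []

def fake_format_instructions() -> str:
    calls.append(1)  # record that the callable was actually invoked
    return "Return JSON matching the schema."

template = "{instruction}\n{sample}\n{format_instructions}\n"
text = format_with_partials(
    template,
    {"format_instructions": fake_format_instructions},
    instruction="List 3 homes.",
    sample="Neighborhood: Green Oaks",
)
print(text)
```

Either form works in LangChain; passing the callable simply defers the call until `prompt.format(...)` runs.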

In [10]:
# Obtain the response
response = llm(query)

In [11]:
# Convert the response into a DataFrame
result = parser.parse(response)
df = pd.DataFrame(jsonable_encoder(result.listings))
df.head()

,neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description
0,Green Oaks,800000,3,2,2000,Discover this eco-friendly haven situated in t...,"Green Oaks is a close-knit, eco-conscious neig..."
1,Sunnyvale,950000,4,3,2500,Located in the desirable neighborhood of Sunny...,"Sunnyvale is known for its top-rated schools, ..."
2,Downtown LA,1200000,2,2,1800,Experience luxury living in the heart of Downt...,Downtown LA offers a vibrant urban lifestyle w...
3,Brooklyn Heights,1500000,5,4,3000,Situated in the historic neighborhood of Brook...,Brooklyn Heights is known for its tree-lined s...
4,Pacific Palisades,2500000,6,5,4000,Live the California dream in this luxurious 6-...,Pacific Palisades offers a relaxed coastal lif...


In [12]:
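# Export the DataFrame to a CSV file (create the output folder first if it does not exist)
os.makedirs('/content/output', exist_ok=True)
df.to_csv('/content/output/real_estate_listings.csv', index_label='id')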


In [13]:
"""
Step 2: Semantic Search
Creating a Vector Database and Storing Listings
1. Vector Database Setup: Set up and configure ChromaDB or another vector database to store the real estate listings.
2. Generating and Storing Embeddings: Transform the listings generated by the LLM into embeddings that accurately represent the semantic content of each listing, and save these embeddings in the vector database.
"""

# Initialize and configure ChromaDB or a similar vector database to store real estate listings
CHROMA_PATH = "chroma"
CSV_PATH = "/content/output/real_estate_listings.csv"

embedding_function = OpenAIEmbeddings()

df = pd.read_csv(CSV_PATH)
documents = [Document(page_content=row['description'], metadata={'id': str(index)}) for index, row in df.iterrows()]

# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Divided {len(documents)} documents into {len(chunks)} chunks.")

# Show one sample chunk (index 10) if there are enough chunks
if len(chunks) > 10:
    sample_chunk = chunks[10]
    print(sample_chunk.page_content)
    print(sample_chunk.metadata)

# Save to Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

db = Chroma.from_documents(
    chunks, embedding_function, persist_directory=CHROMA_PATH
)
db.persist()
print(f"Stored {len(chunks)} chunks in {CHROMA_PATH}.")



Divided 10 documents into 21 chunks.
gourmet kitchen features a large island, high-end appliances, and a breakfast nook. Retreat to the master suite with a fireplace, spa-like bathroom, and private balcony. Entertain guests in the backyard oasis with a pool, spa, and outdoor kitchen.
{'id': '4', 'start_index': 199}
Stored 21 chunks in chroma.
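The splitter settings (`chunk_size=300`, `chunk_overlap=100`, `add_start_index=True`) explain the output above: the 10 descriptions became 21 overlapping chunks, and each chunk records the character offset where it starts in its source description (`start_index`). Here is a deliberately simplified splitter, not `RecursiveCharacterTextSplitter` (which recursively tries a list of separators). It packs whole words into chunks of at most `chunk_size` characters and carries up to `chunk_overlap` characters of trailing words into the next chunk. `split_with_overlap` is a hypothetical name.

```python
from typing import List, Tuple

def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[Tuple[int, str]]:
    """Return (start_index, chunk) pairs; words are never cut in half."""
    # Record each word together with its character offset in the original text
    words = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        words.append((start, word))
        pos = start + len(word)

    chunks = []
    i = 0
    while i < len(words):
        # Greedily add words while the span (measured in the original text) fits
        j = i
        while j < len(words):
            end = words[j][0] + len(words[j][1])
            if j > i and end - words[i][0] > chunk_size:
                break
            j += 1
        start = words[i][0]
        end = words[j - 1][0] + len(words[j - 1][1])
        chunks.append((start, text[start:end]))
        if j == len(words):
            break
        # Step back so the next chunk re-uses trailing words spanning <= chunk_overlap chars
        k = j
        while k - 1 > i and end - words[k - 1][0] <= chunk_overlap:
            k -= 1
        i = k
    return chunks

sample = "alpha beta gamma delta epsilon"
pieces = split_with_overlap(sample, chunk_size=16, chunk_overlap=6)
print(pieces)  # [(0, 'alpha beta gamma'), (11, 'gamma delta'), (17, 'delta epsilon')]
```

The overlap means a phrase cut at a chunk boundary still appears whole in one of the neighbouring chunks, at the cost of storing (and embedding) some text twice.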


In [14]:
"""
Semantic Search for Listings Based on Buyer Preferences

Gather buyer preferences, including details like the number of bedrooms, bathrooms, location, and any other specific needs, either through a series of questions or by allowing buyers to enter their preferences in natural language. Preferences can be either hard-coded as questions and answers or collected interactively, depending on your approach.
Buyer Preference Parsing: Develop logic to interpret and organize these preferences to query the vector database effectively.
"""



In [15]:
query_text = "A cozy 4-bedroom home featuring a large kitchen and an attractive living room."
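The cell above hard-codes `query_text`. If preferences are collected as structured answers instead, one simple option is to turn them into a single descriptive sentence before embedding it, so the query reads like the listing descriptions it is compared against. This is a minimal sketch; the `BuyerPreferences` fields and the `build_query` helper are hypothetical and not part of the notebook.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BuyerPreferences:
    bedrooms: Optional[int] = None
    bathrooms: Optional[int] = None
    neighborhood: Optional[str] = None
    features: List[str] = field(default_factory=list)

def build_query(prefs: BuyerPreferences) -> str:
    """Turn structured answers into one descriptive sentence for semantic search."""
    parts = []
    if prefs.bedrooms is not None:
        parts.append(f"{prefs.bedrooms}-bedroom")
    if prefs.bathrooms is not None:
        parts.append(f"{prefs.bathrooms}-bathroom")
    sentence = "A " + (", ".join(parts) + " " if parts else "") + "home"
    if prefs.neighborhood:
        sentence += f" in {prefs.neighborhood}"
    if prefs.features:
        # Join features as "a, b and c"
        if len(prefs.features) == 1:
            feats = prefs.features[0]
        else:
            feats = ", ".join(prefs.features[:-1]) + " and " + prefs.features[-1]
        sentence += f" featuring {feats}"
    return sentence + "."

print(build_query(BuyerPreferences(bedrooms=4, features=["a large kitchen", "an attractive living room"])))
# A 4-bedroom home featuring a large kitchen and an attractive living room.
```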

In [16]:
PROMPT_TEMPLATE =\
"""
Given the context below:
{context}
---
Respond to the question: {question}
"""

In [17]:
"""
Searching Based on Preferences

Semantic Search Execution: Use the structured buyer preferences to run a semantic search over the vector database and fetch the listings that best match the user's criteria.
Listing Retrieval Strategy: Refine the retrieval process so that the most relevant listings are returned, ranked by their semantic similarity to the buyer’s preferences.
"""

def generate_response(query_text, prompt_template_str):
    """Retrieve the 3 most relevant chunks for query_text and print a chat-model answer with its sources."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Perform the search in the database; a higher relevance score means a closer match.
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    if not results or results[0][1] < 0.7:
        print("No matching results found.")
        return

    context_text = "\n\n---\n\n".join([doc.page_content for doc, _ in results])
    prompt_template = ChatPromptTemplate.from_template(prompt_template_str)
    prompt = prompt_template.format(context=context_text, question=query_text)
    print(f"Generated Prompt:\n{prompt}")

    model = ChatOpenAI()
    response_text = model.predict(prompt)
    # Several chunks can belong to the same listing, so de-duplicate ids while keeping their order
    sources = list(dict.fromkeys(doc.metadata.get("id") for doc, _ in results))
    print(f"Response: {response_text}\nSources: {sources}")


In [18]:
generate_response(query_text, PROMPT_TEMPLATE)

Generated Prompt:
Human: 
Given the context below:
Located in the desirable neighborhood of Sunnyvale, this 4-bedroom, 3-bathroom home offers spacious living areas and a beautifully landscaped backyard. The modern kitchen features stainless steel appliances and granite countertops, perfect for entertaining guests. Relax in the luxurious master suite

---

Discover historic charm and modern luxury in this 4-bedroom, 4-bathroom townhouse in Georgetown. The elegant living spaces feature hardwood floors, crown moldings, and custom built-ins. The gourmet kitchen boasts high-end appliances, marble countertops, and a center island. Retreat to the master

---

The gourmet kitchen features top-of-the-line appliances, a large island, and a breakfast nook. Relax in the master suite with a fireplace, spa-like bathroom, and private balcony. Entertain guests in the backyard oasis with a pool, spa, and outdoor kitchen.
---
Respond to the question: A cozy 4-bedroom home featuring a large kitchen and a
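`similarity_search_with_relevance_scores` returns `(document, score)` pairs where a higher score means a closer match, and `generate_response` only continues if the best score is at least 0.7. Because several chunks can come from the same listing (each chunk carries its listing's `id` in metadata), the three hits may cover fewer than three listings. This stdlib sketch uses toy vectors to illustrate the idea: cosine similarity as the relevance score, one best score per listing, then top-k with a threshold. The vectors and the `rank_listings` helper are hypothetical. Chroma's actual score depends on its configured distance function, and unlike the notebook, this sketch drops *every* hit below the threshold rather than checking only the top one.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_listings(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],  # (listing_id, chunk_vector)
    k: int = 3,
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    best: Dict[str, float] = {}
    for listing_id, vec in chunks:
        score = cosine(query_vec, vec)
        # Keep only the best-scoring chunk for each listing
        if score > best.get(listing_id, float("-inf")):
            best[listing_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(lid, s) for lid, s in ranked[:k] if s >= threshold]

query = [1.0, 0.0]
toy_chunks = [("1", [1.0, 0.1]), ("1", [0.9, 0.4]), ("4", [0.8, 0.6]), ("7", [0.0, 1.0])]
top = rank_listings(query, toy_chunks)
print([lid for lid, _ in top])  # ['1', '4']
```

Ranking at the listing level rather than the chunk level keeps one long listing from crowding the others out of the top-k.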

In [19]:
"""
Step 3: Enhanced Response Generation

Customizing Listing Descriptions
LLM Enhancement: For every retrieved listing, use the LLM to refine the description, making it more appealing by highlighting features that match the buyer’s preferences. This involves subtly emphasizing the property's aspects that align with the buyer’s interests.
Preserving Factual Accuracy: Ensure that the enhancement process boosts the listing's attractiveness while preserving the accuracy of the factual details.
"""

AUGMENT_PROMPT_TEMPLATE =\
"""
Given the context below:

{context}

---

Create a response that not only addresses the question {question} but also ensures that the explanation is unique, engaging, and tailored to fit the specified preferences. This should include subtly highlighting aspects of the property that align with the buyer’s interests.
Use only facts stated in the context above; do not invent features, numbers, or amenities.
"""
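Step 3 asks for descriptions that are more appealing but still factually accurate, and a prompt alone cannot guarantee that. One cheap safeguard, sketched here under the assumption that the key facts are numeric (bedrooms, bathrooms, square footage, price), is to compare the numbers in the original listing text with those in the rewrite and flag any number the rewrite introduced. `numbers_in` and `new_numbers` are hypothetical helpers. This catches invented figures, not every kind of embellishment.

```python
import re
from typing import Set

# A digit followed by digits/commas, with an optional decimal part
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def numbers_in(text: str) -> Set[str]:
    """Collect numbers, normalising '2,500' and '2500' to the same token."""
    return {m.group(0).replace(",", "") for m in _NUMBER.finditer(text)}

def new_numbers(original: str, enhanced: str) -> Set[str]:
    """Numbers that appear in the enhanced text but not in the original."""
    return numbers_in(enhanced) - numbers_in(original)

original = "This 4-bedroom, 3-bathroom home offers 2,500 sqft."
enhanced = "Imagine this sunlit 4-bedroom, 3-bathroom retreat with 2500 sqft and a 2-car garage."
print(new_numbers(original, enhanced))  # {'2'}
```

A non-empty result (here the invented "2-car garage") is a signal to reject or regenerate the enhanced description.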


In [20]:
generate_response(query_text, AUGMENT_PROMPT_TEMPLATE)


Generated Prompt:
Human: 
Given the context below:

Located in the desirable neighborhood of Sunnyvale, this 4-bedroom, 3-bathroom home offers spacious living areas and a beautifully landscaped backyard. The modern kitchen features stainless steel appliances and granite countertops, perfect for entertaining guests. Relax in the luxurious master suite

---

Discover historic charm and modern luxury in this 4-bedroom, 4-bathroom townhouse in Georgetown. The elegant living spaces feature hardwood floors, crown moldings, and custom built-ins. The gourmet kitchen boasts high-end appliances, marble countertops, and a center island. Retreat to the master

---

The gourmet kitchen features top-of-the-line appliances, a large island, and a breakfast nook. Relax in the master suite with a fireplace, spa-like bathroom, and private balcony. Entertain guests in the backyard oasis with a pool, spa, and outdoor kitchen.

---

Create a response that not only addresses the question A cozy 4-bedroom hom
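The Step 3 docstring describes enhancing *each* retrieved listing, whereas `generate_response` sends all retrieved chunks to the model in a single prompt. A per-listing variant could group the retrieved chunks by their `id` metadata and make one call per listing. This sketch shows only that control flow. `llm` is any callable that takes a prompt string and returns text; in the notebook it could wrap `ChatOpenAI().predict`. The template is a trimmed, hypothetical stand-in for `AUGMENT_PROMPT_TEMPLATE`, and a stub model lets the example run offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical, trimmed-down template using the same {context}/{question} placeholders
TEMPLATE = (
    "Listing details:\n{context}\n---\n"
    "Rewrite this listing for a buyer who wants: {question}\n"
    "Use only facts stated in the listing details."
)

def enhance_per_listing(
    hits: List[Tuple[str, str]],  # (listing_id, chunk_text) in retrieval order
    question: str,
    llm: Callable[[str], str],
) -> Dict[str, str]:
    # Group chunks by listing; dicts keep the order in which listings were first retrieved
    grouped: Dict[str, List[str]] = {}
    for listing_id, text in hits:
        grouped.setdefault(listing_id, []).append(text)
    # One LLM call per listing, so each rewrite only sees its own listing's facts
    return {
        listing_id: llm(TEMPLATE.format(context="\n".join(chunks), question=question))
        for listing_id, chunks in grouped.items()
    }

prompts_seen = []

def stub_llm(prompt: str) -> str:
    prompts_seen.append(prompt)  # record the prompt instead of calling an API
    return f"(enhanced text for a {len(prompt)}-character prompt)"

hits = [("1", "Sunnyvale home, 4 bedrooms."), ("5", "Georgetown townhouse."), ("1", "Modern kitchen.")]
result = enhance_per_listing(hits, "a large kitchen", stub_llm)
print(list(result))  # ['1', '5']
```

Keeping each call scoped to one listing also makes the factual-accuracy requirement easier to honour, because the model never sees another listing's features.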