This is a starter notebook for the project. You'll need to import the libraries you use; the ones available in this workspace are listed in its requirements.txt file.


# Project: Personalized Real Estate Agent

An example AI agent built with Python, LangChain, a vector database, and OpenAI's API.


## Step 1: Synthetic Data Generation

Use an LLM to generate at least 10 real estate listings, 
which will serve as the data source stored in the vector database.


In [1]:
# Import Python Packages

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List
from random import sample 
from langchain.document_loaders.csv_loader import CSVLoader 




In [2]:

# Step 1.1: Initialize OpenAI

from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
import os

load_dotenv()  # load variables from a local .env file, if one exists
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

model_name = "gpt-3.5-turbo"
llm = ChatOpenAI(model=model_name, temperature=0, api_key=OPENAI_API_KEY)
#llm = OpenAI(model_name = model_name, temperature=0.0)
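
`os.getenv` returns `None` when a variable is unset, so a missing key is not reported at the line that reads it. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for illustration, not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of LangChain or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (assumes the key was exported or loaded from a .env file beforehand):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

Raising at the point of lookup keeps the error message next to its cause, which may be easier to debug than an authentication failure on the first API call.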



In [3]:

# Step 1.2: Define data model for parser

class RealEstate(BaseModel):
    title: str = Field(description="The name or title of a house")
    bedroom: int = Field(description="Number of bedroom for a house")
    bathroom: int = Field(description="Number of bathroom for a house")
    garage: int = Field(description="Number of garage for a house")
    price_usd: int = Field(description="The price of a house in USD")
    size_sqft: int = Field(description="The size of a house in square feet") 
    description: str = Field(description="The 200-word description of a house")
    neighborhood: str = Field(description="The brief summary or name of the neighborhood for the house")
    neighborhood_details: str = Field(description="The 200-word description of the neighborhood")


parser = PydanticOutputParser(pydantic_object=RealEstate)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"title": {"description": "The name or title of a house", "title": "Title", "type": "string"}, "bedroom": {"description": "Number of bedroom for a house", "title": "Bedroom", "type": "integer"}, "bathroom": {"description": "Number of bathroom for a house", "title": "Bathroom", "type": "integer"}, "garage": {"description": "Number of garage for a house", "title": "Garage", "type": "integer"}, "price_usd": {"description": "The price of a house in USD", "title": "Price Usd", "type": "integer"}, "size_sqft": {"description": "The siz

In [7]:

# Step 1.3: Ask LLM to generate a list of real estate 

from langchain_core.prompts import ChatPromptTemplate

question = """
Generate 11 houses that are currently for sale in the US market. 
Each house should include these properties: 
title, 
number of bedrooms,
number of bathrooms,
number of garages,
price (integer in USD), 
size (integer in square feet), 
description of the house with at least 200 words, 
neighborhood, 
description of neighborhood with at least 100 words
"""

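# Without a Pydantic schema, json_mode returns a plain dict, which the next cell indexes by "houses".
structured_llm = llm.with_structured_output(method="json_mode")
wrapper_hint = 'Return one JSON object with a single key "houses" whose value is the list of houses; each house must follow the schema below.'
listings = structured_llm.invoke(question + "\n\n" + wrapper_hint + "\n\n" + parser.get_format_instructions())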

print("Datasource for real estate is ready")

In [9]:
listings["houses"]

[{'title': 'Modern Luxury Home',
  'bedroom': 4,
  'bathroom': 3,
  'garage': 2,
  'price_usd': 1000000,
  'size_sqft': 3000,
  'description': 'This modern luxury home features 4 spacious bedrooms, 3 bathrooms, a gourmet kitchen, and a stunning backyard with a pool. The open floor plan and high ceilings create a sense of grandeur. The master suite includes a walk-in closet and a spa-like bathroom. The outdoor space is perfect for entertaining with a built-in BBQ area and fire pit. Located in a prestigious neighborhood, this home offers the ultimate in luxury living.',
  'neighborhood': 'Prestigious Neighborhood',
  'neighborhood_details': 'The neighborhood is known for its upscale homes, tree-lined streets, and top-rated schools. Residents enjoy easy access to parks, shopping centers, and fine dining establishments. With a strong sense of community and a low crime rate, this neighborhood is perfect for families looking for a safe and welcoming environment.'},
 {'title': 'Cozy Cottage',

In [14]:
import pandas as pd

df = pd.DataFrame.from_dict(listings["houses"])
df.to_csv("Listings.csv")
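
Because the listings are LLM-generated, the result may not match what the prompt asked for. The check below is a minimal sketch, assuming `listings["houses"]` is a list of dictionaries shaped like the output shown above. `EXPECTED_FIELDS` and `validate_listings` are hypothetical names introduced here; the field names and types mirror the `RealEstate` model from Step 1.2.

```python
from typing import Any, Dict, List

# Field names and types copied from the RealEstate model (Step 1.2).
EXPECTED_FIELDS = {
    "title": str,
    "bedroom": int,
    "bathroom": int,
    "garage": int,
    "price_usd": int,
    "size_sqft": int,
    "description": str,
    "neighborhood": str,
    "neighborhood_details": str,
}


def validate_listings(houses: List[Dict[str, Any]], minimum: int = 10) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problems were found."""
    problems = []
    if len(houses) < minimum:
        problems.append(f"expected at least {minimum} listings, got {len(houses)}")
    for i, house in enumerate(houses):
        for field, expected_type in EXPECTED_FIELDS.items():
            if field not in house:
                problems.append(f"listing {i}: missing field '{field}'")
                continue
            value = house[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected_type):
                problems.append(
                    f"listing {i}: field '{field}' should be {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


# Usage (assumes `listings` comes from the LLM call above):
# for problem in validate_listings(listings["houses"]):
#     print(problem)
```

Returning a list of problems rather than raising on the first one makes it possible to see every defect in a single run and decide whether to regenerate the data.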
