This is a starter notebook for the project. You'll need to import the libraries you use; the ones available in this workspace are listed in its requirements.txt file.

# Step 1: Setting Up the Python Application

- Initialize a Python Project: Create a new Python project, setting up a virtual environment and installing necessary packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB). If you don't wish to create your files from scratch, starter files are available in the workspace on the next page as an application skeleton.


In [1]:
import pandas as pd

from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain import LLMChain
from langchain.chains.question_answering import load_qa_chain

In [2]:
openai_api_key = "YOUR API KEY"
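
Hard-coding the key works as a placeholder, but it is easy to commit by accident. A minimal sketch, assuming the key has been exported as an environment variable named `OPENAI_API_KEY` (an assumption; the notebook does not say where the key is stored). The `load_api_key` helper is hypothetical and falls back to the placeholder when the variable is unset.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", default="YOUR API KEY"):
    """Return the key from the environment, or the placeholder if it is unset."""
    return os.environ.get(env_var) or default


# Same variable name as the cell above, so later cells work unchanged.
openai_api_key = load_api_key()
```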

# Step 2: Generating Real Estate Listings

- Generate real estate listings using a Large Language Model. Generate at least 10 listings. This can involve creating prompts for the LLM to produce descriptions of various properties. An example of a listing might be:
    
    ```
    Neighborhood: Green Oaks
    Price: $800,000
    Bedrooms: 3
    Bathrooms: 2
    House Size: 2,000 sqft

    Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

    Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
    ```
    

- You'll use these listings to populate the database for testing and development of "HomeMatch".
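
Because each listing follows a `Field: value` layout, it can be turned into structured data before it goes into the database. The sketch below is one possible way to do that with the standard library only; `FIELDS` and `parse_listing` are hypothetical names, and the parser assumes each field sits on its own line.

```python
FIELDS = [
    "Neighborhood", "Price", "Bedrooms", "Bathrooms",
    "House Size", "Description", "Neighborhood Description",
]


def parse_listing(text):
    """Parse 'Field: value' lines into a dict, ignoring any other lines."""
    listing = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so values may contain ":".
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() in FIELDS:
            listing[key.strip()] = value.strip()
    return listing


sample = """
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft
"""
print(parse_listing(sample))
```

Keying on the field names rather than on line position means a stray blank line or an extra sentence in the model output does not shift the values into the wrong columns.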

In [3]:
completion_model_name = "gpt-3.5-turbo-instruct"
temperature = 0.9
completion_llm = OpenAI(model_name=completion_model_name, temperature=temperature, max_tokens=256, openai_api_key=openai_api_key)

In [4]:
prompt_format = """
You are a real estate agent. 
You have to return only real estate listings using the following format.

Format:
Neighborhood: 
Price:
Bedrooms:
Bathrooms:
House Size:
Description:
Neighborhood Description:

Example:
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft
Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.
Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.

A real estate listing in {} is:
"""

In [5]:
def generate_real_estate_data(prompt):
    """Ask the LLM for one listing and return its field values in order."""
    response = completion_llm(prompt)
    data = []
    for line in response.split("\n"):
        # Skip blank lines and any line without a "Field: value" separator.
        if ":" not in line:
            continue
        # Split on the first colon only, so values containing ":" stay intact.
        data.append(line.split(":", 1)[1].strip())
    return data

In [6]:
location = "California"
num_listings = 10
generated_data = []

prompt = prompt_format.format(location)

for i in range(num_listings):
    data = generate_real_estate_data(prompt)
    generated_data.append(data)
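
With `max_tokens=256` a response can be cut off, leaving a row with fewer than seven values; rows with missing or extra fields can make the DataFrame construction fail or leave empty cells. A small sketch of a filter that could be applied to `generated_data` first (the helper name `keep_complete_rows` is hypothetical):

```python
EXPECTED_FIELDS = 7  # matches the seven fields requested in the prompt


def keep_complete_rows(rows, expected=EXPECTED_FIELDS):
    """Return only rows with exactly `expected` non-empty values."""
    return [
        row for row in rows
        if len(row) == expected and all(str(value).strip() for value in row)
    ]


rows = [
    ["Green Oaks", "$800,000", "3", "2", "2,000 sqft",
     "Eco-friendly home.", "Close-knit community."],
    ["Green Oaks", "$800,000", "3"],  # truncated response
]
clean_rows = keep_complete_rows(rows)
print(len(clean_rows))
```

If rows are dropped, the generation loop would need to run again until at least 10 complete listings remain.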

In [7]:
columns = ["Neighborhood","Price","Bedrooms","Bathrooms","House Size","Description","Neighborhood Description"]

df = pd.DataFrame(generated_data, columns=columns)

In [8]:
df

| | Neighborhood | Price | Bedrooms | Bathrooms | House Size | Description | Neighborhood Description |
|---|---|---|---|---|---|---|---|
| 0 | Pacific Palisades | $3,500,000 | 4 | 3.5 | 3,500 sqft | This stunning modern home in the sought-after... | Pacific Palisades is a premier coastal commun... |
| 1 | Hollywood Hills | $2,500,000 | 5 | 4.0 | 3,500 sqft | Experience luxury living in this stunning 5-b... | The Hollywood Hills is a highly desirable nei... |
| 2 | Ladera Ranch | $1,200,000 | 4 | 3.0 | 3,500 sqft | This stunning 4-bedroom, 3-bathroom home is l... | Ladera Ranch is a family-friendly community l... |
| 3 | Manhattan Beach | $2,500,000 | 5 | 3.5 | 3,500 sqft | This stunning 5-bedroom, 3.5-bathroom home in... | Manhattan Beach is a highly sought-after coas... |
| 4 | Santa Monica | $1,500,000 | 4 | 3.0 | 2,500 sqft | This stunning Mediterranean-style home in San... | Santa Monica is a desirable beachfront city k... |
| 5 | West Hollywood | $1,500,000 | 4 | 3.0 | 2,500 sqft | Live in luxury in the heart of West Hollywood... | West Hollywood is known for its vibrant night... |
| 6 | Beverly Hills | $5,500,000 | 5 | 6.0 | 4,200 sqft | Located in the prestigious Beverly Hills neig... | Beverly Hills is known for its upscale shoppi... |
| 7 | Los Feliz | $1,200,000 | 4 | 3.0 | 2,500 sqft | Located in the highly desirable neighborhood ... | Los Feliz is known for its trendy restaurants... |
| 8 | Beverly Hills | $5,500,000 | 5 | 6.0 | 5,000 sqft | Welcome to luxurious living in Beverly Hills.... | Beverly Hills is known for its luxurious life... |
| 9 | Beverly Hills | $5,000,000 | 6 | 8.0 | 6,000 sqft | Experience luxury living in this stunning 6-b... | Beverly Hills is the epitome of luxury living... |


In [9]:
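# index=False keeps the DataFrame index out of the file; otherwise it
# comes back as an "Unnamed: 0" column when the CSV is read again.
df.to_csv("listings.csv", index=False)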

# Step 3: Storing Listings in a Vector Database

- Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

- Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
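
To make the idea concrete, here is a toy, in-memory stand-in for the embed-and-store step. It is only a sketch: `embed` uses plain word counts rather than a learned embedding model, and `TinyVectorStore` is a hypothetical class, not ChromaDB's API. It shows the general shape (text in, vector stored, nearest vectors returned) that a real vector database provides.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (NOT a real semantic embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Keeps (text, vector) pairs in memory and returns the closest texts."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
store.add("Eco-friendly home with solar panels and a vegetable garden")
store.add("Luxury beachfront home with a pool and ocean views")
print(store.search("house with a garden and solar panels", k=1))
```

Word counts only match exact terms; a real embedding model is what lets "eco-conscious" and "sustainable" land close together.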


In [10]:
loader = CSVLoader(file_path='./listings.csv')
docs = loader.load()

# print(docs)

splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

db = Chroma.from_documents(split_docs, embeddings)

# Step 4: Building the User Preference Interface

- Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to enter their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively however you'd like. For example:
    ```
    questions = [
        "How big do you want your house to be?",
        "What are 3 most important things for you in choosing this property?",
        "Which amenities would you like?",
        "Which transportation options are important to you?",
        "How urban do you want your neighborhood to be?",
    ]
    answers = [
        "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
        "A quiet neighborhood, good local schools, and convenient shopping options.",
        "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
        "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
        "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
    ]
    ```
- Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
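
One simple way to structure the preferences for querying is to pair each question with its answer and join the pairs into a single search string. The sketch below assumes hard-coded lists like the ones in the example; `build_preference_query` is a hypothetical helper, and the `Q:`/`A:` labels are just one possible layout.

```python
questions = [
    "How big do you want your house to be?",
    "Which amenities would you like?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
]


def build_preference_query(questions, answers):
    """Pair each question with its answer and join them into one search string."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))


preference_query = build_preference_query(questions, answers)
print(preference_query)
```

Keeping the question next to each answer gives the embedding some context for short answers such as "Yes" or "Two".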


In [11]:
prompt_format = """
You are a real estate buyer interested in real estate in {}.
Answer the following questions. Return answer only.

Questions:
How big do you want your house to be?
What are 3 most important things for you in choosing this property?
Which amenities would you like?
Which transportation options are important to you?
How urban do you want your neighborhood to be?

Example:
A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
A quiet neighborhood, good local schools, and convenient shopping options.
A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

Answers:
"""

In [12]:
def generate_user_preference(prompt):
    response = completion_llm(prompt)
    return response

In [13]:
user_preference = generate_user_preference(prompt_format.format(location))
print(user_preference)


I am interested in a comfortable three-bedroom house with a spacious kitchen and a cozy living room.
The three most important things for me in choosing this property are a quiet neighborhood, good local schools, and convenient shopping options.
For amenities, I would like a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
In terms of transportation options, I prefer easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
I am looking for a balance between suburban tranquility and access to urban amenities like restaurants and theaters for my neighborhood in California.


# Step 5: Searching Based on Preferences

- Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.

- Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
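
"Semantic closeness" is typically a similarity score between the query vector and each listing vector. The sketch below illustrates one way to tune retrieval: rank by cosine similarity, keep the top `k`, and drop anything under a minimum score. The three-dimensional vectors are made up for illustration, and `top_k` and `min_score` are hypothetical names rather than part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, listing_vecs, k=3, min_score=0.0):
    """Return (name, score) pairs for the k best listings at or above min_score."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in listing_vecs.items()]
    scored = [pair for pair in scored if pair[1] >= min_score]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
listing_vecs = {
    "Santa Monica": [0.9, 0.1, 0.3],
    "Beverly Hills": [0.1, 0.9, 0.2],
    "Los Feliz": [0.8, 0.2, 0.4],
}
query_vec = [1.0, 0.0, 0.3]

for name, score in top_k(query_vec, listing_vecs, k=2, min_score=0.5):
    print(name, round(score, 3))
```

A threshold matters with a small database: without it, the search always returns `k` listings even when none of them is a good match.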

# Step 6: Personalizing Listing Descriptions

- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
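
One possible way to combine both points is to state the fixed facts explicitly in the prompt and then compare them after the rewrite. This is only a sketch: `FACT_FIELDS`, `build_augmentation_prompt`, and `changed_facts` are hypothetical names, the prompt wording is an assumption, and the "personalized" listing is written by hand here in place of a real LLM response.

```python
FACT_FIELDS = ["Price", "Bedrooms", "Bathrooms", "House Size"]


def build_augmentation_prompt(listing, preferences):
    """Ask for a rewritten description while pinning the factual fields."""
    facts = "\n".join(f"{field}: {listing[field]}" for field in FACT_FIELDS)
    return (
        "Rewrite the description below so it highlights what matters to this buyer.\n"
        "Do not change or invent any facts.\n\n"
        f"Buyer preferences:\n{preferences}\n\n"
        f"Facts that must stay unchanged:\n{facts}\n\n"
        f"Original description:\n{listing['Description']}\n"
    )


def changed_facts(listing, personalized_listing):
    """Return the fact fields whose values differ after personalization."""
    return [f for f in FACT_FIELDS if listing[f] != personalized_listing.get(f)]


listing = {
    "Price": "$800,000", "Bedrooms": "3", "Bathrooms": "2",
    "House Size": "2,000 sqft",
    "Description": "Eco-friendly home with solar panels and a vegetable garden.",
}
# Hand-written stand-in for the LLM's rewritten listing.
personalized = dict(listing, Description="A garden lover's home with solar panels.")

print(build_augmentation_prompt(listing, "A backyard for gardening."))
print(changed_facts(listing, personalized))
```

Only the description is allowed to vary; a non-empty result from `changed_facts` signals that the rewrite should be rejected or regenerated.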


In [14]:
model_name = 'gpt-3.5-turbo'
llm = ChatOpenAI(model_name=model_name, temperature=0, max_tokens=2000, openai_api_key=openai_api_key)

In [15]:
def get_personalized_descriptions(user_preference):
    # Put the instruction on its own line so it does not run into the preferences.
    query = user_preference + "\nShow me descriptions and three examples from the database."

    use_chain_helper = False
    if use_chain_helper:
        # Let RetrievalQA handle both retrieval and answer generation.
        rag = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
        print(rag.run(query))
    else:
        # Retrieve the five closest listings, then pass them to the LLM as context.
        similar_docs = db.similarity_search(query, k=5)
        prompt = PromptTemplate(
            template="{query}\nContext: {context}",
            input_variables=["query", "context"],
        )
        chain = load_qa_chain(llm, prompt=prompt, chain_type="stuff")
        print(chain.run(input_documents=similar_docs, query=query))

In [16]:
get_personalized_descriptions(user_preference)

Based on your preferences, here are three examples from the database that may interest you:

1. Neighborhood: Manhattan Beach
Price: $2,500,000
Bedrooms: 5
Bathrooms: 3.5
House Size: 3,500 sqft
Description: This stunning 5-bedroom, 3.5-bathroom home in Manhattan Beach embraces the California beach lifestyle. With high-end finishes and plenty of natural light, this home exudes luxury and comfort. The spacious backyard features a pool, hot tub, and outdoor kitchen, perfect for entertaining. Enjoy ocean views from the rooftop deck or take a short walk to the nearby beach and shops. Live your best life in this beachside haven.

2. Neighborhood: Santa Monica
Price: $1,500,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft
Description: This stunning Mediterranean-style home in Santa Monica offers the perfect blend of luxury and comfort. Boasting 4 bedrooms and 3 bathrooms, this spacious home features high-end finishes, hardwood floors, and a gourmet kitchen with top-of-the-line appliances. 

# Step 7: Deliverables and Testing

- Test your "HomeMatch" application and make sure it meets all of the [requirements in the rubric](https://review.udacity.com/#!/rubrics/5403/view). Your project code will be run when it's assessed. Enter different "buyer preferences" and ensure it works.

- Jupyter Notebook/Python Program: Compile the application code in a Jupyter notebook or a standalone Python program. Ensure the code is well-commented and logically structured.

- Example Outputs: Include example outputs showcasing how user preferences are processed and how the application generates personalized listing descriptions. You can include these in comments in your application or in a Jupyter notebook that's saved with outputs.
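
Trying several buyer profiles can be automated with a small smoke test. The sketch below is an assumption about how that might look: `run_smoke_test` is a hypothetical helper and `fake_match` is a stand-in for the real pipeline. Note that `get_personalized_descriptions` in this notebook prints its result instead of returning it, so it would have to return the text to be used as `match_fn`.

```python
def run_smoke_test(match_fn, preference_sets):
    """Call match_fn on each preference string and collect failures."""
    failures = []
    for prefs in preference_sets:
        try:
            result = match_fn(prefs)
            if not result:
                failures.append((prefs, "empty result"))
        except Exception as exc:  # record the error and keep testing the rest
            failures.append((prefs, repr(exc)))
    return failures


def fake_match(prefs):
    """Stand-in for the real pipeline: echoes the preferences back."""
    return f"Listings matching: {prefs}" if prefs.strip() else ""


preference_sets = [
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "Easy access to a reliable bus line and bike-friendly roads.",
]
print(run_smoke_test(fake_match, preference_sets))
```

An empty list means every profile produced some output; it does not check that the output is a good match, which still needs a manual read.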


In [17]:
user_preference = generate_user_preference(prompt_format.format(location))
print(user_preference)

The size of the house does not matter, as long as it is comfortable for my needs.
A convenient location, good local schools, and a safe neighborhood are important to me.
A backyard for gardening, a modern kitchen, and a garage for storage are amenities I would like.
Proximity to a major highway, bike-friendly roads, and access to public transportation are important transportation options for me.
I would like a suburban neighborhood with access to urban amenities such as restaurants and shopping.


In [18]:
get_personalized_descriptions(user_preference)

Based on the descriptions provided, here are three examples that align with your preferences:

1. Neighborhood: Santa Monica
Price: $1,500,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft
Description: This Mediterranean-style home in Santa Monica offers luxury and comfort with high-end finishes, a gourmet kitchen, and a backyard oasis with a pool and hot tub. Located minutes away from the beach, shopping, and dining, this home provides the ultimate California dream lifestyle.

2. Neighborhood: Los Feliz
Price: $1,200,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft
Description: This modern home in Los Feliz features top-of-the-line appliances, a private backyard with a pool, and a master suite for relaxation. With trendy restaurants, boutique shops, and historic attractions nearby, this neighborhood offers a vibrant lifestyle.

3. Neighborhood: Manhattan Beach
Price: $2,500,000
Bedrooms: 5
Bathrooms: 3.5
House Size: 3,500 sqft
Description: This beachside home in Manhattan Beach e

# Step 8: Project Submission

- Generated Listings: Include a file that contains your synthetically generated real estate listings. Name this file "listings".

- Project Documentation: Include a readme file or an accompanying document explaining the functionality, how to run the code, and any prerequisites or dependencies.

- Code Submission: Submit the Jupyter Notebook or Python program on the "Project Submission Page" that follows the workspace page.
