This is a starter notebook for the project, you'll have to import the libraries you'll need, you can find a list of the ones available in this workspace in the requirements.txt file in this workspace. 

# Project Introduction
Imagine you're a talented developer at "Future Homes Realty", a forward-thinking real estate company. In an industry where personalization is key to customer satisfaction, your company wants to revolutionize how clients interact with real estate listings. The goal is to create a personalized experience for each buyer, making the property search process more engaging and tailored to individual preferences.

## The Challenge

Your task is to develop an innovative application named "HomeMatch". This application leverages large language models (LLMs) and vector databases to transform standard real estate listings into personalized narratives that resonate with potential buyers' unique preferences and needs.

## Creating The Agent

### 1. Step - Generating Real Estate Listings

Let's create a list of houses using LLM and save it to a JSON file

In [1]:
from src.provider import LLMProvider

llm = LLMProvider()

  self.client = ChatOpenAI(**self.configs, **extra_configs)


In [2]:
from src.db import generate_sample_data
import json

# Generate 15 real estate listings and save them to a JSON file
data = generate_sample_data.create_real_estate_listings(llm, num_listings=20)
print(f"Generated a total of {len(data)} real estate listings.")

# Save the generated data to a JSON file
json_path = "./real_estate_listings.json"
json.dump(data, open(json_path,"w"), indent=4)


===== Prompt =====
You are a helpful assistant that generates realistic real estate listings. The listings should include various details about the property and neighborhood and should not be repetitive.
Generate a total of 20 real estate listings.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"RealEstateListing": {"description": "A Pydantic model representing a real estate listing. ", "properties": {"neighborhood": {"description": "The neighborhood name of the listing", "title": "Neighborhood", "type": "string"}, "price": {"description": "The price of the listing, in USD forma

  response = self.client(messages)


Response: {
    "results": [
        {
            "neighborhood": "Willow Creek",
            "price": "$750,000",
            "bedrooms": 4,
            "bathrooms": 3,
            "house_size": "2,500 sqft",
            "description": "Discover modern elegance in this stunning 4-bedroom, 3-bathroom home in the desirable Willow Creek neighborhood. The spacious interior features high ceilings, a gourmet kitchen with granite countertops, and a luxurious master suite. Enjoy the private backyard oasis with a pool and outdoor kitchen, perfect for entertaining guests. Experience luxury living at its finest in Willow Creek.",
            "neighborhood_description": "Willow Creek is known for its upscale living, tree-lined streets, and top-rated schools. Residents enjoy easy access to shopping centers, parks, and fine dining restaurants. Explore the nearby Willow Creek Country Club for golf and social events."
        },
        {
            "neighborhood": "Lakeview Heights",
            "

### 2. Storing Listings in a Vector Database

Now we can load all generated houses to the Vector Store.

In [2]:
from src.db.chroma_db import VectorStore

json_path = "./real_estate_listings.json"
db = VectorStore(json_path, llm)

  embeddings = OpenAIEmbeddings(base_url=self.base_url, api_key=self.api_key)


In [3]:
# Testing the database with a sample query
db.query(query_text="Find me a 4-bedroom house with a garden and a garage.", top_k=3)

[Document(metadata={'source': '/Users/fabiovalonga/git/pessoal/Udacity/4_Building_Generative_AI_Solutions/final_project/real_estate_listings.json', 'seq_num': 9}, page_content='{"neighborhood": "Maplewood Gardens", "price": "$670,000", "bedrooms": 4, "bathrooms": 3, "house_size": "2,200 sqft", "description": "Welcome to this beautifully renovated 4-bedroom, 3-bathroom home in the serene Maplewood Gardens neighborhood. The interior features an open floor plan, hardwood floors, and a gourmet kitchen with quartz countertops. Relax in the landscaped backyard with a deck and garden beds. Enjoy modern living in Maplewood Gardens.", "neighborhood_description": "Maplewood Gardens offers a peaceful environment with tree-lined streets, parks, and local shops. Residents can explore nearby nature reserves, bike paths, and community events. With convenient access to schools and amenities, Maplewood Gardens is a great place to live."}'),
 Document(metadata={'source': '/Users/fabiovalonga/git/pessoal

### Step 3 - Building the User Preference Interface

- Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements from a set of questions or telling the buyer to enter their preferences in natural language. You can hard-code the buyer preferences in questions and answers, or collect them interactively however you'd like, example:

- _Buyer Preference Parsing:_ Implement logic to interpret and structure these preferences for querying the vector database.


In [3]:
def collect_buyer_preferences(test=False):
    questions = [
        "How big do you want your house to be?"
        "What are 3 most important things for you in choosing this property?",
        "Which amenities would you like?",
        "Which transportation options are important to you?",
        "How urban do you want your neighborhood to be?",
    ]
    interview = []
    if test:
        answers = [
            "I want a house that is at least 2000 square feet.",
            "The most important things for me are a big backyard, a modern kitchen, and a quiet neighborhood.",
            "I would like a swimming pool, a gym, and a home office.",
            "Having access to public transportation and bike lanes is important to me.",
            "I prefer a suburban neighborhood with good schools and parks.",
        ]
        interview = [{"question": q, "answer": a} for q, a in list(zip(questions, answers))]

    else:
        for question in questions:
            answer = input(question + "\n")
            interview.append({"question": question, "answer": answer})

    return interview


### Step 4: Searching Based on Preferences

- _Semantic Search Implementation:_ Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
- _Listing Retrieval Logic:_ Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.


In [4]:
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import RetrievalQA


def find_best_properties(llm_client, interview, vector_store, top_k=3):
    print("Finding the best properties based on buyer preferences...")
    query = "Based on the interview in the context, find the top {top_k} properties that best match the buyer's preferences."

    prompt = PromptTemplate(input_variables=["query", "interview", "top_k"],
                            template="{query}\nInterview:{interview}",
                            partial_variables={"top_k": top_k, "interview": interview}
                            )

    rag = RetrievalQA.from_chain_type(llm=llm_client, chain_type="stuff", retriever=vector_store.as_retriever())
    results = rag.run(prompt.format(query=query))
    print(results)
    return results

### Step 5: Personalizing Listing Descriptions

- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.

In [5]:
def enhance_results(llm, results):
    print("Enhancing the property descriptions to make them more appealing...")
    system_prompt = "You're a real estate expert. Your task is to enhance property descriptions to make them more appealing to potential buyers. But act as a professional home estate agent."
    query = "Given the following property listings, enhance the descriptions to make them more appealing to potential buyers. Add more details about the features and benefits of each property."
    output_format = "Use a friendly and engaging tone. Highlight unique features and amenities. Separate each listing with a line and do not forget to enumerate each home."
    prompt = PromptTemplate(input_variables=["results"],
                            template="{query}\n\nProperty Listings:\n{results}\n\nOutput Format:\n{output_format}",
                            partial_variables={"query": query, "output_format": output_format}
                            )

    enhanced_results = llm.ask(prompt.format(results=results), system_prompt=system_prompt)
    print(enhanced_results)
    return enhanced_results

In [6]:
def main(test_mode=True):
    interview = collect_buyer_preferences(test=test_mode)
    print("Interview collected:")
    for qa in interview:
        print(f"Q: {qa['question']}\nA: {qa['answer']}\n")

    if test_mode:
        top_results = 3
    else:
        try:
            top_results = int(input("How many top property matches would you like to see? (Give me just a number) "))
            print(f"Finding top {top_results} property matches...")
        except ValueError:
            print("Invalid input. Defaulting to 3 top results.")
            top_results = 3

    results = find_best_properties(llm.client, interview, db.db, top_k=top_results)

    if test_mode:
        print("=" * 40)
        print("Top property matches:")
        print(results)

    formated_results = enhance_results(llm, results)

    if test_mode:
        print("=" * 40)
        print("Enhanced property descriptions:\n")
        print(formated_results)

    return formated_results

## Testing

Let's test what we've created

In [7]:
main(test_mode=True)

Interview collected:
Q: How big do you want your house to be?What are 3 most important things for you in choosing this property?
A: I want a house that is at least 2000 square feet.

Q: Which amenities would you like?
A: The most important things for me are a big backyard, a modern kitchen, and a quiet neighborhood.

Q: Which transportation options are important to you?
A: I would like a swimming pool, a gym, and a home office.

Q: How urban do you want your neighborhood to be?
A: Having access to public transportation and bike lanes is important to me.

Finding the best properties based on buyer preferences...


  results = rag.run(prompt.format(query=query))


Based on the buyer's preferences for a house that is at least 2000 square feet, with a big backyard, a modern kitchen, and a quiet neighborhood, the top properties that best match these criteria are:
1. Riverfront Estates: $1,500,000, 6 bedrooms, 4 bathrooms, 4,000 sqft
2. Harbor View: $1,100,000, 5 bedrooms, 4 bathrooms, 3,300 sqft

These properties offer the space, amenities, and features that align with the buyer's preferences.
Top property matches:
Based on the buyer's preferences for a house that is at least 2000 square feet, with a big backyard, a modern kitchen, and a quiet neighborhood, the top properties that best match these criteria are:
1. Riverfront Estates: $1,500,000, 6 bedrooms, 4 bathrooms, 4,000 sqft
2. Harbor View: $1,100,000, 5 bedrooms, 4 bathrooms, 3,300 sqft

These properties offer the space, amenities, and features that align with the buyer's preferences.
Enhancing the property descriptions to make them more appealing...


  response = self.client(messages)


Absolutely, let's enhance those property listings for you:

1. Riverfront Estates: $1,500,000
Welcome to Riverfront Estates, where luxury living meets serene waterfront views. This expansive property boasts 6 bedrooms and 4 bathrooms across a generous 4,000 square feet of living space. Step inside and be greeted by a modern kitchen that is perfect for both everyday cooking and entertaining guests. The large backyard provides ample space for outdoor activities or simply unwinding in nature. Nestled in a quiet neighborhood, this home offers the perfect blend of tranquility and elegance. Don't miss out on the opportunity to call Riverfront Estates your new home sweet home.

2. Harbor View: $1,100,000
Experience coastal living at its finest at Harbor View. This stunning property features 5 bedrooms and 4 bathrooms spread out over 3,300 square feet of luxury living space. The modern kitchen is a chef's dream, equipped with top-of-the-line appliances and stylish finishes. Step outside to the

"Absolutely, let's enhance those property listings for you:\n\n1. Riverfront Estates: $1,500,000\nWelcome to Riverfront Estates, where luxury living meets serene waterfront views. This expansive property boasts 6 bedrooms and 4 bathrooms across a generous 4,000 square feet of living space. Step inside and be greeted by a modern kitchen that is perfect for both everyday cooking and entertaining guests. The large backyard provides ample space for outdoor activities or simply unwinding in nature. Nestled in a quiet neighborhood, this home offers the perfect blend of tranquility and elegance. Don't miss out on the opportunity to call Riverfront Estates your new home sweet home.\n\n2. Harbor View: $1,100,000\nExperience coastal living at its finest at Harbor View. This stunning property features 5 bedrooms and 4 bathrooms spread out over 3,300 square feet of luxury living space. The modern kitchen is a chef's dream, equipped with top-of-the-line appliances and stylish finishes. Step outside

## Show Time

Now you can see with your eyes the final result

In [9]:
# live demo
while True:
    cont = input("Do you want to find properties for a new buyer? (yes/no): ")
    if cont.lower() in ["yes", "y"]:
        results = main(test_mode=False)
        print("Final enhanced property descriptions:")
        print(results)
    else:
        print("Exiting the program.")
        break

Interview collected:
Q: How big do you want your house to be?What are 3 most important things for you in choosing this property?
A: I need a house with 3 bedrooms and 2 bathrooms at least

Q: Which amenities would you like?
A: home-office, garage, swimming pool

Q: Which transportation options are important to you?
A: just my car

Q: How urban do you want your neighborhood to be?
A: I would like somewhere closed to the beach

Finding top 5 property matches...
Finding the best properties based on buyer preferences...
Based on the buyer's preferences of needing a house with 3 bedrooms and 2 bathrooms, wanting amenities like a home office, garage, and swimming pool, preferring transportation by car, and desiring a neighborhood close to the beach, the top 3 properties that best match these preferences are:

1. Harbor View:
   - Neighborhood: Harbor View
   - Price: $1,100,000
   - Bedrooms: 5
   - Bathrooms: 4
   - House Size: 3,300 sqft
   - Description: Step into luxury with this exquisi