This is a starter notebook for the project, you'll have to import the libraries you'll need, you can find a list of the ones available in this workspace in the requirements.txt file in this workspace. 

In [1]:
# Step 1: Setting Up the Python Application

"""
Initialize a Python Project: Create a new Python project, setting up a virtual environment and installing necessary 
packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python 
(e.g., ChromaDB or LanceDB). If you don't wish to create your files from scratch, starter files are available in the 
workspace on the next page as an application skeleton.
"""

from langchain.llms import OpenAI
# from langchain.chat_models import ChatOpenAI

# in case of Chain Of Thought (COT), you need these libraries:
from langchain.prompts import PromptTemplate
# from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain import LLMChain
# ChromaDB
from langchain.vectorstores import Chroma

from langchain.embeddings.openai import OpenAIEmbeddings
# from langchain.document_loaders.csv_loader import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
from langchain.docstore.document import Document


import os
os.environ["OPENAI_API_KEY"] = ''

In [2]:
# Step 2: Generating Real Estate Listings

"""
Generate real estate listings using a Large Language Model. Generate at least 10 listings: this can involve creating prompts 
for the LLM to produce descriptions of various properties.
"""

current_model_name = "gpt-3.5-turbo"
temperature = 0.7
my_llm = OpenAI(model_name=current_model_name, temperature=temperature, max_tokens = 1000)

prompt_template = PromptTemplate.from_template(
    """You are a real estate agent/seller in charge of producing descriptions of various house properties. 
       Generate {num_descriptions} descriptions of possible properties that are at sell. These descriptions must include these parameters:
           Neighborhood: (you have to invent the name)
           Price: (in $)
           Bedrooms (max 6)
           Bathrooms (max 5)
           House Size (max 7,000 sqft)
           Description of the property: (max 50 words)
           Description of the Neighborhood: (max 50 words)
           
       Use this examples as a template:
            Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.
            Neighborhood: Green Oaks
            Price: $800,000
            Bedrooms: 3
            Bathrooms: 2
            House Size: 2,000 sqft
            Neighborhood Description: Green Oaks is a close-knit, environmentally conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
       
       """
)

print("====OUTPUT=====\n")

llm_chain = LLMChain(llm=my_llm, prompt=prompt_template)
output = llm_chain.run(num_descriptions=5)

print(output)





====OUTPUT=====

Description: Step into luxury in this stunning 4-bedroom, 3.5-bathroom home located in the prestigious Willow Creek Estates. This spacious home features high-end finishes, a gourmet kitchen, and a private backyard oasis with a sparkling pool. Entertain guests in style or unwind in the luxurious master suite. Willow Creek Estates offers a peaceful escape from the hustle and bustle of the city, with tree-lined streets and upscale amenities.
Neighborhood: Willow Creek Estates
Price: $1,200,000
Bedrooms: 4
Bathrooms: 3.5
House Size: 4,500 sqft
Neighborhood Description: Willow Creek Estates is a sought-after community known for its elegant homes, top-rated schools, and scenic parks. Residents enjoy easy access to upscale shopping, dining, and entertainment options. With a strong sense of community and a variety of amenities, Willow Creek Estates offers the perfect blend of comfort and luxury.

Description: Welcome home to this charming 2-bedroom, 1-bathroom cottage in the q

In [3]:
# Step 3: Storing Listings in a Vector Database

"""
* Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

* Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic 
content of each listing and store these embeddings in the vector database.
"""

embeddings = OpenAIEmbeddings()

documented_descriptions = Document(page_content = output)
vector_descr = [documented_descriptions]

splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(vector_descr)

#db = Chroma.from_documents(desc, embeddings)
db = Chroma.from_documents(split_docs, embeddings)

# Function to retrieve and print documents from the database
def print_db_contents(db):
    # Retrieve documents from the Chroma database using a broad query
    query = " "  # Broad query to fetch all documents
    similar_docs = db.similarity_search(query, k=10)  # Adjust k to the expected number of documents
    
    for i, doc in enumerate(similar_docs):
        print(f"Document {i + 1}:")
        print(doc.page_content)
        print("\n")

# Print the contents of the database
print_db_contents(db)

Number of requested results 10 is greater than number of elements in index 4, updating n_results = 4


Document 1:
Description: Escape to paradise in this 3-bedroom, 2.5-bathroom beachfront retreat in the idyllic neighborhood of Sunset Shores. This coastal home features panoramic ocean views, a gourmet kitchen, a spacious deck for al fresco dining, and direct beach access. Sunset Shores offers a laid-back beach lifestyle, with sandy shores, charming beach cottages, and a vibrant coastal community.
Neighborhood: Sunset Shores
Price: $1,800,000
Bedrooms: 3
Bathrooms: 2.5
House Size: 3,000 sqft
Neighborhood Description: Sunset Shores is a beachfront community known for its stunning sunsets, sandy beaches, and relaxed vibe. Residents enjoy easy access to water sports, beachfront dining, and entertainment options. With a strong sense of community and a vibrant coastal lifestyle, Sunset Shores offers the perfect blend of relaxation and recreation.


Document 2:
Description: Welcome home to this charming 2-bedroom, 1-bathroom cottage in the quaint neighborhood of Oakwood Village. This cozy hom

In [4]:
# Step 4: Building the User Preference Interface
"""
Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements from a set 
of questions or telling the buyer to enter their preferences in natural language. You can hard-code the buyer preferences 
in questions and answers, or collect them interactively however you'd like, example:
"""

questions = [   
"1) How big do you want your house to be?",
"2) What are 3 most important things for you in choosing this property?", 
"3) Which amenities would you like?", 
"4) Which transportation options are important to you?",
"5) How urban do you want your neighborhood to be?",   
            ]
answers = [
"A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
"A quiet neighborhood, good local schools, and convenient shopping options.",
"A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
"Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
"A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
            ]
# Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.

combined_questions_answers = "Real estate agent/seller questions:\n" + "\n".join([f"{questions[i]} {answers[i]}" for i in range(len(questions))])

prompt_template_questionary = PromptTemplate.from_template(combined_questions_answers)
print(prompt_template_questionary.template)

query = """
Based on these questions and answers made to the customers, which property name would you recommend? And why?
Make sure you do not paraphrase the descriptions, and only use the information provided in the descriptions.
"""

use_chain_helper = True

if use_chain_helper:
    # Initialize the retriever
    retriever = db.as_retriever()
    #print("Retriever initialized:", retriever)
    
    # debug print to check the llm object
    #print("LLM object:", my_llm)
    #print("Type of LLM object", type(my_llm))
    
    # Create the RetrievalQA chain
    try:
        rag = RetrievalQA.from_chain_type(llm=my_llm, chain_type="stuff", retriever=retriever)
        print("RetrievalQA chain initialized")
        
        # Run the query
        result = rag.run(query)
        print("Query result:", result)
    except Exception as e:
        print("Error initializing RetrievalQA or running query:", e)
else:
    similar_docs = db.similarity_search(query, k=3)
    prompt = PromptTemplate(
        template="{query}\nContext: {context}",
        input_variables=["query", "context"]
    )
    chain = load_qa_chain(my_llm, prompt=prompt, chain_type="stuff")
    print(chain.run(input_documents=similar_docs, query=query))

Real estate agent/seller questions:
1) How big do you want your house to be? A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
2) What are 3 most important things for you in choosing this property? A quiet neighborhood, good local schools, and convenient shopping options.
3) Which amenities would you like? A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
4) Which transportation options are important to you? Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
5) How urban do you want your neighborhood to be? A balance between suburban tranquility and access to urban amenities like restaurants and theaters.
RetrievalQA chain initialized
Query result: Based on the provided information, I would recommend the property in Diamond Hills. This is because it offers a luxurious 5-bedroom, 4-bathroom estate with upscale amenities, stunning mountain views, and a prestigious address, mak

In [5]:
# Step 5: Searching Based on Preferences
"""
* Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, 
retrieving listings that most closely match the user's requirements.
"""

print ("Buyer's preferences: ")
buyer_pref = " ".join(answers)
print (buyer_pref)

similar_docs = db.similarity_search_with_score(buyer_pref, k=3)
print("Top 3 similar documents based on buyer preferences:")
for doc, score in similar_docs:
    print(f"Score: {score}\nDocument: {doc.page_content}\n")

"""    
* Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based 
on the semantic closeness to the buyer’s preferences.
"""

def fine_tune_retrieval(preferences, documents, k=3):
    similar_docs = db.similarity_search_with_score(preferences, k=k)
    sorted_docs = sorted(similar_docs, key=lambda x: x[1], reverse=True)  # Sort by score in descending order
    return sorted_docs[:k]

top_listings = fine_tune_retrieval(buyer_pref, split_docs, k=3)
print("Fine-tuned top 3 listings:")
for doc, score in top_listings:
    print(f"Score: {score}\nDocument: {doc.page_content}\n")

Buyer's preferences: 
A comfortable three-bedroom house with a spacious kitchen and a cozy living room. A quiet neighborhood, good local schools, and convenient shopping options. A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system. Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads. A balance between suburban tranquility and access to urban amenities like restaurants and theaters.
Top 3 similar documents based on buyer preferences:
Score: 0.33032089471817017
Document: Description: Welcome home to this charming 2-bedroom, 1-bathroom cottage in the quaint neighborhood of Oakwood Village. This cozy home features a bright and airy living space, a renovated kitchen, and a private backyard with a deck for outdoor entertaining. Oakwood Village offers a peaceful retreat from the city, with tree-lined streets and friendly neighbors.
Neighborhood: Oakwood Village
Price: $500,000
Bedrooms: 2
Bathrooms: 1
House Size: 1,20

In [6]:
# Step 6: Personalizing Listing Descriptions
"""
* LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. 
This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
"""

def augment_listing_description(listing, preferences):
    augmentation_prompt = f"""
    Here is a property listing description:

    {listing}

    Here are the buyer's preferences:

    {preferences}

    Augment the property listing description tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for without changing factual information. Keep the description within 100 words.
    """
    augmented_description = my_llm.predict(augmentation_prompt)
    return augmented_description

augmented_listings = []
for doc, score in top_listings:
    augmented_description = augment_listing_description(doc.page_content, buyer_pref)
    augmented_listings.append(augmented_description)

# Print augmented listings
print("Augmented Listings:")
for listing in augmented_listings:
    print(listing)

Augmented Listings:
Description: Nestled in the serene Willow Creek Estates, this charming 3-bedroom home offers a spacious kitchen and a cozy living room perfect for relaxing evenings. Enjoy the convenience of top-rated schools and upscale shopping just moments away. The backyard is ideal for gardening, while the two-car garage provides ample storage. Stay warm with the modern, energy-efficient heating system. With easy access to a reliable bus line and proximity to a major highway, commuting is a breeze. Embrace the perfect balance of suburban tranquility and urban amenities in this delightful home. Price: $1,200,000.
Escape to your ideal paradise in this 3-bedroom, 2.5-bathroom beachfront retreat in the peaceful Sunset Shores neighborhood. This coastal home boasts a gourmet kitchen, cozy living room, and a backyard perfect for gardening. With a two-car garage, modern heating system, and easy access to a reliable bus line and major highway, convenience is at your fingertips. Enjoy th

In [7]:
def ensure_factual_integrity(original_listing, augmented_listing):
    # Extract key details from the original listing
    original_facts = extract_facts(original_listing)
    augmented_facts = extract_facts(augmented_listing)
    
    for key in original_facts:
        if original_facts[key] != augmented_facts.get(key):
            return False
    return True

def extract_facts(listing):
    # A more advanced function to extract key details
    facts = {}
    # Example parsing logic (this needs to be more robust)
    for line in listing.split('\n'):
        if ':' in line:
            key, value = line.split(':', 1)
            facts[key.strip()] = value.strip()
    return facts

# Verify factual integrity of augmented listings
for doc, augmented_listing in zip(top_listings, augmented_listings):
    original_listing = doc[0].page_content
    print("Original Listing:\n", original_listing)
    print("Augmented Listing:\n", augmented_listing)
    if ensure_factual_integrity(original_listing, augmented_listing):
        print("Factual integrity maintained.")
    else:
        print("Factual integrity compromised.")
    print("Extracted Original Facts:", extract_facts(original_listing))
    print("Extracted Augmented Facts:", extract_facts(augmented_listing))

Original Listing:
 Description: Step into luxury in this stunning 4-bedroom, 3.5-bathroom home located in the prestigious Willow Creek Estates. This spacious home features high-end finishes, a gourmet kitchen, and a private backyard oasis with a sparkling pool. Entertain guests in style or unwind in the luxurious master suite. Willow Creek Estates offers a peaceful escape from the hustle and bustle of the city, with tree-lined streets and upscale amenities.
Neighborhood: Willow Creek Estates
Price: $1,200,000
Bedrooms: 4
Bathrooms: 3.5
House Size: 4,500 sqft
Neighborhood Description: Willow Creek Estates is a sought-after community known for its elegant homes, top-rated schools, and scenic parks. Residents enjoy easy access to upscale shopping, dining, and entertainment options. With a strong sense of community and a variety of amenities, Willow Creek Estates offers the perfect blend of comfort and luxury.
Augmented Listing:
 Description: Nestled in the serene Willow Creek Estates, thi

In [None]:
Step 7: Deliverables and Testing
•	Test your "HomeMatch" application and make sure it meets all of the requirements in the rubric. Your project code will be run when it's assessed. Enter different "buyer preferences" and ensure it works.
•	Jupyter Notebook/Python Program: Compile the application code in a Jupyter notebook or a standalone Python program. Ensure the code is well-commented and logically structured.
•	Example Outputs: Include example outputs showcasing how user preferences are processed and how the application generates personalized listing descriptions. You can include these in comments in your application or in a Jupyter notebook that's saved with outputs.
