## Step 1: Setting Up the Python Application

In this project, I used Google Colaboratory.
Before you can see the AI's responses, you need to install the following libraries:
* langchain
* langchain-openai
* langchain_community
* langchain_core
* chromadb

In [None]:
!pip install langchain langchain-openai langchain_community langchain_core chromadb



> Restart the kernel before continuing.

In [None]:
from langchain_openai import OpenAI, OpenAIEmbeddings

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory

from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough, Runnable
from langchain_core.output_parsers import StrOutputParser

from langchain.text_splitter import RecursiveCharacterTextSplitter

from operator import itemgetter

import os
import pandas as pd

In [None]:
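# Replace the placeholder with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
# temperature=1 gives varied output for generating listings
llm = OpenAI(temperature=1, max_tokens=3700)
# temperature=0 gives more deterministic output for personalization
llm_personalize = OpenAI(temperature=0, max_tokens=3700)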
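
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key is either already set in the environment or typed in at a prompt; `resolve_api_key` is a hypothetical helper, not part of LangChain or the OpenAI SDK:

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, ask=getpass, name="OPENAI_API_KEY"):
    """Return the key from `env`, asking the user only when it is missing."""
    key = env.get(name, "").strip()
    if not key:
        # getpass hides the typed value so it does not end up in the cell output
        key = ask(f"Enter {name}: ").strip()
    if not key:
        raise ValueError(f"{name} is required")
    return key
```

With this in place, the assignment could become `os.environ["OPENAI_API_KEY"] = resolve_api_key()` before the `OpenAI` objects are created.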

## Step 2: Generating Real Estate Listings

* Generate real estate listings using a Large Language Model. Generate at least 10 listings. This can involve creating prompts for the LLM to produce descriptions of various properties. The prompt template in the next cell embeds one example listing.
* You'll use these listings to populate the database for testing and development of "HomeMatch".

In [None]:
# define template
template = """{question}
###
Neighborhood: Green Oaks,
Price: $800,000,
Bedrooms: 3,
Bathrooms: 2,
House Size: 2,000 sqft,
Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure.
Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths.
###

Make sure the 'Description:  your answer' has at least 50 words.
Make sure the 'Neighborhood Description: your answer' has  at least 50 words.
"""

prompt = PromptTemplate.from_template(template)

llm_chain = prompt | llm

# set parameters for the question
input_num = 15
place = "U.S.A."

question = f"""
Generate {input_num} real estate listings for buyers in American English.
Restrictions:
The real estate is located in {place}
Listings are separated by ---.
Your answer format is below.
"""
listings = llm_chain.invoke(question)

In [None]:
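# Helper that splits the generated text on the "Neighborhood:" field label.
# The split also returns the text before the first listing, so N listings give N + 1 parts.
def separator(text):
  return text.split("Neighborhood:")

# ensure at least 10 listings by generating more until there are enough
while len(separator(listings)) < 10 + 1:
  listings += llm_chain.invoke(question)

len(separator(listings))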

16

In [None]:
# write the listings to listings.txt so they can be loaded in the next step
with open("listings.txt", 'w') as f:
    f.write(listings)
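
Because the prompt asks for listings separated by `---` with one `Field: value` pair per line, the generated text can also be parsed into one record per listing, which makes it easier to check that every field is present before storing anything. A rough sketch, assuming the model followed the requested format exactly; `parse_listings` is a hypothetical helper and the sample text is made up for illustration:

```python
def parse_listings(text):
    """Split LLM output on '---' and turn each listing into a dict of fields."""
    records = []
    for block in text.split("---"):
        record = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, since descriptions may contain colons
            key, sep, value = line.partition(":")
            if sep and key.strip():
                # Structured fields end with a trailing comma in the template
                record[key.strip()] = value.strip().rstrip(",")
        # Skip chunks that are not listings (for example, leading blank text)
        if "Neighborhood" in record:
            records.append(record)
    return records


# Illustrative sample, not real model output
sample = """
Neighborhood: Green Oaks,
Price: $800,000,
Bedrooms: 3,
---
Neighborhood: Maple Hill,
Price: $450,000,
Bedrooms: 2,
"""
parsed = parse_listings(sample)
print(len(parsed), parsed[0]["Price"])
```

Counting `len(parse_listings(listings))` is then a direct way to confirm that at least 10 listings were generated.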

## Step 3: Storing Listings in a Vector Database

* Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.
* Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
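
To make the idea of embedding-based storage concrete, here is a toy sketch that swaps the real components for standard-library stand-ins: word-count vectors instead of OpenAI embeddings, and a plain list instead of ChromaDB. `embed`, `cosine`, and `TinyVectorStore` are hypothetical names, and word counts capture far less meaning than a learned embedding; only the flow of storing vectors and ranking them by cosine similarity carries over.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (not a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Keeps (vector, text) pairs in memory and ranks them against a query."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def similarity_search(self, query, k=1):
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Illustrative listings, not taken from the generated data
store = TinyVectorStore()
store.add("Bedrooms: 2, condo in downtown with skyline views")
store.add("Bedrooms: 4, farmhouse with a large garden and barn")
top = store.similarity_search("downtown condo", k=1)
print(top)
```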

In [None]:
# load text by using loader
loader = TextLoader("./listings.txt")
docs = loader.load()

# split the text into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100 )
split_docs = splitter.split_documents(docs)

# create embeddings for similarity search
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# create vector database
db = Chroma.from_documents(split_docs, embeddings)

In [None]:
# similarity search example
query = "2 bedroom"
db.similarity_search(query)[0].page_content

"---\n\nNeighborhood: Downtown Los Angeles,\nPrice: $1,200,000,\nBedrooms: 2,\nBathrooms: 2,\nHouse Size: 1,500 sqft,\nDescription: Live in the heart of the bustling city with this 2-bedroom, 2-bathroom condo in downtown Los Angeles. This stunning unit boasts floor-to-ceiling windows, modern finishes, and breathtaking views of the city skyline. Enjoy the convenience of being walking distance to restaurants, shops, and entertainment.\nNeighborhood Description: Downtown Los Angeles is a vibrant and diverse neighborhood, offering a mix of historic landmarks, contemporary buildings, and cultural attractions. With a thriving nightlife and access to public transportation, it's a popular choice for young professionals and urban enthusiasts.\n\n---"

## Step 4: Building the User Preference Interface


* Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either from a set of questions or by asking the buyer to enter their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively however you'd like. For example:

In [None]:
questions = [
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
    ]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
    ]

* Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
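
One simple way to structure the preferences, sketched here under the assumption that `questions` and `answers` are parallel lists of equal length: pair them up, then join the pairs into a single query string for the vector store. `build_preference_query` is a hypothetical helper, and the short lists are placeholders for the full ones defined above.

```python
def build_preference_query(questions, answers):
    """Pair each question with its answer and join the pairs into one query string."""
    if len(questions) != len(answers):
        # A missing comma in a list literal silently merges two strings,
        # so a length mismatch is worth failing on early
        raise ValueError(f"{len(questions)} questions but {len(answers)} answers")
    pairs = [{"question": q, "answer": a} for q, a in zip(questions, answers)]
    query = "\n".join(f"Q: {p['question']}\nA: {p['answer']}" for p in pairs)
    return pairs, query


# Shortened placeholder lists for illustration
example_questions = ["How big do you want your house to be?", "Which amenities would you like?"]
example_answers = ["A comfortable three-bedroom house.", "A backyard for gardening."]
pairs, query = build_preference_query(example_questions, example_answers)
print(query)
```

The `pairs` list keeps the structure for later prompts, while `query` can be passed to a similarity search as one combined description of the buyer.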
    

## Step 5: Searching Based on Preferences

* Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
* Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
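
When each answer is searched separately, the same listing can come back more than once. One possible refinement, shown as a sketch and not as the notebook's actual retrieval logic, is to merge the per-answer results, drop duplicates, and rank listings by how many preferences retrieved them. `rank_by_hits` is a hypothetical helper that works on plain strings; with LangChain documents you would pass each document's `page_content`.

```python
from collections import Counter


def rank_by_hits(results_per_answer, top_n=3):
    """Merge per-answer search results and rank listings by how often they appear."""
    hits = Counter()
    for results in results_per_answer:
        # Count a listing once per answer, even if a search returned it twice
        for listing in dict.fromkeys(results):
            hits[listing] += 1
    # most_common returns (listing, count) pairs, highest count first
    return hits.most_common(top_n)


# Illustrative results: one inner list per buyer answer
retrieved = [
    ["Austin home", "Green Oaks home"],
    ["Green Oaks home"],
    ["Downtown condo", "Green Oaks home"],
]
ranking = rank_by_hits(retrieved, top_n=2)
print(ranking)
```

Ranking by hit count favors listings that satisfy several preferences at once over listings that match only a single answer closely.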


In [None]:
docs = []

# k=1 returns only the single closest listing for each answer
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 1})

for answer in answers:
    docs.extend(retriever.invoke(answer))

# show example
docs[0].page_content

"---\n\nNeighborhood: Austin, Texas,\nPrice: $600,000,\nBedrooms: 3,\nBathrooms: 2,\nHouse Size: 2,000 sqft,\nDescription: Embrace the vibrant city of Austin with this modern and spacious 3-bedroom, 2-bathroom home. Located in a hip and trendy neighborhood, this property offers an open floor plan, high-end finishes, and easy access to local hot spots, parks, and trails.\nNeighborhood Description: Austin is a culturally-rich and dynamic city with a thriving music and arts scene, a wide variety of outdoor activities, and plenty of local shops and restaurants to explore. It's a popular choice for creatives, young professionals, and families seeking a laid-back and unique lifestyle.\n\n---"

## Step 6: Personalizing Listing Descriptions

* LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
* Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
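
A lightweight guard for factual integrity, offered as a sketch and not a complete verification: extract the numbers from the original listing and flag any number in the rewritten description that the original does not contain. `numbers_in` and `unsupported_numbers` are hypothetical helpers, the sample strings are made up, and the check ignores non-numeric facts such as the neighborhood name.

```python
import re


def numbers_in(text):
    """Return the set of numbers in the text, with thousands separators removed."""
    return {match.replace(",", "") for match in re.findall(r"\d[\d,]*", text)}


def unsupported_numbers(original, rewritten):
    """Numbers that appear in the rewritten text but not in the original listing."""
    return numbers_in(rewritten) - numbers_in(original)


# Illustrative strings, not real model output
listing = "Price: $800,000,\nBedrooms: 3,\nBathrooms: 2,\nHouse Size: 2,000 sqft,"
faithful = "This 3-bedroom, 2-bathroom home offers 2,000 sqft for $800,000."
altered = "This 4-bedroom home offers 3,500 sqft for $800,000."

print(unsupported_numbers(listing, faithful))
print(unsupported_numbers(listing, altered))
```

A non-empty result suggests the augmented description changed a price, size, or room count and should be regenerated or reviewed.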

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
                You are a professional real estate agent. According to the buyer's needs, emphasize key points and features.
                Context: {context}
            """,
        ),
        MessagesPlaceholder(variable_name="messages"),
        ("user", "{question}")
    ]
)

prompt

ChatPromptTemplate(input_variables=['context', 'messages', 'question'], input_types={'messages': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]]}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template="\n                You are a professional real estate agent. According to the buyer's needs, emphasize key points and features.\n                Context: {context}\n            ")), MessagesPlaceholder(variable_name='messages'), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))])

In [None]:
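history = ChatMessageHistory()

# Replay the interview: each question is an AI turn, each answer a user turn
for q, a in zip(questions, answers):
    history.add_ai_message(q)
    history.add_user_message(a)

# The string passed to invoke() is sent to the retriever as the search query
# and to the prompt as the question
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough(), "messages": lambda x: history.messages}
    | prompt
    | llm_personalize
    | StrOutputParser()
)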

In [None]:
response = rag_chain.invoke("Get personalized answer")

In [None]:
response

'\nAI: Based on your preferences, this stunning 4-bedroom townhouse in the prestigious Upper East Side neighborhood would be the perfect fit for you. With a spacious kitchen and cozy living room, this 3,500 sqft home offers the perfect balance of comfort and style. The neighborhood is known for its upscale atmosphere and is conveniently located near top-rated schools and upscale shops and restaurants. Additionally, the private rooftop terrace and easy access to transportation options make this property a desirable choice for families and professionals alike.'

In [None]:
del db

## Reference

* [LangChain official](https://python.langchain.com/v0.1/docs/get_started/introduction)
    * [Document loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/)
    * [Text Splitters](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)
    * [Retriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/)
    * [Use cases](https://python.langchain.com/v0.1/docs/use_cases/)
