<a href="https://colab.research.google.com/github/seobando/UDACITY_GenerativeAI/blob/main/HomeMatch/HomeMatch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Workspace setup

## Install required libraries

In [None]:
!pip install openai
!pip install langchain
!pip install chromadb
!pip install tiktoken



## Load Libraries

In [None]:
import os
import ast
import pandas as pd

from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain_community.chat_models import ChatOpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain

from openai import OpenAI

# Final Project

In [None]:
#from google.colab import userdata

#api_key = userdata.get('OPENAI_API_KEY')

In [None]:
api_key  = os.environ.get("OPENAI_API_KEY")

open_ai = OpenAI(
    api_key=api_key
)

In [None]:
file_path = "Listings.csv"

## 1. Synthetic Data Generation

In [None]:
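def get_completition(client, prompt, model="gpt-3.5-turbo"):
  """Send a single-turn prompt to the chat model and return the reply text."""
  messages = [{"role": "user", "content": prompt}]
  response = client.chat.completions.create(
      model=model,
      messages=messages,
      temperature=0)  # temperature=0 keeps the output as repeatable as possible
  return response.choices[0].message.content

def convert_to_dataframe(text):
  """Parse the model reply (a Python-style list of dicts) into a DataFrame."""
  # literal_eval only evaluates literals, so it is safer than eval() on model output
  text_dict = ast.literal_eval(text)
  df = pd.DataFrame(text_dict)
  return df

def save_as_csv(df, file_path):
  """Write the DataFrame to a CSV file (the index is written as the first column)."""
  df.to_csv(file_path)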
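
`convert_to_dataframe` assumes the reply is exactly one well-formed Python list. As a minimal sketch of a more defensive parser, assuming the model may wrap the list in extra text or cut it off, the hypothetical helper `parse_listings` below (not part of the original notebook) extracts the bracketed block and reports a clear error when parsing fails:

```python
import ast


def parse_listings(text):
    """Extract the outermost [...] block from a model reply and parse it safely."""
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end <= start:
        # No closing bracket usually means the reply was cut off.
        raise ValueError("no complete list literal found in the model reply")
    try:
        listings = ast.literal_eval(text[start:end + 1])
    except (SyntaxError, ValueError) as exc:
        raise ValueError(f"could not parse the model reply: {exc}") from exc
    if not isinstance(listings, list) or not all(isinstance(x, dict) for x in listings):
        raise ValueError("expected a list of dictionaries")
    return listings


# Illustrative reply; the surrounding sentence is an assumption about model behavior.
reply = "Here are the listings:\n[{'Building name': 'Poblado Suites', 'Price in COP': '2,500,000',}]"
listings = parse_listings(reply)
print(listings[0]["Building name"])
```

The parsed list can then be passed to `pd.DataFrame` in the same way as `convert_to_dataframe` does.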

In [None]:
instruction = """
Create a list of apartments with descriptions for rent.

The list must meet the following criteria:
  - Should have at least 10 properties.
  - Each property should have the following description:
    - Building name
    - Number of bedrooms
    - Number of bathrooms
    - Property size in meters
    - Neigborhood name
    - Has parking
    - Has an elevator
    - Has garbage shut
    - Has a pool
    - Is pet friendly
    - Is near market places
    - Is near schools or universities
    - Is near hospitals
    - Is near public transport
    - Price in COP
  - Near means a distance of around 1000 meters to the apartment
"""

output_format = """
[
  {
    'Building name': '<answer_here>',
    'Number of bedrooms':'<answer_here>',
    'Number of bathrooms':'<answer_here>',
    'Property size in meters':'<answer_here>',
    'Neigborhood name':'<answer_here>',
    'Has parking':'<answer_here>',
    'Has an elevator':'<answer_here>',
    'Has garbage shut':'<answer_here>',
    'Has a pool':'<answer_here>',
    'Is pet friendly': '<answer_here>',
    'Is near market places': '<answer_here>',
    'Is near schools or universities': '<answer_here>',
    'Is near hospitals': '<answer_here>',
    'Is near public transport': '<answer_here>',
    'Price in COP':'<answer_here>',
    },
]
"""

prompt = f"""
Act as a real estate agent from the city of Medellin in Colombia.

{instruction}

You should fill the <answer_here> parts of the following output format:

{output_format}

"""

In [None]:
response = get_completition(open_ai, prompt, model="gpt-3.5-turbo")
print(response)

[
  {
    'Building name': 'Poblado Suites',
    'Number of bedrooms': '2',
    'Number of bathrooms': '2',
    'Property size in meters': '80',
    'Neigborhood name': 'El Poblado',
    'Has parking': 'Yes',
    'Has an elevator': 'Yes',
    'Has garbage shut': 'Yes',
    'Has a pool': 'Yes',
    'Is pet friendly': 'Yes',
    'Is near market places': 'Yes',
    'Is near schools or universities': 'Yes',
    'Is near hospitals': 'Yes',
    'Is near public transport': 'Yes',
    'Price in COP': '2,500,000',
  },
  {
    'Building name': 'Laureles Towers',
    'Number of bedrooms': '3',
    'Number of bathrooms': '2',
    'Property size in meters': '100',
    'Neigborhood name': 'Laureles',
    'Has parking': 'Yes',
    'Has an elevator': 'Yes',
    'Has garbage shut': 'Yes',
    'Has a pool': 'No',
    'Is pet friendly': 'Yes',
    'Is near market places': 'Yes',
    'Is near schools or universities': 'Yes',
    'Is near hospitals': 'Yes',
    'Is near public transport': 'Yes',
    'Pric

In [None]:
df = convert_to_dataframe(response)
df.head()

Unnamed: 0,Building name,Number of bedrooms,Number of bathrooms,Property size in meters,Neigborhood name,Has parking,Has an elevator,Has garbage shut,Has a pool,Is pet friendly,Is near market places,Is near schools or universities,Is near hospitals,Is near public transport,Price in COP
0,Poblado Suites,2,2,80,El Poblado,Yes,Yes,Yes,Yes,Yes,Yes,Yes,Yes,Yes,2500000
1,Laureles Towers,3,2,100,Laureles,Yes,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,3000000
2,Envigado Gardens,1,1,60,Envigado,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,Yes,1800000
3,Belén Heights,2,1,70,Belén,Yes,No,Yes,No,No,Yes,Yes,Yes,Yes,1600000
4,Robledo Residences,3,2,90,Robledo,Yes,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,2200000
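
The model returns every field as a string, including the counts, the Yes/No flags, and prices such as `'2,500,000'`. If typed values are needed for filtering or sorting, one possible approach is sketched below. The helper `normalize_listing` is hypothetical and not part of the original notebook; it assumes flags are spelled exactly `Yes` or `No` and that numbers use commas as thousands separators.

```python
def normalize_listing(row):
    """Convert the string values of one listing into typed Python values."""
    clean = {}
    for key, value in row.items():
        text = str(value).strip()
        if text in ("Yes", "No"):
            # Yes/No flags become booleans.
            clean[key] = text == "Yes"
        elif text.replace(",", "").isdigit():
            # Whole numbers, with or without thousands separators, become ints.
            clean[key] = int(text.replace(",", ""))
        else:
            # Everything else (names, neighborhoods) stays a string.
            clean[key] = text
    return clean


raw = {
    "Building name": "Poblado Suites",
    "Number of bedrooms": "2",
    "Has a pool": "Yes",
    "Price in COP": "2,500,000",
}
typed = normalize_listing(raw)
print(typed)
```

Applied to each dictionary before building the DataFrame, this would give numeric and boolean columns instead of text.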


In [None]:
save_as_csv(df,file_path)

## 2. Semantic Search

In [None]:
loader = CSVLoader(file_path=file_path)
docs = loader.load()
print(docs)

[Document(page_content=': 0\nBuilding name: Poblado Suites\nNumber of bedrooms: 2\nNumber of bathrooms: 2\nProperty size in meters: 80\nNeigborhood name: El Poblado\nHas parking: Yes\nHas an elevator: Yes\nHas garbage shut: Yes\nHas a pool: Yes\nIs pet friendly: Yes\nIs near market places: Yes\nIs near schools or universities: Yes\nIs near hospitals: Yes\nIs near public transport: Yes\nPrice in COP: 2,500,000', metadata={'source': 'apartments_list.csv', 'row': 0}), Document(page_content=': 1\nBuilding name: Laureles Towers\nNumber of bedrooms: 3\nNumber of bathrooms: 2\nProperty size in meters: 100\nNeigborhood name: Laureles\nHas parking: Yes\nHas an elevator: Yes\nHas garbage shut: Yes\nHas a pool: No\nIs pet friendly: Yes\nIs near market places: Yes\nIs near schools or universities: Yes\nIs near hospitals: Yes\nIs near public transport: Yes\nPrice in COP: 3,000,000', metadata={'source': 'apartments_list.csv', 'row': 1}), Document(page_content=': 2\nBuilding name: Envigado Gardens\nN

In [None]:
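# Each CSV row is already a short document, so a 1000-character chunk size keeps rows intact
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(docs)
# Embed every chunk and index the vectors in an in-memory Chroma collection
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
db = Chroma.from_documents(split_docs, embeddings)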
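
To make the retrieval step less opaque, here is a toy sketch of nearest-neighbor search over embedding vectors. It uses cosine similarity as one common measure; the metric Chroma actually applies depends on how the collection is configured. The three-dimensional vectors are invented for illustration, whereas real embeddings from `OpenAIEmbeddings` have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query vector."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Made-up vectors standing in for real embeddings.
toy_vectors = {
    "Poblado Suites": [0.9, 0.1, 0.8],
    "Belén Heights": [0.1, 0.9, 0.2],
    "Robledo Residences": [0.8, 0.2, 0.3],
}
nearest = top_k([1.0, 0.0, 0.7], toy_vectors)
print(nearest)
```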

In [None]:
def search_properties(db,
                      building_name=None,
                      num_bedrooms=None,
                      num_bathrooms=None,
                      property_size=None,
                      neighborhood_name=None,
                      has_parking=None,
                      has_elevator=None,
                      has_garbage_shut=None,
                      has_pool=None,
                      is_pet_friendly=None,
                      is_near_marketplaces=None,
                      is_near_schools_or_universities=None,
                      is_near_hospitals=None,
                      is_near_public_transport=None,
                      price=None):
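    """Build a text query from the given filters and run a similarity search.

    Arguments left unset are omitted from the query.
    """
    # Construct the query string based on the provided parameters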
    query_parts = []
    if building_name:
        query_parts.append(f"Building name: {building_name}")
    if num_bedrooms is not None:
        query_parts.append(f"Number of bedrooms: {num_bedrooms}")
    if num_bathrooms is not None:
        query_parts.append(f"Number of bathrooms: {num_bathrooms}")
    if property_size is not None:
        query_parts.append(f"Property size: {property_size}m²")
    if neighborhood_name:
        query_parts.append(f"Neighborhood: {neighborhood_name}")
    if has_parking is not None:
        query_parts.append(f"Has parking: {'Yes' if has_parking else 'No'}")
    if has_elevator is not None:
        query_parts.append(f"Has elevator: {'Yes' if has_elevator else 'No'}")
    if has_garbage_shut is not None:
        query_parts.append(f"Has garbage shut: {'Yes' if has_garbage_shut else 'No'}")
    if has_pool is not None:
        query_parts.append(f"Has pool: {'Yes' if has_pool else 'No'}")
    if is_pet_friendly is not None:
        query_parts.append(f"Is pet friendly: {'Yes' if is_pet_friendly else 'No'}")
    if is_near_marketplaces is not None:
        query_parts.append(f"Is near market places: {'Yes' if is_near_marketplaces else 'No'}")
    if is_near_schools_or_universities is not None:
        query_parts.append(f"Is near schools or universities: {'Yes' if is_near_schools_or_universities else 'No'}")
    if is_near_hospitals is not None:
        query_parts.append(f"Is near hospitals: {'Yes' if is_near_hospitals else 'No'}")
    if is_near_public_transport is not None:
        query_parts.append(f"Is near public transport: {'Yes' if is_near_public_transport else 'No'}")
    if price is not None:
        query_parts.append(f"Price: {price} COP")

    query = ". ".join(query_parts)

    # Perform the search
    results = db.similarity_search(query)

    return results

# Example usage
query_results = search_properties(
    db,
    building_name="Sunset Plaza",
    num_bedrooms=3,
    num_bathrooms=2,
    property_size=120,
    neighborhood_name="El Poblado",
    has_parking=True,
    has_elevator=True,
    has_garbage_shut=False,
    has_pool=True,
    is_pet_friendly=True,
    is_near_marketplaces=True,
    is_near_schools_or_universities=True,
    is_near_hospitals=True,
    is_near_public_transport=True,
    price=1500000
)

# Print the query results
for i, result in enumerate(query_results):
    print(f"Result {i+1}:")
    print(result)
    print("\n")

Result 1:
page_content=': 0\nBuilding name: Poblado Suites\nNumber of bedrooms: 2\nNumber of bathrooms: 2\nProperty size in meters: 80\nNeigborhood name: El Poblado\nHas parking: Yes\nHas an elevator: Yes\nHas garbage shut: Yes\nHas a pool: Yes\nIs pet friendly: Yes\nIs near market places: Yes\nIs near schools or universities: Yes\nIs near hospitals: Yes\nIs near public transport: Yes\nPrice in COP: 2,500,000' metadata={'row': 0, 'source': 'apartments_list.csv'}


Result 2:
page_content=': 4\nBuilding name: Robledo Residences\nNumber of bedrooms: 3\nNumber of bathrooms: 2\nProperty size in meters: 90\nNeigborhood name: Robledo\nHas parking: Yes\nHas an elevator: Yes\nHas garbage shut: Yes\nHas a pool: No\nIs pet friendly: Yes\nIs near market places: Yes\nIs near schools or universities: Yes\nIs near hospitals: Yes\nIs near public transport: Yes\nPrice in COP: 2,200,000' metadata={'row': 4, 'source': 'apartments_list.csv'}


Result 3:
page_content=': 8\nBuilding name: Castilla Heights\n
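
The query built by `search_properties` uses labels such as `Property size`, `Neighborhood`, and `Price`, while the indexed documents use the CSV column names (`Property size in meters`, `Neigborhood name`, `Price in COP`). A possible variant, sketched below, reuses the stored labels and the same line-per-field layout as the documents. Both `FIELD_LABELS` and `build_query` are hypothetical names, and whether this improves retrieval quality is an assumption that was not measured here.

```python
# Hypothetical mapping from keyword names to the column labels stored in the CSV
# (spellings are kept exactly as they appear in the dataset).
FIELD_LABELS = {
    "building_name": "Building name",
    "num_bedrooms": "Number of bedrooms",
    "num_bathrooms": "Number of bathrooms",
    "property_size": "Property size in meters",
    "neighborhood_name": "Neigborhood name",
    "has_parking": "Has parking",
    "has_elevator": "Has an elevator",
    "has_garbage_shut": "Has garbage shut",
    "has_pool": "Has a pool",
    "is_pet_friendly": "Is pet friendly",
    "is_near_marketplaces": "Is near market places",
    "is_near_schools_or_universities": "Is near schools or universities",
    "is_near_hospitals": "Is near hospitals",
    "is_near_public_transport": "Is near public transport",
    "price": "Price in COP",
}


def build_query(**filters):
    """Format the filters that are set as 'label: value' lines."""
    parts = []
    for name, value in filters.items():
        if value is None:
            continue  # skip filters the caller did not set
        label = FIELD_LABELS[name]
        if isinstance(value, bool):
            value = "Yes" if value else "No"
        parts.append(f"{label}: {value}")
    return "\n".join(parts)


query_text = build_query(num_bedrooms=3, has_pool=True, price=None)
print(query_text)
```

The resulting string could be passed to `db.similarity_search` in place of the query assembled inside `search_properties`.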

## 3. Augmented Response Generation

In [None]:
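def generate_suggestions(db, llm, query):
  """Retrieve the 2 most similar listings and let the LLM answer the query from them."""
  retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":2})
  rag = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
  # invoke() replaces the deprecated direct call rag({...})
  result = rag.invoke({"query": query})
  return result["result"]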
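
With `chain_type="stuff"`, all retrieved documents are placed into a single prompt together with the question. The sketch below illustrates that idea in plain Python. The function `stuff_prompt` and its template wording are hypothetical and are not LangChain's actual prompt.

```python
def stuff_prompt(question, documents):
    """Join all retrieved documents into one context block and append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Shortened stand-ins for the page_content of two retrieved listings.
retrieved = [
    "Building name: Sabaneta Suites\nIs near schools or universities: Yes",
    "Building name: Robledo Residences\nIs near schools or universities: Yes",
]
prompt_text = stuff_prompt("Which apartments are near a school?", retrieved)
print(prompt_text)
```

Because every document is inserted in full, this approach only works while the retrieved text fits in the model's context window, which is one reason `k` is kept small above.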

In [None]:
model_name = "gpt-3.5-turbo"
llm = ChatOpenAI(api_key=api_key, model_name=model_name, temperature=0, max_tokens=2000)

In [None]:
query = "Based on the available apartments suggest a couple of apartments for a family of 4 members near a school"
print(generate_suggestions(db, llm, query))

Based on the information provided, I would recommend the following apartments for a family of 4 members near a school:

1. Sabaneta Suites:
   - Building located in Sabaneta neighborhood.
   - 3 bedrooms, 2 bathrooms.
   - Pet-friendly.
   - Near schools or universities.
   - Price: 2,800,000 COP.

2. Robledo Residences:
   - Building located in Robledo neighborhood.
   - 3 bedrooms, 2 bathrooms.
   - Pet-friendly.
   - Near schools or universities.
   - Price: 2,200,000 COP.

Both of these apartments are suitable for a family of 4 members, near schools or universities, and offer the required number of bedrooms and bathrooms.


In [None]:
query = "Based on the available apartments suggest a couple of apartments for a family of 4 members near a school, should have at least 2 bathrooms"
print(generate_suggestions(db, llm, query))

Based on the information provided, I would recommend the following apartments for a family of 4 members near a school with at least 2 bathrooms:

1. Sabaneta Suites:
   - Building name: Sabaneta Suites
   - Neigborhood name: Sabaneta
   - Number of bedrooms: 3
   - Number of bathrooms: 2
   - Property size in meters: 95
   - Has parking: Yes
   - Has an elevator: Yes
   - Is pet friendly: Yes
   - Is near market places: Yes
   - Is near schools or universities: Yes
   - Is near hospitals: Yes
   - Is near public transport: Yes
   - Price in COP: 2,800,000

2. Robledo Residences:
   - Building name: Robledo Residences
   - Neigborhood name: Robledo
   - Number of bedrooms: 3
   - Number of bathrooms: 2
   - Property size in meters: 90
   - Has parking: Yes
   - Has an elevator: Yes
   - Is pet friendly: Yes
   - Is near market places: Yes
   - Is near schools or universities: Yes
   - Is near hospitals: Yes
   - Is near public transport: Yes
   - Price in COP: 2,200,000

Both of these a