# Real Estate Image Tagger

Having previously worked in real estate, I wanted to explore GenAI use cases in that field.

A simple use case is to have an LLM recognize key features in a listing image and label them automatically, so the image can be indexed and searched. I've seen too many listing images that are not adequately labeled. Does the listing have no carpet? I have to look at the images to find out. Too often, listings are not easily searchable by the attributes I would find desirable.

For this project, I started with the GenAI capability of image understanding and went as far as I could in the time I had.

As of the submission deadline, this project covers the following GenAI capabilities:
* Structured output/JSON mode/controlled generation
* Few-shot prompting
* Image understanding
* Embeddings
* Retrieval augmented generation (RAG)
* Vector search/vector store/vector database



## Prerequisites

This section installs the required packages, loads the Google API key, and sets up a retry policy.

In [None]:
# Uninstall packages from Kaggle base image that are not needed.
!pip uninstall -qqy jupyterlab kfp
# Install the google-genai SDK for this codelab.
!pip install -qU 'google-genai==1.7.0' 'chromadb==0.6.3'

In [None]:
from google import genai
from google.genai import types

from IPython.display import Markdown, HTML, display

genai.__version__

In [None]:
# API KEY

from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

client = genai.Client(api_key=GOOGLE_API_KEY)

# Define a retry policy. The model might make multiple consecutive calls automatically
# for a complex query; this ensures the client retries if it hits quota limits.
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
  genai.models.Models.generate_content = retry.Retry(
      predicate=is_retriable)(genai.models.Models.generate_content)
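
To make the retry idea concrete, here is a minimal, standard-library-only sketch of what a predicate-driven retry with exponential backoff does. It is not the `google.api_core` implementation: `QuotaError`, `call_with_retry`, and `flaky_call` are hypothetical names invented for this illustration, and the delay values are arbitrary.

```python
import time


class QuotaError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code


def is_retriable(exc):
    # Same rule as the notebook's predicate: retry only on 429 and 503.
    return isinstance(exc, QuotaError) and exc.code in {429, 503}


def call_with_retry(fn, predicate, max_attempts=5, initial_delay=0.01, multiplier=2.0):
    """Call fn(), retrying with exponential backoff while predicate(exc) is true."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors the predicate rejects.
            if attempt == max_attempts or not predicate(exc):
                raise
            time.sleep(delay)
            delay *= multiplier


attempts = []


def flaky_call():
    # Fails twice with a quota error, then succeeds on the third call.
    attempts.append(1)
    if len(attempts) < 3:
        raise QuotaError(429)
    return "ok"


result = call_with_retry(flaky_call, is_retriable)
print(result, "after", len(attempts), "attempts")
```

Errors that the predicate rejects (for example a 400) are raised immediately, so only quota and availability errors cost extra waiting time.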

## Dataset

I created a small dataset of real-estate-related images from Unsplash and added it as an input to the notebook.

In [None]:
def upload_to_google(path):
    image = client.files.upload(file=path)
    return image

## Upload to Google

To send an image to Gemini, I must first upload the file to Google. The helpers in this section upload the files so they can be passed to Gemini for interpretation.

In [None]:
# Delete the files already stored in Google so that re-running the notebook does not create duplicates.
print('Deleting all images stored in Google via File API:')
for f in client.files.list():
    print(' ', f.name)
    client.files.delete(name=f.name)

In [None]:
import os

image_map = {}

# list all files in the dataset and upload to google
for dirname, _, filenames in os.walk('/kaggle/input/'):
    for filename in filenames:
        path = os.path.join(dirname, filename)
        image = upload_to_google(path)
        image_map[image.name] = path
        print(path)

print(image_map)

In [None]:
# list files
print('List all images stored in Google via File API:')
last_file = ""
for f in client.files.list():
    last_file = f.name
    print(' ', f.name)

In [None]:
# captions the image as a "hello world" test that it works
# this is code directly from the documentation
def process_image(image_path):
    image = client.files.get(name=image_path)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[image, "Caption this image."])

    print(response.text)

In [None]:
process_image(last_file)

## Prompt Engineering Improvements

I can see that it's working now, so I can improve the prompt to get a better description of the image.

In this section, I add two more GenAI capabilities: few-shot prompting and structured output/JSON mode/controlled generation.

In [None]:
prompt = """
Examine the image provided. This image is part of a real estate listing.

To enhance the listing, important details need to be identified in the image to include in the listing. 
These details should be included in the 'caption' attribute in the sample json below. It should contain at least 150 words.

The image should also be evaluated for the presence of a pool. 

If a pool is present, the json output should include a boolean value indicating a pool is present.

The output should be in JSON.

EXAMPLE:
The image of a kitchen contains an island. A tag would be "kitchen island".
JSON Response:
```
{
"caption": "A bright and spacious kitchen with white cabinets, a blue island, and dark countertops.",
"tags": ["kitchen island"],
"has_pool": false
}
```

EXAMPLE:
The image of a house contains a pool. A tag would be "outdoor pool".
JSON Response:
```
{
"caption": "A picturesque two-story house with a red exterior stands nestled between lush greenery and neighboring homes under a clear blue sky.",
"tags": ["outdoor pool"],
"has_pool": true
}
```
"""
import typing_extensions as typing
import json

class ImageTags(typing.TypedDict):
    caption: str
    tags: list[str]
    has_pool: bool

def process_image_tags(image_path):
    image = client.files.get(name=image_path)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        config=types.GenerateContentConfig(
            temperature=0.5,
            response_mime_type="application/json",
            response_schema=ImageTags,
        ),
        contents=[image, prompt],
    )

    # Merge the model's JSON output with the local path of the source image.
    return json.loads(response.text) | {"path": image_map[image_path]}

In [None]:
process_image_tags(last_file)
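
Even with a response schema, it can be worth checking the parsed output before storing it. The sketch below is one possible check using only the standard library. `validate_image_tags` and `REQUIRED_FIELDS` are hypothetical names for this illustration, and the sample string is hand-written, not real model output.

```python
import json

# Expected fields and their Python types, mirroring the ImageTags schema above.
REQUIRED_FIELDS = {"caption": str, "tags": list, "has_pool": bool}


def validate_image_tags(raw_json):
    """Parse a JSON string and check that it matches the expected shape."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field} should be {expected_type.__name__}")
    if not all(isinstance(tag, str) for tag in data["tags"]):
        raise ValueError("every tag must be a string")
    return data


# Hand-written sample in the same shape as the few-shot examples.
sample = '{"caption": "A house with a pool.", "tags": ["outdoor pool"], "has_pool": true}'
parsed = validate_image_tags(sample)
print(parsed["tags"], parsed["has_pool"])
```

A check like this fails loudly on a malformed response instead of letting a missing `has_pool` surface later as a database error.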


## Embeddings and Vector Databases

Once the tags are identified, the information can be added to a database for querying. As the next step of this project, I'll try out embeddings and RAG.

In [None]:
for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

In [None]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

from google.genai import types


# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

In [None]:
from PIL import Image

# create a list of documents to be added to the database
documents = []

print('Tagging all images stored in Google via File API:')
for f in client.files.list():
    document = process_image_tags(f.name)
    documents.append(document)
    print(' ', document)

In [None]:
# display images to verify results and limit # to avoid memory problems
for idx, document in enumerate(documents): 
    if idx < 3:
        display(Image.open(document['path']))
    print(' ', document)

In [None]:
import chromadb

DB_NAME = "real_estate_listings_db"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

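chroma_client = chromadb.Client()
try:
    chroma_client.delete_collection(name=DB_NAME)  # reset db
except Exception:
    # On a first run the collection does not exist yet, so there is nothing to reset.
    pass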

db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn, metadata={
        "hnsw:space": "cosine",
    })
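
The `"hnsw:space": "cosine"` setting asks the collection to rank results by cosine distance, which is one minus the cosine similarity of two vectors. Here is a small sketch of that calculation on made-up two-dimensional vectors. Real embeddings have many more dimensions, and `cosine_distance` is a hypothetical helper, not a Chroma function.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
query_vec = [1.0, 0.0]
toy_docs = {
    "doc_a": [2.0, 0.0],  # same direction as the query, different length
    "doc_b": [0.0, 3.0],  # orthogonal to the query
}

# Smaller distance means a closer match, so sort ascending.
ranked = sorted(toy_docs, key=lambda name: cosine_distance(query_vec, toy_docs[name]))
print(ranked)
```

Because only the direction of the vectors matters, `doc_a` ranks first even though its length differs from the query's.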


In [None]:
for document in documents:
    db.add(
        documents=[document['caption']],
        metadatas=[{"tags": ','.join(document['tags']), "has_pool": document['has_pool']}],
        ids=[document['path']]
    )
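
The cell above stores the tag list as one comma-separated string in the metadata. A minimal sketch of reading it back might look like this. `encode_tags`, `decode_tags`, and `has_tag` are hypothetical helpers, and the approach assumes no individual tag contains a comma.

```python
def encode_tags(tags):
    """Join a list of tags into the comma-separated form stored in metadata."""
    return ",".join(tags)


def decode_tags(tag_string):
    """Split the stored string back into a list, dropping empty entries."""
    return [tag for tag in tag_string.split(",") if tag]


def has_tag(metadata, tag):
    """Check for an exact tag match rather than a substring match."""
    return tag in decode_tags(metadata["tags"])


# Hand-written metadata in the same shape as the db.add call above.
example_metadata = {"tags": encode_tags(["kitchen island", "white cabinets"]), "has_pool": False}
print(decode_tags(example_metadata["tags"]))
print(has_tag(example_metadata, "kitchen island"), has_tag(example_metadata, "kitchen"))
```

Decoding before comparing avoids false positives such as "kitchen" matching "kitchen island", which a plain substring test on the stored string would produce.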

In [None]:
db.count()

## Query Database

Now that the documents and their embeddings are in the database, it can be queried.

In [None]:
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "Show me a listing with a red exterior."

result = db.query(query_texts=[query], n_results=1)

display(Image.open(result['ids'][0][0]))


[all_passages] = result["documents"]

Markdown(all_passages[0])

In [None]:
# Search the Chroma DB using the specified query.
query = "Show me a two-story home."

result = db.query(query_texts=[query])

display(Image.open(result['ids'][0][0]))

[all_passages] = result["documents"]

print("Total Listings returned: ", len(all_passages))

Markdown(all_passages[0])

## RAGs

After querying the database, I augment the prompt with the retrieved listings and ask Gemini to generate a final answer.

In [None]:
query_oneline = query.replace("\n", " ")

# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
prompt = f"""You are a helpful and informative real estate listing agent bot that answers questions using text from the listings included below. 
Tell me how many listings match my request. Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
Strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
"""

# Add the retrieved documents to the prompt, one listing per line.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    prompt += f"Listing Description: {passage_oneline}\n"

# State the total once, after all listings.
prompt += f"Total Listings: {len(all_passages)}\n"

print(prompt)

In [None]:
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt)

Markdown(answer.text)
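
The augmentation step can also be wrapped in a small function so it can be reused for other queries. This is a sketch under assumptions rather than the notebook's exact prompt: `build_rag_prompt` is a hypothetical helper, the instruction text is shortened, and the passages are hand-written examples.

```python
def build_rag_prompt(question, passages):
    """Build a prompt that lists each retrieved passage once, then the total."""
    lines = [
        "You are a real estate listing agent bot that answers questions "
        "using only the listings included below.",
        "",
        # Collapse newlines and repeated whitespace so the question stays on one line.
        f"QUESTION: {' '.join(question.split())}",
        "",
    ]
    for number, passage in enumerate(passages, start=1):
        lines.append(f"Listing {number}: {' '.join(passage.split())}")
    lines.append(f"Total Listings: {len(passages)}")
    return "\n".join(lines)


example_prompt = build_rag_prompt(
    "Show me a\ntwo-story home.",
    ["A two-story house\nwith a red exterior.", "A kitchen with a blue island."],
)
print(example_prompt)
```

Numbering the listings and stating the total a single time gives the model an unambiguous count to report back.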

## Next Steps

