<a href="https://colab.research.google.com/github/kalyanarisetti/greenleaf_dtse/blob/main/VLM_with_BLIP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Prompt 1.1: MVP — Local Image Captioning with BLIP

Build me a Gradio app for Colab that follows this simple structure:

**Input Layer:**
    - The app should ask me to upload a single image file (like JPG or PNG).
    - There should be a big button labeled "Generate Caption" that I click to start the process.

**Processing Layer:**
    - When I click the button, the app should use a pre-trained AI model called BLIP (already available in Hugging Face) to look at my image and generate a short, natural-language description of what's in it.
    - The app should quietly install any libraries it needs (like transformers, torch, gradio) without asking me.
    - The first time I run it, it may take 1–2 minutes to download the model — that's okay.

**Output Layer:**
    - It should display the generated caption in a text box.
    - Below that, it should display a simple message like: “Caption generated using BLIP (local AI model).”

That’s it. No extra tabs. No download buttons. Just upload → click → see caption.

Give me one complete, runnable Gradio app in a single code cell as far as feasible.
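
The "quietly install what is needed" requirement can be sketched with the standard library alone. This is a minimal sketch, not the notebook's method: the helper names `missing_packages` and `ensure_installed` are hypothetical, and the mapping from import name to pip name is an assumption needed because the two can differ (for example `cv2` versus `opencv-python`).

```python
import importlib.util
import subprocess
import sys


def missing_packages(requirements):
    """Return pip names whose import name cannot be found in this environment.

    `requirements` maps import name -> pip distribution name.
    """
    return [
        pip_name
        for import_name, pip_name in requirements.items()
        if importlib.util.find_spec(import_name) is None
    ]


def ensure_installed(requirements):
    """Quietly pip-install only the packages that are not importable yet."""
    to_install = missing_packages(requirements)
    if to_install:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-q", *to_install]
        )
    return to_install


# Hypothetical usage for this app (left commented out so nothing is installed here):
# ensure_installed({"gradio": "gradio", "transformers": "transformers", "torch": "torch"})
```

Checking first avoids re-running pip on every cell execution, which matters in Colab where most of these libraries are usually present already.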

In [None]:
# Install libraries quietly (must run before the imports below)
!pip install -q gradio transformers torch

import gradio as gr
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import torch

# Load the BLIP model and processor from Hugging Face
# This will download the model the first time it's run
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Define the function for image captioning
def generate_caption(image_file):
    """
    Takes an image file and returns a generated caption using the BLIP model.
    """
    if image_file is None:
        return "Please upload an image."

    # Open the uploaded image
    raw_image = Image.open(image_file).convert('RGB')

    # Preprocess the image and generate a caption
    inputs = processor(images=raw_image, return_tensors="pt")

    # Use torch.no_grad() to save memory and speed up inference
    with torch.no_grad():
        out = model.generate(**inputs)

    # Decode the generated output and return the caption
    caption = processor.decode(out[0], skip_special_tokens=True)
    return caption

# Create the Gradio interface
with gr.Blocks(title="BLIP Image Captioning") as demo:
    gr.Markdown("# Local Image Captioning with BLIP")
    gr.Markdown("Upload an image, click the button, and let the BLIP model generate a caption!")

    with gr.Row():
        image_input = gr.Image(type="filepath", label="Upload an Image (JPG, PNG)")
        caption_output = gr.Textbox(label="Generated Caption")

    btn = gr.Button("Generate Caption")
    btn.click(fn=generate_caption, inputs=image_input, outputs=caption_output)

    gr.Markdown("---")
    gr.Markdown("Caption generated using **BLIP (local AI model).**")

# Launch the Gradio app
demo.launch(share=True)

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://dee9b979af981a5a47.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### Prompt 1.2: Add User Control, Preview & Download

Enhance the app by giving me more control and utility. Keep the core BLIP captioning, but add:

  **Input Layer:**
    - Add a dropdown: “Caption Style” with options “Short caption” and “Detailed description”.
    - Keep the image upload and “Generate Caption” button.

  **Processing Layer:**
    - Use my style choice to guide BLIP (e.g., prepend “a photo of” for detailed mode).
    - Save the caption as a .txt file named after my image.
    - Keep a copy of the uploaded image for preview.

  **Output Layer:**
    - Display a preview of my uploaded image.
    - Display the caption in a text box.
    - Add a "Download Caption" button for the .txt file.
    - Keep the status message.

That's it. One cohesive, practical app — no fragmentation.

Give me one complete, runnable Gradio app in a single code cell as far as feasible.
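
The "save the caption as a .txt file named after my image" step can be sketched in isolation. This is a minimal sketch under stated assumptions: `caption_path_for` and `save_caption` are hypothetical helper names, and the upload path in the example is made up (the image itself is never opened).

```python
import tempfile
from pathlib import Path


def caption_path_for(image_path, out_dir):
    """Build '<image stem>_caption.txt' inside out_dir."""
    return Path(out_dir) / f"{Path(image_path).stem}_caption.txt"


def save_caption(caption, image_path, out_dir):
    """Write the caption as UTF-8 text and return the file path as a string."""
    target = caption_path_for(image_path, out_dir)
    target.write_text(caption, encoding="utf-8")
    return str(target)


# Example with a hypothetical upload path; only the file name is used.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_caption("a dog on a beach", "/uploads/holiday.photo.jpg", tmp)
    saved_name = Path(saved).name
    saved_text = Path(saved).read_text(encoding="utf-8")
```

Only the last extension is stripped, so `holiday.photo.jpg` becomes `holiday.photo_caption.txt`. Writing to an explicit output directory also keeps caption files out of the notebook's working directory.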

In [None]:
# Install libraries quietly (must run before the imports below)
!pip install -q gradio transformers torch

import gradio as gr
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import torch
import os

# Load the BLIP model and processor from Hugging Face
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Define the function for image captioning with user controls
def generate_caption(image_file, caption_style):
    """
    Takes an image file and a caption style, returns a caption, the image for preview, and a download file path.
    """
    if image_file is None:
        return "Please upload an image.", None, None

    # Open the uploaded image
    raw_image = Image.open(image_file).convert('RGB')

    # Preprocess the image and generate a caption
    if caption_style == "Detailed description":
        # Condition BLIP on the prefix requested in the prompt; the decoded
        # caption will begin with this text
        text = "a photo of"
        inputs = processor(images=raw_image, text=text, return_tensors="pt")
    else: # "Short caption"
        inputs = processor(images=raw_image, return_tensors="pt")

    # Use torch.no_grad() to save memory and speed up inference
    with torch.no_grad():
        out = model.generate(**inputs)

    # Decode the generated output
    caption = processor.decode(out[0], skip_special_tokens=True)

    # Save the caption to a text file
    base_name = os.path.splitext(os.path.basename(image_file))[0]
    output_filename = f"{base_name}_caption.txt"
    with open(output_filename, "w") as f:
        f.write(caption)

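    # The preview and download button start hidden, so make them visible here
    return (
        caption,
        gr.update(value=image_file, visible=True),
        gr.update(value=output_filename, visible=True),
    )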

# Create the Gradio interface
with gr.Blocks(title="Enhanced BLIP Image Captioning") as demo:
    gr.Markdown("# Enhanced Local Image Captioning with BLIP")
    gr.Markdown("Upload an image, choose a style, and generate a caption with preview and download options.")

    with gr.Row():
        with gr.Column(scale=1):
            image_input = gr.Image(type="filepath", label="Upload an Image (JPG, PNG)")
            caption_style_dropdown = gr.Dropdown(
                choices=["Short caption", "Detailed description"],
                value="Short caption",
                label="Caption Style"
            )
            generate_button = gr.Button("Generate Caption")
            download_button = gr.DownloadButton("Download Caption", visible=False)

        with gr.Column(scale=2):
            image_preview = gr.Image(type="filepath", label="Image Preview", visible=False)
            caption_output = gr.Textbox(label="Generated Caption", show_copy_button=True)

    generate_button.click(
        fn=generate_caption,
        inputs=[image_input, caption_style_dropdown],
        outputs=[caption_output, image_preview, download_button]
    )

    gr.Markdown("---")
    gr.Markdown("Caption generated using **BLIP (local AI model).**")

# Launch the Gradio app
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://67c0d627a8f75f101a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [None]:
# Cell 3: Define captioning function (optimized for quality)
# Relies on `Image`, `processor`, and `model` defined in the previous cell
import os

def generate_caption(image, caption_style):
    try:
        img = Image.open(image).convert("RGB")

        # Use natural, training-aligned prefixes — NOT commands
        prompts = {
            "Short caption": "",
            "Detailed description": "a photo of"
        }
        prompt = prompts.get(caption_style, "")

        # Process image with optional prompt
        inputs = processor(img, text=prompt, return_tensors="pt")

        # Generate with better parameters for coherence
        outputs = model.generate(
            **inputs,
            max_length=50,
            min_length=10,
            num_beams=5,
            no_repeat_ngram_size=2,
            early_stopping=True
        )
        caption = processor.decode(outputs[0], skip_special_tokens=True)

        # Normalize trailing punctuation so the caption ends with a single period
        caption = caption.strip().rstrip('.').strip()
        if caption and caption[-1] not in '.!?':
            caption += '.'

        # Get filename
        base_name = os.path.splitext(os.path.basename(image))[0]
        file_name = f"{base_name}_caption.txt"

        # Save caption
        with open(file_name, "w", encoding="utf-8") as f:
            f.write(caption)

        return caption, f"Download caption as {file_name} (check Downloads folder)", file_name

    except Exception as e:
        return f"Error: {str(e)}", "", ""
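
The cleanup step in the function above only normalizes trailing punctuation. The following is a minimal sketch of a helper that also drops immediately repeated words, which beam-search captions occasionally contain. `tidy_caption` is a hypothetical name and this is a swapped-in heuristic, not part of the notebook.

```python
def tidy_caption(text):
    """Collapse whitespace, drop immediately repeated words, end with one period."""
    deduped = []
    for word in text.split():
        # Skip a word that repeats the previous one (case-insensitive).
        if deduped and deduped[-1].lower() == word.lower():
            continue
        deduped.append(word)
    cleaned = " ".join(deduped).rstrip(" .")
    # Keep an existing '!' or '?', otherwise close the caption with a period.
    if cleaned and cleaned[-1] not in "!?":
        cleaned += "."
    return cleaned
```

The heuristic is deliberately conservative: it only removes a word that directly follows itself, so legitimate repetitions elsewhere in the caption are left alone.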


### Prompt 1.3: Multi-Tool iApp — User-Selected Intelligence

Build a unified Gradio app that lets me upload an image and choose what kind of analysis I want:

  **Input Layer:**
  - Upload image (JPG/PNG).

  - Dropdown: “Analysis Type” with options:

      • “Generate Caption (BLIP)”

      • “Scan Barcode/QR Code (pyzbar)”

      • “Detect Faces & Sentiment (DeepFace)”

      • “Extract Text from Image (OCR)”

    - Button: “Analyze”

  **Processing Layer:**
  - Based on my selection, run only the relevant tool:

      • BLIP → caption

      • pyzbar → decode QR/barcode → return URL or text

      • DeepFace → detect faces, draw green boxes, analyze emotion

      • OCR → extract all readable text from the image (using easyocr or pytesseract)

  - Quietly install transformers, pyzbar, deepface, opencv-python, and easyocr if missing.

  **Output Layer:**
  - Show results in a unified output area:

      • For BLIP: caption text + download button

      • For pyzbar: “Decoded: [URL]” or “No barcode found”

      • For DeepFace: annotated image + table of [Face Label, Sentiment]

      • For OCR: extracted text in a scrollable textbox + download button

  - Always show the original image preview.

This is our first true “iApp” — one interface, multiple intelligences, user in control.

Give me one complete, runnable Gradio app in a single code cell as far as feasible.
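
The "run only the relevant tool" requirement is a dispatch problem, and one way to express it is a dictionary that maps each dropdown label to a handler. This is a minimal sketch under stated assumptions: `run_analysis` is a hypothetical name, and the two handlers are placeholders standing in for the real BLIP and pyzbar calls.

```python
def run_analysis(analysis_type, image_path, handlers):
    """Look up the handler for the selected analysis type and run only that one."""
    handler = handlers.get(analysis_type)
    if handler is None:
        return f"Unknown analysis type: {analysis_type}"
    return handler(image_path)


# Records which placeholder handlers actually ran.
calls = []


def fake_caption(path):
    """Placeholder for the BLIP captioning call."""
    calls.append("caption")
    return f"caption for {path}"


def fake_barcode(path):
    """Placeholder for the pyzbar decoding call."""
    calls.append("barcode")
    return "No barcode found"


HANDLERS = {
    "Generate Caption (BLIP)": fake_caption,
    "Scan Barcode/QR Code (pyzbar)": fake_barcode,
}

result = run_analysis("Scan Barcode/QR Code (pyzbar)", "photo.png", HANDLERS)
```

Compared with an `if`/`elif` chain, adding a tool means adding one dictionary entry, and the dropdown choices can be generated from `HANDLERS.keys()` so the labels cannot drift apart.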


In [None]:
# --- 1. Quietly install all required packages (must run before the imports) ---
print("Installing/upgrading required packages...")
# pyzbar wraps the zbar shared library, which is a system package on Linux
!apt-get -qq install -y libzbar0
# Ensure packages are up-to-date
!pip install -q --upgrade gradio transformers torch pyzbar opencv-python deepface easyocr
print("Installation complete.")

import gradio as gr
import os
import cv2
import numpy as np
from PIL import Image
import torch
import warnings

# --- 2. Import libraries after installation ---
from transformers import BlipProcessor, BlipForConditionalGeneration
from pyzbar.pyzbar import decode
from deepface import DeepFace
import easyocr

# Suppress some common warnings from the libraries
warnings.filterwarnings("ignore")

# --- 3. Initialize models and readers globally to avoid re-loading on each call ---
blip_processor = None
blip_model = None
ocr_reader = None

def load_blip_model():
    """Loads the BLIP model on demand."""
    global blip_processor, blip_model
    if blip_processor is None:
        print("Loading BLIP model...")
        blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
        blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    return blip_processor, blip_model

def load_ocr_reader():
    """Loads the EasyOCR reader on demand."""
    global ocr_reader
    if ocr_reader is None:
        print("Loading EasyOCR reader...")
        ocr_reader = easyocr.Reader(['en'], gpu=torch.cuda.is_available())
    return ocr_reader

# --- 4. Analysis Functions ---

def generate_caption(image_path):
    """Generates a caption using the BLIP model."""
    processor, model = load_blip_model()
    raw_image = Image.open(image_path).convert('RGB')

    inputs = processor(images=raw_image, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs)

    caption = processor.decode(out[0], skip_special_tokens=True)
    return caption, "caption.txt"

def scan_barcode(image_path):
    """Scans for barcodes and QR codes."""
    try:
        image = cv2.imread(image_path)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        decoded_objects = decode(gray)
        if decoded_objects:
            result = decoded_objects[0].data.decode("utf-8")
            # The result is shown in a gr.HTML component, so use HTML tags rather than Markdown
            return f"Decoded: <b>{result}</b>", None
        else:
            return "No barcode or QR code found.", None
    except Exception as e:
        return f"Error scanning barcode: {str(e)}", None

def detect_faces_and_sentiment(image_path):
    """
    Detects faces and their dominant emotion using DeepFace.

    Face coordinates are read by key (x, y, w, h) from each result's 'region'
    dict, because unpacking region.values() fails when the dict holds more
    than those four entries.
    """
    try:
        image = cv2.imread(image_path)

        # Analyze the image for faces and sentiment
        demographies = DeepFace.analyze(img_path=image_path, actions=['emotion'], enforce_detection=False)

        # Draw green bounding boxes around detected faces
        annotated_image = image.copy()
        for i, demography in enumerate(demographies):
            # Read the bounding box by key; 'region' may contain extra entries
            region = demography['region']
            x = region['x']
            y = region['y']
            w = region['w']
            h = region['h']

            cv2.rectangle(annotated_image, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(annotated_image, f"Face {i+1}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

        # Create a table of results
        results_html = "<h4>DeepFace Analysis Results:</h4>"
        if not demographies:
            results_html += "<p>No faces were detected in the image.</p>"
        else:
            results_html += "<table>"
            results_html += "<tr><th>Face Label</th><th>Dominant Sentiment</th></tr>"
            for i, demography in enumerate(demographies):
                emotion = demography['dominant_emotion']
                results_html += f"<tr><td>Face {i+1}</td><td><b>{emotion.capitalize()}</b></td></tr>"
            results_html += "</table>"

        # Save the annotated image to a temporary file for display
        annotated_image_path = "temp_annotated_image.png"
        cv2.imwrite(annotated_image_path, annotated_image)

        return results_html, annotated_image_path, None

    except Exception as e:
        return f"Error detecting faces: {str(e)}", None, None

def extract_text(image_path):
    """Extracts text from an image using EasyOCR."""
    reader = load_ocr_reader()
    try:
        result = reader.readtext(image_path, detail=0)
        extracted_text = "\n".join(result)

        return extracted_text, "extracted_text.txt"
    except Exception as e:
        return f"Error extracting text: {str(e)}", None

# --- 5. Unified Gradio Interface Function ---

def unified_analysis(image_file, analysis_type):
    """Main function that dispatches to the correct tool and formats the output."""
    if image_file is None:
        return ("Please upload an image.", None, None, gr.DownloadButton(visible=False), gr.DownloadButton(visible=False))

    image_path = image_file

    output_message = ""
    annotated_image_path = None
    download_filename = None

    if analysis_type == "Generate Caption (BLIP)":
        caption, download_filename = generate_caption(image_path)
        output_message = f"<b>Generated Caption:</b><br>{caption}"
        with open(download_filename, "w") as f:
            f.write(caption)

    elif analysis_type == "Scan Barcode/QR Code (pyzbar)":
        output_message, _ = scan_barcode(image_path)

    elif analysis_type == "Detect Faces & Sentiment (DeepFace)":
        output_message, annotated_image_path, _ = detect_faces_and_sentiment(image_path)

    elif analysis_type == "Extract Text from Image (OCR)":
        ocr_text, download_filename = extract_text(image_path)
        # Use an HTML textarea for a scrollable text box output
        output_message = f"<textarea readonly style='width:100%; height:200px; padding:10px; border:1px solid #ccc;'>{ocr_text}</textarea>"
        if download_filename:
            with open(download_filename, "w", encoding="utf-8") as f:
                f.write(ocr_text)

    # Dynamic output components based on analysis type
    download_cap_btn = gr.DownloadButton("Download Caption (.txt)", visible=(analysis_type == "Generate Caption (BLIP)"), value=download_filename if analysis_type == "Generate Caption (BLIP)" else None)
    download_txt_btn = gr.DownloadButton("Download Text (.txt)", visible=(analysis_type == "Extract Text from Image (OCR)"), value=download_filename if analysis_type == "Extract Text from Image (OCR)" else None)

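    # The preview components start hidden, so reveal them when they have content
    return (
        output_message,
        gr.update(value=image_path, visible=True),
        gr.update(value=annotated_image_path, visible=annotated_image_path is not None),
        download_cap_btn,
        download_txt_btn,
    )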

# --- 6. Create the Gradio interface ---

def clear_all():
    """Function to reset all inputs and outputs."""
    return (
        None, # image_input
        "Generate Caption (BLIP)", # analysis_type_dropdown
        "Click 'Analyze' to see results.", # output_area
        None, # image_preview_original
        None, # image_preview_annotated
        gr.DownloadButton(visible=False), # download_caption_button
        gr.DownloadButton(visible=False) # download_text_button
    )

with gr.Blocks(title="Multi-Tool iApp") as demo:
    gr.Markdown("# Image Analysis iApp: User-Selected Intelligence")
    gr.Markdown("Upload an image and select the type of analysis to perform. Click **Clear all** to reset the app.")

    with gr.Row():
        with gr.Column(scale=1):
            image_input = gr.Image(type="filepath", label="Upload an Image (JPG, PNG)")
            analysis_type_dropdown = gr.Dropdown(
                choices=["Generate Caption (BLIP)", "Scan Barcode/QR Code (pyzbar)", "Detect Faces & Sentiment (DeepFace)", "Extract Text from Image (OCR)"],
                value="Generate Caption (BLIP)",
                label="Analysis Type"
            )
            with gr.Row():
                analyze_button = gr.Button("Analyze", variant="primary")
                clear_button = gr.Button("Clear all", variant="secondary")

        with gr.Column(scale=2):
            image_preview_original = gr.Image(label="Original Image Preview", visible=False)
            image_preview_annotated = gr.Image(label="Annotated Image (DeepFace Output)", visible=False)

            output_area = gr.HTML(label="Analysis Results", value="Click 'Analyze' to see results.")

            with gr.Row():
                download_caption_button = gr.DownloadButton("Download Caption (.txt)", visible=False)
                download_text_button = gr.DownloadButton("Download Text (.txt)", visible=False)

    # Define the click event handler for Analyze
    analyze_button.click(
        fn=unified_analysis,
        inputs=[image_input, analysis_type_dropdown],
        outputs=[output_area, image_preview_original, image_preview_annotated, download_caption_button, download_text_button]
    )

    # Define the click event handler for Clear all
    clear_button.click(
        fn=clear_all,
        inputs=[],
        outputs=[
            image_input,
            analysis_type_dropdown,
            output_area,
            image_preview_original,
            image_preview_annotated,
            download_caption_button,
            download_text_button
        ]
    )

    gr.Markdown("---")
    gr.Markdown("This iApp is powered by: **BLIP**, **pyzbar**, **DeepFace**, and **EasyOCR**.")

# Launch the Gradio app
demo.launch(share=True)

Installing/upgrading required packages...
Installation complete.
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://66b3d9a65963793543.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
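
The OCR branch of `unified_analysis` interpolates the extracted text directly into a `<textarea>` string. Text that happens to contain `<`, `&`, or a literal `</textarea>` would then break the markup. A minimal sketch of an escaped version follows; `ocr_textarea` is a hypothetical helper name, not part of the notebook.

```python
import html


def ocr_textarea(ocr_text, height_px=200):
    """Wrap OCR text in a read-only textarea, escaping HTML special characters."""
    safe_text = html.escape(ocr_text)
    return (
        f"<textarea readonly style='width:100%; height:{height_px}px;'>"
        f"{safe_text}</textarea>"
    )


# OCR output that would otherwise close the textarea early.
snippet = ocr_textarea("Total < 5 & </textarea><b>oops</b>")
```

The browser decodes the entities when rendering, so the user still sees the original characters inside the box.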




### Prompt 1.4: AI-Directed Intelligence

Build a Gradio app that removes the user’s tool-selection dropdown.

Instead, it uses a Gemini API key from Google AI Studio and the low-cost 'Gemini 2.5 Flash' model to automatically decide what to do with the uploaded image.

The app must ask the user for their Gemini API key and accept it.


**Input Layer:**

  - Upload image (JPG/PNG)

  - Text input field labeled “Google AI API Key” (with placeholder: “Paste your key from Google AI Studio”)

  - Single button: “Analyze with AI”

**Processing Layer:**

  - When the user clicks “Analyze with AI”, the app first checks if an API key was entered.

  - If no key is provided, it shows: “Please enter your Google AI API key.”

  - If a key is provided, it configures the Gemini API with that key and sends the image with this instruction:

    "Analyze this image. If it contains a QR/barcode, decode it.
      If it has human faces, describe their emotions.
      If it’s a document, extract the text.
      Otherwise, generate a detailed caption."

  - Check for and quietly install google-generativeai if missing.

**Output Layer:**

  - Show the full AI response in a scrollable text box.

  - Display the original image preview.

  - Add a note: "AI automatically chose the best analysis method for your image"

  - Never store or log the API key — use it only for the current request.


This is the frontier: AI as autonomous agent — no menus, no routing, just intelligence — powered by your own API key.

Give me one complete, runnable Gradio app in a single code cell as far as feasible.
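
The control flow described in the Processing Layer (check the key, then send the image with the routing instruction) can be sketched without any network call. This is a minimal sketch under stated assumptions: `analyze_with_ai` and `fake_send` are hypothetical names, and the actual Gemini request is injected as a callable so the validation logic can be read and tested on its own.

```python
ROUTING_PROMPT = (
    "Analyze this image. If it contains a QR/barcode, decode it. "
    "If it has human faces, describe their emotions. "
    "If it's a document, extract the text. "
    "Otherwise, generate a detailed caption."
)


def analyze_with_ai(api_key, image_path, send_request):
    """Validate inputs, then delegate to send_request(api_key, prompt, image_path).

    The key is only passed through as an argument; this function never prints it,
    writes it to disk, or keeps it in a module-level variable.
    """
    if not api_key or not api_key.strip():
        return "Please enter your Google AI API key."
    if image_path is None:
        return "Please upload an image."
    return send_request(api_key.strip(), ROUTING_PROMPT, image_path)


def fake_send(api_key, prompt, image_path):
    """Stand-in for the real Gemini call, used only to show the control flow."""
    return f"analyzed {image_path}"
```

Keeping the key as a function argument, rather than a global or an environment variable, is one simple way to honor the "use it only for the current request" requirement.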

In [None]:
# --- 1. Quietly install required packages (must run before the imports) ---
print("Installing/upgrading required packages...")
# `from google import genai` is provided by the google-genai package,
# not by google-generativeai
!pip install -q --upgrade google-genai gradio Pillow
print("Installation complete.")

import gradio as gr
import os
from PIL import Image
import warnings

# --- 2. Import Gemini library after installation ---
try:
    from google import genai
    from google.genai.errors import APIError
except ImportError:
    print("Error: The 'google-genai' package could not be imported. Please ensure the installation command ran successfully.")

# Suppress warnings
warnings.filterwarnings("ignore")

# --- 3. Core Gemini Analysis Function ---

def ai_directed_analysis(api_key, image_file):
    """
    Sends the image and a routing prompt to the Gemini API for autonomous analysis
    using the cost-efficient gemini-2.5-flash model.
    """

    # 3.1. Input Validation
    if not api_key:
        return "❌ Please enter your Google AI API Key.", None
    if image_file is None:
        return "⚠️ Please upload an image file.", None

    try:
        # 3.2. Configure the Gemini Client (Key is used dynamically)
        client = genai.Client(api_key=api_key)

        # 3.3. Prepare the Image
        img = Image.open(image_file)

        # 3.4. Define the AI Agent Prompt
        prompt = (
            "Analyze this image carefully. Your goal is to determine the most relevant analysis "
            "based on the image content and provide a detailed output:\n\n"
            "1. **If the image contains a QR code or barcode:** Decode the content and present it clearly as 'DECODED CODE: [Content]'.\n"
            "2. **If the image is a document (invoice, letter, sign, etc.):** Extract all readable text and present it as 'EXTRACTED TEXT:\n[Full Text]'.\n"
            "3. **If the image primarily features human faces:** Describe the dominant emotion and give a brief context for each face found.\n"
            "4. **Otherwise (scenery, objects, abstract art, etc.):** Generate a detailed, natural-language caption describing the entire scene.\n\n"
            "Begin your response immediately with the result, clearly indicating which of the four analysis types you performed."
        )

        # 3.5. Execute the Vision API Call
        # gemini-2.5-flash is the low-cost model requested for this app
        model_response = client.models.generate_content(
            model='gemini-2.5-flash',
            contents=[prompt, img]
        )

        # 3.6. Format the Output
        ai_response = model_response.text
        full_output = f"✅ Gemini Analysis Complete (Model: gemini-2.5-flash):\n\n---\n\n{ai_response}"

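        # The preview starts hidden, so make it visible along with the image
        return full_output, gr.update(value=image_file, visible=True)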

    except APIError as e:
        # Catch API-specific errors (invalid key, quota exceeded, etc.)
        return f"❌ API Error: Check your key, model name, or quota. Details: {e}", None
    except Exception as e:
        return f"❌ An unexpected error occurred: {e}", None

# --- 4. Create the Gradio Interface ---

with gr.Blocks(title="AI-Directed Intelligence with Gemini") as demo:
    gr.Markdown("# AI-Directed Intelligence: Gemini Chooses the Tool 🧠")
    gr.Markdown("Upload an image and provide your Gemini API key. The AI (using the cost-efficient **Gemini 2.5 Flash** model) will autonomously decide the best analysis method.")

    with gr.Row():
        with gr.Column(scale=1):
            # Input Layer
            api_key_input = gr.Textbox(
                label="Google AI API Key",
                type="password", # Hides the key for security
                placeholder="Paste your key from Google AI Studio",
                info="Your key is used only for the current request and is not stored."
            )
            image_input = gr.Image(type="filepath", label="Upload an Image (JPG, PNG)")

            analyze_button = gr.Button("Analyze with AI", variant="primary")

        with gr.Column(scale=2):
            # Output Layer
            image_preview = gr.Image(label="Original Image Preview", visible=False)

            output_text_box = gr.Textbox(
                label="Gemini AI Response (Scrollable)",
                lines=15,
                max_lines=25,
                show_copy_button=True
            )

            gr.Markdown(
                "--- \n"
                "**Note:** AI automatically chose the best analysis method for your image "
                "using the **gemini-2.5-flash** model."
            )

    # 5. Define the click event handler
    analyze_button.click(
        fn=ai_directed_analysis,
        inputs=[api_key_input, image_input],
        outputs=[output_text_box, image_preview]
    )

# Launch the Gradio app
demo.launch(share=True)

Installing/upgrading required packages...
Installation complete.
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://c6b55ad9d42d7f7caa.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


