# 🌸 Plant Identification with Multimodal AI

Welcome to Exercise #5 in the Generative AI course! In this notebook, you'll learn how to use multimodal AI to identify plants from images and create annotated visualizations.

---

## 📚 What You'll Learn

By the end of this tutorial, you'll be able to:

1. **Use multimodal AI** - Work with models that understand both text and images
2. **Identify plants from photos** - Leverage vision models for species recognition
3. **Annotate images programmatically** - Add text overlays to create informative visuals
4. **Build practical AI applications** - Combine multiple techniques into a complete workflow

---

## 💼 Business Value: Why Multimodal AI Matters

Multimodal AI has transformative applications across industries:

- **🌾 Agriculture**: Automated crop disease detection and species monitoring
- **🌿 Biodiversity Research**: Large-scale plant surveys and conservation tracking
- **🏪 Retail**: Plant identification apps for garden centers and nurseries
- **📱 Consumer Apps**: Educational tools for hikers, gardeners, and nature enthusiasts
- **🔬 Scientific Research**: Accelerating botanical research and documentation

Traditional computer vision typically required thousands of labeled images and custom model training. A modern multimodal model can often identify a common plant from a single API call, with no training step on your side.

---

Let's get started! 🚀

---

# 🧠 Theory: Understanding Multimodal AI

## What is Multimodal AI?

**Multimodal AI** refers to artificial intelligence systems that can process and understand multiple types of data simultaneously - such as text, images, audio, and video. Unlike traditional AI models that work with only one type of input, multimodal models can "see" images and "read" text, enabling them to understand context in ways that mirror human perception.

Vision-capable language models like GPT-4o and GPT-5-nano combine two powerful capabilities:
1. **Computer Vision**: Understanding what's in an image (objects, scenes, text, colors, spatial relationships)
2. **Natural Language Processing**: Describing findings in human language and following text instructions

This combination is revolutionary because it allows us to ask questions about images in plain English and receive detailed, contextual answers.

## How Do Vision Models Work?

At a high level, vision-language models work through these stages:

1. **Image Encoding**: The image is processed through neural networks that identify visual features (edges, shapes, textures, objects)
2. **Feature Extraction**: Important visual elements are converted into numerical representations that capture their meaning
3. **Multimodal Fusion**: Visual features are combined with text input (your prompt) in a shared representation space
4. **Language Generation**: The model generates a text response based on both the visual and textual understanding

Modern vision models are trained on very large collections of image-text pairs, learning associations between visual concepts and language descriptions. This training lets them recognize objects, scenes, and even abstract concepts that were never explicitly labeled for them.

## Real-World Use Cases Beyond Plant Identification

- **🏥 Healthcare**: Medical image analysis, X-ray interpretation, skin condition diagnosis
- **🚗 Automotive**: Autonomous vehicle perception, road sign recognition, obstacle detection
- **🏪 Retail**: Visual search, product recognition, automated inventory management
- **♿ Accessibility**: Describing images for visually impaired users, reading text in photos
- **🔍 Content Moderation**: Detecting inappropriate images, verifying content authenticity
- **🏗️ Manufacturing**: Quality control, defect detection, assembly verification

---

### 🎯 Key Takeaways

- ✅ **Multimodal AI processes multiple data types** (text + images) simultaneously
- ✅ **Vision models understand images** through neural networks trained on millions of examples
- ✅ **Natural language prompts control behavior** - you can ask questions about images in plain English
- ✅ **No custom training required** - pre-trained models work out-of-the-box for many tasks
- ✅ **Applications span industries** from healthcare to agriculture to accessibility

---

### 💡 Key Point: Prompt Engineering for Vision Tasks

When working with vision models, **prompt precision matters**. Vague prompts like "What is this?" may yield verbose or unfocused responses. Specific prompts like "Identify the type of flower in this image. Provide only the common name." produce concise, actionable results. We'll explore this principle hands-on in the identification section.

---

# 🔧 Setup

Before we can identify plants, we need to:
1. Configure our OpenAI API key for authentication
2. Install required Python packages for image processing and API communication
3. Import all necessary libraries

Let's start with API configuration!

## 🔑 API Key Configuration

You have two methods to provide your API key:

**Method 1 (Recommended)**: Use Colab Secrets
1. Click the 🔑 icon in the left sidebar
2. Click "Add new secret"
3. Name: `OPENAI_API_KEY`
4. Value: Your OpenAI API key
5. Enable notebook access

**Method 2 (Fallback)**: Manual input when prompted

Run the cell below to configure authentication:

In [None]:
import os

# Configure OpenAI API key
# Method 1: Try to get API key from Colab secrets (recommended)
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    print("✅ API key loaded from Colab secrets")
except Exception:  # Not running in Colab, or the secret is missing or not shared
    # Method 2: Manual input (fallback)
    from getpass import getpass
    print("💡 To use Colab secrets: Go to 🔑 (left sidebar) → Add new secret → Name: OPENAI_API_KEY")
    OPENAI_API_KEY = getpass("Enter your OpenAI API Key: ")

# Validate the key first: os.environ only accepts strings,
# so a missing (None) key would fail with a confusing TypeError
if not OPENAI_API_KEY or OPENAI_API_KEY.strip() == "":
    raise ValueError("❌ ERROR: No API key provided!")

# Set the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

print("✅ Authentication configured!")

# Configure which OpenAI model to use
OPENAI_MODEL = "gpt-5-nano"  # Using gpt-5-nano for cost efficiency
print(f"🤖 Selected Model: {OPENAI_MODEL}")

## 📦 Install Dependencies

We need several Python packages:
- **openai**: Official OpenAI Python client for API access
- **opencv-python**: Image processing library for loading, manipulating, and annotating images
- **matplotlib**: Visualization library for displaying images in the notebook
- **numpy**: Numerical computing library for array operations
- **requests**: HTTP library for downloading images from URLs
- **pillow**: Additional image processing utilities

In [None]:
# Install required packages
!pip install -q openai opencv-python matplotlib requests numpy pillow

# Suppress deprecation warnings for cleaner output
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

print("✅ All dependencies installed!")

## 📚 Import Required Libraries

Now let's import all the libraries we'll use throughout the notebook:

In [None]:
# OpenAI API client for vision and language models
from openai import OpenAI

# Image processing with OpenCV
import cv2

# Visualization with Matplotlib
import matplotlib.pyplot as plt

# Numerical operations with NumPy
import numpy as np

# HTTP requests for downloading images
import requests

# System and OS utilities
import os

# Initialize the OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

print("✅ All libraries imported successfully!")
print("🎉 Setup complete - ready to identify plants!")

---

# 📸 Image Loading and Display

Before we can identify plants, we need to load our flower images and display them. We'll work with two flower images hosted online.

In this section, we'll:
1. Define the URLs for our flower images
2. Create a reusable function to download and display images
3. View both flowers in their original form

Let's start by defining our image sources!

## 🔗 Define Image URLs

We'll use two flower images from Wikimedia Commons:

In [None]:
# URLs for the flower images we'll identify
flower_url_1 = "https://upload.wikimedia.org/wikipedia/commons/6/6f/Path_krupina.jpg"
flower_url_2 = "https://upload.wikimedia.org/wikipedia/commons/2/26/Path_krupina_2.jpg"

print("✅ Image URLs configured:")
print(f"   Flower 1: {flower_url_1}")
print(f"   Flower 2: {flower_url_2}")

## 🖼️ Create Image Display Function

Let's create a reusable function that downloads an image from a URL and displays it in the notebook. This function will handle:
- Downloading the image using proper HTTP headers
- Decoding the image data with OpenCV
- Converting from BGR (OpenCV default) to RGB (display format)
- Rendering the image with Matplotlib
- Error handling for network or image issues

In [None]:
def display_image_from_url(url):
    """
    Download and display an image from a URL.
    
    Args:
        url (str): The URL of the image to display
    
    Returns:
        numpy.ndarray: The image array in RGB format, or None if error occurs
    """
    try:
        # Download the image from URL with proper headers
        # User-Agent header helps avoid blocking from some servers
        headers = {'User-Agent': 'Mozilla/5.0'}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # Raise exception for HTTP errors
        
        # Convert the response content to a numpy array
        image_array = np.asarray(bytearray(response.content), dtype=np.uint8)
        
        # Decode the image array into OpenCV format (BGR color space)
        image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        
        if image is None:
            print(f"❌ Failed to decode image from {url}")
            return None
        
        # Convert from BGR (OpenCV default) to RGB (standard display format)
        # OpenCV reads images in BGR, but matplotlib expects RGB
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
        # Display the image using matplotlib
        plt.figure(figsize=(10, 8))
        plt.imshow(image_rgb)
        plt.axis('off')  # Hide axis labels for cleaner display
        plt.tight_layout()
        plt.show()
        
        return image_rgb
        
    except requests.exceptions.RequestException as e:
        print(f"❌ Network error downloading image: {e}")
        return None
    except Exception as e:
        print(f"❌ Error processing image: {e}")
        return None

print("✅ Image display function created!")

## 🌸 Display Flower Image 1

Let's load and view our first flower image:

In [None]:
print("📸 Flower Image 1:")
print(f"   Loading from: {flower_url_1}")
print()

image_1 = display_image_from_url(flower_url_1)

if image_1 is not None:
    print(f"✅ Image loaded successfully! Dimensions: {image_1.shape[1]}x{image_1.shape[0]} pixels")

## 🌸 Display Flower Image 2

Now let's view our second flower image:

In [None]:
print("📸 Flower Image 2:")
print(f"   Loading from: {flower_url_2}")
print()

image_2 = display_image_from_url(flower_url_2)

if image_2 is not None:
    print(f"✅ Image loaded successfully! Dimensions: {image_2.shape[1]}x{image_2.shape[0]} pixels")

---

# 🔍 Plant Identification with Vision AI

Now comes the exciting part - using multimodal AI to identify the flowers! We'll send each image to the gpt-5-nano vision model along with a text prompt asking for identification.

## How Vision API Calls Work

When working with vision-capable models, we structure our API request with:
1. **Model specification**: `gpt-5-nano` (cost-efficient vision model)
2. **Multimodal input**: A combination of text prompt and image URL
3. **Text configuration**: Optional verbosity settings to control response length

The model analyzes the image and generates a text response based on what it "sees" and our instructions.
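To make the request structure concrete, here is a minimal sketch that only builds the multimodal `input` payload and makes no API call. The helper name `build_vision_input` and the example URL are hypothetical; the `input_text` and `input_image` item types are the ones used by the identification function later in this notebook.

```python
def build_vision_input(prompt, image_url):
    """Build the `input` list for one user turn (hypothetical helper)."""
    return [
        {
            "role": "user",
            "content": [
                # Text part: the instruction for the model
                {"type": "input_text", "text": prompt},
                # Image part: the picture the instruction refers to
                {"type": "input_image", "image_url": image_url},
            ],
        }
    ]


payload = build_vision_input(
    "Only provide a common name of the flower, no other words.",
    "https://example.com/flower.jpg",  # placeholder URL, not a real image
)
print([part["type"] for part in payload[0]["content"]])
```

Keeping payload construction in its own function makes it easy to inspect or unit-test the request without spending API credits.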

---

### 💡 Key Point: Prompt Precision

**Why specific prompts matter**: Vision models can provide detailed descriptions, scientific names, growing conditions, and more. If we ask "What is this?", we might get a paragraph of information. For our use case (annotating images), we need just the common name.

**Our strategy**: We use the prompt "Please identify the type of flower in this image. Only provide a common name of the flower, no other words." This instructs the model to return a single, concise piece of information that is well suited to image annotation.

Compare these prompts:
- ❌ Vague: "What is this?" → Response: "This appears to be a flowering plant, likely from the..."
- ✅ Specific: "Only provide a common name of the flower, no other words." → Response: "Daisy"

---

### 🎯 Key Takeaways

- ✅ **Vision models understand images** and can identify objects, plants, and scenes
- ✅ **Prompt engineering controls output format** - be specific about what you want
- ✅ **Input combines text and images** using a structured format
- ✅ **Response extraction is simple** - use `response.output_text` directly
- ✅ **Error handling is essential** - network issues and API errors can occur

## 🤖 Create Identification Function

Let's create a function that takes an image URL and returns the identified flower name:

In [None]:
def identify_flower(url_to_flower):
    """
    Identify a flower in an image using OpenAI's vision model.
    
    Args:
        url_to_flower (str): URL of the flower image to identify
    
    Returns:
        str: The common name of the identified flower, or None if error occurs
    """
    try:
        # Create a multimodal API request with both text and image
        # The input parameter accepts a structured format for combining modalities
        response = client.responses.create(
            model=OPENAI_MODEL,
            input=[
                {
                    "role": "user",
                    "content": [
                        {
                            # Text prompt - gives instructions to the model
                            "type": "input_text",
                            "text": "Please identify the type of flower in this image. Only provide a common name of the flower, no other words."
                        },
                        {
                            # Image input - the flower photo to analyze
                            "type": "input_image",
                            "image_url": url_to_flower
                        }
                    ]
                }
            ],
            text={"verbosity": "low"}  # Low verbosity for concise responses
        )
        
        # Extract the identification result from the response
        # No complex parsing needed - just access the output_text attribute
        flower_name = response.output_text.strip()
        
        return flower_name
        
    except Exception as e:
        # Handle any errors (API errors, network issues, etc.)
        print(f"❌ Error identifying flower: {e}")
        return None

print("✅ Flower identification function created!")

## 🌸 Identify Flower 1

Let's identify the first flower using our vision model:

In [None]:
print("🔍 Identifying Flower 1...")
print(f"   Sending to {OPENAI_MODEL} vision model...")
print()

flower_name_1 = identify_flower(flower_url_1)

if flower_name_1:
    print(f"🌸 Identified: {flower_name_1}")
else:
    print("❌ Failed to identify flower")

## 🌸 Identify Flower 2

Now let's identify the second flower:

In [None]:
print("🔍 Identifying Flower 2...")
print(f"   Sending to {OPENAI_MODEL} vision model...")
print()

flower_name_2 = identify_flower(flower_url_2)

if flower_name_2:
    print(f"🌸 Identified: {flower_name_2}")
else:
    print("❌ Failed to identify flower")

---

# ✏️ Image Annotation

Now that we've identified both flowers, let's create beautiful annotated images by overlaying the plant names on the original photos. This makes the results visually clear and shareable.

## How Image Annotation Works

To add text to an image, we need to:
1. **Calculate appropriate font size** based on image dimensions (larger images need larger text)
2. **Create a background rectangle** for the text to ensure readability over any color
3. **Position the text** at a visually appealing location (typically bottom of the image)
4. **Render the text** with high contrast (white text on black background)

OpenCV provides all the tools we need for this image manipulation.
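The positioning arithmetic can be sketched on its own, without any image library. The helper below is hypothetical and assumes the text size and baseline have already been measured (in practice by `cv2.getTextSize`). Clamping the rectangle to the image bounds is an added assumption of this sketch.

```python
def label_layout(width, height, text_width, text_height, baseline,
                 padding=15, bottom_margin=40):
    """Return the text origin and background rectangle for a bottom-centered label."""
    text_x = (width - text_width) // 2   # center horizontally
    text_y = height - bottom_margin      # baseline sits near the bottom edge
    rect = (
        max(text_x - padding, 0),                       # left
        max(text_y - text_height - padding, 0),         # top
        min(text_x + text_width + padding, width - 1),  # right
        min(text_y + baseline + padding, height - 1),   # bottom
    )
    return (text_x, text_y), rect


origin, rect = label_layout(width=1000, height=800,
                            text_width=200, text_height=30, baseline=10)
print(origin, rect)  # (400, 760) (385, 715, 615, 785)
```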

Let's create a function to handle annotation!

## 🎨 Create Annotation Function

This function downloads an image, adds a text overlay with background, and displays the result:

In [None]:
def display_image_with_text(url, text):
    """
    Download an image, add text annotation with background, and display it.
    
    Args:
        url (str): URL of the image to annotate
        text (str): Text to overlay on the image
    
    Returns:
        numpy.ndarray: The annotated image in RGB format, or None if error occurs
    """
    try:
        # Download the image from URL
        headers = {'User-Agent': 'Mozilla/5.0'}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        
        # Decode the image into OpenCV format
        image_array = np.asarray(bytearray(response.content), dtype=np.uint8)
        image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        
        if image is None:
            print(f"❌ Failed to decode image from {url}")
            return None
        
        # Get image dimensions
        height, width = image.shape[:2]
        
        # Calculate font size based on image dimensions
        # Larger images need proportionally larger text for readability
        font_scale = min(width, height) / 500  # Scale factor based on smaller dimension
        font_scale = max(font_scale, 0.5)  # Minimum font size
        
        # Set font properties
        font = cv2.FONT_HERSHEY_SIMPLEX  # Clean, readable font
        thickness = max(int(font_scale * 2), 1)  # Thickness scales with font size
        
        # Calculate text size to determine background rectangle dimensions
        (text_width, text_height), baseline = cv2.getTextSize(
            text, font, font_scale, thickness
        )
        
        # Calculate text position (centered horizontally, near bottom)
        text_x = (width - text_width) // 2  # Center horizontally
        text_y = height - 40  # 40 pixels from bottom
        
        # Define background rectangle coordinates with padding
        padding = 15  # Pixels of padding around text
        rect_x1 = text_x - padding
        rect_y1 = text_y - text_height - padding
        rect_x2 = text_x + text_width + padding
        rect_y2 = text_y + baseline + padding
        
        # Draw semi-transparent black background rectangle
        # This ensures text is readable regardless of background colors
        overlay = image.copy()
        cv2.rectangle(overlay, (rect_x1, rect_y1), (rect_x2, rect_y2), 
                     (0, 0, 0), -1)  # Black filled rectangle
        
        # Blend the rectangle with the original image (0.7 opacity)
        alpha = 0.7  # Transparency factor
        cv2.addWeighted(overlay, alpha, image, 1 - alpha, 0, image)
        
        # Draw white text on top of the black background
        cv2.putText(image, text, (text_x, text_y), font, font_scale,
                   (255, 255, 255), thickness, cv2.LINE_AA)  # White text, anti-aliased
        
        # Convert from BGR to RGB for display
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
        # Display the annotated image
        plt.figure(figsize=(10, 8))
        plt.imshow(image_rgb)
        plt.axis('off')
        plt.tight_layout()
        plt.show()
        
        return image_rgb
        
    except requests.exceptions.RequestException as e:
        print(f"❌ Network error downloading image: {e}")
        return None
    except Exception as e:
        print(f"❌ Error annotating image: {e}")
        return None

print("✅ Image annotation function created!")

## 🖼️ Display Annotated Flower 1

Let's create and display the first annotated image:

In [None]:
print("📸 Final Result - Annotated Image 1:")

# Only announce and draw the label when identification succeeded
if flower_name_1:
    print(f"   Adding text: '{flower_name_1}'")
    print()
    annotated_image_1 = display_image_with_text(flower_url_1, flower_name_1)
    if annotated_image_1 is not None:
        print("\n✅ Annotated image created successfully!")
else:
    print("⚠️ Skipping annotation - no flower name available")

## 🖼️ Display Annotated Flower 2

Now let's create and display the second annotated image:

In [None]:
print("📸 Final Result - Annotated Image 2:")

# Only announce and draw the label when identification succeeded
if flower_name_2:
    print(f"   Adding text: '{flower_name_2}'")
    print()
    annotated_image_2 = display_image_with_text(flower_url_2, flower_name_2)
    if annotated_image_2 is not None:
        print("\n✅ Annotated image created successfully!")
else:
    print("⚠️ Skipping annotation - no flower name available")

---

# ✅ Validation and Results

## What You Should See

If everything worked correctly, you should now see:

1. **Two original flower images** displayed in the "Image Loading" section
2. **Two identification results** printed in the "Plant Identification" section
3. **Two annotated images** with the flower names overlaid on semi-transparent black backgrounds

## Understanding the Results

**Model behavior variation**: Vision models may provide slightly different names across runs:
- Example variations: "Daisy" vs "Common Daisy" vs "White Daisy"
- All refer to the same plant - models balance scientific accuracy with common usage
- This is normal behavior, not an error
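Because replies can vary in casing, whitespace, and trailing punctuation, it can help to tidy them before drawing them on an image. The following is a minimal sketch; `normalize_flower_name` is a hypothetical helper, not part of the notebook. It only cleans up formatting and does not try to merge different names such as "Daisy" and "Common Daisy".

```python
import re
import string


def normalize_flower_name(raw):
    """Tidy a model reply into a short label (hypothetical helper)."""
    stripped = raw.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Drop punctuation such as trailing periods, keeping apostrophes and hyphens
    cleaned = re.sub(r"[^\w\s'-]", "", first_line)
    # capwords collapses repeated whitespace and capitalizes each word
    return string.capwords(cleaned)


print(normalize_flower_name("  daisy.\n"))         # Daisy
print(normalize_flower_name("queen anne's lace"))  # Queen Anne's Lace
```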

**Verifying identification accuracy**:
- Compare the model's identification with your own observation
- Search the plant name online to see if images match
- For scientific applications, you may want to request scientific names in your prompt
- Consider asking the model for a confidence estimate (requires prompt modification); treat it as a rough, self-reported signal rather than a calibrated probability

## Troubleshooting Common Issues

If you encounter problems:

- **Images not loading**: Check your internet connection and verify URLs are accessible
- **API errors**: Verify your API key is valid and has sufficient credits
- **Incorrect identification**: Try adjusting the prompt for more specific instructions
- **Text not visible**: Ensure the annotation function completed without errors

## Summary

You've successfully built a complete multimodal AI application that:
- ✅ Downloads images from URLs
- ✅ Identifies plant species using vision AI
- ✅ Annotates images with identification results
- ✅ Handles errors gracefully throughout the pipeline

This workflow can be adapted for countless other visual recognition tasks!

---

# 💡 Best Practices for Multimodal AI

Based on what we've learned, here are key best practices for working with vision models:

## 🎯 Key Takeaways

### Prompt Engineering
- ✅ **Be specific about output format** - Request exactly what you need ("only common name" vs. "detailed description")
- ✅ **Use clear, direct language** - Avoid ambiguous phrasing like "tell me about this"
- ✅ **Specify verbosity** - Use `text={"verbosity": "low"}` for concise answers and `text={"verbosity": "high"}` for detailed analysis
- ✅ **Test and iterate** - Experiment with different prompts to find what works best for your use case

### Model Selection
- ✅ **Choose cost-effective models** - Use `gpt-5-nano` for routine tasks; it's significantly cheaper than `gpt-4o`
- ✅ **Consider your use case** - Vision models work for text-only questions too, but text-only models are faster and cheaper
- ✅ **Understand model capabilities** - Vision models excel at visual tasks but aren't magical; they can make mistakes

### Image Quality and Access
- ✅ **Use high-quality images** - Clear, well-lit photos produce better identification results
- ✅ **Ensure URL accessibility** - When you pass an image URL, the API has to fetch it, so the image must be publicly accessible; private or login-protected URLs won't work
- ✅ **Consider image size** - Very large images may be slower to process; optimize when possible
- ✅ **Test image loading separately** - Verify images download correctly before making API calls
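When an image is not publicly reachable, one common alternative is to send the bytes inline as a base64 data URL in the `image_url` field. The sketch below shows only the encoding step. `to_data_url` is a hypothetical helper, and the three bytes are a stand-in for a real file you would read yourself.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in bytes (the first three bytes of a JPEG file), not a real photo
data_url = to_data_url(b"\xff\xd8\xff")
print(data_url)  # data:image/jpeg;base64,/9j/
```

Inline images make requests larger, so resizing a photo before encoding is usually worthwhile.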

### Error Handling
- ✅ **Always use try-except blocks** - Network issues, API errors, and image problems will occur
- ✅ **Provide helpful error messages** - Tell users what went wrong and how to fix it
- ✅ **Validate inputs** - Check that API keys, URLs, and responses are valid before proceeding
- ✅ **Handle API response variations** - Models may return slightly different formats; be flexible
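As one possible input check, the sketch below rejects strings that are not plausible image URLs before any download or API call is made. The helper name and the extension allow-list are assumptions for illustration. A real application may need stricter rules, for example a host allow-list.

```python
from urllib.parse import urlparse

# Assumed allow-list of file extensions; adjust to your needs
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp", ".gif")


def looks_like_image_url(url):
    """Return True if `url` is an http(s) URL whose path ends in an image extension."""
    if not isinstance(url, str):
        return False
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(ALLOWED_EXTENSIONS)


print(looks_like_image_url("https://example.com/photos/rose.JPG"))  # True
print(looks_like_image_url("ftp://example.com/rose.jpg"))           # False
```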

### Cost Management
- ✅ **Monitor API usage** - Track costs, especially for production applications
- ✅ **Use appropriate verbosity** - Don't request detailed responses when brief ones suffice
- ✅ **Cache results when possible** - Store identification results to avoid re-analyzing the same images
- ✅ **Consider batch processing** - Loop over a list of image URLs in one pass and skip images you have already analyzed, rather than making redundant calls
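Caching can be as simple as a dictionary keyed by image URL. The sketch below wraps any identification function so that each URL is analyzed at most once. `make_cached` is a hypothetical helper, and `fake_identify` stands in for `identify_flower` so the example runs without an API call.

```python
def make_cached(identify_fn):
    """Wrap an identification function with an in-memory cache (hypothetical helper)."""
    cache = {}

    def cached(url):
        if url not in cache:
            result = identify_fn(url)
            if result is None:
                # Do not cache failures, so they can be retried later
                return None
            cache[url] = result
        return cache[url]

    return cached


calls = []


def fake_identify(url):
    """Stand-in for identify_flower; records each call instead of using the API."""
    calls.append(url)
    return "Daisy"


cached_identify = make_cached(fake_identify)
cached_identify("https://example.com/a.jpg")
cached_identify("https://example.com/a.jpg")  # served from the cache
print(len(calls))  # 1
```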

---

## ⚠️ Common Mistakes to Avoid

### 1. Not Handling Failed Image Downloads
**Problem**: Images from URLs can fail to download due to network issues, dead links, or access restrictions.

**Solution**: Always wrap image operations in try-except blocks and return None on failure.

### 2. Vague Prompts Leading to Inconsistent Results
**Problem**: Asking "What is this?" may return paragraphs of text when you only need a name.

**Solution**: Be explicit: "Identify the flower type. Provide only the common name, nothing else."

### 3. Not Validating API Responses
**Problem**: API calls can fail, return errors, or provide unexpected formats.

**Solution**: Check that responses exist and contain expected data before using them.
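A lightweight check like the following can catch replies that are missing or clearly not a short name. The helper and its word and character limits are assumptions chosen for illustration.

```python
def is_usable_label(name, max_words=4, max_chars=40):
    """Return True if `name` looks like a short, single-line label (hypothetical helper)."""
    if not isinstance(name, str):
        return False
    text = name.strip()
    if not text or "\n" in text:
        return False
    return len(text) <= max_chars and len(text.split()) <= max_words


print(is_usable_label("Common Daisy"))  # True
print(is_usable_label("This appears to be a flowering plant, likely a daisy."))  # False
```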

### 4. Ignoring Image Format Conversions
**Problem**: OpenCV uses BGR color space, but matplotlib expects RGB, causing color distortion.

**Solution**: Always convert: `image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`

### 5. Hardcoding Image Sizes and Positions
**Problem**: Text annotations sized for one image won't scale properly to images of different dimensions.

**Solution**: Calculate font sizes and positions proportionally based on image dimensions.
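The proportional sizing used by the annotation function in this notebook can be isolated as a small pure function. The name `font_settings` and its parameters are hypothetical; the arithmetic mirrors the notebook's code (smaller dimension divided by 500, with a minimum scale of 0.5).

```python
def font_settings(width, height, reference=500, min_scale=0.5):
    """Return (font_scale, thickness) scaled to the smaller image dimension."""
    font_scale = max(min(width, height) / reference, min_scale)
    thickness = max(int(font_scale * 2), 1)  # thicker strokes for larger text
    return font_scale, thickness


for size in [(200, 100), (1000, 800), (2000, 3000)]:
    print(size, font_settings(*size))
```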

---

## 🔐 Security Considerations

- **Never hardcode API keys** in production code - use environment variables or secrets management
- **Validate URLs before downloading** - protect against malicious or inappropriate content
- **Be mindful of image content** - use moderation APIs if processing user-submitted images
- **Respect rate limits** - implement backoff strategies to avoid account suspension
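A backoff strategy can be sketched as a retry loop whose wait time doubles after each failure, plus a little random jitter. The helper below is hypothetical and catches every exception for simplicity. In real code you would likely catch something narrower, such as the OpenAI client's `RateLimitError`. The `sleep` parameter exists only so the demo can run without actually waiting.

```python
import random
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay, plus random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


attempts = {"count": 0}


def flaky():
    """Simulated call that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, attempts["count"], len(delays))  # ok 3 2
```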

---

# 🚀 Next Steps and Extensions

Congratulations! You've built a complete plant identification system. Here's how you can extend what you've learned:

## 🔬 Practice Exercise Ideas

### 1. Try Different Images
Replace the URLs with your own flower images or other plants:
- Garden plants
- Houseplants
- Trees and shrubs
- Vegetables and herbs

### 2. Batch Processing
Modify the code to process a list of image URLs:
```python
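image_urls = [flower_url_1, flower_url_2]  # add more URLs as needed
for url in image_urls:
    name = identify_flower(url)
    if name:  # skip images the model could not identify
        display_image_with_text(url, name)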
```

### 3. Add More Information
Modify the prompt to request additional details:
- Scientific name
- Plant family
- Growing conditions
- Native region

Then parse the response and create multi-line annotations.
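One way to approach the parsing step, assuming you prompt the model to answer with one `Label: value` pair per line, is sketched below. The reply string is a made-up example of such output, and `parse_fields` is a hypothetical helper.

```python
def parse_fields(reply):
    """Parse 'Label: value' lines into a dict, ignoring lines that do not match."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Made-up reply in the assumed format, not real model output
reply = "Common name: Daisy\nScientific name: Bellis perennis\nFamily: Asteraceae"
fields = parse_fields(reply)
annotation_lines = [f"{key}: {value}" for key, value in fields.items()]
print(annotation_lines)
```

Since `cv2.putText` draws a single line and does not interpret newline characters, each entry in `annotation_lines` would need its own `putText` call with an increasing vertical offset.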

### 4. Save Annotated Images
Add code to save the annotated images to files:
```python
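# The annotation function returns RGB, but cv2.imwrite expects BGR
cv2.imwrite('annotated_flower.jpg', cv2.cvtColor(annotated_image_1, cv2.COLOR_RGB2BGR))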
```

### 5. Build a User Interface
Create an interactive interface where users can:
- Upload their own images
- Select what information to display
- Download annotated results

### 6. Compare Multiple Models
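Try different OpenAI vision models and compare results:
- `gpt-5-nano` (cheapest, fast)
- `gpt-4o` (larger and more expensive per token)

If a model rejects the `text={"verbosity": "low"}` setting used in `identify_flower`, remove that argument for that model.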

---

## 📚 What's Next in the Course?

In upcoming exercises, you'll learn:
- Working with video content and audio transcription
- Extracting structured data from unstructured text
- Advanced prompt engineering techniques
- Building complete AI-powered applications

---

## 🌟 Real-World Application Ideas

Consider how you might apply multimodal AI in your domain:

- **Education**: Interactive learning apps for botany students
- **Agriculture**: Crop disease identification from phone photos
- **Retail**: Customer service bots that identify products from images
- **Healthcare**: Medical image analysis assistants
- **Manufacturing**: Quality control and defect detection
- **Accessibility**: Image description services for visually impaired users

---

## 📖 Additional Resources

To deepen your understanding:

- [OpenAI Vision API Documentation](https://platform.openai.com/docs/guides/vision)
- [OpenCV Python Tutorials](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html)
- [Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
- [Best Practices for Production](https://platform.openai.com/docs/guides/production-best-practices)

---

## 🎉 Congratulations!

You've successfully completed Exercise #5 and learned how to:
- ✅ Work with multimodal AI models
- ✅ Process and display images programmatically
- ✅ Make vision API calls with proper structure
- ✅ Annotate images with dynamic text overlays
- ✅ Handle errors and edge cases gracefully
- ✅ Apply prompt engineering to vision tasks

Keep experimenting, and see you in the next exercise! 🚀