# Project - Airline AI Assistant with Gemini (Multi-Modal)

We'll bring together everything we've learned to make a multi-modal AI Customer Support assistant for an Airline using Google's Gemini.

This version includes:
- **Conversational chat** with history
- **Function calling** (tools) to get ticket prices
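- **Destination preview images** rendered with Pillow from Gemini-written descriptions, with pointers to real image generation (Imagen, DALL-E)
- **Text-to-Speech** options (discussed only; audio is not implemented in this version)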

## What Makes This "Agentic AI"?

1. **Multiple specialized capabilities** (chat, tools, image generation)
2. **Tool use** to access external information
3. **Multi-modal outputs** (text, images, audio)
4. **Memory** through conversation history
5. **Autonomous decision-making** about when to use tools

## Installation

Run this cell once to install required packages:

In [None]:
# Installation cell - run once, then restart kernel
!pip install -q -U google-genai gradio python-dotenv pillow

In [None]:
# imports

import os
import json
import base64
from io import BytesIO
from dotenv import load_dotenv
from google import genai
from PIL import Image
import gradio as gr

In [None]:
# Initialization

load_dotenv(override=True)

google_api_key = os.getenv('GOOGLE_API_KEY')
if google_api_key:
    print(f"✓ Google API Key exists and begins {google_api_key[:8]}")
    client = genai.Client(api_key=google_api_key)
else:
    print("✗ Google API Key not set; add GOOGLE_API_KEY to your .env file and re-run this cell")
    
MODEL = "gemini-2.0-flash-exp"  # Experimental Gemini 2.0 Flash; swap in a current model ID if this one is unavailable

In [None]:
system_message = "You are a helpful assistant for an Airline called FlightAI. "
system_message += "Give short, courteous answers, no more than 1 sentence. "
system_message += "Always be accurate. If you don't know the answer, say so."

## Simple Chat (No Tools Yet)

Let's start with basic conversation:

In [None]:
def chat_simple(message, history):
    """Simple chat without tools"""
    gemini_history = []
    for msg in history:
        role = "model" if msg["role"] == "assistant" else "user"
        gemini_history.append({
            "role": role,
            "parts": [{"text": msg["content"]}]
        })
    
    gemini_history.append({
        "role": "user",
        "parts": [{"text": message}]
    })
    
    response = client.models.generate_content(
        model=MODEL,
        contents=gemini_history,
        config={"system_instruction": system_message}
    )
    return response.text

gr.ChatInterface(fn=chat_simple, type="messages").launch()
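
The history conversion at the top of `chat_simple` is repeated in the later chat functions, so it could be factored out. The sketch below is one way to do that; `to_gemini_history` is a hypothetical helper name, not something from the SDK or the notebook.

```python
def to_gemini_history(history, message=None):
    """Convert Gradio "messages"-style history into Gemini `contents` entries."""
    contents = []
    for msg in history:
        # Gradio labels the model's turns "assistant"; Gemini expects "model"
        role = "model" if msg["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    if message is not None:
        # Append the new user message as the final turn
        contents.append({"role": "user", "parts": [{"text": message}]})
    return contents


example = to_gemini_history(
    [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}],
    "How much is a ticket to Paris?",
)
print([turn["role"] for turn in example])  # ['user', 'model', 'user']
```

With a helper like this, each chat function could start with a single call instead of its own loop.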

## Adding Tools - Flight Pricing

Now let's give our assistant the ability to look up ticket prices!

In [None]:
# Let's create a useful function

ticket_prices = {"london": "$799", "paris": "$899", "tokyo": "$1400", "berlin": "$499"}

def get_ticket_price(destination_city: str) -> str:
    """
    Get the price of a return ticket to the destination city.
    Call this whenever you need to know the ticket price.
    
    Args:
        destination_city: The city that the customer wants to travel to
    
    Returns:
        The price of the ticket as a string
    """
    print(f"🔧 Tool called: get_ticket_price('{destination_city}')")
    city = destination_city.lower()
    return ticket_prices.get(city, "Unknown")

In [None]:
# Test it
get_ticket_price("London")

## Gemini Function Calling

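Gemini's function calling is convenient: pass the Python function itself as a tool, and the SDK builds the tool declaration from its signature and docstring while Gemini decides when to call it. By default the `google-genai` SDK also executes the function for you (automatic function calling), in which case the response you get back has no `function_call` part left to inspect.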
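
To get a feel for why a type-hinted signature and a docstring matter, here is a rough sketch of the information a function exposes through introspection. This is not the SDK's own implementation; `describe_tool` is a hypothetical helper written only for illustration, and the function body is a placeholder.

```python
import inspect


def get_ticket_price(destination_city: str) -> str:
    """Get the price of a return ticket to the destination city."""
    return "Unknown"  # placeholder body; only the signature and docstring matter here


def describe_tool(func):
    """Collect the name, docstring, and parameter types of a function (hypothetical helper)."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
    }


tool_info = describe_tool(get_ticket_price)
print(tool_info["name"], tool_info["parameters"])
```

A function without type hints or a docstring would give the model much less to go on, which is why `get_ticket_price` in this notebook documents its argument carefully.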

In [None]:
def chat_with_tools(message, history):
    """Chat with tool/function calling capability"""
    # Convert history
    gemini_history = []
    for msg in history:
        role = "model" if msg["role"] == "assistant" else "user"
        gemini_history.append({
            "role": role,
            "parts": [{"text": msg["content"]}]
        })
    
    gemini_history.append({
        "role": "user",
        "parts": [{"text": message}]
    })
    
    # First request with tool declaration
    response = client.models.generate_content(
        model=MODEL,
        contents=gemini_history,
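        config={
            "system_instruction": system_message,
            "tools": [get_ticket_price],  # Pass the function directly!
            # The SDK would otherwise run the function itself (automatic function
            # calling); disable that so the function_call check below can fire
            "automatic_function_calling": {"disable": True}
        }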
    )
    
    # Check if Gemini wants to call a function
    if response.candidates[0].content.parts[0].function_call:
        function_call = response.candidates[0].content.parts[0].function_call
        
        # Extract function name and arguments
        function_name = function_call.name
        function_args = dict(function_call.args)
        
        print(f"\n🤖 Gemini wants to call: {function_name}({function_args})")
        
        # Execute the function
        if function_name == "get_ticket_price":
            result = get_ticket_price(**function_args)
            
            # Add function call and result to history
            gemini_history.append({
                "role": "model",
                "parts": [{"function_call": function_call}]
            })
            gemini_history.append({
                "role": "user",
                "parts": [{
                    "function_response": {
                        "name": function_name,
                        "response": {"price": result}
                    }
                }]
            })
            
            # Get final response
            final_response = client.models.generate_content(
                model=MODEL,
                contents=gemini_history,
                config={
                    "system_instruction": system_message,
                    "tools": [get_ticket_price]
                }
            )
            return final_response.text
    
    return response.text

gr.ChatInterface(fn=chat_with_tools, type="messages").launch()

## Multi-Modal: Destination Preview Images

**Important Note**: The `google-genai` SDK can generate images with Imagen (through `client.models.generate_images`), but Imagen access is not guaranteed for every API key: depending on your account it may require billing or a Vertex AI setup.

To keep this notebook runnable with a basic API key, we'll instead create **AI-powered destination preview cards** that:
1. Use Gemini to generate vivid destination descriptions
2. Create beautiful visual cards with those descriptions
3. Demonstrate multi-modal thinking without additional APIs

This approach works immediately and can be easily swapped for real image generation later!
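
The card layout is mostly arithmetic: wrap the description, then step down the canvas one line at a time. The sketch below shows that step in isolation using the numbers from the `artist` function (45-character lines, starting at y=220, 55 pixels per line). The `max_y=700` cut-off is an assumption based on where the decorative circle begins, and `layout_description` is a hypothetical helper, not part of the notebook.

```python
import textwrap


def layout_description(description, width=45, start_y=220, line_height=55, max_y=700):
    """Wrap text and give each line a y position, dropping lines that would pass max_y."""
    placed = []
    y = start_y
    for line in textwrap.wrap(description, width=width):
        if y + line_height > max_y:
            break  # no room left above the decorative area
        placed.append((y, line))
        y += line_height
    return placed


# Placeholder text, deliberately too long to fit on the card
sample = "A placeholder description of a destination. " * 12
for y, line in layout_description(sample)[:3]:
    print(y, line)
```

Capping the number of lines this way would keep an unusually long description from running into the lower part of the card.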

In [None]:
def artist(city: str) -> Image.Image:
    """
    Generate a destination preview card with AI-generated description.
    Uses Gemini to create vivid descriptions, then creates an attractive visual card.
    
    Args:
        city: The destination city
    
    Returns:
        PIL Image object with AI-generated description
    """
    print(f"🎨 Creating destination preview for {city}...")
    
    # Use Gemini to generate a vivid description
    try:
        description_prompt = f"In 1-2 vivid sentences, describe the most iconic and beautiful aspects of {city} that would make someone want to visit. Be specific, evocative, and exciting. Mention landmarks, culture, or atmosphere."
        
        response = client.models.generate_content(
            model=MODEL,
            contents=[{"role": "user", "parts": [{"text": description_prompt}]}]
        )
        
        description = response.text.strip()
        print(f"✨ AI Description: {description[:100]}...")
        
    except Exception as e:
        print(f"⚠️ Description generation error: {e}")
        description = f"Experience the beauty and culture of {city}!"
    
    # Create an attractive preview card
    from PIL import ImageDraw, ImageFont
    import textwrap
    
    # Color schemes for different cities
    color_schemes = {
        "paris": ("#E8D5F2", "#6B2D5C"),      # Lavender & Purple
        "london": ("#D4E5F7", "#1A3A52"),     # Light Blue & Navy
        "tokyo": ("#FFE5EC", "#C41E3A"),      # Pink & Red
        "berlin": ("#D5F5E3", "#196F3D"),     # Light Green & Dark Green
        "new york": ("#FFF8DC", "#DAA520"),   # Cornsilk & Goldenrod
        "rome": ("#FAEBD7", "#8B4513"),       # Antique White & Saddle Brown
        "barcelona": ("#FFE4B5", "#FF6347"),  # Moccasin & Tomato
        "sydney": ("#F0F8FF", "#4682B4"),     # Alice Blue & Steel Blue
    }
    
    bg_color, text_color = color_schemes.get(
        city.lower(), 
        ("#E0F7FA", "#006064")  # Cyan & Dark Cyan (default)
    )
    
    # Create high-quality image
    img = Image.new('RGB', (1024, 1024), color=bg_color)
    draw = ImageDraw.Draw(img)
    
    # Try to use nice fonts (these DejaVu paths are typical on Linux);
    # fall back to Pillow's default font if they are not available
    try:
        title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 80)
        desc_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 36)
        small_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 28)
    except OSError:
        title_font = ImageFont.load_default()
        desc_font = ImageFont.load_default()
        small_font = ImageFont.load_default()
    
    # Draw decorative header
    draw.rectangle([0, 0, 1024, 150], fill=text_color)
    
    # Draw city name
    title = city.upper()
    title_bbox = draw.textbbox((0, 0), title, font=title_font)
    title_width = title_bbox[2] - title_bbox[0]
    title_x = (1024 - title_width) // 2
    draw.text((title_x, 35), title, fill=bg_color, font=title_font)
    
    # Draw AI-generated description (wrapped)
    wrapped_lines = textwrap.wrap(description, width=45)
    y_offset = 220
    
    for line in wrapped_lines:
        bbox = draw.textbbox((0, 0), line, font=desc_font)
        line_width = bbox[2] - bbox[0]
        x = (1024 - line_width) // 2
        draw.text((x, y_offset), line, fill=text_color, font=desc_font)
        y_offset += 55
    
    # Add decorative elements
    # Circle with airplane icon (kept above y=900 so it does not cover the label below)
    draw.ellipse([412, 700, 612, 900], fill=text_color)
    draw.text((460, 755), "✈️", font=title_font, fill=bg_color)
    
    # Add "AI-Powered Destination Preview" label at bottom
    label = "AI-Powered Destination Preview"
    label_bbox = draw.textbbox((0, 0), label, font=small_font)
    label_width = label_bbox[2] - label_bbox[0]
    label_x = (1024 - label_width) // 2
    draw.text((label_x, 950), label, fill=text_color, font=small_font)
    
    print(f"✓ Created preview card for {city}")
    return img

In [None]:
# Test the AI-powered destination preview
test_image = artist("Paris")
display(test_image)

## Alternative: Integration with Real Image Generation

If you want actual AI-generated images (not preview cards), you have several options:

### Option 1: Google Vertex AI + Imagen
Requires Google Cloud setup but gives you access to Imagen:
```python
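# Requires: pip install google-cloud-aiplatform
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")

def generate_with_imagen(prompt):
    # The model ID is an example; check the Vertex AI docs for available versions
    model = ImageGenerationModel.from_pretrained("imagegeneration@002")
    response = model.generate_images(prompt=prompt, number_of_images=1)
    # Returns a Vertex AI image object; call .save(...) on it to write a file
    return response.images[0]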
```

### Option 2: Mix OpenAI DALL-E with Gemini Chat
Use Gemini for chat and OpenAI for images:
```python
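import base64
from io import BytesIO

from openai import OpenAI
from PIL import Image

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def artist_dalle(city):
    response = openai_client.images.generate(
        model="dall-e-3",
        prompt=f"Beautiful vacation destination image of {city}",
        size="1024x1024",
        n=1,
        response_format="b64_json",  # return the image inline instead of a URL
    )
    # Decode the base64 payload into a PIL image
    image_data = base64.b64decode(response.data[0].b64_json)
    return Image.open(BytesIO(image_data))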
```

### Option 3: Other Image APIs
- Stability AI (Stable Diffusion)
- Midjourney API
- Replicate
- Many others!

For this tutorial, the preview cards are enough to demonstrate the multi-modal flow: text in, text plus an image out. 🎨
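
Whichever backend you pick, it helps to keep the swap cheap. One possible pattern is a fallback chain that tries each generator in order and moves on when one fails. The sketch below uses stand-in backends that return strings so it runs without any API; `first_successful`, `flaky_remote_backend`, and `local_card_backend` are hypothetical names, and real backends would return PIL images.

```python
def first_successful(city, generators):
    """Try each image generator in order and return the first result that succeeds."""
    errors = []
    for generate in generators:
        try:
            return generate(city)
        except Exception as exc:  # broad on purpose: any backend failure moves to the next one
            errors.append(f"{generate.__name__}: {exc}")
    raise RuntimeError("All image generators failed: " + "; ".join(errors))


# Stand-in backends for the demo
def flaky_remote_backend(city):
    raise ConnectionError("service unavailable")


def local_card_backend(city):
    return f"<preview card for {city}>"


print(first_successful("Paris", [flaky_remote_backend, local_card_backend]))
# prints: <preview card for Paris>
```

In this notebook the chain would be something like a remote image API first, then `artist`, then the simple fallback defined in the next cell.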

In [None]:
def artist_simple_fallback(city: str) -> Image.Image:
    """
    Super simple fallback - creates a basic colored image with city name.
    Use this only if the main artist() function fails completely.
    """
    from PIL import ImageDraw
    
    img = Image.new('RGB', (512, 512), color='skyblue')
    draw = ImageDraw.Draw(img)
    
    text = f"Visit {city}!"
    bbox = draw.textbbox((0, 0), text)
    text_width = bbox[2] - bbox[0]
    text_height = bbox[3] - bbox[1]
    
    position = ((512 - text_width) // 2, (512 - text_height) // 2)
    draw.text(position, text, fill='white')
    
    return img

## Audio/Text-to-Speech (Optional)

Google Cloud has Text-to-Speech, but it requires separate setup. For simplicity, we'll skip audio in this Gemini version.

If you want audio, you can:
1. Use Google Cloud Text-to-Speech API (requires separate authentication)
2. Use a different TTS service
3. Use the browser's built-in speech synthesis (for example, through custom JavaScript in the Gradio page)

For now, we'll focus on chat + tools + images!

## The Complete Agentic Assistant

Now let's put it all together!

In [None]:
def chat_complete(history):
    """
    Complete chat function with tools and AI-generated destination previews.
    This is our "agentic" assistant!
    """
    # Convert history (last message is from user)
    gemini_history = []
    for msg in history:
        role = "model" if msg["role"] == "assistant" else "user"
        gemini_history.append({
            "role": role,
            "parts": [{"text": msg["content"]}]
        })
    
    image = None
    
    # First request with tools
    response = client.models.generate_content(
        model=MODEL,
        contents=gemini_history,
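        config={
            "system_instruction": system_message,
            "tools": [get_ticket_price],
            # The SDK would otherwise run the function itself (automatic function
            # calling); disable that so the branch below, which also creates the
            # preview image, actually runs
            "automatic_function_calling": {"disable": True}
        }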
    )
    
    # Handle function calls
    if response.candidates[0].content.parts[0].function_call:
        function_call = response.candidates[0].content.parts[0].function_call
        function_name = function_call.name
        function_args = dict(function_call.args)
        
        # Execute function
        if function_name == "get_ticket_price":
            result = get_ticket_price(**function_args)
            city = function_args.get('destination_city', '')
            
            # Generate AI-powered destination preview
            print(f"🎨 Creating destination preview for {city}...")
            try:
                image = artist(city)
            except Exception as e:
                print(f"⚠️ Main artist failed: {e}")
                print("Using simple fallback...")
                image = artist_simple_fallback(city)
            
            # Add to history
            gemini_history.append({
                "role": "model",
                "parts": [{"function_call": function_call}]
            })
            gemini_history.append({
                "role": "user",
                "parts": [{
                    "function_response": {
                        "name": function_name,
                        "response": {"price": result}
                    }
                }]
            })
            
            # Get final response
            final_response = client.models.generate_content(
                model=MODEL,
                contents=gemini_history,
                config={
                    "system_instruction": system_message,
                    "tools": [get_ticket_price]
                }
            )
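            reply = final_response.text
        else:
            # Unknown tool requested: make sure `reply` is still defined
            reply = f"Sorry, I can't use the tool '{function_name}'."
    else:
        reply = response.text
    
    # Add assistant's reply to history
    history += [{"role": "assistant", "content": reply}]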
    
    return history, image

## Launch the Complete Assistant!

This creates a full UI with:
- Chat interface with conversation history
- AI-generated destination preview cards
- Automatic tool calling for flight prices
- Multi-modal outputs (text + images)

**How it works:**
1. You ask about a flight price
2. Gemini requests the `get_ticket_price` tool; our code runs it and sends the result back
3. The app renders a preview card from a Gemini-written description of the destination
4. The reply and the card appear side by side in the Gradio interface
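
The request, tool call, and final answer sequence can be traced without any API access. The sketch below replaces Gemini with a stand-in `fake_model` (hypothetical, and hard-coded to ask about Paris) so the shape of the message list at each step is visible. Plain dicts stand in for the SDK's response objects.

```python
ticket_prices = {"london": "$799", "paris": "$899", "tokyo": "$1400", "berlin": "$499"}


def get_ticket_price(destination_city):
    return ticket_prices.get(destination_city.lower(), "Unknown")


def fake_model(contents):
    """Stand-in for the Gemini call: requests the tool once, then answers from its result."""
    last_part = contents[-1]["parts"][0]
    if "function_response" in last_part:
        price = last_part["function_response"]["response"]["price"]
        return {"text": f"A return ticket costs {price}."}
    return {"function_call": {"name": "get_ticket_price",
                              "args": {"destination_city": "Paris"}}}


def run_turn(user_message):
    contents = [{"role": "user", "parts": [{"text": user_message}]}]
    reply = fake_model(contents)  # step 1: the model decides whether it needs a tool
    if "function_call" in reply:
        call = reply["function_call"]
        result = get_ticket_price(**call["args"])  # step 2: our code runs the tool
        contents.append({"role": "model", "parts": [{"function_call": call}]})
        contents.append({"role": "user", "parts": [{"function_response": {
            "name": call["name"], "response": {"price": result}}}]})
        reply = fake_model(contents)  # step 3: the model turns the result into prose
    return reply["text"], contents


answer, transcript = run_turn("How much is a ticket to Paris?")
print(answer)  # A return ticket costs $899.
```

The transcript ends up with three entries: the user's question, the model's tool request, and the tool result. That is the same structure `chat_complete` sends in its second request.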

In [None]:
# Complete Gradio interface

with gr.Blocks() as ui:
    gr.Markdown("# ✈️ FlightAI Assistant (Powered by Gemini)")
    gr.Markdown("Ask about flight prices and see destination images!")
    
    with gr.Row():
        chatbot = gr.Chatbot(height=500, type="messages")
        image_output = gr.Image(height=500, label="Destination Preview")
    
    with gr.Row():
        entry = gr.Textbox(
            label="Your message:",
            placeholder="Try: 'How much is a ticket to Paris?'"
        )
    
    with gr.Row():
        submit_btn = gr.Button("Send", variant="primary")
        clear_btn = gr.Button("Clear")
    
    def do_entry(message, history):
        history += [{"role": "user", "content": message}]
        return "", history
    
    # Wire up the interface
    entry.submit(do_entry, inputs=[entry, chatbot], outputs=[entry, chatbot]).then(
        chat_complete, inputs=chatbot, outputs=[chatbot, image_output]
    )
    
    submit_btn.click(do_entry, inputs=[entry, chatbot], outputs=[entry, chatbot]).then(
        chat_complete, inputs=chatbot, outputs=[chatbot, image_output]
    )
    
    clear_btn.click(lambda: (None, None), inputs=None, outputs=[chatbot, image_output], queue=False)

ui.launch()

## Try These Queries:

1. "How much is a ticket to London?"
2. "What's the price for Tokyo?"
3. "I want to fly to Paris"
4. "Tell me about Berlin flights"

Notice how the assistant:
- ✅ Calls the pricing function automatically
- ✅ Uses Gemini to generate vivid destination descriptions
- ✅ Creates beautiful preview cards with AI-generated content
- ✅ Maintains conversation context
- ✅ Responds naturally

**What makes this "multi-modal"?**
- Text input and output (chat)
- Visual output (destination preview cards)

Function calling and the AI-written descriptions are not modalities in themselves; they are what drives the visual output.

## Exercises and Business Applications

### Extend This Project:

1. **Add more tools**:
   - `book_flight(city, date)` - Simulates booking
   - `check_availability(city, date)` - Check seat availability
   - `cancel_booking(booking_id)` - Cancel reservations

2. **Enhance image generation**:
   - Use Gemini's vision to describe destinations
   - Generate travel itineraries with images
   - Create personalized vacation mood boards

3. **Add real data**:
   - Connect to real flight APIs
   - Use actual pricing data
   - Integrate with booking systems

4. **Business applications**:
   - Customer support chatbot for your business
   - Product recommendation engine with images
   - Virtual shopping assistant
   - Real estate showing assistant
   - Restaurant recommendation system
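
For the first exercise, adding tools gets easier if the `if function_name == "get_ticket_price"` check becomes a lookup table. The sketch below is one possible shape for that. `check_availability` follows the signature suggested above but its behaviour is made up, and `TOOLS` and `run_tool` are hypothetical names.

```python
ticket_prices = {"london": "$799", "paris": "$899", "tokyo": "$1400", "berlin": "$499"}


def get_ticket_price(destination_city: str) -> str:
    """Get the price of a return ticket to the destination city."""
    return ticket_prices.get(destination_city.lower(), "Unknown")


def check_availability(city: str, date: str) -> str:
    """Simulated seat check (made-up rule: every flight has seats)."""
    return f"Seats available to {city.title()} on {date}"


# Map tool names to functions so new tools only need one extra entry
TOOLS = {func.__name__: func for func in (get_ticket_price, check_availability)}


def run_tool(name, args):
    """Look up and run a tool by name, reporting unknown names instead of crashing."""
    func = TOOLS.get(name)
    if func is None:
        return {"error": f"Unknown tool: {name}"}
    return {"result": func(**args)}


print(run_tool("get_ticket_price", {"destination_city": "Tokyo"}))
print(run_tool("cancel_booking", {"booking_id": "ABC123"}))
```

The same list of functions could then be passed as `tools` in the request config, with `run_tool` handling whichever call comes back.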

### Key Learnings:

- ✅ Multi-modal AI combines text, images, and more
- ✅ Function calling enables AI to use external tools
- ✅ Gemini makes tool integration simple
- ✅ Agentic AI can orchestrate multiple capabilities
- ✅ Gradio makes professional UIs easy

## Comparison: Gemini vs OpenAI for Multi-Modal

| Feature | Gemini | OpenAI |
|---------|--------|--------|
| **Text Generation** | Gemini 2.0 Flash | GPT-4o-mini |
| **Image Generation** | Imagen 3.0 | DALL-E 3 |
| **Function Calling** | Pass Python functions directly | JSON schema required |
| **Vision** | Built into current Gemini models (e.g. Gemini 2.0 Flash) | GPT-4o, GPT-4-vision |
| **Audio** | Separate TTS API | Built-in TTS |
| **Pricing** | Generally lower | Moderate |
| **Setup** | Single API key | Single API key |

Both are excellent! Choose based on:
- Your existing infrastructure
- Specific feature needs
- Pricing considerations
- Geographic availability

## 🎉 Congratulations!

You've built a complete agentic AI system with:
- Conversational intelligence
- Tool/function calling
- Multi-modal capabilities
- Professional UI

This is a foundation you can build on for real business applications!

### Next Steps:
1. Try the Week 2 Exercise to build your own custom assistant
2. Explore Week 3 for more advanced AI techniques
3. Build something for your own business or project!