## Introduction to Conversational AI Development

The journey into building production-ready AI systems takes a significant leap forward when you move beyond simple API calls to creating sophisticated conversational interfaces. This chapter explores the essential techniques for building AI assistants that can maintain context, leverage external tools, and provide genuine business value through natural language interactions.

At this stage in your learning journey, you've already mastered the fundamentals of working with frontier language models through their APIs. Now comes the exciting part: transforming those API calls into interactive applications that users can genuinely engage with. We'll explore how to create chat interfaces, manage conversation history, implement tool calling, and build assistants with specialized knowledge and capabilities.

## Understanding AI Assistants and Their Core Components

An AI assistant represents one of the most common and valuable applications of large language models in commercial settings. The power of these assistants comes from three fundamental capabilities that work in harmony.

First, an AI assistant can embody a specific persona that aligns with your brand identity or corporate culture. Through carefully crafted system prompts, you establish the tone, style, and approach the assistant will take in every interaction. This isn't just about being friendly or professional—it's about creating a consistent experience that reflects your organization's values and communication style.

Second, successful AI assistants create what we might call an "illusion of memory." While the model itself doesn't truly remember past conversations in the way humans do, we can architect our applications to maintain conversational context. By carefully managing the conversation history and including it in each API call, we enable the assistant to understand what has been discussed and respond appropriately to follow-up questions or references to earlier parts of the dialogue.

Third, expertise makes an assistant genuinely valuable. Through strategic prompt engineering and information injection, you can equip your assistant with domain-specific knowledge. This transforms a general-purpose language model into a specialized tool that understands your business, your products, and your customers' needs.

## The Art of Prompt Engineering for Assistants

Thoughtful prompt construction forms the foundation of effective AI assistants. System prompts serve as the ground rules and context-setting mechanism for your assistant's behavior. One of the most critical patterns in prompt engineering is the instruction: "If you don't know the answer, just say so." This simple directive helps combat hallucination—the tendency of language models to generate plausible-sounding but incorrect information when they lack actual knowledge of a topic.

Beyond preventing hallucinations, system prompts provide essential background information. You might include details about your company, your products, current promotions, or relevant policies. This background forms the knowledge base from which your assistant draws when responding to queries.

Multi-shot prompting represents a powerful technique for shaping model behavior. By inserting concrete examples into the system prompt—typically formatted as "if you get this question, this is how you should answer"—you bias the model's predictions. Remember that language models work by predicting the most likely next token based on the input they receive. When you provide examples in the system prompt, you're essentially teaching the model what "likely" looks like in your specific context. As the model processes these examples, it becomes more probable that its responses will align with the patterns you've demonstrated.

### Understanding Callback Functions

The heart of your Gradio chat interface lies in the callback function. **A callback is simply a function you provide to Gradio, which Gradio will invoke whenever a user submits a message**. This function follows a specific signature: it accepts two parameters and returns a response.

The first parameter, conventionally called `message`, contains the text the user just typed into the chat interface. The second parameter, `history`, contains the entire conversation that has occurred up to this point. Gradio handles the user interface rendering—displaying the conversation thread and providing an input field—and your callback function determines how the system responds to each new message.

Initially, you might create an extremely simple callback that always returns the same response, perhaps "bananas" as a playful example. This demonstrates the fundamental pattern: Gradio displays the UI, captures user input, calls your function with the current message and conversation history, and displays whatever your function returns. This one-way conversation pattern helps you understand the flow before adding complexity.
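As a minimal sketch of that first callback, the function below ignores both arguments and always answers "bananas". The `launch_demo` helper is an illustrative name rather than part of Gradio; it assumes the `gradio` package is installed and is deliberately left uncalled so the snippet can be run without starting a server.

```python
def chat(message, history):
    """Callback that Gradio invokes for every submitted message."""
    # `message` is the new user text; `history` is the prior conversation.
    # This first version ignores both and always gives the same reply.
    return "bananas"


def launch_demo():
    """Hypothetical helper: wire the callback into a Gradio chat UI."""
    import gradio as gr  # imported here so the callback works without Gradio

    gr.ChatInterface(fn=chat, type="messages").launch()
```

Calling `launch_demo()` in a notebook cell should open the chat UI; whatever the user types, the reply stays the same.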

When you connect this callback to Gradio using `gr.ChatInterface`, specifying `type="messages"`, you're telling Gradio to format the conversation history using the OpenAI messages format. This proves incredibly convenient because most developers use these chat interfaces specifically to call language model APIs, and having the data pre-formatted saves significant effort.

### Examining the Messages Format

The messages format that Gradio provides deserves careful examination. When your callback function receives the `history` parameter, it arrives as a list of dictionaries. Each dictionary contains a `role` field (typically "user" or "assistant") and a `content` field with the actual message text. You might also see additional metadata fields.

This format mirrors OpenAI's API structure almost exactly, which is intentional. However, some language model providers—particularly Gemini and Grok—reject messages that contain extra fields like metadata. To ensure compatibility across different model providers, you can clean the history by creating a new list that includes only the `role` and `content` fields, stripping out anything extra.
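The cleaning step can be sketched in a few lines. The helper name `clean_history` and the extra fields in the sample history are illustrative assumptions; the only thing the function relies on is that each entry has `role` and `content` keys.

```python
def clean_history(history):
    """Keep only the role and content of each message."""
    return [{"role": m["role"], "content": m["content"]} for m in history]


# Example history with extra fields attached (field names are illustrative)
history = [
    {"role": "user", "content": "Hi", "metadata": None},
    {"role": "assistant", "content": "Hello!", "metadata": None, "options": None},
]
print(clean_history(history))
```

Because the function builds a new list, the history object that Gradio passed in is left untouched.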

## Implementing Real Language Model Integration

Once you understand the callback pattern, integrating an actual language model becomes straightforward. Your callback function needs to construct the complete messages array that you'll send to the API.

This array begins with a system message. Remember, Gradio only passes you the visible conversation history from the UI—it doesn't know about your system prompt. You must explicitly prepend a message with `role: "system"` and your system message content.

Next, you include all the conversation history that Gradio provided. Finally, you append the current user message. This gives you a complete messages array: system context first, then the back-and-forth conversation history, and finally the user's latest input.
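A small sketch of that assembly step follows. The function name `build_messages` and the sample conversation are assumptions made for illustration; the ordering (system message, then history, then the new user message) is the part that matters.

```python
def build_messages(system_message, history, message):
    """Assemble the full messages array for one API call."""
    messages = [{"role": "system", "content": system_message}]
    messages += history  # prior turns, in the order they happened
    messages.append({"role": "user", "content": message})
    return messages


history = [
    {"role": "user", "content": "Do you sell hats?"},
    {"role": "assistant", "content": "Yes, we have lots of hats."},
]
messages = build_messages("You are a helpful assistant.", history, "Are any on sale?")
for m in messages:
    print(m["role"], "->", m["content"])
```

Rebuilding this list on every call is what produces the "illusion of memory": the model sees the whole conversation each time, even though it retains nothing between calls.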

With this messages array prepared, you call the language model API exactly as you've done before, passing in your chosen model and the messages array. The response comes back in the familiar format, and you extract the content from `response.choices[0].message.content` and return it from your callback function.

When Gradio receives this return value, it automatically displays it in the chat interface. The user sees their message, followed by the assistant's response, all rendered in an attractive, familiar chat interface format.

### Adding Streaming for Better User Experience

Streaming responses significantly improve the user experience by showing text as it generates rather than waiting for the complete response. Implementing streaming in your callback requires only minor modifications.

You add `stream=True` to your API call parameters. The API then returns a stream object that you can iterate over using a for loop. For each chunk that arrives, you accumulate the text in a response variable. The key change is using `yield` instead of `return`. This transforms your callback function into a generator—a special Python construct that can return multiple values over time.
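The accumulate-and-yield pattern can be sketched without calling a real API. Here `fake_chunk` is a hypothetical stand-in that mimics the `chunk.choices[0].delta.content` shape of a streamed chunk. The generator yields the cumulative text rather than each fragment, on the understanding that Gradio redraws the reply with each yielded value.

```python
from types import SimpleNamespace


def stream_reply(stream):
    """Yield the reply accumulated so far each time a new chunk arrives."""
    response = ""
    for chunk in stream:
        text = chunk.choices[0].delta.content
        if text:  # some chunks carry no text (for example, the final one)
            response += text
            yield response


def fake_chunk(text):
    """Stand-in (hypothetical) for one streamed chunk from the API."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
for partial in stream_reply(fake_stream):
    print(partial)
```

In a real callback, `stream` would be the object returned by the API call made with `stream=True`.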

## Advanced Prompting Techniques

The system prompt represents your most powerful tool for customizing assistant behavior. Consider an example where you're building an assistant for a clothing store. Your system prompt might establish the context: "You are a helpful assistant in a clothes store." But it can go much further.

You might include specific business rules: "You should try to gently encourage the customer to try items that are on sale. Hats are 60% off and most other items are 50% off." This gives your assistant commercial knowledge and a soft sales objective.

One-shot prompting takes this further by providing an example interaction: "For example, if the customer says, 'I'm looking to buy a hat,' you could reply with something like, 'Wonderful! We have lots of hats, including several that are part of our sales event.'" This example demonstrates the tone, enthusiasm level, and information-sharing approach you want the assistant to adopt.

You can also add specific handling instructions: "If the customer asks for shoes, you should respond that shoes are not on sale today, but remind the customer to look for hats." As you add more such examples, you're practicing multi-shot prompting—providing multiple scenarios and desired responses that guide the model's behavior across various situations.
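Put together, the clothing-store prompt might be assembled as below. This is one possible way to lay it out, using only the sentences quoted above; building the string in pieces simply makes each rule easy to edit or remove.

```python
# Persona and context
system_message = "You are a helpful assistant in a clothes store. "

# Business rules
system_message += (
    "You should try to gently encourage the customer to try items that are on sale. "
    "Hats are 60% off and most other items are 50% off. "
)

# One-shot example of the desired tone
system_message += (
    "For example, if the customer says, 'I'm looking to buy a hat,' "
    "you could reply with something like, 'Wonderful! We have lots of hats, "
    "including several that are part of our sales event.' "
)

# Specific handling instruction (a second example, moving toward multi-shot)
system_message += (
    "If the customer asks for shoes, you should respond that shoes are not on sale today, "
    "but remind the customer to look for hats."
)

print(system_message)
```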

### Dynamic Context Injection

The real power emerges when you dynamically modify the system prompt based on the user's input. Imagine detecting that the user mentioned "belt" in their message. You could programmatically add a new section to your system prompt: "The store does not sell belts. If you are asked for belts, be sure to point out other items on sale."
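A minimal sketch of this idea, assuming a simple substring check on the user's message; the names `BASE_SYSTEM_MESSAGE`, `BELT_NOTE`, and `system_message_for` are illustrative.

```python
BASE_SYSTEM_MESSAGE = "You are a helpful assistant in a clothes store."
BELT_NOTE = (
    " The store does not sell belts. If you are asked for belts, "
    "be sure to point out other items on sale."
)


def system_message_for(message):
    """Return the system prompt, extended only when the message mentions belts."""
    system_message = BASE_SYSTEM_MESSAGE
    if "belt" in message.lower():
        system_message += BELT_NOTE
    return system_message


print(system_message_for("Do you have any belts?"))
print(system_message_for("I'd like a hat."))
```

The callback would call `system_message_for(message)` on every turn, so the extra sentence costs tokens only when it is relevant.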

While this specific example might seem trivial—why not always include this information?—it demonstrates a crucial concept. In a real store with thousands of products, including every possible piece of information in every system prompt would be impractical. Your prompts would become enormous, degrading model accuracy while consuming unnecessary tokens and increasing costs.

This dynamic context injection represents your first glimpse into Retrieval-Augmented Generation (RAG). The core idea is elegantly simple: intelligently select relevant information and insert it into the prompt at inference time. This equips the model with the specific knowledge it needs to answer the current query without overwhelming it with irrelevant details.

RAG isn't a complex algorithmic innovation—it's a practical approach to prompt enhancement. The sophistication comes from determining what information is truly relevant for each query, rather than using crude keyword matching. As you progress in your AI development journey, you'll learn increasingly sophisticated techniques for selecting and injecting the right context at the right time.

## Understanding Tool Calling Architecture

Tool calling often sounds mysterious or complex when you first encounter it, but the underlying mechanism is remarkably straightforward. Let's dispel any confusion by examining exactly what happens when a language model "uses tools."

Language models are, fundamentally, statistical programs that predict the next most likely token based on their training and the input they receive. They're neural networks running on remote servers, processing inputs and generating outputs. They don't magically reach across the internet to execute code on your local machine or directly interact with databases and APIs.

Here's the reality: when you enable tool calling, you're following a specific communication protocol with the language model. You describe the available tools in your API request using a structured JSON format. The language model has been trained to recognize this format and can respond by indicating it wants to use one of these tools.

When the model decides a tool would be helpful, it doesn't execute anything itself. Instead, it generates tokens that represent a request to call a specific tool with specific parameters. Your code receives this response, detects that it's a tool call request rather than a regular text response, executes the appropriate function locally, and then makes a second API call to the language model.

This second call includes the entire conversation history: the user's original question, the model's request to use a tool, and the result your code obtained from executing that tool. Based on this expanded context, the model generates a final response that incorporates the tool's output.

### The Tool Calling Workflow

Let's walk through a concrete example. Imagine you're building an airline assistant that can look up ticket prices. You write a Python function that queries your database for pricing information. You then describe this function to the language model using a specific JSON schema that specifies the function name, its purpose, what parameters it accepts, and what each parameter means.

When a user asks "How much is a flight to Paris?", you include your tool description in the API call. The model recognizes that answering this question requires information it doesn't have, and it sees a tool that can provide that information. It responds with something like: "Please use the get_ticket_price tool with destination_city='Paris'."

Your code detects this response type, extracts the function name and parameters, calls your actual Python function, and receives a result—perhaps "$850." You then construct a new messages array that includes: the original user question, the model's tool call request, and the result from executing the tool (formatted as a message with `role: "tool"`). You send this complete history back to the API.

The model now has all the information it needs. It sees the user asked about Paris flights, it "remembers" requesting pricing information (through the conversation history), and it has the actual price. It generates a helpful response: "A return ticket to Paris costs $850. Would you like to book this flight?"
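To make the second request concrete, here is roughly what the conversation portion of the messages array looks like in the OpenAI chat format for this example. The identifier `call_123` is a made-up placeholder (the API generates a real one for each tool call), and the system message is omitted for brevity.

```python
import json

# Hypothetical identifier; in practice the API generates one per tool call
tool_call_id = "call_123"

messages = [
    # 1. The original user question
    {"role": "user", "content": "How much is a flight to Paris?"},
    # 2. The model's request to call a tool (arguments arrive as a JSON string)
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {
                    "name": "get_ticket_price",
                    "arguments": json.dumps({"destination_city": "Paris"}),
                },
            }
        ],
    },
    # 3. The result your code obtained, linked back by the same id
    {"role": "tool", "tool_call_id": tool_call_id, "content": "$850"},
]

for m in messages:
    print(m["role"])
```

The `tool_call_id` on the tool message is what lets the model match the result to the request it made.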

### The JSON Schema for Tool Descriptions

Tools must be described using a specific JSON format. While this format can feel verbose and tedious to write, it's essential for enabling the language model to understand what tools are available and how to use them.

A tool description includes several key elements:


In [16]:
{
    "type": "function",
    "function": {
        "name": "get_ticket_price",
        "description": "Get the price of a return ticket to the destination city",
        "parameters": {
            "type": "object",
            "properties": {
                "destination_city": {
                    "type": "string",
                    "description": "The city that the customer wants to travel to"
                }
            },
            "required": ["destination_city"]
        }
    }
}


The `name` identifies the function. The `description` explains what the function does—this is crucial because the model uses this description to decide when the tool is appropriate. The `parameters` section describes each argument using JSON Schema syntax, including the data type and a description of what that parameter represents.

Language models have been trained on extensive examples of this format, so they understand how to interpret these schemas and generate properly formatted tool call requests.

## Implementing Tool Calling in Practice

Implementing tool calling requires modifications to your chat callback function. First, you need to pass the tool descriptions to the API when you make your initial call. This typically involves adding a `tools` parameter that contains your JSON-formatted tool descriptions.

Second, you must examine the response to determine whether it's a regular text response or a tool call request. Each choice in the response includes a `finish_reason` field, which you read as `response.choices[0].finish_reason`. When this field equals "tool_calls", you know the model wants to use a tool rather than directly answering the user.

When you detect a tool call request, you need to:

1. Extract the tool call details from the response
2. Identify which function was requested
3. Extract the parameters for that function
4. Execute your actual Python function with those parameters
5. Construct a new message with `role: "tool"` containing the function's result
6. Make a second API call with the expanded conversation history
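Steps 1 to 5 can be sketched against a stand-in response object, so the logic can be tried without an API key. Everything here is illustrative: `fake_choice` imitates the shape of `response.choices[0]`, and `get_ticket_price` with its one-entry price table is a toy version of the ticket-price tool from the Paris example. Step 6, the second API call, is left out.

```python
import json
from types import SimpleNamespace

TICKET_PRICES = {"paris": "$850"}  # toy data taken from the Paris example


def get_ticket_price(destination_city):
    """Toy tool: look up a price by city name."""
    return TICKET_PRICES.get(destination_city.lower(), "Unknown")


def tool_message_for(choice):
    """Return a tool message if the model asked for the tool, else None."""
    if choice.finish_reason != "tool_calls":
        return None  # ordinary text reply, nothing to execute
    tool_call = choice.message.tool_calls[0]           # step 1: tool call details
    if tool_call.function.name != "get_ticket_price":  # step 2: which function
        return None
    args = json.loads(tool_call.function.arguments)    # step 3: parameters
    result = get_ticket_price(**args)                  # step 4: run the function
    return {                                           # step 5: build tool message
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": result,
    }


# Stand-in for response.choices[0] when the model requests a tool
fake_choice = SimpleNamespace(
    finish_reason="tool_calls",
    message=SimpleNamespace(
        tool_calls=[
            SimpleNamespace(
                id="call_123",
                function=SimpleNamespace(
                    name="get_ticket_price",
                    arguments='{"destination_city": "Paris"}',
                ),
            )
        ]
    ),
)
print(tool_message_for(fake_choice))
```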


In [17]:
import os
import json
import uuid
from datetime import datetime
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr

The following `FLIGHT_DB` dictionary represents a structured in-memory database for the FlightAI airline, organizing flight information by destination city. Each city entry contains shared metadata such as the operating airline and currency, along with a nested `flights` object that lists all available cabin classes for that route. For each class (e.g., economy, premium economy, business, first class), the database stores key commercial and operational details including ticket price, flight duration, baggage allowance, refund eligibility, and current seat availability. This hierarchical design makes it easy to query flights by city, compare classes within the same route, select the cheapest or most suitable option, and support booking logic while remaining extensible for future additions such as taxes, layovers, or fare rules.


In [18]:
FLIGHT_DB = {
    "london": {
        "currency": "USD",
        "airline": "FlightAI",
        "flights": {
            "economy": {
                "price": 799,
                "duration_hours": 7,
                "baggage_kg": 23,
                "refundable": False,
                "seats_available": 42
            },
            "premium_economy": {
                "price": 1099,
                "duration_hours": 7,
                "baggage_kg": 28,
                "refundable": True,
                "seats_available": 18
            },
            "business": {
                "price": 1899,
                "duration_hours": 6.5,
                "baggage_kg": 32,
                "refundable": True,
                "seats_available": 6
            }
        }
    },

    "paris": {
        "currency": "USD",
        "airline": "FlightAI",
        "flights": {
            "economy": {
                "price": 899,
                "duration_hours": 8,
                "baggage_kg": 23,
                "refundable": False,
                "seats_available": 35
            },
            "premium_economy": {
                "price": 1199,
                "duration_hours": 7.8,
                "baggage_kg": 28,
                "refundable": True,
                "seats_available": 14
            },
            "business": {
                "price": 1999,
                "duration_hours": 7.5,
                "baggage_kg": 32,
                "refundable": True,
                "seats_available": 5
            }
        }
    },

    "tokyo": {
        "currency": "USD",
        "airline": "FlightAI",
        "flights": {
            "economy": {
                "price": 1400,
                "duration_hours": 14,
                "baggage_kg": 23,
                "refundable": False,
                "seats_available": 50
            },
            "premium_economy": {
                "price": 1850,
                "duration_hours": 13.5,
                "baggage_kg": 28,
                "refundable": True,
                "seats_available": 20
            },
            "business": {
                "price": 3200,
                "duration_hours": 13,
                "baggage_kg": 32,
                "refundable": True,
                "seats_available": 8
            },
            "first_class": {
                "price": 5200,
                "duration_hours": 12.8,
                "baggage_kg": 40,
                "refundable": True,
                "seats_available": 2
            }
        }
    },

    "berlin": {
        "currency": "USD",
        "airline": "FlightAI",
        "flights": {
            "economy": {
                "price": 499,
                "duration_hours": 6,
                "baggage_kg": 23,
                "refundable": False,
                "seats_available": 60
            },
            "business": {
                "price": 1299,
                "duration_hours": 5.7,
                "baggage_kg": 32,
                "refundable": True,
                "seats_available": 10
            }
        }
    },

    "new york": {
        "currency": "USD",
        "airline": "FlightAI",
        "flights": {
            "economy": {
                "price": 650,
                "duration_hours": 5,
                "baggage_kg": 23,
                "refundable": False,
                "seats_available": 48
            },
            "premium_economy": {
                "price": 980,
                "duration_hours": 4.8,
                "baggage_kg": 28,
                "refundable": True,
                "seats_available": 22
            },
            "business": {
                "price": 1750,
                "duration_hours": 4.5,
                "baggage_kg": 32,
                "refundable": True,
                "seats_available": 9
            }
        }
    }
}


The `lookup_flight` function serves as a backend tool that allows querying the `FLIGHT_DB` for flight information based on a specified destination. It first normalizes the city name to lowercase for consistent lookups and checks if the destination exists in the database. If the city is not found, it returns a structured message indicating unavailability. When a specific cabin class is provided, the function attempts to locate that class within the available flights. If the class exists, it returns detailed information including price, currency, flight duration, baggage allowance, refund eligibility, and available seats. If the class is not offered, it provides a clear reason for its unavailability, ensuring the system handles edge cases gracefully.

For general queries without a specified class, the function sorts all available cabin classes by price and either returns a limited number of tickets (based on the `limit` parameter) or all options if `show_all` is set to `True`. The returned data is structured in a consistent format, with each ticket entry containing essential details such as class, price, duration, baggage, refund policy, and seats available. This design allows the AI assistant to present concise, accurate, and user-friendly information, whether the user requests a single class or a comparison of multiple ticket options. By keeping all computations deterministic and relying directly on the database, the function ensures that responses are reliable and never invented by the model.


In [19]:
# Backend Function (Tool)

def lookup_flight(
    destination_city: str,
    cabin_class: str | None = None,
    show_all: bool = False,
    limit: int = 3
):
    city = destination_city.lower()
    city_data = FLIGHT_DB.get(city)

    if not city_data:
        return {
            "destination": city,
            "available": False,
            "reason": "Destination not found"
        }

    flights = city_data["flights"]

    # Normalize class name if provided
    if cabin_class:
        cabin_class = cabin_class.lower()
        data = flights.get(cabin_class)

        if not data:
            return {
                "destination": city,
                "available": False,
                "reason": f"{cabin_class} class not available"
            }

        return {
            "destination": city,
            "class": cabin_class,
            "price": data["price"],
            "currency": city_data["currency"],
            "duration_hours": data["duration_hours"],
            "baggage_kg": data["baggage_kg"],
            "refundable": data["refundable"],
            "seats_available": data["seats_available"]
        }

    # Sort all tickets by price
    sorted_flights = sorted(
        flights.items(),
        key=lambda item: item[1]["price"]
    )

    # Show all or limited number
    selected_flights = (
        sorted_flights if show_all else sorted_flights[:limit]
    )

    return {
        "destination": city,
        "currency": city_data["currency"],
        "tickets": [
            {
                "class": cls,
                "price": data["price"],
                "duration_hours": data["duration_hours"],
                "baggage_kg": data["baggage_kg"],
                "refundable": data["refundable"],
                "seats_available": data["seats_available"]
            }
            for cls, data in selected_flights
        ]
    }





The `flight_lookup_tool` dictionary defines a **tool schema** that an AI assistant can use to query flight information from the backend in a structured and predictable way. This schema informs the model about the tool's purpose, the input parameters it requires, and how to interpret the outputs, enabling the AI to call it reliably when responding to user queries.

- `"name": "lookup_flight"` specifies the unique identifier of the tool. The AI uses this name internally to trigger the flight lookup functionality.
- `"description"` provides a short explanation of the tool's purpose. It instructs the assistant to retrieve airline ticket information and explains that setting `show_all=true` allows the customer to view all available ticket classes, not just the default selection.
- `"parameters"` defines the expected input using JSON schema:
  - `"destination_city"` (string) is required and specifies the city the customer wants to travel to.
  - `"show_all"` (boolean) is optional and indicates whether the assistant should display all available ticket classes or only a subset.
- `"required": ["destination_city"]` ensures that at least the destination city must be provided for the tool to function properly.
- `"additionalProperties": False` prevents extra, unspecified fields from being passed, enforcing strict input validation.


In [20]:
# Tool Schema (Function Definition for LLM)

flight_lookup_tool = {
    "name": "lookup_flight",
    "description": (
        "Retrieve airline ticket information. "
        "Use show_all=true when the customer asks to see more options or all ticket classes."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "destination_city": {
                "type": "string",
                "description": "City the customer wants to travel to"
            },
            "show_all": {
                "type": "boolean",
                "description": "Set true if the customer wants to see all available ticket classes"
            }
        },
        "required": ["destination_city"],
        "additionalProperties": False
    }
}


tools = [{"type": "function", "function": flight_lookup_tool}]


The `booking_tool` dictionary below defines a **tool schema** that an AI assistant can use to handle flight bookings in a structured and deterministic way. This tool essentially tells the model what kind of action it can perform, what input it expects, and what parameters are required to execute the booking.

- `"name": "book_flight"` specifies the identifier of the tool. This is the name the AI assistant will reference when it wants to call this functionality.
- `"description"` provides a short explanation of what the tool does, which in this case is to book a flight after collecting the passenger's **name** and **email**. This helps the model understand the purpose of the tool and when it should be invoked.
- `"parameters"` defines the expected input schema using JSON schema notation. It specifies that the tool expects an object containing the following properties:
  - `"name"`: a string representing the passenger’s full name.
  - `"email"`: a string representing the passenger’s email address.
  - `"destination_city"`: a string for the city the passenger wants to travel to.
  - `"cabin_class"`: a string indicating the desired cabin class, matching a key in `FLIGHT_DB` (e.g., `economy`, `business`, `first_class`).
- `"required"` lists the parameters that must always be provided for the tool to function correctly. In this case, all four fields—`name`, `email`, `destination_city`, and `cabin_class`—are mandatory, ensuring that the AI assistant cannot attempt a booking without complete information.


In [21]:
booking_tool = {
    "name": "book_flight",
    "description": "Book a flight after collecting name and email",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
            "destination_city": {"type": "string"},
            "cabin_class": {"type": "string"}
        },
        "required": ["name", "email", "destination_city", "cabin_class"]
    }
}


In [None]:
BOOKINGS_DB = []
BOOKINGS_FILE = "bookings.json"

def book_flight(
    name: str,
    email: str,
    destination_city: str,
    cabin_class: str
):
    global BOOKINGS_DB
    
    city = destination_city.lower()
    cabin_class = cabin_class.lower()

    city_data = FLIGHT_DB.get(city)
    if not city_data:
        return {"success": False, "reason": "Destination not available"}

    flight = city_data["flights"].get(cabin_class)
    if not flight:
        return {"success": False, "reason": "Class not available"}

    if flight["seats_available"] <= 0:
        return {"success": False, "reason": "No seats available"}

    # Reserve seat
    flight["seats_available"] -= 1

    booking = {
        "booking_id": str(uuid.uuid4()),
        "name": name,
        "email": email,
        "destination": city,
        "class": cabin_class,
        "price": flight["price"],
        "currency": city_data["currency"],
        "timestamp": datetime.utcnow().isoformat()
    }
   
    BOOKINGS_DB.append(booking)

    return {
        "success": True,
        "booking_id": booking["booking_id"],
        "destination": city,
        "class": cabin_class,
        "price": flight["price"],
        "currency": city_data["currency"]
    }


In [23]:
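# Tool Call Handler

# Map tool names (as declared in the schemas) to the Python functions behind them
TOOL_FUNCTIONS = {
    "lookup_flight": lookup_flight,
    "book_flight": book_flight,
}

def handle_tool_calls(message):
    """Run every tool the model requested; return one "tool" message per call."""
    tool_messages = []
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        func = TOOL_FUNCTIONS.get(tool_call.function.name)

        if func is None:
            result = {"error": f"Unknown tool: {tool_call.function.name}"}
        else:
            # Pass every argument through so options such as show_all are honored
            result = func(**args)

        tool_messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    return tool_messages


In [24]:
# Streaming Chat Function (Gradio-Compatible)

load_dotenv()  # read OPENAI_API_KEY from a local .env file, if present
client = OpenAI()
MODEL = "gpt-4.1-mini"

def chat(message, history):
    # Gradio calls this with exactly two arguments; `tools` and `system_message`
    # are module-level names that are looked up when the function runs
    messages = [{"role": "system", "content": system_message}]
    messages += history
    messages.append({"role": "user", "content": message})

    # First request (may trigger tool)
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        stream=False
    )

    choice = response.choices[0]

    # Tool call path
    if choice.finish_reason == "tool_calls":
        messages.append(choice.message)
        # Every requested tool call needs a matching "tool" message
        messages.extend(handle_tool_calls(choice.message))

        # Second request (final answer, streamed)
        stream = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            stream=True
        )

        full_text = ""
        for chunk in stream:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta
            if delta and delta.content:
                full_text += delta.content
                yield full_text

    else:
        yield choice.message.content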

In [25]:
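# Register both tools so the model can look up flights as well as book them
tools = [
    {"type": "function", "function": flight_lookup_tool},
    {"type": "function", "function": booking_tool}
]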


In [26]:
# System Message (Steering)

system_message = (
    "You are FlightAI, a professional airline assistant. "
    "Only book a flight after the customer explicitly confirms "
    "their full name, email address, destination, and cabin class. "
    "If any of these are missing, ask for them. "
    "Never guess personal information."
)

In [27]:
# Gradio UI

gr.ChatInterface(
    fn=chat,
    type="messages",
    title="✈️ FlightAI – Airline Assistant",
    description="Ask about ticket prices, destinations, and flight details."
).launch()





