# Creating Interactive Interfaces for LLM Applications

Building user interfaces traditionally requires frontend development expertise: knowledge of HTML, CSS, JavaScript, and frameworks like React. For data scientists and backend developers, this is a significant barrier to demonstrating work or deploying applications. Gradio removes most of that barrier, letting you create polished web interfaces with a few lines of Python code.

## Understanding Gradio

Gradio is an open-source Python library created by a company now owned by Hugging Face. It lets you write simple Python code that generates a complete web application. How this works under the hood (how the server is created and the frontend is generated) will be explained after you experience it firsthand. For now, focus on the practical aspects of building interfaces.

Gradio has become enormously popular in the data science community precisely because it's designed for data science workflows. It's not the only option—Streamlit offers similar capabilities with a different approach—but Gradio excels at rapid prototyping and creating demo applications with minimal code.

The fundamental concept is elegantly simple: you define Python functions that perform your core logic, then tell Gradio how those functions should connect to interface elements. Gradio handles everything else—creating the web server, generating the frontend, managing user interactions, and calling your functions when appropriate.

## Your First Gradio Application

Consider this minimal example from Gradio's documentation:


```python
import gradio as gr

def greet(name):
    return f"Hello {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()
```

```
* Running on local URL:  http://127.0.0.1:7867

To create a public link, set `share=True` in `launch()`.
```

This creates a complete web application with an input field, a submit button, and an output display. When users type text and click submit, the `greet` function executes and the result appears in the output area. The entire interface (server, frontend, event handling) comes from these few lines of code.

The pattern here represents the core Gradio workflow: define your function, specify inputs and outputs, create an interface, and launch it. Everything else builds on this foundation.

## Building a Simple Text Transformer

Let's create a practical example. Start by defining a straightforward function:


```python
def shout(text):
    print(f"Shout has been called with input: {text}")
    return text.upper()
```

This function takes text as input, prints a confirmation message, and returns the uppercase version. Nothing about this function knows or cares about user interfaces—it's pure Python logic.

Now create a Gradio interface for this function:


```python
demo = gr.Interface(
    fn=shout,
    inputs="textbox",
    outputs="textbox",
    flagging_mode="never"
)
demo.launch()
```

When you run this code, a complete web interface appears. You can type text into an input field, click submit, and see the uppercased result in the output field. The console shows the print statement confirming the function was called.

The `flagging_mode="never"` parameter hides Gradio's Flag button, a feature designed for data science workflows where you might want to save particular inputs and outputs for later review. For most applications, you won't need this feature.

## Understanding the Callback Pattern

The critical concept here is the callback pattern. You're not calling the `shout` function directly. Instead, you're passing the function itself (not the result of calling it) to Gradio:

```python
fn=shout  # Correct: passing the function
# NOT: fn=shout("hello")  # Incorrect: passing a function result
```

This distinction is fundamental. You're telling Gradio: "Here's a function you can call whenever you need to." Gradio stores this function and calls it (calls back to it) when users interact with the interface. This is why it's called a callback—Gradio calls back to your code at the appropriate time.
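
The difference can be seen without Gradio at all. The sketch below uses a hypothetical `FakeInterface` class (not part of Gradio) that stores a function and calls it later, which is roughly the role `gr.Interface` plays for `fn`:

```python
def shout(text):
    return text.upper()

class FakeInterface:
    """Hypothetical stand-in for a UI framework; not a Gradio class."""

    def __init__(self, fn):
        # Store the function object itself; nothing is called yet.
        self.fn = fn

    def simulate_submit(self, user_input):
        # Later, when the "user" submits, call back into the stored function.
        return self.fn(user_input)

ui = FakeInterface(fn=shout)        # pass the function, no parentheses
print(ui.simulate_submit("hello"))  # HELLO

# shout("hello") is the string "HELLO", which cannot be called later.
print(callable(shout), callable(shout("hello")))  # True False
```

The stored function runs only when `simulate_submit` is invoked, which mirrors how your callback runs only when a user interacts with the interface.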

## Accessing Your Application

When you launch a Gradio app, it starts a web server on your local machine. You'll see output indicating the server is running, including a URL like `http://127.0.0.1:7860`. This is a local URL accessible only from your computer.

The interface appears embedded directly in your notebook, but clicking the URL opens it in a new browser tab. Both views connect to the same server and function identically. The port number (7860 by default) goes up when that port is already taken, for example by an app launched earlier in the same session, because each running instance needs its own port.
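
To make the port behaviour concrete, here is a minimal sketch of a "first free port" search using only the standard library. The helper name `first_free_port` and the limit of 100 attempts are assumptions for illustration; Gradio's own logic may differ in detail.

```python
import socket

def first_free_port(start=7860, attempts=100, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted); try the next one
            return port
    raise OSError(f"No free port found in range {start}-{start + attempts - 1}")

print(first_free_port())
```

If you need a fixed port rather than the first free one, `launch()` also accepts a `server_port` argument.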

## Sharing Your Application

Gradio includes a remarkable feature for sharing applications with others. Adding `share=True` to your launch call does something technically sophisticated:

```python
demo.launch(share=True)
```

This does not upload your code. Instead, Gradio creates a public URL (ending in `.gradio.live`) that anyone can access, and when someone uses this public interface, the request is tunneled back to your local machine, where your function executes.

This works through tunneling technology (similar to tools like ngrok) that's well established in technical communities. However, if you work in a corporate environment with strict security policies, this feature might be blocked or could trigger security monitoring systems. Use it only in environments where you're certain it's acceptable.

**The shared URL remains active for one week and only works while your computer runs the Gradio application**. This makes it perfect for quick demos but isn't suitable for permanent deployment.

For production deployment, Gradio provides a `gradio deploy` command that hosts your application on Hugging Face Spaces. This approach is covered in more advanced courses focused on deployment strategies.

## Authentication and Security

Adding basic authentication to your Gradio app requires just one additional parameter:

```python
demo.launch(auth=("username", "password"))
```

Users must enter this username and password before accessing the interface. For multiple users, pass a list of tuples:

```python
demo.launch(auth=[("alice", "secret1"), ("bob", "secret2")])
```

This provides rudimentary security for shared applications. However, storing passwords in plain text is poor practice. **At minimum, use environment variables to store credentials**. For production applications, implement proper authentication with hashed passwords and secure credential storage.
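
As a minimal sketch of the environment-variable approach, the helper below reads one username and password pair from the environment. The helper name `load_auth_from_env` and the variable names `GRADIO_USERNAME` and `GRADIO_PASSWORD` are assumptions chosen for this example, not Gradio conventions.

```python
import os

def load_auth_from_env(user_var="GRADIO_USERNAME", pass_var="GRADIO_PASSWORD"):
    """Return a (username, password) tuple read from environment variables."""
    username = os.getenv(user_var)
    password = os.getenv(pass_var)
    if not username or not password:
        # Fail loudly: passing auth=None would launch the app with no login at all.
        raise RuntimeError(f"Set both {user_var} and {pass_var} before launching")
    return (username, password)

# Usage, assuming `demo` is the interface defined earlier:
# demo.launch(auth=load_auth_from_env())
```

Raising an error when a variable is missing is a deliberate choice here, so a misconfigured machine does not silently serve the app without authentication.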

## Customizing Appearance

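Gradio respects your system's light or dark mode preference by default. While Gradio recommends keeping this for accessibility, you can force a specific theme if needed. Gradio reads the theme from the `__theme` URL query parameter, so one approach is a small JavaScript function, supplied through the `js` parameter, that reloads the page with `?__theme=dark`. The sketch below assumes a Gradio version (4 or 5) that accepts `js` on the `gr.Interface` constructor; replace `'dark'` with `'light'` to force light mode:

```python
# Runs on page load: reloads the page with ?__theme=dark unless it is already set
force_dark = "function refresh() { const url = new URL(window.location); if (url.searchParams.get('__theme') !== 'dark') { url.searchParams.set('__theme', 'dark'); window.location.href = url.href; } }"
demo = gr.Interface(fn=shout, inputs="textbox", outputs="textbox", flagging_mode="never", js=force_dark)
demo.launch()
```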

The `inbrowser=True` parameter automatically opens a browser tab when launching, which is convenient when you prefer to work in a browser rather than the notebook:

```python
demo.launch(inbrowser=True)
```

## Creating Detailed Interfaces

Instead of accepting default settings, you can explicitly define interface components:


```python
message_input = gr.Textbox(
    label="Your Message",
    placeholder="Enter a message to be shouted",
    lines=7
)

message_output = gr.Textbox(
    label="Response",
    lines=8
)
```

```python
demo = gr.Interface(
    fn=shout,
    title="Shout",
    inputs=message_input,
    outputs=message_output,
    examples=["hello", "howdy"],
    flagging_mode="never"
)
demo.launch()
```

This creates a more polished interface with:

- Custom labels and placeholders
- Specified text area heights
- A title for the application
- Example inputs users can click to populate the input field

The examples feature is particularly useful—users can click an example to instantly populate the input field, making your application more discoverable and easier to use.

## Connecting LLMs to Gradio

The true power of Gradio emerges when you connect it to language models. Remember the function created earlier for calling GPT:


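```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY from a .env file into the environment
api_key = os.getenv("OPENAI_API_KEY")

openai = OpenAI()  # the client reads OPENAI_API_KEY from the environment

system_message = "You are a helpful assistant."

def message_gpt(prompt):
    # system_message is looked up when the function runs, so later changes take effect
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content
```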

This function takes a prompt, calls GPT, and returns the response. It has the same signature as the `shout` function—one input, one output. This means you can use it with Gradio exactly the same way:


```python
demo = gr.Interface(
    fn=message_gpt,
    title="GPT",
    inputs=gr.Textbox(label="Your message", placeholder="Enter a message for GPT"),
    outputs=gr.Textbox(label="Response"),
    examples=["hello", "What is machine learning?"],
    flagging_mode="never"
)
demo.launch()
```

Run this code and you have a complete interface for chatting with GPT. Gradio has no idea it's calling an LLM—from its perspective, this is just another callback function. It handles the interface, you handle the AI logic.

## Working with Markdown

Language models often produce more readable output when instructed to use Markdown formatting. Modify the system message sent by `message_gpt` (the function needs to read this variable rather than a hardcoded string):

```python
system_message = "You are a helpful assistant that responds in Markdown without code blocks."
```

Then change the output component from `gr.Textbox` to `gr.Markdown` so the response is rendered rather than shown as raw text:

```python
demo = gr.Interface(
    fn=message_gpt,
    title="GPT",
    inputs=gr.Textbox(label="Your message", placeholder="Enter a message for GPT"),
    outputs=gr.Markdown(label="Response"),
    examples=["hello", "What is machine learning?"],
    flagging_mode="never"
)
demo.launch()
```

Now responses display with proper formatting—bold text, headings, lists, and other Markdown features render correctly. The instruction to avoid code blocks prevents the model from wrapping its entire response in triple backticks, which would interfere with proper rendering.
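
Models do not always follow the instruction, so a defensive fallback can help. The helper below is a minimal sketch, not part of Gradio or the OpenAI client: the name `strip_outer_code_fence` is hypothetical, and it only handles the simple case where the entire response sits inside one fence.

```python
FENCE = "`" * 3  # three backticks, built this way so the example stays readable

def strip_outer_code_fence(text):
    """Remove one wrapping code fence if the whole response sits inside it."""
    lines = text.strip().splitlines()
    wrapped = (
        len(lines) >= 2
        and lines[0].startswith(FENCE)
        and lines[-1].strip() == FENCE
        # Leave the text alone if it has fences inside; it may hold several blocks.
        and not any(line.strip().startswith(FENCE) for line in lines[1:-1])
    )
    return "\n".join(lines[1:-1]) if wrapped else text

wrapped_reply = FENCE + "markdown\n# Title\n\nSome **bold** text.\n" + FENCE
print(strip_outer_code_fence(wrapped_reply))
```

A callback could apply this to the model's reply before returning it, leaving responses that contain their own code blocks untouched.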

Test this with different prompts:

- "Explain the transformer architecture to a layperson"
- "Explain the transformer architecture to an aspiring AI engineer"

The formatted responses demonstrate how Markdown improves readability significantly compared to plain text.

## Implementing Streaming Responses

When generating longer content, waiting for the complete response before displaying anything creates poor user experience. Streaming shows content as it's generated, creating the familiar typewriter effect.

Gradio handles streaming elegantly through Python generators. Create a generator function that yields progressively complete responses:


```python
def stream_gpt(prompt):
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": prompt}
    ]

    # stream=True returns an iterable of chunks instead of one complete response
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )

    result = ""
    for chunk in response:
        # Some chunks carry no text (content is None), so skip those
        if chunk.choices[0].delta.content:
            result += chunk.choices[0].delta.content
            yield result
```

The key differences: `stream=True` in the API call returns a stream object rather than a complete response. Iterating over this stream gives you chunks of content. Each chunk gets added to the accumulated result, and you yield the full accumulated text (not just the new chunk).
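
The accumulate-and-yield pattern can be tried without an API key. In this sketch, `fake_stream` is a hypothetical stand-in for the streaming response, and `accumulate` mirrors the loop in `stream_gpt`:

```python
def fake_stream():
    """Hypothetical stand-in for a streaming API response."""
    for piece in ["Hel", "lo, ", None, "world", "!"]:
        yield piece  # None mimics a chunk that carries no text

def accumulate(chunks):
    result = ""
    for piece in chunks:
        if piece:          # skip empty or None pieces
            result += piece
            yield result   # yield the full text so far, not just the new piece

for partial in accumulate(fake_stream()):
    print(partial)
```

Yielding the full text each time matters because each yielded value replaces what the output component displays; yielding only the new piece would show fragments instead of a growing response.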

Gradio recognizes generator functions automatically. When your callback is a generator function, Gradio calls it once, iterates over the generator it returns, and updates the display with each yielded value. This creates the streaming effect without any additional configuration:


```python
demo = gr.Interface(
    fn=stream_gpt,  # Generator function
    title="GPT (Streaming)",
    inputs=gr.Textbox(label="Your message"),
    outputs=gr.Markdown(label="Response"),
    examples=["Explain quantum computing", "What is machine learning?"],
    flagging_mode="never"
)
demo.launch()
```

The interface looks identical, but now responses stream in progressively rather than appearing all at once.

## Supporting Multiple Models

Create a single interface that routes to different models based on user selection:


```python
def stream_model(prompt, model):
    if model == "GPT":
        result = stream_gpt(prompt)
    elif model == "Claude":
        # No Claude client is configured in this example, so fall back to GPT.
        # With one available, this would be: result = stream_claude(prompt)
        result = stream_gpt(prompt)
    else:
        raise ValueError(f"Unknown model: {model}")

    yield from result
```

The `yield from result` statement is shorthand for `for item in result: yield item`—it passes through all values from the nested generator.
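
A small self-contained sketch shows the equivalence, along with one consequence of routing inside a generator function: the model check does not run until the first value is requested. The function names here are illustrative only.

```python
def numbers():
    yield 1
    yield 2
    yield 3

def with_loop():
    for item in numbers():
        yield item

def with_yield_from():
    yield from numbers()

def route(model):
    if model not in ("GPT", "Claude"):
        raise ValueError(f"Unknown model: {model}")
    yield from numbers()

print(list(with_loop()))        # [1, 2, 3]
print(list(with_yield_from()))  # [1, 2, 3]

gen = route("Other")  # no error yet: the body has not started running
try:
    next(gen)         # the check runs here, on the first request for a value
except ValueError as err:
    print(err)        # Unknown model: Other
```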

Now create an interface with multiple inputs:


```python
message_input = gr.Textbox(
    label="Your message",
    placeholder="Enter a message for the LLM"
)

model_selector = gr.Dropdown(
    choices=["GPT", "Claude"],
    value="GPT",
    label="Select Model"
)

demo = gr.Interface(
    fn=stream_model,
    title="LLMs",
    inputs=[message_input, model_selector],
    outputs=gr.Markdown(label="Response"),
    examples=[
        ["Explain transformers to a layperson", "GPT"],
        ["Explain transformers to an AI engineer", "Claude"]
    ],
    flagging_mode="never"
)
demo.launch()
```

Notice that `inputs` is now a list containing both the textbox and the dropdown. `examples` must likewise be a list of lists, where each inner list provides one value per input, in the same order as `inputs`.
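
Conceptually, each example row lines up with the callback's parameters by position. The sketch below illustrates that mapping with a hypothetical `stream_model_stub` that returns a string instead of streaming; it does not use Gradio itself.

```python
def stream_model_stub(prompt, model):
    """Hypothetical stand-in for stream_model; returns instead of streaming."""
    return f"[{model}] {prompt}"

examples = [
    ["Explain transformers to a layperson", "GPT"],
    ["Explain transformers to an AI engineer", "Claude"],
]

for row in examples:
    # Values are matched by position: row[0] -> prompt, row[1] -> model
    print(stream_model_stub(*row))
```

Swapping the order of the components in `inputs` without also swapping the values in each example row would send the model name in as the prompt.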

Users can now select their preferred model from a dropdown before submitting their question. The same interface seamlessly routes to different backend services based on user choice.
