# Fireworks Platform Demo

A fast tour of Fireworks with runnable examples.

You’ll explore:
- Model Library and UI
- OpenAI-compatible SDK for text, vision, and embeddings
- Structured output (JSON, Grammar) and function calling
- Deployments with firectl and a quick tour of fine-tuning

Helpful links:
- Fireworks Docs: [Getting started](https://fireworks.ai/docs/getting-started/introduction) · [Model Library](https://fireworks.ai/models)

## Setup

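This demo uses the Fireworks Python SDK (`fireworks-ai`) and the OpenAI Python SDK. Before running the cells, set the `FIREWORKS_API_KEY` environment variable, either in your shell or in a `.env` file (some cells load it with `python-dotenv`).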

Install the dependencies with uv, a fast Python package manager:

```bash
uv venv .venv
source .venv/bin/activate
uv pip install --upgrade fireworks-ai openai python-dotenv requests pillow numpy pydantic umap-learn matplotlib plotly
```
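If `FIREWORKS_API_KEY` is missing, the cells below fail with a bare `KeyError`. A minimal sketch of a friendlier check, assuming the key lives in your shell environment or a `.env` file; `require_env` is a hypothetical helper, not part of either SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a readable error if unset/blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or add it to a .env file "
            "and call dotenv.load_dotenv() before creating a client."
        )
    return value


# Usage: fail fast with a clear message instead of a KeyError deep inside a later cell.
try:
    api_key = require_env("FIREWORKS_API_KEY")
    print("FIREWORKS_API_KEY is set")
except RuntimeError as err:
    print(err)
```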


## Section 1: Querying text models

Fireworks supports an OpenAI-compatible API. Point the OpenAI client at Fireworks by setting the base URL, then choose a model from the Model Library.

- Base URL: `https://api.fireworks.ai/inference/v1`
- Example model: `accounts/fireworks/models/deepseek-v3p1`
- Docs: [Querying text models](https://fireworks.ai/docs/guides/querying-text-models)
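Because the API is OpenAI-compatible, the SDK call in the next cell wraps an HTTP POST to `{base_url}/chat/completions` with a JSON body, the same endpoint the deployment example in Section 2 calls directly with `requests`. A minimal sketch that only builds and prints that request without sending it; `build_chat_request` is a hypothetical helper, and which optional sampling fields you include is up to you.

```python
import json

BASE_URL = "https://api.fireworks.ai/inference/v1"


def build_chat_request(model: str, prompt: str, **params) -> tuple[str, dict]:
    """Return (url, JSON body) for an OpenAI-style chat completions request."""
    url = f"{BASE_URL}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **params,  # optional settings such as temperature or max_tokens
    }
    return url, body


url, body = build_chat_request(
    "accounts/fireworks/models/deepseek-v3p1",
    "Write a haiku about a dog who likes AI",
    temperature=0.6,
)
print("POST", url)
print(json.dumps(body, indent=2))
# Sending it also requires the header "Authorization: Bearer <FIREWORKS_API_KEY>".
```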


In [None]:
# Text: Fireworks via OpenAI-compatible client
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.environ["FIREWORKS_API_KEY"], base_url="https://api.fireworks.ai/inference/v1")

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Write a haiku about a dog who likes AI"}],
)
print(resp.choices[0].message.content)


## Section 1: Vision-language (image + text)

Send an image and a prompt to a multi-modal model.

- Example model: `accounts/fireworks/models/qwen2p5-vl-32b-instruct`
- Docs: [Vision language models](https://fireworks.ai/docs/guides/querying-vision-language-models)
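The image travels inline as a base64 `data:` URL, and the MIME type in that URL should match the actual file format (a `.png` file is `image/png`, a `.jpg` file is `image/jpeg`). A small standard-library sketch; `to_data_url` is a hypothetical helper, and raising on non-image extensions is a design choice you may want to change.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: str) -> str:
    """Encode a local image as a data URL, guessing the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {image_path!r}")
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"


# Usage: put the result in {"type": "image_url", "image_url": {"url": ...}}.
# url = to_data_url("cat-in-a-hat.png")  # starts with "data:image/png;base64,"
```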


In [None]:
# Vision-language: local image example
import os, base64
from openai import OpenAI
from matplotlib import image as mplimg
from matplotlib import pyplot as plt

client = OpenAI(api_key=os.environ["FIREWORKS_API_KEY"], base_url="https://api.fireworks.ai/inference/v1")

image_path = "cat-in-a-hat.png"
img = mplimg.imread(image_path)
plt.imshow(img)
plt.axis("off")
plt.show()

with open(image_path, "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one concise sentence."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }
]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=messages,
)
print(resp.choices[0].message.content)


## Section 1: Embeddings

Turn text into vectors for search and RAG.

- Example model: `fireworks/qwen3-30b-a3b`
- Docs: [Embeddings](https://docs.fireworks.ai/guides/querying-embeddings-models)
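For search and RAG, a common pattern is to embed the query and each document, then rank documents by cosine similarity to the query. A minimal NumPy sketch; the 3-dimensional vectors are toy values standing in for real embeddings (which have many more dimensions), and `rank_by_cosine` is a hypothetical helper.

```python
import numpy as np


def rank_by_cosine(query: np.ndarray, docs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return document indices sorted by descending cosine similarity, plus the raw scores."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document row with the query
    order = np.argsort(-scores)
    return order, scores


# Toy vectors; in practice use [item.embedding for item in response.data].
query_vec = np.array([1.0, 0.0, 0.2])
doc_vecs = np.array([
    [0.9, 0.1, 0.1],   # points roughly the same way as the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [-1.0, 0.0, 0.0],  # points roughly the opposite way
])
order, scores = rank_by_cosine(query_vec, doc_vecs)
print(order, np.round(scores, 3))
```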


In [None]:
# Embeddings with Fireworks (OpenAI-compatible)
import os
from openai import OpenAI

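client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)
response = client.embeddings.create(
    model="fireworks/qwen3-30b-a3b",
    input="Spiderman was a particularly entertaining movie.",
)

# Each input string yields one vector in response.data
vector = response.data[0].embedding
print(f"Embedding dimension: {len(vector)}")
print(vector[:5])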


In [None]:
import os
import numpy as np
import umap
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")


# Fireworks (OpenAI-compatible) client
from openai import OpenAI
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

# 1) Fifteen texts across 3 semantic clusters (5 each)
texts = [
    # Cluster A: Movies / Superheroes (5)
    "The latest Spider-Man reboot had dazzling visuals and witty dialogue, especially during the city-swing chase.",
    "Critics praised the superhero film’s character arc and the emotional stakes between the masked hero and his mentor.",
    "I loved the comic-book set pieces, the web-swinging, and the playful banter that kept the action energetic.",
    "The villain’s motivations felt grounded, and the mid-credits scene set up a bold twist for the next installment.",
    "Sound design in the rooftop battle mixed orchestral swells with gritty street noise to heighten tension.",

    # Cluster B: Cooking / Pasta (5)
    "To make cacio e pepe, toast the pepper, add starchy pasta water, then fold in finely grated Pecorino until silky.",
    "Fresh tagliatelle with slow-simmered ragù develops depth—start with onions, carrots, celery, and a gentle reduction.",
    "Al dente spaghetti with cherry tomatoes, basil, and garlic shines when you emulsify the sauce with pasta water.",
    "Finish carbonara off-heat so the eggs turn glossy, then adjust with a splash of pasta water to keep it velvety.",
    "For pesto, chill the bowl and pulse briefly to prevent bruising the basil; loosen with a ladle of hot pasta water.",

    # Cluster C: Outdoors / Hiking (5)
    "The ridge hike offers sweeping valley views; pack layers, check the forecast, and bring enough water for the climb.",
    "We pitched the tent near a quiet alpine lake, then followed switchbacks to the summit just before golden hour.",
    "Trail etiquette matters—yield to uphill hikers, stay on marked paths, and carry out everything you bring in.",
    "Microspikes helped on icy sections above tree line; we navigated cairns until the clouds finally broke.",
    "A pre-dawn start kept us cool; we logged GPS waypoints and kept snacks handy for the last steep push.",
]

labels = np.array(
    ["Movies/Comics"] * 5 +
    ["Cooking/Pasta"] * 5 +
    ["Hiking/Outdoors"] * 5
)

# 2) Batch embeddings
emb_resp = client.embeddings.create(
    model="fireworks/qwen3-30b-a3b",
    input=texts
)
embeddings = np.array([d.embedding for d in emb_resp.data], dtype=np.float32)

# 3) UMAP to 3D (cosine distance works well for text embeddings)
reducer = umap.UMAP(n_components=3, metric="cosine", random_state=42)
xyz = reducer.fit_transform(embeddings)

# 4) Interactive 3D plot with Plotly
fig = px.scatter_3d(
    x=xyz[:, 0],
    y=xyz[:, 1],
    z=xyz[:, 2],
    color=labels,
    hover_name=[f"{i+1}. {lbl}" for i, lbl in enumerate(labels)],
    hover_data={"Text": texts},
    title="UMAP (3D) of 15 Text Embeddings"
)
fig.update_traces(marker=dict(size=6, opacity=0.9))
fig.update_layout(width=900, height=500, scene=dict(xaxis_title="UMAP-1", yaxis_title="UMAP-2", zaxis_title="UMAP-3"))
fig.show()

## Section 1: Function calling

Let the model request a tool with arguments; you execute it and feed the result back.

- Docs: [Tool / Function calling](https://fireworks.ai/docs/guides/function-calling)
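The round trip is: the model returns one or more `tool_calls`, you run each one, and you append one `"role": "tool"` message per call, linked by its `tool_call_id`, before asking the model again. A minimal, SDK-independent sketch of the dispatch step, assuming tool calls in the OpenAI-style dict shape (`id`, `function.name`, and `function.arguments` as a JSON string); `dispatch_tool_calls` and the sample call are hypothetical.

```python
import json
from typing import Any, Callable


def dispatch_tool_calls(tool_calls: list[dict], registry: dict[str, Callable[..., Any]]) -> list[dict]:
    """Run every requested tool and return one 'tool' message per call."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"] or "{}")
        if name in registry:
            output = registry[name](**args)
        else:
            output = f"Unknown tool: {name}"
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request that produced it
            "content": output if isinstance(output, str) else json.dumps(output),
        })
    return results


# Usage with a hand-written tool call (not real model output).
fake_calls = [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "count_letter_occurrences", "arguments": '{"word": "apple", "letter": "p"}'},
}]
registry = {"count_letter_occurrences": lambda word, letter: word.count(letter)}
print(dispatch_tool_calls(fake_calls, registry))
```

The cells below convert SDK objects with `model_dump()`; for OpenAI-style tool-call objects that yields a dict of this shape.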


In [None]:
import os
from fireworks import LLM

# Local Python implementations of the tools the model can request
def get_weather(location: str) -> str:
    """Get current weather for a location"""
    # Mock weather data
    weather_data = {
        "New York": "Sunny, 72°F",
        "London": "Cloudy, 15°C",
        "Tokyo": "Rainy, 20°C"
    }
    return weather_data.get(location, "Weather data not available")


def count_letter_occurrences(word: str, letter: str) -> int:
    """Count how many times a given letter occurs in a word"""
    if not word or not letter:
        return 0
    return word.count(letter)


# Available functions mapping
available_functions = {
    "get_weather": get_weather,
    "count_letter_occurrences": count_letter_occurrences
}

# Tool schemas sent to the LLM (OpenAI-style "tools" format)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "count_letter_occurrences",
            "description": "Count how many times a given letter occurs in a word",
            "parameters": {
                "type": "object",
                "properties": {
                    "word": {
                        "type": "string",
                        "description": "The word to check"
                    },
                    "letter": {
                        "type": "string",
                        "description": "The letter to count"
                    }
                },
                "required": ["word", "letter"]
            }
        }
    }
]

# Initialize LLM
llm = LLM(model="accounts/fireworks/models/glm-4p5", deployment_type="serverless", api_key=os.environ["FIREWORKS_API_KEY"])


In [None]:
# Example 1: Weather query
import json

print("=== Example 1: Weather Query ===")

# Initialize the messages list
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You have access to a couple of tools, use them when needed."
    },
    {
        "role": "user",
        "content": "What's the weather like in Tokyo?"
    }
]

response = llm.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)

# Check if the model wants to call a tool/function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"LLM wants to call: {function_name}")
    print(f"With arguments: {function_args}")

    # Execute the function
    function_response = available_functions[function_name](**function_args)
    print(f"Function result: {function_response}")

    # Add the assistant's tool call (only the one we executed) to the conversation
    messages.append({
        "role": "assistant",
        "content": "",
        "tool_calls": [tool_call.model_dump()]
    })

    # Add the function result, linked to the call it answers via tool_call_id
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(function_response) if isinstance(function_response, dict) else str(function_response)
    })

    # Get the final response
    final_response = llm.chat.completions.create(
        messages=messages,
        tools=tools,
        temperature=0.1
    )

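    print(f"Final response: {final_response.choices[0].message.content}")
else:
    # The model answered directly without requesting a tool
    print(f"Direct answer: {response.choices[0].message.content}")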


In [None]:
# Example 2: Count letter occurrences
import json

print("\n=== Example 2: Count Letter Occurrences ===")

# Initialize messages for letter counter
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You have access to a couple of tools, use them when needed."
    },
    {
        "role": "user",
        "content": "How many times does the letter 'p' appear in the word 'apple'?"
    }
]

response = llm.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"LLM wants to call: {function_name}")
    print(f"With arguments: {function_args}")

    # Execute the function
    function_response = available_functions[function_name](**function_args)
    print(f"Function result: {function_response}")

    # Add the assistant's tool call (only the one we executed) to the conversation
    messages.append({
        "role": "assistant",
        "content": "",
        "tool_calls": [tool_call.model_dump()]
    })

    # Add the function result, linked to the call it answers via tool_call_id
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(function_response) if isinstance(function_response, dict) else str(function_response)
    })

    # Get final response
    final_response = llm.chat.completions.create(
        messages=messages,
        tools=tools,
        temperature=0.1
    )

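    print(f"Final response: {final_response.choices[0].message.content}")
else:
    # The model answered directly without requesting a tool
    print(f"Direct answer: {response.choices[0].message.content}")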


## Section 1: JSON mode

Ask for structured JSON that is easy to parse.

- Docs: [JSON mode](https://fireworks.ai/docs/structured-responses/structured-response-formatting)
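Even with a schema in `response_format`, it is worth validating the parsed result before using it; the cell below does that with Pydantic. To show what such a check amounts to, here is a minimal standard-library sketch that verifies required keys, unexpected keys, and basic types against a flat JSON Schema style dict (such as the one `ProductInfo.model_json_schema()` produces). It is not a full JSON Schema validator: it ignores nested objects, `anyOf`, and edge cases such as booleans passing as integers, and `check_against_schema` is a hypothetical helper.

```python
import json

# Maps JSON Schema primitive type names to Python types (flat schemas only).
JSON_TYPES = {"string": str, "number": (int, float), "integer": int,
              "boolean": bool, "array": list, "object": dict}


def check_against_schema(raw: str, schema: dict) -> list[str]:
    """Parse raw JSON and return a list of problems; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {k}" for k in schema.get("required", []) if k not in data]
    for key, value in data.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        t = spec.get("type")
        if isinstance(t, str) and t in JSON_TYPES and not isinstance(value, JSON_TYPES[t]):
            problems.append(f"wrong type for {key}: expected {t}")
    return problems


schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "string"},
                   "features": {"type": "array", "items": {"type": "string"}}},
    "required": ["title", "price", "features"],
}
print(check_against_schema('{"title": "BreezeMax Pro", "price": "$139", "features": []}', schema))
print(check_against_schema('{"title": "BreezeMax Pro", "price": 139}', schema))
```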


In [None]:
# Minimal JSON mode with Fireworks LLM + Pydantic
import os, json
from typing import List
from pydantic import BaseModel
from fireworks import LLM  # ✅ Fireworks SDK

# --- Schema ---
class ProductInfo(BaseModel):
    title: str
    category: str
    price: str
    features: List[str]

# --- Client ---
FIREWORKS_API_KEY = os.environ["FIREWORKS_API_KEY"]
MODEL_ID = "accounts/fireworks/models/deepseek-v3p1"

llm = LLM(model=MODEL_ID, deployment_type="serverless", api_key=FIREWORKS_API_KEY)

# --- Example product blurb ---
product_blurb = """
Introducing the Acme BreezeMax Pro, a whisper-quiet 16" smart pedestal fan.
AI auto mode adjusts airflow to room conditions. Three colors: black, white, green.
Includes a 2-year warranty. MSRP $139. Perfect for large rooms.
"""

# --- Messages ---
system_msg = {
    "role": "system",
    "content": (
        "You are a precise information extraction assistant. "
        "Return ONLY a JSON object that matches the schema. "
        "Do not include extra fields."
    ),
}
user_msg = {
    "role": "user",
    "content": f"Extract structured product information from this summary:\n\n{product_blurb}",
}

# --- Call model ---
resp = llm.chat.completions.create(
    messages=[system_msg, user_msg],
    temperature=0.1,
    response_format={"type": "json_object", "schema": ProductInfo.model_json_schema()},
)

# --- Validate ---
raw = resp.choices[0].message.content
try:
    data = json.loads(raw)
    product = ProductInfo.model_validate(data)
    print("✅ Parsed keys:", list(data.keys()))
    print(json.dumps(product.model_dump(), indent=2))
except Exception as e:
    print("Raw output:\n", raw)
    print("\nValidation error:", e)


## Section 1: Grammar mode

Constrain the output to a formal grammar (GBNF), for example a bare number, an ID format, or one of a fixed set of names.

- Docs: [Grammar mode](https://fireworks.ai/docs/structured-responses/structured-output-grammar-based)
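The grammar in the cell below accepts one or more decimal digits and nothing else, so it describes the same strings as the regular expression `[0-9]+`. Grammar mode constrains the shape of the output, not its correctness, so you may still want to parse and sanity-check it. A minimal sketch; `parse_price` is a hypothetical helper, and `COLOR_GRAMMAR` (limiting output to the three colors in the product blurb) is an illustrative GBNF string that is not sent anywhere here.

```python
import re

# Same language as the digit grammar: one or more characters 0-9, nothing else.
PRICE_PATTERN = re.compile(r"[0-9]+")

# Illustrative GBNF for choosing one of a fixed set of names (unused below).
COLOR_GRAMMAR = 'root ::= "black" | "white" | "green"'


def parse_price(text: str) -> int:
    """Convert grammar-constrained output to an int, rejecting anything that does not match."""
    candidate = text.strip()
    if not PRICE_PATTERN.fullmatch(candidate):
        raise ValueError(f"Output does not match the digit grammar: {text!r}")
    return int(candidate)


print(parse_price("139"))
```

To constrain a color answer instead, pass `response_format={"type": "grammar", "grammar": COLOR_GRAMMAR}` the same way the cell below passes `price_grammar`.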


In [None]:
import os
from fireworks import LLM

# --- Client setup ---
FIREWORKS_API_KEY = os.environ["FIREWORKS_API_KEY"]
MODEL_ID = "accounts/fireworks/models/deepseek-v3p1"

llm = LLM(model=MODEL_ID, deployment_type="serverless", api_key=FIREWORKS_API_KEY)

# --- Blurb to extract from ---
product_blurb = """
Introducing the Acme BreezeMax Pro, a whisper-quiet 16" smart pedestal fan.
AI auto mode adjusts airflow to room conditions. Three colors: black, white, green.
Includes a 2-year warranty. MSRP 139 USD. Perfect for large rooms.
"""

# --- Grammar in GBNF (a BNF dialect): one or more digits ---
price_grammar = """
root ::= number
number ::= DIGITS
DIGITS ::= DIGIT | DIGIT DIGITS
DIGIT ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
"""

# --- Messages ---
system_msg = {
    "role": "system",
    "content": (
        "You are an assistant that extracts the price from a product summary. "
        "Return ONLY the number (no currency symbol, no text)."
    )
}
user_msg = {
    "role": "user",
    "content": f"Here is the summary:\n\n{product_blurb}\n\nWhat is the price?"
}

# --- Call using grammar mode ---
resp = llm.chat.completions.create(
    messages=[system_msg, user_msg],
    temperature=0.1,
    response_format={"type": "grammar", "grammar": price_grammar},
)

# --- Output ---
print("Extracted price:", resp.choices[0].message.content)


## Section 2: Deployments with firectl

Create and manage deployments from the command line.

- Docs: [firectl](https://fireworks.ai/docs/tools-sdks/firectl)


#### Install firectl with Homebrew (macOS)

```bash
brew tap fw-ai/firectl
brew install firectl
```

#### Verify auth

```bash
firectl signin
firectl whoami
```


#### Create a small deployment (example: Qwen3 0.6B)
**Confirm the exact model ID in the Model Library for your account**

```bash
firectl create deployment accounts/fireworks/models/qwen3-0p6b
```

**List deployments**

```bash
firectl list deployments
```

In [None]:
# Send a request to your deployment
import os
import json
import requests
from dotenv import load_dotenv

load_dotenv()

url = "https://api.fireworks.ai/inference/v1/chat/completions"
payload = {
  "model": "accounts/jmiano888-83b646/deployedModels/qwen3-0p6b-rno799u3",  # replace with your own deployed model ID
  "max_tokens": 5120,
  "top_p": 1,
  "top_k": 40,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "temperature": 0.6,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
headers = {
  "Accept": "application/json",
  "Content-Type": "application/json",
  # Use the API key of the account that owns the deployment
  "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"
}
resp = requests.post(url, headers=headers, data=json.dumps(payload), timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

## Section 3: Fine-tuning overview

Approaches:
- Supervised fine-tuning (SFT)
- Reinforcement fine-tuning (RFT)
- Direct Preference Optimization (DPO)

Goal: get a small, specialized model to match a larger general model on your task.

Docs:
- [Fine-tuning overview](https://fireworks.ai/docs/fine-tuning/finetuning-intro)
- [Examples and guides](https://fireworks.ai/docs/examples/introduction)
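Supervised fine-tuning data is typically a JSONL file with one training conversation per line. A minimal sketch that writes such a file, assuming the chat-style format where each line is an object with a `messages` list of `role`/`content` pairs; confirm the exact schema and upload steps in the fine-tuning docs above. The two rows are toy examples reusing this demo's prompts, the file name `sft_train.jsonl` is hypothetical, and requiring each row to end with an assistant turn is an assumed sanity check rather than a documented rule.

```python
import json
from pathlib import Path

# Toy examples; a real dataset needs many more rows that represent your task.
examples = [
    {"messages": [
        {"role": "system", "content": "You extract prices. Reply with digits only."},
        {"role": "user", "content": "Acme BreezeMax Pro smart fan. MSRP 139 USD."},
        {"role": "assistant", "content": "139"},
    ]},
    {"messages": [
        {"role": "user", "content": "How many times does the letter 'p' appear in 'apple'?"},
        {"role": "assistant", "content": "2"},
    ]},
]


def write_jsonl(rows: list[dict], path: str) -> int:
    """Validate every row first, then write one JSON object per line; return the row count."""
    for i, row in enumerate(rows):
        msgs = row.get("messages") or []
        # Assumed check: the final assistant turn is the target the model learns to produce.
        if not msgs or msgs[-1].get("role") != "assistant":
            raise ValueError(f"row {i} must end with an assistant message")
    with Path(path).open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


print(write_jsonl(examples, "sft_train.jsonl"), "rows written")
```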


## Section 4: What can you build?

Examples:
- Chat assistants and copilots (knowledge-grounded, tools, JSON/Grammar outputs)
- RAG search over docs with embeddings
- Image understanding and captioning
- Meeting/phone call voice assistants (streaming STT + LLM)
- Agents that call APIs (function calling)

Call to action:
- Browse the Model Library and try state-of-the-art (SoTA) models on your use cases
- Try a new small model on your task (e.g., Qwen3 0.6B) and see how it performs; fine-tune for even better results!
