# Fireworks OpenAI Response API with Previous Response ID

This notebook demonstrates how to use Fireworks' OpenAI-compatible Response API with the **previous response ID** feature (the `previous_response_id` parameter). This feature lets you:

- **Continue conversations** from where you left off
- **Reference previous responses** without re-sending full context
- **Build multi-turn workflows** efficiently
- **Chain complex reasoning** across multiple API calls

## Key Benefits
- 🔗 **Conversation Continuity**: Maintain context across API calls
- 💰 **Cost Efficiency**: Avoid re-sending large contexts
- ⚡ **Performance**: Smaller request payloads, since earlier turns are referenced by ID rather than re-sent
- 🏗️ **Workflow Building**: Create sophisticated multi-step processes


In [1]:
# Install dependencies
%pip install openai -q


Note: you may need to restart the kernel to use updated packages.


In [2]:
# Setup and Configuration
import os
import json
from openai import OpenAI
from pprint import pprint

# Initialize Fireworks client
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY", "YOUR_FIREWORKS_API_KEY_HERE")
)


## 🚀 Example 1: Basic Response API Usage

Let's start with a basic example to understand the Response API structure and how to capture response IDs.
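As a minimal sketch of the "capture the ID and the text" step, the helper below pulls both values out of a response-shaped object. It assumes the shape used in this notebook: an `id`, plus an `output` list whose message items carry `content[0].text`, possibly prefixed with a `<think>...</think>` block. The name `extract_answer` and the stand-in `fake_response` object are hypothetical and exist only for illustration; no API call is made.

```python
from types import SimpleNamespace


def extract_answer(response):
    """Return (response_id, final_text) from a response-shaped object.

    Scans output items from last to first for one that carries text content,
    then drops everything up to and including a closing </think> tag, if present.
    """
    for item in reversed(response.output):
        content = getattr(item, "content", None)
        if not content:
            continue  # e.g. tool-call items that have no message content
        text = getattr(content[0], "text", None)
        if text is None:
            continue
        return response.id, text.split("</think>")[-1].strip()
    return response.id, ""


# Stand-in object shaped like a response; the values are made up for illustration.
fake_response = SimpleNamespace(
    id="resp_example",
    output=[
        SimpleNamespace(type="mcp_call"),
        SimpleNamespace(
            type="message",
            content=[SimpleNamespace(text="<think>plan</think>\n\nFinal answer.")],
        ),
    ],
)

print(extract_answer(fake_response))  # ('resp_example', 'Final answer.')
```

Unlike indexing `output[-1]` directly, this version skips trailing items that have no text content, which may matter if a tool item happens to come last.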


In [3]:
# Example 1: Basic Response API call to establish initial context
print("🎯 Step 1: Creating initial response with context")
print("=" * 60)

# Make the first API call
initial_response = client.responses.create(
    model="accounts/fireworks/models/qwen3-235b-a22b",
    tools=[
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="What transport protocols are supported in the 2025-03-26 version of the MCP spec? Use https://github.com/modelcontextprotocol/modelcontextprotocol as the source of truth.",
)

# Extract and display the response
response_text = initial_response.output[-1].content[0].text.split("</think>")[-1]
response_id = initial_response.id

print(f"📝 Response ID: {response_id}")
print("💬 Response Content:")
print(response_text)
print("\n" + "=" * 60)

# Store response ID for next example
previous_response_id = response_id
print(f"💾 Stored response ID for continuation: {previous_response_id}")


🎯 Step 1: Creating initial response with context
📝 Response ID: resp_7ac894680c784fb1a80b7ca2a4f78f5f
💬 Response Content:


The 2025-03-26 version of the Model Context Protocol (MCP) specification supports the following transport protocols:

1. **STDIO Transport** 
   - Used for local process communication
   - Required for local integrations
   - Messages are delimited by newlines
   - Server may write logs to `stderr` for debugging

2. **Streamable HTTP Transport** 
   - Provides remote server connections
   - Implements HTTP POST for requests/response
   - Uses Server-Sent Events (SSE) for server-to-client streaming
   - Supports session management with `Mcp-Session-Id` headers
   - Compatible with web browser and remote service deployments

The specification explicitly states that clients **MUST** support the stdio transport unless otherwise constrained, ensuring local integrations remain universally accessible. Both transport layers support bidirectional JSON-RPC 2.0 messaging as 

## 🔗 Example 2: Using Previous Response ID

Now let's use the previous response ID to continue the conversation without re-sending the entire context.
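The chaining pattern can be sketched independently of any one call: each request carries the `id` of the response before it, and the first request carries none. The helpers `build_request` and `run_turns` below are hypothetical names, and `fake_create` is a stub standing in for `client.responses.create` so the sketch runs without network access.

```python
from types import SimpleNamespace


def build_request(model, prompt, tools=None, previous_response_id=None):
    """Assemble keyword arguments for a responses.create-style call."""
    kwargs = {"model": model, "input": prompt}
    if tools:
        kwargs["tools"] = tools
    # Only the first turn omits the link to an earlier response
    if previous_response_id is not None:
        kwargs["previous_response_id"] = previous_response_id
    return kwargs


def run_turns(create, model, prompts, tools=None):
    """Send prompts in order, linking each call to the response before it."""
    previous_id = None
    ids = []
    for prompt in prompts:
        response = create(**build_request(model, prompt, tools, previous_id))
        previous_id = response.id
        ids.append(previous_id)
    return ids


# Stub that records what it was called with and returns a made-up ID.
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    return SimpleNamespace(id=f"resp_{len(calls)}")


ids = run_turns(fake_create, "example-model", ["first question", "follow-up"])
print(ids)                               # ['resp_1', 'resp_2']
print(calls[1]["previous_response_id"])  # resp_1
```

With the real client from the setup cell, the same loop would presumably be invoked as `run_turns(client.responses.create, model_name, prompts, tools)`.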


In [4]:
# Example 2: Continue conversation using previous response ID
print("🔄 Step 2: Continuing conversation with previous response ID")
print("=" * 60)

# Continue the conversation by referencing the previous response
continuation_response = client.responses.create(
    model="accounts/fireworks/models/qwen3-235b-a22b",
    input="Great! Now I want to focus specifically on the second transport protocol that you mentioned. Can you explain why I should care as an application developer?",
    tools=[
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    previous_response_id=previous_response_id  # This is the key parameter!
)

# Extract and display the continuation
continuation_text = continuation_response.output[-1].content[0].text.split("</think>")[-1]
continuation_id = continuation_response.id

print(f"📝 New Response ID: {continuation_id}")
print(f"🔗 Referenced Previous ID: {previous_response_id}")
print("💬 Continuation Response:")
print(continuation_text)
print("\n" + "=" * 60)


🔄 Step 2: Continuing conversation with previous response ID
📝 New Response ID: resp_3b0887d6e36a47a08c1b4010f9257b5d
🔗 Referenced Previous ID: resp_7ac894680c784fb1a80b7ca2a4f78f5f
💬 Continuation Response:


As an application developer, you should care about the **Streamable HTTP Transport** (HTTP/SSE) in the 2025-03-26 MCP specification for the following **critical reasons**:

---

### 1. **Remote Server Deployment = Cloud-Native Flexibility**
- **Why It Matters**: The HTTP/SSE transport allows MCP servers to run as **independent remote services** rather than local subprocesses. This decouples clients (like IDEs or web apps) from server logic, enabling:
  - **Scalable microservices**: Deploy MCP servers as REST APIs for centralized tool execution (e.g., a multi-tenant code-generation service).
  - **Hybrid architectures**: Use MCP clients in mobile/desktop apps while servers run serverlessly (e.g., AWS Lambda or Cloudflare Workers).
  - Example: A VS Code extension uses MCP over HTTP 

Here is a visualization of the overall flow:

![overall flow](./image.png)

In [5]:
# Install dependencies
%pip install rich -q


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Another way to visualize the conversation history: render each item with rich
from rich.console import Console
from rich.markup import escape
from rich.panel import Panel
from rich.syntax import Syntax

console = Console()

# Retrieve the stored response and render its output items one by one
continuation_response = client.responses.retrieve(continuation_id)
for msg in continuation_response.output:
    # Tool-related items may not have a role, so read it defensively
    role = getattr(msg, "role", None)
    if role == "user":
        console.print(Panel(msg.content[0].text, title="[bold green]👤 User[/]", border_style="green"))
    elif role == "assistant":
        text = msg.content[0].text
        # Split only when a complete <think>...</think> block is present
        if "<think>" in text and "</think>" in text:
            think_block, text = text.split("</think>", 1)
            console.print(Panel(Syntax(think_block + "</think>", "xml", theme="monokai"), title="[bold blue]🤖 Assistant (Thought Process)[/]", border_style="blue"))
        if text.strip():
            console.print(Panel(text.strip(), title="[bold blue]🤖 Assistant (Final Answer)[/]", border_style="cyan"))
    elif getattr(msg, "type", None) == "tool_output":
        # Escape the raw tool output so stray square brackets are not parsed as rich markup
        console.print(Panel(f"[dim]{escape(msg.output[:500])}...[/dim]", title=f"[bold yellow]🛠️ Tool Output (for {msg.tool_call_id})[/]", border_style="yellow"))
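The cell above retrieves a single response. If retrieved responses also expose a `previous_response_id` field (as the OpenAI SDK's response object does; whether Fireworks populates it is an assumption here), the whole chain of turns can be recovered by following those links backwards. The helper name `walk_chain` and the in-memory `store` are hypothetical and stand in for `client.responses.retrieve`.

```python
from types import SimpleNamespace


def walk_chain(retrieve, response_id, max_depth=50):
    """Return response IDs ordered from the oldest turn to `response_id`."""
    chain = []
    seen = set()
    current = response_id
    # Stop at the root, on a cycle, or once max_depth IDs have been collected
    while current is not None and current not in seen and len(chain) < max_depth:
        seen.add(current)
        chain.append(current)
        response = retrieve(current)
        current = getattr(response, "previous_response_id", None)
    return list(reversed(chain))


# In-memory stand-in for stored responses; the IDs are made up for illustration.
store = {
    "resp_a": SimpleNamespace(id="resp_a", previous_response_id=None),
    "resp_b": SimpleNamespace(id="resp_b", previous_response_id="resp_a"),
}

print(walk_chain(store.__getitem__, "resp_b"))  # ['resp_a', 'resp_b']
```

Against the live API this would be called as `walk_chain(client.responses.retrieve, continuation_id)`, at the cost of one retrieve call per turn.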