When using MCP tools, the model's final response after tool calling gets lost somewhere on the way to the conversation endpoint: the assistant message comes back empty.
Llama Stack config:
version: '2'
image_name: ollama-llama-stack-config
apis:
- agents
- inference
- safety
- telemetry
- tool_runtime
- vector_io
logging:
  level: DEBUG
  category_levels:
    llama_stack: DEBUG # Enable DEBUG for all llama_stack modules
    llama_stack.providers.remote.inference.vllm: DEBUG
    llama_stack.providers.inline.agents.meta_reference: DEBUG
    llama_stack.providers.inline.agents.meta_reference.agent_instance: DEBUG
    llama_stack.providers.inline.vector_io.faiss: DEBUG
    llama_stack.providers.inline.telemetry.meta_reference: DEBUG
    llama_stack.core: DEBUG
    llama_stack.apis: DEBUG
    uvicorn: DEBUG
    uvicorn.access: INFO
    fastapi: DEBUG
providers:
  vector_io:
  - config:
      kvstore:
        db_path: /tmp/faiss_store.db
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  agents:
  - config:
      persistence_store:
        db_path: /tmp/agents_store.db
        namespace: null
        type: sqlite
      responses_store:
        db_path: /tmp/responses_store.db
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - provider_id: vllm-inference
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=http://localhost:8000/v1}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=false}
  tool_runtime:
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config: {}
    module: null
  telemetry:
  - config:
      service_name: 'llama-stack'
      sinks: console,sqlite
      sqlite_db_path: /tmp/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
metadata_store:
  type: sqlite
  db_path: /tmp/registry.db
  namespace: null
inference_store:
  type: sqlite
  db_path: /tmp/inference_store.db
Lightspeed Stack config:
name: Test LSCore
service:
  host: 0.0.0.0
  port: 8080
  auth_enabled: false
  workers: 1
  color_log: true
  access_log: true
llama_stack:
  use_as_library_client: false
  # url: http://llama-stack:8321
  url: "http://host.docker.internal:8321"
user_data_collection:
  feedback_enabled: true
  feedback_storage: "/tmp/data/feedback"
  transcripts_enabled: true
  transcripts_storage: "/tmp/data/transcripts"
authentication:
  module: "noop"
mcp_servers:
- name: "server1"
  provider_id: "model-context-protocol"
  url: "http://host.docker.internal:8000/mcp/"
inference:
  default_model: vllm-inference/gemma3:27b-it-qat
  default_provider: vllm-inference
Initial call and conversation response:
$ -> ls::test
[DEBUG] POST http://localhost:8080/v1/query
{
  "conversation_id": "13120c75-fd5f-4182-bf94-49a65315bcf1",
  "response": ""
}
$ -> ls::conversation "13120c75-fd5f-4182-bf94-49a65315bcf1"
[DEBUG] GET http://localhost:8080/v1/conversations/13120c75-fd5f-4182-bf94-49a65315bcf1
{
  "conversation_id": "13120c75-fd5f-4182-bf94-49a65315bcf1",
  "chat_history": [
    {
      "messages": [
        {
          "content": "I need to inform my manager about how many orchestator workflows we have currently registered",
          "type": "user"
        },
        {
          "content": "",
          "type": "assistant"
        }
      ],
      "started_at": "2025-08-14T14:46:47.209657Z",
      "completed_at": "2025-08-14T14:46:49.752601Z"
    }
  ]
}
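For what it's worth, the two calls above can be scripted like this (a minimal sketch; the "query" field name in the POST body is an assumption on my part, the endpoints and IDs come from the debug output):

# Minimal sketch of the two Lightspeed calls above.
# Assumption: /v1/query accepts a JSON body with a "query" field
# (plus an optional "conversation_id"); endpoints match the debug output.
import requests

LIGHTSPEED_URL = "http://localhost:8080"

resp = requests.post(
    f"{LIGHTSPEED_URL}/v1/query",
    json={"query": "I need to inform my manager about how many "
                   "orchestrator workflows we have currently registered"},
    timeout=60,
)
resp.raise_for_status()
conversation_id = resp.json()["conversation_id"]
print(resp.json())  # "response" comes back empty here

# Fetch the stored conversation and print each message
conv = requests.get(
    f"{LIGHTSPEED_URL}/v1/conversations/{conversation_id}", timeout=60
).json()
for entry in conv["chat_history"]:
    for message in entry["messages"]:
        print(message["type"], repr(message["content"]))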
Llama Stack agents and sessions
Agents:
$ -> curl -s http://localhost:8321/v1/agents | jq .
{
  "data": [
    {
      "agent_id": "be50344d-c67f-493f-9010-d7e08da09c1c",
      "agent_config": {
        "sampling_params": {
          "strategy": {
            "type": "greedy"
          },
          "max_tokens": 0,
          "repetition_penalty": 1.0,
          "stop": null
        },
        "input_shields": [],
        "output_shields": [],
        "toolgroups": [],
        "client_tools": [],
        "tool_choice": null,
        "tool_prompt_format": null,
        "tool_config": {
          "tool_choice": "auto",
          "tool_prompt_format": null,
          "system_message_behavior": "append"
        },
        "max_infer_iters": 10,
        "model": "vllm-inference/qwen2.5-coder:7b",
        "instructions": "You are a helpful assistant",
        "name": null,
        "enable_session_persistence": true,
        "response_format": null
      },
      "created_at": "2025-08-14T14:46:47.113391Z"
    }
  ],
  "has_more": false,
  "url": "/v1/agents"
}
Session:
$ -> curl -s http://localhost:8321/v1/agents/be50344d-c67f-493f-9010-d7e08da09c1c/session/13120c75-fd5f-4182-bf94-49a65315bcf1 | jq .
{
  "session_id": "13120c75-fd5f-4182-bf94-49a65315bcf1",
  "session_name": "a8187279-b6c2-4979-83b7-a1156191cb90",
  "turns": [
    {
      "turn_id": "cc01590b-4129-4ceb-9a95-137a7000e8e6",
      "session_id": "13120c75-fd5f-4182-bf94-49a65315bcf1",
      "input_messages": [
        {
          "role": "user",
          "content": "I need to inform my manager about how many orchestator workflows we have currently registered",
          "context": null
        }
      ],
      "steps": [
        {
          "turn_id": "cc01590b-4129-4ceb-9a95-137a7000e8e6",
          "step_id": "7f6697b0-24e4-4197-a386-4c7a659eaa7d",
          "started_at": "2025-08-14T14:46:47.212324Z",
          "completed_at": "2025-08-14T14:46:47.918493Z",
          "step_type": "inference",
          "model_response": {
            "role": "assistant",
            "content": "",
            "stop_reason": "end_of_turn",
            "tool_calls": [
              {
                "call_id": "call_1g7qh2ks",
                "tool_name": "get_orchestrator_instances",
                "arguments": {
                  "session_id": "your_session_id"
                },
                "arguments_json": "{\"session_id\": \"your_session_id\"}"
              }
            ]
          }
        },
        {
          "turn_id": "cc01590b-4129-4ceb-9a95-137a7000e8e6",
          "step_id": "e902b473-ecda-42d1-8e11-b384cd978b6c",
          "started_at": "2025-08-14T14:46:47.955033Z",
          "completed_at": "2025-08-14T14:46:48.037260Z",
          "step_type": "tool_execution",
          "tool_calls": [
            {
              "call_id": "call_1g7qh2ks",
              "tool_name": "get_orchestrator_instances",
              "arguments": {
                "session_id": "your_session_id"
              },
              "arguments_json": "{\"session_id\": \"your_session_id\"}"
            }
          ],
          "tool_responses": [
            {
              "call_id": "call_1g7qh2ks",
              "tool_name": "get_orchestrator_instances",
              "content": [
                {
                  "type": "text",
                  "text": "{\"total\":5,\"data\":[{\"name\":\"process_payment_refunds\",\"status\":\"RUNNING\"},{\"name\":\"sync_inventory_updates\",\"status\":\"READY\"},{\"name\":\"generate_monthly_reports\",\"status\":\"RUNNING\"},{\"name\":\"backup_customer_data\",\"status\":\"FAILED\"},{\"name\":\"send_welcome_emails\",\"status\":\"COMPLETED\"}]}"
                }
              ],
              "metadata": null
            }
          ]
        },
        {
          "turn_id": "cc01590b-4129-4ceb-9a95-137a7000e8e6",
          "step_id": "3a069c6c-0e02-4a47-b90e-d65296923663",
          "started_at": "2025-08-14T14:46:48.049073Z",
          "completed_at": "2025-08-14T14:46:49.741719Z",
          "step_type": "inference",
          "model_response": {
            "role": "assistant",
            "content": "",
            "stop_reason": "end_of_turn",
            "tool_calls": []
          }
        }
      ],
      "output_message": {
        "role": "assistant",
        "content": "",
        "stop_reason": "end_of_turn",
        "tool_calls": []
      },
      "output_attachments": [],
      "started_at": "2025-08-14T14:46:47.209657Z",
      "completed_at": "2025-08-14T14:46:49.752601Z"
    }
  ],
  "started_at": "2025-08-14T14:46:47.127526Z"
}
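To make the mismatch easier to see, here is a small sketch that walks the session dump above (assuming it was saved to session.json, e.g. by piping the curl command to a file) and prints the tool_execution output next to the empty inference and output_message content:

# Sketch: compare the persisted tool response with the (empty) model output.
import json

with open("session.json") as f:
    session = json.load(f)

for turn in session["turns"]:
    for step in turn["steps"]:
        if step["step_type"] == "tool_execution":
            for tool_response in step["tool_responses"]:
                # The MCP tool result is clearly stored in the session...
                print("tool response:", tool_response["content"][0]["text"][:80], "...")
        elif step["step_type"] == "inference":
            # ...but both inference steps carry an empty content string.
            print("inference content:", repr(step["model_response"]["content"]))
    print("final output_message:", repr(turn["output_message"]["content"]))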
However, using tcpdump on the Ollama side I can see that it is replying correctly; it looks like Lightspeed is simply not receiving an event with the response content.
T +0.198405 REDACTED:39176 -> REDACTED:11434 [A] #104
POST /v1/chat/completions HTTP/1.1'
Host: REDACTED:11434'
Accept-Encoding: gzip, deflate'
Connection: keep-alive'
Accept: application/json'
Content-Type: application/json'
User-Agent: AsyncOpenAI/Python 1.98.0'
X-Stainless-Lang: python'
X-Stainless-Package-Version: 1.98.0'
X-Stainless-OS: Linux'
X-Stainless-Arch: x64'
X-Stainless-Runtime: CPython'
X-Stainless-Runtime-Version: 3.12.11'
Authorization: Bearer test'
X-Stainless-Async: async:asyncio'
x-stainless-retry-count: 0'
x-stainless-read-timeout: 600'
Content-Length: 1116'
'
{"messages":[{"role":"system","content":[{"type":"text","text":"You are a helpful assistant"}]},{"role":"user","content":[{"type":"text","text":"I need to inform my manager about how many orchestator workflows we have currently registered"}]},{"role":"assistant","content":[{"type":"text","text":""}],"tool_calls":[{"id":"call_1g7qh2ks","type":"function","function":{"name":"get_orchestrator_instances","arguments":"{\"session_id\": \"your_session_id\"}"}}]},{"role":"tool","content":[{"type":"text","text":"{\"total\":5,\"data\":[{\"name\":\"process_payment_refunds\",\"status\":\"RUNNING\"},{\"name\":\"sync_inventory_updates\",\"status\":\"READY\"},{\"name\":\"generate_monthly_reports\",\"status\":\"RUNNING\"},{\"name\":\"backup_customer_data\",\"status\":\"FAILED\"},{\"name\":\"send_welcome_emails\",\"status\":\"COMPLETED\"}]}"}]}],"model":"qwen2.5-coder:7b","max_tokens":4096,"stream":true,"temperature"
##
T +0.000459 REDACTED:39176 -> REDACTED:11434 [AP] #106
:0.0,"tools":[{"type":"function","function":{"name":"get_orchestrator_instances","description":"","parameters":{"type":"object","properties":{"session_id":{"type":"string"}},"required":["session_id"]}}}]}
##
T +1.601402 REDACTED:11434 -> REDACTED:39176 [AP] #108
HTTP/1.1 200 OK'
Content-Type: text/event-stream'
Date: Thu, 14 Aug 2025 14:46:49 GMT'
Transfer-Encoding: chunked'
'
23d'
data: {"id":"chatcmpl-630","object":"chat.completion.chunk","created":1755182809,"model":"qwen2.5-coder:7b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"There are currently 5 orchestator workflows registered. Here is the status of each:\n\n1. **process_payment_refunds** - Status: RUNNING\n2. **sync_inventory_updates** - Status: READY\n3. **generate_monthly_reports** - Status: RUNNING\n4. **backup_customer_data** - Status: FAILED\n5. **send_welcome_emails** - Status: COMPLETED"},"finish_reason":"stop"}]}
data: [DONE]
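To double-check the model side, this is roughly how I replay the captured request directly against Ollama's OpenAI-compatible endpoint and print the streamed deltas (the base URL and token are assumptions matching the capture; the tool result string is truncated here):

# Sketch: replay the captured chat/completions request against Ollama directly.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="test")

# Truncated version of the MCP tool output seen in the capture
TOOL_RESULT = '{"total":5,"data":[{"name":"process_payment_refunds","status":"RUNNING"}]}'

async def main():
    stream = await client.chat.completions.create(
        model="qwen2.5-coder:7b",
        stream=True,
        temperature=0.0,
        max_tokens=4096,
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "I need to inform my manager about how many "
                                        "orchestrator workflows we have currently registered"},
            {"role": "assistant", "content": "", "tool_calls": [{
                "id": "call_1g7qh2ks",
                "type": "function",
                "function": {"name": "get_orchestrator_instances",
                             "arguments": '{"session_id": "your_session_id"}'},
            }]},
            {"role": "tool", "tool_call_id": "call_1g7qh2ks", "content": TOOL_RESULT},
        ],
    )
    async for chunk in stream:
        # Print the content deltas as they arrive from the event stream
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())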
- I'm running the latest image versions from Docker/Quay.
- Nothing relevant shows up in the logs on either side.
- FastMCP is working and I can see the tool response correctly:
[{"id":"call_1g7qh2ks","type":"function","function":{"name":"get_orchestrator_instances","arguments":"{\"session_id\": \"your_session_id\"}"}}]},{"role":"tool","content":[{"type":"text","text":"{\"total\":5,\"data\":[{\"name\":\"process_payment_refunds\",\"status\":\"RUNNING\"},{\"name\":\"sync_inventory_updates\",\"status\":\"READY\"},{\"name\":\"generate_monthly_reports\",\"status\":\"RUNNING\"},{\"name\":\"backup_customer_data\",\"status\":\"FAILED\"},{\"name\":\"send_welcome_emails\",\"status\":\"COMPLETED\"}]}"}]}]
- I checked the communication from Llama Stack to the Lightspeed service; there is no response event at all.
- Is this a bug? Is this the intended way of using MCP tools, or should I define the MCP tools on Llama Stack instead (see the sketch below)?
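In case registering the MCP server on the Llama Stack side is the expected setup, this is my rough understanding of what that would look like with llama-stack-client (the toolgroup_id and the exact kwargs are assumptions on my part, not something I've verified against the docs):

# Hedged sketch: register the MCP server as a toolgroup directly on Llama Stack.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

client.toolgroups.register(
    toolgroup_id="mcp::server1",  # hypothetical name
    provider_id="model-context-protocol",
    mcp_endpoint={"uri": "http://host.docker.internal:8000/mcp/"},
)

# List registered toolgroups to confirm the MCP tools show up
print([tg.identifier for tg in client.toolgroups.list()])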
Regards