
[BUG] Tools are not invoked when used with vLLM's OpenAI compatible endpoint #1022

@ssbusc1

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched the existing issues and there are no duplicates of my issue

Strands Version

1.12.0

Python Version

3.13.5

Operating System

Ubuntu 22.04.4 LTS

Installation Method

pip

Steps to Reproduce

I have a vLLM server with a Qwen3-coder model and am hitting the OpenAI compatible endpoint.
Steps to reproduce:

  • pip install strands-agents[openai]
  • Configure the OpenAI model provider with the vLLM base_url and model (see below)
  • Configure strands with the OpenAI model provider
  • Invoke the agent
from strands import Agent, tool
from strands.models.openai import OpenAIModel

@tool
def get_weather(city: str) -> str:
    """Returns weather info for the specified city."""
    return f"The weather in {city} is sunny with a high of 85F and a low of 58F"

model = OpenAIModel(
    client_args={
        "api_key": "EMPTY",
        "base_url": "<my base url>",
    },
    model_id="<my Qwen3-coder model path>",
    params={
        "max_tokens": 1000,
        "temperature": 0.7,
    },
)

# Initialize the agent with the model, tools, and a system prompt
agent = Agent(
    model=model,
    tools=[get_weather],
    system_prompt=(
        "You are the chief meteorologist at a TV station. When asked, you should "
        "look up the weather in the specified city and provide a colorful response."
    ),
)
result = agent("What is the weather in San Francisco?")
print(result.message)

I've also tried replacing the get_weather tool with the calculator tool from the strands-agents-tools package, and that doesn't invoke the tool either.

Expected Behavior

"The weather in San Francisco is sunny with a high of 85F and a low of 58F" or something like that.

FWIW, the same tool used with the openai-agents SDK (with the relevant annotation) against the same vLLM server + model combination generates the weather report as expected.

Actual Behavior

The result of running the code prints out:

<tool_call>
<function=get_weather>
<parameter=city>
San Francisco
</parameter>
</function>
</tool_call>

instead of the weather.

Here's the entire result:

AgentResult(stop_reason='end_turn',
            message={'content': [{'text': '<tool_call>\n'
                                          '<function=get_weather>\n'
                                          '<parameter=city>\n'
                                          'San Francisco\n'
                                          '</parameter>\n'
                                          '</function>\n'
                                          '</tool_call>'}],
                     'role': 'assistant'},
            metrics=EventLoopMetrics(cycle_count=1,
                                     tool_metrics={},
                                     cycle_durations=[0.3705911636352539],
                                     traces=[<strands.telemetry.metrics.Trace object at 0x7fc473363460>],
                                     accumulated_usage={'inputTokens': 310,
                                                        'outputTokens': 23,
                                                        'totalTokens': 333},
                                     accumulated_metrics={'latencyMs': 0}),
            state={})

Additional Context

I also issued a direct chat.completions request against the server using the OpenAI client. Here's that response:

ChatCompletion(id='chatcmpl-6f6f42cf28014456893aee0f6fa61b2e',
               choices=[Choice(finish_reason='tool_calls',
                               index=0,
                               logprobs=None,
                               message=ChatCompletionMessage(content=None,
                                                             refusal=None,
                                                             role='assistant',
                                                             annotations=None,
                                                             audio=None,
                                                             function_call=None,
                                                             tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-15fa2e41b38a4a5aadece55724056f4a',
                                                                                                              function=Function(arguments='{"city": "San Francisco"}',
                                                                                                                                name='get_weather'),
                                                                                                              type='function')],
                                                             reasoning_content=None),
                               stop_reason=None,
                               token_ids=None)],
               created=1760142049,
               model='<my Qwen3-coder model name>',
               object='chat.completion',
               service_tier=None,
               system_fingerprint=None,
               usage=CompletionUsage(completion_tokens=23,
                                     prompt_tokens=300,
                                     total_tokens=323,
                                     completion_tokens_details=None,
                                     prompt_tokens_details=None),
               prompt_logprobs=None,
               prompt_token_ids=None,
               kv_transfer_params=None)

The finish_reason is set to tool_calls as expected, and the tool call and arguments seem fine to me.
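For reference, the direct request can be reproduced with a sketch like the one below. The endpoint is a placeholder (read from an assumed `VLLM_BASE_URL` environment variable), the model path is the same placeholder as above, and the `tools` schema is my hand-written equivalent of what the `@tool` decorator should generate for `get_weather`:

```python
import os

# Hand-written JSON schema for the get_weather tool, mirroring its signature.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "returns weather info for the specified city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Placeholder endpoint; set VLLM_BASE_URL to run against a live vLLM server.
base_url = os.environ.get("VLLM_BASE_URL")
if base_url:
    from openai import OpenAI

    client = OpenAI(api_key="EMPTY", base_url=base_url)
    response = client.chat.completions.create(
        model="<my Qwen3-coder model path>",
        messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
        tools=tools,
    )
    # When the server parses tool calls, finish_reason should be 'tool_calls'.
    print(response.choices[0].finish_reason)
```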

Possible Solution

No response

Related Issues

No response

Labels

bug (Something isn't working)