# Weather Function Calling Streaming Example

## Launching the Server
First, launch an SGLang server to handle the requests. The command below shows the general form of the launch command.
Once the server is running, an OpenAI-compatible client can query it for weather information through function calls.

```bash
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0
```
This command starts the server on port `30000`. The cells below launch their own instance on port `30111` instead, and every later request in this notebook targets that port.

In [None]:
import os

# Restrict the server to a single GPU; adjust the index to match your machine
os.environ["CUDA_VISIBLE_DEVICES"] = "5"

In [None]:
from sglang.utils import execute_shell_command, wait_for_server, terminate_process

server_process = execute_shell_command(
    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30111 --host 0.0.0.0"
)
wait_for_server("http://localhost:30111")

[2025-01-03 03:56:53] server_args=ServerArgs(model_path='meta-llama/Meta-Llama-3.1-8B-Instruct', tokenizer_path='meta-llama/Meta-Llama-3.1-8B-Instruct', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='meta-llama/Meta-Llama-3.1-8B-Instruct', chat_template=None, is_embedding=False, revision=None, return_token_ids=False, host='0.0.0.0', port=30111, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=1, stream_interval=1, random_seed=465810866, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_
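
`wait_for_server` blocks until the server responds. Its internals are not shown here, so the following is only a minimal sketch of the general polling idea, not the library's implementation. The helper name `wait_until` and its parameters are hypothetical, and a fake probe stands in for a real HTTP request so the snippet runs without a server.

```python
import time


def wait_until(probe, timeout_s=60.0, interval_s=0.5):
    """Call `probe()` repeatedly until it returns True or `timeout_s` elapses.

    Hypothetical helper for illustration; it is not part of sglang.
    """
    deadline = time.monotonic() + timeout_s
    while deadline > time.monotonic():
        try:
            if probe():
                return True
        except OSError:
            # Connection errors are expected while the server is still starting.
            pass
        time.sleep(interval_s)
    return False


# Fake probe: fails twice with a connection error, then succeeds.
attempts = {"n": 0}


def fake_probe():
    attempts["n"] += 1
    if attempts["n"] >= 3:
        return True
    raise ConnectionRefusedError("server not up yet")


ready = wait_until(fake_probe, timeout_s=5.0, interval_s=0.01)
print(ready, attempts["n"])
```

Against a real server, the probe could issue a `GET` request to the `/v1/models` endpoint and report whether it returned status 200.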

## Define Tools for Function Call
Next, define the tools the model may call. This example declares a single function, `get_current_weather`, which takes a required `location` string and an optional `unit` (`celsius` or `fahrenheit`).

In [None]:
from openai import OpenAI
import json

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [
    {
        "role": "user",
        "content": "What's the weather like in Boston today? Please respond with the format: Today's weather is :{function call result}",
    }
]

# Initialize an OpenAI-compatible client pointed at the local server
client = OpenAI(api_key="YOUR_API_KEY", base_url="http://localhost:30111/v1")
model_name = client.models.list().data[0].id

[2025-01-03 03:57:19] INFO:     127.0.0.1:35030 - "GET /v1/models HTTP/1.1" 200 OK
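
The `parameters` entry of a tool is a JSON Schema fragment, and the model's arguments are not guaranteed to satisfy it. As a rough illustration, the sketch below checks a parsed argument dictionary against the same schema used for `get_current_weather`. The helper `check_arguments` is hypothetical, covers only the `required`, `type: string`, and `enum` keywords that appear in this notebook, and is not a substitute for a full JSON Schema validator.

```python
def check_arguments(schema: dict, args: dict) -> list:
    """Return a list of problems found in `args` for a tool parameter schema.

    Hypothetical helper: handles only `required`, string types, and `enum`.
    """
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} should be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


# Same parameter schema as the `get_current_weather` tool, descriptions omitted.
weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

ok = check_arguments(weather_schema, {"location": "Boston, MA", "unit": "fahrenheit"})
bad = check_arguments(weather_schema, {"unit": "kelvin"})
print(ok)
print(bad)
```

Running a check like this before dispatching lets the caller reject or repair a malformed call instead of failing inside the tool function.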


## Make Non-Streaming Request
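First, send a non-streaming request and print the result. When the model decides to call a function, `message.content` is `None` and the call appears in `message.tool_calls`.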

In [None]:
# Non-streaming mode test
response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,  # Non-streaming
    tools=tools,
)
print("Non-stream response:")
print(response_non_stream)

[2025-01-03 03:57:19 TP0] Prefill batch. #new-seq: 1, #new-token: 235, #cached-token: 1, cache hit rate: 0.41%, token usage: 0.00, #running-req: 0, #queue-req: 0


[2025-01-03 03:57:19] INFO:     127.0.0.1:35030 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Non-stream response:
ChatCompletion(id='9663d03c6f7048539cdef8e2bf4bbc59', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"location": "Boston, MA", "unit": "fahrenheit"}', name='get_current_weather'), type='function')]), matched_stop=128008)], created=1735876639, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=28, prompt_tokens=236, total_tokens=264, completion_tokens_details=None, prompt_tokens_details=None))
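
Printing the whole response object is useful for inspection, but application code usually needs only the function name and the parsed arguments. The sketch below pulls those out. To stay runnable without a server, it uses a `SimpleNamespace` stand-in that reproduces only the fields read by the hypothetical helper `extract_tool_calls`; with a live server, `response_non_stream` could be passed instead.

```python
import json
from types import SimpleNamespace


def extract_tool_calls(response):
    """Return (name, parsed_arguments) pairs from a chat completion response."""
    message = response.choices[0].message
    calls = []
    # `tool_calls` is None when the model answered with plain text.
    for tool_call in message.tool_calls or []:
        arguments = json.loads(tool_call.function.arguments)
        calls.append((tool_call.function.name, arguments))
    return calls


# Stand-in shaped like the printed response (assumption: reduced to the fields used here).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content=None,
                tool_calls=[
                    SimpleNamespace(
                        id="0",
                        type="function",
                        function=SimpleNamespace(
                            name="get_current_weather",
                            arguments='{"location": "Boston, MA", "unit": "fahrenheit"}',
                        ),
                    )
                ],
            )
        )
    ]
)

calls = extract_tool_calls(fake_response)
print(calls)
```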


## Make Streaming Request
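Next, send the same request with `stream=True`. The response arrives as a sequence of chunks: the function name and the JSON arguments are spread across several of them, so the argument fragments must be concatenated before they can be parsed.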

In [None]:
# Streaming mode test
print("Streaming response:")
response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=True,  # Enable streaming
    tools=tools,
)

# Collect every chunk from the stream
chunks = []
for chunk in response_stream:
    chunks.append(chunk)
    print(chunk)  # Optionally print each chunk to observe its content

# Parse and combine function call arguments
arguments = []
for chunk in chunks:
    choice = chunk.choices[0]
    delta = choice.delta
    if delta.tool_calls:
        tool_call = delta.tool_calls[0]
        if tool_call.function.name:
            print(f"Streamed function call name: {tool_call.function.name}")

        if tool_call.function.arguments:
            arguments.append(tool_call.function.arguments)
            print(f"Streamed function call arguments: {tool_call.function.arguments}")

# Combine all argument fragments
full_arguments = "".join(arguments)
print(f"Final streamed function call arguments: {full_arguments}")

Streaming response:
[2025-01-03 03:57:19] INFO:     127.0.0.1:35030 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-01-03 03:57:19 TP0] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 235, cache hit rate: 49.27%, token usage: 0.00, #running-req: 0, #queue-req: 0
ChatCompletionChunk(id='3e30674d309747eb9fc9a7e4c177c843', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None, matched_stop=None)], created=1735876639, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
starting on new tool %d 0
[2025-01-03 03:57:19 TP0] Decode batch. #running-req: 1, #token: 241, token usage: 0.00, gen throughput (token/s): 6.42, #queue-req: 0
ChatCompletionChunk(id='3e30674d309747eb9fc9a7e4c177c843', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls

ChatCompletionChunk(id='3e30674d309747eb9fc9a7e4c177c843', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=[ChoiceDeltaToolCall(index=None, id='0', function=ChoiceDeltaToolCallFunction(arguments='{"unit": "', name=''), type='function')]), finish_reason='tool_call', index=0, logprobs=None, matched_stop=None)], created=1735876639, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
ChatCompletionChunk(id='3e30674d309747eb9fc9a7e4c177c843', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=[ChoiceDeltaToolCall(index=None, id='0', function=ChoiceDeltaToolCallFunction(arguments='f', name=''), type='function')]), finish_reason='tool_call', index=0, logprobs=None, matched_stop=None)], created=1735876639, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion.chunk', service
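
The loop in the cell above reads only `delta.tool_calls[0]`, which is enough for a single call. A slightly more general accumulator, sketched below, merges fragments per tool call. It assumes that fragments can be keyed by `index` when the server sets it and by `id` otherwise, since the chunks printed here carry `index=None` and `id='0'`. The helper names and the fragment boundaries in the sample data are illustrative, not taken from a real stream.

```python
from types import SimpleNamespace


def merge_tool_call_deltas(chunks):
    """Merge streamed tool-call fragments into {key: {"name": ..., "arguments": ...}}."""
    merged = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            # Assumption: prefer `index`, fall back to `id` when index is None.
            key = tc.index if tc.index is not None else tc.id
            entry = merged.setdefault(key, {"name": "", "arguments": ""})
            if tc.function.name:
                entry["name"] = tc.function.name
            if tc.function.arguments:
                entry["arguments"] += tc.function.arguments
    return merged


def make_chunk(name=None, arguments=None):
    """Build a stand-in chunk with the few fields the merger reads."""
    if name is None and arguments is None:
        tool_calls = None  # e.g. the first chunk, which only carries the role
    else:
        function = SimpleNamespace(name=name or "", arguments=arguments or "")
        tool_calls = [SimpleNamespace(index=None, id="0", function=function)]
    delta = SimpleNamespace(tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


sample_chunks = [
    make_chunk(),
    make_chunk(name="get_current_weather"),
    make_chunk(arguments='{"unit": "'),
    make_chunk(arguments='fahrenheit", '),
    make_chunk(arguments='"location": "Boston, MA"}'),
]

merged = merge_tool_call_deltas(sample_chunks)
print(merged)
```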

## Simulate Tool Call
Now simulate the tool call locally: parse the arguments assembled from the stream, run a stand-in `get_current_weather` function, and append its result to the conversation.

In [None]:
# Record the model's tool call as an assistant message.
# Simplified shape used by this notebook; the OpenAI format expects a list of tool-call objects.
messages.append(
    {
        "role": "assistant",
        "content": "",
        "tool_calls": {"name": "get_current_weather", "arguments": full_arguments},
    }
)


# Define the actual function for getting current weather.
# `unit` is optional in the tool schema, so it needs a default here.
def get_current_weather(location: str, unit: str = "fahrenheit"):
    # Here you can integrate an actual weather API
    return f"The weather in {location} is 85 degrees {unit}. It is partly cloudy, with highs in the 90's."


# Simulate tool call
available_tools = {"get_current_weather": get_current_weather}

# Parse JSON arguments
try:
    call_data = json.loads(full_arguments)
except json.JSONDecodeError as e:
    print(f"JSON decoding error: {e}")
    call_data = {}

# Call the corresponding tool function
if "tool_calls" in messages[-1] and "name" in messages[-1]["tool_calls"]:
    tool_name = messages[-1]["tool_calls"]["name"]
    if tool_name in available_tools:
        tool_to_call = available_tools[tool_name]
        result = tool_to_call(**call_data)
        print(f"Function call result: {result}")
        messages.append({"role": "tool", "content": result, "name": tool_name})
    else:
        print(f"Unknown tool name: {tool_name}")
else:
    print("Function call name not found.")

Function call result: The weather in Boston, MA is 85 degrees fahrenheit. It is partly cloudy, with highs in the 90's.
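
For comparison, the OpenAI Chat Completions convention represents the exchange as an assistant message whose `tool_calls` field is a list of call objects, followed by a `tool` message that references the call through `tool_call_id`. The sketch below builds both messages and reports tool failures as text instead of raising. The helper `build_tool_messages` is hypothetical, and whether the server version used in this notebook accepts exactly this shape is an assumption that should be checked against its documentation.

```python
import json


def get_current_weather(location: str, unit: str = "fahrenheit") -> str:
    # Placeholder result, standing in for a real weather API.
    return f"The weather in {location} is 85 degrees {unit}."


def build_tool_messages(call_id, name, raw_arguments, tools_by_name):
    """Return the assistant tool-call message and the matching tool-result message."""
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": call_id,
                "type": "function",
                "function": {"name": name, "arguments": raw_arguments},
            }
        ],
    }
    try:
        result = tools_by_name[name](**json.loads(raw_arguments))
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        # Unknown tool, bad argument names, or malformed JSON: tell the model.
        result = f"Tool call failed: {exc!r}"
    tool_msg = {"role": "tool", "tool_call_id": call_id, "content": result}
    return assistant_msg, tool_msg


assistant_msg, tool_msg = build_tool_messages(
    "0",
    "get_current_weather",
    '{"location": "Boston, MA", "unit": "fahrenheit"}',
    {"get_current_weather": get_current_weather},
)
print(tool_msg["content"])
```

Appending `assistant_msg` and then `tool_msg` to the message list gives the model both the call it made and the result that answers it.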


## Final Chat Completion
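Finally, send the updated conversation, which now includes the simulated function call result, back to the model. In the run recorded below, the model replied with another `get_current_weather` tool call rather than a text answer.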

In [None]:
chat_completion_final = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    tools=tools,
)

print("\nFinal Chat Completion:")
print(chat_completion_final)

[2025-01-03 03:57:19 TP0] Prefill batch. #new-seq: 1, #new-token: 40, #cached-token: 233, cache hit rate: 62.37%, token usage: 0.00, #running-req: 0, #queue-req: 0


[2025-01-03 03:57:19 TP0] Decode batch. #running-req: 1, #token: 290, token usage: 0.00, gen throughput (token/s): 81.38, #queue-req: 0
[2025-01-03 03:57:20] INFO:     127.0.0.1:35030 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Final Chat Completion:
ChatCompletion(id='dbc7c72d3cc54dcc807683232c4ec487', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"unit": "fahrenheit", "location": "Boston, MA"}', name='get_current_weather'), type='function')]), matched_stop=128008)], created=1735876640, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=28, prompt_tokens=273, total_tokens=301, completion_tokens_details=None, prompt_tokens_details=None))


## Terminate Server
When you are finished, shut down the server process:

In [None]:
terminate_process(server_process)