# Benchmark an Endpoint

Welcome to this tutorial on benchmarking an endpoint deployed on a SambaNova dedicated node!

If you don't already have an endpoint deployed, please follow one of the workflows described in the [README](./README.md) to deploy an endpoint before proceeding with this tutorial.

Also, please install the benchmarking requirements in the Python kernel used by this Jupyter notebook:

`! pip install -r ../../benchmarking/requirements.txt`

## 1. Imports

In [1]:
import sys
sys.version

'3.11.11 (main, Dec 11 2024, 10:28:39) [Clang 14.0.6 ]'

In [2]:
from IPython.display import display, HTML
display(HTML("<style>:root { --jp-notebook-max-width: 100% !important; }</style>"))
import json
import os
import pprint
import getpass
import pandas as pd
import yaml  # used in Section 5 to save the run config
pd.set_option('display.max_columns', None)

In [3]:
benchmarking_dir = "../../benchmarking/"
sys.path.append(benchmarking_dir + "benchmarking_scripts")
sys.path.append(benchmarking_dir + "prompts")

from synthetic_performance_eval_script import *

## 2. Get endpoint info
To benchmark the endpoint, we will need to obtain some of its information. Note that this information can be obtained from your SambaNova representative.

#### Enter the endpoint URL
Run the cell below and then enter the endpoint URL. It should have the format `https://my.env/v1/<endpoint_id>/chat/completions`.

In [4]:
# Strip surrounding whitespace and any trailing slash
endpoint_url = input().strip().rstrip("/")
os.environ["SAMBASTUDIO_URL"] = endpoint_url
endpoint_id = endpoint_url.split('/')[-3]
print("Benchmarking Endpoint:", endpoint_id)

Benchmarking Endpoint: ad64853e-c96d-45c8-b98f-72a6a650e73b
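
The URL handling above can be sketched a little more defensively with the standard library. The helper below is a hypothetical illustration (the name `parse_endpoint_url` is not part of the Benchmarking Kit), and it assumes the URL follows the `https://my.env/v1/<endpoint_id>/chat/completions` format described above.

```python
from urllib.parse import urlsplit


def parse_endpoint_url(url: str) -> tuple[str, str]:
    """Return (env_url, endpoint_id) for a URL shaped like
    https://my.env/v1/<endpoint_id>/chat/completions.

    Hypothetical helper for illustration; not part of the Benchmarking Kit.
    """
    parts = urlsplit(url.strip().rstrip("/"))
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {url!r}")

    segments = [s for s in parts.path.split("/") if s]
    # Expected path segments: ["v1", "<endpoint_id>", "chat", "completions"]
    if len(segments) < 3 or segments[-2:] != ["chat", "completions"]:
        raise ValueError(f"Unexpected endpoint path: {parts.path!r}")

    env_url = f"{parts.scheme}://{parts.netloc}"
    endpoint_id = segments[-3]
    return env_url, endpoint_id


env, eid = parse_endpoint_url("https://my.env/v1/abc-123/chat/completions/")
print(env, eid)
```

Raising an error on a malformed URL surfaces a typo at this step rather than later, when the benchmark requests start failing.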


#### Enter the endpoint API key

In [5]:
endpoint_key = getpass.getpass().strip()
if len(endpoint_key) > 0:
    os.environ["SAMBASTUDIO_API_KEY"] = endpoint_key
else:
    print("Please enter a valid key")

## 3. Automatically retrieve model list from endpoint (Optional)
Run this section only if you don't have the list of models on the endpoint. Note that running this section requires you to first follow the setup instructions given in the [README](./README.md).

#### Set up environment connector
The connector connects to the remote dedicated environment using the variables defined below.

In [6]:
env_url = '/'.join(endpoint_url.split('/')[:3])
env_url

'https://sjc3-e9.sambanova.net'

In [7]:
print("Enter the env access key")
env_key = getpass.getpass().strip()
if len(env_key) > 0:
    os.environ["SAMBASTUDIO_ACCESS_KEY"] = env_key
else:
    print("Please enter a valid key")

Enter the env access key


In [8]:
env_tenant = "default"

In [9]:
from snsdk import SnSdk
sn_env = SnSdk(
    host_url=env_url,
    access_key=env_key,
    tenant_id=env_tenant,
)

#### Get model names in the endpoint

In [10]:
endpoint_info = sn_env.endpoint_info_by_id(endpoint_id)
endpoint_model_id = endpoint_info['targets'][0]["model"]
model_info = sn_env.model_info(endpoint_model_id, job_type="deploy")
model_constituents = [m["name"] for m in model_info["dependencies"]]
sorted(model_constituents)

['Meta-Llama-3.3-70B-Instruct',
 'Qwen-2.5-72B-SD-Qwen-2.5-0.5B',
 'Salesforce--Llama-xLAM-2-70b-fc-r',
 'Salesforce--Llama-xLAM-2-8b-fc-r']

#### Get target model names in the endpoint
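Target model names generally differ from model names when the model is a speculative decoding pair. In the outputs of this section, for example, `Qwen-2.5-72B-SD-Qwen-2.5-0.5B` resolves to the target model `Qwen2.5-72B-Instruct`.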

In [11]:
target_models = []
for constituent_name in model_constituents:    
    model_name = constituent_name

    # Check for speculative decoding
    constituent_info = sn_env.model_info(constituent_name, job_type="deploy")
    if 'target_model' in constituent_info['config']:
        target_name = constituent_info['config']['target_model']        
        if len(target_name) > 0:
            model_name = target_name
    target_models.append(model_name)
sorted(target_models)

['Meta-Llama-3.3-70B-Instruct',
 'Qwen2.5-72B-Instruct',
 'Salesforce--Llama-xLAM-2-70b-fc-r',
 'Salesforce--Llama-xLAM-2-8b-fc-r']
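
The name-resolution rule in the cell above can be isolated as a small pure function and tried without a live environment. This is a sketch under assumptions: `resolve_target_name` and `mock_infos` are hypothetical, and the mocked dictionaries only imitate the `config` / `target_model` keys read above. The real responses may contain more fields.

```python
def resolve_target_name(constituent_name: str, constituent_info: dict) -> str:
    """Return the speculative-decoding target name if one is configured,
    otherwise fall back to the constituent name itself."""
    target_name = constituent_info.get("config", {}).get("target_model") or ""
    return target_name if target_name else constituent_name


# Mocked model-info responses (assumed shape; names taken from the outputs above).
mock_infos = {
    "Meta-Llama-3.3-70B-Instruct": {"config": {}},
    "Qwen-2.5-72B-SD-Qwen-2.5-0.5B": {"config": {"target_model": "Qwen2.5-72B-Instruct"}},
    "Salesforce--Llama-xLAM-2-8b-fc-r": {"config": {"target_model": ""}},
}

resolved = sorted(resolve_target_name(name, info) for name, info in mock_infos.items())
print(resolved)
```

A missing key and an empty string are treated the same way, which mirrors the two checks in the loop above.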

## 4. Set up Model Configs for Benchmarking
Note that this section currently supports only a fraction of what the Benchmarking Kit can do. You may adapt it if you would like to benchmark images or run with queries per second (qps).

#### Specify target models
If `target_models` was not set automatically in Step 3, set it here as a Python list.

In [12]:
target_models = target_models  # or set manually, e.g. ["model1", "model2", "model3"]

#### Specify combinatorial inputs for benchmarking

In [None]:
input_tokens = [4000, 8000, 16000]
output_tokens = [30]  # prompt + output must fit within each model's maximum sequence length
num_requests = [64]
concurrent_requests = [1, 2, 4, 8, 16, 32]

#### Automatically generate configs

In [14]:
model_configs_df = pd.DataFrame(columns=[
                "model_name",
                "input_tokens",
                "output_tokens",
                "num_requests",
                "concurrent_requests"
                ])
counter = 1
for target_model in target_models:
    for input_token in input_tokens:
        for output_token in output_tokens:
            for num_request in num_requests:
                for concurrent_request in concurrent_requests:
                    model_configs_df.loc[counter] = [
                                        target_model, 
                                        input_token,
                                        output_token,
                                        num_request,
                                        concurrent_request
                                    ]
                    counter += 1
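
The nested loops above enumerate the Cartesian product of the input lists. A minimal sketch of the same enumeration with `itertools.product` follows. The model names are placeholders, and `rows` and `columns` are illustrative names.

```python
from itertools import product

# Placeholder model names; the token and request values mirror the cell above.
target_models = ["model1", "model2"]
input_tokens = [4000, 8000, 16000]
output_tokens = [30]
num_requests = [64]
concurrent_requests = [1, 2, 4, 8, 16, 32]

columns = [
    "model_name",
    "input_tokens",
    "output_tokens",
    "num_requests",
    "concurrent_requests",
]

# product() varies the right-most list fastest, matching the nested loops.
rows = [
    dict(zip(columns, combo))
    for combo in product(
        target_models, input_tokens, output_tokens, num_requests, concurrent_requests
    )
]

print(len(rows))  # 2 * 3 * 1 * 1 * 6 combinations
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows, columns=columns)` should yield an equivalent table, although with a zero-based index unless the index is reassigned.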


#### Confirm model configs and delete any configs that you would rather exclude

In [15]:
model_configs_df

| | model_name | input_tokens | output_tokens | num_requests | concurrent_requests |
|---|---|---|---|---|---|
| 1 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 1 |
| 2 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 2 |
| 3 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 4 |
| 4 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 8 |
| 5 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 16 |
| ... | ... | ... | ... | ... | ... |
| 68 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 2 |
| 69 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 4 |
| 70 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 8 |
| 71 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 16 |


In [16]:
drop_rows = []  # index labels of the rows to exclude, e.g. [1, 2]
model_configs_df.drop(drop_rows, inplace=True)
model_configs_df

| | model_name | input_tokens | output_tokens | num_requests | concurrent_requests |
|---|---|---|---|---|---|
| 1 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 1 |
| 2 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 2 |
| 3 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 4 |
| 4 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 8 |
| 5 | Qwen2.5-72B-Instruct | 4000 | 100 | 64 | 16 |
| ... | ... | ... | ... | ... | ... |
| 68 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 2 |
| 69 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 4 |
| 70 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 8 |
| 71 | Salesforce--Llama-xLAM-2-70b-fc-r | 16000 | 100 | 64 | 16 |
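
One way to choose `drop_rows` is to exclude configs that cannot fit on the endpoint. The run log later in this notebook shows requests being rejected when the prompt plus the requested output exceeds the endpoint's maximum supported sequence length (for example, a 4028-token prompt with 100 output tokens against a limit of 4096). The sketch below assumes you know each model's limit. The function name, the `max_seq_len` mapping, and the `prompt_overhead` value are hypothetical, with the overhead of 28 tokens inferred from that log rather than documented.

```python
def rows_exceeding_limit(configs, max_seq_len, prompt_overhead=0):
    """Return the index labels of configs whose prompt plus requested output
    would not fit in the model's maximum sequence length.

    configs: mapping of index label -> dict with model_name, input_tokens, output_tokens
    max_seq_len: mapping of model_name -> maximum sequence length (assumed known)
    prompt_overhead: extra prompt tokens added on top of input_tokens (assumption)
    """
    too_long = []
    for label, cfg in configs.items():
        limit = max_seq_len.get(cfg["model_name"])
        if limit is None:
            continue  # unknown limit: keep the config
        needed = cfg["input_tokens"] + prompt_overhead + cfg["output_tokens"]
        if needed > limit:
            too_long.append(label)
    return too_long


# Illustrative configs and limits; not taken from a real endpoint.
configs = {
    1: {"model_name": "model-a", "input_tokens": 4000, "output_tokens": 100},
    2: {"model_name": "model-a", "input_tokens": 2000, "output_tokens": 100},
    3: {"model_name": "model-b", "input_tokens": 16000, "output_tokens": 100},
}
max_seq_len = {"model-a": 4096}

drop_rows = rows_exceeding_limit(configs, max_seq_len, prompt_overhead=28)
print(drop_rows)
```

With the DataFrame from this section, `model_configs_df.to_dict("index")` produces a mapping of this shape, so the returned labels could be passed to `model_configs_df.drop`.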


## 5. Run Benchmarking
We will now run the benchmark with the configs defined above.

#### Name the benchmarking run
Give the run a unique name so that the configs and results can be saved with that name. Please note that the name should be compatible with file system path naming rules.

In [17]:
run_name = "qwen_llama_salesforce_20250519_1"

#### Configure saving options
Saving the configs makes it easy to re-run them later without repeating the sections above.

In [None]:
output_path = f"{benchmarking_dir}data/benchmarking_tracking_tests/"
config = {
    'model_configs_path': f'{output_path}model_configs_{run_name}.csv', # leave this as is
    'llm_api': 'sambastudio', # leave this as is
    'output_files_dir': f'{output_path}logs/output_files', # each run saved here
    'consolidated_results_dir': f'{output_path}consolidated_results', # consolidated xlsx saved here
    'timeout': 3600,
    'time_delay': 0, # between batches of concurrent requests
}

#### Save configs

In [19]:
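# Make sure the output directory exists before writing
os.makedirs(output_path, exist_ok=True)

# Save the run-level config as YAML
with open(f"{output_path}config_{run_name}.yaml", "w") as f:
    yaml.dump(config, f, default_flow_style=False)

# Save the per-model configs as CSV (without the DataFrame index)
model_configs_df.to_csv(config["model_configs_path"], index=False)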
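
To illustrate what re-running from saved configs involves, here is a dependency-free sketch that reads a model-configs CSV back and restores the integer columns. The CSV text is a two-row stand-in with placeholder values, not a real saved file. In this notebook, `pd.read_csv` on the saved path would be the more natural choice.

```python
import csv
import io

# Stand-in for the contents of a saved model-configs CSV (placeholder rows).
saved_csv = io.StringIO(
    "model_name,input_tokens,output_tokens,num_requests,concurrent_requests\n"
    "model1,4000,30,64,1\n"
    "model1,4000,30,64,2\n"
)

int_columns = ["input_tokens", "output_tokens", "num_requests", "concurrent_requests"]

reloaded = []
for row in csv.DictReader(saved_csv):
    # The csv module returns strings, so restore the integer columns explicitly.
    for col in int_columns:
        row[col] = int(row[col])
    reloaded.append(row)

print(reloaded[1]["concurrent_requests"])
```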

#### Run configs

In [20]:
run_benchmarking(config=config, benchmarking_dir=benchmarking_dir, run_name=run_name)

2025-05-19 18:28:05,715 [INFO] Running model_name Qwen2.5-72B-Instruct, input_tokens 4000, output_tokens 100, concurrent_requests 1, num_requests 64, multimodal_img_size na


Running Requests: 100%|██████████| 64/64 [00:10<00:00,  6.27it/s]

2025-05-19 18:28:16,727 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 4000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 1,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:28:16,728 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 4028 tokens long, so generating 100 tokens requires a sequence length of 4128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:28:16,728 [INFO] Time delay: 0 seconds
2025-05-19 18:28:16,730 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:10<00:00,  5.83it/s]
Running Requests: 100%|██████████| 64/64 [00:16<00:00,  4.17it/s]

2025-05-19 18:28:33,831 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 4000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 2,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:28:33,832 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 4028 tokens long, so generating 100 tokens requires a sequence length of 4128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:28:33,834 [INFO] Time delay: 0 seconds
2025-05-19 18:28:33,836 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:17<00:00,  3.76it/s]
Running Requests: 100%|██████████| 64/64 [00:06<00:00,  6.83it/s]

2025-05-19 18:28:40,641 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 4000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 4,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:28:40,642 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 4028 tokens long, so generating 100 tokens requires a sequence length of 4128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:28:40,643 [INFO] Time delay: 0 seconds
2025-05-19 18:28:40,644 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:06<00:00,  9.39it/s]
Running Requests: 100%|██████████| 64/64 [00:07<00:00,  3.99it/s]

2025-05-19 18:28:48,582 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 4000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 8,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:28:48,583 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 4028 tokens long, so generating 100 tokens requires a sequence length of 4128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:28:48,583 [INFO] Time delay: 0 seconds
2025-05-19 18:28:48,585 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:07<00:00,  8.05it/s]
Running Requests: 100%|██████████| 64/64 [00:05<00:00,  7.59it/s]

2025-05-19 18:28:54,321 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 4000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 16,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:28:54,322 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 4028 tokens long, so generating 100 tokens requires a sequence length of 4128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:28:54,323 [INFO] Time delay: 0 seconds
2025-05-19 18:28:54,325 [INFO] Running model_name Qwen2.

Running Requests: 100%|██████████| 64/64 [00:05<00:00, 11.05it/s]
Running Requests: 100%|██████████| 64/64 [00:05<00:00, 14.97it/s]

2025-05-19 18:29:00,202 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 4000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 32,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:00,203 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 4028 tokens long, so generating 100 tokens requires a sequence length of 4128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:00,204 [INFO] Time delay: 0 seconds
2025-05-19 18:29:00,205 [INFO] Running model_name Qwen2.

Running Requests: 100%|██████████| 64/64 [00:05<00:00, 10.91it/s]
Running Requests: 100%|██████████| 64/64 [00:11<00:00,  5.78it/s]

2025-05-19 18:29:11,930 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 8000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 1,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:11,932 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 8028 tokens long, so generating 100 tokens requires a sequence length of 8128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:11,933 [INFO] Time delay: 0 seconds
2025-05-19 18:29:11,934 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:11<00:00,  5.47it/s]
Running Requests: 100%|██████████| 64/64 [00:13<00:00,  3.77it/s]

2025-05-19 18:29:25,767 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 8000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 2,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:25,770 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 8028 tokens long, so generating 100 tokens requires a sequence length of 8128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:25,772 [INFO] Time delay: 0 seconds
2025-05-19 18:29:25,774 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:13<00:00,  4.65it/s]
Running Requests: 100%|██████████| 64/64 [00:07<00:00,  5.14it/s]

2025-05-19 18:29:33,316 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 8000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 4,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:33,318 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 8028 tokens long, so generating 100 tokens requires a sequence length of 8128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:33,319 [INFO] Time delay: 0 seconds
2025-05-19 18:29:33,321 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:07<00:00,  8.47it/s]
Running Requests: 100%|██████████| 64/64 [00:06<00:00,  4.97it/s]

2025-05-19 18:29:39,867 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 8000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 8,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:39,868 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 8028 tokens long, so generating 100 tokens requires a sequence length of 8128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:39,869 [INFO] Time delay: 0 seconds
2025-05-19 18:29:39,870 [INFO] Running model_name Qwen2.5

Running Requests: 100%|██████████| 64/64 [00:06<00:00,  9.53it/s]
Running Requests: 100%|██████████| 64/64 [00:05<00:00,  7.33it/s]

2025-05-19 18:29:46,084 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 8000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 16,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:46,085 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 8028 tokens long, so generating 100 tokens requires a sequence length of 8128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:46,086 [INFO] Time delay: 0 seconds
2025-05-19 18:29:46,087 [INFO] Running model_name Qwen2.

Running Requests: 100%|██████████| 64/64 [00:06<00:00, 10.57it/s]
Running Requests: 100%|██████████| 64/64 [00:05<00:00,  7.34it/s]

2025-05-19 18:29:52,193 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 8000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 32,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:29:52,193 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 8028 tokens long, so generating 100 tokens requires a sequence length of 8128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:29:52,194 [INFO] Time delay: 0 seconds
2025-05-19 18:29:52,195 [INFO] Running model_name Qwen2.

Running Requests: 100%|██████████| 64/64 [00:06<00:00, 10.23it/s]
Running Requests: 100%|██████████| 64/64 [00:13<00:00,  4.51it/s]

2025-05-19 18:30:06,372 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 16000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 1,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:30:06,373 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 16028 tokens long, so generating 100 tokens requires a sequence length of 16128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:30:06,376 [INFO] Time delay: 0 seconds
2025-05-19 18:30:06,378 [INFO] Running model_name Qwen

Running Requests: 100%|██████████| 64/64 [00:14<00:00,  4.55it/s]
Running Requests: 100%|██████████| 64/64 [00:14<00:00,  3.12it/s]

2025-05-19 18:30:21,701 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 16000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 2,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:30:21,701 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 16028 tokens long, so generating 100 tokens requires a sequence length of 16128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:30:21,702 [INFO] Time delay: 0 seconds
2025-05-19 18:30:21,703 [INFO] Running model_name Qwen

Running Requests: 100%|██████████| 64/64 [00:15<00:00,  4.18it/s]
Running Requests: 100%|██████████| 64/64 [00:08<00:00,  4.02it/s]

2025-05-19 18:30:30,192 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 16000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 4,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:30:30,194 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 16028 tokens long, so generating 100 tokens requires a sequence length of 16128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:30:30,196 [INFO] Time delay: 0 seconds
2025-05-19 18:30:30,198 [INFO] Running model_name Qwen

Running Requests: 100%|██████████| 64/64 [00:08<00:00,  7.53it/s]
Running Requests: 100%|██████████| 64/64 [00:06<00:00,  5.03it/s]

2025-05-19 18:30:37,384 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 16000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 8,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:30:37,385 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 16028 tokens long, so generating 100 tokens requires a sequence length of 16128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:30:37,386 [INFO] Time delay: 0 seconds
2025-05-19 18:30:37,390 [INFO] Running model_name Qwen

Running Requests: 100%|██████████| 64/64 [00:07<00:00,  8.93it/s]
Running Requests: 100%|██████████| 64/64 [00:06<00:00,  6.68it/s]

2025-05-19 18:30:44,228 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 16000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 16,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:30:44,229 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 16028 tokens long, so generating 100 tokens requires a sequence length of 16128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:30:44,230 [INFO] Time delay: 0 seconds
2025-05-19 18:30:44,231 [INFO] Running model_name Qwe

Running Requests: 100%|██████████| 64/64 [00:06<00:00,  9.37it/s]
Running Requests: 100%|██████████| 64/64 [00:06<00:00,  6.26it/s]

2025-05-19 18:30:51,212 [ERROR] Error while running model_name Qwen2.5-72B-Instruct,                 input_tokens 16000,                 output_tokens 100,                 num_requests 64,                 concurrent_requests 32,                 qps 0.0,                 qps_distribution constant                 multimodal_img_size na
2025-05-19 18:30:51,212 [ERROR] Unexpected error happened when executing requests:                
- Error while running LLM API requests. Check your model name, LLM API type, env variables and endpoint status.                
Additional messages:
- Error: 'choices' at streamed event: {"error":{"code":null,"message":"Requested generation length 100 is not possible! The provided prompt is 16028 tokens long, so generating 100 tokens requires a sequence length of 16128, but the maximum supported sequence length is just 4096!","param":null,"type":"Error"}}
2025-05-19 18:30:51,213 [INFO] Time delay: 0 seconds
2025-05-19 18:30:51,214 [INFO] Running model_name Met

Running Requests: 100%|██████████| 64/64 [00:07<00:00,  8.73it/s]
Running Requests: 100%|██████████| 64/64 [01:17<00:00,  1.20s/it]

2025-05-19 18:32:09,882 [INFO] Tasks Executed!
2025-05-19 18:32:09,883 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:32:09,911 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:32:09,914 [INFO]     p5 = 0.48
2025-05-19 18:32:09,915 [INFO]     p25 = 0.4896
2025-05-19 18:32:09,915 [INFO]     p50 = 0.4995
2025-05-19 18:32:09,916 [INFO]     p75 = 0.5092
2025-05-19 18:32:09,917 [INFO]     p90 = 0.5238
2025-05-19 18:32:09,917 [INFO]     p95 = 0.5347
2025-05-19 18:32:09,917 [INFO]     p99 = 1.3137
2025-05-19 18:32:09,918 [INFO]     mean = 0.5325
2025-05-19 18:32:09,919 [INFO]     min = 0.4722
2025-05-19 18:32:09,919 [INFO]     max = 2.4211
2025-05-19 18:32:09,919 [INFO]     stddev = 0.2411
2025-05-19 18:32:09,920 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:32:09,921 [INFO]     p5 = 1.1609
2025-05-19 18:32:09,921 [INFO]     p25 = 1.1706
2025

Running Requests: 100%|██████████| 64/64 [01:18<00:00,  1.23s/it]
Running Requests: 100%|██████████| 64/64 [01:11<00:00,  1.11s/it]

2025-05-19 18:33:22,297 [INFO] Tasks Executed!
2025-05-19 18:33:22,298 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:33:22,314 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:33:22,317 [INFO]     p5 = 1.5376
2025-05-19 18:33:22,318 [INFO]     p25 = 1.5424
2025-05-19 18:33:22,319 [INFO]     p50 = 1.5473
2025-05-19 18:33:22,319 [INFO]     p75 = 1.5539
2025-05-19 18:33:22,320 [INFO]     p90 = 1.5593
2025-05-19 18:33:22,320 [INFO]     p95 = 1.5897
2025-05-19 18:33:22,321 [INFO]     p99 = 1.623
2025-05-19 18:33:22,322 [INFO]     mean = 1.535
2025-05-19 18:33:22,322 [INFO]     min = 0.4943
2025-05-19 18:33:22,323 [INFO]     max = 1.6399
2025-05-19 18:33:22,324 [INFO]     stddev = 0.1335
2025-05-19 18:33:22,324 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:33:22,326 [INFO]     p5 = 2.2187
2025-05-19 18:33:22,327 [INFO]     p25 = 2.2222
2025

Running Requests: 100%|██████████| 64/64 [01:12<00:00,  1.13s/it]
Running Requests: 100%|██████████| 64/64 [00:51<00:00,  1.10it/s]

2025-05-19 18:34:14,754 [INFO] Tasks Executed!
2025-05-19 18:34:14,755 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:34:14,772 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:34:14,775 [INFO]     p5 = 2.4021
2025-05-19 18:34:14,775 [INFO]     p25 = 2.4103
2025-05-19 18:34:14,775 [INFO]     p50 = 2.4473
2025-05-19 18:34:14,776 [INFO]     p75 = 2.476
2025-05-19 18:34:14,777 [INFO]     p90 = 2.5186
2025-05-19 18:34:14,777 [INFO]     p95 = 3.8467
2025-05-19 18:34:14,778 [INFO]     p99 = 4.0872
2025-05-19 18:34:14,779 [INFO]     mean = 2.5202
2025-05-19 18:34:14,779 [INFO]     min = 0.4811
2025-05-19 18:34:14,780 [INFO]     max = 4.088
2025-05-19 18:34:14,780 [INFO]     stddev = 0.4772
2025-05-19 18:34:14,780 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:34:14,782 [INFO]     p5 = 3.1011
2025-05-19 18:34:14,782 [INFO]     p25 = 3.1071
2025

Running Requests: 100%|██████████| 64/64 [00:52<00:00,  1.22it/s]
Running Requests:  92%|█████████▏| 59/64 [00:37<00:05,  1.17s/it]

2025-05-19 18:34:52,977 [INFO] Tasks Executed!
2025-05-19 18:34:52,977 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:34:52,986 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:34:52,987 [INFO]     p5 = 3.57
2025-05-19 18:34:52,988 [INFO]     p25 = 3.5951
2025-05-19 18:34:52,988 [INFO]     p50 = 3.6147
2025-05-19 18:34:52,988 [INFO]     p75 = 3.6754
2025-05-19 18:34:52,989 [INFO]     p90 = 5.5251
2025-05-19 18:34:52,989 [INFO]     p95 = 5.5279
2025-05-19 18:34:52,989 [INFO]     p99 = 5.5751
2025-05-19 18:34:52,990 [INFO]     mean = 3.8226
2025-05-19 18:34:52,990 [INFO]     min = 0.5078
2025-05-19 18:34:52,991 [INFO]     max = 5.6519
2025-05-19 18:34:52,991 [INFO]     stddev = 0.7647
2025-05-19 18:34:52,991 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:34:52,993 [INFO]     p5 = 4.3775
2025-05-19 18:34:52,993 [INFO]     p25 = 4.3953

Running Requests: 100%|██████████| 64/64 [00:38<00:00,  1.67it/s]
Running Requests: 100%|██████████| 64/64 [00:32<00:00,  1.47s/it]

2025-05-19 18:35:26,846 [INFO] Tasks Executed!
2025-05-19 18:35:26,847 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:35:26,857 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:35:26,859 [INFO]     p5 = 6.4694
2025-05-19 18:35:26,859 [INFO]     p25 = 6.4905
2025-05-19 18:35:26,860 [INFO]     p50 = 6.5094
2025-05-19 18:35:26,860 [INFO]     p75 = 7.3953
2025-05-19 18:35:26,860 [INFO]     p90 = 8.5444
2025-05-19 18:35:26,861 [INFO]     p95 = 8.5469
2025-05-19 18:35:26,861 [INFO]     p99 = 8.7368
2025-05-19 18:35:26,862 [INFO]     mean = 6.941
2025-05-19 18:35:26,862 [INFO]     min = 0.5418
2025-05-19 18:35:26,863 [INFO]     max = 9.0574
2025-05-19 18:35:26,863 [INFO]     stddev = 1.2138
2025-05-19 18:35:26,863 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:35:26,865 [INFO]     p5 = 7.6694
2025-05-19 18:35:26,865 [INFO]     p25 = 7.6777

Running Requests: 100%|██████████| 64/64 [00:33<00:00,  1.90it/s]
Running Requests:  55%|█████▍    | 35/64 [00:29<00:35,  1.21s/it]

2025-05-19 18:35:57,169 [INFO] Tasks Executed!
2025-05-19 18:35:57,170 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:35:57,179 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:35:57,181 [INFO]     p5 = 12.0361
2025-05-19 18:35:57,181 [INFO]     p25 = 12.0516
2025-05-19 18:35:57,182 [INFO]     p50 = 12.8007
2025-05-19 18:35:57,182 [INFO]     p75 = 13.6108
2025-05-19 18:35:57,182 [INFO]     p90 = 13.6216
2025-05-19 18:35:57,183 [INFO]     p95 = 13.6248
2025-05-19 18:35:57,183 [INFO]     p99 = 14.501
2025-05-19 18:35:57,184 [INFO]     mean = 12.5271
2025-05-19 18:35:57,184 [INFO]     min = 0.6187
2025-05-19 18:35:57,185 [INFO]     max = 15.9882
2025-05-19 18:35:57,185 [INFO]     stddev = 2.2304
2025-05-19 18:35:57,185 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:35:57,187 [INFO]     p5 = 13.955
2025-05-19 18:35:57,187 [INFO]     p25 = 1

Running Requests: 100%|██████████| 64/64 [00:30<00:00,  2.09it/s]
Running Requests: 100%|██████████| 64/64 [01:59<00:00,  1.86s/it]

2025-05-19 18:37:57,541 [INFO] Tasks Executed!
2025-05-19 18:37:57,542 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:37:57,559 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:37:57,560 [INFO]     p5 = 1.0499
2025-05-19 18:37:57,561 [INFO]     p25 = 1.1102
2025-05-19 18:37:57,561 [INFO]     p50 = 1.1667
2025-05-19 18:37:57,562 [INFO]     p75 = 1.275
2025-05-19 18:37:57,562 [INFO]     p90 = 1.4093
2025-05-19 18:37:57,563 [INFO]     p95 = 1.488
2025-05-19 18:37:57,563 [INFO]     p99 = 1.5555
2025-05-19 18:37:57,564 [INFO]     mean = 1.2033
2025-05-19 18:37:57,564 [INFO]     min = 1.0352
2025-05-19 18:37:57,564 [INFO]     max = 1.5633
2025-05-19 18:37:57,564 [INFO]     stddev = 0.1327
2025-05-19 18:37:57,565 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:37:57,566 [INFO]     p5 = 1.7088
2025-05-19 18:37:57,566 [INFO]     p25 = 1.7434

Running Requests: 100%|██████████| 64/64 [02:00<00:00,  1.88s/it]
Running Requests: 100%|██████████| 64/64 [01:43<00:00,  1.61s/it]

2025-05-19 18:39:41,335 [INFO] Tasks Executed!
2025-05-19 18:39:41,338 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:39:41,357 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:39:41,360 [INFO]     p5 = 2.5205
2025-05-19 18:39:41,361 [INFO]     p25 = 2.5349
2025-05-19 18:39:41,361 [INFO]     p50 = 2.564
2025-05-19 18:39:41,362 [INFO]     p75 = 2.5946
2025-05-19 18:39:41,362 [INFO]     p90 = 2.6287
2025-05-19 18:39:41,363 [INFO]     p95 = 2.6319
2025-05-19 18:39:41,363 [INFO]     p99 = 2.6839
2025-05-19 18:39:41,364 [INFO]     mean = 2.5463
2025-05-19 18:39:41,365 [INFO]     min = 1.0248
2025-05-19 18:39:41,365 [INFO]     max = 2.7672
2025-05-19 18:39:41,366 [INFO]     stddev = 0.1982
2025-05-19 18:39:41,367 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:39:41,368 [INFO]     p5 = 3.1992
2025-05-19 18:39:41,369 [INFO]     p25 = 3.2044

Running Requests: 100%|██████████| 64/64 [01:43<00:00,  1.62s/it]
Running Requests: 100%|██████████| 64/64 [01:29<00:00,  1.73s/it]

2025-05-19 18:41:11,264 [INFO] Tasks Executed!
2025-05-19 18:41:11,264 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:41:11,282 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:41:11,285 [INFO]     p5 = 4.8399
2025-05-19 18:41:11,285 [INFO]     p25 = 4.8473
2025-05-19 18:41:11,286 [INFO]     p50 = 4.8637
2025-05-19 18:41:11,286 [INFO]     p75 = 4.924
2025-05-19 18:41:11,287 [INFO]     p90 = 4.9615
2025-05-19 18:41:11,288 [INFO]     p95 = 4.9647
2025-05-19 18:41:11,288 [INFO]     p99 = 5.0812
2025-05-19 18:41:11,289 [INFO]     mean = 4.8301
2025-05-19 18:41:11,289 [INFO]     min = 1.0214
2025-05-19 18:41:11,290 [INFO]     max = 5.0819
2025-05-19 18:41:11,290 [INFO]     stddev = 0.4873
2025-05-19 18:41:11,290 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:41:11,293 [INFO]     p5 = 5.5373
2025-05-19 18:41:11,293 [INFO]     p25 = 5.5444

Running Requests: 100%|██████████| 64/64 [01:29<00:00,  1.41s/it]
Running Requests:  92%|█████████▏| 59/64 [01:10<00:19,  3.86s/it]

2025-05-19 18:42:22,761 [INFO] Tasks Executed!
2025-05-19 18:42:22,762 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:42:22,776 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:42:22,778 [INFO]     p5 = 7.9436
2025-05-19 18:42:22,779 [INFO]     p25 = 7.9585
2025-05-19 18:42:22,780 [INFO]     p50 = 7.9976
2025-05-19 18:42:22,780 [INFO]     p75 = 8.0361
2025-05-19 18:42:22,781 [INFO]     p90 = 8.4915
2025-05-19 18:42:22,781 [INFO]     p95 = 8.4947
2025-05-19 18:42:22,782 [INFO]     p99 = 8.5005
2025-05-19 18:42:22,783 [INFO]     mean = 7.9512
2025-05-19 18:42:22,783 [INFO]     min = 1.0364
2025-05-19 18:42:22,784 [INFO]     max = 8.508
2025-05-19 18:42:22,784 [INFO]     stddev = 0.8949
2025-05-19 18:42:22,784 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:42:22,786 [INFO]     p5 = 8.7437
2025-05-19 18:42:22,786 [INFO]     p25 = 8.7485

Running Requests: 100%|██████████| 64/64 [01:11<00:00,  1.12s/it]
Running Requests: 100%|██████████| 64/64 [01:02<00:00,  3.42s/it]

2025-05-19 18:43:25,847 [INFO] Tasks Executed!
2025-05-19 18:43:25,847 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:43:25,854 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:43:25,855 [INFO]     p5 = 14.1222
2025-05-19 18:43:25,855 [INFO]     p25 = 14.1491
2025-05-19 18:43:25,855 [INFO]     p50 = 14.1697
2025-05-19 18:43:25,855 [INFO]     p75 = 14.749
2025-05-19 18:43:25,856 [INFO]     p90 = 14.7649
2025-05-19 18:43:25,856 [INFO]     p95 = 14.7685
2025-05-19 18:43:25,856 [INFO]     p99 = 14.8577
2025-05-19 18:43:25,857 [INFO]     mean = 14.1279
2025-05-19 18:43:25,857 [INFO]     min = 1.2585
2025-05-19 18:43:25,857 [INFO]     max = 15.0044
2025-05-19 18:43:25,858 [INFO]     stddev = 1.6582
2025-05-19 18:43:25,859 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:43:25,860 [INFO]     p5 = 15.3236
2025-05-19 18:43:25,861 [INFO]     p25 = 

Running Requests: 100%|██████████| 64/64 [01:02<00:00,  1.02it/s]
Running Requests: 100%|██████████| 64/64 [00:58<00:00,  1.26s/it]

2025-05-19 18:44:25,153 [INFO] Tasks Executed!
2025-05-19 18:44:25,153 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:44:25,162 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:44:25,164 [INFO]     p5 = 27.2672
2025-05-19 18:44:25,164 [INFO]     p25 = 27.2881
2025-05-19 18:44:25,164 [INFO]     p50 = 27.3262
2025-05-19 18:44:25,165 [INFO]     p75 = 27.3621
2025-05-19 18:44:25,165 [INFO]     p90 = 27.3724
2025-05-19 18:44:25,165 [INFO]     p95 = 27.3761
2025-05-19 18:44:25,166 [INFO]     p99 = 27.9002
2025-05-19 18:44:25,166 [INFO]     mean = 26.9381
2025-05-19 18:44:25,167 [INFO]     min = 1.1492
2025-05-19 18:44:25,167 [INFO]     max = 28.7875
2025-05-19 18:44:25,167 [INFO]     stddev = 3.2801
2025-05-19 18:44:25,167 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:44:25,169 [INFO]     p5 = 29.1639
2025-05-19 18:44:25,169 [INFO]     p25 =

Running Requests: 100%|██████████| 64/64 [01:00<00:00,  1.07it/s]
Running Requests: 100%|██████████| 64/64 [08:21<00:00,  7.79s/it]

2025-05-19 18:52:47,787 [INFO] Tasks Executed!
2025-05-19 18:52:47,790 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 18:52:47,813 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 18:52:47,816 [INFO]     p5 = 6.8634
2025-05-19 18:52:47,816 [INFO]     p25 = 6.9396
2025-05-19 18:52:47,817 [INFO]     p50 = 6.9964
2025-05-19 18:52:47,817 [INFO]     p75 = 7.0539
2025-05-19 18:52:47,818 [INFO]     p90 = 7.0999
2025-05-19 18:52:47,818 [INFO]     p95 = 7.1807
2025-05-19 18:52:47,819 [INFO]     p99 = 7.7173
2025-05-19 18:52:47,820 [INFO]     mean = 7.0209
2025-05-19 18:52:47,820 [INFO]     min = 6.843
2025-05-19 18:52:47,820 [INFO]     max = 8.5657
2025-05-19 18:52:47,821 [INFO]     stddev = 0.2142
2025-05-19 18:52:47,821 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 18:52:47,823 [INFO]     p5 = 7.6908
2025-05-19 18:52:47,823 [INFO]     p25 = 7.7269

Running Requests: 100%|██████████| 64/64 [08:22<00:00,  7.85s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 19:00:54,244 [INFO] Tasks Executed!
2025-05-19 19:00:54,245 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 19:00:54,263 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:00:54,265 [INFO]     p5 = 14.3005
2025-05-19 19:00:54,266 [INFO]     p25 = 14.3138
2025-05-19 19:00:54,267 [INFO]     p50 = 14.3349
2025-05-19 19:00:54,267 [INFO]     p75 = 14.389
2025-05-19 19:00:54,268 [INFO]     p90 = 14.4161
2025-05-19 19:00:54,268 [INFO]     p95 = 14.4297
2025-05-19 19:00:54,269 [INFO]     p99 = 14.4596
2025-05-19 19:00:54,270 [INFO]     mean = 14.2378
2025-05-19 19:00:54,270 [INFO]     min = 6.9386
2025-05-19 19:00:54,271 [INFO]     max = 14.489
2025-05-19 19:00:54,271 [INFO]     stddev = 0.928
2025-05-19 19:00:54,271 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:00:54,273 [INFO]     p5 = 15.1399
2025-05-19 19:00:54,273 [INFO]     p25 = 15

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.60s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.59s/it]

2025-05-19 19:09:00,775 [INFO] Tasks Executed!
2025-05-19 19:09:00,777 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 19:09:00,798 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:09:00,800 [INFO]     p5 = 29.4511
2025-05-19 19:09:00,801 [INFO]     p25 = 29.4852
2025-05-19 19:09:00,801 [INFO]     p50 = 29.5435
2025-05-19 19:09:00,802 [INFO]     p75 = 29.5621
2025-05-19 19:09:00,802 [INFO]     p90 = 29.583
2025-05-19 19:09:00,802 [INFO]     p95 = 29.6205
2025-05-19 19:09:00,803 [INFO]     p99 = 29.6435
2025-05-19 19:09:00,804 [INFO]     mean = 28.8272
2025-05-19 19:09:00,804 [INFO]     min = 6.8204
2025-05-19 19:09:00,805 [INFO]     max = 29.6455
2025-05-19 19:09:00,805 [INFO]     stddev = 3.4912
2025-05-19 19:09:00,805 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:09:00,807 [INFO]     p5 = 30.2557
2025-05-19 19:09:00,808 [INFO]     p25 = 

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.61s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 19:17:07,342 [INFO] Tasks Executed!
2025-05-19 19:17:07,344 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 19:17:07,363 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:17:07,366 [INFO]     p5 = 30.767
2025-05-19 19:17:07,366 [INFO]     p25 = 59.7933
2025-05-19 19:17:07,367 [INFO]     p50 = 59.8325
2025-05-19 19:17:07,367 [INFO]     p75 = 59.8696
2025-05-19 19:17:07,367 [INFO]     p90 = 59.8929
2025-05-19 19:17:07,368 [INFO]     p95 = 59.9189
2025-05-19 19:17:07,368 [INFO]     p99 = 59.9836
2025-05-19 19:17:07,369 [INFO]     mean = 56.5406
2025-05-19 19:17:07,369 [INFO]     min = 6.9171
2025-05-19 19:17:07,370 [INFO]     max = 60.061
2025-05-19 19:17:07,370 [INFO]     stddev = 10.7594
2025-05-19 19:17:07,370 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:17:07,372 [INFO]     p5 = 31.6025
2025-05-19 19:17:07,372 [INFO]     p25 = 

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.60s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.59s/it]

2025-05-19 19:25:13,843 [INFO] Tasks Executed!
2025-05-19 19:25:13,844 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 19:25:13,865 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:25:13,867 [INFO]     p5 = 30.7891
2025-05-19 19:25:13,867 [INFO]     p25 = 120.3568
2025-05-19 19:25:13,868 [INFO]     p50 = 120.4511
2025-05-19 19:25:13,868 [INFO]     p75 = 120.5401
2025-05-19 19:25:13,869 [INFO]     p90 = 120.5933
2025-05-19 19:25:13,869 [INFO]     p95 = 120.6651
2025-05-19 19:25:13,869 [INFO]     p99 = 120.671
2025-05-19 19:25:13,870 [INFO]     mean = 106.3113
2025-05-19 19:25:13,871 [INFO]     min = 6.9154
2025-05-19 19:25:13,871 [INFO]     max = 120.6723
2025-05-19 19:25:13,871 [INFO]     stddev = 30.3877
2025-05-19 19:25:13,872 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:25:13,873 [INFO]     p5 = 31.6046

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.60s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 19:33:20,640 [INFO] Tasks Executed!
2025-05-19 19:33:20,642 [INFO] Benchmarking results obtained for model Meta-Llama-3.3-70B-Instruct queried with the sambastudio API.
2025-05-19 19:33:20,658 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:33:20,661 [INFO]     p5 = 30.9972
2025-05-19 19:33:20,662 [INFO]     p25 = 126.6175
2025-05-19 19:33:20,662 [INFO]     p50 = 241.7515
2025-05-19 19:33:20,663 [INFO]     p75 = 241.9976
2025-05-19 19:33:20,663 [INFO]     p90 = 242.0784
2025-05-19 19:33:20,664 [INFO]     p95 = 242.1052
2025-05-19 19:33:20,664 [INFO]     p99 = 242.2833
2025-05-19 19:33:20,664 [INFO]     mean = 183.3609
2025-05-19 19:33:20,665 [INFO]     min = 7.1366
2025-05-19 19:33:20,665 [INFO]     max = 242.4974
2025-05-19 19:33:20,666 [INFO]     stddev = 77.362
2025-05-19 19:33:20,666 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:33:20,668 [INFO]     p5 = 31.8376

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.60s/it]
Running Requests: 100%|██████████| 64/64 [00:27<00:00,  2.75it/s]

2025-05-19 19:33:48,578 [INFO] Tasks Executed!
2025-05-19 19:33:48,580 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:33:48,600 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:33:48,602 [INFO]     p5 = 0.2445
2025-05-19 19:33:48,603 [INFO]     p25 = 0.2915
2025-05-19 19:33:48,603 [INFO]     p50 = 0.3827
2025-05-19 19:33:48,603 [INFO]     p75 = 0.464
2025-05-19 19:33:48,603 [INFO]     p90 = 0.5394
2025-05-19 19:33:48,604 [INFO]     p95 = 0.564
2025-05-19 19:33:48,604 [INFO]     p99 = 1.1565
2025-05-19 19:33:48,604 [INFO]     mean = 0.4065
2025-05-19 19:33:48,604 [INFO]     min = 0.2384
2025-05-19 19:33:48,605 [INFO]     max = 2.0789
2025-05-19 19:33:48,605 [INFO]     stddev = 0.2365
2025-05-19 19:33:48,605 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:33:48,607 [INFO]     p5 = 0.2618
2025-05-19 19:33:48,607 [INFO]     p25 = 0.3073

Running Requests: 100%|██████████| 64/64 [00:27<00:00,  2.30it/s]
Running Requests: 100%|██████████| 64/64 [00:11<00:00,  4.65it/s]

2025-05-19 19:34:00,351 [INFO] Tasks Executed!
2025-05-19 19:34:00,352 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:34:00,371 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:34:00,374 [INFO]     p5 = 0.2493
2025-05-19 19:34:00,374 [INFO]     p25 = 0.2916
2025-05-19 19:34:00,375 [INFO]     p50 = 0.3242
2025-05-19 19:34:00,375 [INFO]     p75 = 0.3529
2025-05-19 19:34:00,376 [INFO]     p90 = 0.3926
2025-05-19 19:34:00,377 [INFO]     p95 = 0.4021
2025-05-19 19:34:00,377 [INFO]     p99 = 0.4251
2025-05-19 19:34:00,378 [INFO]     mean = 0.3266
2025-05-19 19:34:00,379 [INFO]     min = 0.2318
2025-05-19 19:34:00,379 [INFO]     max = 0.4435
2025-05-19 19:34:00,379 [INFO]     stddev = 0.0479
2025-05-19 19:34:00,380 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:34:00,382 [INFO]     p5 = 0.2664
2025-05-19 19:34:00,382 [INFO]     p25 = 0.30

Running Requests: 100%|██████████| 64/64 [00:11<00:00,  5.38it/s]
Running Requests: 100%|██████████| 64/64 [00:10<00:00,  7.75it/s]

2025-05-19 19:34:11,761 [INFO] Tasks Executed!
2025-05-19 19:34:11,762 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:34:11,780 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:34:11,783 [INFO]     p5 = 0.3166
2025-05-19 19:34:11,784 [INFO]     p25 = 0.3754
2025-05-19 19:34:11,784 [INFO]     p50 = 0.3821
2025-05-19 19:34:11,785 [INFO]     p75 = 0.3925
2025-05-19 19:34:11,785 [INFO]     p90 = 2.3201
2025-05-19 19:34:11,786 [INFO]     p95 = 2.3994
2025-05-19 19:34:11,786 [INFO]     p99 = 2.483
2025-05-19 19:34:11,787 [INFO]     mean = 0.6337
2025-05-19 19:34:11,787 [INFO]     min = 0.2332
2025-05-19 19:34:11,788 [INFO]     max = 2.4837
2025-05-19 19:34:11,788 [INFO]     stddev = 0.6746
2025-05-19 19:34:11,789 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:34:11,791 [INFO]     p5 = 0.3358
2025-05-19 19:34:11,791 [INFO]     p25 = 0.397

Running Requests: 100%|██████████| 64/64 [00:11<00:00,  5.59it/s]
Running Requests: 100%|██████████| 64/64 [00:10<00:00,  5.92it/s]

2025-05-19 19:34:22,584 [INFO] Tasks Executed!
2025-05-19 19:34:22,586 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:34:22,600 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:34:22,602 [INFO]     p5 = 0.538
2025-05-19 19:34:22,603 [INFO]     p25 = 1.0065
2025-05-19 19:34:22,603 [INFO]     p50 = 1.0146
2025-05-19 19:34:22,604 [INFO]     p75 = 1.0433
2025-05-19 19:34:22,604 [INFO]     p90 = 2.756
2025-05-19 19:34:22,605 [INFO]     p95 = 2.8944
2025-05-19 19:34:22,605 [INFO]     p99 = 2.8957
2025-05-19 19:34:22,606 [INFO]     mean = 1.1998
2025-05-19 19:34:22,607 [INFO]     min = 0.3455
2025-05-19 19:34:22,607 [INFO]     max = 2.8977
2025-05-19 19:34:22,608 [INFO]     stddev = 0.656
2025-05-19 19:34:22,608 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:34:22,610 [INFO]     p5 = 0.5623
2025-05-19 19:34:22,610 [INFO]     p25 = 1.0236


Running Requests: 100%|██████████| 64/64 [00:13<00:00,  4.62it/s]
Running Requests:  91%|█████████ | 58/64 [00:09<00:01,  5.57it/s]

2025-05-19 19:34:35,668 [INFO] Tasks Executed!
2025-05-19 19:34:35,669 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:34:35,686 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:34:35,688 [INFO]     p5 = 1.0414
2025-05-19 19:34:35,688 [INFO]     p25 = 1.7144
2025-05-19 19:34:35,689 [INFO]     p50 = 1.9512
2025-05-19 19:34:35,689 [INFO]     p75 = 2.4322
2025-05-19 19:34:35,690 [INFO]     p90 = 3.9235
2025-05-19 19:34:35,690 [INFO]     p95 = 4.0441
2025-05-19 19:34:35,691 [INFO]     p99 = 4.0466
2025-05-19 19:34:35,691 [INFO]     mean = 2.1954
2025-05-19 19:34:35,692 [INFO]     min = 0.3846
2025-05-19 19:34:35,692 [INFO]     max = 4.049
2025-05-19 19:34:35,693 [INFO]     stddev = 1.0254
2025-05-19 19:34:35,693 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:34:35,695 [INFO]     p5 = 1.0937
2025-05-19 19:34:35,695 [INFO]     p25 = 1.756

Running Requests: 100%|██████████| 64/64 [00:09<00:00,  6.47it/s]
Running Requests:  81%|████████▏ | 52/64 [00:05<00:05,  2.11it/s]

2025-05-19 19:34:42,098 [INFO] Tasks Executed!
2025-05-19 19:34:42,101 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:34:42,112 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:34:42,114 [INFO]     p5 = 0.6635
2025-05-19 19:34:42,115 [INFO]     p25 = 1.9327
2025-05-19 19:34:42,115 [INFO]     p50 = 2.545
2025-05-19 19:34:42,116 [INFO]     p75 = 3.2438
2025-05-19 19:34:42,116 [INFO]     p90 = 3.2582
2025-05-19 19:34:42,117 [INFO]     p95 = 3.2673
2025-05-19 19:34:42,118 [INFO]     p99 = 3.8514
2025-05-19 19:34:42,119 [INFO]     mean = 2.4903
2025-05-19 19:34:42,119 [INFO]     min = 0.2983
2025-05-19 19:34:42,120 [INFO]     max = 3.8536
2025-05-19 19:34:42,120 [INFO]     stddev = 0.7591
2025-05-19 19:34:42,121 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:34:42,123 [INFO]     p5 = 0.6895
2025-05-19 19:34:42,123 [INFO]     p25 = 1.981

Running Requests: 100%|██████████| 64/64 [00:06<00:00, 10.17it/s]
Running Requests: 100%|██████████| 64/64 [00:37<00:00,  1.72it/s]

2025-05-19 19:35:19,691 [INFO] Tasks Executed!
2025-05-19 19:35:19,692 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:35:19,709 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:35:19,711 [INFO]     p5 = 0.4052
2025-05-19 19:35:19,711 [INFO]     p25 = 0.452
2025-05-19 19:35:19,711 [INFO]     p50 = 0.5125
2025-05-19 19:35:19,712 [INFO]     p75 = 0.609
2025-05-19 19:35:19,712 [INFO]     p90 = 0.6917
2025-05-19 19:35:19,712 [INFO]     p95 = 0.7175
2025-05-19 19:35:19,712 [INFO]     p99 = 1.401
2025-05-19 19:35:19,713 [INFO]     mean = 0.5666
2025-05-19 19:35:19,713 [INFO]     min = 0.3776
2025-05-19 19:35:19,713 [INFO]     max = 2.0481
2025-05-19 19:35:19,714 [INFO]     stddev = 0.2197
2025-05-19 19:35:19,714 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:35:19,716 [INFO]     p5 = 0.4137
2025-05-19 19:35:19,716 [INFO]     p25 = 0.4716


Running Requests: 100%|██████████| 64/64 [00:37<00:00,  1.70it/s]
Running Requests: 100%|██████████| 64/64 [00:18<00:00,  3.23it/s]

2025-05-19 19:35:38,537 [INFO] Tasks Executed!
2025-05-19 19:35:38,539 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:35:38,557 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:35:38,559 [INFO]     p5 = 0.5056
2025-05-19 19:35:38,560 [INFO]     p25 = 0.5151
2025-05-19 19:35:38,560 [INFO]     p50 = 0.5368
2025-05-19 19:35:38,561 [INFO]     p75 = 0.5592
2025-05-19 19:35:38,561 [INFO]     p90 = 0.6039
2025-05-19 19:35:38,562 [INFO]     p95 = 0.6232
2025-05-19 19:35:38,562 [INFO]     p99 = 0.646
2025-05-19 19:35:38,563 [INFO]     mean = 0.545
2025-05-19 19:35:38,563 [INFO]     min = 0.386
2025-05-19 19:35:38,564 [INFO]     max = 0.6671
2025-05-19 19:35:38,564 [INFO]     stddev = 0.044
2025-05-19 19:35:38,564 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:35:38,566 [INFO]     p5 = 0.5277
2025-05-19 19:35:38,566 [INFO]     p25 = 0.5373

Running Requests: 100%|██████████| 64/64 [00:18<00:00,  3.41it/s]
Running Requests: 100%|██████████| 64/64 [00:20<00:00,  3.49it/s]

2025-05-19 19:35:59,245 [INFO] Tasks Executed!
2025-05-19 19:35:59,246 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:35:59,264 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:35:59,267 [INFO]     p5 = 0.8378
2025-05-19 19:35:59,268 [INFO]     p25 = 0.8905
2025-05-19 19:35:59,268 [INFO]     p50 = 0.9422
2025-05-19 19:35:59,269 [INFO]     p75 = 1.0962
2025-05-19 19:35:59,269 [INFO]     p90 = 1.9383
2025-05-19 19:35:59,269 [INFO]     p95 = 3.6409
2025-05-19 19:35:59,270 [INFO]     p99 = 4.0176
2025-05-19 19:35:59,271 [INFO]     mean = 1.2258
2025-05-19 19:35:59,271 [INFO]     min = 0.4352
2025-05-19 19:35:59,271 [INFO]     max = 4.0177
2025-05-19 19:35:59,272 [INFO]     stddev = 0.783
2025-05-19 19:35:59,272 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:35:59,274 [INFO]     p5 = 0.8616
2025-05-19 19:35:59,274 [INFO]     p25 = 0.915

Running Requests: 100%|██████████| 64/64 [00:20<00:00,  3.08it/s]
Running Requests: 100%|██████████| 64/64 [00:15<00:00,  2.67it/s]

2025-05-19 19:36:15,439 [INFO] Tasks Executed!
2025-05-19 19:36:15,440 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:36:15,456 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:36:15,458 [INFO]     p5 = 1.1628
2025-05-19 19:36:15,458 [INFO]     p25 = 1.6349
2025-05-19 19:36:15,459 [INFO]     p50 = 1.6618
2025-05-19 19:36:15,459 [INFO]     p75 = 1.7205
2025-05-19 19:36:15,460 [INFO]     p90 = 3.4669
2025-05-19 19:36:15,460 [INFO]     p95 = 3.5077
2025-05-19 19:36:15,460 [INFO]     p99 = 3.5087
2025-05-19 19:36:15,461 [INFO]     mean = 1.8565
2025-05-19 19:36:15,462 [INFO]     min = 0.4312
2025-05-19 19:36:15,462 [INFO]     max = 3.5094
2025-05-19 19:36:15,462 [INFO]     stddev = 0.6574
2025-05-19 19:36:15,462 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:36:15,464 [INFO]     p5 = 1.166
2025-05-19 19:36:15,464 [INFO]     p25 = 1.661

Running Requests: 100%|██████████| 64/64 [00:17<00:00,  3.70it/s]
Running Requests:  89%|████████▉ | 57/64 [00:19<00:02,  2.37it/s]

2025-05-19 19:36:36,695 [INFO] Tasks Executed!
2025-05-19 19:36:36,696 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:36:36,706 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:36:36,708 [INFO]     p5 = 2.0291
2025-05-19 19:36:36,708 [INFO]     p25 = 4.3377
2025-05-19 19:36:36,708 [INFO]     p50 = 4.7585
2025-05-19 19:36:36,708 [INFO]     p75 = 5.1743
2025-05-19 19:36:36,709 [INFO]     p90 = 6.7682
2025-05-19 19:36:36,709 [INFO]     p95 = 6.7711
2025-05-19 19:36:36,709 [INFO]     p99 = 6.7766
2025-05-19 19:36:36,710 [INFO]     mean = 4.6446
2025-05-19 19:36:36,710 [INFO]     min = 0.4237
2025-05-19 19:36:36,710 [INFO]     max = 6.7769
2025-05-19 19:36:36,711 [INFO]     stddev = 1.4997
2025-05-19 19:36:36,711 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:36:36,712 [INFO]     p5 = 2.0689
2025-05-19 19:36:36,712 [INFO]     p25 = 4.39

Running Requests: 100%|██████████| 64/64 [00:20<00:00,  3.14it/s]
Running Requests:  67%|██████▋   | 43/64 [00:17<00:34,  1.66s/it]

2025-05-19 19:36:55,108 [INFO] Tasks Executed!
2025-05-19 19:36:55,108 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:36:55,119 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:36:55,120 [INFO]     p5 = 2.0806
2025-05-19 19:36:55,121 [INFO]     p25 = 7.8518
2025-05-19 19:36:55,121 [INFO]     p50 = 8.5503
2025-05-19 19:36:55,121 [INFO]     p75 = 9.6511
2025-05-19 19:36:55,122 [INFO]     p90 = 9.6565
2025-05-19 19:36:55,122 [INFO]     p95 = 9.6593
2025-05-19 19:36:55,122 [INFO]     p99 = 9.661
2025-05-19 19:36:55,123 [INFO]     mean = 7.9613
2025-05-19 19:36:55,124 [INFO]     min = 0.4161
2025-05-19 19:36:55,124 [INFO]     max = 9.6634
2025-05-19 19:36:55,124 [INFO]     stddev = 2.4622
2025-05-19 19:36:55,125 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:36:55,126 [INFO]     p5 = 2.0856
2025-05-19 19:36:55,126 [INFO]     p25 = 7.917

Running Requests: 100%|██████████| 64/64 [00:18<00:00,  3.51it/s]
Running Requests: 100%|██████████| 64/64 [01:02<00:00,  1.09it/s]

2025-05-19 19:37:58,108 [INFO] Tasks Executed!
2025-05-19 19:37:58,110 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:37:58,127 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:37:58,130 [INFO]     p5 = 0.834
2025-05-19 19:37:58,130 [INFO]     p25 = 0.8997
2025-05-19 19:37:58,131 [INFO]     p50 = 0.921
2025-05-19 19:37:58,131 [INFO]     p75 = 1.0013
2025-05-19 19:37:58,132 [INFO]     p90 = 1.0212
2025-05-19 19:37:58,132 [INFO]     p95 = 1.1066
2025-05-19 19:37:58,132 [INFO]     p99 = 1.6116
2025-05-19 19:37:58,133 [INFO]     mean = 0.963
2025-05-19 19:37:58,134 [INFO]     min = 0.813
2025-05-19 19:37:58,134 [INFO]     max = 2.4412
2025-05-19 19:37:58,135 [INFO]     stddev = 0.2009
2025-05-19 19:37:58,135 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:37:58,137 [INFO]     p5 = 0.8603
2025-05-19 19:37:58,137 [INFO]     p25 = 0.91

Running Requests: 100%|██████████| 64/64 [01:02<00:00,  1.02it/s]
Running Requests: 100%|██████████| 64/64 [00:44<00:00,  1.47it/s]

2025-05-19 19:38:42,975 [INFO] Tasks Executed!
2025-05-19 19:38:42,977 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:38:42,996 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:38:42,998 [INFO]     p5 = 1.2986
2025-05-19 19:38:42,999 [INFO]     p25 = 1.3257
2025-05-19 19:38:43,000 [INFO]     p50 = 1.3473
2025-05-19 19:38:43,000 [INFO]     p75 = 1.4155
2025-05-19 19:38:43,000 [INFO]     p90 = 1.4307
2025-05-19 19:38:43,001 [INFO]     p95 = 1.4353
2025-05-19 19:38:43,001 [INFO]     p99 = 1.497
2025-05-19 19:38:43,002 [INFO]     mean = 1.3574
2025-05-19 19:38:43,002 [INFO]     min = 0.876
2025-05-19 19:38:43,003 [INFO]     max = 1.546
2025-05-19 19:38:43,003 [INFO]     stddev = 0.0812
2025-05-19 19:38:43,003 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:38:43,005 [INFO]     p5 = 1.3254
2025-05-19 19:38:43,005 [INFO]     p25 = 1.3407


Running Requests: 100%|██████████| 64/64 [00:44<00:00,  1.43it/s]
Running Requests: 100%|██████████| 64/64 [00:55<00:00,  1.16s/it]

2025-05-19 19:39:38,981 [INFO] Tasks Executed!
2025-05-19 19:39:38,982 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:39:38,999 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:39:39,001 [INFO]     p5 = 3.2682
2025-05-19 19:39:39,001 [INFO]     p25 = 3.3186
2025-05-19 19:39:39,002 [INFO]     p50 = 3.3361
2025-05-19 19:39:39,002 [INFO]     p75 = 3.3703
2025-05-19 19:39:39,003 [INFO]     p90 = 3.4577
2025-05-19 19:39:39,003 [INFO]     p95 = 4.5936
2025-05-19 19:39:39,003 [INFO]     p99 = 4.8926
2025-05-19 19:39:39,004 [INFO]     mean = 3.3999
2025-05-19 19:39:39,004 [INFO]     min = 0.8181
2025-05-19 19:39:39,005 [INFO]     max = 4.8929
2025-05-19 19:39:39,005 [INFO]     stddev = 0.4975
2025-05-19 19:39:39,005 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:39:39,007 [INFO]     p5 = 3.285
2025-05-19 19:39:39,007 [INFO]     p25 = 3.352

Running Requests: 100%|██████████| 64/64 [00:56<00:00,  1.14it/s]
Running Requests: 100%|██████████| 64/64 [00:49<00:00,  1.53s/it]

2025-05-19 19:40:29,010 [INFO] Tasks Executed!
2025-05-19 19:40:29,011 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:40:29,024 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:40:29,027 [INFO]     p5 = 5.8621
2025-05-19 19:40:29,027 [INFO]     p25 = 5.8795
2025-05-19 19:40:29,027 [INFO]     p50 = 5.9097
2025-05-19 19:40:29,028 [INFO]     p75 = 5.9222
2025-05-19 19:40:29,029 [INFO]     p90 = 7.7136
2025-05-19 19:40:29,029 [INFO]     p95 = 7.7836
2025-05-19 19:40:29,030 [INFO]     p99 = 7.7846
2025-05-19 19:40:29,031 [INFO]     mean = 6.0533
2025-05-19 19:40:29,031 [INFO]     min = 0.9216
2025-05-19 19:40:29,032 [INFO]     max = 7.7853
2025-05-19 19:40:29,032 [INFO]     stddev = 0.8987
2025-05-19 19:40:29,032 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:40:29,034 [INFO]     p5 = 5.9169
2025-05-19 19:40:29,034 [INFO]     p25 = 5.93

Running Requests: 100%|██████████| 64/64 [00:50<00:00,  1.28it/s]
Running Requests: 100%|██████████| 64/64 [00:54<00:00,  2.25s/it]

2025-05-19 19:41:23,733 [INFO] Tasks Executed!
2025-05-19 19:41:23,735 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:41:23,754 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:41:23,756 [INFO]     p5 = 11.6806
2025-05-19 19:41:23,757 [INFO]     p25 = 11.7228
2025-05-19 19:41:23,757 [INFO]     p50 = 12.3026
2025-05-19 19:41:23,757 [INFO]     p75 = 13.2692
2025-05-19 19:41:23,758 [INFO]     p90 = 15.1917
2025-05-19 19:41:23,758 [INFO]     p95 = 15.1993
2025-05-19 19:41:23,759 [INFO]     p99 = 17.372
2025-05-19 19:41:23,760 [INFO]     mean = 12.727
2025-05-19 19:41:23,760 [INFO]     min = 0.9603
2025-05-19 19:41:23,760 [INFO]     max = 17.9767
2025-05-19 19:41:23,761 [INFO]     stddev = 2.1789
2025-05-19 19:41:23,761 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:41:23,762 [INFO]     p5 = 12.1957

Running Requests: 100%|██████████| 64/64 [00:54<00:00,  1.17it/s]
Running Requests: 100%|██████████| 64/64 [00:44<00:00,  1.68it/s]

2025-05-19 19:42:08,920 [INFO] Tasks Executed!
2025-05-19 19:42:08,921 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-8b-fc-r queried with the sambastudio API.
2025-05-19 19:42:08,931 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:42:08,933 [INFO]     p5 = 11.2962
2025-05-19 19:42:08,934 [INFO]     p25 = 18.7096
2025-05-19 19:42:08,934 [INFO]     p50 = 21.2386
2025-05-19 19:42:08,934 [INFO]     p75 = 22.1981
2025-05-19 19:42:08,935 [INFO]     p90 = 22.2178
2025-05-19 19:42:08,935 [INFO]     p95 = 22.2208
2025-05-19 19:42:08,935 [INFO]     p99 = 25.8619
2025-05-19 19:42:08,936 [INFO]     mean = 18.9982
2025-05-19 19:42:08,937 [INFO]     min = 1.0423
2025-05-19 19:42:08,937 [INFO]     max = 32.0572
2025-05-19 19:42:08,937 [INFO]     stddev = 5.1933
2025-05-19 19:42:08,937 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:42:08,939 [INFO]     p5 = 11.8769

Running Requests: 100%|██████████| 64/64 [00:45<00:00,  1.42it/s]
Running Requests: 100%|██████████| 64/64 [01:25<00:00,  1.25s/it]

2025-05-19 19:43:34,574 [INFO] Tasks Executed!
2025-05-19 19:43:34,581 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:43:34,598 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:43:34,600 [INFO]     p5 = 0.5085
2025-05-19 19:43:34,601 [INFO]     p25 = 0.5884
2025-05-19 19:43:34,602 [INFO]     p50 = 0.6505
2025-05-19 19:43:34,602 [INFO]     p75 = 0.7409
2025-05-19 19:43:34,603 [INFO]     p90 = 0.8118
2025-05-19 19:43:34,603 [INFO]     p95 = 0.889
2025-05-19 19:43:34,603 [INFO]     p99 = 1.6267
2025-05-19 19:43:34,604 [INFO]     mean = 0.6968
2025-05-19 19:43:34,604 [INFO]     min = 0.4922
2025-05-19 19:43:34,605 [INFO]     max = 2.6715
2025-05-19 19:43:34,605 [INFO]     stddev = 0.2736
2025-05-19 19:43:34,606 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:43:34,607 [INFO]     p5 = 1.163
2025-05-19 19:43:34,608 [INFO]     p25 = 1.202

Running Requests: 100%|██████████| 64/64 [01:25<00:00,  1.34s/it]
Running Requests: 100%|██████████| 64/64 [01:10<00:00,  1.10s/it]

2025-05-19 19:44:45,846 [INFO] Tasks Executed!
2025-05-19 19:44:45,847 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:44:45,864 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:44:45,867 [INFO]     p5 = 1.5347
2025-05-19 19:44:45,868 [INFO]     p25 = 1.5401
2025-05-19 19:44:45,868 [INFO]     p50 = 1.5461
2025-05-19 19:44:45,869 [INFO]     p75 = 1.5529
2025-05-19 19:44:45,869 [INFO]     p90 = 1.5938
2025-05-19 19:44:45,870 [INFO]     p95 = 1.6378
2025-05-19 19:44:45,870 [INFO]     p99 = 1.6488
2025-05-19 19:44:45,871 [INFO]     mean = 1.5398
2025-05-19 19:44:45,872 [INFO]     min = 0.5574
2025-05-19 19:44:45,872 [INFO]     max = 1.6499
2025-05-19 19:44:45,872 [INFO]     stddev = 0.1297
2025-05-19 19:44:45,873 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:44:45,875 [INFO]     p5 = 2.1989
2025-05-19 19:44:45,876 [INFO]     p25 = 2.2

Running Requests: 100%|██████████| 64/64 [01:11<00:00,  1.11s/it]
Running Requests:  98%|█████████▊| 63/64 [00:51<00:01,  1.07s/it]

2025-05-19 19:45:37,629 [INFO] Tasks Executed!
2025-05-19 19:45:37,630 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:45:37,669 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:45:37,676 [INFO]     p5 = 2.3873
2025-05-19 19:45:37,681 [INFO]     p25 = 2.3965
2025-05-19 19:45:37,683 [INFO]     p50 = 2.4255
2025-05-19 19:45:37,685 [INFO]     p75 = 2.4733
2025-05-19 19:45:37,687 [INFO]     p90 = 2.5
2025-05-19 19:45:37,688 [INFO]     p95 = 3.889
2025-05-19 19:45:37,690 [INFO]     p99 = 4.1813
2025-05-19 19:45:37,692 [INFO]     mean = 2.5102
2025-05-19 19:45:37,695 [INFO]     min = 0.4874
2025-05-19 19:45:37,697 [INFO]     max = 4.182
2025-05-19 19:45:37,699 [INFO]     stddev = 0.4967
2025-05-19 19:45:37,702 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:45:37,710 [INFO]     p5 = 3.0816
2025-05-19 19:45:37,711 [INFO]     p25 = 3.0876

Running Requests: 100%|██████████| 64/64 [00:52<00:00,  1.23it/s]
Running Requests: 100%|██████████| 64/64 [00:43<00:00,  1.04s/it]

2025-05-19 19:46:21,390 [INFO] Tasks Executed!
2025-05-19 19:46:21,391 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:46:21,405 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:46:21,407 [INFO]     p5 = 4.4414
2025-05-19 19:46:21,408 [INFO]     p25 = 4.4654
2025-05-19 19:46:21,408 [INFO]     p50 = 4.4882
2025-05-19 19:46:21,409 [INFO]     p75 = 4.522
2025-05-19 19:46:21,409 [INFO]     p90 = 6.2408
2025-05-19 19:46:21,409 [INFO]     p95 = 6.3196
2025-05-19 19:46:21,410 [INFO]     p99 = 6.3213
2025-05-19 19:46:21,411 [INFO]     mean = 4.6158
2025-05-19 19:46:21,411 [INFO]     min = 1.3786
2025-05-19 19:46:21,411 [INFO]     max = 6.3216
2025-05-19 19:46:21,411 [INFO]     stddev = 0.8414
2025-05-19 19:46:21,412 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:46:21,413 [INFO]     p5 = 5.1248
2025-05-19 19:46:21,414 [INFO]     p25 = 5.13

Running Requests: 100%|██████████| 64/64 [00:43<00:00,  1.46it/s]
Running Requests:  78%|███████▊  | 50/64 [00:28<00:09,  1.44it/s]

2025-05-19 19:46:50,916 [INFO] Tasks Executed!
2025-05-19 19:46:50,917 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:46:50,929 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:46:50,932 [INFO]     p5 = 5.8564
2025-05-19 19:46:50,932 [INFO]     p25 = 5.8736
2025-05-19 19:46:50,932 [INFO]     p50 = 5.8874
2025-05-19 19:46:50,933 [INFO]     p75 = 6.506
2025-05-19 19:46:50,934 [INFO]     p90 = 7.8256
2025-05-19 19:46:50,935 [INFO]     p95 = 7.8293
2025-05-19 19:46:50,935 [INFO]     p99 = 7.8412
2025-05-19 19:46:50,936 [INFO]     mean = 6.2893
2025-05-19 19:46:50,936 [INFO]     min = 0.5273
2025-05-19 19:46:50,936 [INFO]     max = 7.8574
2025-05-19 19:46:50,936 [INFO]     stddev = 1.1175
2025-05-19 19:46:50,936 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:46:50,938 [INFO]     p5 = 6.6799
2025-05-19 19:46:50,938 [INFO]     p25 = 6.69

Running Requests: 100%|██████████| 64/64 [00:29<00:00,  2.19it/s]
Running Requests: 100%|██████████| 64/64 [00:24<00:00,  1.24s/it]

2025-05-19 19:47:15,898 [INFO] Tasks Executed!
2025-05-19 19:47:15,898 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:47:15,910 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:47:15,912 [INFO]     p5 = 6.9244
2025-05-19 19:47:15,912 [INFO]     p25 = 6.9306
2025-05-19 19:47:15,913 [INFO]     p50 = 10.3758
2025-05-19 19:47:15,913 [INFO]     p75 = 10.4116
2025-05-19 19:47:15,914 [INFO]     p90 = 12.6127
2025-05-19 19:47:15,914 [INFO]     p95 = 12.6132
2025-05-19 19:47:15,914 [INFO]     p99 = 12.6164
2025-05-19 19:47:15,915 [INFO]     mean = 9.7285
2025-05-19 19:47:15,915 [INFO]     min = 1.4581
2025-05-19 19:47:15,916 [INFO]     max = 12.6185
2025-05-19 19:47:15,916 [INFO]     stddev = 2.496
2025-05-19 19:47:15,916 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:47:15,918 [INFO]     p5 = 7.7569

Running Requests: 100%|██████████| 64/64 [00:25<00:00,  2.54it/s]
Running Requests: 100%|██████████| 64/64 [01:54<00:00,  1.71s/it]

2025-05-19 19:49:10,732 [INFO] Tasks Executed!
2025-05-19 19:49:10,733 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:49:10,750 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:49:10,752 [INFO]     p5 = 1.0198
2025-05-19 19:49:10,752 [INFO]     p25 = 1.0592
2025-05-19 19:49:10,753 [INFO]     p50 = 1.1075
2025-05-19 19:49:10,754 [INFO]     p75 = 1.1892
2025-05-19 19:49:10,754 [INFO]     p90 = 1.237
2025-05-19 19:49:10,754 [INFO]     p95 = 1.2995
2025-05-19 19:49:10,755 [INFO]     p99 = 1.525
2025-05-19 19:49:10,756 [INFO]     mean = 1.1338
2025-05-19 19:49:10,756 [INFO]     min = 0.9982
2025-05-19 19:49:10,757 [INFO]     max = 1.7235
2025-05-19 19:49:10,757 [INFO]     stddev = 0.1134
2025-05-19 19:49:10,757 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:49:10,759 [INFO]     p5 = 1.6769
2025-05-19 19:49:10,759 [INFO]     p25 = 1.711

Running Requests: 100%|██████████| 64/64 [01:54<00:00,  1.79s/it]
Running Requests: 100%|██████████| 64/64 [01:42<00:00,  1.61s/it]

2025-05-19 19:50:54,256 [INFO] Tasks Executed!
2025-05-19 19:50:54,257 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:50:54,277 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:50:54,280 [INFO]     p5 = 2.5097
2025-05-19 19:50:54,280 [INFO]     p25 = 2.5279
2025-05-19 19:50:54,281 [INFO]     p50 = 2.5626
2025-05-19 19:50:54,281 [INFO]     p75 = 2.5981
2025-05-19 19:50:54,282 [INFO]     p90 = 2.6276
2025-05-19 19:50:54,282 [INFO]     p95 = 2.6388
2025-05-19 19:50:54,283 [INFO]     p99 = 2.667
2025-05-19 19:50:54,284 [INFO]     mean = 2.5422
2025-05-19 19:50:54,284 [INFO]     min = 1.0147
2025-05-19 19:50:54,284 [INFO]     max = 2.6904
2025-05-19 19:50:54,285 [INFO]     stddev = 0.1993
2025-05-19 19:50:54,285 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:50:54,287 [INFO]     p5 = 3.161
2025-05-19 19:50:54,287 [INFO]     p25 = 3.198

Running Requests: 100%|██████████| 64/64 [01:43<00:00,  1.62s/it]
Running Requests: 100%|██████████| 64/64 [01:29<00:00,  1.98s/it]

2025-05-19 19:52:24,577 [INFO] Tasks Executed!
2025-05-19 19:52:24,579 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:52:24,596 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:52:24,598 [INFO]     p5 = 4.8097
2025-05-19 19:52:24,599 [INFO]     p25 = 4.8489
2025-05-19 19:52:24,599 [INFO]     p50 = 4.8835
2025-05-19 19:52:24,600 [INFO]     p75 = 4.9215
2025-05-19 19:52:24,600 [INFO]     p90 = 4.9448
2025-05-19 19:52:24,600 [INFO]     p95 = 5.3382
2025-05-19 19:52:24,601 [INFO]     p99 = 5.4462
2025-05-19 19:52:24,602 [INFO]     mean = 4.8537
2025-05-19 19:52:24,602 [INFO]     min = 0.9834
2025-05-19 19:52:24,602 [INFO]     max = 5.4471
2025-05-19 19:52:24,603 [INFO]     stddev = 0.5114
2025-05-19 19:52:24,603 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:52:24,605 [INFO]     p5 = 5.5124
2025-05-19 19:52:24,605 [INFO]     p25 = 5.5

Running Requests: 100%|██████████| 64/64 [01:32<00:00,  1.45s/it]
Running Requests: 100%|██████████| 64/64 [01:10<00:00,  1.65s/it]

2025-05-19 19:53:38,353 [INFO] Tasks Executed!
2025-05-19 19:53:38,354 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:53:38,367 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:53:38,369 [INFO]     p5 = 7.8935
2025-05-19 19:53:38,370 [INFO]     p25 = 7.9095
2025-05-19 19:53:38,370 [INFO]     p50 = 7.969
2025-05-19 19:53:38,371 [INFO]     p75 = 8.0317
2025-05-19 19:53:38,371 [INFO]     p90 = 8.5295
2025-05-19 19:53:38,371 [INFO]     p95 = 8.5347
2025-05-19 19:53:38,372 [INFO]     p99 = 8.5544
2025-05-19 19:53:38,372 [INFO]     mean = 7.9302
2025-05-19 19:53:38,372 [INFO]     min = 1.0521
2025-05-19 19:53:38,373 [INFO]     max = 8.585
2025-05-19 19:53:38,373 [INFO]     stddev = 0.8964
2025-05-19 19:53:38,373 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:53:38,375 [INFO]     p5 = 8.7087
2025-05-19 19:53:38,375 [INFO]     p25 = 8.720

Running Requests: 100%|██████████| 64/64 [01:11<00:00,  1.11s/it]
Running Requests: 100%|██████████| 64/64 [01:01<00:00,  1.17s/it]

2025-05-19 19:54:40,998 [INFO] Tasks Executed!
2025-05-19 19:54:40,999 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:54:41,006 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:54:41,007 [INFO]     p5 = 14.1214
2025-05-19 19:54:41,007 [INFO]     p25 = 14.1877
2025-05-19 19:54:41,008 [INFO]     p50 = 14.224
2025-05-19 19:54:41,008 [INFO]     p75 = 14.5919
2025-05-19 19:54:41,008 [INFO]     p90 = 14.5962
2025-05-19 19:54:41,008 [INFO]     p95 = 14.5994
2025-05-19 19:54:41,009 [INFO]     p99 = 14.872
2025-05-19 19:54:41,010 [INFO]     mean = 14.1113
2025-05-19 19:54:41,010 [INFO]     min = 1.0301
2025-05-19 19:54:41,011 [INFO]     max = 15.1373
2025-05-19 19:54:41,011 [INFO]     stddev = 1.675
2025-05-19 19:54:41,012 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:54:41,013 [INFO]     p5 = 15.3165

Running Requests: 100%|██████████| 64/64 [01:02<00:00,  1.02it/s]
Running Requests: 100%|██████████| 64/64 [01:05<00:00,  3.54s/it]

2025-05-19 19:55:47,612 [INFO] Tasks Executed!
2025-05-19 19:55:47,613 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 19:55:47,631 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 19:55:47,633 [INFO]     p5 = 27.9766
2025-05-19 19:55:47,633 [INFO]     p25 = 27.9868
2025-05-19 19:55:47,634 [INFO]     p50 = 29.4941
2025-05-19 19:55:47,634 [INFO]     p75 = 29.524
2025-05-19 19:55:47,634 [INFO]     p90 = 29.5293
2025-05-19 19:55:47,635 [INFO]     p95 = 30.6329
2025-05-19 19:55:47,635 [INFO]     p99 = 33.6474
2025-05-19 19:55:47,636 [INFO]     mean = 28.1757
2025-05-19 19:55:47,636 [INFO]     min = 1.3396
2025-05-19 19:55:47,637 [INFO]     max = 33.6478
2025-05-19 19:55:47,637 [INFO]     stddev = 4.8121
2025-05-19 19:55:47,637 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 19:55:47,639 [INFO]     p5 = 30.0282

Running Requests: 100%|██████████| 64/64 [01:06<00:00,  1.03s/it]
Running Requests: 100%|██████████| 64/64 [08:20<00:00,  7.81s/it]

2025-05-19 20:04:08,224 [INFO] Tasks Executed!
2025-05-19 20:04:08,225 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 20:04:08,270 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 20:04:08,278 [INFO]     p5 = 6.8642
2025-05-19 20:04:08,280 [INFO]     p25 = 6.9187
2025-05-19 20:04:08,281 [INFO]     p50 = 6.9829
2025-05-19 20:04:08,285 [INFO]     p75 = 7.0469
2025-05-19 20:04:08,287 [INFO]     p90 = 7.0999
2025-05-19 20:04:08,288 [INFO]     p95 = 7.1333
2025-05-19 20:04:08,289 [INFO]     p99 = 7.4999
2025-05-19 20:04:08,294 [INFO]     mean = 6.9984
2025-05-19 20:04:08,297 [INFO]     min = 6.8335
2025-05-19 20:04:08,302 [INFO]     max = 7.8005
2025-05-19 20:04:08,304 [INFO]     stddev = 0.1353
2025-05-19 20:04:08,304 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 20:04:08,306 [INFO]     p5 = 7.6811
2025-05-19 20:04:08,306 [INFO]     p25 = 7.7

Running Requests: 100%|██████████| 64/64 [08:21<00:00,  7.84s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 20:12:14,829 [INFO] Tasks Executed!
2025-05-19 20:12:14,830 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 20:12:14,847 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 20:12:14,850 [INFO]     p5 = 14.3014
2025-05-19 20:12:14,850 [INFO]     p25 = 14.3162
2025-05-19 20:12:14,851 [INFO]     p50 = 14.3577
2025-05-19 20:12:14,851 [INFO]     p75 = 14.3812
2025-05-19 20:12:14,852 [INFO]     p90 = 14.411
2025-05-19 20:12:14,852 [INFO]     p95 = 14.4234
2025-05-19 20:12:14,853 [INFO]     p99 = 14.4692
2025-05-19 20:12:14,853 [INFO]     mean = 14.2398
2025-05-19 20:12:14,854 [INFO]     min = 6.8566
2025-05-19 20:12:14,854 [INFO]     max = 14.5205
2025-05-19 20:12:14,854 [INFO]     stddev = 0.9385
2025-05-19 20:12:14,855 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 20:12:14,856 [INFO]     p5 = 15.1337

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.61s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 20:20:21,819 [INFO] Tasks Executed!
2025-05-19 20:20:21,822 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 20:20:21,841 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 20:20:21,844 [INFO]     p5 = 29.4377
2025-05-19 20:20:21,844 [INFO]     p25 = 29.4661
2025-05-19 20:20:21,845 [INFO]     p50 = 29.4884
2025-05-19 20:20:21,845 [INFO]     p75 = 29.5652
2025-05-19 20:20:21,846 [INFO]     p90 = 29.5803
2025-05-19 20:20:21,846 [INFO]     p95 = 29.5915
2025-05-19 20:20:21,847 [INFO]     p99 = 29.6225
2025-05-19 20:20:21,847 [INFO]     mean = 28.8077
2025-05-19 20:20:21,848 [INFO]     min = 6.9281
2025-05-19 20:20:21,848 [INFO]     max = 29.6607
2025-05-19 20:20:21,849 [INFO]     stddev = 3.4741
2025-05-19 20:20:21,849 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 20:20:21,851 [INFO]     p5 = 30.2816

Running Requests: 100%|██████████| 64/64 [08:06<00:00,  7.60s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 20:28:27,878 [INFO] Tasks Executed!
2025-05-19 20:28:27,879 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 20:28:27,898 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 20:28:27,900 [INFO]     p5 = 30.6743
2025-05-19 20:28:27,900 [INFO]     p25 = 59.763
2025-05-19 20:28:27,901 [INFO]     p50 = 59.7972
2025-05-19 20:28:27,901 [INFO]     p75 = 59.8309
2025-05-19 20:28:27,902 [INFO]     p90 = 59.8934
2025-05-19 20:28:27,902 [INFO]     p95 = 59.9009
2025-05-19 20:28:27,903 [INFO]     p99 = 59.9292
2025-05-19 20:28:27,903 [INFO]     mean = 56.5063
2025-05-19 20:28:27,904 [INFO]     min = 6.8467
2025-05-19 20:28:27,904 [INFO]     max = 59.94
2025-05-19 20:28:27,905 [INFO]     stddev = 10.7724
2025-05-19 20:28:27,905 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 20:28:27,907 [INFO]     p5 = 31.5067

Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.59s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 20:36:33,742 [INFO] Tasks Executed!
2025-05-19 20:36:33,744 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 20:36:33,764 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 20:36:33,766 [INFO]     p5 = 30.7753
2025-05-19 20:36:33,767 [INFO]     p25 = 120.3658
2025-05-19 20:36:33,767 [INFO]     p50 = 120.4083
2025-05-19 20:36:33,768 [INFO]     p75 = 120.4476
2025-05-19 20:36:33,768 [INFO]     p90 = 120.524
2025-05-19 20:36:33,768 [INFO]     p95 = 120.5507
2025-05-19 20:36:33,769 [INFO]     p99 = 120.5746
2025-05-19 20:36:33,769 [INFO]     mean = 106.2557
2025-05-19 20:36:33,770 [INFO]     min = 6.8724
2025-05-19 20:36:33,770 [INFO]     max = 120.5788
2025-05-19 20:36:33,770 [INFO]     stddev = 30.3777
2025-05-19 20:36:33,771 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 20:36:33,772 [INFO]     p5 = 31.5773

Running Requests: 100%|██████████| 64/64 [08:08<00:00,  7.64s/it]
Running Requests: 100%|██████████| 64/64 [08:05<00:00,  7.58s/it]

2025-05-19 20:44:43,412 [INFO] Tasks Executed!
2025-05-19 20:44:43,413 [INFO] Benchmarking results obtained for model Salesforce--Llama-xLAM-2-70b-fc-r queried with the sambastudio API.
2025-05-19 20:44:43,465 [INFO] Building Client Metrics Summary for metric: client_ttft_s
2025-05-19 20:44:43,473 [INFO]     p5 = 31.6024
2025-05-19 20:44:43,475 [INFO]     p25 = 127.0769
2025-05-19 20:44:43,476 [INFO]     p50 = 241.5003
2025-05-19 20:44:43,478 [INFO]     p75 = 241.5663
2025-05-19 20:44:43,481 [INFO]     p90 = 241.66
2025-05-19 20:44:43,485 [INFO]     p95 = 241.6718
2025-05-19 20:44:43,489 [INFO]     p99 = 242.0805
2025-05-19 20:44:43,490 [INFO]     mean = 183.3796
2025-05-19 20:44:43,491 [INFO]     min = 7.734
2025-05-19 20:44:43,491 [INFO]     max = 242.6297
2025-05-19 20:44:43,491 [INFO]     stddev = 76.9887
2025-05-19 20:44:43,491 [INFO] Building Client Metrics Summary for metric: client_end_to_end_latency_s
2025-05-19 20:44:43,493 [INFO]     p5 = 32.4285