
feat: Add worker event recording #40

Merged

arekay-nv merged 13 commits into main from arekay/add_worker_event_recording on Dec 12, 2025
Conversation

@arekay-nv
Collaborator

@arekay-nv arekay-nv commented Dec 4, 2025

What does this PR do?

Adds support for event recording from the worker process. This helps debug possible inconsistencies between the requests scheduled by the main process (which schedules events globally) and those issued by the worker process (which sends the actual HTTP requests).
For now, HTTP logging can be enabled via a config in the YAML:

  client:
    workers: 4
    record_worker_events: true

This will be supported by the CLI later as well.
The generated report is written under the specified output_dir path (or the default result_timestamp folder) with the name worker_report_{worker_id}_{pid}.csv, and contains request sent/completed events from each worker with timestamps:

37cece1bc7c84678a57aec1e3f0c85ba,2148533351018776,request_sent
3d1667031bf044aa96f691df13cb0aa5,2148533936690034,request_sent
0ebb6514b65d4fa9a46f58981ac58d4c,2148336016937689,request_completed
9089146dca7d430e873f6971621bb365,2148336408427189,request_completed
4603d8002bf64cad8cf5546f91085834,2148337192828009,request_completed
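
For reference, a minimal sketch of what this kind of per-worker recording could look like; the class name, the monotonic nanosecond clock, and the in-memory buffer are illustrative assumptions, not the PR's actual implementation:

```python
import csv
import os
import time


class WorkerEventRecorder:
    """Hypothetical sketch of per-worker event recording."""

    def __init__(self, worker_id: int, output_dir: str = ".") -> None:
        pid = os.getpid()
        self._path = os.path.join(output_dir, f"worker_report_{worker_id}_{pid}.csv")
        self._events: list[tuple[str, int, str]] = []

    def record(self, request_id: str, event: str) -> None:
        # Monotonic nanosecond timestamps, consistent in scale with the
        # report excerpt above.
        self._events.append((request_id, time.monotonic_ns(), event))

    def dump(self) -> None:
        with open(self._path, "w", newline="") as f:
            csv.writer(f).writerows(self._events)
```

A worker would call record(request_id, "request_sent") just before issuing the HTTP request, record(request_id, "request_completed") when the response finishes, and dump() on shutdown.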

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv requested a review from a team as a code owner December 4, 2025 20:51
Copilot AI review requested due to automatic review settings December 4, 2025 20:51
@github-actions

github-actions Bot commented Dec 4, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested a review from nvzhihanj December 4, 2025 20:52
@arekay-nv arekay-nv requested review from Copilot, nv-alicheng and viraatc and removed request for Copilot, nv-alicheng and nvzhihanj December 4, 2025 20:52
@gemini-code-assist

Summary of Changes

Hello @arekay-nv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial debugging capability by enabling detailed event recording within worker processes. It allows for the capture of HTTP request lifecycle events, such as when a request is sent and completed, directly from the worker's perspective. This feature is configurable via a new http_logging setting and generates specific CSV reports per worker, which will be invaluable for diagnosing inconsistencies between the main process's event scheduling and the actual execution of HTTP requests by workers. Additionally, it includes minor improvements to benchmark calculations and enhances the flexibility of OpenAI request schemas.

Highlights

  • Worker Event Recording: Implemented a new feature to record events from worker processes, specifically for HTTP request lifecycles.
  • Configurable HTTP Logging: Introduced a http_logging configuration option in the YAML file to enable or disable this new event recording.
  • Detailed Worker Reports: When enabled, workers will generate worker_report_{worker_id}_{pid}.csv files containing request_sent and request_completed events with nanosecond timestamps.
  • Improved QPS Calculation: The benchmark script's QPS calculation was updated to use success_count instead of response_collector.count for more accurate reporting.
  • OpenAI Request Schema Flexibility: Made several fields in the OpenAI ChatCompletionRequest and ChatMessage schemas optional to increase flexibility (a sketch follows this list).
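
As a rough illustration of that last point, this is how optional fields with None defaults look in a msgspec Struct; the field sets below are trimmed guesses, not the adapter's actual definitions:

```python
from typing import Optional

import msgspec


class ChatMessage(msgspec.Struct):
    role: str
    content: Optional[str] = None
    name: Optional[str] = None


class ChatCompletionRequest(msgspec.Struct):
    model: str
    messages: list[ChatMessage]
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None


req = ChatCompletionRequest(
    model="Qwen/Qwen2.5-1.5B",
    messages=[ChatMessage(role="user", content="hi")],
)
# Note: with default encoding, the None-valued fields still appear as null.
print(msgspec.json.encode(req).decode())
```

Note that with default encoding the optional fields still serialize as null, which is relevant to the 400 errors discussed later in this thread.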

@arekay-nv arekay-nv requested a review from anandhu-eng December 4, 2025 20:52
Copilot AI left a comment

Pull request overview

This PR adds worker-side event recording capabilities to track HTTP request lifecycle events (REQUEST_SENT and REQUEST_COMPLETED) for debugging purposes. This helps identify potential discrepancies between events scheduled by the main process and those executed by worker processes.

  • Adds new SampleEvent types (REQUEST_SENT, REQUEST_COMPLETED) to track request lifecycle
  • Implements optional HTTP logging via configuration flag that generates per-worker CSV reports
  • Makes fields in OpenAI request structures optional to support flexible request construction
  • Updates QPS calculation to use only successful requests (a rough sketch follows this list)
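
As a rough sketch of the QPS change in the last bullet (names assumed from this summary, not copied from the diff):

```python
def estimated_qps(success_count: int, duration_s: float) -> float:
    # Previously something like response_collector.count (all collected
    # responses, including failures) was divided by the duration; now only
    # successful requests contribute to the reported QPS.
    return success_count / duration_s if duration_s > 0 else 0.0
```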

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/inference_endpoint/openai/openai_msgspec_adapter.py: Makes ChatMessage and ChatCompletionRequest fields optional with None defaults
  • src/inference_endpoint/metrics/reporter.py: Adds dump_all_to_csv method to export REQUEST_SENT/REQUEST_COMPLETED events
  • src/inference_endpoint/metrics/recorder.py: Adds timestamp_ns field to output buffer entries for event correlation
  • src/inference_endpoint/load_generator/events.py: Defines new REQUEST_SENT and REQUEST_COMPLETED event types
  • src/inference_endpoint/endpoint_client/worker.py: Implements worker-level event recording with optional CSV report generation
  • src/inference_endpoint/endpoint_client/configs.py: Adds http_logging configuration flag to HTTPClientConfig
  • src/inference_endpoint/config/schema.py: Adds http_logging setting to ClientSettings schema
  • src/inference_endpoint/commands/benchmark.py: Updates QPS calculation to use success_count and passes http_logging config to HTTPClientConfig


Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/metrics/reporter.py Outdated
Comment thread src/inference_endpoint/commands/benchmark.py
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a valuable feature for recording worker events, which will greatly aid in debugging. The implementation is well-structured, and the configuration options are clear. I've identified a couple of areas for improvement. The primary suggestion is to optimize the CSV dumping logic in reporter.py by consolidating two SQL queries into one and using the standard csv module for more robust file writing. Additionally, there's a minor cleanup of a commented-out debug statement. The other changes, including the fix for QPS calculation and making request fields optional, are solid improvements.
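
A minimal sketch of what the suggested consolidation might look like, assuming a SQLite-backed recorder with an events table holding (request_id, timestamp_ns, event) rows; both the storage backend and the schema are assumptions:

```python
import csv
import sqlite3


def dump_all_to_csv(db_path: str, out_path: str) -> None:
    con = sqlite3.connect(db_path)
    try:
        # One query fetching both event types instead of two separate queries.
        rows = con.execute(
            "SELECT request_id, timestamp_ns, event FROM events "
            "WHERE event IN ('request_sent', 'request_completed') "
            "ORDER BY timestamp_ns"
        )
        # The standard csv module handles quoting/escaping instead of
        # hand-rolled string writes.
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
    finally:
        con.close()
```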

Comment thread src/inference_endpoint/metrics/reporter.py Outdated
Comment thread src/inference_endpoint/openai/openai_msgspec_adapter.py Outdated
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 4, 2025 21:47
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py Outdated
Comment thread src/inference_endpoint/load_generator/events.py Outdated
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv changed the title from "Add worker event recording" to "feat: Add worker event recording" on Dec 5, 2025
@anandhu-eng
Contributor

anandhu-eng commented Dec 7, 2025

Hi @arekay-nv, I was testing this out on my side and I'm getting the following messages for all 60 queries that I sent:

[Worker-0-715289] ERROR - Request 026809cfcfc1438897bde65d18aab4d9 failed with HTTP Error: {"error":{"message":"[{'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'role'), 'msg': \"Input should be 'developer'\", 'input': 'user', 'ctx': {'expected': \"'developer'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'role'), 'msg': \"Input should be 'system'\", 'input': 'user', 'ctx': {'expected': \"'system'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionUserMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'role'), 'msg': \"Input should be 'assistant'\", 'input': 'user', 'ctx': {'expected': \"'assistant'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'role'), 'msg': \"Input should be 'tool'\", 'input': 'user', 'ctx': {'expected': \"'tool'\"}}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'tool_call_id'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'role'), 'msg': \"Input should be 'function'\", 'input': 'user', 'ctx': {'expected': \"'function'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'CustomChatCompletionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'Message', 'author'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'list_type', 'loc': ('body', 'messages', 0, 'Message', 'content'), 'msg': 'Input should be a valid list', 'input': 'Write a short story about artificial intelligence (case 750)'}]","type":"Bad Request","param":null,"code":400}} (worker:339)
2025-12-07 14:46:24,772 - inference_endpoint.load_generator.sample - ERROR - Error in request 026809cfcfc1438897bde65d18aab4d9: HTTP 400: {"error":{"message":"[{'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'role'), 'msg': \"Input should be 'developer'\", 'input': 'user', 'ctx': {'expected': \"'developer'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'role'), 'msg': \"Input should be 'system'\", 'input': 'user', 'ctx': {'expected': \"'system'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionUserMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'role'), 'msg': \"Input should be 'assistant'\", 'input': 'user', 'ctx': {'expected': \"'assistant'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'role'), 'msg': \"Input should be 'tool'\", 'input': 'user', 'ctx': {'expected': \"'tool'\"}}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'tool_call_id'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'role'), 'msg': \"Input should be 'function'\", 'input': 'user', 'ctx': {'expected': \"'function'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'CustomChatCompletionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'Message', 'author'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'list_type', 'loc': ('body', 'messages', 0, 'Message', 'content'), 'msg': 'Input should be a valid list', 'input': 'Write a short story about artificial intelligence (case 750)'}]","type":"Bad Request","param":null,"code":400}}
....

Also, the summary seems to count every query as successful:

----------------- Summary -----------------
Total samples issued: 60
Total samples completed: 60
Duration: 4.82 seconds
QPS: 12.45
TPS: 0.00

But the logs at the end seem to be correct:

2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Completed in 5.1s
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Results: 0/60 successful
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Estimated QPS: 0.0
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - WARNING - Errors: 60
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Cleaning up...
Qwen/Qwen2.5-1.5B (Streaming: True): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:08<00:00,  6.81it/s]
2025-12-07 14:46:29,313 - inference_endpoint.endpoint_client.http_client - INFO - Shutting down HTTP endpoint client...
[Worker-0-715289] INFO  - Worker 0 will cancel 0 tasks and cleanup. (worker:495)
2025-12-07 14:46:29,870 - inference_endpoint.endpoint_client.http_client - INFO - HTTP endpoint client shutdown complete.

The same run appears to be successful when run from the main branch:

(endpoints) anandhusooraj@mlc2:~/endp$ inference-endpoint benchmark from-config --output benchmark_results.json --config config-template.yaml 
2025-12-07 14:53:27,772 - inference_endpoint.main - INFO - Starting MLPerf Inference Endpoint Benchmarking System
2025-12-07 14:53:27,777 - inference_endpoint.config.yaml_loader - INFO - Loaded config: test-name (type: TestType.ONLINE)
2025-12-07 14:53:27,777 - inference_endpoint.commands.benchmark - INFO - Loading tokenizer for model: Qwen/Qwen2.5-1.5B
2025-12-07 14:53:28,173 - inference_endpoint.commands.benchmark - INFO - Tokenizer loaded successfully
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Inferred dataset format: pkl
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Loading: tests/datasets/dummy_1k.pkl (format: pkl)
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Streaming: enabled (auto, online mode)
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Parser key maps: None
2025-12-07 14:53:28,174 - inference_endpoint.dataset_manager.factory - INFO - Creating pickle dataset loader for tests/datasets/dummy_1k.pkl
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Loaded 1000 samples
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Mode: TestMode.PERF, Target QPS: 10.0, Responses: False
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Min Duration: 600.0s, Expected samples: 60
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Scheduler: PoissonDistributionScheduler (pattern: poisson)
Qwen/Qwen2.5-1.5B (Streaming: True):   0%|                                                                                                                                           | 0/60 [00:00<?, ?it/s]2025-12-07 14:53:28,179 - inference_endpoint.commands.benchmark - INFO - Connecting: http://localhost:1143
2025-12-07 14:53:28,179 - inference_endpoint.commands.benchmark - INFO - Client config: workers=1
2025-12-07 14:53:28,678 - inference_endpoint.endpoint_client.http_client - INFO - HTTP endpoint client using adapter: OpenAIMsgspecAdapter
2025-12-07 14:53:28,680 - inference_endpoint.endpoint_client.worker - INFO - Starting 1 worker processes
[Worker-0-719884] INFO  - Worker 0 started and ready (worker:184)
2025-12-07 14:53:31,926 - inference_endpoint.endpoint_client.worker - INFO - 1/1 workers ready
2025-12-07 14:53:31,927 - inference_endpoint.commands.benchmark - INFO - Running...
Qwen/Qwen2.5-1.5B (Streaming: True):  97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋    | 58/60 [00:15<00:00,  9.69it/s]2025-12-07 14:53:44,087 - inference_endpoint.load_generator.session - INFO - Waiting for the test to end... 0 samples remaining
----------------- Summary -----------------
Total samples issued: 60
Total samples completed: 60
Duration: 12.16 seconds
QPS: 4.94
TPS: 7040.61


------------------- Latency Breakdowns -------------------
TTFT:
  Min: 17.23 ms
  Max: 177.56 ms
  Median: 22.15 ms
  Avg.: 30.66 ms
  Std Dev.: 33.54 ms

  Histogram:
      [17.23, 33.27) |############################## 56
      [33.27, 49.30) | 0
      [49.30, 65.33) | 0
      [65.33, 81.37) | 0
      [81.37, 97.40) | 0
     [97.40, 113.43) | 0
    [113.43, 129.46) | 1
    [129.46, 145.50) | 1
    [145.50, 161.53) | 0
    [161.53, 177.56) |# 2

  Percentiles:
  99.9: 177.43 ms
    99: 176.25 ms
    95: 118.80 ms
    90: 28.03 ms
    80: 25.62 ms
    75: 25.00 ms
    50: 22.15 ms
    25: 19.20 ms
    10: 18.08 ms
     5: 17.78 ms
     1: 17.32 ms


TPOT (request_weighted):
  Min: 2.67 ms
  Max: 3.85 ms
  Median: 3.62 ms
  Avg.: 3.50 ms
  Std Dev.: 0.28 ms

  Histogram:
    [2.67, 2.79) |### 3
    [2.79, 2.91) | 0
    [2.91, 3.02) |## 2
    [3.02, 3.14) |##### 5
    [3.14, 3.26) |# 1
    [3.26, 3.38) |## 2
    [3.38, 3.50) |### 3
    [3.50, 3.61) |############## 13
    [3.61, 3.73) |############################# 26
    [3.73, 3.85) |##### 5

  Percentiles:
  99.9: 3.85 ms
    99: 3.84 ms
    95: 3.76 ms
    90: 3.70 ms
    80: 3.69 ms
    75: 3.68 ms
    50: 3.62 ms
    25: 3.45 ms
    10: 3.05 ms
     5: 2.97 ms
     1: 2.68 ms


Latency:
  Min: 51.01 ms
  Max: 7599.37 ms
  Median: 7289.60 ms
  Avg.: 5167.79 ms
  Std Dev.: 3086.23 ms

  Histogram:
       [51.01, 805.85) |######## 11
     [805.85, 1560.68) |### 5
    [1560.68, 2315.52) |## 3
    [2315.52, 3070.36) | 1
    [3070.36, 3825.19) | 0
    [3825.19, 4580.03) | 0
    [4580.03, 5334.86) |# 2
    [5334.86, 6089.70) | 0
    [6089.70, 6844.53) | 0
    [6844.53, 7599.37) |############################## 38

  Percentiles:
  99.9: 7599.05 ms
    99: 7596.14 ms
    95: 7586.57 ms
    90: 7579.93 ms
    80: 7540.86 ms
    75: 7526.00 ms
    50: 7289.60 ms
    25: 1235.69 ms
    10: 501.05 ms
     5: 149.97 ms
     1: 58.67 ms


Output sequence lengths:
  Min: 12.00 tokens
  Max: 2313.00 tokens
  Median: 2048.00 tokens
  Avg.: 1426.63 tokens
  Std Dev.: 849.29 tokens

  Histogram:
       [12.00, 242.10) |######## 11
      [242.10, 472.20) |##### 7
      [472.20, 702.30) | 1
      [702.30, 932.40) | 1
     [932.40, 1162.50) | 0
    [1162.50, 1392.60) |# 2
    [1392.60, 1622.70) | 0
    [1622.70, 1852.80) | 0
    [1852.80, 2082.90) |############################## 37
    [2082.90, 2313.00) | 1

  Percentiles:
  99.9: 2297.42 tokens
    99: 2157.24 tokens
    95: 2048.00 tokens
    90: 2048.00 tokens
    80: 2048.00 tokens
    75: 2048.00 tokens
    50: 2048.00 tokens
    25: 388.75 tokens
    10: 149.40 tokens
     5: 35.90 tokens
     1: 13.18 tokens


2025-12-07 14:53:44,759 - inference_endpoint.commands.benchmark - INFO - Completed in 12.8s
2025-12-07 14:53:44,759 - inference_endpoint.commands.benchmark - INFO - Results: 60/60 successful
2025-12-07 14:53:44,760 - inference_endpoint.commands.benchmark - INFO - Estimated QPS: 4.7
2025-12-07 14:53:44,760 - inference_endpoint.commands.benchmark - INFO - Cleaning up...
Qwen/Qwen2.5-1.5B (Streaming: True): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:16<00:00,  3.62it/s]
2025-12-07 14:53:44,760 - inference_endpoint.endpoint_client.http_client - INFO - Shutting down HTTP endpoint client...
[Worker-0-719884] INFO  - Worker 0 will cancel 0 tasks and cleanup. (worker:411)
2025-12-07 14:53:45,323 - inference_endpoint.endpoint_client.http_client - INFO - HTTP endpoint client shutdown complete.

The following is the content of my config file:

(endpoints) anandhusooraj@mlc2:~/endp$ cat config-template.yaml 
name: "test-name"
type: "online" # offline|online|eval|submission
benchmark_mode: "online" # Required for submission: offline or online

submission_ref:
  model: "Qwen/Qwen2.5-1.5B"
  ruleset: "mlperf-inference-v5.1"

model_params:
  temperature: 0.7
  max_new_tokens: 2048

datasets:
  - name: "perf"
    type: "performance"
    path: "tests/datasets/dummy_1k.pkl"

settings:
  runtime:
    #min_duration_ms: 600000 # 10 minutes
    n_samples_to_issue: 60 # Optional: explicit sample count (null = auto-calculate)
    scheduler_random_seed: 42 # For Poisson/distribution sampling
    dataloader_random_seed: 42 # For dataset shuffling
  load_pattern:
    type: "poisson"
    target_qps: 10.0
  client:
    workers: 1
    max_concurrency: 1 # -1 = unlimited

metrics:
  collect: ["throughput", "latency", "ttft", "tpot"]

endpoint_config:
  endpoint: "http://localhost:1143"
  api_key: null

The following is part of the csv file being created for event recording:

(endpoints) anandhusooraj@mlc2:~/endp$ cat worker_report_0_724955.csv 
3f481728850f49bc8be63a56c4f31f9c,12175397861290669,zmq_request_received
341c6a546cf046dfae9134239c07bbf4,12175397861419002,zmq_request_received
b58d37a7907e49eda7477776b135d99d,12175397861488536,zmq_request_received
3f481728850f49bc8be63a56c4f31f9c,12175397861608305,http_request_issued
341c6a546cf046dfae9134239c07bbf4,12175397862489672,http_request_issued
b58d37a7907e49eda7477776b135d99d,12175397862658449,http_request_issued
5dd7b7c8ab0a4a5eaf995d0865595e54,12175397866231802,zmq_request_received
5dd7b7c8ab0a4a5eaf995d0865595e54,12175397866345653,http_request_issued
b58d37a7907e49eda7477776b135d99d,12175397869016890,zmq_response_sent
341c6a546cf046dfae9134239c07bbf4,12175397869284830,zmq_response_sent
3f481728850f49bc8be63a56c4f31f9c,12175397870918875,zmq_response_sent
5dd7b7c8ab0a4a5eaf995d0865595e54,12175397872687208,zmq_response_sent
...
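
One way to make such a report useful is to pair events per request id and compute the deltas between them; a small sketch (the file name comes from the excerpt above, everything else is illustrative):

```python
import csv
from collections import defaultdict

# Group (timestamp_ns, event) pairs by request id.
events: dict[str, list[tuple[int, str]]] = defaultdict(list)
with open("worker_report_0_724955.csv", newline="") as f:
    for request_id, ts_ns, event in csv.reader(f):
        events[request_id].append((int(ts_ns), event))

# Report the spread between the first and last recorded event per request.
for request_id, recorded in events.items():
    recorded.sort()
    (first_ts, first_ev), (last_ts, last_ev) = recorded[0], recorded[-1]
    print(f"{request_id}: {first_ev} -> {last_ev} "
          f"in {(last_ts - first_ts) / 1e6:.3f} ms")
```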

Copilot AI review requested due to automatic review settings December 9, 2025 17:26
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py Fixed
@arekay-nv arekay-nv requested a review from nv-alicheng December 9, 2025 19:14
Copilot AI review requested due to automatic review settings December 10, 2025 01:42
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.



Comment thread src/inference_endpoint/endpoint_client/worker.py
@arekay-nv
Collaborator Author

@anandhu-eng can you retry? I was able to run the config you have with vLLM.

@arekay-nv
Collaborator Author

> @anandhu-eng can you retry? I was able to run the config you have with vLLM.

I tested using vLLM and have noticed that some frameworks have different requirements for input formats. Let me know if you are using something different to serve the model and I can retry.
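
For what it's worth, one way this kind of strict-server rejection can be avoided (an assumption about a possible mitigation, not necessarily what this PR does) is to declare the msgspec structs with omit_defaults=True, so None-valued fields such as name are dropped from the encoded payload entirely:

```python
from typing import Optional

import msgspec


# Assumed mitigation sketch: omit_defaults=True drops fields still at their
# default (here, None) from the encoded JSON, so a strict server never sees
# "name": null in the message payload.
class ChatMessage(msgspec.Struct, omit_defaults=True):
    role: str
    content: Optional[str] = None
    name: Optional[str] = None


msg = ChatMessage(role="user", content="Write a short story")
# Prints {"role":"user","content":"Write a short story"} with no "name": null.
print(msgspec.json.encode(msg).decode())
```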

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv requested a review from viraatc December 10, 2025 18:25
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 10, 2025 18:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.



Comment thread src/inference_endpoint/endpoint_client/worker.py
@anandhu-eng
Contributor

> @anandhu-eng can you retry? I was able to run the config you have with vLLM.

Hi @arekay-nv, I was able to run it successfully this time. I am serving the model with vLLM using the following command:

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-1.5B --host 0.0.0.0 --port 1143

I noticed that the worker_report_** file is created in the directory from which the benchmark is launched. Would it be better to generate this file in the same folder where the benchmark report is stored?

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Comment thread src/inference_endpoint/commands/benchmark.py Outdated
Comment thread src/inference_endpoint/commands/benchmark.py
Comment thread src/inference_endpoint/commands/probe.py
Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread tests/integration/commands/test_probe_command.py
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 12, 2025 15:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.



Comment thread src/inference_endpoint/endpoint_client/worker.py
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv merged commit 9a7cdbe into main Dec 12, 2025
4 checks passed
@arekay-nv arekay-nv deleted the arekay/add_worker_event_recording branch December 12, 2025 16:03
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 12, 2025