
feat: Add worker event recording #40

Merged

arekay-nv merged 13 commits into main from arekay/add_worker_event_recording on Dec 12, 2025
Conversation

@arekay-nv
Collaborator

@arekay-nv arekay-nv commented Dec 4, 2025

What does this PR do?

Adds support for event recording from the worker process. This helps debug possible inconsistencies between the requests scheduled by the main process (which schedules events globally) and those issued by the worker process (which sends the actual HTTP requests).
For now, HTTP logging can be enabled via a config in the YAML:

  client:
    workers: 4
    record_worker_events: true

This will be supported by the CLI later as well.
The generated report is written under the specified output_dir path (or the default result_timestamp folder) with the name worker_report_{worker_id}_{pid}.csv, and contains request sent/completed events from each worker with timestamps:

37cece1bc7c84678a57aec1e3f0c85ba,2148533351018776,request_sent
3d1667031bf044aa96f691df13cb0aa5,2148533936690034,request_sent
0ebb6514b65d4fa9a46f58981ac58d4c,2148336016937689,request_completed
9089146dca7d430e873f6971621bb365,2148336408427189,request_completed
4603d8002bf64cad8cf5546f91085834,2148337192828009,request_completed
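
For reference, a minimal sketch of what this kind of per-worker recording could look like; the class name, the monotonic nanosecond clock, and the in-memory buffer are illustrative assumptions, not the PR's actual implementation:

```python
import csv
import os
import time


class WorkerEventRecorder:
    """Hypothetical sketch of per-worker event recording."""

    def __init__(self, worker_id: int, output_dir: str = ".") -> None:
        pid = os.getpid()
        self._path = os.path.join(output_dir, f"worker_report_{worker_id}_{pid}.csv")
        self._events: list[tuple[str, int, str]] = []

    def record(self, request_id: str, event: str) -> None:
        # Monotonic nanosecond timestamps, consistent in scale with the
        # report excerpt above.
        self._events.append((request_id, time.monotonic_ns(), event))

    def dump(self) -> None:
        with open(self._path, "w", newline="") as f:
            csv.writer(f).writerows(self._events)
```

A worker would call record(request_id, "request_sent") just before issuing the HTTP request, record(request_id, "request_completed") when the response finishes, and dump() on shutdown.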

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv requested a review from a team as a code owner December 4, 2025 20:51
Copilot AI review requested due to automatic review settings December 4, 2025 20:51
@github-actions

github-actions Bot commented Dec 4, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested a review from nvzhihanj December 4, 2025 20:52
@arekay-nv arekay-nv requested review from Copilot, nv-alicheng and viraatc and removed request for Copilot, nv-alicheng and nvzhihanj December 4, 2025 20:52
@gemini-code-assist

Summary of Changes

Hello @arekay-nv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial debugging capability by enabling detailed event recording within worker processes. It allows for the capture of HTTP request lifecycle events, such as when a request is sent and completed, directly from the worker's perspective. This feature is configurable via a new http_logging setting and generates specific CSV reports per worker, which will be invaluable for diagnosing inconsistencies between the main process's event scheduling and the actual execution of HTTP requests by workers. Additionally, it includes minor improvements to benchmark calculations and enhances the flexibility of OpenAI request schemas.

Highlights

  • Worker Event Recording: Implemented a new feature to record events from worker processes, specifically for HTTP request lifecycles.
  • Configurable HTTP Logging: Introduced a http_logging configuration option in the YAML file to enable or disable this new event recording.
  • Detailed Worker Reports: When enabled, workers will generate worker_report_{worker_id}_{pid}.csv files containing request_sent and request_completed events with nanosecond timestamps.
  • Improved QPS Calculation: The benchmark script's QPS calculation was updated to use success_count instead of response_collector.count for more accurate reporting.
  • OpenAI Request Schema Flexibility: Made several fields in the OpenAI ChatCompletionRequest and ChatMessage schemas optional to increase flexibility (a sketch follows this list).
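
As a rough illustration of that last point, this is how optional fields with None defaults look in a msgspec Struct; the field sets below are trimmed guesses, not the adapter's actual definitions:

```python
from typing import Optional

import msgspec


class ChatMessage(msgspec.Struct):
    role: str
    content: Optional[str] = None
    name: Optional[str] = None


class ChatCompletionRequest(msgspec.Struct):
    model: str
    messages: list[ChatMessage]
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None


req = ChatCompletionRequest(
    model="Qwen/Qwen2.5-1.5B",
    messages=[ChatMessage(role="user", content="hi")],
)
# Note: with default encoding, the None-valued fields still appear as null.
print(msgspec.json.encode(req).decode())
```

Note that with default encoding the optional fields still serialize as null, which is relevant to the 400 errors discussed later in this thread.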

@arekay-nv arekay-nv requested a review from anandhu-eng December 4, 2025 20:52
Copilot AI left a comment

Pull request overview

This PR adds worker-side event recording capabilities to track HTTP request lifecycle events (REQUEST_SENT and REQUEST_COMPLETED) for debugging purposes. This helps identify potential discrepancies between events scheduled by the main process and those executed by worker processes.

  • Adds new SampleEvent types (REQUEST_SENT, REQUEST_COMPLETED) to track request lifecycle
  • Implements optional HTTP logging via configuration flag that generates per-worker CSV reports
  • Makes fields in OpenAI request structures optional to support flexible request construction
  • Updates QPS calculation to use only successful requests (a rough sketch follows this list)
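
As a rough sketch of the QPS change in the last bullet (names assumed from this summary, not copied from the diff):

```python
def estimated_qps(success_count: int, duration_s: float) -> float:
    # Previously something like response_collector.count (all collected
    # responses, including failures) was divided by the duration; now only
    # successful requests contribute to the reported QPS.
    return success_count / duration_s if duration_s > 0 else 0.0
```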

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/inference_endpoint/openai/openai_msgspec_adapter.py: Makes ChatMessage and ChatCompletionRequest fields optional with None defaults
  • src/inference_endpoint/metrics/reporter.py: Adds dump_all_to_csv method to export REQUEST_SENT/REQUEST_COMPLETED events
  • src/inference_endpoint/metrics/recorder.py: Adds timestamp_ns field to output buffer entries for event correlation
  • src/inference_endpoint/load_generator/events.py: Defines new REQUEST_SENT and REQUEST_COMPLETED event types
  • src/inference_endpoint/endpoint_client/worker.py: Implements worker-level event recording with optional CSV report generation
  • src/inference_endpoint/endpoint_client/configs.py: Adds http_logging configuration flag to HTTPClientConfig
  • src/inference_endpoint/config/schema.py: Adds http_logging setting to ClientSettings schema
  • src/inference_endpoint/commands/benchmark.py: Updates QPS calculation to use success_count and passes http_logging config to HTTPClientConfig


Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/metrics/reporter.py Outdated
Comment thread src/inference_endpoint/commands/benchmark.py
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a valuable feature for recording worker events, which will greatly aid in debugging. The implementation is well-structured, and the configuration options are clear. I've identified a couple of areas for improvement. The primary suggestion is to optimize the CSV dumping logic in reporter.py by consolidating two SQL queries into one and using the standard csv module for more robust file writing. Additionally, there's a minor cleanup of a commented-out debug statement. The other changes, including the fix for QPS calculation and making request fields optional, are solid improvements.
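
A minimal sketch of what the suggested consolidation might look like, assuming a SQLite-backed recorder with an events table holding (request_id, timestamp_ns, event) rows; both the storage backend and the schema are assumptions:

```python
import csv
import sqlite3


def dump_all_to_csv(db_path: str, out_path: str) -> None:
    con = sqlite3.connect(db_path)
    try:
        # One query fetching both event types instead of two separate queries.
        rows = con.execute(
            "SELECT request_id, timestamp_ns, event FROM events "
            "WHERE event IN ('request_sent', 'request_completed') "
            "ORDER BY timestamp_ns"
        )
        # The standard csv module handles quoting/escaping instead of
        # hand-rolled string writes.
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
    finally:
        con.close()
```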

Comment thread src/inference_endpoint/metrics/reporter.py Outdated
Comment thread src/inference_endpoint/openai/openai_msgspec_adapter.py Outdated
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 4, 2025 21:47
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py Outdated
Comment thread src/inference_endpoint/load_generator/events.py Outdated
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv changed the title from "Add worker event recording" to "feat: Add worker event recording" on Dec 5, 2025
@anandhu-eng
Contributor

anandhu-eng commented Dec 7, 2025

Hi @arekay-nv, I was testing this out on my side and I'm getting the following messages for all 60 queries that I sent:

[Worker-0-715289] ERROR - Request 026809cfcfc1438897bde65d18aab4d9 failed with HTTP Error: {"error":{"message":"[{'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'role'), 'msg': \"Input should be 'developer'\", 'input': 'user', 'ctx': {'expected': \"'developer'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'role'), 'msg': \"Input should be 'system'\", 'input': 'user', 'ctx': {'expected': \"'system'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionUserMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'role'), 'msg': \"Input should be 'assistant'\", 'input': 'user', 'ctx': {'expected': \"'assistant'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'role'), 'msg': \"Input should be 'tool'\", 'input': 'user', 'ctx': {'expected': \"'tool'\"}}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'tool_call_id'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'role'), 'msg': \"Input should be 'function'\", 'input': 'user', 'ctx': {'expected': \"'function'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'CustomChatCompletionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'Message', 'author'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'list_type', 'loc': ('body', 'messages', 0, 'Message', 'content'), 'msg': 'Input should be a valid list', 'input': 'Write a short story about artificial intelligence (case 750)'}]","type":"Bad Request","param":null,"code":400}} (worker:339)
2025-12-07 14:46:24,772 - inference_endpoint.load_generator.sample - ERROR - Error in request 026809cfcfc1438897bde65d18aab4d9: HTTP 400: {"error":{"message":"[{'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'role'), 'msg': \"Input should be 'developer'\", 'input': 'user', 'ctx': {'expected': \"'developer'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionDeveloperMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'role'), 'msg': \"Input should be 'system'\", 'input': 'user', 'ctx': {'expected': \"'system'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionSystemMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionUserMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'role'), 'msg': \"Input should be 'assistant'\", 'input': 'user', 'ctx': {'expected': \"'assistant'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionAssistantMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'role'), 'msg': \"Input should be 'tool'\", 'input': 'user', 'ctx': {'expected': \"'tool'\"}}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'ChatCompletionToolMessageParam', 'tool_call_id'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 0, 'ChatCompletionFunctionMessageParam', 'role'), 'msg': \"Input should be 'function'\", 'input': 'user', 'ctx': {'expected': \"'function'\"}}, {'type': 'string_type', 'loc': ('body', 'messages', 0, 'CustomChatCompletionMessageParam', 'name'), 'msg': 'Input should be a valid string', 'input': None}, {'type': 'missing', 'loc': ('body', 'messages', 0, 'Message', 'author'), 'msg': 'Field required', 'input': {'role': 'user', 'content': 'Write a short story about artificial intelligence (case 750)', 'name': None}}, {'type': 'list_type', 'loc': ('body', 'messages', 0, 'Message', 'content'), 'msg': 'Input should be a valid list', 'input': 'Write a short story about artificial intelligence (case 750)'}]","type":"Bad Request","param":null,"code":400}}
....

Also, the summary seems to count every query as successful:

----------------- Summary -----------------
Total samples issued: 60
Total samples completed: 60
Duration: 4.82 seconds
QPS: 12.45
TPS: 0.00

But the logs at the end seem to be correct:

2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Completed in 5.1s
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Results: 0/60 successful
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Estimated QPS: 0.0
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - WARNING - Errors: 60
2025-12-07 14:46:29,312 - inference_endpoint.commands.benchmark - INFO - Cleaning up...
Qwen/Qwen2.5-1.5B (Streaming: True): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:08<00:00,  6.81it/s]
2025-12-07 14:46:29,313 - inference_endpoint.endpoint_client.http_client - INFO - Shutting down HTTP endpoint client...
[Worker-0-715289] INFO  - Worker 0 will cancel 0 tasks and cleanup. (worker:495)
2025-12-07 14:46:29,870 - inference_endpoint.endpoint_client.http_client - INFO - HTTP endpoint client shutdown complete.

The same run appears to be successful when run from the main branch:

(endpoints) anandhusooraj@mlc2:~/endp$ inference-endpoint benchmark from-config --output benchmark_results.json --config config-template.yaml 
2025-12-07 14:53:27,772 - inference_endpoint.main - INFO - Starting MLPerf Inference Endpoint Benchmarking System
2025-12-07 14:53:27,777 - inference_endpoint.config.yaml_loader - INFO - Loaded config: test-name (type: TestType.ONLINE)
2025-12-07 14:53:27,777 - inference_endpoint.commands.benchmark - INFO - Loading tokenizer for model: Qwen/Qwen2.5-1.5B
2025-12-07 14:53:28,173 - inference_endpoint.commands.benchmark - INFO - Tokenizer loaded successfully
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Inferred dataset format: pkl
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Loading: tests/datasets/dummy_1k.pkl (format: pkl)
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Streaming: enabled (auto, online mode)
2025-12-07 14:53:28,174 - inference_endpoint.commands.benchmark - INFO - Parser key maps: None
2025-12-07 14:53:28,174 - inference_endpoint.dataset_manager.factory - INFO - Creating pickle dataset loader for tests/datasets/dummy_1k.pkl
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Loaded 1000 samples
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Mode: TestMode.PERF, Target QPS: 10.0, Responses: False
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Min Duration: 600.0s, Expected samples: 60
2025-12-07 14:53:28,177 - inference_endpoint.commands.benchmark - INFO - Scheduler: PoissonDistributionScheduler (pattern: poisson)
Qwen/Qwen2.5-1.5B (Streaming: True):   0%|                                                                                                                                           | 0/60 [00:00<?, ?it/s]2025-12-07 14:53:28,179 - inference_endpoint.commands.benchmark - INFO - Connecting: http://localhost:1143
2025-12-07 14:53:28,179 - inference_endpoint.commands.benchmark - INFO - Client config: workers=1
2025-12-07 14:53:28,678 - inference_endpoint.endpoint_client.http_client - INFO - HTTP endpoint client using adapter: OpenAIMsgspecAdapter
2025-12-07 14:53:28,680 - inference_endpoint.endpoint_client.worker - INFO - Starting 1 worker processes
[Worker-0-719884] INFO  - Worker 0 started and ready (worker:184)
2025-12-07 14:53:31,926 - inference_endpoint.endpoint_client.worker - INFO - 1/1 workers ready
2025-12-07 14:53:31,927 - inference_endpoint.commands.benchmark - INFO - Running...
Qwen/Qwen2.5-1.5B (Streaming: True):  97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋    | 58/60 [00:15<00:00,  9.69it/s]2025-12-07 14:53:44,087 - inference_endpoint.load_generator.session - INFO - Waiting for the test to end... 0 samples remaining
----------------- Summary -----------------
Total samples issued: 60
Total samples completed: 60
Duration: 12.16 seconds
QPS: 4.94
TPS: 7040.61


------------------- Latency Breakdowns -------------------
TTFT:
  Min: 17.23 ms
  Max: 177.56 ms
  Median: 22.15 ms
  Avg.: 30.66 ms
  Std Dev.: 33.54 ms

  Histogram:
      [17.23, 33.27) |############################## 56
      [33.27, 49.30) | 0
      [49.30, 65.33) | 0
      [65.33, 81.37) | 0
      [81.37, 97.40) | 0
     [97.40, 113.43) | 0
    [113.43, 129.46) | 1
    [129.46, 145.50) | 1
    [145.50, 161.53) | 0
    [161.53, 177.56) |# 2

  Percentiles:
  99.9: 177.43 ms
    99: 176.25 ms
    95: 118.80 ms
    90: 28.03 ms
    80: 25.62 ms
    75: 25.00 ms
    50: 22.15 ms
    25: 19.20 ms
    10: 18.08 ms
     5: 17.78 ms
     1: 17.32 ms


TPOT (request_weighted):
  Min: 2.67 ms
  Max: 3.85 ms
  Median: 3.62 ms
  Avg.: 3.50 ms
  Std Dev.: 0.28 ms

  Histogram:
    [2.67, 2.79) |### 3
    [2.79, 2.91) | 0
    [2.91, 3.02) |## 2
    [3.02, 3.14) |##### 5
    [3.14, 3.26) |# 1
    [3.26, 3.38) |## 2
    [3.38, 3.50) |### 3
    [3.50, 3.61) |############## 13
    [3.61, 3.73) |############################# 26
    [3.73, 3.85) |##### 5

  Percentiles:
  99.9: 3.85 ms
    99: 3.84 ms
    95: 3.76 ms
    90: 3.70 ms
    80: 3.69 ms
    75: 3.68 ms
    50: 3.62 ms
    25: 3.45 ms
    10: 3.05 ms
     5: 2.97 ms
     1: 2.68 ms


Latency:
  Min: 51.01 ms
  Max: 7599.37 ms
  Median: 7289.60 ms
  Avg.: 5167.79 ms
  Std Dev.: 3086.23 ms

  Histogram:
       [51.01, 805.85) |######## 11
     [805.85, 1560.68) |### 5
    [1560.68, 2315.52) |## 3
    [2315.52, 3070.36) | 1
    [3070.36, 3825.19) | 0
    [3825.19, 4580.03) | 0
    [4580.03, 5334.86) |# 2
    [5334.86, 6089.70) | 0
    [6089.70, 6844.53) | 0
    [6844.53, 7599.37) |############################## 38

  Percentiles:
  99.9: 7599.05 ms
    99: 7596.14 ms
    95: 7586.57 ms
    90: 7579.93 ms
    80: 7540.86 ms
    75: 7526.00 ms
    50: 7289.60 ms
    25: 1235.69 ms
    10: 501.05 ms
     5: 149.97 ms
     1: 58.67 ms


Output sequence lengths:
  Min: 12.00 tokens
  Max: 2313.00 tokens
  Median: 2048.00 tokens
  Avg.: 1426.63 tokens
  Std Dev.: 849.29 tokens

  Histogram:
       [12.00, 242.10) |######## 11
      [242.10, 472.20) |##### 7
      [472.20, 702.30) | 1
      [702.30, 932.40) | 1
     [932.40, 1162.50) | 0
    [1162.50, 1392.60) |# 2
    [1392.60, 1622.70) | 0
    [1622.70, 1852.80) | 0
    [1852.80, 2082.90) |############################## 37
    [2082.90, 2313.00) | 1

  Percentiles:
  99.9: 2297.42 tokens
    99: 2157.24 tokens
    95: 2048.00 tokens
    90: 2048.00 tokens
    80: 2048.00 tokens
    75: 2048.00 tokens
    50: 2048.00 tokens
    25: 388.75 tokens
    10: 149.40 tokens
     5: 35.90 tokens
     1: 13.18 tokens


2025-12-07 14:53:44,759 - inference_endpoint.commands.benchmark - INFO - Completed in 12.8s
2025-12-07 14:53:44,759 - inference_endpoint.commands.benchmark - INFO - Results: 60/60 successful
2025-12-07 14:53:44,760 - inference_endpoint.commands.benchmark - INFO - Estimated QPS: 4.7
2025-12-07 14:53:44,760 - inference_endpoint.commands.benchmark - INFO - Cleaning up...
Qwen/Qwen2.5-1.5B (Streaming: True): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:16<00:00,  3.62it/s]
2025-12-07 14:53:44,760 - inference_endpoint.endpoint_client.http_client - INFO - Shutting down HTTP endpoint client...
[Worker-0-719884] INFO  - Worker 0 will cancel 0 tasks and cleanup. (worker:411)
2025-12-07 14:53:45,323 - inference_endpoint.endpoint_client.http_client - INFO - HTTP endpoint client shutdown complete.

The following is the content of my config file:

(endpoints) anandhusooraj@mlc2:~/endp$ cat config-template.yaml 
name: "test-name"
type: "online" # offline|online|eval|submission
benchmark_mode: "online" # Required for submission: offline or online

submission_ref:
  model: "Qwen/Qwen2.5-1.5B"
  ruleset: "mlperf-inference-v5.1"

model_params:
  temperature: 0.7
  max_new_tokens: 2048

datasets:
  - name: "perf"
    type: "performance"
    path: "tests/datasets/dummy_1k.pkl"

settings:
  runtime:
    #min_duration_ms: 600000 # 10 minutes
    n_samples_to_issue: 60 # Optional: explicit sample count (null = auto-calculate)
    scheduler_random_seed: 42 # For Poisson/distribution sampling
    dataloader_random_seed: 42 # For dataset shuffling
  load_pattern:
    type: "poisson"
    target_qps: 10.0
  client:
    workers: 1
    max_concurrency: 1 # -1 = unlimited

metrics:
  collect: ["throughput", "latency", "ttft", "tpot"]

endpoint_config:
  endpoint: "http://localhost:1143"
  api_key: null

The following is part of the csv file being created for event recording:

(endpoints) anandhusooraj@mlc2:~/endp$ cat worker_report_0_724955.csv 
3f481728850f49bc8be63a56c4f31f9c,12175397861290669,zmq_request_received
341c6a546cf046dfae9134239c07bbf4,12175397861419002,zmq_request_received
b58d37a7907e49eda7477776b135d99d,12175397861488536,zmq_request_received
3f481728850f49bc8be63a56c4f31f9c,12175397861608305,http_request_issued
341c6a546cf046dfae9134239c07bbf4,12175397862489672,http_request_issued
b58d37a7907e49eda7477776b135d99d,12175397862658449,http_request_issued
5dd7b7c8ab0a4a5eaf995d0865595e54,12175397866231802,zmq_request_received
5dd7b7c8ab0a4a5eaf995d0865595e54,12175397866345653,http_request_issued
b58d37a7907e49eda7477776b135d99d,12175397869016890,zmq_response_sent
341c6a546cf046dfae9134239c07bbf4,12175397869284830,zmq_response_sent
3f481728850f49bc8be63a56c4f31f9c,12175397870918875,zmq_response_sent
5dd7b7c8ab0a4a5eaf995d0865595e54,12175397872687208,zmq_response_sent
...
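
One way to make such a report useful is to pair events per request id and compute the deltas between them; a small sketch (the file name comes from the excerpt above, everything else is illustrative):

```python
import csv
from collections import defaultdict

# Group (timestamp_ns, event) pairs by request id.
events: dict[str, list[tuple[int, str]]] = defaultdict(list)
with open("worker_report_0_724955.csv", newline="") as f:
    for request_id, ts_ns, event in csv.reader(f):
        events[request_id].append((int(ts_ns), event))

# Report the spread between the first and last recorded event per request.
for request_id, recorded in events.items():
    recorded.sort()
    (first_ts, first_ev), (last_ts, last_ev) = recorded[0], recorded[-1]
    print(f"{request_id}: {first_ev} -> {last_ev} "
          f"in {(last_ts - first_ts) / 1e6:.3f} ms")
```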

Copilot AI review requested due to automatic review settings December 9, 2025 17:26
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread src/inference_endpoint/endpoint_client/worker.py Fixed
@arekay-nv arekay-nv requested a review from nv-alicheng December 9, 2025 19:14
Copilot AI review requested due to automatic review settings December 10, 2025 01:42
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.



Comment thread src/inference_endpoint/endpoint_client/worker.py
@arekay-nv
Collaborator Author

@anandhu-eng can you retry? I was able to run the config you have with vLLM.

@arekay-nv
Collaborator Author

> @anandhu-eng can you retry? I was able to run the config you have with vLLM.

I tested using vLLM and have noticed that some frameworks have different requirements for input formats. Let me know if you are using something different to serve the model and I can retry.
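
For what it's worth, one way this kind of strict-server rejection can be avoided (an assumption about a possible mitigation, not necessarily what this PR does) is to declare the msgspec structs with omit_defaults=True, so None-valued fields such as name are dropped from the encoded payload entirely:

```python
from typing import Optional

import msgspec


# Assumed mitigation sketch: omit_defaults=True drops fields still at their
# default (here, None) from the encoded JSON, so a strict server never sees
# "name": null in the message payload.
class ChatMessage(msgspec.Struct, omit_defaults=True):
    role: str
    content: Optional[str] = None
    name: Optional[str] = None


msg = ChatMessage(role="user", content="Write a short story")
# Prints {"role":"user","content":"Write a short story"} with no "name": null.
print(msgspec.json.encode(msg).decode())
```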

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv requested a review from viraatc December 10, 2025 18:25
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 10, 2025 18:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.



Comment thread src/inference_endpoint/endpoint_client/worker.py
@anandhu-eng
Contributor

> @anandhu-eng can you retry? I was able to run the config you have with vLLM.

Hi @arekay-nv, I was able to run it successfully this time. I am serving the model with vLLM using the following command:

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-1.5B --host 0.0.0.0 --port 1143

I noticed that the worker_report_** file is created in the directory from which the benchmark is launched. Would it be better to generate this file in the same folder where the benchmark report is stored?

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Comment thread src/inference_endpoint/commands/benchmark.py Outdated
Comment thread src/inference_endpoint/commands/benchmark.py
Comment thread src/inference_endpoint/commands/probe.py
Comment thread src/inference_endpoint/endpoint_client/worker.py
Comment thread tests/integration/commands/test_probe_command.py
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 12, 2025 15:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.



Comment thread src/inference_endpoint/endpoint_client/worker.py
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv merged commit 9a7cdbe into main Dec 12, 2025
4 checks passed
@arekay-nv arekay-nv deleted the arekay/add_worker_event_recording branch December 12, 2025 16:03
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 12, 2025