Conversation
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
Summary of ChangesHello @arekay-nv, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial debugging capability by enabling detailed event recording within worker processes. It allows for the capture of HTTP request lifecycle events, such as when a request is sent and completed, directly from the worker's perspective. This feature is configurable via a new Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Pull request overview
This PR adds worker-side event recording capabilities to track HTTP request lifecycle events (REQUEST_SENT and REQUEST_COMPLETED) for debugging purposes. This helps identify potential discrepancies between events scheduled by the main process and those executed by worker processes.
- Adds new SampleEvent types (REQUEST_SENT, REQUEST_COMPLETED) to track request lifecycle
- Implements optional HTTP logging via configuration flag that generates per-worker CSV reports
- Makes optional fields in OpenAI request structures to support flexible request construction
- Updates QPS calculation to use only successful requests
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/inference_endpoint/openai/openai_msgspec_adapter.py | Makes ChatMessage and ChatCompletionRequest fields optional with None defaults |
| src/inference_endpoint/metrics/reporter.py | Adds dump_all_to_csv method to export REQUEST_SENT/REQUEST_COMPLETED events |
| src/inference_endpoint/metrics/recorder.py | Adds timestamp_ns field to output buffer entries for event correlation |
| src/inference_endpoint/load_generator/events.py | Defines new REQUEST_SENT and REQUEST_COMPLETED event types |
| src/inference_endpoint/endpoint_client/worker.py | Implements worker-level event recording with optional CSV report generation |
| src/inference_endpoint/endpoint_client/configs.py | Adds http_logging configuration flag to HTTPClientConfig |
| src/inference_endpoint/config/schema.py | Adds http_logging setting to ClientSettings schema |
| src/inference_endpoint/commands/benchmark.py | Updates QPS calculation to use success_count and passes http_logging config to HTTPClientConfig |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature for recording worker events, which will greatly aid in debugging. The implementation is well-structured, and the configuration options are clear. I've identified a couple of areas for improvement. The primary suggestion is to optimize the CSV dumping logic in reporter.py by consolidating two SQL queries into one and using the standard csv module for more robust file writing. Additionally, there's a minor cleanup of a commented-out debug statement. The other changes, including the fix for QPS calculation and making request fields optional, are solid improvements.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
|
Hi @arekay-nv , I was testing it out from my side and I'm getting the following messages for the entire 60 queries that I have sent: Also, the summary seems to be considering every query as successful: But the logs at the end seems to be correct: The same run is appearing to be successful when ran from the main branch: The following is the content of my config file: The following is part of the css file being created for event recording: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@anandhu-eng can you retry. I was able to run the config you have with vllm. |
I tested using vllm - and have noticed that some frameworks have different requirements for input formats. Let me know if you are using something different to server the model and i can retry. |
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Hi @arekay-nv , I was able to run it successfully this time. I am serving the model with vLLM using the following command: I noticed that the |
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What does this PR do?
Adds support for event recording from the worker process. This helps to debug possible inconsistency in the request being sent from the main process (which schedules events globally) and those from the worker process (which sends the actual http request).
The http logging can be enabled via a config in the yaml for now:
This will be supported by the CLI later as well.
The report generated is under the
output_dirpath specified or the defaultresult_timestampfolder with the nameworker_report_{worker_id}_{pid}.csvand will have request sent/complete events from each workers with the timestamps:Type of change
Related issues
Testing
Checklist