feat: Support llm-d manual runs#16
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new CLI Python script to parse guidellm 0.5.x+ benchmark JSON outputs, extract per-benchmark metrics and metadata, and append or create a consolidated CSV, supporting two CLI modes with different required parameters. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
manual_runs/scripts/import_manual_runs_json_v2.py (2)
28-52: Docstring missing documentation for new parameters.The new LLM-D parameters (
dp,ep,replicas,prefill_pod_count,decode_pod_count,router_config) are not documented in the Args section of the docstring.📝 Suggested docstring addition
Args: benchmark: Benchmark data from JSON (guidellm 0.5.x format). accelerator: Accelerator type (e.g., H200, MI300X). model_name: Name of the AI model. version: Version of the inference server. tp_size: Tensor parallelism size. runtime_args: Runtime configuration arguments. global_data_config: Global data configuration from top-level args. image_tag: Container image tag used for the run. guidellm_version: Version of guidellm used to run the benchmark. guidellm_start_time_ms: Aggregated start time in milliseconds. guidellm_end_time_ms: Aggregated end time in milliseconds. + dp: Data parallelism size (LLM-D mode). + ep: Expert parallelism size (LLM-D mode). + replicas: Number of replicas (LLM-D mode). + prefill_pod_count: Number of prefill pods (LLM-D mode). + decode_pod_count: Number of decode pods (LLM-D mode). + router_config: Router/endpoint picker configuration (LLM-D mode).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 28 - 52, Add docstring entries for the new LLM-D parameters (dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config) in the function's Args section: describe each parameter's expected type and purpose (e.g., dp: int or None — data parallel size; ep: int or None — expert/experts or expert parallelism; replicas: int or None — number of model replicas; prefill_pod_count: int or None — number of pods used for prefill stage; decode_pod_count: int or None — number of pods used for decode stage; router_config: dict or None — routing configuration for request distribution), mark optional parameters as None default, and keep formatting consistent with the existing docstring style used in this function.
199-212: Docstring missing documentation for new parameters.Similar to
process_benchmark_section, the new LLM-D parameters should be documented in the Args section.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 199 - 212, The docstring for the function that starts with "Parse guidellm 0.5.x JSON benchmark results." is missing entries for the new LLM-D parameters; update its Args section to document each new parameter exactly as done in process_benchmark_section (include parameter name, type, and brief description for LLM-D specific fields such as any tokenizer/sequence/prompt config, temperature/beam/candidate settings, or other runtime flags added), ensuring names match the function signature and runtime_args/guidellm_version/other existing params are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 426-476: The LLM-D branch (when args.llm_d is true) is missing the
guidellm_start_time_ms and guidellm_end_time_ms columns, so rows produced by
parse_guidellm_json (which computes and adds guidellm_start_time_ms and
guidellm_end_time_ms) are truncated when written; update the fieldnames list
inside the args.llm_d block to include guidellm_start_time_ms and
guidellm_end_time_ms (matching the standard-mode fieldnames) so those timing
fields are preserved in the CSV output.
---
Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 28-52: Add docstring entries for the new LLM-D parameters (dp, ep,
replicas, prefill_pod_count, decode_pod_count, router_config) in the function's
Args section: describe each parameter's expected type and purpose (e.g., dp: int
or None — data parallel size; ep: int or None — expert/experts or expert
parallelism; replicas: int or None — number of model replicas;
prefill_pod_count: int or None — number of pods used for prefill stage;
decode_pod_count: int or None — number of pods used for decode stage;
router_config: dict or None — routing configuration for request distribution),
mark optional parameters as None default, and keep formatting consistent with
the existing docstring style used in this function.
- Around line 199-212: The docstring for the function that starts with "Parse
guidellm 0.5.x JSON benchmark results." is missing entries for the new LLM-D
parameters; update its Args section to document each new parameter exactly as
done in process_benchmark_section (include parameter name, type, and brief
description for LLM-D specific fields such as any tokenizer/sequence/prompt
config, temperature/beam/candidate settings, or other runtime flags added),
ensuring names match the function signature and
runtime_args/guidellm_version/other existing params are preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8c34a60d-9fcd-4e64-96af-b02dfcabf30a
📒 Files selected for processing (1)
manual_runs/scripts/import_manual_runs_json_v2.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
manual_runs/scripts/import_manual_runs_json_v2.py (1)
413-529:⚠️ Potential issue | 🟡 MinorData loss risk when appending CSV files with different modes.
The script supports two CSV modes: standard (default) and
--llm-d. When appending to an existing CSV file (line 414), if the existing file was created with a different mode, thecombined_df[fieldnames]filtering at line 529 will silently drop all columns not in the current mode's fieldnames list.For example: Running in standard mode on an LLM-D CSV will drop DP, EP, replicas, prefill_pod_count, decode_pod_count, and router_config columns from all rows.
Note: The
--llm-dflag is not documented in the README and requires explicit parameters (--dp,--ep,--replicas,--router-config), reducing the likelihood of accidental mode mixing. However, the vulnerability exists and should be mitigated to prevent silent data loss.Consider adding a schema mismatch check when appending:
🛡️ Proposed fix: Add schema mismatch detection
if os.path.exists(args.csv_file): print(f"Appending {len(new_data_df)} new rows to {args.csv_file}...") existing_df = pd.read_csv(args.csv_file) + # Detect schema mismatch + llm_d_columns = {"DP", "EP", "replicas", "prefill_pod_count", "decode_pod_count", "router_config"} + existing_has_llm_d = bool(llm_d_columns & set(existing_df.columns)) + if existing_has_llm_d != args.llm_d: + print(f"Warning: Existing CSV {'has' if existing_has_llm_d else 'lacks'} LLM-D columns, " + f"but current mode is {'LLM-D' if args.llm_d else 'standard'}. " + f"Some columns may be dropped or added with null values.") combined_df = pd.concat([existing_df, new_data_df], ignore_index=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 413 - 529, Appending to an existing CSV ignores schema differences between modes and then drops columns by reindexing to the current mode's fieldnames; detect and prevent this by comparing the existing_df.columns set to the current mode fieldnames (use args.llm_d to choose the expected list), and if there is a mismatch raise/print a clear error or merge-safe warning instead of blindly doing combined_df = combined_df[fieldnames]; update the logic around existing_df/combined_df and the fieldnames construction (refer to variables fieldnames, existing_df, combined_df, and args.llm_d) to either (a) preserve all existing columns by unioning fieldnames with existing_df.columns before reindexing, or (b) abort with a schema-mismatch message that explains which columns differ and suggests running with the matching --llm-d flag.
🧹 Nitpick comments (1)
manual_runs/scripts/import_manual_runs_json_v2.py (1)
28-51: Docstrings lack documentation for new parameters.The new LLM-D parameters (
dp,ep,replicas,prefill_pod_count,decode_pod_count,router_config) are not documented in the function docstrings. Consider adding Args entries for completeness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 28 - 51, The docstring for the function that processes a single benchmark section is missing entries for the new LLM-D parameters (dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config); update the Args block in that function's docstring to add a short description and expected type for each of these parameters (e.g., dp: data parallelism size (int), ep: expert parallelism (int), replicas: number of model replicas (int), prefill_pod_count: pods used for prefill (int), decode_pod_count: pods used for decode (int), router_config: routing configuration dict/str) so the docstring remains complete and consistent with other parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 413-529: Appending to an existing CSV ignores schema differences
between modes and then drops columns by reindexing to the current mode's
fieldnames; detect and prevent this by comparing the existing_df.columns set to
the current mode fieldnames (use args.llm_d to choose the expected list), and if
there is a mismatch raise/print a clear error or merge-safe warning instead of
blindly doing combined_df = combined_df[fieldnames]; update the logic around
existing_df/combined_df and the fieldnames construction (refer to variables
fieldnames, existing_df, combined_df, and args.llm_d) to either (a) preserve all
existing columns by unioning fieldnames with existing_df.columns before
reindexing, or (b) abort with a schema-mismatch message that explains which
columns differ and suggests running with the matching --llm-d flag.
---
Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 28-51: The docstring for the function that processes a single
benchmark section is missing entries for the new LLM-D parameters (dp, ep,
replicas, prefill_pod_count, decode_pod_count, router_config); update the Args
block in that function's docstring to add a short description and expected type
for each of these parameters (e.g., dp: data parallelism size (int), ep: expert
parallelism (int), replicas: number of model replicas (int), prefill_pod_count:
pods used for prefill (int), decode_pod_count: pods used for decode (int),
router_config: routing configuration dict/str) so the docstring remains complete
and consistent with other parameters.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bb9730ea-0526-4f4f-a6f9-c31219aef13d
📒 Files selected for processing (1)
manual_runs/scripts/import_manual_runs_json_v2.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
manual_runs/scripts/import_manual_runs_json_v2.py (1)
28-52: Docstring missing documentation for new parameters.The new parameters (
dp,ep,replicas,prefill_pod_count,decode_pod_count,router_config,notes) are not documented in the function docstring. Consider adding them for completeness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 28 - 52, Update the function docstring for process_benchmark_section to document the newly added parameters dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config, and notes; for each parameter add a short one-line description and expected type (e.g., int, dict, str, or None) and include any semantic meaning (e.g., dp/ep are data/engine parallel sizes, replicas is number of server replicas, prefill_pod_count/decode_pod_count are pod counts for prefill/decode stages, router_config is routing settings, notes is freeform metadata). Ensure these entries follow the existing Args style and placement with the other parameters (accelerator, model_name, etc.) in the same docstring block for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 273-279: The call to process_benchmark_section is missing the
notes argument so the CLI --notes value parsed in parse_guidellm_json is never
forwarded; update the call that currently passes dp, ep, replicas,
prefill_pod_count, decode_pod_count, router_config (the invocation inside
parse_guidellm_json) to include notes=notes so the notes parameter is propagated
into process_benchmark_section and stored on each row.
---
Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 28-52: Update the function docstring for process_benchmark_section
to document the newly added parameters dp, ep, replicas, prefill_pod_count,
decode_pod_count, router_config, and notes; for each parameter add a short
one-line description and expected type (e.g., int, dict, str, or None) and
include any semantic meaning (e.g., dp/ep are data/engine parallel sizes,
replicas is number of server replicas, prefill_pod_count/decode_pod_count are
pod counts for prefill/decode stages, router_config is routing settings, notes
is freeform metadata). Ensure these entries follow the existing Args style and
placement with the other parameters (accelerator, model_name, etc.) in the same
docstring block for consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b23b331d-2b0f-41d7-940e-fbb39bf04705
📒 Files selected for processing (1)
manual_runs/scripts/import_manual_runs_json_v2.py
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (5)
manual_runs/scripts/import_manual_runs_llm_d.py (5)
283-285: Inconsistent fallback type: empty string vs None.The timestamp fallback uses empty string
""while other optional fields useNone. This inconsistency may cause type issues downstream (e.g., when computing duration or filtering).♻️ Proposed fix for consistent None fallback
# Get min start_time and max end_time, convert to milliseconds - guidellm_start_time_ms = int(min(start_times) * 1000) if start_times else "" - guidellm_end_time_ms = int(max(end_times) * 1000) if end_times else "" + guidellm_start_time_ms = int(min(start_times) * 1000) if start_times else None + guidellm_end_time_ms = int(max(end_times) * 1000) if end_times else None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 283 - 285, The timestamp fallbacks for guidellm_start_time_ms and guidellm_end_time_ms use an empty string which is inconsistent with other optionals; change them to use None and only compute the int conversion when start_times/end_times are present (e.g., set guidellm_start_time_ms = int(min(start_times) * 1000) if start_times else None and similarly for guidellm_end_time_ms) so downstream code sees consistent None for absent timestamps; update any subsequent code that assumes a string to handle None if necessary.
71-72: Simplify truthiness check.The check
global_data_config and len(global_data_config) > 0is redundant since an empty list is falsy in Python.♻️ Proposed simplification
- if global_data_config and len(global_data_config) > 0: + if global_data_config:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 71 - 72, The condition "if global_data_config and len(global_data_config) > 0" is redundant; simplify it to a single truthiness check on the variable: replace that compound check with "if global_data_config" wherever used (in this try block checking global_data_config) to rely on Python's falsy empty-sequence behavior and keep logic identical.
464-564: Consider extracting common fieldnames to reduce duplication.The two
fieldnameslists share ~40 identical entries. Extracting the common base and extending with mode-specific columns would improve maintainability.♻️ Proposed refactor to reduce duplication
base_fieldnames = [ "run", "accelerator", "model", "version", "prompt toks", "output toks", "TP", ] llm_d_specific = [ "DP", "EP", "replicas", "prefill_pod_count", "decode_pod_count", "router_config", ] common_metrics = [ "measured concurrency", "intended concurrency", "measured rps", # ... remaining common fields ... "guidellm_version", "notes", ] if args.llm_d: fieldnames = base_fieldnames + llm_d_specific + common_metrics else: fieldnames = base_fieldnames + common_metrics🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 464 - 564, The two large lists assigned to fieldnames in the args.llm_d branch and the else branch duplicate most entries; extract the shared entries into a single base list (e.g., base_fieldnames or common_metrics) and create a small llm_d_specific list for the DP/EP/replicas/prefill_pod_count/decode_pod_count/router_config/notes entries, then set fieldnames = base_fieldnames + llm_d_specific when args.llm_d is true and fieldnames = base_fieldnames otherwise; update any references to the original fieldnames variable but keep the args.llm_d conditional to select the extended list.
227-229: Specify explicit encoding when opening the JSON file.Using
open()without an explicit encoding relies on the platform default, which may cause issues with non-ASCII characters on some systems.♻️ Proposed fix
- with open(json_path) as f: + with open(json_path, encoding="utf-8") as f: data = json.load(f)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 227 - 229, The code that reads the JSON file (the try block using with open(json_path) as f: followed by data = json.load(f)) should specify an explicit encoding to avoid platform-dependent defaults; update the open call in the import_manual_runs_llm_d.py reader to open(json_path, encoding='utf-8') so json.load reads using UTF-8 and handles non-ASCII characters reliably.
36-53: Incomplete docstring - missing parameter documentation.The docstring is missing documentation for the llm-d specific parameters:
dp,ep,replicas,prefill_pod_count,decode_pod_count,router_config, andnotes.📝 Proposed docstring additions
guidellm_start_time_ms: Aggregated start time in milliseconds. guidellm_end_time_ms: Aggregated end time in milliseconds. + dp: Data parallelism size (llm-d mode). + ep: Expert parallelism size (llm-d mode). + replicas: Number of replicas (llm-d mode). + prefill_pod_count: Number of prefill pods (llm-d mode). + decode_pod_count: Number of decode pods (llm-d mode). + router_config: Router/endpoint picker configuration (llm-d mode). + notes: Free-form notes for the run (llm-d mode). Returns: dict: Processed benchmark metrics.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 36 - 53, The function docstring that describes "Process a single benchmark section and extract performance metrics" lacks descriptions for llm-d specific params; update that docstring in manual_runs/scripts/import_manual_runs_llm_d.py to add parameter entries for dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config, and notes, each with a one-line explanation (e.g., dp/ep: data/embedding parallelism sizes; replicas: number of model replicas; prefill_pod_count/decode_pod_count: pods used for prefill/decode stages; router_config: routing settings structure; notes: freeform metadata), and ensure these names appear in the Args section alongside existing params like accelerator, model_name, version, tp_size, runtime_args, global_data_config, image_tag, guidellm_version, guidellm_start_time_ms, guidellm_end_time_ms so the docstring fully documents all function inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 518-564: The non-llm-d CSV header in the else branch is missing
the "notes" column causing process_benchmark_section (which always sets "notes")
to drop that value; update the fieldnames list in the else branch to include
"notes" (matching the key used in process_benchmark_section) so rows written
when --llm-d is not used preserve the notes column.
- Around line 566-571: The current code forcibly restricts combined_df to only
the current mode's fieldnames which will drop any columns already present in the
existing CSV; instead compute the union of the existing CSV's columns and the
current mode's fieldnames (e.g., merge existing_csv.columns ∪ fieldnames),
ensure missing columns on either side are created with None, and write the
dataframe with that combined column set so no existing columns are lost when
saving to args.csv_file; alternatively emit a clear warning when schema
mismatches are detected before overwriting.
- Around line 237-240: The code unconditionally assigns guidellm_version =
metadata.get("guidellm_version", "unknown") which overwrites the function
parameter and ignores the CLI flag; change this to prefer the metadata value
only if present (e.g., value = metadata.get("guidellm_version"); if value:
guidellm_version = value) or else leave the incoming guidellm_version parameter
intact so the --guidellm-version fallback works; update the print to use the
resulting guidellm_version variable.
- Around line 16-35: Add unit tests for the new import_manual_runs_llm_d.py
script: write tests that exercise process_benchmark_section (and any helper
functions it calls) to validate the new function signature and behavior with
required args (benchmark, accelerator, model_name, version, tp_size,
runtime_args, global_data_config, image_tag, guidellm_version,
guidellm_start_time_ms, guidellm_end_time_ms) and optional args (dp, ep,
replicas, prefill_pod_count, decode_pod_count, router_config, notes). Create
parametrized test cases that cover normal, minimal, and edge inputs, assert the
returned/produced structures or side effects, and mock/stub any external
dependencies (I/O, DB, network) used by process_benchmark_section to keep tests
deterministic; add these tests alongside existing import tests (or in a new test
file) and run lint/tests to ensure the new coverage passes.
---
Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 283-285: The timestamp fallbacks for guidellm_start_time_ms and
guidellm_end_time_ms use an empty string which is inconsistent with other
optionals; change them to use None and only compute the int conversion when
start_times/end_times are present (e.g., set guidellm_start_time_ms =
int(min(start_times) * 1000) if start_times else None and similarly for
guidellm_end_time_ms) so downstream code sees consistent None for absent
timestamps; update any subsequent code that assumes a string to handle None if
necessary.
- Around line 71-72: The condition "if global_data_config and
len(global_data_config) > 0" is redundant; simplify it to a single truthiness
check on the variable: replace that compound check with "if global_data_config"
wherever used (in this try block checking global_data_config) to rely on
Python's falsy empty-sequence behavior and keep logic identical.
- Around line 464-564: The two large lists assigned to fieldnames in the
args.llm_d branch and the else branch duplicate most entries; extract the shared
entries into a single base list (e.g., base_fieldnames or common_metrics) and
create a small llm_d_specific list for the
DP/EP/replicas/prefill_pod_count/decode_pod_count/router_config/notes entries,
then set fieldnames = base_fieldnames + llm_d_specific when args.llm_d is true
and fieldnames = base_fieldnames otherwise; update any references to the
original fieldnames variable but keep the args.llm_d conditional to select the
extended list.
- Around line 227-229: The code that reads the JSON file (the try block using
with open(json_path) as f: followed by data = json.load(f)) should specify an
explicit encoding to avoid platform-dependent defaults; update the open call in
the import_manual_runs_llm_d.py reader to open(json_path, encoding='utf-8') so
json.load reads using UTF-8 and handles non-ASCII characters reliably.
- Around line 36-53: The function docstring that describes "Process a single
benchmark section and extract performance metrics" lacks descriptions for llm-d
specific params; update that docstring in
manual_runs/scripts/import_manual_runs_llm_d.py to add parameter entries for dp,
ep, replicas, prefill_pod_count, decode_pod_count, router_config, and notes,
each with a one-line explanation (e.g., dp/ep: data/embedding parallelism sizes;
replicas: number of model replicas; prefill_pod_count/decode_pod_count: pods
used for prefill/decode stages; router_config: routing settings structure;
notes: freeform metadata), and ensure these names appear in the Args section
alongside existing params like accelerator, model_name, version, tp_size,
runtime_args, global_data_config, image_tag, guidellm_version,
guidellm_start_time_ms, guidellm_end_time_ms so the docstring fully documents
all function inputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fe54a0f9-96b6-4e57-8f4f-6209673f137a
📒 Files selected for processing (1)
manual_runs/scripts/import_manual_runs_llm_d.py
|
Please run make to fix the lint issues. you can run make ci-local, make format, make .... to view and fix the linting issues. |
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
5f5afba to
cb706e7
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
manual_runs/scripts/import_manual_runs_llm_d.py (3)
237-240:⚠️ Potential issue | 🟠 MajorBug:
guidellm_versionparameter is still overwritten, ignoring CLI input.Line 239 unconditionally overwrites the
guidellm_versionparameter with the value from JSON metadata, making the--guidellm-versionCLI flag ineffective.🐛 Proposed fix to preserve CLI fallback
# Check guidellm version metadata = data.get("metadata", {}) - guidellm_version = metadata.get("guidellm_version", "unknown") + detected_version = metadata.get("guidellm_version") + if detected_version: + guidellm_version = detected_version + elif guidellm_version is None: + guidellm_version = "unknown" print(f"Detected guidellm version: {guidellm_version}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 237 - 240, The code unconditionally replaces the existing guidellm_version with metadata.get(...), which ignores the CLI-provided --guidellm-version; change the assignment so you only read the metadata value when the current guidellm_version is not set or is the default placeholder (e.g., None, empty string, or "unknown"). Locate the guidellm_version variable near the metadata = data.get("metadata", {}) line and update the logic to preserve an existing CLI value (guidellm_version) and only fallback to metadata.get("guidellm_version") when that CLI value is absent/placeholder.
566-571:⚠️ Potential issue | 🟠 MajorRisk of data loss when appending to CSV with a different column schema.
Line 570 filters
combined_dfto only include columns in the current mode'sfieldnames. This silently drops any columns present in the existing CSV but absent fromfieldnames(e.g., appending non-llm-d rows to an llm-d CSV loses DP, EP, replicas columns).Consider preserving existing columns or warning the user about schema mismatches.
🛡️ Proposed fix to preserve existing columns
for col in fieldnames: if col not in combined_df.columns: combined_df[col] = None - combined_df = combined_df[fieldnames] + # Reorder: put expected fieldnames first, then preserve any extra columns from existing CSV + extra_cols = [c for c in combined_df.columns if c not in fieldnames] + if extra_cols: + print(f"Warning: Preserving {len(extra_cols)} extra columns from existing CSV: {extra_cols}") + combined_df = combined_df[fieldnames + extra_cols] combined_df.to_csv(args.csv_file, index=False)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 566 - 571, The current code forces combined_df = combined_df[fieldnames], which drops any existing CSV columns that aren't in fieldnames and can cause silent data loss; update the logic around combined_df and fieldnames to preserve existing columns or explicitly warn: compute the union of the current mode's fieldnames and any columns already present in combined_df (or the target CSV) and reindex combined_df to that union so no columns are dropped, and if there are columns present in the CSV but not in fieldnames emit a clear warning mentioning args.csv_file, combined_df and fieldnames so the user can decide to trim columns intentionally.
518-564:⚠️ Potential issue | 🟡 MinorNon-llm-d fieldnames list is missing the
"notes"column.The
process_benchmark_sectionfunction always includes"notes"in the row dictionary (line 189), but the non-llm-dfieldnameslist omits it. This causes thenotescolumn to be silently dropped when running without--llm-d, even if the user provided--notes.🐛 Proposed fix
"image_tag", "guidellm_version", + "notes", ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 518 - 564, The non-llm-d branch's fieldnames list is missing the "notes" column while process_benchmark_section always sets a "notes" key (so notes get dropped when --llm-d is not used); update the non-llm-d fieldnames list (the fieldnames variable in the else branch) to include "notes" so the CSV output preserves user-provided --notes values and matches the keys produced by process_benchmark_section.
🧹 Nitpick comments (1)
manual_runs/scripts/import_manual_runs_llm_d.py (1)
227-235: Add explicit encoding when opening JSON file.Opening files without specifying encoding relies on platform defaults, which can cause
UnicodeDecodeErroron systems with non-UTF-8 defaults when the JSON contains non-ASCII characters.♻️ Proposed fix
- with open(json_path) as f: + with open(json_path, encoding="utf-8") as f: data = json.load(f)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 227 - 235, The JSON file is opened without an explicit encoding which can cause UnicodeDecodeError on non-UTF-8 platforms; update the open call in the try block that reads json_path (inside the import_manual_runs_llm_d.py JSON load block) to use encoding='utf-8' and also add a UnicodeDecodeError except handler (similar to FileNotFoundError/json.JSONDecodeError) to surface a clear error message if decoding fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 283-286: The variables guidellm_start_time_ms and
guidellm_end_time_ms mix ints and empty-string fallbacks which creates a type
inconsistency; update the assignment so when start_times or end_times is empty
you set the variables to None (not "") and only call int(min(...)*1000) /
int(max(...)*1000) when the respective list is present — locate and change the
two assignments for guidellm_start_time_ms and guidellm_end_time_ms accordingly
to ensure they are either ints or None.
- Line 65: The current assignment to intended_concurrency uses a truthy-or
fallback which treats a valid zero (streams=0) as missing; change the logic to
explicitly check for presence/None instead: read strategy.get("streams") into a
local (e.g., streams) and set intended_concurrency = streams if streams is not
None else strategy.get("max_concurrency", 0) (or use an explicit if 'streams' in
strategy ... else ...), referencing the intended_concurrency variable and
strategy dict to locate the fix.
- Line 154: The CSV key "measured rps" is wrong — change it to "measured_rps"
wherever it's emitted (the dict entry that currently uses "measured rps" in
import_manual_runs_llm_d.py) and also update the two fieldnames lists named
fieldnames (the lists that enumerate CSV columns later in the file) so they
include "measured_rps" instead of "measured rps"; keep the exact underscore
spelling so llmd_dashboard.py's numeric_cols will match and the numeric
conversion runs.
---
Duplicate comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 237-240: The code unconditionally replaces the existing
guidellm_version with metadata.get(...), which ignores the CLI-provided
--guidellm-version; change the assignment so you only read the metadata value
when the current guidellm_version is not set or is the default placeholder
(e.g., None, empty string, or "unknown"). Locate the guidellm_version variable
near the metadata = data.get("metadata", {}) line and update the logic to
preserve an existing CLI value (guidellm_version) and only fallback to
metadata.get("guidellm_version") when that CLI value is absent/placeholder.
- Around line 566-571: The current code forces combined_df =
combined_df[fieldnames], which drops any existing CSV columns that aren't in
fieldnames and can cause silent data loss; update the logic around combined_df
and fieldnames to preserve existing columns or explicitly warn: compute the
union of the current mode's fieldnames and any columns already present in
combined_df (or the target CSV) and reindex combined_df to that union so no
columns are dropped, and if there are columns present in the CSV but not in
fieldnames emit a clear warning mentioning args.csv_file, combined_df and
fieldnames so the user can decide to trim columns intentionally.
- Around line 518-564: The non-llm-d branch's fieldnames list is missing the
"notes" column while process_benchmark_section always sets a "notes" key (so
notes get dropped when --llm-d is not used); update the non-llm-d fieldnames
list (the fieldnames variable in the else branch) to include "notes" so the CSV
output preserves user-provided --notes values and matches the keys produced by
process_benchmark_section.
---
Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 227-235: The JSON file is opened without an explicit encoding
which can cause UnicodeDecodeError on non-UTF-8 platforms; update the open call
in the try block that reads json_path (inside the import_manual_runs_llm_d.py
JSON load block) to use encoding='utf-8' and also add a UnicodeDecodeError
except handler (similar to FileNotFoundError/json.JSONDecodeError) to surface a
clear error message if decoding fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 82c86ccb-02fb-4fc7-b9da-2c0785e1205a
📒 Files selected for processing (1)
manual_runs/scripts/import_manual_runs_llm_d.py
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* feat: Support llm-d manual runs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * fix: Linter and missing fields * feat: Add notes flag for llm-d runs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add llm-d processing script Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * chore: Fix format issues Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * fix: Linter Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* feat: Support llm-d manual runs * fix: Linter and missing fields * feat: Add notes flag for llm-d runs * feat: Add llm-d processing script * chore: Fix format issues * fix: Linter --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Co-authored-by: alberto <aperdomo@redhat.com>
Add support for llm-d manual runs with the necessary new fields.
Summary by CodeRabbit