feat: Support llm-d manual runs by albertoperdomo2 · Pull Request #16 · openshift-psap/performance-dashboard

albertoperdomo2 · 2026-03-11T08:11:32Z

Add support for llm-d manual runs with the necessary new fields.

Summary by CodeRabbit

New Features
- Import and convert guidellm 0.5.x+ benchmark JSON outputs into a consolidated CSV.
- CLI with two operational modes to include optional deployment metadata when present.
- Aggregates and exports key metrics (throughput, token counts, latency percentiles, request success/error counts, concurrency) and runtime/prompts metadata into a unified schema.

albertoperdomo2 · 2026-03-11T08:11:47Z

cc: @Harshith-umesh @MML-coder

coderabbitai · 2026-03-11T08:11:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new CLI Python script to parse guidellm 0.5.x+ benchmark JSON outputs, extract per-benchmark metrics and metadata, and append or create a consolidated CSV, supporting two CLI modes with different required parameters.

Changes

Cohort / File(s)	Summary
New Benchmark Import Script `manual_runs/scripts/import_manual_runs_llm_d.py`	Adds a ~580-line CLI utility that loads and validates guidellm JSON (`benchmarks` list, `metadata`), aggregates run-level start/end times, extracts per-benchmark metrics (concurrency, throughput/token rates, token counts, latency percentiles, request counts, run UUID), parses prompt/output token settings, supports default and `--llm-d` modes (additional deployment metadata), normalizes CSV columns, and appends or creates the output CSV.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibbled JSON bits by night,
Turned benchmarks into rows so bright,
Streams and tokens, latencies too,
CSV carrots for the crew,
Hop—data's tidy, ready for flight!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: Support llm-d manual runs' directly aligns with the main change: adding a new Python CLI script to import and process llm-d manual runs benchmark data into CSV format.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

manual_runs/scripts/import_manual_runs_json_v2.py (2)

28-52: Docstring missing documentation for new parameters.

The new LLM-D parameters (dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config) are not documented in the Args section of the docstring.

📝 Suggested docstring addition

     Args:
         benchmark: Benchmark data from JSON (guidellm 0.5.x format).
         accelerator: Accelerator type (e.g., H200, MI300X).
         model_name: Name of the AI model.
         version: Version of the inference server.
         tp_size: Tensor parallelism size.
         runtime_args: Runtime configuration arguments.
         global_data_config: Global data configuration from top-level args.
         image_tag: Container image tag used for the run.
         guidellm_version: Version of guidellm used to run the benchmark.
         guidellm_start_time_ms: Aggregated start time in milliseconds.
         guidellm_end_time_ms: Aggregated end time in milliseconds.
+        dp: Data parallelism size (LLM-D mode).
+        ep: Expert parallelism size (LLM-D mode).
+        replicas: Number of replicas (LLM-D mode).
+        prefill_pod_count: Number of prefill pods (LLM-D mode).
+        decode_pod_count: Number of decode pods (LLM-D mode).
+        router_config: Router/endpoint picker configuration (LLM-D mode).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 28 - 52, Add
docstring entries for the new LLM-D parameters (dp, ep, replicas,
prefill_pod_count, decode_pod_count, router_config) in the function's Args
section: describe each parameter's expected type and purpose (e.g., dp: int or
None — data parallel size; ep: int or None — expert/experts or expert
parallelism; replicas: int or None — number of model replicas;
prefill_pod_count: int or None — number of pods used for prefill stage;
decode_pod_count: int or None — number of pods used for decode stage;
router_config: dict or None — routing configuration for request distribution),
mark optional parameters as None default, and keep formatting consistent with
the existing docstring style used in this function.

199-212: Docstring missing documentation for new parameters.

Similar to process_benchmark_section, the new LLM-D parameters should be documented in the Args section.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 199 - 212,
The docstring for the function that starts with "Parse guidellm 0.5.x JSON
benchmark results." is missing entries for the new LLM-D parameters; update its
Args section to document each new parameter exactly as done in
process_benchmark_section (include parameter name, type, and brief description
for LLM-D specific fields such as any tokenizer/sequence/prompt config,
temperature/beam/candidate settings, or other runtime flags added), ensuring
names match the function signature and runtime_args/guidellm_version/other
existing params are preserved.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 426-476: The LLM-D branch (when args.llm_d is true) is missing the
guidellm_start_time_ms and guidellm_end_time_ms columns, so rows produced by
parse_guidellm_json (which computes and adds guidellm_start_time_ms and
guidellm_end_time_ms) are truncated when written; update the fieldnames list
inside the args.llm_d block to include guidellm_start_time_ms and
guidellm_end_time_ms (matching the standard-mode fieldnames) so those timing
fields are preserved in the CSV output.

---

Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 28-52: Add docstring entries for the new LLM-D parameters (dp, ep,
replicas, prefill_pod_count, decode_pod_count, router_config) in the function's
Args section: describe each parameter's expected type and purpose (e.g., dp: int
or None — data parallel size; ep: int or None — expert/experts or expert
parallelism; replicas: int or None — number of model replicas;
prefill_pod_count: int or None — number of pods used for prefill stage;
decode_pod_count: int or None — number of pods used for decode stage;
router_config: dict or None — routing configuration for request distribution),
mark optional parameters as None default, and keep formatting consistent with
the existing docstring style used in this function.
- Around line 199-212: The docstring for the function that starts with "Parse
guidellm 0.5.x JSON benchmark results." is missing entries for the new LLM-D
parameters; update its Args section to document each new parameter exactly as
done in process_benchmark_section (include parameter name, type, and brief
description for LLM-D specific fields such as any tokenizer/sequence/prompt
config, temperature/beam/candidate settings, or other runtime flags added),
ensuring names match the function signature and
runtime_args/guidellm_version/other existing params are preserved.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c34a60d-9fcd-4e64-96af-b02dfcabf30a

📥 Commits

Reviewing files that changed from the base of the PR and between 52a7418 and 6f1670a.

📒 Files selected for processing (1)

manual_runs/scripts/import_manual_runs_json_v2.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

manual_runs/scripts/import_manual_runs_json_v2.py (1)
413-529: ⚠️ Potential issue | 🟡 Minor

Data loss risk when appending CSV files with different modes.

The script supports two CSV modes: standard (default) and --llm-d. When appending to an existing CSV file (line 414), if the existing file was created with a different mode, the combined_df[fieldnames] filtering at line 529 will silently drop all columns not in the current mode's fieldnames list.

For example: Running in standard mode on an LLM-D CSV will drop DP, EP, replicas, prefill_pod_count, decode_pod_count, and router_config columns from all rows.

Note: The --llm-d flag is not documented in the README and requires explicit parameters (--dp, --ep, --replicas, --router-config), reducing the likelihood of accidental mode mixing. However, the vulnerability exists and should be mitigated to prevent silent data loss.

Consider adding a schema mismatch check when appending:
🛡️ Proposed fix: Add schema mismatch detection
         if os.path.exists(args.csv_file):
             print(f"Appending {len(new_data_df)} new rows to {args.csv_file}...")
             existing_df = pd.read_csv(args.csv_file)
+            # Detect schema mismatch
+            llm_d_columns = {"DP", "EP", "replicas", "prefill_pod_count", "decode_pod_count", "router_config"}
+            existing_has_llm_d = bool(llm_d_columns & set(existing_df.columns))
+            if existing_has_llm_d != args.llm_d:
+                print(f"Warning: Existing CSV {'has' if existing_has_llm_d else 'lacks'} LLM-D columns, "
+                      f"but current mode is {'LLM-D' if args.llm_d else 'standard'}. "
+                      f"Some columns may be dropped or added with null values.")
             combined_df = pd.concat([existing_df, new_data_df], ignore_index=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 413 - 529,
Appending to an existing CSV ignores schema differences between modes and then
drops columns by reindexing to the current mode's fieldnames; detect and prevent
this by comparing the existing_df.columns set to the current mode fieldnames
(use args.llm_d to choose the expected list), and if there is a mismatch
raise/print a clear error or merge-safe warning instead of blindly doing
combined_df = combined_df[fieldnames]; update the logic around
existing_df/combined_df and the fieldnames construction (refer to variables
fieldnames, existing_df, combined_df, and args.llm_d) to either (a) preserve all
existing columns by unioning fieldnames with existing_df.columns before
reindexing, or (b) abort with a schema-mismatch message that explains which
columns differ and suggests running with the matching --llm-d flag.

🧹 Nitpick comments (1)

manual_runs/scripts/import_manual_runs_json_v2.py (1)

28-51: Docstrings lack documentation for new parameters.

The new LLM-D parameters (dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config) are not documented in the function docstrings. Consider adding Args entries for completeness.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 28 - 51, The
docstring for the function that processes a single benchmark section is missing
entries for the new LLM-D parameters (dp, ep, replicas, prefill_pod_count,
decode_pod_count, router_config); update the Args block in that function's
docstring to add a short description and expected type for each of these
parameters (e.g., dp: data parallelism size (int), ep: expert parallelism (int),
replicas: number of model replicas (int), prefill_pod_count: pods used for
prefill (int), decode_pod_count: pods used for decode (int), router_config:
routing configuration dict/str) so the docstring remains complete and consistent
with other parameters.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 413-529: Appending to an existing CSV ignores schema differences
between modes and then drops columns by reindexing to the current mode's
fieldnames; detect and prevent this by comparing the existing_df.columns set to
the current mode fieldnames (use args.llm_d to choose the expected list), and if
there is a mismatch raise/print a clear error or merge-safe warning instead of
blindly doing combined_df = combined_df[fieldnames]; update the logic around
existing_df/combined_df and the fieldnames construction (refer to variables
fieldnames, existing_df, combined_df, and args.llm_d) to either (a) preserve all
existing columns by unioning fieldnames with existing_df.columns before
reindexing, or (b) abort with a schema-mismatch message that explains which
columns differ and suggests running with the matching --llm-d flag.

---

Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 28-51: The docstring for the function that processes a single
benchmark section is missing entries for the new LLM-D parameters (dp, ep,
replicas, prefill_pod_count, decode_pod_count, router_config); update the Args
block in that function's docstring to add a short description and expected type
for each of these parameters (e.g., dp: data parallelism size (int), ep: expert
parallelism (int), replicas: number of model replicas (int), prefill_pod_count:
pods used for prefill (int), decode_pod_count: pods used for decode (int),
router_config: routing configuration dict/str) so the docstring remains complete
and consistent with other parameters.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb9730ea-0526-4f4f-a6f9-c31219aef13d

📥 Commits

Reviewing files that changed from the base of the PR and between 6f1670a and a3d954b.

📒 Files selected for processing (1)

manual_runs/scripts/import_manual_runs_json_v2.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

manual_runs/scripts/import_manual_runs_json_v2.py (1)

28-52: Docstring missing documentation for new parameters.

The new parameters (dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config, notes) are not documented in the function docstring. Consider adding them for completeness.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_json_v2.py` around lines 28 - 52,
Update the function docstring for process_benchmark_section to document the
newly added parameters dp, ep, replicas, prefill_pod_count, decode_pod_count,
router_config, and notes; for each parameter add a short one-line description
and expected type (e.g., int, dict, str, or None) and include any semantic
meaning (e.g., dp/ep are data/engine parallel sizes, replicas is number of
server replicas, prefill_pod_count/decode_pod_count are pod counts for
prefill/decode stages, router_config is routing settings, notes is freeform
metadata). Ensure these entries follow the existing Args style and placement
with the other parameters (accelerator, model_name, etc.) in the same docstring
block for consistency.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 273-279: The call to process_benchmark_section is missing the
notes argument so the CLI --notes value parsed in parse_guidellm_json is never
forwarded; update the call that currently passes dp, ep, replicas,
prefill_pod_count, decode_pod_count, router_config (the invocation inside
parse_guidellm_json) to include notes=notes so the notes parameter is propagated
into process_benchmark_section and stored on each row.

---

Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_json_v2.py`:
- Around line 28-52: Update the function docstring for process_benchmark_section
to document the newly added parameters dp, ep, replicas, prefill_pod_count,
decode_pod_count, router_config, and notes; for each parameter add a short
one-line description and expected type (e.g., int, dict, str, or None) and
include any semantic meaning (e.g., dp/ep are data/engine parallel sizes,
replicas is number of server replicas, prefill_pod_count/decode_pod_count are
pod counts for prefill/decode stages, router_config is routing settings, notes
is freeform metadata). Ensure these entries follow the existing Args style and
placement with the other parameters (accelerator, model_name, etc.) in the same
docstring block for consistency.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b23b331d-2b0f-41d7-940e-fbb39bf04705

📥 Commits

Reviewing files that changed from the base of the PR and between a3d954b and 46a22d8.

📒 Files selected for processing (1)

manual_runs/scripts/import_manual_runs_json_v2.py

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (5)

manual_runs/scripts/import_manual_runs_llm_d.py (5)

283-285: Inconsistent fallback type: empty string vs None.

The timestamp fallback uses empty string "" while other optional fields use None. This inconsistency may cause type issues downstream (e.g., when computing duration or filtering).

♻️ Proposed fix for consistent None fallback

     # Get min start_time and max end_time, convert to milliseconds
-    guidellm_start_time_ms = int(min(start_times) * 1000) if start_times else ""
-    guidellm_end_time_ms = int(max(end_times) * 1000) if end_times else ""
+    guidellm_start_time_ms = int(min(start_times) * 1000) if start_times else None
+    guidellm_end_time_ms = int(max(end_times) * 1000) if end_times else None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 283 - 285, The
timestamp fallbacks for guidellm_start_time_ms and guidellm_end_time_ms use an
empty string which is inconsistent with other optionals; change them to use None
and only compute the int conversion when start_times/end_times are present
(e.g., set guidellm_start_time_ms = int(min(start_times) * 1000) if start_times
else None and similarly for guidellm_end_time_ms) so downstream code sees
consistent None for absent timestamps; update any subsequent code that assumes a
string to handle None if necessary.

71-72: Simplify truthiness check.

The check global_data_config and len(global_data_config) > 0 is redundant since an empty list is falsy in Python.

♻️ Proposed simplification

-        if global_data_config and len(global_data_config) > 0:
+        if global_data_config:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 71 - 72, The
condition "if global_data_config and len(global_data_config) > 0" is redundant;
simplify it to a single truthiness check on the variable: replace that compound
check with "if global_data_config" wherever used (in this try block checking
global_data_config) to rely on Python's falsy empty-sequence behavior and keep
logic identical.

464-564: Consider extracting common fieldnames to reduce duplication.

The two fieldnames lists share ~40 identical entries. Extracting the common base and extending with mode-specific columns would improve maintainability.

♻️ Proposed refactor to reduce duplication

        base_fieldnames = [
            "run",
            "accelerator",
            "model",
            "version",
            "prompt toks",
            "output toks",
            "TP",
        ]
        
        llm_d_specific = [
            "DP",
            "EP",
            "replicas",
            "prefill_pod_count",
            "decode_pod_count",
            "router_config",
        ]
        
        common_metrics = [
            "measured concurrency",
            "intended concurrency",
            "measured rps",
            # ... remaining common fields ...
            "guidellm_version",
            "notes",
        ]
        
        if args.llm_d:
            fieldnames = base_fieldnames + llm_d_specific + common_metrics
        else:
            fieldnames = base_fieldnames + common_metrics

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 464 - 564, The
two large lists assigned to fieldnames in the args.llm_d branch and the else
branch duplicate most entries; extract the shared entries into a single base
list (e.g., base_fieldnames or common_metrics) and create a small llm_d_specific
list for the
DP/EP/replicas/prefill_pod_count/decode_pod_count/router_config/notes entries,
then set fieldnames = base_fieldnames + llm_d_specific when args.llm_d is true
and fieldnames = base_fieldnames otherwise; update any references to the
original fieldnames variable but keep the args.llm_d conditional to select the
extended list.

227-229: Specify explicit encoding when opening the JSON file.

Using open() without an explicit encoding relies on the platform default, which may cause issues with non-ASCII characters on some systems.

♻️ Proposed fix

-        with open(json_path) as f:
+        with open(json_path, encoding="utf-8") as f:
             data = json.load(f)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 227 - 229, The
code that reads the JSON file (the try block using with open(json_path) as f:
followed by data = json.load(f)) should specify an explicit encoding to avoid
platform-dependent defaults; update the open call in the
import_manual_runs_llm_d.py reader to open(json_path, encoding='utf-8') so
json.load reads using UTF-8 and handles non-ASCII characters reliably.

36-53: Incomplete docstring - missing parameter documentation.

The docstring is missing documentation for the llm-d specific parameters: dp, ep, replicas, prefill_pod_count, decode_pod_count, router_config, and notes.

📝 Proposed docstring additions

         guidellm_start_time_ms: Aggregated start time in milliseconds.
         guidellm_end_time_ms: Aggregated end time in milliseconds.
+        dp: Data parallelism size (llm-d mode).
+        ep: Expert parallelism size (llm-d mode).
+        replicas: Number of replicas (llm-d mode).
+        prefill_pod_count: Number of prefill pods (llm-d mode).
+        decode_pod_count: Number of decode pods (llm-d mode).
+        router_config: Router/endpoint picker configuration (llm-d mode).
+        notes: Free-form notes for the run (llm-d mode).
 
     Returns:
         dict: Processed benchmark metrics.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 36 - 53, The
function docstring that describes "Process a single benchmark section and
extract performance metrics" lacks descriptions for llm-d specific params;
update that docstring in manual_runs/scripts/import_manual_runs_llm_d.py to add
parameter entries for dp, ep, replicas, prefill_pod_count, decode_pod_count,
router_config, and notes, each with a one-line explanation (e.g., dp/ep:
data/embedding parallelism sizes; replicas: number of model replicas;
prefill_pod_count/decode_pod_count: pods used for prefill/decode stages;
router_config: routing settings structure; notes: freeform metadata), and ensure
these names appear in the Args section alongside existing params like
accelerator, model_name, version, tp_size, runtime_args, global_data_config,
image_tag, guidellm_version, guidellm_start_time_ms, guidellm_end_time_ms so the
docstring fully documents all function inputs.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 518-564: The non-llm-d CSV header in the else branch is missing
the "notes" column causing process_benchmark_section (which always sets "notes")
to drop that value; update the fieldnames list in the else branch to include
"notes" (matching the key used in process_benchmark_section) so rows written
when --llm-d is not used preserve the notes column.
- Around line 566-571: The current code forcibly restricts combined_df to only
the current mode's fieldnames which will drop any columns already present in the
existing CSV; instead compute the union of the existing CSV's columns and the
current mode's fieldnames (e.g., merge existing_csv.columns ∪ fieldnames),
ensure missing columns on either side are created with None, and write the
dataframe with that combined column set so no existing columns are lost when
saving to args.csv_file; alternatively emit a clear warning when schema
mismatches are detected before overwriting.
- Around line 237-240: The code unconditionally assigns guidellm_version =
metadata.get("guidellm_version", "unknown") which overwrites the function
parameter and ignores the CLI flag; change this to prefer the metadata value
only if present (e.g., value = metadata.get("guidellm_version"); if value:
guidellm_version = value) or else leave the incoming guidellm_version parameter
intact so the --guidellm-version fallback works; update the print to use the
resulting guidellm_version variable.
- Around line 16-35: Add unit tests for the new import_manual_runs_llm_d.py
script: write tests that exercise process_benchmark_section (and any helper
functions it calls) to validate the new function signature and behavior with
required args (benchmark, accelerator, model_name, version, tp_size,
runtime_args, global_data_config, image_tag, guidellm_version,
guidellm_start_time_ms, guidellm_end_time_ms) and optional args (dp, ep,
replicas, prefill_pod_count, decode_pod_count, router_config, notes). Create
parametrized test cases that cover normal, minimal, and edge inputs, assert the
returned/produced structures or side effects, and mock/stub any external
dependencies (I/O, DB, network) used by process_benchmark_section to keep tests
deterministic; add these tests alongside existing import tests (or in a new test
file) and run lint/tests to ensure the new coverage passes.

---

Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 283-285: The timestamp fallbacks for guidellm_start_time_ms and
guidellm_end_time_ms use an empty string which is inconsistent with other
optionals; change them to use None and only compute the int conversion when
start_times/end_times are present (e.g., set guidellm_start_time_ms =
int(min(start_times) * 1000) if start_times else None and similarly for
guidellm_end_time_ms) so downstream code sees consistent None for absent
timestamps; update any subsequent code that assumes a string to handle None if
necessary.
- Around line 71-72: The condition "if global_data_config and
len(global_data_config) > 0" is redundant; simplify it to a single truthiness
check on the variable: replace that compound check with "if global_data_config"
wherever used (in this try block checking global_data_config) to rely on
Python's falsy empty-sequence behavior and keep logic identical.
- Around line 464-564: The two large lists assigned to fieldnames in the
args.llm_d branch and the else branch duplicate most entries; extract the shared
entries into a single base list (e.g., base_fieldnames or common_metrics) and
create a small llm_d_specific list for the
DP/EP/replicas/prefill_pod_count/decode_pod_count/router_config/notes entries,
then set fieldnames = base_fieldnames + llm_d_specific when args.llm_d is true
and fieldnames = base_fieldnames otherwise; update any references to the
original fieldnames variable but keep the args.llm_d conditional to select the
extended list.
- Around line 227-229: The code that reads the JSON file (the try block using
with open(json_path) as f: followed by data = json.load(f)) should specify an
explicit encoding to avoid platform-dependent defaults; update the open call in
the import_manual_runs_llm_d.py reader to open(json_path, encoding='utf-8') so
json.load reads using UTF-8 and handles non-ASCII characters reliably.
- Around line 36-53: The function docstring that describes "Process a single
benchmark section and extract performance metrics" lacks descriptions for llm-d
specific params; update that docstring in
manual_runs/scripts/import_manual_runs_llm_d.py to add parameter entries for dp,
ep, replicas, prefill_pod_count, decode_pod_count, router_config, and notes,
each with a one-line explanation (e.g., dp/ep: data/embedding parallelism sizes;
replicas: number of model replicas; prefill_pod_count/decode_pod_count: pods
used for prefill/decode stages; router_config: routing settings structure;
notes: freeform metadata), and ensure these names appear in the Args section
alongside existing params like accelerator, model_name, version, tp_size,
runtime_args, global_data_config, image_tag, guidellm_version,
guidellm_start_time_ms, guidellm_end_time_ms so the docstring fully documents
all function inputs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fe54a0f9-96b6-4e57-8f4f-6209673f137a

📥 Commits

Reviewing files that changed from the base of the PR and between 46a22d8 and 1f7daa5.

📒 Files selected for processing (1)

manual_runs/scripts/import_manual_runs_llm_d.py

Harshith-umesh · 2026-04-03T21:06:39Z

Please run make to fix the lint issues. you can run make ci-local, make format, make .... to view and fix the linting issues.

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (3)

manual_runs/scripts/import_manual_runs_llm_d.py (3)

237-240: ⚠️ Potential issue | 🟠 Major

Bug: guidellm_version parameter is still overwritten, ignoring CLI input.

Line 239 unconditionally overwrites the guidellm_version parameter with the value from JSON metadata, making the --guidellm-version CLI flag ineffective.

🐛 Proposed fix to preserve CLI fallback

     # Check guidellm version
     metadata = data.get("metadata", {})
-    guidellm_version = metadata.get("guidellm_version", "unknown")
+    detected_version = metadata.get("guidellm_version")
+    if detected_version:
+        guidellm_version = detected_version
+    elif guidellm_version is None:
+        guidellm_version = "unknown"
     print(f"Detected guidellm version: {guidellm_version}")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 237 - 240, The
code unconditionally replaces the existing guidellm_version with
metadata.get(...), which ignores the CLI-provided --guidellm-version; change the
assignment so you only read the metadata value when the current guidellm_version
is not set or is the default placeholder (e.g., None, empty string, or
"unknown"). Locate the guidellm_version variable near the metadata =
data.get("metadata", {}) line and update the logic to preserve an existing CLI
value (guidellm_version) and only fallback to metadata.get("guidellm_version")
when that CLI value is absent/placeholder.

566-571: ⚠️ Potential issue | 🟠 Major

Risk of data loss when appending to CSV with a different column schema.

Line 570 filters combined_df to only include columns in the current mode's fieldnames. This silently drops any columns present in the existing CSV but absent from fieldnames (e.g., appending non-llm-d rows to an llm-d CSV loses DP, EP, replicas columns).

Consider preserving existing columns or warning the user about schema mismatches.

🛡️ Proposed fix to preserve existing columns

         for col in fieldnames:
             if col not in combined_df.columns:
                 combined_df[col] = None

-        combined_df = combined_df[fieldnames]
+        # Reorder: put expected fieldnames first, then preserve any extra columns from existing CSV
+        extra_cols = [c for c in combined_df.columns if c not in fieldnames]
+        if extra_cols:
+            print(f"Warning: Preserving {len(extra_cols)} extra columns from existing CSV: {extra_cols}")
+        combined_df = combined_df[fieldnames + extra_cols]
         combined_df.to_csv(args.csv_file, index=False)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 566 - 571, The
current code forces combined_df = combined_df[fieldnames], which drops any
existing CSV columns that aren't in fieldnames and can cause silent data loss;
update the logic around combined_df and fieldnames to preserve existing columns
or explicitly warn: compute the union of the current mode's fieldnames and any
columns already present in combined_df (or the target CSV) and reindex
combined_df to that union so no columns are dropped, and if there are columns
present in the CSV but not in fieldnames emit a clear warning mentioning
args.csv_file, combined_df and fieldnames so the user can decide to trim columns
intentionally.

518-564: ⚠️ Potential issue | 🟡 Minor

Non-llm-d fieldnames list is missing the "notes" column.

The process_benchmark_section function always includes "notes" in the row dictionary (line 189), but the non-llm-d fieldnames list omits it. This causes the notes column to be silently dropped when running without --llm-d, even if the user provided --notes.

🐛 Proposed fix

             "image_tag",
             "guidellm_version",
+            "notes",
         ]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 518 - 564, The
non-llm-d branch's fieldnames list is missing the "notes" column while
process_benchmark_section always sets a "notes" key (so notes get dropped when
--llm-d is not used); update the non-llm-d fieldnames list (the fieldnames
variable in the else branch) to include "notes" so the CSV output preserves
user-provided --notes values and matches the keys produced by
process_benchmark_section.

🧹 Nitpick comments (1)

manual_runs/scripts/import_manual_runs_llm_d.py (1)
227-235: Add explicit encoding when opening JSON file.

Opening files without specifying encoding relies on platform defaults, which can cause UnicodeDecodeError on systems with non-UTF-8 defaults when the JSON contains non-ASCII characters.
♻️ Proposed fix
-        with open(json_path) as f:
+        with open(json_path, encoding="utf-8") as f:
             data = json.load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manual_runs/scripts/import_manual_runs_llm_d.py` around lines 227 - 235, The
JSON file is opened without an explicit encoding which can cause
UnicodeDecodeError on non-UTF-8 platforms; update the open call in the try block
that reads json_path (inside the import_manual_runs_llm_d.py JSON load block) to
use encoding='utf-8' and also add a UnicodeDecodeError except handler (similar
to FileNotFoundError/json.JSONDecodeError) to surface a clear error message if
decoding fails.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 283-286: The variables guidellm_start_time_ms and
guidellm_end_time_ms mix ints and empty-string fallbacks which creates a type
inconsistency; update the assignment so when start_times or end_times is empty
you set the variables to None (not "") and only call int(min(...)*1000) /
int(max(...)*1000) when the respective list is present — locate and change the
two assignments for guidellm_start_time_ms and guidellm_end_time_ms accordingly
to ensure they are either ints or None.
- Line 65: The current assignment to intended_concurrency uses a truthy-or
fallback which treats a valid zero (streams=0) as missing; change the logic to
explicitly check for presence/None instead: read strategy.get("streams") into a
local (e.g., streams) and set intended_concurrency = streams if streams is not
None else strategy.get("max_concurrency", 0) (or use an explicit if 'streams' in
strategy ... else ...), referencing the intended_concurrency variable and
strategy dict to locate the fix.
- Line 154: The CSV key "measured rps" is wrong — change it to "measured_rps"
wherever it's emitted (the dict entry that currently uses "measured rps" in
import_manual_runs_llm_d.py) and also update the two fieldnames lists named
fieldnames (the lists that enumerate CSV columns later in the file) so they
include "measured_rps" instead of "measured rps"; keep the exact underscore
spelling so llmd_dashboard.py's numeric_cols will match and the numeric
conversion runs.

---

Duplicate comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 237-240: The code unconditionally replaces the existing
guidellm_version with metadata.get(...), which ignores the CLI-provided
--guidellm-version; change the assignment so you only read the metadata value
when the current guidellm_version is not set or is the default placeholder
(e.g., None, empty string, or "unknown"). Locate the guidellm_version variable
near the metadata = data.get("metadata", {}) line and update the logic to
preserve an existing CLI value (guidellm_version) and only fallback to
metadata.get("guidellm_version") when that CLI value is absent/placeholder.
- Around line 566-571: The current code forces combined_df =
combined_df[fieldnames], which drops any existing CSV columns that aren't in
fieldnames and can cause silent data loss; update the logic around combined_df
and fieldnames to preserve existing columns or explicitly warn: compute the
union of the current mode's fieldnames and any columns already present in
combined_df (or the target CSV) and reindex combined_df to that union so no
columns are dropped, and if there are columns present in the CSV but not in
fieldnames emit a clear warning mentioning args.csv_file, combined_df and
fieldnames so the user can decide to trim columns intentionally.
- Around line 518-564: The non-llm-d branch's fieldnames list is missing the
"notes" column while process_benchmark_section always sets a "notes" key (so
notes get dropped when --llm-d is not used); update the non-llm-d fieldnames
list (the fieldnames variable in the else branch) to include "notes" so the CSV
output preserves user-provided --notes values and matches the keys produced by
process_benchmark_section.

---

Nitpick comments:
In `@manual_runs/scripts/import_manual_runs_llm_d.py`:
- Around line 227-235: The JSON file is opened without an explicit encoding
which can cause UnicodeDecodeError on non-UTF-8 platforms; update the open call
in the try block that reads json_path (inside the import_manual_runs_llm_d.py
JSON load block) to use encoding='utf-8' and also add a UnicodeDecodeError
except handler (similar to FileNotFoundError/json.JSONDecodeError) to surface a
clear error message if decoding fails.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82c86ccb-02fb-4fc7-b9da-2c0785e1205a

📥 Commits

Reviewing files that changed from the base of the PR and between 1f7daa5 and 5f5afba.

📒 Files selected for processing (1)

manual_runs/scripts/import_manual_runs_llm_d.py

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Support llm-d manual runs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * fix: Linter and missing fields * feat: Add notes flag for llm-d runs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add llm-d processing script Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * chore: Fix format issues Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * fix: Linter Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* feat: Support llm-d manual runs * fix: Linter and missing fields * feat: Add notes flag for llm-d runs * feat: Add llm-d processing script * chore: Fix format issues * fix: Linter --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Co-authored-by: alberto <aperdomo@redhat.com>

coderabbitai Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread manual_runs/scripts/import_manual_runs_json_v2.py Outdated

coderabbitai Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread manual_runs/scripts/import_manual_runs_json_v2.py Outdated

coderabbitai Bot reviewed Apr 3, 2026

View reviewed changes

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py

albertoperdomo2 added 5 commits April 7, 2026 15:42

feat: Support llm-d manual runs

18e68f4

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

fix: Linter and missing fields

0f554a4

feat: Add notes flag for llm-d runs

0620a04

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

feat: Add llm-d processing script

068e353

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

chore: Fix format issues

cb706e7

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

albertoperdomo2 force-pushed the feat/support-llm-d-manual-runs branch from 5f5afba to cb706e7 Compare April 7, 2026 14:42

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py Outdated

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py

Comment thread manual_runs/scripts/import_manual_runs_llm_d.py

fix: Linter

1023be1

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

Harshith-umesh merged commit dde7b61 into openshift-psap:main May 24, 2026
5 checks passed

Harshith-umesh mentioned this pull request May 24, 2026

feat: Support llm-d manual runs (cherry-pick to staging) #29

Merged

Conversation

albertoperdomo2 commented Mar 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

albertoperdomo2 commented Mar 11, 2026

Uh oh!

coderabbitai Bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Harshith-umesh commented Apr 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

albertoperdomo2 commented Mar 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 11, 2026 •

edited

Loading