USHIFT-6637: Ansible: Improve performance metrics capture (#6271)

openshift-merge-bot[bot] merged 4 commits into openshift:main
Conversation
@sjug: This pull request references USHIFT-6637 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
📝 Walkthrough

Adds two new CLI utilities for CRI-O disk reporting and Prometheus network queries, replaces vnstat collection with these tools in microshift-start, updates node-exporter naming and package, adds microshift cleanup/uninstall steps, and enhances ready.sh diagnostics and timing output.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Controller as Ansible Controller
    participant Host as Target Host (CRI-O)
    participant Prom as Prometheus
    Controller->>Host: copy crio-df.py
    Controller->>Host: execute crio-df.py (collect image sizes)
    Host-->>Controller: results/images.txt (image sizes, reclaimable)
    Controller->>Controller: record network start_epoch
    alt prometheus_logging enabled
        Controller->>Prom: run prom-network-query.py / API queries
        Prom-->>Controller: RX/TX totals, timeseries, JSON summary
        Controller->>Controller: save results/network.txt and measurement end_epoch
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🧹 Nitpick comments (9)
ansible/roles/common/tasks/files/ready.sh (1)
126-126: `local elapsed` declared twice in the same function.

Line 110 already declares `local elapsed` in `wait_for_ready`. This second `local` on line 126 is harmless in bash but redundant — you can just reassign `elapsed=...` here.

Proposed fix:

```diff
- local elapsed=$(( $(date +%s) - START_TIME ))
+ elapsed=$(( $(date +%s) - START_TIME ))
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/ready.sh` at line 126, replace the redundant re-declaration of `elapsed` in `wait_for_ready` (currently `local elapsed=$(( ... ))`) with a plain reassignment (`elapsed=...`) so the original `local` declaration at the top of the function remains the single declaration.

ansible/roles/common/tasks/files/crio-df.py (5)
340-343: Consider using `argparse` instead of manual `sys.argv` inspection.

Currently `-v` anywhere in argv triggers verbose mode, which could conflict if future args are added. Low risk for now, but `argparse` is already used in the sibling script.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` around lines 340-343, the `__main__` block sets verbose by scanning `sys.argv`, which can misfire; replace it with `argparse`: create an `ArgumentParser`, add a `-v`/`--verbose` flag (`action='store_true'`), call `parser.parse_args()`, and pass `args.verbose` into `main()`, keeping the `main(verbose)` signature unchanged.
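The suggested switch can be sketched in a few lines; the parser description and help text below are illustrative, not taken from the script:

```python
import argparse

def parse_cli(argv=None):
    # Sketch of the suggested -v/--verbose flag; description text is an assumption.
    parser = argparse.ArgumentParser(description="CRI-O disk usage report")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print per-layer detail")
    return parser.parse_args(argv)

args = parse_cli(["-v"])
print(args.verbose)  # True: the flag is only honored as a real option, not anywhere in argv
```

Unlike the raw `"-v" in sys.argv` scan, `argparse` also rejects unknown arguments instead of silently ignoring them.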
333-333: Ambiguous variable name `l`.

Rename it to something clearer like `lid` or `layer_id` per E741.

Proposed fix:

```diff
- print(f"  Shared layers (used by >1 image): {len([l for l, c in layer_count.items() if c > 1])}")
+ print(f"  Shared layers (used by >1 image): {len([lid for lid, c in layer_count.items() if c > 1])}")
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` at line 333, rename the ambiguous single-letter comprehension variable `l` to a clear identifier like `layer_id` (or `lid`) to satisfy E741, keeping the surrounding code unchanged.
281-284: Unused variables: `comp_naive`, `comp_shared_size`, `comp_layer_usage`.

These are computed but never referenced. Either use them in output or remove them to avoid confusion.

Proposed fix:

```diff
- comp_layers, comp_layer_usage, comp_shared, comp_per_image = get_compressed_sizes(images)
+ comp_layers, _, comp_shared, comp_per_image = get_compressed_sizes(images)
  comp_deduplicated = sum(comp_layers.values())
- comp_naive = sum(img["compressed"] for img in comp_per_image)
- comp_shared_size = sum(comp_layers[d] for d in comp_shared)
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` around lines 281-284, `comp_naive`, `comp_shared_size`, and `comp_layer_usage` computed after calling `get_compressed_sizes(images)` are never used; either delete those calculations or include the values in the function's return value or summary output so they are actually referenced downstream.
296-296: f-strings without placeholders on lines 296, 306, 311.

These are plain strings with no interpolation — drop the `f` prefix.

Proposed fix:

```diff
- print(f"\nContainers space usage:\n")
+ print("\nContainers space usage:\n")

- print(f"\nLocal Volumes space usage:\n")
+ print("\nLocal Volumes space usage:\n")

- print(f"\nCompressed download sizes:\n")
+ print("\nCompressed download sizes:\n")
```

Also applies to: 306-306, 311-311

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` at lines 296, 306, and 311, drop the unnecessary `f` prefix from the three `print` calls shown above, keeping the exact string text and spacing unchanged.
300-300: Silently truncates container listing to 10 entries.

`volatile_layers[:10]` caps the output with no indication to the user that entries were omitted. If the limit is intentional, print a note when truncated.

Proposed fix:

```diff
  for container in volatile_layers[:10]:  # Limit to first 10
      ...
+ if len(volatile_layers) > 10:
+     print(f"  ... and {len(volatile_layers) - 10} more containers")
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` at line 300, the loop iterates `volatile_layers[:10]`, silently truncating results; detect truncation (`if len(volatile_layers) > 10`) and emit a clear notice stating the total count and that only the first 10 entries are shown, using the same output mechanism as the rest of the script.

ansible/roles/common/tasks/files/prom-network-query.py (2)
128-136: No error handling for malformed Prometheus responses.

Line 134 accesses `data["data"]["result"]` — a non-200 or unexpected response shape will raise `KeyError`. Consider wrapping it in a try/except or using `.get()`.

Proposed fix:

```diff
  def get_increase(metric, start, end, prometheus, device, instance=None):
      """Use increase() to get accurate total accounting for counter resets."""
      window = end - start
      sel = label_selector(device, instance)
      query = f"increase({metric}{{{sel}}}[{window}s])"
      data = query_instant(prometheus, query, end)
-     if data["status"] != "success" or not data["data"]["result"]:
-         return None
-     return float(data["data"]["result"][0]["value"][1])
+     try:
+         if data.get("status") != "success" or not data.get("data", {}).get("result"):
+             return None
+         return float(data["data"]["result"][0]["value"][1])
+     except (KeyError, IndexError, TypeError, ValueError):
+         return None
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 128-136, avoid direct indexing into the Prometheus response in `get_increase`: verify `data.get("status") == "success"` and that `data.get("data", {}).get("result")` is a non-empty list before reading `result[0]["value"][1]`, guard the `float(...)` conversion, and return `None` on any failure, keeping the same query construction.
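The defensive shape can be exercised standalone; the helper name `extract_instant_value` below is hypothetical, not from the script:

```python
def extract_instant_value(data):
    # Hypothetical helper: pull the first scalar out of a Prometheus
    # instant-query payload, returning None for any malformed shape.
    try:
        if not isinstance(data, dict) or data.get("status") != "success":
            return None
        result = data.get("data", {}).get("result") or []
        return float(result[0]["value"][1])
    except (KeyError, IndexError, TypeError, ValueError):
        return None

ok = {"status": "success", "data": {"result": [{"value": [1700000000, "42.5"]}]}}
print(extract_instant_value(ok))                     # 42.5
print(extract_instant_value({"status": "error"}))    # None
print(extract_instant_value({"status": "success"}))  # None: no data/result present
```

Routing every malformed shape through the same `None` path keeps the caller's "query failed" logic in one place.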
217-226: JSON summary reports `0` bytes when there's no data — indistinguishable from actual zero traffic.

When `rx_increase` or `tx_increase` is `None`, the summary emits `0`. Downstream consumers can't tell "no data" from "zero transfer." Consider using `null` instead.

Proposed fix:

```diff
  summary = {
      ...
-     "rx_bytes": int(rx_increase) if rx_increase is not None else 0,
-     "tx_bytes": int(tx_increase) if tx_increase is not None else 0,
+     "rx_bytes": int(rx_increase) if rx_increase is not None else None,
+     "tx_bytes": int(tx_increase) if tx_increase is not None else None,
      ...
  }
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 217-226, the summary converts missing `rx_increase`/`tx_increase` to `0`, conflating "no data" with zero traffic; keep missing values as Python `None` (serialized as JSON `null`) instead of `0`, leaving `counter_resets` and `samples` as-is.

ansible/roles/microshift-start/tasks/main.yml (1)
125-143: Image capture block runs unconditionally — no guard or error handling.

If CRI-O isn't running or has no images, `crio-df.py` will fail and halt the playbook. Consider adding `ignore_errors: true` or gating on a condition if this is a best-effort metric.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 125-143, make the crio-df invocation best-effort: add `failed_when: false` and/or `ignore_errors: true` to the "run crio-df script" task (the task that registers `crio_df`), and make the "save container image sizes to results" copy task conditional (`when: crio_df is defined and crio_df.stdout | length > 0`) so results are only written to `results_dir` when output exists.
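A best-effort variant might look like this sketch; the task names follow the review's wording, while the script path and `results_dir` destination are assumptions:

```yaml
- name: run crio-df script
  ansible.builtin.command: python3 /tmp/crio-df.py   # path is illustrative
  register: crio_df
  failed_when: false     # best-effort: keep the play going if CRI-O is absent
  changed_when: false

- name: save container image sizes to results
  ansible.builtin.copy:
    content: "{{ crio_df.stdout }}"
    dest: "{{ results_dir }}/images.txt"
  when: crio_df is defined and crio_df.stdout | default('') | length > 0
```

The `default('')` filter keeps the `when` expression safe even if the command produced no stdout at all.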
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ansible/roles/common/tasks/files/crio-df.py`:
- Around line 92-95: The except: is too broad and can swallow system exceptions;
replace it with except Exception as e to only catch regular exceptions, and if
appropriate capture/log e (e.g. process_logger or logging) before returning
"Unknown" in the same function that computes the human-readable age (the block
handling years = days // 365 and the subsequent return).
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 106-107: The helper fmt_ts currently uses
datetime.fromtimestamp(epoch) which yields local time and conflicts with
parse_window_args that uses UTC; change fmt_ts to produce UTC timestamps (e.g.,
use datetime.fromtimestamp(epoch, timezone.utc) or datetime.utcfromtimestamp and
then strftime) so all displayed timestamps are in UTC, and add/ensure the
timezone import (timezone) is present if missing; update the fmt_ts function
name reference accordingly.
- Around line 89-93: The prom_query function currently calls
urllib.request.urlopen without a timeout which can hang; modify prom_query to
accept an optional timeout parameter (e.g., timeout=10) and pass that timeout
into urllib.request.urlopen(url, timeout=timeout), and update any callers if
necessary to supply a different timeout; ensure the new timeout has a sensible
default so existing callers continue to work.
In `@ansible/roles/install-microshift/tasks/main.yml`:
- Around line 4-8: The current check using ansible.builtin.command: rpm -q
microshift (registered to microshift_installed) misses cases where only
sub-packages like microshift-networking are present; replace that task with a
broadened shell check (e.g., ansible.builtin.shell) that matches package names
with a glob/regex such as rpm -qa | grep -E '^microshift' or rpm -q
'microshift*' so any package beginning with "microshift" is detected, keep
register: microshift_installed and preserve ignore_errors/changed_when semantics
so subsequent cleanup runs when any matching package is installed.
- Around line 33-39: The task using ansible.builtin.dnf with name:
[microshift-*] and state: absent uses a glob that the DNF module does not
expand, so it will match nothing and leave packages installed; replace it with a
two-step approach: run ansible.builtin.dnf list: installed and register it to
installed_packages, then call ansible.builtin.dnf state: absent with name set to
the filtered list of installed_packages.results whose name matches the
^microshift pattern (use selectattr/search and map(name) to build the list) and
keep autoremove: yes; update the task names (e.g., "List installed microshift
packages" and "Uninstall microshift packages") and variable installed_packages
accordingly.
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 149-168: The task "resolve prometheus network query settings"
currently falls back to the hardcoded host 'zen3' when no logging group and no
prometheus_query_endpoint are present; change this to fail fast or emit a clear
warning instead of silently using 'zen3'. Modify the set_fact logic for
prom_network_endpoint (or add a preceding task) to check if
prometheus_query_endpoint is undefined and groups['logging'] is empty, then call
ansible.builtin.fail with a clear message (or ansible.builtin.debug/warn)
indicating the missing Prometheus endpoint and how to supply it, rather than
defaulting to 'zen3'. Ensure the change references prom_network_endpoint and the
resolve prometheus network query settings task so reviewers can find the update.
- Around line 171-182: The task "run prometheus network query" uses a relative
script path "roles/common/tasks/files/prom-network-query.py" which can break
depending on the working directory; update the command to reference the script
via a role-relative or absolute path (e.g. use the {{ role_path }} Jinja
variable to point at the file within the role) or move the script into the
current role's files directory and call it from there so the
prom-network-query.py invocation is robust regardless of where ansible-playbook
is run.
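The two pure-Python fixes flagged above for `prom-network-query.py` (UTC formatting in `fmt_ts`, a bounded `urlopen` in `prom_query`) can be sketched together; the function bodies here are illustrative, not the script's actual code:

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

def fmt_ts(epoch):
    # Format an epoch in UTC so display matches the UTC-based window parsing
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

def prom_query(url, timeout=10):
    # A default timeout turns a stalled endpoint into a prompt, catchable
    # failure instead of an indefinite hang; callers keep working unchanged.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
        return None

print(fmt_ts(0))  # 1970-01-01 00:00:00, regardless of the host timezone
```

Because `timeout` has a default, existing call sites need no changes; only callers that want a different bound pass one explicitly.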
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (8)
- ansible/roles/common/tasks/files/crio-df.py
- ansible/roles/common/tasks/files/prom-network-query.py
- ansible/roles/common/tasks/files/ready.sh
- ansible/roles/install-logging-exporters/defaults/main.yml
- ansible/roles/install-logging-exporters/tasks/main.yml
- ansible/roles/install-microshift/tasks/main.yml
- ansible/roles/microshift-start/defaults/main.yml
- ansible/roles/microshift-start/tasks/main.yml
💤 Files with no reviewable changes (1)
- ansible/roles/microshift-start/defaults/main.yml
Force-pushed from d618e67 to 6318af1
Actionable comments posted: 3
♻️ Duplicate comments (6)
ansible/roles/microshift-start/tasks/main.yml (3)
149-166: ⚠️ Potential issue | 🟡 Minor

Hardcoded `zen3` fallback hides misconfiguration. Please fail fast or warn when no logging host/endpoint is configured.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 149-166, the `prom_network_endpoint` resolution falls back to a hardcoded 'zen3', which hides misconfiguration; when `groups['logging']` is empty and `prometheus_query_endpoint` is unset, either fail the playbook with a descriptive `ansible.builtin.fail` message or emit a warning requiring an explicit endpoint, so missing configuration causes a visible failure instead of a silent default.
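A fail-fast guard along the lines the review asks for might look like this sketch; the task name, message text, and group name are illustrative:

```yaml
- name: fail when no prometheus endpoint is configured
  ansible.builtin.fail:
    msg: >-
      No Prometheus endpoint available: set prometheus_query_endpoint
      or add a host to the logging inventory group.
  when:
    - prometheus_query_endpoint is not defined
    - groups['logging'] | default([]) | length == 0
```

Placed before the `set_fact` that resolves `prom_network_endpoint`, this makes the misconfiguration visible at the point it occurs rather than producing queries against the wrong host.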
173-181: ⚠️ Potential issue | 🟡 Minor

Use a role/playbook-relative path for `prom-network-query.py`. The current relative path is fragile.

Suggested fix:

```diff
- python3 roles/common/tasks/files/prom-network-query.py
+ python3 {{ role_path }}/../common/tasks/files/prom-network-query.py
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 173-181, the "run prometheus network query" task uses a fragile relative path to `prom-network-query.py`; reference the script via `{{ role_path }}` (or `{{ playbook_dir }}`) so the command invokes the correct file regardless of cwd, keeping the remaining arguments unchanged.
125-133: ⚠️ Potential issue | 🟡 Minor

Use `role_path` to reference files in other roles instead of CWD-relative paths. The path `roles/common/tasks/files/crio-df.py` depends on the playbook being run from the `ansible/` directory; running playbooks from elsewhere will break this reference.

Suggested fix:

```diff
- src: roles/common/tasks/files/crio-df.py
+ src: "{{ role_path }}/../common/tasks/files/crio-df.py"
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 125-133, the `src` path in the "copy crio-df script to remote host" task is CWD-relative and can break; build the path from `role_path` (e.g. `"{{ role_path }}/../common/tasks/files/crio-df.py"`) so the file is found regardless of working directory.

ansible/roles/common/tasks/files/prom-network-query.py (1)
143-144: ⚠️ Potential issue | 🟡 Minor

Use UTC for timestamps to match the rest of the script. `fmt_ts` currently uses local time; the window logic is UTC.

Suggested fix:

```diff
-def fmt_ts(epoch):
-    return datetime.fromtimestamp(epoch).strftime("%Y-%m-%d %H:%M:%S")
+def fmt_ts(epoch):
+    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 143-144, `fmt_ts` formats epoch times in local time; convert using UTC (e.g. `datetime.fromtimestamp(epoch, tz=timezone.utc)`) so all displayed timestamps match the script's UTC window logic, and ensure `timezone` is imported.

ansible/roles/install-microshift/tasks/main.yml (2)
33-39: ⚠️ Potential issue | 🔴 Critical

`dnf` won't expand `microshift-*` for `state: absent`. This matches nothing, so packages remain installed. Use list/filter → remove.

Suggested fix:

```diff
+ - name: list installed microshift packages
+   ansible.builtin.dnf:
+     list: installed
+   register: installed_packages
+
  - name: uninstall microshift packages and dependencies
    become: yes
    ansible.builtin.dnf:
-     name:
-       - microshift-*
+     name: "{{ installed_packages.results | selectattr('name', 'search', '^microshift') | map(attribute='name') | list }}"
      autoremove: yes
      state: absent
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/install-microshift/tasks/main.yml` around lines 33-39, the uninstall task passes the glob `microshift-*` to `ansible.builtin.dnf` with `state: absent`, which the module won't expand; first gather installed package names matching the pattern (via `package_facts` or a `dnf list: installed` task), then call `ansible.builtin.dnf` with that explicit list, `autoremove: yes`, and `state: absent` so only packages that are actually installed are removed.
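The list-then-remove flow above can be written as two guarded tasks; this is a sketch under the assumption that the `dnf` module's `list: installed` results expose a `name` field per package, and the extra `when` guard avoids calling the module with an empty package list:

```yaml
- name: list installed packages
  ansible.builtin.dnf:
    list: installed
  register: installed_packages

- name: uninstall microshift packages and dependencies
  become: yes
  ansible.builtin.dnf:
    name: "{{ installed_packages.results
              | selectattr('name', 'search', '^microshift')
              | map(attribute='name') | unique | list }}"
    autoremove: yes
    state: absent
  when: installed_packages.results
        | selectattr('name', 'search', '^microshift') | list | length > 0
```

The `unique` filter collapses duplicate entries that `list: installed` can report for multi-arch packages.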
4-8: ⚠️ Potential issue | 🟡 Minor

Detect sub-packages before skipping cleanup. `rpm -q microshift` only matches the base package; if only `microshift-*` sub-packages are installed, the cleanup block never runs.

Suggested fix:

```diff
- ansible.builtin.command: rpm -q microshift
+ ansible.builtin.command: rpm -q --whatprovides microshift
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@ansible/roles/install-microshift/tasks/main.yml` around lines 4-8, the check `rpm -q microshift` only matches the base package; broaden it to detect any installed package whose name begins with "microshift" (e.g. `rpm -qa | grep -E '^microshift'` or `rpm -q 'microshift*'`), keep `register: microshift_installed`, `ignore_errors: true`, and `changed_when: false`, and ensure the later cleanup conditional still references `microshift_installed` (e.g. `microshift_installed.rc`) so cleanup runs when any sub-package is present.
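As a sketch, the broadened check could use the shell pipeline the review suggests; the task name is illustrative:

```yaml
- name: check if any microshift packages are installed
  ansible.builtin.shell: rpm -qa | grep -E '^microshift'
  register: microshift_installed
  ignore_errors: true
  changed_when: false
```

`grep` exits non-zero when nothing matches, so with `ignore_errors: true` the play continues and later tasks can gate on `microshift_installed.rc == 0`.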
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ansible/roles/common/tasks/files/crio-df.py`:
- Around line 169-175: The variables flagged as unused should be removed or
renamed to underscore-prefixed names to satisfy linters: in the block using
get_image_display_name(image) remove the unused tag (use repo =
get_image_display_name(image) or assign repo, _ = get_image_display_name(image))
and in the section that computes compression metrics remove or underscore the
unused comp_layer_usage, comp_naive, and comp_shared_size variables; update the
per_image construction (per_image.append({...})) to only reference actual fields
(e.g., "id", "name", "compressed", "layers") so no unused local names remain and
lint warnings RUF059/F841 are resolved.
- Around line 296-297: Several print statements use f-strings with no
placeholders (e.g. print(f"\nContainers space usage:\n"), print(f"{'CONTAINER
ID':<12} {'IMAGE':<35} ... {'NAMES'}")) which triggers Ruff/Flake8 F541; remove
the unnecessary f prefix on those string literals (also for the similar prints
at the other occurrences around the block shown — the lines printing the header
and the formatted columns at 296-297 and the ones at 306-307 and 311-312) so
they become plain string arguments to print without changing the string content
or formatting expressions.
- Line 333: Rename the ambiguous loop variable 'l' in the list comprehension to
a clearer name (e.g., 'layer') to satisfy Flake8 E741; specifically update the
expression that builds the count of shared layers (it references layer_count and
currently uses "for l, c in layer_count.items()") so it reads "for layer, c in
layer_count.items()" (and update any matching variable references in that
expression).
---
Duplicate comments:
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 143-144: The fmt_ts function currently formats epoch times in
local time; change it to produce UTC timestamps to match the rest of the script
by using a UTC-based conversion (e.g., datetime.utcfromtimestamp or
datetime.fromtimestamp with timezone.utc) so fmt_ts(epoch) returns a
UTC-formatted string like "%Y-%m-%d %H:%M:%S"; update the implementation of
fmt_ts accordingly so all downstream window logic remains consistent.
In `@ansible/roles/install-microshift/tasks/main.yml`:
- Around line 33-39: The current task "uninstall microshift packages and
dependencies" uses ansible.builtin.dnf with name: microshift-* which won't
expand when state: absent; instead first gather the installed package names that
match the pattern (e.g., via package_facts or a shell/command that lists
installed packages matching "microshift-*"), store them in a variable (e.g.,
microshift_packages), then call ansible.builtin.dnf with name: "{{
microshift_packages }}" autoremove: yes state: absent so you pass an explicit
list to the dnf module and only remove packages that are actually installed.
- Around line 4-8: The current check uses "rpm -q microshift" which only matches
the base package; update the task that registers microshift_installed to detect
any installed package whose name begins with "microshift" (e.g., replace the
command with something like `rpm -qa | grep -E '^microshift'` or `rpm -q
"microshift*"`), keep register: microshift_installed, ignore_errors: true and
changed_when: false, and ensure any later cleanup conditional continues to
reference microshift_installed (e.g., microshift_installed.rc or
microshift_installed.stdout) so the cleanup block runs when any microshift-*
subpackage is present.
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 149-166: The prom_network_endpoint resolution currently falls back
to a hardcoded 'zen3' which hides misconfiguration; update the logic in the
prom_network_endpoint set_fact (referencing prometheus_query_endpoint,
prometheus_port, groups['logging'] and hostvars) to detect when
groups['logging'] is empty and either (a) fail the playbook with a descriptive
error via ansible.builtin.fail explaining no logging host/endpoint is
configured, or (b) emit a warning via ansible.builtin.debug/ansible.builtin.warn
and require an explicit prometheus_query_endpoint; ensure the fallback to 'zen3'
is removed so missing configuration causes a visible failure or warning.
- Around line 173-181: The task "run prometheus network query" uses a fragile
relative path to prom-network-query.py; update the command to use a
role/playbook-relative absolute path (e.g. reference {{ role_path }} when the
script lives in the current role or {{ playbook_dir
}}/roles/common/files/prom-network-query.py if it's in the common role) so the
command invokes the correct file regardless of cwd, keeping the rest of the
arguments ({{ prom_network_endpoint }}, {{ prom_network_instance }}, {{
prom_network_device }}, {{ network_start_epoch.stdout }}, {{
network_end_epoch.stdout }}) unchanged.
- Around line 125-133: The src path in the "copy crio-df script to remote host"
task is CWD-relative and can break; update the ansible.builtin.copy src to use
role_path so it references the common role reliably, e.g. replace the literal
"roles/common/tasks/files/crio-df.py" in the src field with a path built from
role_path (for this role) like "{{ role_path
}}/../common/tasks/files/crio-df.py" so the file is found regardless of working
directory.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (8)
- ansible/roles/common/tasks/files/crio-df.py
- ansible/roles/common/tasks/files/prom-network-query.py
- ansible/roles/common/tasks/files/ready.sh
- ansible/roles/install-logging-exporters/defaults/main.yml
- ansible/roles/install-logging-exporters/tasks/main.yml
- ansible/roles/install-microshift/tasks/main.yml
- ansible/roles/microshift-start/defaults/main.yml
- ansible/roles/microshift-start/tasks/main.yml
💤 Files with no reviewable changes (1)
- ansible/roles/microshift-start/defaults/main.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- ansible/roles/common/tasks/files/ready.sh
Force-pushed from 6318af1 to 836c547 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (4)
ansible/roles/microshift-start/tasks/main.yml (3)
171-182: ⚠️ Potential issue | 🟡 Minor
Relative script path for prom-network-query.py is CWD-dependent.
`python3 roles/common/tasks/files/prom-network-query.py` relies on the `ansible-playbook` working directory. This was flagged in a previous review and remains unaddressed. Using `{{ playbook_dir }}` to anchor the path removes the fragility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 171 - 182, The task "run prometheus network query" is invoking the script with a relative path which makes execution CWD-dependent; change the command to use an anchored path by prefacing the script with {{ playbook_dir }} (e.g., {{ playbook_dir }}/roles/common/tasks/files/prom-network-query.py) so the ansible.builtin.command uses an absolute path, keeping the existing flags/variables ({{ prom_network_endpoint }}, {{ prom_network_instance }}, {{ prom_network_device }}, {{ network_start_epoch.stdout }}, {{ network_end_epoch.stdout }}) and keeping the same register/ delegate_to behavior.
149-169: ⚠️ Potential issue | 🟡 Minor
Hardcoded `zen3` fallback will silently fail in other environments.
When neither `prometheus_query_endpoint` is set nor the `logging` group exists, the endpoint resolves to `http://zen3:9091`, a lab-specific hostname. This was raised in a previous review and remains unaddressed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 149 - 169, The prom_network_endpoint fact currently falls back to the hardcoded hostname "zen3" when neither prometheus_query_endpoint nor a logging group exists; update the resolution in the prom_network_endpoint set_fact to use a safe, portable default (for example inventory_hostname or ansible_default_ipv4.address or "localhost") instead of "zen3" so it won't silently point at a lab-specific host—change the fallback logic around prometheus_query_endpoint and the groups['logging'] branch in the prom_network_endpoint expression to use that portable default and ensure prom_network_device and prom_network_instance remain unchanged.
127-132: ⚠️ Potential issue | 🟡 Minor
Relative `src` path for crio-df.py copy is CWD-dependent.
`src: roles/common/tasks/files/crio-df.py` breaks if `ansible-playbook` is invoked from anywhere other than `ansible/`. Use `{{ role_path }}` or a role-relative reference instead, consistent with how prom-network-query.py should be referenced.
🛠️ Proposed fix
```diff
-    src: roles/common/tasks/files/crio-df.py
+    src: "{{ playbook_dir }}/../roles/common/tasks/files/crio-df.py"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 127 - 132, The copy task "copy crio-df script to remote host" uses a CWD-dependent src path; change src to a role-relative reference (e.g. src: "{{ role_path }}/files/crio-df.py") or otherwise use the same pattern as prom-network-query.py so the file is resolved regardless of where ansible-playbook is invoked; ensure the crio-df.py file lives in that role's files directory (or update the role_path to point to the common role's files if it intentionally lives in roles/common/files).
ansible/roles/common/tasks/files/prom-network-query.py (1)
143-144: ⚠️ Potential issue | 🟡 Minor
`fmt_ts` uses local time while the rest of the script uses UTC.
`datetime.fromtimestamp(epoch)` returns local time; `parse_window_args` uses `datetime.now(timezone.utc)`. Displayed timestamps will silently shift based on the controller's timezone.
🛠️ Proposed fix
```diff
 def fmt_ts(epoch):
-    return datetime.fromtimestamp(epoch).strftime("%Y-%m-%d %H:%M:%S")
+    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 143 - 144, The helper fmt_ts currently uses datetime.fromtimestamp(epoch) which yields local time and conflicts with the UTC-based logic in parse_window_args (which uses datetime.now(timezone.utc)); update fmt_ts to produce UTC-aware timestamps (e.g., use datetime.fromtimestamp(epoch, timezone.utc) or equivalent) and keep the same strftime format so all displayed timestamps are in UTC and consistent with parse_window_args.
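The UTC fix proposed above is easy to sanity-check in isolation. A minimal standalone sketch (the harness below is ours; only the `fmt_ts` body mirrors the proposed diff):

```python
from datetime import datetime, timezone

def fmt_ts(epoch):
    # Interpret the epoch as UTC so the output no longer depends on the
    # controller's local timezone (matches the script's UTC window math).
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# 1609459200 is 2021-01-01T00:00:00Z on any host:
print(fmt_ts(1609459200))  # → 2021-01-01 00:00:00
```

With the naive `datetime.fromtimestamp(epoch)` form, the same call prints a different wall-clock time on any controller not running in UTC.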
🧹 Nitpick comments (1)
ansible/roles/common/tasks/files/crio-df.py (1)
29-29: Optional: Ruff BLE001, broad `Exception` catches still flagged.
Both catches are intentional (line 29 prints a warning, line 94 returns "Unknown"), so no behavior change is needed. If you want to silence the linter cleanly, you can narrow line 29 to the expected OS/IO errors and leave line 94 as-is with a `# noqa: BLE001` comment.
Also applies to: 94-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` at line 29, The broad except at the first catch (the "except Exception as e:" that logs a warning) should be narrowed to the specific expected errors (e.g., except (OSError, IOError) as e:) so the linter BLE001 is satisfied while preserving behavior; leave the second broad catch that returns "Unknown" as-is but add a "# noqa: BLE001" comment to that line to explicitly silence Ruff. Ensure you update the warning log statement using the same exception variable name (e) so existing logging remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 264-266: The loop converting Prometheus sample strings uses
int(val) which raises ValueError for scientific-notation strings; change the
conversion to parse as float first and then to int (e.g. int(float(val))) inside
the loop that iterates over values before calling fmt_ts(ts) so
scientific-notation and large float64 strings are handled correctly; update the
conversion in the block where values are unpacked (the for ts, val in values
loop) so downstream code (fmt_ts and any consumers of val) receives an integer.
---
Duplicate comments:
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 143-144: The helper fmt_ts currently uses
datetime.fromtimestamp(epoch) which yields local time and conflicts with the
UTC-based logic in parse_window_args (which uses datetime.now(timezone.utc));
update fmt_ts to produce UTC-aware timestamps (e.g., use
datetime.fromtimestamp(epoch, timezone.utc) or equivalent) and keep the same
strftime format so all displayed timestamps are in UTC and consistent with
parse_window_args.
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 171-182: The task "run prometheus network query" is invoking the
script with a relative path which makes execution CWD-dependent; change the
command to use an anchored path by prefacing the script with {{ playbook_dir }}
(e.g., {{ playbook_dir }}/roles/common/tasks/files/prom-network-query.py) so the
ansible.builtin.command uses an absolute path, keeping the existing
flags/variables ({{ prom_network_endpoint }}, {{ prom_network_instance }}, {{
prom_network_device }}, {{ network_start_epoch.stdout }}, {{
network_end_epoch.stdout }}) and keeping the same register/ delegate_to
behavior.
- Around line 149-169: The prom_network_endpoint fact currently falls back to
the hardcoded hostname "zen3" when neither prometheus_query_endpoint nor a
logging group exists; update the resolution in the prom_network_endpoint
set_fact to use a safe, portable default (for example inventory_hostname or
ansible_default_ipv4.address or "localhost") instead of "zen3" so it won't
silently point at a lab-specific host—change the fallback logic around
prometheus_query_endpoint and the groups['logging'] branch in the
prom_network_endpoint expression to use that portable default and ensure
prom_network_device and prom_network_instance remain unchanged.
- Around line 127-132: The copy task "copy crio-df script to remote host" uses a
CWD-dependent src path; change src to a role-relative reference (e.g. src: "{{
role_path }}/files/crio-df.py") or otherwise use the same pattern as
prom-network-query.py so the file is resolved regardless of where
ansible-playbook is invoked; ensure the crio-df.py file lives in that role's
files directory (or update the role_path to point to the common role's files if
it intentionally lives in roles/common/files).
---
Nitpick comments:
In `@ansible/roles/common/tasks/files/crio-df.py`:
- Line 29: The broad except at the first catch (the "except Exception as e:"
that logs a warning) should be narrowed to the specific expected errors (e.g.,
except (OSError, IOError) as e:) so the linter BLE001 is satisfied while
preserving behavior; leave the second broad catch that returns "Unknown" as-is
but add a "# noqa: BLE001" comment to that line to explicitly silence Ruff.
Ensure you update the warning log statement using the same exception variable
name (e) so existing logging remains unchanged.
📒 Files selected for processing (4)
- ansible/roles/common/tasks/files/crio-df.py
- ansible/roles/common/tasks/files/prom-network-query.py
- ansible/roles/microshift-start/defaults/main.yml
- ansible/roles/microshift-start/tasks/main.yml
💤 Files with no reviewable changes (1)
- ansible/roles/microshift-start/defaults/main.yml
Force-pushed from 836c547 to ec77ae3 (Compare)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ansible/roles/microshift-start/tasks/main.yml (1)
9-29: ⚠️ Potential issue | 🟠 Major
Fail fast if the cleanup script isn’t found.
If `find_cleanup.files` is empty, Line 25 can render an empty command and fail opaquely.
✅ Suggested fix
```diff
   - name: set microshift_cleanup_bin variable
     set_fact:
       microshift_cleanup_bin: "{{ find_cleanup.files[0].path if find_cleanup.files }}"
+
+  - name: fail if cleanup script not found
+    ansible.builtin.fail:
+      msg: "MicroShift cleanup script not found under /usr/bin."
+    when:
+      - cleanup_microshift | default('true') | bool
+      - microshift_active.stdout | int != 0
+      - find_cleanup.files | length == 0

   - name: cleanup microshift data
     become: yes
     ansible.builtin.shell: echo 1 | {{ microshift_cleanup_bin }} --all
     when:
       - cleanup_microshift | default('true') | bool
       - microshift_active.stdout | int != 0
+      - microshift_cleanup_bin is defined
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 9 - 29, The task assigns microshift_cleanup_bin from find_cleanup.files but doesn’t fail when no script is found, leading to an empty/opaque shell call; update the logic around the registered result (find_cleanup) and the set_fact/microshift cleanup task so that if find_cleanup.files is empty you fail fast with a clear message (e.g., using a fail task or conditional) before running the "cleanup microshift data" shell command; reference the registered variable find_cleanup, the set_fact that sets microshift_cleanup_bin, and the "cleanup microshift data" task to implement the check and explicit failure.
♻️ Duplicate comments (4)
ansible/roles/common/tasks/files/prom-network-query.py (2)
266-279: ⚠️ Potential issue | 🟡 Minor
Guard against scientific-notation values from Prometheus.
Line 269 uses `int(val)`, which throws for strings like `"1.23e+10"`; this can break large counters.
✅ Suggested fix
```diff
-            val = int(val)
+            val = int(float(val))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 266 - 279, The loop currently does val = int(val) which fails on scientific-notation strings like "1.23e+10"; change the conversion to parse as float first and then to int (e.g. val = int(float(val))) and optionally wrap that conversion in a try/except to log or skip malformed values; update uses in the loop (prev_val, resets, delta calculation and print) to rely on the new val numeric type and ensure fmt_ts(ts) remains unchanged.
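Both points in the prompt above (float-first parsing and the reset-aware delta loop around `prev_val`/`resets`/`delta`) can be sketched together. A hedged standalone illustration; the function name and sample data are ours, not the script's:

```python
def sum_counter_deltas(samples):
    """Sum increases of a monotonic counter from (ts, val) Prometheus samples.

    Values arrive as strings, possibly in scientific notation, so parse as
    float before truncating to int; a negative delta indicates a counter
    reset, in which case the post-reset value is taken as the increase.
    """
    total = 0
    prev = None
    for _ts, val in samples:
        val = int(float(val))  # int("1.5e2") raises ValueError; int(float(...)) does not
        if prev is not None:
            delta = val - prev
            total += delta if delta >= 0 else val
        prev = val
    return total

# 100 -> 150 (+50), then a reset down to 30 (+30): total increase 80
print(sum_counter_deltas([(0, "100"), (10, "1.5e2"), (20, "30")]))  # → 80
```

The `int(float(val))` conversion is exactly the one-line fix proposed in the review; the reset handling is the standard treatment for monotonic counters and is shown here only for context.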
147-148: ⚠️ Potential issue | 🟡 Minor
Use UTC in `fmt_ts` for consistency.
Line 148 currently uses local time, while window math is UTC-based; this silently shifts outputs by timezone.
✅ Suggested fix
```diff
-def fmt_ts(epoch):
-    return datetime.fromtimestamp(epoch).strftime("%Y-%m-%d %H:%M:%S")
+def fmt_ts(epoch):
+    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 147 - 148, The fmt_ts function currently formats epoch using local time; change it to use UTC so outputs align with UTC-based window math—update fmt_ts to convert the epoch to a UTC datetime (e.g., via datetime.utcfromtimestamp(...) or datetime.fromtimestamp(..., tz=timezone.utc)) and then strftime the same "%Y-%m-%d %H:%M:%S" format (or include a trailing Z if you want to mark UTC), keeping the function name fmt_ts unchanged.
ansible/roles/microshift-start/tasks/main.yml (2)
149-168: ⚠️ Potential issue | 🟡 Minor
Avoid hardcoded `zen3` fallback for Prometheus.
Line 161 silently defaults to a lab-specific host, which is brittle in other environments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 149 - 168, The prom_network_endpoint set_fact uses a hardcoded 'zen3' fallback which should be removed; update the prom_network_endpoint expression to not reference 'zen3' and instead fall back to a sensible, environment-agnostic value (for example inventory_hostname or '127.0.0.1') or fail loudly if no logging host is found—modify the templated expression that computes prom_network_endpoint (and optionally prom_network_device/prom_network_instance if related) to prefer groups['logging'] first, then inventory_hostname or localhost, and avoid any lab-specific literal.
171-181: ⚠️ Potential issue | 🟠 Major
Use a role/playbook-relative path for prom-network-query.py.
Line 175 depends on the working directory; this can break when playbooks are invoked from repo root.
✅ Example using playbook_dir
```diff
-    python3 roles/common/tasks/files/prom-network-query.py
+    python3 {{ playbook_dir }}/roles/common/tasks/files/prom-network-query.py
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 171 - 181, The task "run prometheus network query" currently invokes the script "prom-network-query.py" with a repository-relative path which breaks depending on the current working directory; change the command to build an absolute path using Ansible variables (e.g. use playbook_dir or role_path) to point to prom-network-query.py in the role/playbook files directory, leaving the rest of the arguments (prom_network_endpoint, prom_network_instance, prom_network_device, network_start_epoch.stdout, network_end_epoch.stdout) unchanged so the task reliably finds the script regardless of where the playbook is invoked.
🧹 Nitpick comments (1)
ansible/roles/microshift-start/tasks/main.yml (1)
125-133: Prefer a stable path for the local script source.
Line 130 relies on the current working directory; using `playbook_dir` or `role_path` is more robust.
✅ Example using playbook_dir
```diff
-    src: roles/common/tasks/files/crio-df.py
+    src: "{{ playbook_dir }}/roles/common/tasks/files/crio-df.py"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 125 - 133, The copy task "copy crio-df script to remote host" currently uses a relative src path which depends on the CWD; change the src to a stable variable-based path (e.g. use role_path or playbook_dir) so the file is resolved reliably regardless of where the playbook is executed, update the ansible.builtin.copy task that references src: roles/common/tasks/files/crio-df.py to use "{{ role_path }}/..." or "{{ playbook_dir }}/..." accordingly and leave dest and mode unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 9-29: The task assigns microshift_cleanup_bin from
find_cleanup.files but doesn’t fail when no script is found, leading to an
empty/opaque shell call; update the logic around the registered result
(find_cleanup) and the set_fact/microshift cleanup task so that if
find_cleanup.files is empty you fail fast with a clear message (e.g., using a
fail task or conditional) before running the "cleanup microshift data" shell
command; reference the registered variable find_cleanup, the set_fact that sets
microshift_cleanup_bin, and the "cleanup microshift data" task to implement the
check and explicit failure.
---
Duplicate comments:
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 266-279: The loop currently does val = int(val) which fails on
scientific-notation strings like "1.23e+10"; change the conversion to parse as
float first and then to int (e.g. val = int(float(val))) and optionally wrap
that conversion in a try/except to log or skip malformed values; update uses in
the loop (prev_val, resets, delta calculation and print) to rely on the new val
numeric type and ensure fmt_ts(ts) remains unchanged.
- Around line 147-148: The fmt_ts function currently formats epoch using local
time; change it to use UTC so outputs align with UTC-based window math—update
fmt_ts to convert the epoch to a UTC datetime (e.g., via
datetime.utcfromtimestamp(...) or datetime.fromtimestamp(..., tz=timezone.utc))
and then strftime the same "%Y-%m-%d %H:%M:%S" format (or include a trailing Z
if you want to mark UTC), keeping the function name fmt_ts unchanged.
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 149-168: The prom_network_endpoint set_fact uses a hardcoded
'zen3' fallback which should be removed; update the prom_network_endpoint
expression to not reference 'zen3' and instead fall back to a sensible,
environment-agnostic value (for example inventory_hostname or '127.0.0.1') or
fail loudly if no logging host is found—modify the templated expression that
computes prom_network_endpoint (and optionally
prom_network_device/prom_network_instance if related) to prefer
groups['logging'] first, then inventory_hostname or localhost, and avoid any
lab-specific literal.
- Around line 171-181: The task "run prometheus network query" currently invokes
the script "prom-network-query.py" with a repository-relative path which breaks
depending on the current working directory; change the command to build an
absolute path using Ansible variables (e.g. use playbook_dir or role_path) to
point to prom-network-query.py in the role/playbook files directory, leaving the
rest of the arguments (prom_network_endpoint, prom_network_instance,
prom_network_device, network_start_epoch.stdout, network_end_epoch.stdout)
unchanged so the task reliably finds the script regardless of where the playbook
is invoked.
---
Nitpick comments:
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 125-133: The copy task "copy crio-df script to remote host"
currently uses a relative src path which depends on the CWD; change the src to a
stable variable-based path (e.g. use role_path or playbook_dir) so the file is
resolved reliably regardless of where the playbook is executed, update the
ansible.builtin.copy task that references src:
roles/common/tasks/files/crio-df.py to use "{{ role_path }}/..." or "{{
playbook_dir }}/..." accordingly and leave dest and mode unchanged.
📒 Files selected for processing (4)
- ansible/roles/common/tasks/files/crio-df.py
- ansible/roles/common/tasks/files/prom-network-query.py
- ansible/roles/microshift-start/defaults/main.yml
- ansible/roles/microshift-start/tasks/main.yml
💤 Files with no reviewable changes (1)
- ansible/roles/microshift-start/defaults/main.yml
Force-pushed from ec77ae3 to f77d12d (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (6)
ansible/roles/common/tasks/files/prom-network-query.py (2)
154-155: ⚠️ Potential issue | 🟡 Minor
`fmt_ts` still uses local time while the rest of the script uses UTC.
`datetime.fromtimestamp(epoch)` returns local time; epochs are computed in UTC. Displayed timestamps will shift depending on the host's timezone.
Proposed fix
```diff
 def fmt_ts(epoch):
-    return datetime.fromtimestamp(epoch).strftime("%Y-%m-%d %H:%M:%S")
+    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 154 - 155, The helper fmt_ts uses datetime.fromtimestamp(epoch) which yields local time; change it to produce UTC timestamps instead by using a UTC-aware conversion (e.g., datetime.utcfromtimestamp or datetime.fromtimestamp with timezone.utc) so that epoch values are formatted consistently with the rest of the script; update the fmt_ts function to convert the epoch to UTC before calling strftime("%Y-%m-%d %H:%M:%S").
275-276: ⚠️ Potential issue | 🟠 Major
`int(val)` crashes on scientific-notation strings from Prometheus.
Prometheus counter values can arrive as e.g. `"1.23456789e+10"`. `int("1.23456789e+10")` raises `ValueError`. Still unfixed from prior review.
Proposed fix
```diff
     for ts, val in values:
-        val = int(val)
+        val = int(float(val))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 275 - 276, The code crashes when converting Prometheus scientific-notation strings because it does val = int(val) in the loop "for ts, val in values"; change the conversion to first parse as a float (e.g. int(float(val))) and wrap it in a small try/except to catch ValueError/TypeError so malformed values are logged/skipped instead of crashing; update the conversion at the "for ts, val in values" site and ensure any downstream logic expects an int after this safe conversion.
ansible/roles/microshift-start/tasks/main.yml (2)
171-182: ⚠️ Potential issue | 🟡 Minor
Relative script path still fragile: depends on CWD being `ansible/`.
Previously flagged. `roles/common/tasks/files/prom-network-query.py` resolves relative to the playbook directory, not the role. Consider copying the script to the controller temp dir or using an absolute path via `{{ playbook_dir }}`.
Proposed fix
```diff
 - name: run prometheus network query
   ansible.builtin.command: >
-    python3 roles/common/tasks/files/prom-network-query.py
+    python3 {{ playbook_dir }}/roles/common/tasks/files/prom-network-query.py
     --prometheus {{ prom_network_endpoint }}
     --instance {{ prom_network_instance }}
     --device {{ prom_network_device }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 171 - 182, The task "run prometheus network query" uses a relative script path "roles/common/tasks/files/prom-network-query.py" which is fragile; update the task so the controller runs an explicit absolute path or a copied temp file: either change the command to use "{{ playbook_dir }}/roles/common/tasks/files/prom-network-query.py" (keep delegate_to: localhost and register: prom_network) or add a prior step that uses ansible.builtin.copy (or fetch) to copy that script to a controller temp location (e.g., /tmp/prom-network-query.py) and run the copied file from that absolute path in the ansible.builtin.command invocation.
149-169: ⚠️ Potential issue | 🟡 Minor
Hardcoded `'zen3'` fallback is still present.
Previously flagged: if no `logging` group and no `prometheus_query_endpoint` override, the endpoint silently resolves to `http://zen3:9091`, which will fail opaquely in non-lab environments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/microshift-start/tasks/main.yml` around lines 149 - 169, The prom_network_endpoint resolution currently falls back to a hardcoded 'zen3' when there is no logging group and no prometheus_query_endpoint; replace that hardcoded 'zen3' fallback in the prom_network_endpoint expression with a safe alternative (for example inventory_hostname or an explicit failure) so the playbook does not silently produce an unusable URL — update the logic around prom_network_endpoint (the block referencing prometheus_query_endpoint, groups['logging'], and 'zen3') to either use inventory_hostname (or ansible_default_ipv4.address) as the default host or call ansible.builtin.fail with a clear message requiring prometheus_query_endpoint when logging group is absent.
ansible/roles/common/tasks/files/crio-df.py (2)
59-95: Past bare-except issue addressed; `except Exception` is fine here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` around lines 59 - 95, The function format_time_ago currently catches all exceptions with "except Exception" and the reviewer approves this approach, so leave format_time_ago unchanged; do not revert to a bare except or add more granular exception handling here.
284-335: Past lint issues (f-string placeholders, ambiguous `l`) are resolved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/crio-df.py` around lines 284 - 335, The updated verbose and summary printing blocks no longer contain the previous f-string placeholder/l ambiguity and look correct; no code changes required—keep the current print blocks (the verbose branch that iterates over image_data, volatile_layers and comp_per_image and the summary branch that uses total_size, total_reclaimable and reclaimable_pct) and ensure format_size and the variables layer_count/comp_deduplicated remain defined upstream so these print lines (including the compression ratio expression total_size / comp_deduplicated) do not raise runtime errors.
🧹 Nitpick comments (1)
ansible/roles/common/tasks/files/prom-network-query.py (1)
296-309: Consider using `null` instead of `0` when increase data is absent.

Lines 303–304: reporting `0` when `rx_increase is None` makes "no data" indistinguishable from "zero transfer" for downstream consumers.

Proposed fix:

```diff
-    "rx_bytes": int(rx_increase) if rx_increase is not None else 0,
-    "tx_bytes": int(tx_increase) if tx_increase is not None else 0,
+    "rx_bytes": int(rx_increase) if rx_increase is not None else None,
+    "tx_bytes": int(tx_increase) if tx_increase is not None else None,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ansible/roles/common/tasks/files/prom-network-query.py` around lines 296 - 309, The summary currently sets "rx_bytes" and "tx_bytes" to 0 when rx_increase or tx_increase is None, which hides "no data" vs "zero transfer"; update the summary construction (the dict assigned to summary) to set "rx_bytes": int(rx_increase) if rx_increase is not None else None and "tx_bytes": int(tx_increase) if tx_increase is not None else None so that json.dumps will emit JSON null for missing data; keep all other fields and behavior the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ansible/roles/common/tasks/files/crio-df.py`:
- Around line 130-138: The current parsing uses name.rsplit(":", 1) which
misinterprets registry ports as tags; after removing any "@sha256:" suffix, when
you split on the last ":" check the right-side segment (parts[1]) for a "/" — if
it contains "/" then it is a port (no tag) so return (name, "latest"); otherwise
treat it as a tag and return (parts[0], parts[1]). Update the logic around the
rsplit usage (the block handling ":" in name) to perform this extra "/" check
before deciding to return the split as repository and tag.
- Around line 215-218: The current size selection uses "or" chaining which
treats 0 as falsy and wrongfully falls through; in the loop over layer_count
(using layers_by_id and total_size) replace the or-chained lookup for
"diff-size" with a None-sentinel check: fetch diff = layer.get("diff-size",
None) and if diff is None use layer.get("uncompress_size", 0), assigning the
resulting integer to size before adding to total_size; apply the same
None-sentinel change to the similar lookup at the other occurrence (line 246) so
explicit zero sizes are preserved.
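The repository/tag split described in the first comment above could be sketched as a small Python helper. This is only an illustration of the suggested fix, not the actual `crio-df.py` code; `split_repo_tag` is a hypothetical name:

```python
def split_repo_tag(name: str):
    """Split an image reference into (repository, tag).

    Sketch of the suggested fix: strip any digest suffix first, then
    treat the text after the last ':' as a tag only when it contains
    no '/', so a registry port such as 'registry.local:5000/app' is
    not mistaken for a tag.
    """
    # Drop a trailing digest, e.g. "...@sha256:abcd"
    if "@" in name:
        name = name.split("@", 1)[0]
    if ":" in name:
        repo, candidate = name.rsplit(":", 1)
        # A '/' after the last ':' means that segment was a port, not a tag.
        if "/" in candidate:
            return (name, "latest")
        return (repo, candidate)
    return (name, "latest")
```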
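The None-sentinel lookup suggested in the second comment above can be illustrated like this (a sketch under the stated assumptions; `layer_size` is a hypothetical helper, and the real code iterates over layer dicts inline):

```python
def layer_size(layer: dict) -> int:
    """Return the size of a layer, preserving explicit zeros.

    Unlike an `or`-chained lookup, an explicit "diff-size" of 0 is
    kept instead of falling through to "uncompress_size".
    """
    diff = layer.get("diff-size", None)
    if diff is None:
        diff = layer.get("uncompress_size", 0)
    return int(diff)
```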
---
Duplicate comments:
In `@ansible/roles/common/tasks/files/crio-df.py`:
- Around line 59-95: The function format_time_ago currently catches all
exceptions with "except Exception" and the reviewer approves this approach, so
leave format_time_ago unchanged; do not revert to a bare except or add more
granular exception handling here.
- Around line 284-335: The updated verbose and summary printing blocks no longer
contain the previous f-string placeholder/l ambiguity and look correct; no code
changes required—keep the current print blocks (the verbose branch that iterates
over image_data, volatile_layers and comp_per_image and the summary branch that
uses total_size, total_reclaimable and reclaimable_pct) and ensure format_size
and the variables layer_count/comp_deduplicated remain defined upstream so these
print lines (including the compression ratio expression total_size /
comp_deduplicated) do not raise runtime errors.
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 154-155: The helper fmt_ts uses datetime.fromtimestamp(epoch)
which yields local time; change it to produce UTC timestamps instead by using a
UTC-aware conversion (e.g., datetime.utcfromtimestamp or datetime.fromtimestamp
with timezone.utc) so that epoch values are formatted consistently with the rest
of the script; update the fmt_ts function to convert the epoch to UTC before
calling strftime("%Y-%m-%d %H:%M:%S").
- Around line 275-276: The code crashes when converting Prometheus
scientific-notation strings because it does val = int(val) in the loop "for ts,
val in values"; change the conversion to first parse as a float (e.g.
int(float(val))) and wrap it in a small try/except to catch ValueError/TypeError
so malformed values are logged/skipped instead of crashing; update the
conversion at the "for ts, val in values" site and ensure any downstream logic
expects an int after this safe conversion.
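The UTC-aware formatting suggested for `fmt_ts` above might look like the following sketch (assuming the script's existing `"%Y-%m-%d %H:%M:%S"` format string):

```python
from datetime import datetime, timezone

def fmt_ts(epoch):
    """Format an epoch as a UTC timestamp string.

    datetime.fromtimestamp(epoch) alone converts to the local
    timezone, so pass timezone.utc explicitly for consistent output.
    """
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```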
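The tolerant value conversion suggested above (parse as a float first, skip malformed samples) could be sketched as follows; `parse_prom_values` is a hypothetical name, and Prometheus returns sample values as strings that may use scientific notation:

```python
def parse_prom_values(values):
    """Convert (timestamp, value-string) pairs to (timestamp, int).

    int("1.5e3") raises ValueError, so go through float() first;
    malformed or missing values are skipped rather than crashing.
    """
    out = []
    for ts, val in values:
        try:
            out.append((ts, int(float(val))))
        except (ValueError, TypeError):
            continue  # skip malformed samples instead of aborting
    return out
```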
In `@ansible/roles/microshift-start/tasks/main.yml`:
- Around line 171-182: The task "run prometheus network query" uses a relative
script path "roles/common/tasks/files/prom-network-query.py" which is fragile;
update the task so the controller runs an explicit absolute path or a copied
temp file: either change the command to use "{{ playbook_dir
}}/roles/common/tasks/files/prom-network-query.py" (keep delegate_to: localhost
and register: prom_network) or add a prior step that uses ansible.builtin.copy
(or fetch) to copy that script to a controller temp location (e.g.,
/tmp/prom-network-query.py) and run the copied file from that absolute path in
the ansible.builtin.command invocation.
- Around line 149-169: The prom_network_endpoint resolution currently falls back
to a hardcoded 'zen3' when there is no logging group and no
prometheus_query_endpoint; replace that hardcoded 'zen3' fallback in the
prom_network_endpoint expression with a safe alternative (for example
inventory_hostname or an explicit failure) so the playbook does not silently
produce an unusable URL — update the logic around prom_network_endpoint (the
block referencing prometheus_query_endpoint, groups['logging'], and 'zen3') to
either use inventory_hostname (or ansible_default_ipv4.address) as the default
host or call ansible.builtin.fail with a clear message requiring
prometheus_query_endpoint when logging group is absent.
---
Nitpick comments:
In `@ansible/roles/common/tasks/files/prom-network-query.py`:
- Around line 296-309: The summary currently sets "rx_bytes" and "tx_bytes" to 0
when rx_increase or tx_increase is None, which hides "no data" vs "zero
transfer"; update the summary construction (the dict assigned to summary) to set
"rx_bytes": int(rx_increase) if rx_increase is not None else None and
"tx_bytes": int(tx_increase) if tx_increase is not None else None so that
json.dumps will emit JSON null for missing data; keep all other fields and
behavior the same.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (4)
- ansible/roles/common/tasks/files/crio-df.py
- ansible/roles/common/tasks/files/prom-network-query.py
- ansible/roles/microshift-start/defaults/main.yml
- ansible/roles/microshift-start/tasks/main.yml
💤 Files with no reviewable changes (1)
- ansible/roles/microshift-start/defaults/main.yml
907298e to 6989217 (Compare)
6989217 to 47823b5 (Compare)
/lgtm
@ggiguash: This PR has been marked as verified by

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, sjug

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
/test verify-deps
@sjug: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary by CodeRabbit
New Features
Improvements
Chores