
HYPERFLEET-774 - feat: update aggregation logic #82

Open
rh-amarin wants to merge 1 commit into openshift-hyperfleet:main from rh-amarin:HYPERFLEET-774

Conversation


@rh-amarin rh-amarin commented Mar 17, 2026

https://redhat.atlassian.net/browse/HYPERFLEET-774

Important for review:

  • The logic for aggregation is explained in api-resources.md in the section named "Aggregation logic"

  • I added some external black-box curl test scripts to manually exercise the state changes

    • Everything should also be covered by the Go tests; we may want to remove these e2e-curl tests at a later stage
    • This was the only meaningful way I could find to test "the real thing" after so many changes
  • Corrects LastUpdatedTime semantics for the Ready and Available synthetic conditions. Previously, LastUpdatedTime was set uniformly to time.Now() regardless of the event. Now it is
    computed from the actual adapter report times:

    • Ready=True: min(LastReportTime) across all required adapters that report Available=True at the current generation.
    • Ready=False: min(LastReportTime) across all required adapters (fallback to now if any adapter has no stored status yet); True→False transitions override this with the triggering
      adapter's observed_time.
    • Available: min(LastReportTime) across all required adapters at the relevant observed generation.
  • Drops Available=Unknown on all reports (including the first). Previously the first report with Available=Unknown was stored but skipped aggregation; now it is discarded outright (spec §2 P3). This removes a latent inconsistency where an unknown first status could linger without ever being recomputed.
  • Adds spec §2 P1 enforcement: reports with observed_generation > resource.generation are discarded before upserting. This prevents phantom statuses arriving ahead of the resource's own generation.
  • Refactors ProcessAdapterStatus to apply pre-processing rules in the order defined by the spec (stale → P1 → P2 → P3) and always trigger aggregation after a valid upsert, passing the adapter's observed_time through to the synthetic condition builders.
  • Adds env-var aliases for the required-adapters config (HYPERFLEET_CLUSTER_ADAPTERS / HYPERFLEET_NODEPOOL_ADAPTERS) as convenience aliases for the canonical HYPERFLEET_ADAPTERS_REQUIRED_CLUSTER / HYPERFLEET_ADAPTERS_REQUIRED_NODEPOOL, with first-wins precedence.
  • Comprehensive test coverage: 847-line status_aggregation_test.go covering all matrix cells from the spec; new TestProcessAdapterStatus_FutureObservedGeneration and updated Unknown tests; 12 new e2e-curl scenario scripts covering initial state through stable-false and unknown transitions.
  • Documentation: updated docs/api-resources.md with the full aggregation logic spec including discard/reject rules and lifecycle event tables; updated pkg/services/CLAUDE.md with computeReadyLastUpdated semantics.

Test plan

  • make verify-all — vet, fmt, lint, unit tests
  • make test-integration — integration tests with testcontainers
  • test/e2e-curl/run_all.sh — all 12 e2e curl scenarios pass against a running server

Summary by CodeRabbit

  • New Features

    • Enhanced status aggregation semantics for Available and Ready across generations and lifecycle events.
  • Bug Fixes

    • Unknown adapter statuses are now discarded (not stored); stale and future-generation reports are filtered to avoid incorrect state changes.
    • More precise computation and preservation of last_updated_time, last_transition_time, and observed_generation.
  • Documentation

    • Expanded aggregation rules and condition semantics.
  • Tests

    • Added comprehensive unit and 12-scenario end-to-end test suite validating aggregation behavior.

@openshift-ci openshift-ci bot requested review from aredenba-rh and jsell-rh March 17, 2026 06:57

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yasun1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR overhauls status aggregation and its validation/lifecycle semantics. Key factual changes: extensive docs updates describing aggregation rules; config loader env-var parsing refinements; cluster and nodepool services now delegate status recomputation to internal updaters that accept observedTime and isLifecycleChange; ProcessAdapterStatus enforces pre-checks (generation staleness, mandatory conditions, Unknown Available discard), upserts adapter statuses, and triggers aggregation; status_aggregation was refactored to snapshot-driven Available computation, context-aware ComputeReadyCondition and BuildSyntheticConditions signatures, and many internal helpers; tests and new e2e curl scripts expanded to cover rules and edge cases. ComputeAvailableCondition was removed; UpdateNodePoolStatusFromAdapters was added.
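The pre-check ordering described in the walkthrough (staleness, then future-generation discard, then mandatory conditions, then Unknown discard) can be sketched as below. The function name, the simplified string-map condition shape, and the reason strings are all illustrative assumptions, not the real ProcessAdapterStatus signature.

```go
package main

import "fmt"

// preCheck sketches the pre-processing order: stale → P1 → P2 → P3.
// A report passes only if it survives all four checks; it is then
// upserted and aggregation is triggered.
func preCheck(observedGen, storedGen, resourceGen int64, conditions map[string]string) (bool, string) {
	if observedGen < storedGen {
		return false, "stale: older than the stored report"
	}
	if observedGen > resourceGen {
		return false, "P1: observed_generation ahead of resource generation"
	}
	available, ok := conditions["Available"]
	if !ok {
		return false, "P2: mandatory Available condition missing"
	}
	if available == "Unknown" {
		return false, "P3: Available=Unknown is discarded"
	}
	return true, "" // valid: upsert, then trigger aggregation
}

func main() {
	ok, reason := preCheck(2, 1, 2, map[string]string{"Available": "True"})
	fmt.Println(ok, reason)
	ok, reason = preCheck(3, 1, 2, map[string]string{"Available": "True"})
	fmt.Println(ok, reason)
}
```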

Sequence Diagram

sequenceDiagram
    actor Adapter
    participant API
    participant Service as Cluster/NodePool Service
    participant Agg as Status Aggregation
    participant DB as Database
    participant Resource

    Adapter->>API: POST adapter status (generation, Available, conditions)
    API->>Service: ProcessAdapterStatus(ctx, adapterStatus)

    alt Pre-processing validation
        Service->>Service: Check stale generation (P1)
        Service->>Service: Validate mandatory conditions (P2)
        Service->>Service: Reject Unknown Available (P3)
    end

    alt Discarded
        Service-->>API: HTTP 204 No Content
        API-->>Adapter: 204
    else Accepted
        Service->>DB: Upsert adapter status
        Service->>Service: update...StatusFromAdapters(ctx, id, observedTime, isLifecycleChange)
        Service->>Agg: BuildSyntheticConditions(ctx, existingConditions, adapterStatuses, requiredAdapters, resourceGen, now, observedTime, isLifecycleChange)
        Agg-->>Service: (Available, Ready synthetic conditions)
        Service->>DB: Persist resource conditions
        Service->>Resource: Update in-memory/resource view
        Service-->>API: Return updated resource (201/200)
        API-->>Adapter: 201/200
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 74.12%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: the title 'HYPERFLEET-774 - feat: update aggregation logic' clearly and concisely summarizes the main change: a feature update to aggregation logic with a ticket reference.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 11

🧹 Nitpick comments (2)
pkg/config/loader.go (1)

243-257: Warn when canonical and alias env vars conflict.

If both env vars are set with different non-empty values, current behavior silently picks the first one. Adding a warning would reduce misconfiguration risk and improve operability.

Suggested adjustment
 	setFromEnvVars := func(viperKey string, envVars []string) error {
+		if len(envVars) >= 2 {
+			primary := os.Getenv(envVars[0])
+			alias := os.Getenv(envVars[1])
+			if primary != "" && alias != "" && primary != alias {
+				logger.With(ctx, "viper_key", viperKey, "primary_env", envVars[0], "alias_env", envVars[1]).
+					Warn("Conflicting env vars detected; using primary env var by precedence")
+			}
+		}
 		for _, envVar := range envVars {
 			jsonValue := os.Getenv(envVar)
 			if jsonValue == "" {
 				continue
 			}

As per coding guidelines, “Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/loader.go` around lines 243 - 257, In setFromEnvVars, detect when
multiple envVars in the provided slice are set to different non-empty JSON
values before returning: iterate all envVars, collect non-empty jsonValue
strings, parse each into arrayValue (reuse the existing json.Unmarshal logic or
parse first then compare raw strings), if more than one distinct value exists
emit a warning via logger.With(ctx, "env_vars", envVars, "values",
<list-of-values>).Warn("Conflicting environment variables for same config key;
using first value"), then proceed to set viper.Set(viperKey, arrayValue) with
the value chosen (first in the slice) and continue returning as before;
reference setFromEnvVars, viper.Set and logger.With to locate where to add the
conflict check and warning.
test/e2e-curl/12-unknown-first/test.sh (1)

82-90: Re-assert “not stored” after the second Unknown POST.

Right now the second event only asserts HTTP 204. Add a second /statuses count check to guard against regressions where subsequent Unknown reports are accidentally persisted.

Proposed hardening
 # ── Event 2: second Unknown report ────────────────────────────────────────────
 echo ""
 log_step "POST ${ADAPTER2}@gen1=Unknown  (second report — also discarded)"
 CODE2=$(post_adapter_status "$CLUSTER_ID" "$ADAPTER2" 1 "Unknown")

 echo ""
 echo "  API response (second Unknown also discarded):"
 assert_http "second Unknown discarded (204)" "204" "$CODE2"
+
+log_step "GET /statuses — adapter record must still NOT exist"
+STATUSES2=$(get_statuses "$CLUSTER_ID")
+VAL_COUNT2=$(count_adapter_status "$STATUSES2" "$ADAPTER2")
+assert_eq "${ADAPTER2} status record still NOT stored" "0" "$VAL_COUNT2"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/12-unknown-first/test.sh` around lines 82 - 90, After posting
the second Unknown (stored in CODE2 via post_adapter_status), add a repeat check
against the /statuses collection to assert the count did not change (i.e., the
Unknown was not persisted). Concretely, immediately after the existing
assert_http "second Unknown discarded (204)" call, perform the same GET
/statuses count assertion used earlier in the script (the helper that verifies
number of statuses) to ensure the total count remains unchanged for
CLUSTER_ID/ADAPTER2.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/api-resources.md`:
- Line 553: The documented contract for last_updated_time is outdated and
misleading: update the docs so last_updated_time for the Available synthetic
condition is derived from adapter report times (use min(last_report_time) of
reporting adapters or the triggering observed_time on a True→False transition),
and for Ready=False set last_updated_time to min(last_report_time) across
required adapters with a fallback to now when no reports exist; reference the
fields last_updated_time, Available, Ready, last_report_time, observed_time and
describe those exact rules in place of the old sentence.

In `@pkg/services/CLAUDE.md`:
- Around line 24-30: The documentation is outdated:
UpdateClusterStatusFromAdapters no longer computes Available by taking a minimum
across mixed generations—modify the text to state that Available only updates
when all required adapters report the same observed generation (i.e., it
requires a shared observedGeneration), and remove the sentence about using
minimum observed generation; also update the description of ProcessAdapterStatus
to reflect that Available=Unknown is discarded on every report (not allowed even
on the first report); finally, adjust the note about Ready/LastUpdatedTime in
status_aggregation.computeReadyLastUpdated only if any behavior around
generation alignment or Available calculation changed there to reference the new
generation-alignment rule.

In `@pkg/services/cluster.go`:
- Around line 286-324: The code applies P2/P3 condition validation before P1
(observed_generation) which causes reports to a missing cluster to be swallowed
instead of returning NotFound; move the cluster fetch and the
observed-generation check (the P1 logic using s.clusterDao.Get and the
adapterStatus.ObservedGeneration > cluster.Generation comparison) up so it runs
immediately after unmarshalling adapterStatus.Conditions (or even before
condition validation), and only proceed to ValidateMandatoryConditions and the
Available==Unknown loop after confirming the cluster exists and
observed_generation is not ahead; update references to adapterStatus,
ValidateMandatoryConditions, s.clusterDao.Get, and the ObservedGeneration check
accordingly.
- Around line 332-336: The code currently sets observedTime from
upsertedStatus.LastReportTime; change it to prefer the adapter-reported
observed_time field instead. In the "Trigger aggregation" block, set
observedTime from upsertedStatus.ObservedTime (if non-nil), otherwise fall back
to upsertedStatus.LastReportTime, then time.Now(); replace the current
observedTime assignment that references upsertedStatus.LastReportTime with this
new preference so aggregation uses the adapter's reported observed_time for
transition timestamps.

In `@pkg/services/node_pool.go`:
- Around line 337-340: The aggregation error from
updateNodePoolStatusFromAdapters is only logged and not returned, letting the
caller see success while synthesized conditions remain stale; change the method
to propagate the failure instead of masking it: after calling
updateNodePoolStatusFromAdapters (symbol: updateNodePoolStatusFromAdapters)
check aggregateErr and return it (or a wrapped error) rather than just logging
via logger.With(...).WithError(aggregateErr).Warn(...); ensure the surrounding
Upsert/handler method returns the error so callers observe the failure and can
retry or surface it.
- Around line 287-324: The code performs P2/P3 checks before P1; move the node
pool fetch and the observed-generation check (the logic using s.nodePoolDao.Get
/ nodePool and the comparison adapterStatus.ObservedGeneration >
nodePool.Generation) to run before condition validation so P1 is applied prior
to P2/P3; specifically, call s.nodePoolDao.Get(ctx, nodePoolID) and handle
errors via handleGetError(...) and then perform the observed_generation vs
nodePool.Generation check (logging and returning nil) before invoking
ValidateMandatoryConditions(conditions) or iterating conditions for
Available==Unknown, ensuring malformed or deleted-node-pool reports return
NotFound instead of being silently discarded.
- Around line 332-336: The code currently sets observedTime from
upsertedStatus.LastReportTime, but you must use the adapter's reported
observed_time instead; change the assignment to pull the adapter's observed_time
field from upsertedStatus (e.g., upsertedStatus.ObservedTime or the nested
status field that contains the adapter's observed_time) and only fall back to
LastReportTime or time.Now when that adapter-reported observed_time is nil,
keeping the variable name observedTime and the same fallback behavior.

In `@test/e2e-curl/common.sh`:
- Around line 11-14: The default adapter names in the helper are inconsistent
with the documented test setup; update the ADAPTER1/ADAPTER2 defaults so they
match the documented server adapters (use ADAPTER1 default "dns" and ADAPTER2
default "validation") so that run_all.sh posts the correct adapter names and
aligns with HYPERFLEET_CLUSTER_ADAPTERS; locate the ADAPTER1 and ADAPTER2
assignments in the common helper (the lines setting
ADAPTER1="${ADAPTER1:-adapter1}" and ADAPTER2="${ADAPTER2:-adapter2}") and
change their fallback values accordingly.

In `@test/e2e-curl/README.md`:
- Around line 55-62: The fenced code block in test/e2e-curl/README.md is missing
a language tag (MD040); update the opening fence to include a language like
"text" so the block becomes ```text and keep the contents unchanged; locate the
unlabeled fence in the e2e-curl directory structure example in README.md and add
the language tag to the opening triple backticks.

In `@test/e2e-curl/run_all.sh`:
- Around line 42-50: When applying filter prefixes to TESTS (the SELECTED
population logic in the for-loop that builds SELECTED from TESTS), detect the
case where no tests matched the provided filters and exit non-zero with an error
message instead of proceeding and reporting 0/0 success; after the selection
loop that fills SELECTED, check if SELECTED is empty and print a clear error
like "no tests matched filters: $@" and exit with a failure code (e.g., exit 2).
Apply the same empty-SELECTED check and failure behavior to the duplicate
selection block used later (the other SELECTED population around the second
filter-handling section).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78db7f36-6f41-4908-bfde-4dce5d34730b

📥 Commits

Reviewing files that changed from the base of the PR and between 77fe53d and 2d77e35.

📒 Files selected for processing (25)
  • docs/api-resources.md
  • pkg/config/loader.go
  • pkg/services/CLAUDE.md
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_aggregation.go
  • pkg/services/status_aggregation_test.go
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/05-mixed-generations/test.sh
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/07-all-adapters-new-gen/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/10-stable-false/test.sh
  • test/e2e-curl/11-unknown-subsequent/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/README.md
  • test/e2e-curl/common.sh
  • test/e2e-curl/run_all.sh
  • test/integration/adapter_status_test.go

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (10)
docs/api-resources.md (1)

553-553: ⚠️ Potential issue | 🟠 Major

last_updated_time contract is still outdated for synthetic conditions.

This line still says Available always uses evaluation time and Ready=False uses evaluation time, which contradicts the updated aggregation semantics. Update this sentence to match the current rules (report-time-based with transition/fallback behavior).

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-resources.md` at line 553, Update the `last_updated_time` sentence
to reflect report-time-based semantics: state that `last_updated_time` is
primarily driven by adapters' report times (`last_report_time`) rather than
always using evaluation time; for `Available` use the adapter's
`last_report_time` when present (fall back to evaluation time only if no
report-time exists), and for `Ready` when Ready=True use the minimum
`last_report_time` across all required adapters reporting `Available=True` at
the current generation, while when Ready=False use the report-time that caused
the Ready transition (or fall back to evaluation time if no such report exists).
pkg/services/node_pool.go (3)

337-340: ⚠️ Potential issue | 🟠 Major

Don't return success when aggregation fails.

If Upsert succeeds and updateNodePoolStatusFromAdapters(...) fails, Lines 337-340 only log a warning and still return success. That leaves the stored adapter status and the synthetic node-pool conditions out of sync until another write happens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 337 - 340, The current logic logs a
warning when updateNodePoolStatusFromAdapters(ctx, nodePoolID, observedTime,
false) returns an error but still treats the overall operation as successful;
change this so that when updateNodePoolStatusFromAdapters returns a non-nil
error the caller returns that error (or wraps it) instead of proceeding as
success. Locate the call to updateNodePoolStatusFromAdapters in the same
function (after the Upsert) and propagate aggregateErr upward (e.g., return
aggregateErr or fmt.Errorf("...: %w", aggregateErr)) so the Upsert result is not
reported as successful when aggregation fails, preserving consistency between
stored adapter status and synthetic node-pool conditions.

332-336: ⚠️ Potential issue | 🟠 Major

Use the adapter's observed_time, not LastReportTime.

The model documents LastReportTime as the API-managed receipt time ("Updated on every POST"). Using it on Lines 333-335 shifts synthetic Available/Ready timestamps for delayed or retried reports. Pass the request's observed_time through here and only fall back to LastReportTime / time.Now() when it is missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 332 - 336, The code currently sets
observedTime from upsertedStatus.LastReportTime; instead use the
adapter/request-provided observed_time first (the adapter's observed_time field
passed into the handler) and only fall back to upsertedStatus.LastReportTime and
then time.Now() if the request observed_time is nil/missing; update the
observedTime assignment in pkg/services/node_pool.go (where observedTime is
defined and upsertedStatus is referenced) to read the incoming request's
observed_time value (or its wrapper/parameter) before falling back to
upsertedStatus.LastReportTime and time.Now().

287-324: ⚠️ Potential issue | 🟠 Major

Run P1 before P2/P3.

The docstring on Lines 263-267 says stale → P1 → P2 → P3, but Lines 287-315 still parse and apply P2/P3 before s.nodePoolDao.Get(...). A malformed or Available=Unknown report for a deleted node pool is swallowed before handleGetError(...) can return NotFound, and future-generation reports are checked later than the spec requires.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 287 - 324, The P1 observed_generation
check must run before P2/P3; fetch the NodePool via s.nodePoolDao.Get(ctx,
nodePoolID) and perform the observed_generation comparison (using
adapterStatus.ObservedGeneration vs nodePool.Generation and returning
handleGetError on Get errors) before calling
ValidateMandatoryConditions(conditions) or iterating conditions for
Available==Unknown; keep the existing json.Unmarshal of adapterStatus.Conditions
if needed but relocate the nodePool fetch and the if
adapterStatus.ObservedGeneration > nodePool.Generation check to occur prior to
ValidateMandatoryConditions and the Available-Unknown loop.
pkg/services/cluster.go (3)

337-340: ⚠️ Potential issue | 🟠 Major

Don't return success when aggregation fails.

If Upsert succeeds and updateClusterStatusFromAdapters(...) fails, Lines 337-340 only log a warning and still return success. That leaves the stored adapter status and the synthetic cluster conditions out of sync until another write happens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 337 - 340, The call to
s.updateClusterStatusFromAdapters(ctx, clusterID, observedTime, false) currently
only logs a warning on error, causing Upsert to return success despite
aggregation failure; change this so that when aggregateErr != nil you
propagate/return an error (e.g., wrap aggregateErr with context like "failed to
aggregate cluster status for clusterID") from the enclosing function instead of
only logging, using the same ctx via logger.WithClusterID and logger.WithError
to preserve logs; locate the call to updateClusterStatusFromAdapters and replace
the warning-only branch with a return of the wrapped aggregateErr so the caller
sees the failure.

332-336: ⚠️ Potential issue | 🟠 Major

Use the adapter's observed_time, not LastReportTime.

The model documents LastReportTime as the API-managed receipt time ("Updated on every POST"). Using it on Lines 333-335 shifts synthetic Available/Ready timestamps for delayed or retried reports. Pass the request's observed_time through here and only fall back to LastReportTime / time.Now() when it is missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 332 - 336, The code currently sets
observedTime from upsertedStatus.LastReportTime; instead prefer the adapter's
reported observed_time and only fall back to LastReportTime or time.Now() if
that field is missing. Update the observedTime assignment in
pkg/services/cluster.go (the observedTime variable) to first use the incoming
report/request observed_time (e.g., report.ObservedTime or req.ObservedTime from
the handler), then if nil use upsertedStatus.LastReportTime, and finally
time.Now(); keep the variable name observedTime and the existing fallbacks order
as described.

286-324: ⚠️ Potential issue | 🟠 Major

Run P1 before P2/P3.

The docstring on Lines 262-269 says stale → P1 → P2 → P3, but Lines 286-314 still parse and apply P2/P3 before s.clusterDao.Get(...). A malformed or Available=Unknown report for a deleted cluster is swallowed before handleGetError(...) can return NotFound, and future-generation reports are checked later than the spec requires.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 286 - 324, Unmarshal conditions stays,
but perform P1 (fetch resource and check observed_generation) immediately after
unmarshalling and before calling ValidateMandatoryConditions/Available checks:
call s.clusterDao.Get(ctx, clusterID), handle errors with
handleGetError("Cluster","id",clusterID,err), then if
adapterStatus.ObservedGeneration > cluster.Generation log and return nil (the
existing observed_generation check using adapterStatus.ObservedGeneration and
cluster.Generation); only after that run ValidateMandatoryConditions(conditions)
and the loop checking api.ConditionTypeAvailable == api.AdapterConditionUnknown.
test/e2e-curl/common.sh (1)

11-14: ⚠️ Potential issue | 🟠 Major

Align the defaults with the documented adapters.

Lines 13-14 still default to adapter1/adapter2, but this helper says they must match the server's required adapters and the header documents dns/validation. run_all.sh against the documented setup will post the wrong adapter names and never satisfy the required set.

Suggested fix
-ADAPTER1="${ADAPTER1:-adapter1}"
-ADAPTER2="${ADAPTER2:-adapter2}"
+ADAPTER1="${ADAPTER1:-dns}"
+ADAPTER2="${ADAPTER2:-validation}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/common.sh` around lines 11 - 14, The default adapter names
ADAPTER1 and ADAPTER2 in test/e2e-curl/common.sh currently use
"adapter1"/"adapter2" which mismatch the documented required adapters; update
the default assignments ADAPTER1="${ADAPTER1:-...}" and
ADAPTER2="${ADAPTER2:-...}" to use "dns" and "validation" respectively so the
helper aligns with the server's HYPERFLEET_CLUSTER_ADAPTERS expectations while
preserving the ability to override via env vars.
test/e2e-curl/README.md (1)

55-62: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block (MD040).

Line [55] starts an unlabeled fence; markdownlint will keep warning until it is tagged.

Proposed fix
-```
+```text
 e2e-curl/
   common.sh          # Shared helpers: API wrappers, assertions, logging
   run_all.sh         # Suite runner with per-test pass/fail tracking
   01-initial-state/
     test.sh
   ...
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @test/e2e-curl/README.md around lines 55 - 62, The unlabeled fenced code
block in README.md should include a language tag to satisfy markdownlint MD040;
change the triple-backtick fence that precedes the directory listing (the code
block showing "e2e-curl/ common.sh run_all.sh 01-initial-state/ test.sh ...") to
include an appropriate language tag such as text or bash (e.g., ```text or

test/e2e-curl/run_all.sh (1)

42-50: ⚠️ Potential issue | 🟠 Major

Fail fast when no tests match provided filters.

If filters match nothing, the runner reports success with 0/0, which can mask misconfigured executions.

Proposed fix
 SELECTED=()
 if [ $# -gt 0 ]; then
   for filter in "$@"; do
     for t in "${TESTS[@]}"; do
       [[ "$t" == "${filter}"* ]] && SELECTED+=("$t")
     done
   done
 else
   SELECTED=("${TESTS[@]}")
 fi
+
+if [ "${#SELECTED[@]}" -eq 0 ]; then
+  echo -e "${RED}ERROR${NC}: no tests matched filters: $*"
+  echo "Available tests:"
+  for t in "${TESTS[@]}"; do
+    echo "  - $t"
+  done
+  exit 2
+fi

Also applies to: 81-86

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/run_all.sh` around lines 42 - 50, The runner currently silently
succeeds when no tests match filters because SELECTED remains empty; after the
filtering block (where TESTS is iterated and SELECTED is populated) add a check
that if SELECTED is empty you print a clear error like "No tests match provided
filters" and exit with a non-zero status (e.g., exit 1) so the CI fails fast;
apply the same empty-SELECTED check in the alternate filtering section mentioned
(lines 81-86) to cover both code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/hyperfleet-api/migrate/cmd.go`:
- Around line 61-66: The migrate command currently calls
db.Migrate(connection.New(ctx)) directly which can race concurrent invocations;
wrap the migration call with a DB-backed lock (e.g., an advisory lock or a
dedicated migrations lock table) so only one process runs migrations at a time:
obtain the lock before calling db.Migrate, fail gracefully or wait with timeout
if the lock cannot be acquired, defer releasing the lock after migrations
complete (or on error), and log lock acquisition/failure; add or reuse a helper
like AcquireMigrationLock/ReleaseMigrationLock (or implement this logic in
pkg/db/migrations.go) and call it from the migrate command around db.Migrate so
migrations are serialized.

In `@docs/api-resources.md`:
- Around line 312-313: The table's Creation rows use observed_generation = 1
which conflicts with the Cluster and NodePool examples that show
observed_generation: 0; pick one source of truth and make them consistent:
either change the Creation table rows (the two "Creation" entries) to set
observed_generation (obs_gen) to 0 to match the examples, or update the Cluster
and NodePool example initial synthetic conditions to use observed_generation: 1;
update all instances of observed_generation/obs_gen (in the table and in the
Cluster/NodePool examples) so they match, introducing no other semantic changes.
- Around line 287-343: The aggregation matrix currently derives lut from
statuses[].lut (last_update_time); update the docs so lut is explicitly defined
as the adapter's last_report_time (replace all occurrences of statuses[].lut
with adapter last_report_time), and adjust each matrix rule: use observed_time
for synthetic last_report_time when a True→False transition is triggered, and
use now as the fallback last_report_time for Ready=False when any required
adapter lacks a stored status; update the notation lines (lut/last_update_time)
and all rows in the "Adapter Report Aggregation Matrix" and "Lifecycle Events"
tables to reflect this new source.

---

Duplicate comments:
In `@docs/api-resources.md`:
- Line 553: Update the `last_updated_time` sentence to reflect report-time-based
semantics: state that `last_updated_time` is primarily driven by adapters'
report times (`last_report_time`) rather than always using evaluation time; for
`Available` use the adapter's `last_report_time` when present (fall back to
evaluation time only if no report-time exists), and for `Ready` when Ready=True
use the minimum `last_report_time` across all required adapters reporting
`Available=True` at the current generation, while when Ready=False use the
report-time that caused the Ready transition (or fall back to evaluation time if
no such report exists).

In `@pkg/services/cluster.go`:
- Around line 337-340: The call to s.updateClusterStatusFromAdapters(ctx,
clusterID, observedTime, false) currently only logs a warning on error, causing
Upsert to return success despite aggregation failure; change this so that when
aggregateErr != nil you propagate/return an error (e.g., wrap aggregateErr with
context like "failed to aggregate cluster status for clusterID") from the
enclosing function instead of only logging, using the same ctx via
logger.WithClusterID and logger.WithError to preserve logs; locate the call to
updateClusterStatusFromAdapters and replace the warning-only branch with a
return of the wrapped aggregateErr so the caller sees the failure.
- Around line 332-336: The code currently sets observedTime from
upsertedStatus.LastReportTime; instead prefer the adapter's reported
observed_time and only fall back to LastReportTime or time.Now() if that field
is missing. Update the observedTime assignment in pkg/services/cluster.go (the
observedTime variable) to first use the incoming report/request observed_time
(e.g., report.ObservedTime or req.ObservedTime from the handler), then if nil
use upsertedStatus.LastReportTime, and finally time.Now(); keep the variable
name observedTime and the existing fallbacks order as described.
- Around line 286-324: Unmarshal conditions stays, but perform P1 (fetch
resource and check observed_generation) immediately after unmarshalling and
before calling ValidateMandatoryConditions/Available checks: call
s.clusterDao.Get(ctx, clusterID), handle errors with
handleGetError("Cluster","id",clusterID,err), then if
adapterStatus.ObservedGeneration > cluster.Generation log and return nil (the
existing observed_generation check using adapterStatus.ObservedGeneration and
cluster.Generation); only after that run ValidateMandatoryConditions(conditions)
and the loop checking api.ConditionTypeAvailable == api.AdapterConditionUnknown.

In `@pkg/services/node_pool.go`:
- Around line 337-340: The current logic logs a warning when
updateNodePoolStatusFromAdapters(ctx, nodePoolID, observedTime, false) returns
an error but still treats the overall operation as successful; change this so
that when updateNodePoolStatusFromAdapters returns a non-nil error the caller
returns that error (or wraps it) instead of proceeding as success. Locate the
call to updateNodePoolStatusFromAdapters in the same function (after the Upsert)
and propagate aggregateErr upward (e.g., return aggregateErr or fmt.Errorf("...:
%w", aggregateErr)) so the Upsert result is not reported as successful when
aggregation fails, preserving consistency between stored adapter status and
synthetic node-pool conditions.
- Around line 332-336: The code currently sets observedTime from
upsertedStatus.LastReportTime; instead use the adapter/request-provided
observed_time first (the adapter's observed_time field passed into the handler)
and only fall back to upsertedStatus.LastReportTime and then time.Now() if the
request observed_time is nil/missing; update the observedTime assignment in
pkg/services/node_pool.go (where observedTime is defined and upsertedStatus is
referenced) to read the incoming request's observed_time value (or its
wrapper/parameter) before falling back to upsertedStatus.LastReportTime and
time.Now().
- Around line 287-324: The P1 observed_generation check must run before P2/P3;
fetch the NodePool via s.nodePoolDao.Get(ctx, nodePoolID) and perform the
observed_generation comparison (using adapterStatus.ObservedGeneration vs
nodePool.Generation and returning handleGetError on Get errors) before calling
ValidateMandatoryConditions(conditions) or iterating conditions for
Available==Unknown; keep the existing json.Unmarshal of adapterStatus.Conditions
if needed but relocate the nodePool fetch and the if
adapterStatus.ObservedGeneration > nodePool.Generation check to occur prior to
ValidateMandatoryConditions and the Available-Unknown loop.

In `@test/e2e-curl/common.sh`:
- Around line 11-14: The default adapter names ADAPTER1 and ADAPTER2 in
test/e2e-curl/common.sh currently use "adapter1"/"adapter2" which mismatch the
documented required adapters; update the default assignments
ADAPTER1="${ADAPTER1:-...}" and ADAPTER2="${ADAPTER2:-...}" to use "dns" and
"validation" respectively so the helper aligns with the server's
HYPERFLEET_CLUSTER_ADAPTERS expectations while preserving the ability to
override via env vars.

In `@test/e2e-curl/README.md`:
- Around line 55-62: The unlabeled fenced code block in README.md should include
a language tag to satisfy markdownlint MD040; change the triple-backtick fence
that precedes the directory listing (the code block showing "e2e-curl/ common.sh
run_all.sh 01-initial-state/ test.sh ...") to include an appropriate language
tag such as text or bash (e.g., ```text or ```bash) so the fence is labeled and
the linter warning is resolved.

In `@test/e2e-curl/run_all.sh`:
- Around line 42-50: The runner currently silently succeeds when no tests match
filters because SELECTED remains empty; after the filtering block (where TESTS
is iterated and SELECTED is populated) add a check that if SELECTED is empty you
print a clear error like "No tests match provided filters" and exit with a
non-zero status (e.g., exit 1) so the CI fails fast; apply the same
empty-SELECTED check in the alternate filtering section mentioned (lines 81-86)
to cover both code paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5da2d72e-4d9a-4a2c-89e5-415cc9acd725

📥 Commits

Reviewing files that changed from the base of the PR and between 2d77e35 and f825425.

📒 Files selected for processing (38)
  • cmd/hyperfleet-api/migrate/cmd.go
  • docs/api-resources.md
  • pkg/config/db.go
  • pkg/config/loader.go
  • pkg/db/advisory_locks.go
  • pkg/db/context.go
  • pkg/db/db_session/default.go
  • pkg/db/db_session/test.go
  • pkg/db/db_session/testcontainer.go
  • pkg/db/migrations.go
  • pkg/db/mocks/session_factory.go
  • pkg/db/session.go
  • pkg/logger/fields.go
  • pkg/services/CLAUDE.md
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_aggregation.go
  • pkg/services/status_aggregation_test.go
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/05-mixed-generations/test.sh
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/07-all-adapters-new-gen/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/10-stable-false/test.sh
  • test/e2e-curl/11-unknown-subsequent/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/README.md
  • test/e2e-curl/common.sh
  • test/e2e-curl/run_all.sh
  • test/helper.go
  • test/integration/adapter_status_test.go
  • test/integration/advisory_locks_test.go
💤 Files with no reviewable changes (11)
  • pkg/db/mocks/session_factory.go
  • pkg/db/db_session/default.go
  • pkg/db/db_session/test.go
  • pkg/db/session.go
  • pkg/db/db_session/testcontainer.go
  • pkg/db/migrations.go
  • test/integration/advisory_locks_test.go
  • pkg/db/context.go
  • pkg/db/advisory_locks.go
  • test/helper.go
  • pkg/logger/fields.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh
  • pkg/services/CLAUDE.md
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • pkg/services/status_aggregation_test.go
  • test/e2e-curl/10-stable-false/test.sh

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (6)
pkg/services/cluster.go (2)

337-340: ⚠️ Potential issue | 🟠 Major

Return the aggregation error instead of swallowing it.

After Line 327 stores the adapter status, a failure at Line 337 leaves the cluster’s synthetic conditions stale while this method still returns success. Please propagate aggregateErr so callers can retry or surface the partial failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 337 - 340, The call to
s.updateClusterStatusFromAdapters(ctx, clusterID, observedTime, false) currently
logs aggregateErr and continues, which swallows aggregation failures; change the
flow so that if aggregateErr != nil the method returns that error (or wraps it)
instead of only logging—locate the block around the call to
updateClusterStatusFromAdapters and replace the logger-only branch with a return
of aggregateErr (use fmt.Errorf or errors.Wrap if you need context) so callers
can observe and retry on aggregation failures.

332-336: ⚠️ Potential issue | 🟠 Major

Use the adapter-reported observed_time here.

Line 334 still derives observedTime from LastReportTime, which is the API receipt timestamp. The synthetic transition rules are keyed off the adapter’s observed_time, so delayed or retried reports will stamp Ready/Available with the wrong time unless this prefers the stored observed time and only falls back when it is absent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 332 - 336, The code currently sets
observedTime using upsertedStatus.LastReportTime (API receipt timestamp); change
it to prefer the adapter-reported timestamp by checking
upsertedStatus.ObservedTime first, then falling back to
upsertedStatus.LastReportTime, and finally time.Now() if both are nil. Update
the observedTime assignment in the aggregation/transition block (where
observedTime is defined) to use upsertedStatus.ObservedTime ->
upsertedStatus.LastReportTime -> time.Now() order so synthetic transition rules
use the adapter's observed_time.
pkg/services/node_pool.go (2)

337-340: ⚠️ Potential issue | 🟠 Major

Return the aggregation error instead of swallowing it.

After Line 327 stores the adapter status, a failure at Line 337 leaves the node pool’s synthetic conditions stale while this method still returns success. Please propagate aggregateErr so callers can retry or surface the partial failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 337 - 340, The call to
s.updateNodePoolStatusFromAdapters currently logs aggregateErr and swallows it;
instead propagate that error to the caller so callers can retry or surface the
partial failure. Modify the function containing this snippet to return
aggregateErr when s.updateNodePoolStatusFromAdapters(ctx, nodePoolID,
observedTime, false) yields an error (i.e., propagate the aggregateErr rather
than only logging), ensuring the enclosing function's signature returns an error
and updating any callers to handle the propagated error if necessary; keep the
existing logging but return the error after logging.

332-336: ⚠️ Potential issue | 🟠 Major

Use the adapter-reported observed_time here.

Line 334 still derives observedTime from LastReportTime, which is the API receipt timestamp. The synthetic transition rules are keyed off the adapter’s observed_time, so delayed or retried reports will stamp Ready/Available with the wrong time unless this prefers the stored observed time and only falls back when it is absent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 332 - 336, The code currently sets
observedTime from upsertedStatus.LastReportTime (API receipt timestamp) which is
incorrect; update the logic that sets observedTime (the variable) to prefer the
adapter-reported observed_time stored on upsertedStatus (the stored adapter
observed time field) and only fall back to upsertedStatus.LastReportTime or
time.Now() if that stored adapter observed time is nil/zero; change the
assignment near where observedTime is initialized so it first checks the adapter
observed-time field on upsertedStatus (instead of LastReportTime) before using
LastReportTime/time.Now().
docs/api-resources.md (2)

310-313: ⚠️ Potential issue | 🟠 Major

Creation obs_gen here conflicts with the examples above.

The lifecycle table initializes both synthetic conditions at obs_gen=1, while the Cluster and NodePool create examples in this file still show observed_generation: 0. Please align these to one contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-resources.md` around lines 310 - 313, The lifecycle table sets
synthetic condition column obs_gen to 1 but the Cluster and NodePool examples
still use observed_generation: 0 causing a mismatch; pick one contract and make
examples consistent (either change the table to show obs_gen=0 or update the
Cluster/NodePool examples to observed_generation: 1). Locate occurrences of
obs_gen in the lifecycle table and the examples referencing observed_generation
and update them so both use the same value (prefer updating the Cluster and
NodePool examples to observed_generation: 1 to match the lifecycle table)
ensuring all instances of obs_gen/observed_generation are aligned.

282-305: ⚠️ Potential issue | 🟠 Major

The spec still documents the old last_updated_time inputs.

These sections still drive Ready/Available from all statuses[] and statuses[].lut, but pkg/services/status_aggregation.go now computes synthetic last_updated_time from required adapters only, using adapter last_report_time and the triggering observed_time on True→False transitions. In particular, Line 553 is wrong for both synthetic conditions, and Ready=False is no longer “evaluation time” except for the fallback case.

Also applies to: 320-343, 553-553

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-resources.md` around lines 282 - 305, Update the docs to stop
treating statuses[].lut (last_updated_time) as the source for Ready/Available
synthetic conditions and instead describe the new computation: synthetic
last_updated_time is derived from required adapters only using each adapter's
last_report_time and, for True→False transitions, the report's observed_time as
the triggering time; state that Ready=False uses that synthetic
last_updated_time (not “evaluation time”) except in the explicit fallback case,
and remove or replace any language that says Ready/Available are driven by all
statuses[] or statuses[].lut (including the sections currently referencing
statuses[], lut, obs_time and obs_gen). Reference the aggregation logic that
computes the synthetic last_updated_time and the Ready/Available synthetic
conditions (the status aggregation implementation that synthesizes
last_updated_time from adapter last_report_time and observed_time on
transitions) when updating the text.
🧹 Nitpick comments (1)
test/e2e-curl/common.sh (1)

134-148: Let callers provide observed_time for deterministic scenarios.

This helper always stamps observed_time itself, which is why several new scenarios have to sleep 1/2 just to make timestamp assertions reliable (test/e2e-curl/05-mixed-generations/test.sh Line 49, test/e2e-curl/07-all-adapters-new-gen/test.sh Lines 35-40 and 58, test/e2e-curl/10-stable-false/test.sh Line 51, test/e2e-curl/11-unknown-subsequent/test.sh Line 49). Accepting an optional timestamp here would make the suite faster and less timing-sensitive.

♻️ Proposed refactor
 post_adapter_status() {
-  local cluster_id="$1" adapter="$2" gen="$3" available="$4"
+  local cluster_id="$1" adapter="$2" gen="$3" available="$4" observed_time="${5:-$(now_rfc3339)}"
   curl -s -o /dev/null -w "%{http_code}" \
     -X POST "$API/clusters/$cluster_id/statuses" \
     -H "Content-Type: application/json" \
     -d "{
       \"adapter\": \"$adapter\",
       \"observed_generation\": $gen,
-      \"observed_time\": \"$(now_rfc3339)\",
+      \"observed_time\": \"$observed_time\",
       \"conditions\": [
         {\"type\": \"Available\", \"status\": \"$available\", \"reason\": \"Testing\"},
         {\"type\": \"Applied\",   \"status\": \"True\",       \"reason\": \"Testing\"},
         {\"type\": \"Health\",    \"status\": \"True\",       \"reason\": \"Testing\"}
       ]
     }"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/common.sh` around lines 134 - 148, The post_adapter_status
helper currently always calls now_rfc3339 for observed_time; change
post_adapter_status to accept an optional observed_time parameter (e.g., add a
fifth parameter or make the fourth optional) and set a local variable like
observed_time="${5:-$(now_rfc3339)}"; then use that variable in the JSON payload
for "observed_time" instead of calling now_rfc3339 inline. Update callers that
need deterministic timestamps to pass their desired timestamp to
post_adapter_status (leave existing callers functioning when they omit the arg).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/services/cluster.go`:
- Around line 281-327: The stale-generation check in clusterService's status
handling is racy because the read-side check in s.clusterDao/Get +
existingStatus check is not atomic with the write in s.adapterStatusDao.Upsert;
modify adapterStatusDao.Upsert (the Upsert implementation in
pkg/dao/adapter_status.go) to perform the stale-filtering atomically by
including a WHERE predicate on observed_generation (e.g., UPDATE ... WHERE
adapter = ? AND observed_generation <= ?) or by loading-and-writing within a
single DB transaction so an older observed_generation cannot overwrite a newer
one; ensure the Upsert returns whether the write was applied (or an error) so
callers such as the code invoking s.adapterStatusDao.Upsert can detect a no-op
and avoid assuming success.
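The atomicity requirement can be illustrated in miniature: here a mutex plays the role of the single SQL statement (or transaction) that makes the stale check and the write indivisible, and the boolean return lets callers detect a no-op. `store`, `status`, and the field names are hypothetical, not the `adapterStatusDao` schema:

```go
package main

import (
	"fmt"
	"sync"
)

// status is a minimal stand-in for a stored adapter status row.
type status struct{ observedGeneration int64 }

// store sketches the DAO; the mutex stands in for the atomic SQL
// UPDATE ... WHERE observed_generation <= ? (or a single transaction).
type store struct {
	mu   sync.Mutex
	rows map[string]status
}

// upsert applies the report only when it is not older than the stored row,
// and reports whether the write happened so callers can detect a no-op.
func (s *store) upsert(adapter string, gen int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.rows[adapter]; ok && gen < cur.observedGeneration {
		return false // stale report: the newer row stays untouched
	}
	s.rows[adapter] = status{observedGeneration: gen}
	return true
}

func main() {
	s := &store{rows: map[string]status{}}
	fmt.Println(s.upsert("dns", 2)) // true: first write applies
	fmt.Println(s.upsert("dns", 1)) // false: stale generation rejected
}
```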

In `@test/e2e-curl/08-adapter-goes-false/test.sh`:
- Around line 31-34: Tests call post_adapter_status with hardcoded generation 1
which can mismatch actual cluster generation; change calls to use the captured
RESOURCE_GEN variable instead of literal 1. Update the two invocations of
post_adapter_status (the ones posting for ADAPTER1 and ADAPTER2) to pass
RESOURCE_GEN as the generation arg, ensuring the function calls use CLUSTER_ID,
ADAPTER1/ADAPTER2 and RESOURCE_GEN so status reports reflect the actual cluster
generation.
- Around line 31-34: The two setup POST calls using post_adapter_status (for
ADAPTER1 and ADAPTER2 with CLUSTER_ID) currently discard HTTP responses and may
hide failures; update the calls to capture the command exit status or HTTP
response and if it indicates failure, log an error including adapter and cluster
info and exit immediately (fail fast) instead of redirecting output to
/dev/null. Specifically, run post_adapter_status "$CLUSTER_ID" "$ADAPTER1" 1
"True" and post_adapter_status "$CLUSTER_ID" "$ADAPTER2" 1 "True" into a
variable or check their return code, emit a descriptive error via log_step or
echo when non-zero (include $ADAPTER1/$ADAPTER2 and $CLUSTER_ID), and call exit
1 to stop the script on failure.

In `@test/e2e-curl/10-stable-false/test.sh`:
- Around line 30-49: The gen2 setup POST for ${ADAPTER2} must be verified so the
test proves this is a re-report, not the first gen2 report: capture the result
of post_adapter_status "$CLUSTER_ID" "$ADAPTER2" 2 "True" (the second call that
currently discards output), assert it succeeded (nonzero exit should fail the
test) and then call get_cluster and inspect the returned state to confirm
ADAPTER2 already reported for generation 2 before continuing (use get_cluster
and condition_field or whatever adapter-status accessor you have to prove an
existing gen=2 entry). If the POST failed or the cluster snapshot does not show
ADAPTER2@gen2=True, fail the test so the later POST cannot be mistaken for the
first gen2 report.

In `@test/e2e-curl/11-unknown-subsequent/test.sh`:
- Around line 29-47: The second setup POST to post_adapter_status for ADAPTER1
is ignored and the test never asserts the baseline cluster is Ready/Available;
capture and assert the ADAPTER1 POST response (like CODE=$(post_adapter_status
... ) and assert_http "initial ${ADAPTER1} True accepted" "201" "$CODE") and
then assert PRE_AVAIL_STATUS == "True" and PRE_READY_STATUS == "True" (use the
test suite's assertion helper such as assert_equals or equivalent) after
computing PRE_AVAIL_STATUS and PRE_READY_STATUS to fail early if the
precondition wasn't met.

In `@test/e2e-curl/README.md`:
- Around line 17-30: The README examples for running tests assume you are in the
test/e2e-curl directory but don't state that; update the prose or commands to
make the working directory explicit: either add a line telling the user to cd
test/e2e-curl before the examples, or change the invocations to use
repo-root-relative paths (e.g., test/e2e-curl/run_all.sh and ADAPTER1=...
test/e2e-curl/run_all.sh). Make the change around the examples that reference
run_all.sh so the commands work from the repo root.

---

Duplicate comments:
In `@docs/api-resources.md`:
- Around line 310-313: The lifecycle table sets synthetic condition column
obs_gen to 1 but the Cluster and NodePool examples still use
observed_generation: 0 causing a mismatch; pick one contract and make examples
consistent (either change the table to show obs_gen=0 or update the
Cluster/NodePool examples to observed_generation: 1). Locate occurrences of
obs_gen in the lifecycle table and the examples referencing observed_generation
and update them so both use the same value (prefer updating the Cluster and
NodePool examples to observed_generation: 1 to match the lifecycle table)
ensuring all instances of obs_gen/observed_generation are aligned.
- Around line 282-305: Update the docs to stop treating statuses[].lut
(last_updated_time) as the source for Ready/Available synthetic conditions and
instead describe the new computation: synthetic last_updated_time is derived
from required adapters only using each adapter's last_report_time and, for
True→False transitions, the report's observed_time as the triggering time; state
that Ready=False uses that synthetic last_updated_time (not “evaluation time”)
except in the explicit fallback case, and remove or replace any language that
says Ready/Available are driven by all statuses[] or statuses[].lut (including
the sections currently referencing statuses[], lut, obs_time and obs_gen).
Reference the aggregation logic that computes the synthetic last_updated_time
and the Ready/Available synthetic conditions (the status aggregation
implementation that synthesizes last_updated_time from adapter last_report_time
and observed_time on transitions) when updating the text.

In `@pkg/services/cluster.go`:
- Around line 337-340: The call to s.updateClusterStatusFromAdapters(ctx,
clusterID, observedTime, false) currently logs aggregateErr and continues, which
swallows aggregation failures; change the flow so that if aggregateErr != nil
the method returns that error (or wraps it) instead of only logging—locate the
block around the call to updateClusterStatusFromAdapters and replace the
logger-only branch with a return of aggregateErr (use fmt.Errorf or errors.Wrap
if you need context) so callers can observe and retry on aggregation failures.
- Around line 332-336: The code currently sets observedTime using
upsertedStatus.LastReportTime (API receipt timestamp); change it to prefer the
adapter-reported timestamp by checking upsertedStatus.ObservedTime first, then
falling back to upsertedStatus.LastReportTime, and finally time.Now() if both
are nil. Update the observedTime assignment in the aggregation/transition block
(where observedTime is defined) to use upsertedStatus.ObservedTime ->
upsertedStatus.LastReportTime -> time.Now() order so synthetic transition rules
use the adapter's observed_time.

In `@pkg/services/node_pool.go`:
- Around line 337-340: The call to s.updateNodePoolStatusFromAdapters currently
logs aggregateErr and swallows it; instead propagate that error to the caller so
callers can retry or surface the partial failure. Modify the function containing
this snippet to return aggregateErr when s.updateNodePoolStatusFromAdapters(ctx,
nodePoolID, observedTime, false) yields an error (i.e., propagate the
aggregateErr rather than only logging), ensuring the enclosing function's
signature returns an error and updating any callers to handle the propagated
error if necessary; keep the existing logging but return the error after
logging.
- Around line 332-336: The code currently sets observedTime from
upsertedStatus.LastReportTime (API receipt timestamp) which is incorrect; update
the logic that sets observedTime (the variable) to prefer the adapter-reported
observed_time stored on upsertedStatus (the stored adapter observed time field)
and only fall back to upsertedStatus.LastReportTime or time.Now() if that stored
adapter observed time is nil/zero; change the assignment near where observedTime
is initialized so it first checks the adapter observed-time field on
upsertedStatus (instead of LastReportTime) before using
LastReportTime/time.Now().

---

Nitpick comments:
In `@test/e2e-curl/common.sh`:
- Around line 134-148: The post_adapter_status helper currently always calls
now_rfc3339 for observed_time; change post_adapter_status to accept an optional
observed_time parameter (e.g., add a fifth parameter or make the fourth
optional) and set a local variable like observed_time="${5:-$(now_rfc3339)}";
then use that variable in the JSON payload for "observed_time" instead of
calling now_rfc3339 inline. Update callers that need deterministic timestamps to
pass their desired timestamp to post_adapter_status (leave existing callers
functioning when they omit the arg).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 697b363b-f760-4866-b2d2-98965b4d6ddb

📥 Commits

Reviewing files that changed from the base of the PR and between f825425 and 3ba7df1.

📒 Files selected for processing (38)
  • cmd/hyperfleet-api/migrate/cmd.go
  • docs/api-resources.md
  • pkg/config/db.go
  • pkg/config/loader.go
  • pkg/db/advisory_locks.go
  • pkg/db/context.go
  • pkg/db/db_session/default.go
  • pkg/db/db_session/test.go
  • pkg/db/db_session/testcontainer.go
  • pkg/db/migrations.go
  • pkg/db/mocks/session_factory.go
  • pkg/db/session.go
  • pkg/logger/fields.go
  • pkg/services/CLAUDE.md
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_aggregation.go
  • pkg/services/status_aggregation_test.go
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/05-mixed-generations/test.sh
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/07-all-adapters-new-gen/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/10-stable-false/test.sh
  • test/e2e-curl/11-unknown-subsequent/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/README.md
  • test/e2e-curl/common.sh
  • test/e2e-curl/run_all.sh
  • test/helper.go
  • test/integration/adapter_status_test.go
  • test/integration/advisory_locks_test.go
💤 Files with no reviewable changes (11)
  • pkg/logger/fields.go
  • pkg/db/db_session/default.go
  • test/integration/advisory_locks_test.go
  • pkg/db/mocks/session_factory.go
  • pkg/db/db_session/test.go
  • pkg/db/migrations.go
  • pkg/db/context.go
  • pkg/db/advisory_locks.go
  • test/helper.go
  • pkg/db/db_session/testcontainer.go
  • pkg/db/session.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • test/e2e-curl/12-unknown-first/test.sh
  • pkg/services/CLAUDE.md
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/run_all.sh
  • pkg/services/node_pool_test.go
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/02-partial-adapters/test.sh

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (10)
test/e2e-curl/README.md (1)

23-31: ⚠️ Potential issue | 🟡 Minor

Clarify the working directory for running tests.

The commands ./run_all.sh and ./run_all.sh 03 07 assume the reader is in test/e2e-curl/. From the repo root, these paths won't work. Add an explicit cd test/e2e-curl instruction or use repo-root-relative paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/README.md` around lines 23 - 31, The README's run instructions
assume the user is already in test/e2e-curl; update the text to make the working
directory explicit by adding a step that instructs the user to change into
test/e2e-curl (e.g., "cd test/e2e-curl") before running ./run_all.sh, or
alternatively present repo-root-relative commands that call
test/e2e-curl/run_all.sh; reference the existing run_all.sh examples so readers
know which scripts to run.
test/e2e-curl/run_all.sh (1)

42-50: ⚠️ Potential issue | 🟠 Major

Exit with error when filter arguments match zero tests.

If filter arguments don't match any test prefixes, SELECTED will be empty and the script will report "ALL TESTS PASSED (0/0)", which masks misconfiguration. Add a check after the selection loop:

if [ "${#SELECTED[@]}" -eq 0 ]; then
  echo -e "${RED}ERROR${NC}: no tests matched filters: $*"
  exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/run_all.sh` around lines 42 - 50, After filtering tests into
SELECTED from TESTS, add a post-selection guard that checks if SELECTED is empty
(length zero) and, if so, prints an error including the provided filters ($*)
using the existing color variables (RED and NC) and exits with status 1; place
this check immediately after the else/fi block that populates SELECTED so any
invocation with filters that match zero tests fails fast instead of reporting
0/0 passed.
pkg/services/node_pool.go (2)

337-340: ⚠️ Potential issue | 🟠 Major

Aggregation failure is logged but not returned to caller.

If Upsert succeeds but updateNodePoolStatusFromAdapters fails, the method returns success while the node pool's synthetic conditions remain stale. Without an event/retry mechanism, this inconsistency can persist until another write occurs.

Consider returning the aggregation error so callers can observe the failure and handle it appropriately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 337 - 340, The call to
updateNodePoolStatusFromAdapters currently logs aggregation errors but swallows
them, causing callers (e.g., the Upsert flow) to return success while nodepool
conditions may be stale; change the surrounding method to propagate that
aggregation error: after calling updateNodePoolStatusFromAdapters(nodePoolID,
observedTime, false) check if aggregateErr != nil and return or wrap it
(preserving any prior success/err semantics) instead of only logging; ensure you
use the same error wrapping pattern used elsewhere in this file so callers can
observe and handle the failure.

332-336: ⚠️ Potential issue | 🟠 Major

Consider using the adapter's reported observed_time instead of LastReportTime.

LastReportTime is the API-managed receipt timestamp (when the server received the report), while the adapter's observed_time is when the adapter actually performed its work. For delayed or retried reports, these can differ significantly.

The comment at line 332 says "using the adapter's observed_time" but the code uses LastReportTime. If the spec intends for transition timestamps to reflect when the adapter observed the state (not when the API received the report), this should use adapterStatus.ObservedTime or equivalent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 332 - 336, The code currently sets
observedTime from upsertedStatus.LastReportTime but the spec requires using the
adapter's reported observed_time; change the selection to prefer
adapterStatus.ObservedTime (or the struct field that carries the adapter's
observed_time) when non-nil/zero, falling back to upsertedStatus.LastReportTime
and finally time.Now() if neither is present; update the observedTime assignment
logic (referencing observedTime, upsertedStatus.LastReportTime and
adapterStatus.ObservedTime) and ensure proper nil checks and types when reading
adapterStatus.ObservedTime.
test/e2e-curl/11-unknown-subsequent/test.sh (1)

33-47: ⚠️ Potential issue | 🟠 Major

Assert baseline preconditions before the Unknown event.

The setup POST for ADAPTER1 at line 34 is not verified, and lines 36-47 don't assert that the baseline shows Available=True and Ready=True. If the precondition isn't met, the later "state unchanged" assertions could pass against an incorrect baseline.

Add assertions after capturing the baseline:

assert_eq "baseline Available.status" "True" "$PRE_AVAIL_STATUS"
assert_eq "baseline Ready.status"     "True" "$PRE_READY_STATUS"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/11-unknown-subsequent/test.sh` around lines 33 - 47, After
posting the adapter status with post_adapter_status and capturing the baseline
with get_cluster and condition_field into PRE_AVAIL_STATUS and PRE_READY_STATUS,
add assertions using assert_eq to verify the baseline preconditions
(Available.status == "True" and Ready.status == "True") so the test fails early
if setup didn't produce the expected Ready/Available state; place these
assert_eq checks immediately after the assignments to PRE_AVAIL_STATUS and
PRE_READY_STATUS (i.e., after show_state "baseline" "$BASELINE") to validate the
POST succeeded before continuing.
docs/api-resources.md (2)

553-553: ⚠️ Potential issue | 🟠 Major

last_updated_time documentation for Available is inconsistent with PR objectives.

The current text states "For Available, always the evaluation time." However, the PR objectives specify that Available uses min(LastReportTime) across required adapters, and True→False transitions use the triggering adapter's observed_time. This mismatch will mislead API consumers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-resources.md` at line 553, The docs for last_updated_time are
incorrect for the Available condition: update the description for
`last_updated_time` so that for Available it is the minimum of `LastReportTime`
across all required adapters that report Available=True at the current
generation, and specify that a True→False transition uses the triggering
adapter's `observed_time` (while keeping the Ready semantics as already
described: when Ready=True use min(`last_report_time`) across required adapters
that report Available=True at the current generation, when Ready=False use the
evaluation time).

312-313: ⚠️ Potential issue | 🟡 Minor

observed_generation at creation is inconsistent with examples.

The Lifecycle Events table sets obs_gen to 1 for Creation, but the Cluster and NodePool creation examples (lines 58, 68, 404, 414) show observed_generation: 0. This inconsistency makes the API contract ambiguous for clients.

Either update the table to use obs_gen=0 or update the examples to use observed_generation: 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-resources.md` around lines 312 - 313, The docs show an inconsistency
between the Lifecycle Events table (which lists obs_gen = 1 on Creation) and the
Cluster/NodePool creation examples (which set observed_generation: 0); pick one
convention and make all occurrences match—for example, update the Cluster and
NodePool creation examples (the creation YAML/JSON examples that include
observed_generation) to use observed_generation: 1 so they align with the
Lifecycle Events table (or alternatively change the table rows for Creation to
obs_gen = 0 if you prefer that convention); ensure every example and the table
use the same field name (`observed_generation` / `obs_gen`) and value.
pkg/services/cluster.go (2)

337-340: ⚠️ Potential issue | 🟠 Major

Do not return success when post-upsert aggregation fails.

After upsert succeeds, failing aggregation is only logged. That leaves cluster synthetic conditions stale with no guaranteed retry path.

🛠️ Suggested fix
 if _, aggregateErr := s.updateClusterStatusFromAdapters(ctx, clusterID, observedTime, false); aggregateErr != nil {
 	ctx = logger.WithClusterID(ctx, clusterID)
 	logger.WithError(ctx, aggregateErr).Warn("Failed to aggregate cluster status")
+	return upsertedStatus, aggregateErr
 }
 
 return upsertedStatus, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 337 - 340, After a successful upsert,
the code currently swallows failures from s.updateClusterStatusFromAdapters
(called with clusterID and observedTime) and only logs a warning; instead
propagate the failure so the caller sees the operation as failed. Replace the
log-only branch so that when updateClusterStatusFromAdapters returns an
aggregateErr you attach cluster context (using logger.WithClusterID/WithError)
and return or wrap aggregateErr from the enclosing function (rather than
continuing as success), ensuring callers and retry logic can handle the failed
post-upsert aggregation.

332-336: ⚠️ Potential issue | 🟠 Major

Use the adapter-reported observed timestamp for transition timing.

Line 334 currently derives observedTime from LastReportTime (receipt time). Delayed/retried reports can skew synthetic transition times; aggregation should use the adapter’s reported observed timestamp and only fall back to receipt time/now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 332 - 336, The code sets observedTime
from upsertedStatus.LastReportTime but should prefer the adapter-reported
observed timestamp to avoid skew from delayed receipts; update the logic that
assigns observedTime (currently using observedTime and
upsertedStatus.LastReportTime) to first check for the adapter-reported observed
timestamp field on the status (e.g., upsertedStatus.ObservedTime or the actual
adapter-observed timestamp property on the status struct), use that if present,
otherwise fall back to upsertedStatus.LastReportTime, and finally default to
time.Now(); ensure you reference and use the actual adapter-reported field name
on upsertedStatus in place of the placeholder.
test/e2e-curl/10-stable-false/test.sh (1)

34-48: ⚠️ Potential issue | 🟠 Major

Verify the gen2 precondition before the “re-report” event.

Line 35 drops the setup response, and current baseline checks do not prove ${ADAPTER2}@gen2=True exists before Line 55. The event can become the first gen2 report, so this test may stop covering stable-false re-evaluation.

🛠️ Suggested fix
 log_step "Setup: POST ${ADAPTER2}@gen2=True  (${ADAPTER1} still missing → False@rG2)"
-post_adapter_status "$CLUSTER_ID" "$ADAPTER2" 2 "True" > /dev/null
+CODE=$(post_adapter_status "$CLUSTER_ID" "$ADAPTER2" 2 "True")
+assert_http "initial ${ADAPTER2}@gen2 accepted" "201" "$CODE"
@@
 assert_eq "baseline Available.status" "False" "$(condition_field "$BASELINE" Available status)"
 assert_eq "baseline Available.obsgen" "1"     "$(condition_field "$BASELINE" Available observed_generation)"
 assert_eq "baseline Ready.status"    "False" "$(condition_field "$BASELINE" Ready status)"
+assert_eq "baseline Ready.obsgen"    "2"     "$(condition_field "$BASELINE" Ready observed_generation)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-curl/10-stable-false/test.sh` around lines 34 - 48, The test
currently posts "${ADAPTER2}@gen2=True" with post_adapter_status and discards
the response, then relies on baseline checks that don't prove gen2 has been
reported; change the setup to capture and verify the post result and/or re-poll
the cluster until the gen2 condition is observed before proceeding: call
post_adapter_status "$CLUSTER_ID" "$ADAPTER2" 2 "True" and save its output, then
use get_cluster "$CLUSTER_ID" (or re-run condition_field checks) to assert that
the gen2 report is reflected (e.g., verify the appropriate condition/status or
observed_generation for ADAPTER2 is present) before running the baseline
assertions.
🧹 Nitpick comments (1)
pkg/services/status_aggregation.go (1)

461-464: Align the Ready LUT comment with the implemented behavior.

Line 462 says Ready=False uses evaluation time, but the code now uses min(LastReportTime) across required adapters with fallback to now. Updating this comment will prevent incorrect future changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/status_aggregation.go` around lines 461 - 464, Update the
comment on the Ready LUT in the buildReadyCondition function to reflect the
implemented behavior: when Ready=False, LastUpdatedTime is computed as the
minimum LastReportTime across required adapters with a fallback to now (rather
than always using the evaluation time), and when Ready=True it uses the earliest
adapter report time; adjust the wording to mention min(LastReportTime) and the
fallback to now so future readers won't assume evaluation-time-only semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e-curl/README.md`:
- Around line 55-62: The fenced code block that lists the test directory
structure (the block starting with the line showing "e2e-curl/" and the entries
common.sh, run_all.sh, 01-initial-state/test.sh) is missing a language tag;
update the opening fence from ``` to ```text in the README.md's example block so
the block is tagged as text and satisfies markdownlint MD040.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c09c2e74-99f1-45e3-bcfe-518874f87d0e

📥 Commits

Reviewing files that changed from the base of the PR and between 3ba7df1 and 41963b7.

📒 Files selected for processing (25)
  • docs/api-resources.md
  • pkg/config/loader.go
  • pkg/services/CLAUDE.md
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_aggregation.go
  • pkg/services/status_aggregation_test.go
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/05-mixed-generations/test.sh
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/07-all-adapters-new-gen/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/10-stable-false/test.sh
  • test/e2e-curl/11-unknown-subsequent/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/README.md
  • test/e2e-curl/common.sh
  • test/e2e-curl/run_all.sh
  • test/integration/adapter_status_test.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • test/e2e-curl/12-unknown-first/test.sh
  • pkg/services/CLAUDE.md
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/common.sh
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/services/node_pool.go (1)

33-36: ⚠️ Potential issue | 🟡 Minor

Stale interface documentation contradicts new behavior.

Same issue as in cluster.go: the interface comment still describes accepting Unknown on first report, but the implementation now always discards Unknown Available reports per P3 rule.

📝 Suggested documentation update
 // ProcessAdapterStatus handles the business logic for adapter status:
-// - First report: accepts Unknown Available condition, skips aggregation
-// - Subsequent reports: rejects Unknown Available condition
+// - Discards reports with Available=Unknown (P3 rule)
+// - Discards reports ahead of resource generation (P1 rule)
+// - Validates mandatory conditions (P2 rule)
 // - Otherwise: upserts the status and triggers aggregation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 33 - 36, Update the stale interface
comment for ProcessAdapterStatus in node_pool.go to reflect the current
behavior: remove the note about accepting Unknown Available on the first report
and instead state that Unknown Available reports are always discarded per the P3
rule; keep that the function upserts non-Unknown statuses and triggers
aggregation. Also mirror the same documentation change in cluster.go where the
same outdated comment appears so both ProcessAdapterStatus implementations
describe the new consistent behavior.
pkg/services/cluster.go (1)

33-36: ⚠️ Potential issue | 🟡 Minor

Stale interface documentation contradicts new behavior.

The interface comment states:

// - First report: accepts Unknown Available condition, skips aggregation
// - Subsequent reports: rejects Unknown Available condition

But the implementation (lines 302-310) now always discards Unknown Available reports per P3 rule, regardless of whether it's the first report. The comment should be updated to reflect the current behavior.

📝 Suggested documentation update
 // ProcessAdapterStatus handles the business logic for adapter status:
-// - First report: accepts Unknown Available condition, skips aggregation
-// - Subsequent reports: rejects Unknown Available condition
+// - Discards reports with Available=Unknown (P3 rule)
+// - Discards reports ahead of resource generation (P1 rule)
+// - Validates mandatory conditions (P2 rule)
 // - Otherwise: upserts the status and triggers aggregation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 33 - 36, Update the stale doc comment
for ProcessAdapterStatus to reflect current behavior: instead of saying the
first report accepts Unknown Available and subsequent reports reject it, state
that Unknown Available reports are always discarded (per the P3 rule) and thus
never trigger aggregation; keep the rest of the description about upserting
status and triggering aggregation for non-Unknown reports and reference the
function name ProcessAdapterStatus so reviewers can find and update the comment
accordingly.
♻️ Duplicate comments (2)
pkg/services/cluster.go (1)

337-340: ⚠️ Potential issue | 🟠 Major

Aggregation failure is still swallowed despite being marked as addressed.

The past review flagged that swallowing updateClusterStatusFromAdapters errors leaves the cluster's synthesized conditions stale. The current code still only logs the error and returns success:

if _, aggregateErr := s.updateClusterStatusFromAdapters(...); aggregateErr != nil {
    logger.WithError(ctx, aggregateErr).Warn("Failed to aggregate cluster status")
}
return upsertedStatus, nil

This means the adapter status is persisted but cluster synthetic conditions may be inconsistent. Without a retry mechanism, this inconsistency persists until another write occurs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster.go` around lines 337 - 340, The call to
updateClusterStatusFromAdapters currently logs aggregateErr and continues,
causing synthesized cluster conditions to remain stale; change the behavior in
the function (the block using updateClusterStatusFromAdapters and returning
upsertedStatus) to propagate the aggregation error instead of swallowing
it—i.e., when updateClusterStatusFromAdapters returns a non-nil aggregateErr,
return upsertedStatus together with that error (or wrap it) so callers can
retry/handle the failure, ensuring callers see the failure and do not assume
success.
pkg/services/node_pool.go (1)

337-340: ⚠️ Potential issue | 🟠 Major

Aggregation failure is still swallowed.

Same issue as in cluster.go: the aggregation error is logged but not returned. The adapter status is persisted while nodepool synthetic conditions may remain stale.

if _, aggregateErr := s.updateNodePoolStatusFromAdapters(...); aggregateErr != nil {
    logger.With(ctx, logger.FieldNodePoolID, nodePoolID).
        WithError(aggregateErr).Warn("Failed to aggregate nodepool status")
}
return upsertedStatus, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/node_pool.go` around lines 337 - 340, The aggregation error from
s.updateNodePoolStatusFromAdapters is currently only logged and swallowed;
change the control flow so that when updateNodePoolStatusFromAdapters returns a
non-nil aggregateErr you both log it and propagate it to the caller instead of
returning nil error. Locate the call to s.updateNodePoolStatusFromAdapters in
the node pool reconciliation path (the block that logs "Failed to aggregate
nodepool status") and modify it to return the aggregateErr (after logging) so
synthetic nodePool conditions are not left stale.
🧹 Nitpick comments (1)
test/integration/adapter_status_test.go (1)

435-498: Consider consolidating overlapping test coverage.

TestClusterStatusPost_UnknownAvailableAlwaysDiscarded (lines 435-498) and TestClusterStatusPost_UnknownAvailableNeverStored (lines 878-941) test very similar scenarios - both verify that Unknown Available reports are discarded with 204 and not stored. The main difference is the adapter name used.

This is not blocking, but consolidating or differentiating these tests more clearly would improve maintainability.

Also applies to: 878-941

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/adapter_status_test.go` around lines 435 - 498, The two
tests TestClusterStatusPost_UnknownAvailableAlwaysDiscarded and
TestClusterStatusPost_UnknownAvailableNeverStored overlap by asserting that
reports with Available=Unknown are discarded (204) and not persisted;
consolidate them by removing one and ensuring the remaining test covers both
first and subsequent reports for multiple adapters or, alternatively, make their
intent distinct by renaming one and adding an explicit differentiator (e.g.,
different adapter lifecycle or preexisting stored status) so each exercises a
unique code path; update references to the helper newAdapterStatusRequest and
calls to PostClusterStatusesWithResponse/GetClusterStatusesWithResponse
accordingly to keep coverage and avoid duplicate assertions.
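The suggested consolidation can take a table-driven shape, where each case varies only the adapter name and whether a prior status exists. This is a structural sketch only (the case fields and names are hypothetical; the real test would POST Available=Unknown and assert 204 plus no stored row per case):

```go
package main

import "fmt"

// discardCase parameterizes one Unknown-Available discard scenario.
type discardCase struct {
	name        string
	adapter     string
	priorStatus bool // a non-Unknown status already stored for the adapter
}

func main() {
	cases := []discardCase{
		{"first report unknown", "adapter-a", false},
		{"subsequent report unknown", "adapter-b", true},
	}
	for _, c := range cases {
		// Each iteration would run the shared POST/GET assertions;
		// here we only print the plan for illustration.
		fmt.Printf("%s: adapter=%s prior=%v expect=discard\n", c.name, c.adapter, c.priorStatus)
	}
}
```

A single loop body keeps the discard assertions in one place, so the two code paths (first vs. subsequent report) stay covered without duplicated test bodies.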
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e-curl/common.sh`:
- Line 6: The header comment in common.sh uses HYPERFLEET_CLUSTER_ADAPTERS but
the check_server() function prints HYPERFLEET_ADAPTERS_REQUIRED_CLUSTER, causing
a docs mismatch; update the header comment to reference the same environment
variable name used by check_server() or list both aliases
(HYPERFLEET_CLUSTER_ADAPTERS and HYPERFLEET_ADAPTERS_REQUIRED_CLUSTER) so they
are consistent and unambiguous for users, and ensure any examples match the
variable name used in the check_server() output.

---

Outside diff comments:
In `@pkg/services/cluster.go`:
- Around line 33-36: Update the stale doc comment for ProcessAdapterStatus to
reflect current behavior: instead of saying the first report accepts Unknown
Available and subsequent reports reject it, state that Unknown Available reports
are always discarded (per the P3 rule) and thus never trigger aggregation; keep
the rest of the description about upserting status and triggering aggregation
for non-Unknown reports and reference the function name ProcessAdapterStatus so
reviewers can find and update the comment accordingly.

In `@pkg/services/node_pool.go`:
- Around line 33-36: Update the stale interface comment for ProcessAdapterStatus
in node_pool.go to reflect the current behavior: remove the note about accepting
Unknown Available on the first report and instead state that Unknown Available
reports are always discarded per the P3 rule; keep that the function upserts
non-Unknown statuses and triggers aggregation. Also mirror the same
documentation change in cluster.go where the same outdated comment appears so
both ProcessAdapterStatus implementations describe the new consistent behavior.

---

Duplicate comments:
In `@pkg/services/cluster.go`:
- Around line 337-340: The call to updateClusterStatusFromAdapters currently
logs aggregateErr and continues, causing synthesized cluster conditions to
remain stale; change the behavior in the function (the block using
updateClusterStatusFromAdapters and returning upsertedStatus) to propagate the
aggregation error instead of swallowing it—i.e., when
updateClusterStatusFromAdapters returns a non-nil aggregateErr, return
upsertedStatus together with that error (or wrap it) so callers can retry/handle
the failure, ensuring callers see the failure and do not assume success.

In `@pkg/services/node_pool.go`:
- Around line 337-340: The aggregation error from
s.updateNodePoolStatusFromAdapters is currently only logged and swallowed;
change the control flow so that when updateNodePoolStatusFromAdapters returns a
non-nil aggregateErr you both log it and propagate it to the caller instead of
returning nil error. Locate the call to s.updateNodePoolStatusFromAdapters in
the node pool reconciliation path (the block that logs "Failed to aggregate
nodepool status") and modify it to return the aggregateErr (after logging) so
synthetic nodePool conditions are not left stale.

---


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 91a97282-e828-4c7b-8bed-4c3adf7c0b31

📥 Commits

Reviewing files that changed from the base of the PR and between 41963b7 and 5247788.

📒 Files selected for processing (25)
  • docs/api-resources.md
  • pkg/config/loader.go
  • pkg/services/CLAUDE.md
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_aggregation.go
  • pkg/services/status_aggregation_test.go
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/02-partial-adapters/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/05-mixed-generations/test.sh
  • test/e2e-curl/06-stale-report/test.sh
  • test/e2e-curl/07-all-adapters-new-gen/test.sh
  • test/e2e-curl/08-adapter-goes-false/test.sh
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/10-stable-false/test.sh
  • test/e2e-curl/11-unknown-subsequent/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/README.md
  • test/e2e-curl/common.sh
  • test/e2e-curl/run_all.sh
  • test/integration/adapter_status_test.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • test/e2e-curl/11-unknown-subsequent/test.sh
  • test/e2e-curl/05-mixed-generations/test.sh
  • test/e2e-curl/run_all.sh
  • pkg/services/node_pool_test.go
  • pkg/services/CLAUDE.md
  • test/e2e-curl/04-generation-bump/test.sh
  • test/e2e-curl/07-all-adapters-new-gen/test.sh
  • test/e2e-curl/01-initial-state/test.sh
  • test/e2e-curl/12-unknown-first/test.sh
  • test/e2e-curl/09-stable-true/test.sh
  • test/e2e-curl/03-all-adapters-ready/test.sh
  • test/e2e-curl/06-stale-report/test.sh

Collaborator @xueli181114 left a comment

Code review: the Unknown discard change (P3) introduces a behavioral difference in the gen-bump scenario that could leave Available frozen at True when it should go False. See inline comment for details.

	ctx = logger.WithClusterID(ctx, clusterID)
	logger.Info(ctx, fmt.Sprintf("Discarding adapter status update from %s: Available=Unknown reports are not processed",
		adapterStatus.Adapter))
	return nil, nil
Collaborator

Uncovered scenario: gen bump + Unknown@gen2 = Available frozen at True

Discarding Unknown means the adapter's previous status (e.g. True@gen1) stays in the DB. This creates a behavioral difference vs the old code when a generation bump is involved:

Setup: Adapters [A, B], both True@gen1 → Available=True, Ready=True
Event: Generation bumps to gen2
       A reports Unknown@gen2 → discarded here, A stays True@gen1 in DB
       B reports True@gen2   → stored, triggers aggregation

New code: scanAdapterAvailability sees A@gen1 vs B@gen2 → consistent=false → Available frozen at True@gen1.

Old code — Unknown@gen2 would have been stored (first report at gen2). Aggregation sees A@gen2(Unknown) vs B@gen2 → consistent=true, allAvailable=false → Available set to False.

The cluster appears Available=True when one adapter is actually in an uncertain state at the new generation. This persists until A sends a definitive True/False at gen2.

Suggested test to add:

// TestUnknownAtNewGen_AvailableFrozen verifies that after a generation bump,
// if one adapter reports Unknown@gen2 (discarded) and the other reports True@gen2,
// Available is correctly frozen (not actively set to False) and Ready=False.
func TestUnknownAtNewGen_AvailableFrozen(t *testing.T) {
    // 1. Both adapters True@gen1 → Available=True, Ready=True
    // 2. Bump generation to 2
    // 3. Adapter A reports Unknown@gen2 → discarded (204)
    // 4. Adapter B reports True@gen2 → stored, aggregation runs
    // 5. Assert Available still True@gen1 (frozen), Ready=False
    //    — or decide this should be Available=False and fix the code
}

The question for the author: is "frozen at True" the intended behavior here, or should the code detect that a required adapter hasn't given a definitive answer at the new generation and actively set Available=False?

Contributor

Old code - only when there is no stored adapter status is the Unknown status stored; that means for an adapter deployed from the beginning, you can only ever see Unknown@gen1 and should never see Unknown@gen2.

Contributor Author

"Available frozen at True at an old generation" is the correct behavior. It means:

  • The resource was True for that generation, and that remained the case up to observed_date.

Getting Unknown does not provide any additional information.
Getting False, on the other hand, means that the resource is not Available as of some later generation.

// viper key -> ordered list of env var names (first one set wins)
clusterEnvVars := []string{
	EnvPrefix + "_ADAPTERS_REQUIRED_CLUSTER",
	EnvPrefix + "_CLUSTER_ADAPTERS", // alias for user convenience
Contributor @yasun1 commented on Mar 18, 2026

I previously migrated from HYPERFLEET_CLUSTER_ADAPTERS to HYPERFLEET_ADAPTERS_REQUIRED_CLUSTER for better clarity. Is there a strong reason to maintain two env variables for the same config? It also raises the question of what happens when both are set.

Contributor Author

This seems to be an error on my side when rebasing. I'm restoring the code you introduced.

numReady := 0
for _, name := range requiredAdapters {
	info, exists := adapterMap[name]
	if !exists || (resourceGeneration > 0 && info.observedGeneration != resourceGeneration) {
Contributor

Do we need resourceGeneration > 0, given that the default value is 1?

Contributor Author

Not needed; I'll simplify it.

// - P3: discards if Available == Unknown (not processed per spec)
//
// Otherwise: upserts the status and triggers aggregation.
// Returns (nil, nil) for discarded/rejected updates.
Contributor

Maybe defining a custom error type ErrDiscarded (or an isDiscarded helper) would be better than returning nil.

Contributor Author

This function is invoked just once; adding it would IMO (after chatting with the LLM) introduce more complexity than it is worth.
