Skip to content

feat(web-next): align NAAP dashboard and BFF with OpenAPI v1 (streaming + requests)#270

Merged
eliteprox merged 12 commits into
mainfrom
feat/dashboard-data-updates
Apr 27, 2026
Merged

feat(web-next): align NAAP dashboard and BFF with OpenAPI v1 (streaming + requests)#270
eliteprox merged 12 commits into
mainfrom
feat/dashboard-data-updates

Conversation

@eliteprox
Copy link
Copy Markdown
Contributor

@eliteprox eliteprox commented Apr 17, 2026

Summary

This PR aligns the web BFF and related resolvers with the repo OpenAPI v1 contract: dashboard data is sourced from streaming and requests resources where specified, merged where the UI needs a single view, and cached consistently. It also tightens dashboard window handling and timeouts for large ranges, fixes orchestrator listing by unioning dashboard/orchestrators with inventory and merging serviceUri, documents warm routes against the spec, and updates the developer-api network-models path to use NAAP_API_SERVER_URL (no hardcoded NAAP base) and merged streaming/models + requests/models.

Changes

  • Document BFF/NAAP warm routes and point them at openapi.yaml / upstream semantics.
  • Developer API: network models merged from streaming/models + requests/models (+ catalog where applicable); upstream base from NAAP_API_SERVER_URL only.
  • Facade: perf-by-model backed by streaming/models with a single in-process + SWR cache key (24h semantics per OpenAPI).
  • Dashboard: hooks and parsers for combined KPI/pipelines payloads; FPS column uses streaming/models avg_fps with header tooltip (24h, live-video scope); KPI/pipelines use formatDashboardWindow and scaled upstream timeouts for large windows.
  • Network data: merged streaming + requests model rows where used.
  • Orchestrators: merge dashboard SLA rows (preserve order) with streaming + requests inventory; merge serviceUri into uris; pipelines/pricing resolvers updated for OpenAPI v1 shapes.

Type

  • Feature
  • Bug fix
  • Refactor
  • Documentation
  • CI / Tooling
  • Plugin (new or update)
  • Dependencies

Plugin(s) Affected

  • plugins/developer-api (backend network-models / NAAP upstream env)

Checklist

  • Tests pass locally
  • Lint passes (npm run lint)
  • Build succeeds (npm run build)
  • No new lint warnings introduced
  • Breaking changes documented below
  • Database migration included (if Prisma schema changed)

Breaking Changes

None for consumers of the public dashboard JSON shapes beyond what is already required by the OpenAPI v1 migration (e.g. NAAP_API_SERVER_URL must be set for developer-api network-models when calling NAAP).

Screenshots / Recordings

Optional: Overview Pipelines & GPUs (FPS column + info tooltip) and Orchestrators table length before/after if you want visual proof.

Summary by CodeRabbit

  • New Features

    • Added informational tooltips to pipeline FPS column for metric provenance.
    • Enhanced job feed polling with adaptive retry logic and cache-busting for improved resilience.
  • Changes

    • Renamed KPI metric from "orchestratorsOnline" to "orchestratorsObserved".
    • Success Rate KPI now displays with one decimal precision.
    • Pipeline FPS values display with one decimal precision.
    • Pipeline table default sort order changed from model name to GPU count.
    • Dashboard timeframe selector maximum reduced from 7 days to 48 hours.
    • Improved FPS metric calculation for greater accuracy.

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
naap-platform Ready Ready Preview, Comment Apr 27, 2026 9:56pm

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 17, 2026

Warning

Rate limit exceeded

@eliteprox has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 24 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99778e4a-7dca-42f8-b80d-02f7f562d2b3

📥 Commits

Reviewing files that changed from the base of the PR and between 0a8e88c and a69625f.

📒 Files selected for processing (3)
  • apps/web-next/src/hooks/useJobFeedStream.ts
  • apps/web-next/src/lib/facade/resolvers/orchestrators.ts
  • plugins/developer-api/backend/src/server.ts
📝 Walkthrough

Walkthrough

This PR consolidates dashboard data sources by fetching from merged NAAP streaming and requests endpoints instead of legacy single endpoints. It renames the orchestrators KPI metric from orchestratorsOnline to orchestratorsObserved, caps timeframe options at 48 hours, introduces request-aware data types for KPI and pipelines, and refactors multiple resolvers to aggregate multi-source data with updated caching strategies.

Changes

Cohort / File(s) Summary
KPI Field Rename
packages/types/src/index.ts, packages/plugin-sdk/src/contracts/dashboard.ts, packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts, apps/web-next/src/content/docs/guides/dashboard-data-provider.mdx, apps/web-next/src/lib/facade/stubs.ts, plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts
Updates orchestratorsOnline to orchestratorsObserved across schema, types, tests, documentation, and stub data.
Dashboard Data Types & Contract Expansion
packages/plugin-sdk/src/contracts/dashboard.ts, packages/plugin-sdk/src/contracts/index.ts, packages/plugin-sdk/src/index.ts
Adds DashboardKPIWithRequests, DashboardPipelinesWithRequests, and related job-stat types; updates resolver contracts to accept extended return types.
Facade Resolvers — KPI & Pipelines
apps/web-next/src/lib/facade/resolvers/kpi.ts, apps/web-next/src/lib/facade/resolvers/pipelines.ts, apps/web-next/src/lib/facade/resolvers/perf-by-model.ts
Refactors KPI resolver to fetch from single /v1/dashboard/kpi endpoint instead of combining multiple sources; pipelines now returns object with streaming and requests fields; perf-by-model sources FPS from streaming/models instead of legacy perf endpoint.
Facade Resolvers — Orchestrators & Network Data
apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts, apps/web-next/src/lib/facade/resolvers/orchestrators.ts, apps/web-next/src/lib/facade/resolvers/net-capacity.ts
Consolidates net-orchestrators to fetch from /v1/streaming/orchestrators and /v1/requests/orchestrators with new merge logic; orchestrators resolver caps timeframe and merges service URIs; net-capacity sources warm-orch counts from raw network models.
Facade Resolvers — Pricing & GPU Capacity
apps/web-next/src/lib/facade/resolvers/pricing.ts, apps/web-next/src/lib/facade/resolvers/gpu-capacity.ts
Pricing resolver now handles dual payload shapes (legacy aggregated vs. per-orchestrator) with runtime type detection and aggregation; GPU capacity uses new formatDashboardWindow helper for upstream query parameter.
Facade Helpers & Utilities
apps/web-next/src/lib/facade/index.ts, apps/web-next/src/lib/facade/dashboard-window.ts, apps/web-next/src/lib/facade/upstream-parse.ts, apps/web-next/src/lib/facade/network-data.ts
Introduces formatDashboardWindow and dashboardUpstreamTimeoutMs helpers; new upstream-parse.ts module with payload shape detection and parsing; network-data now aggregates from streaming/models and requests/models with pricing merge.
Timeframe Management
apps/web-next/src/lib/dashboard/overview-timeframe.ts
Introduces OVERVIEW_TIMEFRAME_MAX_HOURS constant (48), removes 72h and 7d options, marks 48h as maximum in selector.
Dashboard Routes
apps/web-next/src/app/api/v1/dashboard/orchestrators/route.ts, apps/web-next/src/app/api/v1/network/perf-by-model/route.ts, apps/web-next/src/app/api/internal/bff-warm/route.ts, apps/web-next/src/app/api/v1/naap-api/warm/route.ts, apps/web-next/src/app/api/v1/developer/network-models/route.ts
Orchestrators route derives cache key from effective upstream window; perf-by-model route uses constant cache key; developer network-models route fetches from streaming/requests instead of legacy net/models; route comments updated to reference new data sources.
UI Components & Hooks
apps/web-next/src/components/dashboard/overview-content.tsx, apps/web-next/src/hooks/usePublicDashboard.ts, apps/web-next/src/app/(dashboard)/dashboard/page.tsx, apps/web-next/src/hooks/useJobFeedStream.ts
Dashboard page queries new orchestratorsObserved KPI field; overview-content updates FPS handling to use only perf-based values, adds info tooltip for FPS provenance, changes default sort to GPUs descending; usePublicDashboard handles new type signatures and normalizes pipelines response shape; job feed polling adds cache-busting and adaptive retry logic.
Plugin System & Data Provider
plugins/dashboard-data-provider/frontend/src/provider.ts, plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts, plugins/developer-api/backend/src/server.ts, packages/plugin-sdk/src/types/network-model.ts
Data provider fetches pipelines and normalizes response shape; developer API endpoint derives model inventory from streaming/requests; plugin SDK type docs updated to reflect new data sources; tests updated for KPI field rename.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BFF as Web Server<br/>(Facade)
    participant StreamModels as NAAP:<br/>streaming/models
    participant RequestModels as NAAP:<br/>requests/models
    participant Pricing as NAAP:<br/>dashboard/pricing
    participant Cache as Redis Cache

    Client->>BFF: GET /api/v1/network/models
    
    alt Cache Hit
        BFF->>Cache: lookup networkModels
        Cache-->>BFF: cached payload
    else Cache Miss
        par Concurrent Fetch
            BFF->>+StreamModels: GET /v1/streaming/models
            BFF->>+RequestModels: GET /v1/requests/models
        and
            BFF->>+Pricing: GET /v1/dashboard/pricing
        end
        
        StreamModels-->>-BFF: rows (pipeline, model, warm_orch_count...)
        RequestModels-->>-BFF: rows (pipeline, model, gpu_slots...)
        Pricing-->>-BFF: pricing rows (min/max/avg per key)
        
        Note over BFF: Merge by pipeline:model key<br/>Sum capacity fields<br/>Aggregate pricing
        
        BFF->>Cache: store aggregated models
    end
    
    BFF-->>Client: NetworkModel[] with pricing
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Suggested reviewers

  • seanhanca
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the primary goal: aligning NAAP dashboard and BFF with OpenAPI v1 to use streaming and requests data sources.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/dashboard-data-updates

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the size/XL Extra large PR (500+ lines) label Apr 17, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 17, 2026

⚠️ This PR is very large (1693 lines changed). Please split it into smaller, focused PRs if possible.

…PI compliance

- Enhanced BFF warm route documentation to reference the upstream OpenAPI contract.
- Updated developer network models endpoint to merge streaming and requests models.
- Refined NAAP API warm route comments to clarify data sources.
- Adjusted performance by model resolver to utilize a single cache key for streaming models.
- Improved dashboard hooks to handle new data structures and ensure compatibility with requests.
- Introduced new utility functions for dashboard window formatting and timeout management.
- Merged streaming and requests model data in the network data layer for better consistency.
- Updated orchestrator and pipeline resolvers to reflect changes in data fetching and merging logic.
- Enhanced pricing resolver to accommodate new OpenAPI v1 structure and ensure accurate pricing data representation.
- Introduced a new utility function to determine the effective upstream window for orchestrators based on the requested period.
- Updated the orchestrator cache key to utilize the effective window for improved data accuracy.
- Simplified the KPI resolver by removing unnecessary net orchestrator data fetching and directly returning parsed KPI data.
- Enhanced documentation to clarify the behavior of the orchestrator endpoint and its limitations on timeframes.
…ed performance

- Changed default sorting in PipelinesCard from 'model' to 'gpus' to prioritize busy models.
- Introduced OVERVIEW_TIMEFRAME_MAX_HOURS constant to cap timeframe options and ensure consistent handling across various resolvers.
- Updated KPI and pipeline resolvers to utilize the new normalization function for timeframe parsing, enhancing data accuracy and performance.
@eliteprox
Copy link
Copy Markdown
Contributor Author

eliteprox commented Apr 21, 2026

Two issues remain blockers for this PR.

** partial blocker ** workaround added in @53bb1cea2d121a864580b268dc5b464e87f75514 to query only past 48 hours

  1. /dashboard/kpi query is hitting limits
    curl -X 'GET'
    'https://naap-api.cloudspe.com/v1/dashboard/kpi?window=7d'
    -H 'accept: application/json'
  • < 24h ~3 seconds or less
  • 24h takes ~5 seconds
  • 48h takes ~15-20 seconds
  • 7d never succeeds
  1. /dashboard/kpi, the field orchestratorsOnline is always the same even if a 1h query or a 48h query, returned 84 orchestrators every time. This is fine if it's intended, but I will need to update UI a little to reflect that.
    "orchestratorsOnline": {
    "value": 84,
    "delta": 0
    },

  2. Additionally, we need to be able to find the 84 orchestrators and fill in the Orchestrators list table. In staging, the queries are failing to return all of them, only able to see about 33 of them currently

@eliteprox eliteprox marked this pull request as ready for review April 23, 2026 16:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/web-next/src/components/dashboard/overview-content.tsx (1)

798-852: ⚠️ Potential issue | 🟡 Minor

Keep missing FPS values last when sorting descending.

Now that missing FPS is null, descending FPS sort puts missing rows first because missingLast() returns 1 and the final comparator reverses it.

Proposed comparator fix
     let c = 0;
+    const applyDirection = (inner: number): number => (sortDir === 'asc' ? inner : -inner);
     const missingLast = (aMissing: boolean, bMissing: boolean, inner: number): number => {
       if (aMissing && bMissing) return 0;
       if (aMissing) return 1;
       if (bMissing) return -1;
-      return inner;
+      return applyDirection(inner);
     };
     switch (sortCol) {
       case 'model':
-        c = a.localeCompare(b);
+        c = applyDirection(a.localeCompare(b));
         break;
       case 'gpus':
-        c = ra.gpus - rb.gpus;
+        c = applyDirection(ra.gpus - rb.gpus);
         break;
@@
     }
-    if (c !== 0) return sortDir === 'asc' ? c : -c;
+    if (c !== 0) return c;
     return a.localeCompare(b);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/components/dashboard/overview-content.tsx` around lines 798
- 852, The comparator puts missing FPS rows first when sorting descending
because missingLast currently always returns 1 for aMissing and -1 for bMissing
and then the final sortDir reversal flips that; update missingLast (used in the
sort block) to take sortDir into account (capture the outer sortDir or add a
parameter) so it returns 1 for aMissing only when sortDir === 'asc' and -1 when
sortDir === 'desc' (and the inverse for bMissing), then keep the existing final
reversal logic; this ensures cases like the 'fps' branch (ra.modelFps /
rb.modelFps) keep missing values last regardless of asc/desc.
packages/plugin-sdk/src/contracts/dashboard.ts (1)

70-79: ⚠️ Potential issue | 🟡 Minor

Add plugin-sdk CHANGELOG entry for breaking KPI.orchestratorsObserved rename.

The field rename from orchestratorsOnlineorchestratorsObserved is already complete in both GraphQL schema and TypeScript contracts. However, plugin-sdk has no CHANGELOG documenting this breaking change for downstream consumers. Add a semver-major entry (or breaking-change note) to track this contract change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/plugin-sdk/src/contracts/dashboard.ts` around lines 70 - 79, Add a
semver-major breaking-change entry to the plugin-sdk CHANGELOG documenting the
field rename orchestratorsOnline → orchestratorsObserved in the GraphQL contract
and TypeScript types; mention the affected contract/type
(KPI.orchestratorsObserved) and that downstream consumers must update
references, include the version bump or "BREAKING CHANGE" header, and provide a
short migration note showing the old and new field names so consumers can find
and update usages.
🧹 Nitpick comments (7)
docs/pymthouse-minimal-design.md (2)

341-355: Consider stale-if-error directive for improved discovery resilience.

The error handling strategy uses Cache-Control: public, max-age=0, s-maxage=5, stale-while-revalidate=0, which provides fail-closed behavior (no stale data on error). This is conservative and appropriate for preventing clients from caching error states.

However, line 173 states: "If NaaP discovery is temporarily unavailable, clients receive a 503..." — emphasizing temporary unavailability. For transient failures, allowing stale discovery data may provide better user experience than hard failure.

Consider whether the stale-if-error directive would improve resilience:

Cache-Control: public, max-age=0, s-maxage=1800, stale-if-error=3600

This would allow CDN/clients to serve up to 1-hour-old discovery data if NaaP returns a 503, reducing the blast radius of transient infrastructure issues while still preventing indefinite stale data serving.

Trade-off: slightly stale orchestrator data vs. complete discovery unavailability during incidents.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/pymthouse-minimal-design.md` around lines 341 - 355, Update the Error
Handling Cache-Control recommendation to include a stale-if-error directive for
transient failures: in the "Error Handling" section where the header is
currently specified as "Cache-Control: public, max-age=0, s-maxage=5,
stale-while-revalidate=0" (and the sentence "If NaaP discovery is temporarily
unavailable, clients receive a `503`..."), change the suggested header to
include stale-if-error (for example "public, max-age=0, s-maxage=1800,
stale-if-error=3600") and add a short note explaining that this allows
CDNs/clients to serve recent discovery data on 503s to improve resilience while
accepting limited staleness.

280-282: Add language identifier to fenced code block.

The fenced code block on line 280 lacks a language specifier, which can affect syntax highlighting and accessibility in some markdown renderers.

🎨 Suggested fix
-```
+```http
 GET /api/v1/network/discover-orchestrators

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/pymthouse-minimal-design.md around lines 280 - 282, The fenced code
block containing the HTTP request "GET /api/v1/network/discover-orchestrators"
is missing a language identifier; update the opening fence to include the
language (e.g., use http) so the block becomes fenced with http to enable
proper syntax highlighting and accessibility for the GET
/api/v1/network/discover-orchestrators example.


</details>

</blockquote></details>
<details>
<summary>plugins/developer-api/backend/src/server.ts (1)</summary><blockquote>

`400-406`: **Brittle error-to-status mapping via substring match.**

Matching `error.message.includes('NAAP_API_SERVER_URL')` couples control flow to the error's wording. A future message tweak silently downgrades 503 to 502. Prefer a typed error (or sentinel) thrown from `getNaapApiServerBase`.

<details>
<summary>Suggested approach</summary>

```diff
+class NaapConfigError extends Error {}
 function getNaapApiServerBase(): string {
   const raw = process.env.NAAP_API_SERVER_URL?.trim();
   if (!raw) {
-    throw new Error(
+    throw new NaapConfigError(
       'NAAP_API_SERVER_URL is not set. Set it to the NAAP OpenAPI base URL (including /v1). See repo root .env.example.',
     );
   }
   return raw.replace(/\/+$/, '');
 }
@@
-    if (error instanceof Error && error.message.includes('NAAP_API_SERVER_URL')) {
-      return res.status(503).json({ error: error.message });
-    }
+    if (error instanceof NaapConfigError) {
+      return res.status(503).json({ error: error.message });
+    }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@plugins/developer-api/backend/src/server.ts` around lines 400 - 406, The
catch block in server.ts should not rely on error.message.includes; modify
getNaapApiServerBase to throw a specific Error subclass or sentinel (e.g.,
NaapConfigError) when the NAAP API server URL is missing/invalid, and then in
the catch branch replace the substring check with an instanceof check against
that class (or compare the sentinel) to return 503; leave other errors returning
502. Update references to getNaapApiServerBase and the catch block handling to
use the new NaapConfigError type.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/lib/facade/dashboard-window.ts (1)</summary><blockquote>

`24-32`: **Redundant branch in `dashboardUpstreamTimeoutMs`.**

The `hours <= 72` branch and the fallback both return `55_000`, so the intermediate check is dead. Either collapse to a single non-`<=24` branch, or differentiate (e.g., cap > 72h to a smaller value or reject) to make intent explicit.

<details>
<summary>Proposed simplification</summary>

```diff
 export function dashboardUpstreamTimeoutMs(hours: number): number {
   if (hours <= 24) {
     return 30_000;
   }
-  if (hours <= 72) {
-    return 55_000;
-  }
   return 55_000;
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/dashboard-window.ts` around lines 24 - 32, The
function dashboardUpstreamTimeoutMs has a redundant branch: both the hours <= 72
case and the final fallback return 55_000; simplify by collapsing the branches
so that dashboardUpstreamTimeoutMs(hours) returns 30_000 when hours <= 24 and
55_000 otherwise (or, if intended, add distinct handling for >72h), updating the
implementation of dashboardUpstreamTimeoutMs to remove the dead hours <= 72
check and make the intent explicit.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/app/api/v1/network/perf-by-model/route.ts (1)</summary><blockquote>

`47-54`: **Intentional single cache key — consider noting start/end are contract-only.**

Since the upstream `streaming/models` resolver ignores `start`/`end` and always returns a 24h weighted average, the single SWR key is correct. The inline comment already explains this clearly. You may also want to consider either removing the `start`/`end` query param requirement or keeping a clearer doc note that they're contract-only (not data-affecting) so API consumers don't assume time-window filtering. No change required.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/app/api/v1/network/perf-by-model/route.ts` around lines 47
- 54, The current code uses a single SWR cache key
'perf-by-model:streaming-models' with bffStaleWhileRevalidate and getPerfByModel
while upstream ignores start/end; update the inline comment or API doc in this
file to state explicitly that the query params start and end are contract-only
and do not affect returned data (or optionally remove start/end from the route
signature), and reference the bffStaleWhileRevalidate call, the cacheKey
'perf-by-model:streaming-models', and getPerfByModel to make clear where this
behavior originates.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/lib/facade/resolvers/orchestrators.ts (1)</summary><blockquote>

`64-74`: **`mergeOrchestratorServiceUris` dedup is exact-match; inventory URIs aren't normalized.**

`out.includes(extra)` compares the raw strings. If inventory has `https://orch.example.com` and dashboard returns `https://orch.example.com/` (or different case in the host), both will be kept. Minor, but since this now feeds the `uris` list that drives row-filtering via `hasNonBlankServiceUri`, a small normalization (trim + strip trailing `/`) would reduce UI noise.

<details>
<summary>♻️ Optional normalization</summary>

```diff
 function mergeOrchestratorServiceUris(netUris: string[], dash: ApiOrchestrator): string[] {
-  const out = [...netUris];
+  const norm = (u: string) => u.trim().replace(/\/+$/, '');
+  const seen = new Set(netUris.map(norm));
+  const out = [...netUris];
   const extra =
     (typeof dash.serviceUri === 'string' && dash.serviceUri.trim()) ||
     (typeof dash.service_uri === 'string' && dash.service_uri.trim()) ||
     '';
-  if (extra && !out.includes(extra)) {
+  if (extra && !seen.has(norm(extra))) {
     out.push(extra);
   }
   return out;
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 64 -
74, mergeOrchestratorServiceUris currently deduplicates by exact string match so
variants like "https://orch.example.com" and "https://orch.example.com/" both
appear; update the function to normalize URIs before comparison by trimming
whitespace and removing a single trailing slash (e.g., normalize both existing
entries in out and the extracted extra from dash.serviceUri / dash.service_uri),
compare using these normalized forms, and only push the extra (preferably the
normalized form) if no normalized match exists; reference
mergeOrchestratorServiceUris and the dash.serviceUri / dash.service_uri
extraction when making the change.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/lib/facade/upstream-parse.ts (1)</summary><blockquote>

`22-41`: **Minor: fallback when `streaming` key exists but is nullish loses `requests`.**

If upstream ever returns `{ streaming: null, requests: {...} }`, the current guard `o.streaming && typeof o.streaming === 'object'` fails and you fall through to `{ kpi: body as DashboardKPI }`, casting the wrapper itself as KPI and dropping the `requests` breakdown. Harmless given current API shape, but small hardening:

<details>
<summary>♻️ Defensive fallback</summary>

```diff
 export function parseDashboardKpiBody(body: unknown): {
   kpi: DashboardKPI;
   requests?: DashboardJobsOverview;
 } {
   if (body && typeof body === 'object' && 'streaming' in body) {
     const o = body as { streaming?: DashboardKPI; requests?: DashboardJobsOverview };
     if (o.streaming && typeof o.streaming === 'object') {
       return {
         kpi: o.streaming,
         requests: o.requests,
       };
     }
+    // Wrapper shape present but `streaming` missing/null: preserve requests if any.
+    return { kpi: {} as DashboardKPI, requests: o.requests };
   }
   return { kpi: body as DashboardKPI };
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/upstream-parse.ts` around lines 22 - 41, The
guard in parseDashboardKpiBody drops requests when body has a streaming key
that's nullish; change the logic in parseDashboardKpiBody to detect the presence
of the streaming key (using 'streaming' in body) and when present return { kpi:
o.streaming as DashboardKPI, requests: o.requests } so requests are preserved
even if o.streaming is null, and keep parseDashboardKpiWithRequests unchanged to
merge requests when present.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @apps/web-next/src/components/dashboard/overview-content.tsx:

  • Line 487: The label currently uses the timeframe variable tfLabel making the
    orchestrator count appear timeframe-scoped; update the tile invocation that
    renders Server (tile(Server, Orchestrators (${tfLabel}),
    data.orchestratorsObserved.value)) to a timeframe-agnostic label such as
    "Orchestrators (observed)" or "Orchestrators (current)" — replace the template
    string Orchestrators (${tfLabel}) with the new static label while leaving the
    Server and data.orchestratorsObserved.value arguments unchanged.
  • Line 1443: The current JSX in the that renders FPS hides valid outputFps
    when inputFps is missing; update the conditional to show outputFps whenever it
    exists. Specifically, modify the expression in the (the one referencing
    job.inputFps and job.outputFps) to: if job.outputFps != null then render
    ${job.inputFps != null ? job.inputFps.toFixed(1) : '—'} / ${job.outputFps.toFixed(1)}, otherwise render '—'; this ensures outputFps is
    displayed even when inputFps is null while preserving the existing formatting
    and fallback.

In @apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts:

  • Around line 157-171: The two functions fetchStreamingOrchestrators and
    fetchRequestsOrchestrators currently call naapGet without pagination and
    therefore only return the first page; fix both by reintroducing a page size
    (e.g. NET_ORCHESTRATORS_PAGE_SIZE) and implementing a paginated fetch loop that
    calls naapGet repeatedly with pagination parameters (page/limit or offset/cursor
    depending on naapGet API), concatenates results into a single
    StreamingOrchestratorRow[] / RequestsOrchestratorRow[] and stops when a page
    returns fewer items than the page size (or no cursor is returned); apply the
    same pagination change for the other calls noted around lines 229–273 so all
    orchestrator records are collected before returning.

In @apps/web-next/src/lib/facade/resolvers/orchestrators.ts:

  • Around line 13-19: Update the stale header comment to reflect the actual
    upstream value ranges: state that effectiveSuccessRate and noSwapRatio are
    provided in 0–1 (and are scaled via pct()), while slaScore and successRatio are
    provided in 0–100 (slaScore is used via Math.round(dash.slaScore) and
    successRatio is handled as Math.round(dash.successRatio * 10) / 10); remove the
    assertion that slaScore and successRatio are 0–1 and note that dashboard
    endpoints are capped at 24h as before.

In @docs/pymthouse-minimal-design.md:

  • Around line 254-270: The doc's Tenant Model mentions client_id RLS but doesn't
    list which DB tables are protected; add a new subsection under the "Tenant
    Model" heading that explicitly enumerates the tables with PostgreSQL RLS
    (examples: usage_records, api_keys, subscriptions, app_users, refresh_tokens,
    billing_events), state the canonical RLS pattern (e.g., client_id =
    current_setting('app.current_client_id')::uuid), list any tables that are only
    filtered at the application layer, and describe exemptions for platform/admin
    queries that require cross-tenant visibility so Phase 0 migration and security
    review have an unambiguous RLS scope.
  • Around line 649-664: The docs currently state "Signer API keys stored
    encrypted at rest with one application encryption key" and defer per-provider
    derived-key encryption; add a required subsection under Decision 7 that mandates
    operational key-management controls for the MVP: include a documented and tested
    key rotation runbook for the application encryption key, explicit access-control
    and audit requirements for which services/roles can access the key, an assurance
    that the key is never in source control and must be provisioned via secret
    management or secure env vars, and an incident-response procedure (revocation
    and re-encryption steps) to follow if the application encryption key is
    compromised; reference the existing lines mentioning "Signer API keys", "one
    application encryption key", "planned rotation runbook", and the deferred
    "per-provider derived-key encryption" so reviewers can find and expand Decision
  • Around line 774-784: The header "Phase 1: NaaP Discovery Endpoint (In Progress
    — PR #266)" conflicts with all checklist items marked complete; update the
    document to reconcile status by either changing the header to indicate
    completion (e.g., "Complete — PR #266" or "Done") if all tasks are finished, or
    by unchecking any tasks that remain outstanding and adding a short note next to
    the header clarifying that completion is pending review/merge; make the change
    around the "Phase 1: NaaP Discovery Endpoint (In Progress — PR #266)" heading
    and the checklist entries so the header and the eight checkbox items reflect the
    same actual state.
  • Around line 284-289: Update the "Query Parameters" section row for the caps
    parameter to explicitly document the multi-value syntax and give a concrete
    example URL; mention that caps is repeated and show the preferred format (e.g.,
    repeated query params like
    ?caps=text-to-image/stabilityai/sd-turbo&caps=image-to-video/other) and clarify
    the OR semantics for multiple caps values, or state if comma-separated values
    are also supported — include one short example line demonstrating the exact
    request syntax and a one-sentence note that multiple caps are evaluated with OR
    semantics.
  • Line 339: The doc is ambiguous about whether the pipeline slug regex
    ^[a-z][a-z0-9-]{0,63}$ applies to the full slug or to each slash-separated
    segment (e.g., text-to-image/stabilityai/sd-turbo); update the text to
    explicitly state which approach we use: either say "validation is applied
    per-segment (each segment must match ^[a-z][a-z0-9-]{0,63}$)" or state "the
    full slug is validated and use an updated regex that permits / characters" and
    provide the chosen behavior and rationale next to the example slug so
    implementers know to validate segments vs. use a different pattern.

In @plugins/developer-api/backend/src/server.ts:

  • Around line 243-281: The ingest helper currently drops non-ok Response objects
    silently (in function ingest called with sRes and rRes), which hides upstream
    4xx/5xx failures; update ingest to log a warning when res.ok is false including
    res.status and the request URL (or a caller-provided label) and return early,
    and additionally in the caller that awaits ingest(sRes) and ingest(rRes) detect
    if both responses were non-ok (or both logged failures) and surface an error
    (e.g., return a 502) instead of merging into pipeline-catalog-only rows;
    reference ingest, sRes, rRes and netModelRowKey to locate where to add the
    logging and the dual-failure check.

Outside diff comments:
In @apps/web-next/src/components/dashboard/overview-content.tsx:

  • Around line 798-852: The comparator puts missing FPS rows first when sorting
    descending because missingLast currently always returns 1 for aMissing and -1
    for bMissing and then the final sortDir reversal flips that; update missingLast
    (used in the sort block) to take sortDir into account (capture the outer sortDir
    or add a parameter) so it returns 1 for aMissing only when sortDir === 'asc' and
    -1 when sortDir === 'desc' (and the inverse for bMissing), then keep the
    existing final reversal logic; this ensures cases like the 'fps' branch
    (ra.modelFps / rb.modelFps) keep missing values last regardless of asc/desc.

In @packages/plugin-sdk/src/contracts/dashboard.ts:

  • Around line 70-79: Add a semver-major breaking-change entry to the plugin-sdk
    CHANGELOG documenting the field rename orchestratorsOnline →
    orchestratorsObserved in the GraphQL contract and TypeScript types; mention the
    affected contract/type (KPI.orchestratorsObserved) and that downstream consumers
    must update references, include the version bump or "BREAKING CHANGE" header,
    and provide a short migration note showing the old and new field names so
    consumers can find and update usages.

Nitpick comments:
In @apps/web-next/src/app/api/v1/network/perf-by-model/route.ts:

  • Around line 47-54: The current code uses a single SWR cache key
    'perf-by-model:streaming-models' with bffStaleWhileRevalidate and getPerfByModel
    while upstream ignores start/end; update the inline comment or API doc in this
    file to state explicitly that the query params start and end are contract-only
    and do not affect returned data (or optionally remove start/end from the route
    signature), and reference the bffStaleWhileRevalidate call, the cacheKey
    'perf-by-model:streaming-models', and getPerfByModel to make clear where this
    behavior originates.

In @apps/web-next/src/lib/facade/dashboard-window.ts:

  • Around line 24-32: The function dashboardUpstreamTimeoutMs has a redundant
    branch: both the hours <= 72 case and the final fallback return 55_000; simplify
    by collapsing the branches so that dashboardUpstreamTimeoutMs(hours) returns
    30_000 when hours <= 24 and 55_000 otherwise (or, if intended, add distinct
    handling for >72h), updating the implementation of dashboardUpstreamTimeoutMs to
    remove the dead hours <= 72 check and make the intent explicit.

In @apps/web-next/src/lib/facade/resolvers/orchestrators.ts:

  • Around line 64-74: mergeOrchestratorServiceUris currently deduplicates by
    exact string match so variants like "https://orch.example.com" and
    "https://orch.example.com/" both appear; update the function to normalize URIs
    before comparison by trimming whitespace and removing a single trailing slash
    (e.g., normalize both existing entries in out and the extracted extra from
    dash.serviceUri / dash.service_uri), compare using these normalized forms, and
    only push the extra (preferably the normalized form) if no normalized match
    exists; reference mergeOrchestratorServiceUris and the dash.serviceUri /
    dash.service_uri extraction when making the change.

In @apps/web-next/src/lib/facade/upstream-parse.ts:

  • Around line 22-41: The guard in parseDashboardKpiBody drops requests when body
    has a streaming key that's nullish; change the logic in parseDashboardKpiBody to
    detect the presence of the streaming key (using 'streaming' in body) and when
    present return { kpi: o.streaming as DashboardKPI, requests: o.requests } so
    requests are preserved even if o.streaming is null, and keep
    parseDashboardKpiWithRequests unchanged to merge requests when present.

In @docs/pymthouse-minimal-design.md:

  • Around line 341-355: Update the Error Handling Cache-Control recommendation to
    include a stale-if-error directive for transient failures: in the "Error
    Handling" section where the header is currently specified as "Cache-Control:
    public, max-age=0, s-maxage=5, stale-while-revalidate=0" (and the sentence "If
    NaaP discovery is temporarily unavailable, clients receive a 503..."), change
    the suggested header to include stale-if-error (for example "public, max-age=0,
    s-maxage=1800, stale-if-error=3600") and add a short note explaining that this
    allows CDNs/clients to serve recent discovery data on 503s to improve resilience
    while accepting limited staleness.
  • Around line 280-282: The fenced code block containing the HTTP request "GET
    /api/v1/network/discover-orchestrators" is missing a language identifier; update
    the opening fence to include the language (e.g., use http) so the block becomes fenced with http to enable proper syntax highlighting and
    accessibility for the GET /api/v1/network/discover-orchestrators example.

In @plugins/developer-api/backend/src/server.ts:

  • Around line 400-406: The catch block in server.ts should not rely on
    error.message.includes; modify getNaapApiServerBase to throw a specific Error
    subclass or sentinel (e.g., NaapConfigError) when the NAAP API server URL is
    missing/invalid, and then in the catch branch replace the substring check with
    an instanceof check against that class (or compare the sentinel) to return 503;
    leave other errors returning 502. Update references to getNaapApiServerBase and
    the catch block handling to use the new NaapConfigError type.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `781edadd-689e-42d4-8c16-23c90a6b9ccf`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 77244d7cc8b07f9a6f77f62bce178db810a54776 and aa89fcb050a54302425de6cb8a00ad6d3fd75203.

</details>

<details>
<summary>📒 Files selected for processing (33)</summary>

* `apps/web-next/src/app/(dashboard)/dashboard/page.tsx`
* `apps/web-next/src/app/api/internal/bff-warm/route.ts`
* `apps/web-next/src/app/api/v1/dashboard/orchestrators/route.ts`
* `apps/web-next/src/app/api/v1/developer/network-models/route.ts`
* `apps/web-next/src/app/api/v1/naap-api/warm/route.ts`
* `apps/web-next/src/app/api/v1/network/perf-by-model/route.ts`
* `apps/web-next/src/components/dashboard/overview-content.tsx`
* `apps/web-next/src/content/docs/guides/dashboard-data-provider.mdx`
* `apps/web-next/src/hooks/usePublicDashboard.ts`
* `apps/web-next/src/lib/dashboard/overview-timeframe.ts`
* `apps/web-next/src/lib/facade/dashboard-window.ts`
* `apps/web-next/src/lib/facade/index.ts`
* `apps/web-next/src/lib/facade/network-data.ts`
* `apps/web-next/src/lib/facade/resolvers/gpu-capacity.ts`
* `apps/web-next/src/lib/facade/resolvers/kpi.ts`
* `apps/web-next/src/lib/facade/resolvers/net-capacity.ts`
* `apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts`
* `apps/web-next/src/lib/facade/resolvers/orchestrators.ts`
* `apps/web-next/src/lib/facade/resolvers/perf-by-model.ts`
* `apps/web-next/src/lib/facade/resolvers/pipelines.ts`
* `apps/web-next/src/lib/facade/resolvers/pricing.ts`
* `apps/web-next/src/lib/facade/stubs.ts`
* `apps/web-next/src/lib/facade/upstream-parse.ts`
* `docs/pymthouse-minimal-design.md`
* `packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts`
* `packages/plugin-sdk/src/contracts/dashboard.ts`
* `packages/plugin-sdk/src/contracts/index.ts`
* `packages/plugin-sdk/src/index.ts`
* `packages/plugin-sdk/src/types/network-model.ts`
* `packages/types/src/index.ts`
* `plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts`
* `plugins/dashboard-data-provider/frontend/src/provider.ts`
* `plugins/developer-api/backend/src/server.ts`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread apps/web-next/src/components/dashboard/overview-content.tsx
Comment thread apps/web-next/src/components/dashboard/overview-content.tsx Outdated
Comment thread apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts
Comment thread apps/web-next/src/lib/facade/resolvers/orchestrators.ts Outdated
Comment thread docs/pymthouse-minimal-design.md Outdated
Comment thread docs/pymthouse-minimal-design.md Outdated
Comment thread docs/pymthouse-minimal-design.md Outdated
Comment thread docs/pymthouse-minimal-design.md Outdated
Comment thread docs/pymthouse-minimal-design.md Outdated
Comment thread plugins/developer-api/backend/src/server.ts Outdated
…sponse handling

- Updated the JobFeedCard component to conditionally display input FPS, ensuring proper formatting when values are null.
- Clarified documentation in orchestrators resolver regarding the scaling of API response values for effectiveSuccessRate, noSwapRatio, slaScore, and successRatio.
- Enhanced error handling in the developer API server by adding labels to the ingest function for better logging of failed requests.
…oved error handling

- Added a `bustCache` option to `appendJobFeedPollQuery` for cache invalidation.
- Updated `fetchJobFeed` to accept options for cache busting and improved error handling.
- Introduced `JOB_FEED_RECOVERY_RETRY_DELAYS` for better retry logic on upstream errors.
- Enhanced logging to reflect the actual request URL during fetch operations.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/web-next/src/lib/facade/resolvers/orchestrators.ts (1)

178-191: ⚠️ Potential issue | 🟠 Major

Data loss via URI filter — likely root cause of the "33 of 84 orchestrators" blocker.

orderedKeys correctly unions dashboard addresses with net-only addresses, but line 190 drops every row with empty uris via hasNonBlankServiceUri. Given the PR blocker (84 orchestrators in → ~33 out), the most plausible explanation is:

  1. Upstream GET /v1/dashboard/orchestrators is not populating serviceUri/service_uri for most rows (the new optional fields on ApiOrchestrator are the only dashboard-side URI source feeding mergeOrchestratorServiceUris).
  2. Those addresses also lack an entry in netData.urisByAddress (i.e., neither streaming/orchestrators nor requests/orchestrators returned a URI), so mapDashboardIntoNetRow produces uris: [] and the row is filtered out at line 190.

To diagnose and prevent silent data loss:

  • Add instrumentation before returning: count rows entering the merge, count after the filter, and count dropped because both inventory urisByAddress and dashboard serviceUri were empty.
  • Verify upstream payload contains serviceUri/service_uri on the dashboard rows you're seeing in staging; if not, recovery requires upstream or inventory fixes.
  • Consider expanding orderedKeys to include addresses from pipelineModelsByAddress/lastSeenMsByAddress that lack urisByAddress entries—today net-only addresses are sourced from urisByAddress.keys() only, making any inventory record without a URI invisible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 178 -
191, The merge is silently dropping rows with empty uris because merged rows are
filtered by hasNonBlankServiceUri; update the logic around orderedKeys/merged to
(1) add instrumentation counters inside the mergeOrchestratorServiceUris flow to
log total input keys (orderedKeys), rows before filter, rows after filter and
count of rows dropped due to both dashboard serviceUri and netData.urisByAddress
being empty (use mapDashboardIntoNetRow/netOnlyPlaceholder outputs to detect
drops), and (2) expand orderedKeys to also include addresses from
pipelineModelsByAddress and lastSeenMsByAddress so inventory-only records
without uris are not excluded; reference functions mapDashboardIntoNetRow,
netOnlyPlaceholder, hasNonBlankServiceUri and the merged/orderedKeys variables
when making these changes.
🧹 Nitpick comments (4)
plugins/developer-api/backend/src/server.ts (1)

405-413: Prefer a typed/sentinel error over substring matching for the 503 path.

error.message.includes('NAAP_API_SERVER_URL') is brittle — any future error whose message happens to mention that env name (e.g., from upstream logs or fetch wrappers) will get mapped to 503 instead of 502, and a wording change in getNaapApiServerBase would silently flip the status code. A small class NaapConfigError extends Error {} (thrown from getNaapApiServerBase) and an instanceof check here would be more robust.

♻️ Sketch
+class NaapConfigError extends Error {}
+
 function getNaapApiServerBase(): string {
   const raw = process.env.NAAP_API_SERVER_URL?.trim();
   if (!raw) {
-    throw new Error(
+    throw new NaapConfigError(
       'NAAP_API_SERVER_URL is not set. Set it to the NAAP OpenAPI base URL (including /v1). See repo root .env.example.',
     );
   }
   return raw.replace(/\/+$/, '');
 }
@@
-    if (error instanceof Error && error.message.includes('NAAP_API_SERVER_URL')) {
+    if (error instanceof NaapConfigError) {
       return res.status(503).json({ error: error.message });
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/developer-api/backend/src/server.ts` around lines 405 - 413, The
catch block in server.ts currently checks
error.message.includes('NAAP_API_SERVER_URL') which is brittle; instead
introduce a sentinel error class (e.g., class NaapConfigError extends Error) and
throw it from getNaapApiServerBase when the NAAP config is missing/invalid, then
replace the substring check with an instanceof NaapConfigError check in the
catch (near netModelsInflight.delete and the existing res.status calls) so true
config errors return 503 and all other errors return 502.
apps/web-next/src/lib/facade/resolvers/orchestrators.ts (2)

67-77: Dedup is whitespace/case-sensitive — may push a near-duplicate URI.

extra is .trim()ed before the membership check, but netUris entries are not normalized at this site. If inventory yields "https://example.com/" and dashboard serviceUri is " https://example.com/ " (or differs only in trailing slash / scheme casing), the out.includes(extra) check fails and a duplicate-looking URI is appended. Consider normalizing both sides (trim at minimum) before the comparison, e.g.:

♻️ Suggested normalization
 function mergeOrchestratorServiceUris(netUris: string[], dash: ApiOrchestrator): string[] {
-  const out = [...netUris];
+  const out = [...netUris];
   const extra =
     (typeof dash.serviceUri === 'string' && dash.serviceUri.trim()) ||
     (typeof dash.service_uri === 'string' && dash.service_uri.trim()) ||
     '';
-  if (extra && !out.includes(extra)) {
+  if (extra && !out.some((u) => u.trim() === extra)) {
     out.push(extra);
   }
   return out;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 67 -
77, mergeOrchestratorServiceUris can append near-duplicate URIs because only
dash.serviceUri is trimmed before the includes check; normalize both sides
instead: map netUris to a normalized form (trim, optional lowercase, and remove
trailing slash) and compute the normalized form of extra (from dash.serviceUri
or dash.service_uri) and check against that normalized list before pushing;
update references to netUris, extra, out, and the function
mergeOrchestratorServiceUris so the original array preserves values but the
membership check uses the normalized set to avoid whitespace/case/trailing-slash
duplicates.

120-125: Minor: null-guard inconsistency vs ApiOrchestrator typing.

ApiOrchestrator.successRatio is typed number (non-nullable, line 42), yet line 122 defensively guards dash.successRatio != null. Either tighten the runtime contract (drop the guard) or relax the type to number | null to match the guard — pick one for consistency. slaScore (line 125) is correctly typed number | null and guarded with !== null, which is fine. Not a functional bug, just a contract mismatch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 120 -
125, The runtime uses a null-check for dash.successRatio but
ApiOrchestrator.successRatio is declared non-nullable; make the contract
consistent by updating ApiOrchestrator.successRatio to be number | null (or
number|null) and keep the existing guard in orchestrators.ts (the mapping that
produces the successRatio field); also review any other callsites that assume a
non-null number and adjust them to handle null (or coerce to 0) so the type
change does not break consumers.
apps/web-next/src/hooks/useJobFeedStream.ts (1)

135-139: Tighten the comment — the ladder applies to any failure streak, not just first load.

JOB_FEED_RECOVERY_RETRY_DELAYS is consulted whenever feedFailureStreak > 0, regardless of whether it's the first load or a transient blip mid-session. The current wording ("on first load") under-describes the actual behavior.

📝 Proposed wording
-/**
- * When the upstream briefly errors on first load, retry quickly before falling
- * back to the steady poll interval.
- */
+/**
+ * Recovery back-off (ms) consulted whenever the feed reports an unhealthy
+ * response (HTTP error, invalid JSON, or `queryFailed:true`). Each delay is
+ * additionally capped by the steady `pollIntervalMs`, so it can only shorten
+ * — never lengthen — the cadence relative to a healthy stream.
+ */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/hooks/useJobFeedStream.ts` around lines 135 - 139, The
comment above JOB_FEED_RECOVERY_RETRY_DELAYS is misleading — it says “on first
load” but the delay ladder is used whenever feedFailureStreak > 0 (any failure
streak). Update the comment to state that these delays are used for recovery
retries on any failure streak (transient or persistent) rather than only on
first load, and mention the tie to feedFailureStreak so future readers see when
the array is consulted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/developer-api/backend/src/server.ts`:
- Around line 269-288: The merge currently blindly sums capacities for identical
Pipeline:Model keys (netModelRowKey) across `streaming/models` and
`requests/models` which can overstate capacity if the endpoints report
overlapping orchestrators; change the merge in the loop over `accum` (variables:
byKey, netModelRowKey, WarmOrchCount, TotalCapacity) to detect duplicates coming
from different sources and resolve them by either asserting disjointness
(throw/log) or by a deterministic merge rule (e.g., prefer the larger capacity,
or prefer one source over the other) instead of always summing. Also propagate a
`partial` indicator from the `ingest` calls (use the boolean results from
`streamingOk` and `requestsOk`) back to the cache layer: if either upstream
failed set `partial=true` and either skip writing to cache or write with a much
shorter TTL (instead of the current 60s) so transient partial results are not
served for a minute.

---

Outside diff comments:
In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts`:
- Around line 178-191: The merge is silently dropping rows with empty uris
because merged rows are filtered by hasNonBlankServiceUri; update the logic
around orderedKeys/merged to (1) add instrumentation counters inside the
mergeOrchestratorServiceUris flow to log total input keys (orderedKeys), rows
before filter, rows after filter and count of rows dropped due to both dashboard
serviceUri and netData.urisByAddress being empty (use
mapDashboardIntoNetRow/netOnlyPlaceholder outputs to detect drops), and (2)
expand orderedKeys to also include addresses from pipelineModelsByAddress and
lastSeenMsByAddress so inventory-only records without uris are not excluded;
reference functions mapDashboardIntoNetRow, netOnlyPlaceholder,
hasNonBlankServiceUri and the merged/orderedKeys variables when making these
changes.

---

Nitpick comments:
In `@apps/web-next/src/hooks/useJobFeedStream.ts`:
- Around line 135-139: The comment above JOB_FEED_RECOVERY_RETRY_DELAYS is
misleading — it says “on first load” but the delay ladder is used whenever
feedFailureStreak > 0 (any failure streak). Update the comment to state that
these delays are used for recovery retries on any failure streak (transient or
persistent) rather than only on first load, and mention the tie to
feedFailureStreak so future readers see when the array is consulted.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts`:
- Around line 67-77: mergeOrchestratorServiceUris can append near-duplicate URIs
because only dash.serviceUri is trimmed before the includes check; normalize
both sides instead: map netUris to a normalized form (trim, optional lowercase,
and remove trailing slash) and compute the normalized form of extra (from
dash.serviceUri or dash.service_uri) and check against that normalized list
before pushing; update references to netUris, extra, out, and the function
mergeOrchestratorServiceUris so the original array preserves values but the
membership check uses the normalized set to avoid whitespace/case/trailing-slash
duplicates.
- Around line 120-125: The runtime uses a null-check for dash.successRatio but
ApiOrchestrator.successRatio is declared non-nullable; make the contract
consistent by updating ApiOrchestrator.successRatio to be number | null (or
number|null) and keep the existing guard in orchestrators.ts (the mapping that
produces the successRatio field); also review any other callsites that assume a
non-null number and adjust them to handle null (or coerce to 0) so the type
change does not break consumers.

In `@plugins/developer-api/backend/src/server.ts`:
- Around line 405-413: The catch block in server.ts currently checks
error.message.includes('NAAP_API_SERVER_URL') which is brittle; instead
introduce a sentinel error class (e.g., class NaapConfigError extends Error) and
throw it from getNaapApiServerBase when the NAAP config is missing/invalid, then
replace the substring check with an instanceof NaapConfigError check in the
catch (near netModelsInflight.delete and the existing res.status calls) so true
config errors return 503 and all other errors return 502.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1620015-2792-4c8d-b05e-9cd8e27dc01a

📥 Commits

Reviewing files that changed from the base of the PR and between aa89fcb and 0a8e88c.

📒 Files selected for processing (4)
  • apps/web-next/src/components/dashboard/overview-content.tsx
  • apps/web-next/src/hooks/useJobFeedStream.ts
  • apps/web-next/src/lib/facade/resolvers/orchestrators.ts
  • plugins/developer-api/backend/src/server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/web-next/src/components/dashboard/overview-content.tsx

Comment thread plugins/developer-api/backend/src/server.ts Outdated
- Enhanced `fetchJobFeed` to accept a base URL and effective polling interval, improving flexibility.
- Updated retry logic with `JOB_FEED_RECOVERY_RETRY_DELAYS` for better handling of transient errors.
- Refactored job feed polling to utilize the new fetch function, ensuring consistent cache busting behavior.
- Added documentation for recovery backoff strategy after failed polls.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plugin/developer-api Developer API plugin scope/packages Shared package changes scope/sdk Plugin SDK changes scope/shell Shell app changes size/XL Extra large PR (500+ lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants