feat(web-next): align NAAP dashboard and BFF with OpenAPI v1 (streaming + requests) by eliteprox · Pull Request #270 · livepeer/naap

eliteprox · 2026-04-17T00:40:59Z

Summary

This PR aligns the web BFF and related resolvers with the repo OpenAPI v1 contract: dashboard data is sourced from streaming and requests resources where specified, merged where the UI needs a single view, and cached consistently. It also tightens dashboard window handling and timeouts for large ranges, fixes orchestrator listing by unioning dashboard/orchestrators with inventory and merging serviceUri, documents warm routes against the spec, and updates the developer-api network-models path to use NAAP_API_SERVER_URL (no hardcoded NAAP base) and merged streaming/models + requests/models.

Changes

Document BFF/NAAP warm routes and point them at openapi.yaml / upstream semantics.
Developer API: network models merged from streaming/models + requests/models (+ catalog where applicable); upstream base from NAAP_API_SERVER_URL only.
Facade: perf-by-model backed by streaming/models with a single in-process + SWR cache key (24h semantics per OpenAPI).
Dashboard: hooks and parsers for combined KPI/pipelines payloads; FPS column uses streaming/models avg_fps with header tooltip (24h, live-video scope); KPI/pipelines use formatDashboardWindow and scaled upstream timeouts for large windows.
Network data: merged streaming + requests model rows where used.
Orchestrators: merge dashboard SLA rows (preserve order) with streaming + requests inventory; merge serviceUri into uris; pipelines/pricing resolvers updated for OpenAPI v1 shapes.

Type

Plugin(s) Affected

plugins/developer-api (backend network-models / NAAP upstream env)

Checklist

Tests pass locally
Lint passes (npm run lint)
Build succeeds (npm run build)
No new lint warnings introduced
Breaking changes documented below
Database migration included (if Prisma schema changed)

Breaking Changes

None for consumers of the public dashboard JSON shapes beyond what is already required by the OpenAPI v1 migration (e.g. NAAP_API_SERVER_URL must be set for developer-api network-models when calling NAAP).

Screenshots / Recordings

Optional: Overview Pipelines & GPUs (FPS column + info tooltip) and Orchestrators table length before/after if you want visual proof.

Summary by CodeRabbit

New Features
- Added informational tooltips to pipeline FPS column for metric provenance.
- Enhanced job feed polling with adaptive retry logic and cache-busting for improved resilience.
Changes
- Renamed KPI metric from "orchestratorsOnline" to "orchestratorsObserved".
- Success Rate KPI now displays with one decimal precision.
- Pipeline FPS values display with one decimal precision.
- Pipeline table default sort order changed from model name to GPU count.
- Dashboard timeframe selector maximum reduced from 7 days to 48 hours.
- Improved FPS metric calculation for greater accuracy.

vercel · 2026-04-17T00:41:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
naap-platform	Ready	Preview, Comment	Apr 27, 2026 9:56pm

coderabbitai · 2026-04-17T00:41:06Z

Warning

Rate limit exceeded

@eliteprox has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 24 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99778e4a-7dca-42f8-b80d-02f7f562d2b3

📥 Commits

Reviewing files that changed from the base of the PR and between 0a8e88c and a69625f.

📒 Files selected for processing (3)

apps/web-next/src/hooks/useJobFeedStream.ts
apps/web-next/src/lib/facade/resolvers/orchestrators.ts
plugins/developer-api/backend/src/server.ts

📝 Walkthrough

Walkthrough

This PR consolidates dashboard data sources by fetching from merged NAAP streaming and requests endpoints instead of legacy single endpoints. It renames the orchestrators KPI metric from orchestratorsOnline to orchestratorsObserved, caps timeframe options at 48 hours, introduces request-aware data types for KPI and pipelines, and refactors multiple resolvers to aggregate multi-source data with updated caching strategies.

Changes

Cohort / File(s)	Summary
KPI Field Rename `packages/types/src/index.ts`, `packages/plugin-sdk/src/contracts/dashboard.ts`, `packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts`, `apps/web-next/src/content/docs/guides/dashboard-data-provider.mdx`, `apps/web-next/src/lib/facade/stubs.ts`, `plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts`	Updates `orchestratorsOnline` to `orchestratorsObserved` across schema, types, tests, documentation, and stub data.
Dashboard Data Types & Contract Expansion `packages/plugin-sdk/src/contracts/dashboard.ts`, `packages/plugin-sdk/src/contracts/index.ts`, `packages/plugin-sdk/src/index.ts`	Adds `DashboardKPIWithRequests`, `DashboardPipelinesWithRequests`, and related job-stat types; updates resolver contracts to accept extended return types.
Facade Resolvers — KPI & Pipelines `apps/web-next/src/lib/facade/resolvers/kpi.ts`, `apps/web-next/src/lib/facade/resolvers/pipelines.ts`, `apps/web-next/src/lib/facade/resolvers/perf-by-model.ts`	Refactors KPI resolver to fetch from single `/v1/dashboard/kpi` endpoint instead of combining multiple sources; pipelines now returns object with `streaming` and `requests` fields; perf-by-model sources FPS from `streaming/models` instead of legacy perf endpoint.
Facade Resolvers — Orchestrators & Network Data `apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts`, `apps/web-next/src/lib/facade/resolvers/orchestrators.ts`, `apps/web-next/src/lib/facade/resolvers/net-capacity.ts`	Consolidates net-orchestrators to fetch from `/v1/streaming/orchestrators` and `/v1/requests/orchestrators` with new merge logic; orchestrators resolver caps timeframe and merges service URIs; net-capacity sources warm-orch counts from raw network models.
Facade Resolvers — Pricing & GPU Capacity `apps/web-next/src/lib/facade/resolvers/pricing.ts`, `apps/web-next/src/lib/facade/resolvers/gpu-capacity.ts`	Pricing resolver now handles dual payload shapes (legacy aggregated vs. per-orchestrator) with runtime type detection and aggregation; GPU capacity uses new `formatDashboardWindow` helper for upstream query parameter.
Facade Helpers & Utilities `apps/web-next/src/lib/facade/index.ts`, `apps/web-next/src/lib/facade/dashboard-window.ts`, `apps/web-next/src/lib/facade/upstream-parse.ts`, `apps/web-next/src/lib/facade/network-data.ts`	Introduces `formatDashboardWindow` and `dashboardUpstreamTimeoutMs` helpers; new `upstream-parse.ts` module with payload shape detection and parsing; network-data now aggregates from `streaming/models` and `requests/models` with pricing merge.
Timeframe Management `apps/web-next/src/lib/dashboard/overview-timeframe.ts`	Introduces `OVERVIEW_TIMEFRAME_MAX_HOURS` constant (48), removes 72h and 7d options, marks 48h as maximum in selector.
Dashboard Routes `apps/web-next/src/app/api/v1/dashboard/orchestrators/route.ts`, `apps/web-next/src/app/api/v1/network/perf-by-model/route.ts`, `apps/web-next/src/app/api/internal/bff-warm/route.ts`, `apps/web-next/src/app/api/v1/naap-api/warm/route.ts`, `apps/web-next/src/app/api/v1/developer/network-models/route.ts`	Orchestrators route derives cache key from effective upstream window; perf-by-model route uses constant cache key; developer network-models route fetches from streaming/requests instead of legacy net/models; route comments updated to reference new data sources.
UI Components & Hooks `apps/web-next/src/components/dashboard/overview-content.tsx`, `apps/web-next/src/hooks/usePublicDashboard.ts`, `apps/web-next/src/app/(dashboard)/dashboard/page.tsx`, `apps/web-next/src/hooks/useJobFeedStream.ts`	Dashboard page queries new `orchestratorsObserved` KPI field; overview-content updates FPS handling to use only perf-based values, adds info tooltip for FPS provenance, changes default sort to GPUs descending; usePublicDashboard handles new type signatures and normalizes pipelines response shape; job feed polling adds cache-busting and adaptive retry logic.
Plugin System & Data Provider `plugins/dashboard-data-provider/frontend/src/provider.ts`, `plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts`, `plugins/developer-api/backend/src/server.ts`, `packages/plugin-sdk/src/types/network-model.ts`	Data provider fetches pipelines and normalizes response shape; developer API endpoint derives model inventory from streaming/requests; plugin SDK type docs updated to reflect new data sources; tests updated for KPI field rename.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BFF as Web Server<br/>(Facade)
    participant StreamModels as NAAP:<br/>streaming/models
    participant RequestModels as NAAP:<br/>requests/models
    participant Pricing as NAAP:<br/>dashboard/pricing
    participant Cache as Redis Cache

    Client->>BFF: GET /api/v1/network/models
    
    alt Cache Hit
        BFF->>Cache: lookup networkModels
        Cache-->>BFF: cached payload
    else Cache Miss
        par Concurrent Fetch
            BFF->>+StreamModels: GET /v1/streaming/models
            BFF->>+RequestModels: GET /v1/requests/models
        and
            BFF->>+Pricing: GET /v1/dashboard/pricing
        end
        
        StreamModels-->>-BFF: rows (pipeline, model, warm_orch_count...)
        RequestModels-->>-BFF: rows (pipeline, model, gpu_slots...)
        Pricing-->>-BFF: pricing rows (min/max/avg per key)
        
        Note over BFF: Merge by pipeline:model key<br/>Sum capacity fields<br/>Aggregate pricing
        
        BFF->>Cache: store aggregated models
    end
    
    BFF-->>Client: NetworkModel[] with pricing

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

feat(api): add orchestrator discovery endpoint using cached NaaP data #266 — Modifies the same net-orchestrators resolver with overlapping changes to streaming/requests orchestrator source parsing and row merging logic.
feat(dashboard): remove orch address display and add Livepeer Explorer links #253 — Updates orchestrator URI merging and consolidation logic in the resolver, directly related to the URI-merging changes in resolveOrchestrators.
feat(dashboard): Redis SWR for BFF routes, cron warm-up, and facade/cache refactor #235 — Modifies dashboard BFF caching and upstream-window formatting helpers used across multiple resolvers in this PR.

Suggested reviewers

seanhanca

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the primary goal: aligning NAAP dashboard and BFF with OpenAPI v1 to use streaming and requests data sources.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/dashboard-data-updates

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-17T00:41:08Z

⚠️ This PR is very large (1693 lines changed). Please split it into smaller, focused PRs if possible.

…PI compliance - Enhanced BFF warm route documentation to reference the upstream OpenAPI contract. - Updated developer network models endpoint to merge streaming and requests models. - Refined NAAP API warm route comments to clarify data sources. - Adjusted performance by model resolver to utilize a single cache key for streaming models. - Improved dashboard hooks to handle new data structures and ensure compatibility with requests. - Introduced new utility functions for dashboard window formatting and timeout management. - Merged streaming and requests model data in the network data layer for better consistency. - Updated orchestrator and pipeline resolvers to reflect changes in data fetching and merging logic. - Enhanced pricing resolver to accommodate new OpenAPI v1 structure and ensure accurate pricing data representation.

- Introduced a new utility function to determine the effective upstream window for orchestrators based on the requested period. - Updated the orchestrator cache key to utilize the effective window for improved data accuracy. - Simplified the KPI resolver by removing unnecessary net orchestrator data fetching and directly returning parsed KPI data. - Enhanced documentation to clarify the behavior of the orchestrator endpoint and its limitations on timeframes.

…ed performance - Changed default sorting in PipelinesCard from 'model' to 'gpus' to prioritize busy models. - Introduced OVERVIEW_TIMEFRAME_MAX_HOURS constant to cap timeframe options and ensure consistent handling across various resolvers. - Updated KPI and pipeline resolvers to utilize the new normalization function for timeframe parsing, enhancing data accuracy and performance.

eliteprox · 2026-04-21T22:35:09Z

Two issues remain blockers for this PR.

** partial blocker ** workaround added in @53bb1cea2d121a864580b268dc5b464e87f75514 to query only past 48 hours

/dashboard/kpi query is hitting limits
curl -X 'GET'
'https://naap-api.cloudspe.com/v1/dashboard/kpi?window=7d'
-H 'accept: application/json'

< 24h ~3 seconds or less
24h takes ~5 seconds
48h takes ~15-20 seconds
7d never succeeds

/dashboard/kpi, the field orchestratorsOnline is always the same even if a 1h query or a 48h query, returned 84 orchestrators every time. This is fine if it's intended, but I will need to update UI a little to reflect that.
"orchestratorsOnline": {
"value": 84,
"delta": 0
},
Additionally, we need to be able to find the 84 orchestrators and fill in the Orchestrators list table. In staging, the queries are failing to return all of them, only able to see about 33 of them currently

coderabbitai

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

apps/web-next/src/components/dashboard/overview-content.tsx (1)

798-852: ⚠️ Potential issue | 🟡 Minor

Keep missing FPS values last when sorting descending.

Now that missing FPS is null, descending FPS sort puts missing rows first because missingLast() returns 1 and the final comparator reverses it.

Proposed comparator fix

     let c = 0;
+    const applyDirection = (inner: number): number => (sortDir === 'asc' ? inner : -inner);
     const missingLast = (aMissing: boolean, bMissing: boolean, inner: number): number => {
       if (aMissing && bMissing) return 0;
       if (aMissing) return 1;
       if (bMissing) return -1;
-      return inner;
+      return applyDirection(inner);
     };
     switch (sortCol) {
       case 'model':
-        c = a.localeCompare(b);
+        c = applyDirection(a.localeCompare(b));
         break;
       case 'gpus':
-        c = ra.gpus - rb.gpus;
+        c = applyDirection(ra.gpus - rb.gpus);
         break;
@@
     }
-    if (c !== 0) return sortDir === 'asc' ? c : -c;
+    if (c !== 0) return c;
     return a.localeCompare(b);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/components/dashboard/overview-content.tsx` around lines 798
- 852, The comparator puts missing FPS rows first when sorting descending
because missingLast currently always returns 1 for aMissing and -1 for bMissing
and then the final sortDir reversal flips that; update missingLast (used in the
sort block) to take sortDir into account (capture the outer sortDir or add a
parameter) so it returns 1 for aMissing only when sortDir === 'asc' and -1 when
sortDir === 'desc' (and the inverse for bMissing), then keep the existing final
reversal logic; this ensures cases like the 'fps' branch (ra.modelFps /
rb.modelFps) keep missing values last regardless of asc/desc.

packages/plugin-sdk/src/contracts/dashboard.ts (1)

70-79: ⚠️ Potential issue | 🟡 Minor

Add plugin-sdk CHANGELOG entry for breaking KPI.orchestratorsObserved rename.

The field rename from orchestratorsOnline → orchestratorsObserved is already complete in both GraphQL schema and TypeScript contracts. However, plugin-sdk has no CHANGELOG documenting this breaking change for downstream consumers. Add a semver-major entry (or breaking-change note) to track this contract change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/plugin-sdk/src/contracts/dashboard.ts` around lines 70 - 79, Add a
semver-major breaking-change entry to the plugin-sdk CHANGELOG documenting the
field rename orchestratorsOnline → orchestratorsObserved in the GraphQL contract
and TypeScript types; mention the affected contract/type
(KPI.orchestratorsObserved) and that downstream consumers must update
references, include the version bump or "BREAKING CHANGE" header, and provide a
short migration note showing the old and new field names so consumers can find
and update usages.

🧹 Nitpick comments (7)

docs/pymthouse-minimal-design.md (2)
341-355: Consider stale-if-error directive for improved discovery resilience.

The error handling strategy uses Cache-Control: public, max-age=0, s-maxage=5, stale-while-revalidate=0, which provides fail-closed behavior (no stale data on error). This is conservative and appropriate for preventing clients from caching error states.

However, line 173 states: "If NaaP discovery is temporarily unavailable, clients receive a 503..." — emphasizing temporary unavailability. For transient failures, allowing stale discovery data may provide better user experience than hard failure.

Consider whether the stale-if-error directive would improve resilience:
Cache-Control: public, max-age=0, s-maxage=1800, stale-if-error=3600
This would allow CDN/clients to serve up to 1-hour-old discovery data if NaaP returns a 503, reducing the blast radius of transient infrastructure issues while still preventing indefinite stale data serving.

Trade-off: slightly stale orchestrator data vs. complete discovery unavailability during incidents.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/pymthouse-minimal-design.md` around lines 341 - 355, Update the Error
Handling Cache-Control recommendation to include a stale-if-error directive for
transient failures: in the "Error Handling" section where the header is
currently specified as "Cache-Control: public, max-age=0, s-maxage=5,
stale-while-revalidate=0" (and the sentence "If NaaP discovery is temporarily
unavailable, clients receive a `503`..."), change the suggested header to
include stale-if-error (for example "public, max-age=0, s-maxage=1800,
stale-if-error=3600") and add a short note explaining that this allows
CDNs/clients to serve recent discovery data on 503s to improve resilience while
accepting limited staleness.
280-282: Add language identifier to fenced code block.

The fenced code block on line 280 lacks a language specifier, which can affect syntax highlighting and accessibility in some markdown renderers.
🎨 Suggested fix
-```
+```http
 GET /api/v1/network/discover-orchestrators
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @docs/pymthouse-minimal-design.md around lines 280 - 282, The fenced code
block containing the HTTP request "GET /api/v1/network/discover-orchestrators"
is missing a language identifier; update the opening fence to include the
language (e.g., use http) so the block becomes fenced with http to enable
proper syntax highlighting and accessibility for the GET
/api/v1/network/discover-orchestrators example.
</details>

</blockquote></details>
<details>
<summary>plugins/developer-api/backend/src/server.ts (1)</summary><blockquote>

`400-406`: **Brittle error-to-status mapping via substring match.**

Matching `error.message.includes('NAAP_API_SERVER_URL')` couples control flow to the error's wording. A future message tweak silently downgrades 503 to 502. Prefer a typed error (or sentinel) thrown from `getNaapApiServerBase`.

<details>
<summary>Suggested approach</summary>

```diff
+class NaapConfigError extends Error {}
 function getNaapApiServerBase(): string {
   const raw = process.env.NAAP_API_SERVER_URL?.trim();
   if (!raw) {
-    throw new Error(
+    throw new NaapConfigError(
       'NAAP_API_SERVER_URL is not set. Set it to the NAAP OpenAPI base URL (including /v1). See repo root .env.example.',
     );
   }
   return raw.replace(/\/+$/, '');
 }
@@
-    if (error instanceof Error && error.message.includes('NAAP_API_SERVER_URL')) {
-      return res.status(503).json({ error: error.message });
-    }
+    if (error instanceof NaapConfigError) {
+      return res.status(503).json({ error: error.message });
+    }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@plugins/developer-api/backend/src/server.ts` around lines 400 - 406, The
catch block in server.ts should not rely on error.message.includes; modify
getNaapApiServerBase to throw a specific Error subclass or sentinel (e.g.,
NaapConfigError) when the NAAP API server URL is missing/invalid, and then in
the catch branch replace the substring check with an instanceof check against
that class (or compare the sentinel) to return 503; leave other errors returning
502. Update references to getNaapApiServerBase and the catch block handling to
use the new NaapConfigError type.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/lib/facade/dashboard-window.ts (1)</summary><blockquote>

`24-32`: **Redundant branch in `dashboardUpstreamTimeoutMs`.**

The `hours <= 72` branch and the fallback both return `55_000`, so the intermediate check is dead. Either collapse to a single non-`<=24` branch, or differentiate (e.g., cap > 72h to a smaller value or reject) to make intent explicit.

<details>
<summary>Proposed simplification</summary>

```diff
 export function dashboardUpstreamTimeoutMs(hours: number): number {
   if (hours <= 24) {
     return 30_000;
   }
-  if (hours <= 72) {
-    return 55_000;
-  }
   return 55_000;
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/dashboard-window.ts` around lines 24 - 32, The
function dashboardUpstreamTimeoutMs has a redundant branch: both the hours <= 72
case and the final fallback return 55_000; simplify by collapsing the branches
so that dashboardUpstreamTimeoutMs(hours) returns 30_000 when hours <= 24 and
55_000 otherwise (or, if intended, add distinct handling for >72h), updating the
implementation of dashboardUpstreamTimeoutMs to remove the dead hours <= 72
check and make the intent explicit.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/app/api/v1/network/perf-by-model/route.ts (1)</summary><blockquote>

`47-54`: **Intentional single cache key — consider noting start/end are contract-only.**

Since the upstream `streaming/models` resolver ignores `start`/`end` and always returns a 24h weighted average, the single SWR key is correct. The inline comment already explains this clearly. You may also want to consider either removing the `start`/`end` query param requirement or keeping a clearer doc note that they're contract-only (not data-affecting) so API consumers don't assume time-window filtering. No change required.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/app/api/v1/network/perf-by-model/route.ts` around lines 47
- 54, The current code uses a single SWR cache key
'perf-by-model:streaming-models' with bffStaleWhileRevalidate and getPerfByModel
while upstream ignores start/end; update the inline comment or API doc in this
file to state explicitly that the query params start and end are contract-only
and do not affect returned data (or optionally remove start/end from the route
signature), and reference the bffStaleWhileRevalidate call, the cacheKey
'perf-by-model:streaming-models', and getPerfByModel to make clear where this
behavior originates.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/lib/facade/resolvers/orchestrators.ts (1)</summary><blockquote>

`64-74`: **`mergeOrchestratorServiceUris` dedup is exact-match; inventory URIs aren't normalized.**

`out.includes(extra)` compares the raw strings. If inventory has `https://orch.example.com` and dashboard returns `https://orch.example.com/` (or different case in the host), both will be kept. Minor, but since this now feeds the `uris` list that drives row-filtering via `hasNonBlankServiceUri`, a small normalization (trim + strip trailing `/`) would reduce UI noise.

<details>
<summary>♻️ Optional normalization</summary>

```diff
 function mergeOrchestratorServiceUris(netUris: string[], dash: ApiOrchestrator): string[] {
-  const out = [...netUris];
+  const norm = (u: string) => u.trim().replace(/\/+$/, '');
+  const seen = new Set(netUris.map(norm));
+  const out = [...netUris];
   const extra =
     (typeof dash.serviceUri === 'string' && dash.serviceUri.trim()) ||
     (typeof dash.service_uri === 'string' && dash.service_uri.trim()) ||
     '';
-  if (extra && !out.includes(extra)) {
+  if (extra && !seen.has(norm(extra))) {
     out.push(extra);
   }
   return out;
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 64 -
74, mergeOrchestratorServiceUris currently deduplicates by exact string match so
variants like "https://orch.example.com" and "https://orch.example.com/" both
appear; update the function to normalize URIs before comparison by trimming
whitespace and removing a single trailing slash (e.g., normalize both existing
entries in out and the extracted extra from dash.serviceUri / dash.service_uri),
compare using these normalized forms, and only push the extra (preferably the
normalized form) if no normalized match exists; reference
mergeOrchestratorServiceUris and the dash.serviceUri / dash.service_uri
extraction when making the change.
```

</details>

</blockquote></details>
<details>
<summary>apps/web-next/src/lib/facade/upstream-parse.ts (1)</summary><blockquote>

`22-41`: **Minor: fallback when `streaming` key exists but is nullish loses `requests`.**

If upstream ever returns `{ streaming: null, requests: {...} }`, the current guard `o.streaming && typeof o.streaming === 'object'` fails and you fall through to `{ kpi: body as DashboardKPI }`, casting the wrapper itself as KPI and dropping the `requests` breakdown. Harmless given current API shape, but small hardening:

<details>
<summary>♻️ Defensive fallback</summary>

```diff
 export function parseDashboardKpiBody(body: unknown): {
   kpi: DashboardKPI;
   requests?: DashboardJobsOverview;
 } {
   if (body && typeof body === 'object' && 'streaming' in body) {
     const o = body as { streaming?: DashboardKPI; requests?: DashboardJobsOverview };
     if (o.streaming && typeof o.streaming === 'object') {
       return {
         kpi: o.streaming,
         requests: o.requests,
       };
     }
+    // Wrapper shape present but `streaming` missing/null: preserve requests if any.
+    return { kpi: {} as DashboardKPI, requests: o.requests };
   }
   return { kpi: body as DashboardKPI };
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/upstream-parse.ts` around lines 22 - 41, The
guard in parseDashboardKpiBody drops requests when body has a streaming key
that's nullish; change the logic in parseDashboardKpiBody to detect the presence
of the streaming key (using 'streaming' in body) and when present return { kpi:
o.streaming as DashboardKPI, requests: o.requests } so requests are preserved
even if o.streaming is null, and keep parseDashboardKpiWithRequests unchanged to
merge requests when present.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @apps/web-next/src/components/dashboard/overview-content.tsx:

Line 487: The label currently uses the timeframe variable tfLabel making the
orchestrator count appear timeframe-scoped; update the tile invocation that
renders Server (tile(Server, Orchestrators (${tfLabel}),
data.orchestratorsObserved.value)) to a timeframe-agnostic label such as
"Orchestrators (observed)" or "Orchestrators (current)" — replace the template
string Orchestrators (${tfLabel}) with the new static label while leaving the
Server and data.orchestratorsObserved.value arguments unchanged.

Line 1443: The current JSX in the that renders FPS hides valid outputFps
when inputFps is missing; update the conditional to show outputFps whenever it
exists. Specifically, modify the expression in the (the one referencing
job.inputFps and job.outputFps) to: if job.outputFps != null then render
${job.inputFps != null ? job.inputFps.toFixed(1) : '—'} / ${job.outputFps.toFixed(1)}, otherwise render '—'; this ensures outputFps is
displayed even when inputFps is null while preserving the existing formatting
and fallback.

In @apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts:

Around line 157-171: The two functions fetchStreamingOrchestrators and
fetchRequestsOrchestrators currently call naapGet without pagination and
therefore only return the first page; fix both by reintroducing a page size
(e.g. NET_ORCHESTRATORS_PAGE_SIZE) and implementing a paginated fetch loop that
calls naapGet repeatedly with pagination parameters (page/limit or offset/cursor
depending on naapGet API), concatenates results into a single
StreamingOrchestratorRow[] / RequestsOrchestratorRow[] and stops when a page
returns fewer items than the page size (or no cursor is returned); apply the
same pagination change for the other calls noted around lines 229–273 so all
orchestrator records are collected before returning.

In @apps/web-next/src/lib/facade/resolvers/orchestrators.ts:

Around line 13-19: Update the stale header comment to reflect the actual
upstream value ranges: state that effectiveSuccessRate and noSwapRatio are
provided in 0–1 (and are scaled via pct()), while slaScore and successRatio are
provided in 0–100 (slaScore is used via Math.round(dash.slaScore) and
successRatio is handled as Math.round(dash.successRatio * 10) / 10); remove the
assertion that slaScore and successRatio are 0–1 and note that dashboard
endpoints are capped at 24h as before.

In @docs/pymthouse-minimal-design.md:

Around line 254-270: The doc's Tenant Model mentions client_id RLS but doesn't
list which DB tables are protected; add a new subsection under the "Tenant
Model" heading that explicitly enumerates the tables with PostgreSQL RLS
(examples: usage_records, api_keys, subscriptions, app_users, refresh_tokens,
billing_events), state the canonical RLS pattern (e.g., client_id =
current_setting('app.current_client_id')::uuid), list any tables that are only
filtered at the application layer, and describe exemptions for platform/admin
queries that require cross-tenant visibility so Phase 0 migration and security
review have an unambiguous RLS scope.

Around line 649-664: The docs currently state "Signer API keys stored
encrypted at rest with one application encryption key" and defer per-provider
derived-key encryption; add a required subsection under Decision 7 that mandates
operational key-management controls for the MVP: include a documented and tested
key rotation runbook for the application encryption key, explicit access-control
and audit requirements for which services/roles can access the key, an assurance
that the key is never in source control and must be provisioned via secret
management or secure env vars, and an incident-response procedure (revocation
and re-encryption steps) to follow if the application encryption key is
compromised; reference the existing lines mentioning "Signer API keys", "one
application encryption key", "planned rotation runbook", and the deferred
"per-provider derived-key encryption" so reviewers can find and expand Decision

Around line 774-784: The header "Phase 1: NaaP Discovery Endpoint (In Progress
— PR #266)" conflicts with all checklist items marked complete; update the
document to reconcile status by either changing the header to indicate
completion (e.g., "Complete — PR #266" or "Done") if all tasks are finished, or
by unchecking any tasks that remain outstanding and adding a short note next to
the header clarifying that completion is pending review/merge; make the change
around the "Phase 1: NaaP Discovery Endpoint (In Progress — PR #266)" heading
and the checklist entries so the header and the eight checkbox items reflect the
same actual state.

Around line 284-289: Update the "Query Parameters" section row for the caps
parameter to explicitly document the multi-value syntax and give a concrete
example URL; mention that caps is repeated and show the preferred format (e.g.,
repeated query params like
?caps=text-to-image/stabilityai/sd-turbo&caps=image-to-video/other) and clarify
the OR semantics for multiple caps values, or state if comma-separated values
are also supported — include one short example line demonstrating the exact
request syntax and a one-sentence note that multiple caps are evaluated with OR
semantics.

Line 339: The doc is ambiguous about whether the pipeline slug regex
^[a-z][a-z0-9-]{0,63}$ applies to the full slug or to each slash-separated
segment (e.g., text-to-image/stabilityai/sd-turbo); update the text to
explicitly state which approach we use: either say "validation is applied
per-segment (each segment must match ^[a-z][a-z0-9-]{0,63}$)" or state "the
full slug is validated and use an updated regex that permits / characters" and
provide the chosen behavior and rationale next to the example slug so
implementers know to validate segments vs. use a different pattern.

In @plugins/developer-api/backend/src/server.ts:

Around line 243-281: The ingest helper currently drops non-ok Response objects
silently (in function ingest called with sRes and rRes), which hides upstream
4xx/5xx failures; update ingest to log a warning when res.ok is false including
res.status and the request URL (or a caller-provided label) and return early,
and additionally in the caller that awaits ingest(sRes) and ingest(rRes) detect
if both responses were non-ok (or both logged failures) and surface an error
(e.g., return a 502) instead of merging into pipeline-catalog-only rows;
reference ingest, sRes, rRes and netModelRowKey to locate where to add the
logging and the dual-failure check.

Outside diff comments:
In @apps/web-next/src/components/dashboard/overview-content.tsx:

Around line 798-852: The comparator puts missing FPS rows first when sorting
descending because missingLast currently always returns 1 for aMissing and -1
for bMissing and then the final sortDir reversal flips that; update missingLast
(used in the sort block) to take sortDir into account (capture the outer sortDir
or add a parameter) so it returns 1 for aMissing only when sortDir === 'asc' and
-1 when sortDir === 'desc' (and the inverse for bMissing), then keep the
existing final reversal logic; this ensures cases like the 'fps' branch
(ra.modelFps / rb.modelFps) keep missing values last regardless of asc/desc.

In @packages/plugin-sdk/src/contracts/dashboard.ts:

Around line 70-79: Add a semver-major breaking-change entry to the plugin-sdk
CHANGELOG documenting the field rename orchestratorsOnline →
orchestratorsObserved in the GraphQL contract and TypeScript types; mention the
affected contract/type (KPI.orchestratorsObserved) and that downstream consumers
must update references, include the version bump or "BREAKING CHANGE" header,
and provide a short migration note showing the old and new field names so
consumers can find and update usages.

Nitpick comments:
In @apps/web-next/src/app/api/v1/network/perf-by-model/route.ts:

Around line 47-54: The current code uses a single SWR cache key
'perf-by-model:streaming-models' with bffStaleWhileRevalidate and getPerfByModel
while upstream ignores start/end; update the inline comment or API doc in this
file to state explicitly that the query params start and end are contract-only
and do not affect returned data (or optionally remove start/end from the route
signature), and reference the bffStaleWhileRevalidate call, the cacheKey
'perf-by-model:streaming-models', and getPerfByModel to make clear where this
behavior originates.

In @apps/web-next/src/lib/facade/dashboard-window.ts:

Around line 24-32: The function dashboardUpstreamTimeoutMs has a redundant
branch: both the hours <= 72 case and the final fallback return 55_000; simplify
by collapsing the branches so that dashboardUpstreamTimeoutMs(hours) returns
30_000 when hours <= 24 and 55_000 otherwise (or, if intended, add distinct
handling for >72h), updating the implementation of dashboardUpstreamTimeoutMs to
remove the dead hours <= 72 check and make the intent explicit.

In @apps/web-next/src/lib/facade/resolvers/orchestrators.ts:

Around line 64-74: mergeOrchestratorServiceUris currently deduplicates by
exact string match so variants like "https://orch.example.com" and
"https://orch.example.com/" both appear; update the function to normalize URIs
before comparison by trimming whitespace and removing a single trailing slash
(e.g., normalize both existing entries in out and the extracted extra from
dash.serviceUri / dash.service_uri), compare using these normalized forms, and
only push the extra (preferably the normalized form) if no normalized match
exists; reference mergeOrchestratorServiceUris and the dash.serviceUri /
dash.service_uri extraction when making the change.

In @apps/web-next/src/lib/facade/upstream-parse.ts:

Around line 22-41: The guard in parseDashboardKpiBody drops requests when body
has a streaming key that's nullish; change the logic in parseDashboardKpiBody to
detect the presence of the streaming key (using 'streaming' in body) and when
present return { kpi: o.streaming as DashboardKPI, requests: o.requests } so
requests are preserved even if o.streaming is null, and keep
parseDashboardKpiWithRequests unchanged to merge requests when present.

In @docs/pymthouse-minimal-design.md:

Around line 341-355: Update the Error Handling Cache-Control recommendation to
include a stale-if-error directive for transient failures: in the "Error
Handling" section where the header is currently specified as "Cache-Control:
public, max-age=0, s-maxage=5, stale-while-revalidate=0" (and the sentence "If
NaaP discovery is temporarily unavailable, clients receive a 503..."), change
the suggested header to include stale-if-error (for example "public, max-age=0,
s-maxage=1800, stale-if-error=3600") and add a short note explaining that this
allows CDNs/clients to serve recent discovery data on 503s to improve resilience
while accepting limited staleness.

Around line 280-282: The fenced code block containing the HTTP request "GET
/api/v1/network/discover-orchestrators" is missing a language identifier; update
the opening fence to include the language (e.g., use http) so the block becomes fenced with http to enable proper syntax highlighting and
accessibility for the GET /api/v1/network/discover-orchestrators example.

In @plugins/developer-api/backend/src/server.ts:

Around line 400-406: The catch block in server.ts should not rely on
error.message.includes; modify getNaapApiServerBase to throw a specific Error
subclass or sentinel (e.g., NaapConfigError) when the NAAP API server URL is
missing/invalid, and then in the catch branch replace the substring check with
an instanceof check against that class (or compare the sentinel) to return 503;
leave other errors returning 502. Update references to getNaapApiServerBase and
the catch block handling to use the new NaapConfigError type.
</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ]  Push a commit to this branch (recommended)
- [ ]  Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `781edadd-689e-42d4-8c16-23c90a6b9ccf`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 77244d7cc8b07f9a6f77f62bce178db810a54776 and aa89fcb050a54302425de6cb8a00ad6d3fd75203.

</details>

<details>
<summary>📒 Files selected for processing (33)</summary>

* `apps/web-next/src/app/(dashboard)/dashboard/page.tsx`
* `apps/web-next/src/app/api/internal/bff-warm/route.ts`
* `apps/web-next/src/app/api/v1/dashboard/orchestrators/route.ts`
* `apps/web-next/src/app/api/v1/developer/network-models/route.ts`
* `apps/web-next/src/app/api/v1/naap-api/warm/route.ts`
* `apps/web-next/src/app/api/v1/network/perf-by-model/route.ts`
* `apps/web-next/src/components/dashboard/overview-content.tsx`
* `apps/web-next/src/content/docs/guides/dashboard-data-provider.mdx`
* `apps/web-next/src/hooks/usePublicDashboard.ts`
* `apps/web-next/src/lib/dashboard/overview-timeframe.ts`
* `apps/web-next/src/lib/facade/dashboard-window.ts`
* `apps/web-next/src/lib/facade/index.ts`
* `apps/web-next/src/lib/facade/network-data.ts`
* `apps/web-next/src/lib/facade/resolvers/gpu-capacity.ts`
* `apps/web-next/src/lib/facade/resolvers/kpi.ts`
* `apps/web-next/src/lib/facade/resolvers/net-capacity.ts`
* `apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts`
* `apps/web-next/src/lib/facade/resolvers/orchestrators.ts`
* `apps/web-next/src/lib/facade/resolvers/perf-by-model.ts`
* `apps/web-next/src/lib/facade/resolvers/pipelines.ts`
* `apps/web-next/src/lib/facade/resolvers/pricing.ts`
* `apps/web-next/src/lib/facade/stubs.ts`
* `apps/web-next/src/lib/facade/upstream-parse.ts`
* `docs/pymthouse-minimal-design.md`
* `packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts`
* `packages/plugin-sdk/src/contracts/dashboard.ts`
* `packages/plugin-sdk/src/contracts/index.ts`
* `packages/plugin-sdk/src/index.ts`
* `packages/plugin-sdk/src/types/network-model.ts`
* `packages/types/src/index.ts`
* `plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts`
* `plugins/dashboard-data-provider/frontend/src/provider.ts`
* `plugins/developer-api/backend/src/server.ts`

</details>

</details>

…ntation

…sponse handling - Updated the JobFeedCard component to conditionally display input FPS, ensuring proper formatting when values are null. - Clarified documentation in orchestrators resolver regarding the scaling of API response values for effectiveSuccessRate, noSwapRatio, slaScore, and successRatio. - Enhanced error handling in the developer API server by adding labels to the ingest function for better logging of failed requests.

…oved error handling - Added a `bustCache` option to `appendJobFeedPollQuery` for cache invalidation. - Updated `fetchJobFeed` to accept options for cache busting and improved error handling. - Introduced `JOB_FEED_RECOVERY_RETRY_DELAYS` for better retry logic on upstream errors. - Enhanced logging to reflect the actual request URL during fetch operations.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

apps/web-next/src/lib/facade/resolvers/orchestrators.ts (1)
178-191: ⚠️ Potential issue | 🟠 Major

Data loss via URI filter — likely root cause of the "33 of 84 orchestrators" blocker.

orderedKeys correctly unions dashboard addresses with net-only addresses, but line 190 drops every row with empty uris via hasNonBlankServiceUri. Given the PR blocker (84 orchestrators in → ~33 out), the most plausible explanation is:

Upstream GET /v1/dashboard/orchestrators is not populating serviceUri/service_uri for most rows (the new optional fields on ApiOrchestrator are the only dashboard-side URI source feeding mergeOrchestratorServiceUris).

Those addresses also lack an entry in netData.urisByAddress (i.e., neither streaming/orchestrators nor requests/orchestrators returned a URI), so mapDashboardIntoNetRow produces uris: [] and the row is filtered out at line 190.

To diagnose and prevent silent data loss:

Add instrumentation before returning: count rows entering the merge, count after the filter, and count dropped because both inventory urisByAddress and dashboard serviceUri were empty.

Verify upstream payload contains serviceUri/service_uri on the dashboard rows you're seeing in staging; if not, recovery requires upstream or inventory fixes.

Consider expanding orderedKeys to include addresses from pipelineModelsByAddress/lastSeenMsByAddress that lack urisByAddress entries—today net-only addresses are sourced from urisByAddress.keys() only, making any inventory record without a URI invisible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 178 -
191, The merge is silently dropping rows with empty uris because merged rows are
filtered by hasNonBlankServiceUri; update the logic around orderedKeys/merged to
(1) add instrumentation counters inside the mergeOrchestratorServiceUris flow to
log total input keys (orderedKeys), rows before filter, rows after filter and
count of rows dropped due to both dashboard serviceUri and netData.urisByAddress
being empty (use mapDashboardIntoNetRow/netOnlyPlaceholder outputs to detect
drops), and (2) expand orderedKeys to also include addresses from
pipelineModelsByAddress and lastSeenMsByAddress so inventory-only records
without uris are not excluded; reference functions mapDashboardIntoNetRow,
netOnlyPlaceholder, hasNonBlankServiceUri and the merged/orderedKeys variables
when making these changes.

🧹 Nitpick comments (4)

plugins/developer-api/backend/src/server.ts (1)
405-413: Prefer a typed/sentinel error over substring matching for the 503 path.

error.message.includes('NAAP_API_SERVER_URL') is brittle — any future error whose message happens to mention that env name (e.g., from upstream logs or fetch wrappers) will get mapped to 503 instead of 502, and a wording change in getNaapApiServerBase would silently flip the status code. A small class NaapConfigError extends Error {} (thrown from getNaapApiServerBase) and an instanceof check here would be more robust.
♻️ Sketch
+class NaapConfigError extends Error {}
+
 function getNaapApiServerBase(): string {
   const raw = process.env.NAAP_API_SERVER_URL?.trim();
   if (!raw) {
-    throw new Error(
+    throw new NaapConfigError(
       'NAAP_API_SERVER_URL is not set. Set it to the NAAP OpenAPI base URL (including /v1). See repo root .env.example.',
     );
   }
   return raw.replace(/\/+$/, '');
 }
@@
-    if (error instanceof Error && error.message.includes('NAAP_API_SERVER_URL')) {
+    if (error instanceof NaapConfigError) {
       return res.status(503).json({ error: error.message });
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/developer-api/backend/src/server.ts` around lines 405 - 413, The
catch block in server.ts currently checks
error.message.includes('NAAP_API_SERVER_URL') which is brittle; instead
introduce a sentinel error class (e.g., class NaapConfigError extends Error) and
throw it from getNaapApiServerBase when the NAAP config is missing/invalid, then
replace the substring check with an instanceof NaapConfigError check in the
catch (near netModelsInflight.delete and the existing res.status calls) so true
config errors return 503 and all other errors return 502.
apps/web-next/src/lib/facade/resolvers/orchestrators.ts (2)
67-77: Dedup is whitespace/case-sensitive — may push a near-duplicate URI.

extra is .trim()ed before the membership check, but netUris entries are not normalized at this site. If inventory yields "https://example.com/" and dashboard serviceUri is " https://example.com/ " (or differs only in trailing slash / scheme casing), the out.includes(extra) check fails and a duplicate-looking URI is appended. Consider normalizing both sides (trim at minimum) before the comparison, e.g.:
♻️ Suggested normalization
 function mergeOrchestratorServiceUris(netUris: string[], dash: ApiOrchestrator): string[] {
-  const out = [...netUris];
+  const out = [...netUris];
   const extra =
     (typeof dash.serviceUri === 'string' && dash.serviceUri.trim()) ||
     (typeof dash.service_uri === 'string' && dash.service_uri.trim()) ||
     '';
-  if (extra && !out.includes(extra)) {
+  if (extra && !out.some((u) => u.trim() === extra)) {
     out.push(extra);
   }
   return out;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 67 -
77, mergeOrchestratorServiceUris can append near-duplicate URIs because only
dash.serviceUri is trimmed before the includes check; normalize both sides
instead: map netUris to a normalized form (trim, optional lowercase, and remove
trailing slash) and compute the normalized form of extra (from dash.serviceUri
or dash.service_uri) and check against that normalized list before pushing;
update references to netUris, extra, out, and the function
mergeOrchestratorServiceUris so the original array preserves values but the
membership check uses the normalized set to avoid whitespace/case/trailing-slash
duplicates.
120-125: Minor: null-guard inconsistency vs ApiOrchestrator typing.

ApiOrchestrator.successRatio is typed number (non-nullable, line 42), yet line 122 defensively guards dash.successRatio != null. Either tighten the runtime contract (drop the guard) or relax the type to number | null to match the guard — pick one for consistency. slaScore (line 125) is correctly typed number | null and guarded with !== null, which is fine. Not a functional bug, just a contract mismatch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts` around lines 120 -
125, The runtime uses a null-check for dash.successRatio but
ApiOrchestrator.successRatio is declared non-nullable; make the contract
consistent by updating ApiOrchestrator.successRatio to be number | null (or
number|null) and keep the existing guard in orchestrators.ts (the mapping that
produces the successRatio field); also review any other callsites that assume a
non-null number and adjust them to handle null (or coerce to 0) so the type
change does not break consumers.
apps/web-next/src/hooks/useJobFeedStream.ts (1)
135-139: Tighten the comment — the ladder applies to any failure streak, not just first load.

JOB_FEED_RECOVERY_RETRY_DELAYS is consulted whenever feedFailureStreak > 0, regardless of whether it's the first load or a transient blip mid-session. The current wording ("on first load") under-describes the actual behavior.
📝 Proposed wording
-/**
- * When the upstream briefly errors on first load, retry quickly before falling
- * back to the steady poll interval.
- */
+/**
+ * Recovery back-off (ms) consulted whenever the feed reports an unhealthy
+ * response (HTTP error, invalid JSON, or `queryFailed:true`). Each delay is
+ * additionally capped by the steady `pollIntervalMs`, so it can only shorten
+ * — never lengthen — the cadence relative to a healthy stream.
+ */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/hooks/useJobFeedStream.ts` around lines 135 - 139, The
comment above JOB_FEED_RECOVERY_RETRY_DELAYS is misleading — it says “on first
load” but the delay ladder is used whenever feedFailureStreak > 0 (any failure
streak). Update the comment to state that these delays are used for recovery
retries on any failure streak (transient or persistent) rather than only on
first load, and mention the tie to feedFailureStreak so future readers see when
the array is consulted.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/developer-api/backend/src/server.ts`:
- Around line 269-288: The merge currently blindly sums capacities for identical
Pipeline:Model keys (netModelRowKey) across `streaming/models` and
`requests/models` which can overstate capacity if the endpoints report
overlapping orchestrators; change the merge in the loop over `accum` (variables:
byKey, netModelRowKey, WarmOrchCount, TotalCapacity) to detect duplicates coming
from different sources and resolve them by either asserting disjointness
(throw/log) or by a deterministic merge rule (e.g., prefer the larger capacity,
or prefer one source over the other) instead of always summing. Also propagate a
`partial` indicator from the `ingest` calls (use the boolean results from
`streamingOk` and `requestsOk`) back to the cache layer: if either upstream
failed set `partial=true` and either skip writing to cache or write with a much
shorter TTL (instead of the current 60s) so transient partial results are not
served for a minute.

---

Outside diff comments:
In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts`:
- Around line 178-191: The merge is silently dropping rows with empty uris
because merged rows are filtered by hasNonBlankServiceUri; update the logic
around orderedKeys/merged to (1) add instrumentation counters inside the
mergeOrchestratorServiceUris flow to log total input keys (orderedKeys), rows
before filter, rows after filter and count of rows dropped due to both dashboard
serviceUri and netData.urisByAddress being empty (use
mapDashboardIntoNetRow/netOnlyPlaceholder outputs to detect drops), and (2)
expand orderedKeys to also include addresses from pipelineModelsByAddress and
lastSeenMsByAddress so inventory-only records without uris are not excluded;
reference functions mapDashboardIntoNetRow, netOnlyPlaceholder,
hasNonBlankServiceUri and the merged/orderedKeys variables when making these
changes.

---

Nitpick comments:
In `@apps/web-next/src/hooks/useJobFeedStream.ts`:
- Around line 135-139: The comment above JOB_FEED_RECOVERY_RETRY_DELAYS is
misleading — it says “on first load” but the delay ladder is used whenever
feedFailureStreak > 0 (any failure streak). Update the comment to state that
these delays are used for recovery retries on any failure streak (transient or
persistent) rather than only on first load, and mention the tie to
feedFailureStreak so future readers see when the array is consulted.

In `@apps/web-next/src/lib/facade/resolvers/orchestrators.ts`:
- Around line 67-77: mergeOrchestratorServiceUris can append near-duplicate URIs
because only dash.serviceUri is trimmed before the includes check; normalize
both sides instead: map netUris to a normalized form (trim, optional lowercase,
and remove trailing slash) and compute the normalized form of extra (from
dash.serviceUri or dash.service_uri) and check against that normalized list
before pushing; update references to netUris, extra, out, and the function
mergeOrchestratorServiceUris so the original array preserves values but the
membership check uses the normalized set to avoid whitespace/case/trailing-slash
duplicates.
- Around line 120-125: The runtime uses a null-check for dash.successRatio but
ApiOrchestrator.successRatio is declared non-nullable; make the contract
consistent by updating ApiOrchestrator.successRatio to be number | null (or
number|null) and keep the existing guard in orchestrators.ts (the mapping that
produces the successRatio field); also review any other callsites that assume a
non-null number and adjust them to handle null (or coerce to 0) so the type
change does not break consumers.

In `@plugins/developer-api/backend/src/server.ts`:
- Around line 405-413: The catch block in server.ts currently checks
error.message.includes('NAAP_API_SERVER_URL') which is brittle; instead
introduce a sentinel error class (e.g., class NaapConfigError extends Error) and
throw it from getNaapApiServerBase when the NAAP config is missing/invalid, then
replace the substring check with an instanceof NaapConfigError check in the
catch (near netModelsInflight.delete and the existing res.status calls) so true
config errors return 503 and all other errors return 502.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1620015-2792-4c8d-b05e-9cd8e27dc01a

📥 Commits

Reviewing files that changed from the base of the PR and between aa89fcb and 0a8e88c.

📒 Files selected for processing (4)

apps/web-next/src/components/dashboard/overview-content.tsx
apps/web-next/src/hooks/useJobFeedStream.ts
apps/web-next/src/lib/facade/resolvers/orchestrators.ts
plugins/developer-api/backend/src/server.ts

🚧 Files skipped from review as they are similar to previous changes (1)

apps/web-next/src/components/dashboard/overview-content.tsx

- Enhanced `fetchJobFeed` to accept a base URL and effective polling interval, improving flexibility. - Updated retry logic with `JOB_FEED_RECOVERY_RETRY_DELAYS` for better handling of transient errors. - Refactored job feed polling to utilize the new fetch function, ensuring consistent cache busting behavior. - Added documentation for recovery backoff strategy after failed polls.

github-actions Bot added the size/XL Extra large PR (500+ lines) label Apr 17, 2026

github-actions Bot added scope/shell Shell app changes scope/sdk Plugin SDK changes scope/packages Shared package changes plugin/developer-api Developer API plugin labels Apr 17, 2026

vercel Bot deployed to Preview April 17, 2026 00:41 View deployment

vercel Bot deployed to Preview April 17, 2026 01:39 View deployment

eliteprox added 2 commits April 21, 2026 17:11

eliteprox force-pushed the feat/dashboard-data-updates branch from 3733b93 to 09f65b3 Compare April 21, 2026 21:11

vercel Bot deployed to Preview April 21, 2026 21:14 View deployment

Merge branch 'main' into feat/dashboard-data-updates

e22f953

vercel Bot deployed to Preview April 21, 2026 21:24 View deployment

vercel Bot deployed to Preview April 21, 2026 22:10 View deployment

Merge branch 'main' into feat/dashboard-data-updates

5687adf

vercel Bot deployed to Preview April 22, 2026 14:20 View deployment

mikezupper added 2 commits April 23, 2026 11:19

renamed orchestratorsOnline to orchestratorObserved

f632634

update display of decimal values

2dac189

vercel Bot deployed to Preview April 23, 2026 15:38 View deployment

fixed developer model pricing

aa89fcb

eliteprox marked this pull request as ready for review April 23, 2026 16:50

vercel Bot deployed to Preview April 23, 2026 16:53 View deployment

coderabbitai Bot requested changes Apr 23, 2026

View reviewed changes

eliteprox added 2 commits April 27, 2026 16:56

remove pymthouse minimal design document to streamline project docume…

f9bb2fa

…ntation

vercel Bot deployed to Preview April 27, 2026 21:21 View deployment

coderabbitai Bot requested changes Apr 27, 2026

View reviewed changes

Comment thread plugins/developer-api/backend/src/server.ts Outdated

vercel Bot deployed to Preview April 27, 2026 21:56 View deployment

coderabbitai Bot approved these changes Apr 27, 2026

View reviewed changes

eliteprox merged commit adaf7fe into main Apr 27, 2026
36 checks passed

eliteprox deleted the feat/dashboard-data-updates branch April 27, 2026 21:58

This was referenced Apr 27, 2026

revert: feat(web-next): align NAAP dashboard and BFF with OpenAPI v1 (streaming + requests)" #281

Merged

refactor(dashboard): align BFF data layer with OpenAPI v1 contract and improve job feed #282

Merged

coderabbitai Bot mentioned this pull request Apr 29, 2026

feat(pricing): add estimated pricing support #283

Open

coderabbitai Bot mentioned this pull request May 7, 2026

feat(pymthouse): add billing provider, plan-builder APIs, and client-credentials auth #268

Open

coderabbitai Bot mentioned this pull request May 21, 2026

nexus: add PymtHouse capability integration follow-ups #307

Draft

3 tasks

Conversation

eliteprox commented Apr 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Type

Plugin(s) Affected

Checklist

Breaking Changes

Screenshots / Recordings

Summary by CodeRabbit

Uh oh!

vercel Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eliteprox commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eliteprox commented Apr 17, 2026 •

edited by coderabbitai Bot

Loading

vercel Bot commented Apr 17, 2026 •

edited

Loading

coderabbitai Bot commented Apr 17, 2026 •

edited

Loading

github-actions Bot commented Apr 17, 2026 •

edited

Loading

eliteprox commented Apr 21, 2026 •

edited

Loading