Skip to content

refactor(dashboard): align BFF data layer with OpenAPI v1 contract and improve job feed#282

Merged
eliteprox merged 11 commits into
mainfrom
feat/dashboard-data-staging
Apr 28, 2026
Merged

refactor(dashboard): align BFF data layer with OpenAPI v1 contract and improve job feed#282
eliteprox merged 11 commits into
mainfrom
feat/dashboard-data-staging

Conversation

@eliteprox
Copy link
Copy Markdown
Contributor

@eliteprox eliteprox commented Apr 28, 2026

Summary

This PR re-applies #270 which was rolled back due to an upstream data issue which has been resolved.

This change refactors the dashboard's BFF (Backend for Frontend) data layer to align with the upstream OpenAPI v1 contract, consolidates streaming and requests model data, improves orchestrator/pipeline data handling, and hardens job feed polling with cache busting and recovery retry logic.

Changes

API / Data Layer

  • Merged streaming and requests model data in the network data layer for consistency with the upstream OpenAPI contract
  • Updated developer network models endpoint to combine streaming and requests models into a single response shape
  • Enhanced pricing resolver to accommodate the new OpenAPI v1 structure and ensure accurate pricing data
  • Introduced utility functions for dashboard window formatting and timeout management
  • Improved dashboard hooks to handle the new merged data structures

Orchestrator & KPI Resolvers

  • Added a utility function to determine the effective upstream window for orchestrators based on the requested period
  • Updated orchestrator cache key to use the effective window for improved data accuracy
  • Simplified KPI resolver by removing unnecessary net orchestrator data fetching — now directly returns parsed KPI data
  • Renamed orchestratorsOnlineorchestratorObserved to better reflect actual semantics

Pipeline & Timeframe Handling

  • Introduced OVERVIEW_TIMEFRAME_MAX_HOURS constant to cap timeframe options consistently across resolvers
  • Updated KPI and pipeline resolvers to use the new normalization function for timeframe parsing
  • Changed default sort in PipelinesCard from model to gpus to surface the busiest models first

Job Feed

  • Added bustCache option to appendJobFeedPollQuery for explicit cache invalidation
  • Introduced JOB_FEED_RECOVERY_RETRY_DELAYS for staged recovery backoff after upstream errors
  • Enhanced fetchJobFeed to accept a base URL and effective polling interval for better flexibility
  • Improved error logging with labels on failed ingest requests in the developer API server

UI / Display Fixes

  • Fixed decimal value display formatting
  • Fixed developer model pricing display
  • JobFeedCard now conditionally renders input FPS only when a non-null value is present
  • Added documentation clarifying that effectiveSuccessRate, noSwapRatio, slaScore, and successRatio are scaled API response values in the orchestrators resolver

Cleanup

  • Removed the pymthouse minimal design document

Testing

  • Dashboard KPIs render correctly across all supported timeframe windows
  • Orchestrator and pipeline data resolves without regression
  • Job feed recovers gracefully after upstream errors using the retry delay schedule
  • Developer model pricing and decimal formatting display correctly

Summary by CodeRabbit

  • New Features

    • Dashboard timeframe selection now limited to 48 hours maximum.
    • Job feed polling now features adaptive retry recovery with automatic cache busting on failures.
    • Network model pricing data now calculated and displayed.
  • Bug Fixes

    • Orchestrator metric display updated for accuracy.
    • Success rate precision increased to 1 decimal place.
    • Pipeline FPS data handling improved with better fallback logic.
    • Job feed fault tolerance enhanced with configurable recovery delays.
  • Documentation

    • Dashboard schema and examples updated to reflect metric changes.
    • API documentation clarified for data source alignment.

eliteprox and others added 10 commits April 27, 2026 19:41
…PI compliance

- Enhanced BFF warm route documentation to reference the upstream OpenAPI contract.
- Updated developer network models endpoint to merge streaming and requests models.
- Refined NAAP API warm route comments to clarify data sources.
- Adjusted performance by model resolver to utilize a single cache key for streaming models.
- Improved dashboard hooks to handle new data structures and ensure compatibility with requests.
- Introduced new utility functions for dashboard window formatting and timeout management.
- Merged streaming and requests model data in the network data layer for better consistency.
- Updated orchestrator and pipeline resolvers to reflect changes in data fetching and merging logic.
- Enhanced pricing resolver to accommodate new OpenAPI v1 structure and ensure accurate pricing data representation.
- Introduced a new utility function to determine the effective upstream window for orchestrators based on the requested period.
- Updated the orchestrator cache key to utilize the effective window for improved data accuracy.
- Simplified the KPI resolver by removing unnecessary net orchestrator data fetching and directly returning parsed KPI data.
- Enhanced documentation to clarify the behavior of the orchestrator endpoint and its limitations on timeframes.
…ed performance

- Changed default sorting in PipelinesCard from 'model' to 'gpus' to prioritize busy models.
- Introduced OVERVIEW_TIMEFRAME_MAX_HOURS constant to cap timeframe options and ensure consistent handling across various resolvers.
- Updated KPI and pipeline resolvers to utilize the new normalization function for timeframe parsing, enhancing data accuracy and performance.
…sponse handling

- Updated the JobFeedCard component to conditionally display input FPS, ensuring proper formatting when values are null.
- Clarified documentation in orchestrators resolver regarding the scaling of API response values for effectiveSuccessRate, noSwapRatio, slaScore, and successRatio.
- Enhanced error handling in the developer API server by adding labels to the ingest function for better logging of failed requests.
…oved error handling

- Added a `bustCache` option to `appendJobFeedPollQuery` for cache invalidation.
- Updated `fetchJobFeed` to accept options for cache busting and improved error handling.
- Introduced `JOB_FEED_RECOVERY_RETRY_DELAYS` for better retry logic on upstream errors.
- Enhanced logging to reflect the actual request URL during fetch operations.
- Enhanced `fetchJobFeed` to accept a base URL and effective polling interval, improving flexibility.
- Updated retry logic with `JOB_FEED_RECOVERY_RETRY_DELAYS` for better handling of transient errors.
- Refactored job feed polling to utilize the new fetch function, ensuring consistent cache busting behavior.
- Added documentation for recovery backoff strategy after failed polls.
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
naap-platform Ready Ready Preview, Comment Apr 28, 2026 8:06pm

Request Review

@github-actions github-actions Bot added scope/shell Shell app changes scope/sdk Plugin SDK changes scope/packages Shared package changes plugin/developer-api Developer API plugin size/XL Extra large PR (500+ lines) labels Apr 28, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

⚠️ This PR is very large (1797 lines changed). Please split it into smaller, focused PRs if possible.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@eliteprox has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 8 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 83cc6c55-cef4-4255-abd5-44e19799e973

📥 Commits

Reviewing files that changed from the base of the PR and between e22e398 and 6bbdc3b.

📒 Files selected for processing (11)
  • apps/web-next/src/hooks/useJobFeedStream.ts
  • apps/web-next/src/lib/facade/dashboard-window.ts
  • apps/web-next/src/lib/facade/network-data.ts
  • apps/web-next/src/lib/facade/resolvers/net-capacity.ts
  • apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts
  • apps/web-next/src/lib/facade/resolvers/perf-by-model.ts
  • apps/web-next/src/lib/facade/resolvers/pricing.ts
  • packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts
  • packages/plugin-sdk/src/contracts/createDashboardProvider.ts
  • packages/plugin-sdk/src/contracts/dashboard.ts
  • plugins/developer-api/backend/src/server.ts
📝 Walkthrough

Walkthrough

This PR migrates dashboard data aggregation from legacy NAAP endpoints to OpenAPI v1 streaming/models and requests/models, renames orchestratorsOnline to orchestratorsObserved across schemas and types, introduces upstream window formatting and cache-busting utilities, and refactors facade resolvers to merge streaming/requests data while adding request-mode job statistics.

Changes

Cohort / File(s) Summary
Dashboard KPI/Orchestrator field rename
apps/web-next/src/app/(dashboard)/dashboard/page.tsx, packages/types/src/index.ts, packages/plugin-sdk/src/contracts/dashboard.ts, packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts, plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts, plugins/dashboard-data-provider/frontend/src/provider.ts, apps/web-next/src/content/docs/guides/dashboard-data-provider.mdx
Updated GraphQL query and contract definitions to request orchestratorsObserved instead of orchestratorsOnline; updated test expectations and documentation to match.
Contract and type system expansion
packages/plugin-sdk/src/contracts/dashboard.ts, packages/plugin-sdk/src/contracts/index.ts, packages/plugin-sdk/src/index.ts
Introduced DashboardKPIWithRequests, DashboardPipelinesWithRequests, and job breakdown types (DashboardJobsStats, DashboardJobsOverview, DashboardJobsByPipelineRow, DashboardJobsByCapabilityRow); re-exported new types across SDK modules.
Facade and resolver refactoring
apps/web-next/src/lib/facade/index.ts, apps/web-next/src/lib/facade/resolvers/kpi.ts, apps/web-next/src/lib/facade/resolvers/orchestrators.ts, apps/web-next/src/lib/facade/resolvers/pipelines.ts, apps/web-next/src/lib/facade/stubs.ts
Updated getDashboardKPI and getDashboardPipelines to return enriched types with request metadata; refactored resolveKPI to use single unified endpoint; changed resolveOrchestrators to apply 24h window cap and merge service URIs; updated stubs to use orchestratorsObserved.
Upstream window and cache utilities
apps/web-next/src/lib/dashboard/overview-timeframe.ts, apps/web-next/src/lib/facade/dashboard-window.ts, apps/web-next/src/lib/facade/resolvers/gpu-capacity.ts
Introduced OVERVIEW_TIMEFRAME_MAX_HOURS constant (48h) and removed 72h/7d options; added formatDashboardWindow and dashboardUpstreamTimeoutMs helpers; updated resolvers to use new window formatting and timeframe clamping.
Upstream response parsing
apps/web-next/src/lib/facade/upstream-parse.ts
New module providing parseDashboardKpiBody, parseDashboardKpiWithRequests, and parseDashboardPipelinesBody to safely normalize streaming/requests combined payloads.
Network model aggregation
apps/web-next/src/lib/facade/network-data.ts, apps/web-next/src/lib/facade/resolvers/net-capacity.ts, apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts, plugins/developer-api/backend/src/server.ts, packages/plugin-sdk/src/types/network-model.ts
Migrated from single /net/models endpoint to merged streaming/models + requests/models approach; updated getRawNetModels to fetch, merge, and enrich with pricing; refactored resolveNetOrchestratorData to aggregate orchestrators from streaming and requests endpoints with deduplication and URI merging; updated documentation to reflect new data source.
FPS and performance caching
apps/web-next/src/lib/facade/resolvers/perf-by-model.ts, apps/web-next/src/app/api/v1/network/perf-by-model/route.ts
Changed resolvePerfByModel to derive FPS from streaming/models instead of perf/by-model endpoint; unified cache key across all query ranges; updated route handler to use shared perf-by-model:streaming-models cache key.
Pricing data aggregation
apps/web-next/src/lib/facade/resolvers/pricing.ts
Extended resolver to support both legacy priceAvgWeiPerUnit and OpenAPI v1 per-orchestrator priceWeiPerUnit formats; added per-model min/max/avg computation with warm-orchestrator-aware capacity selection.
Dashboard UI and component updates
apps/web-next/src/components/dashboard/overview-content.tsx
Updated KPI cards to display success rate at 1 decimal precision using orchestratorsObserved; removed FPS fallback to modelUsage?.avgFps, treating it as available only from modelFpsByPipelineModel; added headerInfo prop to PipelineTH for optional info icon with provenance; changed default sort to GPU descending; updated job feed tooltips and FPS cell formatting to show input/output FPS to 1 decimal place.
Job feed polling with adaptive recovery
apps/web-next/src/hooks/useJobFeedStream.ts
Introduced cache-busting parameter and failure streak tracking; added failure-dependent backoff using JOB_FEED_RECOVERY_RETRY_DELAYS; refactored polling loop to enable cache busting on non-zero failure streak and reset on healthy response.
Public dashboard hook updates
apps/web-next/src/hooks/usePublicDashboard.ts
Updated kpi field type to DashboardKPIWithRequests; added normalization logic to handle both array and wrapped responses for pipelines endpoint, selecting streaming or falling back to direct array.
BFF route documentation and cache strategies
apps/web-next/src/app/api/internal/bff-warm/route.ts, apps/web-next/src/app/api/v1/dashboard/orchestrators/route.ts, apps/web-next/src/app/api/v1/developer/network-models/route.ts, apps/web-next/src/app/api/v1/naap-api/warm/route.ts
Updated route documentation comments to reference NAAP OpenAPI v1 specs; changed orchestrators route cache key derivation from raw period to normalized orchestratorUpstreamWindowFromPeriod(period).

Sequence Diagrams

sequenceDiagram
    participant Dashboard as Dashboard<br/>(Client)
    participant BFF as BFF Facade<br/>(Server)
    participant NAAP as NAAP API<br/>(Upstream)

    Dashboard->>BFF: GET /api/v1/dashboard/kpi<br/>(timeframe)
    BFF->>BFF: normalizeTimeframeHours()<br/>formatDashboardWindow()
    BFF->>NAAP: GET /dashboard/kpi<br/>(window, timeoutMs)
    NAAP-->>BFF: { streaming: {...},<br/>requests?: {...} }
    BFF->>BFF: parseDashboardKpiWithRequests()<br/>(merge KPI + requests)
    BFF-->>Dashboard: DashboardKPIWithRequests<br/>(orchestratorsObserved,<br/>requests metadata)
Loading
sequenceDiagram
    participant Dashboard as Dashboard<br/>(Client)
    participant BFF as BFF Facade<br/>(Server)
    participant Cache as SWR/Redis<br/>(Cache)
    participant StreamAPI as NAAP<br/>Streaming API
    participant ReqAPI as NAAP<br/>Requests API

    Dashboard->>BFF: GET /api/v1/facade/net/orchestrators
    BFF->>Cache: Check (streaming+requests)<br/>aggregated cache
    alt Cache Hit
        Cache-->>BFF: cached orchestrator data
    else Cache Miss
        par
            BFF->>StreamAPI: GET /v1/streaming/orchestrators
            BFF->>ReqAPI: GET /v1/requests/orchestrators
        and
        end
        StreamAPI-->>BFF: rows (models, gpu_count)
        ReqAPI-->>BFF: rows (capabilities, last_seen)
        BFF->>BFF: ingestRow() aggregate<br/>dedupe by address
        BFF->>BFF: merge inventory URIs<br/>with dashboard serviceUri
        BFF->>Cache: store merged result
        Cache-->>BFF: cached data
    end
    BFF-->>Dashboard: NetOrchestratorData
Loading
sequenceDiagram
    participant Dashboard as Dashboard<br/>(Client)
    participant BFF as BFF Facade<br/>(Server)
    participant Cache as SWR/Redis<br/>(Cache)
    participant StreamAPI as NAAP<br/>Streaming API
    participant ReqAPI as NAAP<br/>Requests API
    participant PricingAPI as Pricing<br/>API

    Dashboard->>BFF: GET /api/v1/facade/network/models
    BFF->>Cache: Check (streaming+requests)<br/>merged cache
    alt Cache Hit
        Cache-->>BFF: merged NetworkModel[]
    else Cache Miss
        par
            BFF->>StreamAPI: GET /v1/streaming/models
            BFF->>ReqAPI: GET /v1/requests/models
        and
        end
        StreamAPI-->>BFF: rows (avg_fps, WarmOrchCount)
        ReqAPI-->>BFF: rows (TotalCapacity)
        BFF->>BFF: merge by pipeline:model<br/>sum counts/capacity
        BFF->>PricingAPI: GET /v1/dashboard/pricing
        PricingAPI-->>BFF: pricing rows<br/>(legacy or per-orch)
        BFF->>BFF: aggregate pricing<br/>compute min/max/avg
        BFF->>BFF: enrich NetworkModel<br/>with pricing
        BFF->>Cache: store merged+priced
        Cache-->>BFF: cached data
    end
    BFF-->>Dashboard: NetworkModel[]<br/>(with pricing)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The diff involves substantial, multi-pattern changes across facades, resolvers, and API routes: orchestrator/KPI field renames, migration from single-endpoint to merged streaming/requests aggregation patterns, introduction of new cache utilities and parsing abstractions, UI component updates with new props, and failure-recovery logic. While many changes follow consistent patterns (e.g., field renames across contracts), the resolver logic is heterogeneous—each resolver addresses merging, deduplication, or data transformation differently—requiring separate reasoning for pricing, orchestrators, network models, and performance caching. Type/export expansion adds clarity but requires tracing public API surface changes across multiple packages.

Possibly related PRs

Suggested labels

scope/shell, scope/sdk, scope/packages, plugin/developer-api, size/XL

Suggested reviewers

  • seanhanca
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: refactoring the dashboard BFF data layer to align with OpenAPI v1 contract and improving job feed polling, which are the primary focuses across all file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/dashboard-data-staging

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@eliteprox eliteprox enabled auto-merge (squash) April 28, 2026 19:48
@eliteprox eliteprox disabled auto-merge April 28, 2026 19:48
@eliteprox eliteprox enabled auto-merge (squash) April 28, 2026 19:49
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/web-next/src/hooks/useJobFeedStream.ts (1)

90-93: ⚠️ Potential issue | 🟡 Minor

Fallback query builder drops cache-bust flag.

When URL parsing fails, refresh=1 is not appended even if bustCache is true, so recovery polling may not bypass stale caches.

💡 Proposed fix
   } catch {
-    const sep = fetchUrl.includes('?') ? '&' : '?';
-    return `${fetchUrl}${sep}pollMs=${encodeURIComponent(String(ms))}`;
+    const sep = fetchUrl.includes('?') ? '&' : '?';
+    const qs = [`pollMs=${encodeURIComponent(String(ms))}`];
+    if (bustCache) qs.push('refresh=1');
+    return `${fetchUrl}${sep}${qs.join('&')}`;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/hooks/useJobFeedStream.ts` around lines 90 - 93, The
fallback URL builder in useJobFeedStream.ts's catch block currently appends only
pollMs and loses the bustCache flag; update the catch branch that builds the
fallback string using fetchUrl, sep, and ms to also append refresh=1 when the
bustCache boolean is true (e.g., include "&refresh=1" or "?refresh=1" as
appropriate), ensuring both pollMs and refresh are URL-encoded via
encodeURIComponent and joined using the existing sep logic so recovery polling
bypasses cache when requested.
plugins/developer-api/backend/src/server.ts (1)

355-374: ⚠️ Potential issue | 🟠 Major

limit is not enforced on merged upstream rows.

When upstream returns more than limit, response still includes all merged rows; only catalog backfill is capped. This violates the endpoint’s limit behavior.

💡 Proposed fix
-  merged.sort(
+  merged.sort(
     (a, b) => a.Pipeline.localeCompare(b.Pipeline) || a.Model.localeCompare(b.Model),
   );
-  return { merged, partial };
+  const bounded = limitIsAll ? merged : merged.slice(0, limit ?? merged.length);
+  return { merged: bounded, partial };

Also applies to: 400-404, 430-437

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/developer-api/backend/src/server.ts` around lines 355 - 374, The
merged array is not being capped by the endpoint limit so upstream rows can
exceed `limit`; after merging upstream rows into `merged` (the array built from
`modelRows` and where keys are tracked by `seen` using `netModelRowKey`),
enforce the limit by truncating `merged` to `limit` when `limitIsAll` is false
(e.g., set merged = merged.slice(0, limit ?? merged.length)); recalc
`catalogSlotsRemaining` from the post-truncation length (or compute it before
backfill using Math.max(0, (limit ?? 0) - merged.length) but ensure merged has
been truncated), and apply the same truncation logic in the other similar merge
blocks referenced (the ones around the other fetch/merge sections).
🧹 Nitpick comments (2)
apps/web-next/src/lib/facade/dashboard-window.ts (1)

28-31: Timeout helper can be simplified.

hours <= 72 and > 72 return the same value, so this branch is redundant and can be collapsed for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/dashboard-window.ts` around lines 28 - 31, The
conditional in the timeout helper is redundant: both branches of "if (hours <=
72)" return 55_000; remove the branch and replace it with a single return of
55_000 (i.e., eliminate the if (...) { return 55_000; } return 55_000; pattern)
so the helper simply returns 55_000 unconditionally.
apps/web-next/src/lib/facade/resolvers/net-capacity.ts (1)

12-12: Don’t swallow net-model fetch failures silently.

Returning [] here hides upstream outages and makes capacity collapse look like valid data. Add a warning log (or metric) before fallback.

♻️ Suggested patch
-    const models = await getRawNetModels().catch(() => []);
+    const models = await getRawNetModels().catch((err) => {
+      console.warn('[facade/net-capacity] getRawNetModels failed; returning empty capacity map:', err);
+      return [];
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web-next/src/lib/facade/resolvers/net-capacity.ts` at line 12, The
current await getRawNetModels().catch(() => []) swallows errors; change the
catch to log a warning with the caught error before falling back to [] so
upstream outages are visible — e.g. replace .catch(() => []) with .catch((err)
=> { /* log a warning including err and context (use existing logger like
processLogger or console) */; return []; }) referencing the getRawNetModels call
and the models variable so the log includes which fetch failed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/web-next/src/lib/facade/network-data.ts`:
- Around line 52-55: The merge logic currently adds incoming values into an
existing entry (existing.WarmOrchCount += warm; existing.TotalCapacity +=
slots), which double-counts when the same (pipeline, model) key appears from
multiple sources; update the merge in the network-data.ts routine that handles
existing entries so it uses max semantics instead of summation: set
existing.WarmOrchCount = Math.max(existing.WarmOrchCount, warm) and
existing.TotalCapacity = Math.max(existing.TotalCapacity, slots) (apply the same
max logic wherever the code merges duplicate (pipeline, model) rows).
- Around line 60-70: The loops over stream and req only read snake_case fields
(r.pipeline, r.model, r.warm_orch_count, r.gpu_slots) so PascalCase payloads are
ignored; update the parsing to accept both snake_case and PascalCase by reading
fallback properties (e.g., pipeline = (r.pipeline ?? r.Pipeline)?.trim() ?? '',
model = (r.model ?? r.Model)?.trim() ?? '', and use Number(r.warm_orch_count ??
r.WarmOrchCount ?? 0) and Number(r.gpu_slots ?? r.GpuSlots ?? 0)) before calling
add; apply the same change in both for..of loops so both stream and req entries
support either casing.

In `@apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts`:
- Around line 157-171: The fetch functions fetchStreamingOrchestrators and
fetchRequestsOrchestrators currently call the offset/limit endpoints without a
limit, risking truncated results; update both naapGet calls to include a query
parameter setting limit=1000 (e.g., pass { limit: 1000 } or include it in the
request options/query) so each function explicitly requests up to 1000 rows and
prevents silent undercounting.

In `@apps/web-next/src/lib/facade/resolvers/perf-by-model.ts`:
- Around line 32-39: The current code swallows all naapGet errors by .catch(()
=> []), which causes transport failures to be cached as empty FPS data; change
the rawRows load so that you do not catch/rewrite network errors—let exceptions
from naapGet propagate (so the route can return 503), but still coerce null or
undefined responses into an empty array (i.e., after awaiting
naapGet<StreamingModelRow[] | null | undefined>(...) set rawRows = result ?? []
rather than catching rejections). Keep the same request options and preserve the
'streaming/models' call and the StreamingModelRow type.

In `@apps/web-next/src/lib/facade/resolvers/pricing.ts`:
- Around line 159-166: The aggregation loop is creating entries for rows whose
pipeline or model are blank/whitespace; before computing pricingKey(r.pipeline,
r.model) and updating map, trim r.pipeline and r.model and skip the row if
either trimmed value is an empty string (e.g., if (!trimmedPipeline ||
!trimmedModel) continue;). Apply the same pre-check in the other aggregation
block that creates slots (the second occurrence around the code handling minWei
at lines ~205-208) so no per-orchestrator rows with blank IDs are inserted into
map or emitted.

In `@packages/plugin-sdk/src/contracts/dashboard.ts`:
- Around line 504-509: The kpi resolver signature in the Dashboard contract
declares pipeline and model_id parameters that are never forwarded; either
remove pipeline and model_id from the kpi type declaration or wire them
end-to-end: update the GraphQL schema to accept pipeline and model_id, update
createDashboardProvider (the provider factory that currently extracts only
window and timeframe) to also extract and forward pipeline and model_id into the
resolver call, and ensure any consumer code that implements kpi handles those
additional args; reference the kpi type in
packages/plugin-sdk/src/contracts/dashboard.ts and the createDashboardProvider
function to apply the change consistently.

In `@plugins/developer-api/backend/src/server.ts`:
- Around line 258-275: Promise.all currently causes total failure if either
fetch rejects or res.json() throws, breaking the intended partial-merge
behavior; change the fetch logic to use a safe pattern (e.g., Promise.allSettled
or a safeFetch wrapper) so failures return a sentinel (null or rejected state)
instead of rejecting the whole Promise, and update ingest (the
ingest(res,label,source) function) to guard the JSON parse in a try/catch (log a
warning on parse error and return false) so transport errors and JSON parse
failures are treated as per-source failures rather than aborting the whole
merge; ensure code that currently reads sRes and rRes handles the sentinel/null
results and still calls ingest only for valid Response objects.

---

Outside diff comments:
In `@apps/web-next/src/hooks/useJobFeedStream.ts`:
- Around line 90-93: The fallback URL builder in useJobFeedStream.ts's catch
block currently appends only pollMs and loses the bustCache flag; update the
catch branch that builds the fallback string using fetchUrl, sep, and ms to also
append refresh=1 when the bustCache boolean is true (e.g., include "&refresh=1"
or "?refresh=1" as appropriate), ensuring both pollMs and refresh are
URL-encoded via encodeURIComponent and joined using the existing sep logic so
recovery polling bypasses cache when requested.

In `@plugins/developer-api/backend/src/server.ts`:
- Around line 355-374: The merged array is not being capped by the endpoint
limit so upstream rows can exceed `limit`; after merging upstream rows into
`merged` (the array built from `modelRows` and where keys are tracked by `seen`
using `netModelRowKey`), enforce the limit by truncating `merged` to `limit`
when `limitIsAll` is false (e.g., set merged = merged.slice(0, limit ??
merged.length)); recalc `catalogSlotsRemaining` from the post-truncation length
(or compute it before backfill using Math.max(0, (limit ?? 0) - merged.length)
but ensure merged has been truncated), and apply the same truncation logic in
the other similar merge blocks referenced (the ones around the other fetch/merge
sections).

---

Nitpick comments:
In `@apps/web-next/src/lib/facade/dashboard-window.ts`:
- Around line 28-31: The conditional in the timeout helper is redundant: both
branches of "if (hours <= 72)" return 55_000; remove the branch and replace it
with a single return of 55_000 (i.e., eliminate the if (...) { return 55_000; }
return 55_000; pattern) so the helper simply returns 55_000 unconditionally.

In `@apps/web-next/src/lib/facade/resolvers/net-capacity.ts`:
- Line 12: The current await getRawNetModels().catch(() => []) swallows errors;
change the catch to log a warning with the caught error before falling back to
[] so upstream outages are visible — e.g. replace .catch(() => []) with
.catch((err) => { /* log a warning including err and context (use existing
logger like processLogger or console) */; return []; }) referencing the
getRawNetModels call and the models variable so the log includes which fetch
failed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 93f0e15a-ba73-4410-ad65-dfdebdd5543b

📥 Commits

Reviewing files that changed from the base of the PR and between 1530887 and e22e398.

📒 Files selected for processing (33)
  • apps/web-next/src/app/(dashboard)/dashboard/page.tsx
  • apps/web-next/src/app/api/internal/bff-warm/route.ts
  • apps/web-next/src/app/api/v1/dashboard/orchestrators/route.ts
  • apps/web-next/src/app/api/v1/developer/network-models/route.ts
  • apps/web-next/src/app/api/v1/naap-api/warm/route.ts
  • apps/web-next/src/app/api/v1/network/perf-by-model/route.ts
  • apps/web-next/src/components/dashboard/overview-content.tsx
  • apps/web-next/src/content/docs/guides/dashboard-data-provider.mdx
  • apps/web-next/src/hooks/useJobFeedStream.ts
  • apps/web-next/src/hooks/usePublicDashboard.ts
  • apps/web-next/src/lib/dashboard/overview-timeframe.ts
  • apps/web-next/src/lib/facade/dashboard-window.ts
  • apps/web-next/src/lib/facade/index.ts
  • apps/web-next/src/lib/facade/network-data.ts
  • apps/web-next/src/lib/facade/resolvers/gpu-capacity.ts
  • apps/web-next/src/lib/facade/resolvers/kpi.ts
  • apps/web-next/src/lib/facade/resolvers/net-capacity.ts
  • apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts
  • apps/web-next/src/lib/facade/resolvers/orchestrators.ts
  • apps/web-next/src/lib/facade/resolvers/perf-by-model.ts
  • apps/web-next/src/lib/facade/resolvers/pipelines.ts
  • apps/web-next/src/lib/facade/resolvers/pricing.ts
  • apps/web-next/src/lib/facade/stubs.ts
  • apps/web-next/src/lib/facade/upstream-parse.ts
  • packages/plugin-sdk/src/contracts/__tests__/dashboard.test.ts
  • packages/plugin-sdk/src/contracts/dashboard.ts
  • packages/plugin-sdk/src/contracts/index.ts
  • packages/plugin-sdk/src/index.ts
  • packages/plugin-sdk/src/types/network-model.ts
  • packages/types/src/index.ts
  • plugins/dashboard-data-provider/frontend/src/__tests__/provider.test.ts
  • plugins/dashboard-data-provider/frontend/src/provider.ts
  • plugins/developer-api/backend/src/server.ts

Comment thread apps/web-next/src/lib/facade/network-data.ts
Comment thread apps/web-next/src/lib/facade/network-data.ts Outdated
Comment thread apps/web-next/src/lib/facade/resolvers/net-orchestrators.ts
Comment thread apps/web-next/src/lib/facade/resolvers/perf-by-model.ts Outdated
Comment thread apps/web-next/src/lib/facade/resolvers/pricing.ts
Comment thread packages/plugin-sdk/src/contracts/dashboard.ts
Comment thread plugins/developer-api/backend/src/server.ts
…filter arguments

- Updated the `kpi` query in the dashboard schema to accept `pipeline` and `model_id` as filter arguments, improving query flexibility.
- Modified the `createDashboardProvider` function to forward the new arguments to the KPI resolver.
- Added tests to verify the correct handling of the new filter arguments in the KPI resolver.
- Enhanced error handling in the developer API for improved robustness during data fetching.
@eliteprox eliteprox merged commit e8f689c into main Apr 28, 2026
36 checks passed
@eliteprox eliteprox deleted the feat/dashboard-data-staging branch April 28, 2026 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plugin/developer-api Developer API plugin scope/packages Shared package changes scope/sdk Plugin SDK changes scope/shell Shell app changes size/XL Extra large PR (500+ lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants