perf: test report cache priming, pagination, and query optimization #3505
stbenjam wants to merge 15 commits into openshift:main from
Conversation
When collapse=false, TestsByNURPAndStandardDeviation builds a query that self-joins prow_test_report_7d_matview three times:

1. Outer query - gets the raw rows
2. pass_rates subquery - computes per-variant percentages
3. stats subquery - computes AVG/STDDEV across variants

The name/variant filters were only applied to the outermost query. Subqueries 2 and 3 scanned all rows for the release to compute aggregates for every test, even when only a single test was requested. For release 4.22 with a name filter, this meant:

|                | Before (outer only)  | After (pushed down) |
|----------------|----------------------|---------------------|
| Stats subquery | Seq Scan, 1.28M rows | Index Scan, 142     |
| Estimated cost | 802,603 - 1,137,530  | 7.53 - 1,371        |
| Speedup        | -                    | ~830x               |

TestsByNURPAndStandardDeviation now accepts optional filter functions (variadic, backward-compatible) that are applied to both the stats and pass_rates subqueries. The filter is still also applied to the outer query, so results are identical.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Variant-specific filters (e.g., NOT has entry "never-stable") must not be pushed into the stats subquery, which computes AVG/STDDEV across all variants for a test. Filtering out variants there would skew the delta_from_*_average and standard deviation calculations.

Split SubqueryFilter into a struct with a VariantOnly flag and an isVariantFilter helper. At the call site, the rawFilter is further split: name filters go to both the stats and pass_rates subqueries (safe, just narrows to the matching test), while variant filters go only to pass_rates (preserving cross-variant stats semantics).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
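The routing described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: the real SubqueryFilter wraps database filter functions, whereas here filters are plain SQL fragments so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// SubqueryFilter mirrors the struct described in the commit: a filter plus a
// VariantOnly flag. Representing the filter as a SQL fragment is a
// simplification for illustration only.
type SubqueryFilter struct {
	Where       string // SQL fragment, e.g. "name LIKE ?"
	VariantOnly bool   // variant filters must not reach the stats subquery
}

// applyFilters splits filters between the stats and pass_rates subqueries:
// name filters go to both, variant filters only to pass_rates.
func applyFilters(statsWhere, passRatesWhere []string, filters ...SubqueryFilter) ([]string, []string) {
	for _, f := range filters {
		passRatesWhere = append(passRatesWhere, f.Where)
		if !f.VariantOnly {
			statsWhere = append(statsWhere, f.Where)
		}
	}
	return statsWhere, passRatesWhere
}

func main() {
	stats, passRates := applyFilters(nil, nil,
		SubqueryFilter{Where: "name LIKE '%etcd%'"},
		SubqueryFilter{Where: "variants @> ARRAY['never-stable']", VariantOnly: true},
	)
	fmt.Println("stats:     ", strings.Join(stats, " AND "))
	fmt.Println("pass_rates:", strings.Join(passRates, " AND "))
}
```

Keeping variant filters out of the stats subquery is the point: AVG/STDDEV must still be computed across all variants of the matching test.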
Walkthrough

Adds server-side pagination, sorting, and filtering to the Tests API (Postgres and BigQuery), exports SubqueryFilter for per-variant vs global DB filters, introduces cache priming utilities and a test-report-cache loader, validates pagination input (with tests), and wires the frontend TestTable for server-driven pagination and cache-aware results.

Changes: Tests API, DB Query Filters, Pagination, and Cache Priming
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Frontend as TestTable
    participant Server
    participant DB
    participant Cache
    Client->>Frontend: change page/filter/sort
    Frontend->>Server: GET /api/tests?release=X&page=Y&perPage=Z&...
    Server->>Server: parse & validate pagination/spec
    Server->>Cache: check cached paginated/unfiltered results
    alt cache miss
        Server->>DB: call TestsByNURPAndStandardDeviation(..., subqueryFilters...)
        DB-->>Server: filtered rows (+ total_rows if counted)
        Server->>Cache: store results (use TestResultsCacheDuration)
    end
    Server-->>Frontend: JSON { rows, total_rows, page_size, page }
    Frontend-->>Client: render rows and pagination UI
```
🧹 Nitpick comments (1)

pkg/db/query/test_queries.go (1)

247-260: ⚡ Quick win: Split the godoc so it documents the right exported symbols.

Inserting `SubqueryFilter` here makes the existing `TestsByNURPAndStandardDeviation` doc block attach to `SubqueryFilter`, so the type now starts with the function description and the exported function no longer has its own godoc. Please give `SubqueryFilter` a short type comment and move the analytics-query description back above `TestsByNURPAndStandardDeviation`.

As per coding guidelines, "Name each function succinctly but accurately indicating its purpose relative to its package or receiver. When adding new functions, types, or fields, include a brief godoc if the name alone would not make the purpose obvious to someone unfamiliar with the feature."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 24a16f3d-8b6b-4be1-9e37-fab115d7b00a
📒 Files selected for processing (2)
pkg/api/tests.go
pkg/db/query/test_queries.go
Scheduling required tests:
The /api/tests endpoint previously returned all matching rows (up to 50k+ for uncollapsed views), causing the frontend to barely load. This adds server-side pagination following the existing pattern used by the job runs endpoint.

When perPage/page query parameters are present, the backend now:

- Applies ORDER BY, COUNT, LIMIT, and OFFSET at the SQL level
- Returns a PaginationResult envelope with rows, total_rows, page_size, page
- Bypasses the cache (paginated queries are fast with LIMIT/OFFSET)

When pagination params are absent, existing behavior is preserved for backward compatibility.

Frontend changes switch the DataGrid to paginationMode="server" and send perPage/page params, following the JobRunsTable pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@stbenjam: This pull request references TRT-2575 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
- Validate sort direction against an allowlist (asc/desc) instead of interpolating raw user input into the ORDER BY clause
- Add bounds validation for perPage (1-1000) and page (>= 0) in getPaginationParams to prevent DoS via unbounded queries
- Check COUNT query error instead of silently ignoring it
- Fix BigQuery path to return PaginationResult envelope when pagination params are present (frontend expects this format)
- Move COUNT before ORDER BY to avoid unnecessary overhead
- Inline isVariantFilter (trivial single-field access)
- Separate SubqueryFilter doc comment from function doc block
- Update API docs in pkg/api/README.md with new pagination params and response format
- Add unit tests for getPaginationParams bounds validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review Panel Verdict

Disposition: APPROVE — all BLOCKING findings from initial review resolved

Specialist Findings

- Architecture Reviewer: Cross-file impact is clean.
- Security & Supply Chain Reviewer: Sort direction uses an allowlist (
- UX & API Reviewer: Backward compatible — when
- Codebase Consistency Reviewer:
- QA Engineer:
- Devil's Advocate: Sort direction injection — resolved, allowlist prevents arbitrary tokens. BigQuery response shape — resolved, wraps in
- Technical Writer: API docs in
- DBA Expert (300 years PostgreSQL): Filter pushdown into stats/passRates subqueries is the key performance win (~830x for filtered queries).

Panel Synthesis

All eight specialists converged on the same five BLOCKING findings in the initial review, and all five have been resolved:

No remaining BLOCKING findings. The SUGGESTION items (inline

Required Actions Before Merge

None.

Optional Follow-ups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/tests.go`:
- Around line 401-409: The BigQuery branch sets TotalRows to len(testsResult)
after pagination, returning the page count rather than the full dataset count;
update the logic in the handler (the BigQuery path in pkg/api/tests.go where
RespondWithJSON is called for pagination) to compute and return the true total
row count by either running a separate COUNT(*) query before applying
limit/offset (mirror the Postgres path) or by using any available pre-limit
count variable (e.g., a totalCount/rowsBeforeLimit value if present) and set
apitype.PaginationResult.TotalRows to that value instead of len(testsResult);
ensure this runs only when pagination != nil and does not change the existing
paged Rows/Page/PageSize fields.
📒 Files selected for processing (7)
pkg/api/README.md
pkg/api/tests.go
pkg/db/query/test_queries.go
pkg/sippyserver/parameters.go
pkg/sippyserver/parameters_test.go
pkg/sippyserver/server.go
sippy-ng/src/tests/TestTable.js
The BQ path was setting TotalRows to len(testsResult) after limit, which gave the page count instead of the dataset total. Now captures the total before applying pagination slice. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
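The fix can be illustrated with a small helper that captures the dataset total before slicing. The row type and helper name are stand-ins for illustration, not the actual BigQuery path:

```go
package main

import "fmt"

// paginateRows captures the total *before* applying the pagination slice,
// which is the bug fix described above: TotalRows must reflect the full
// result set, not just the returned page.
func paginateRows(rows []string, page, perPage int) (paged []string, totalRows int) {
	totalRows = len(rows) // capture before limit/offset
	start := page * perPage
	if start > len(rows) {
		start = len(rows)
	}
	end := start + perPage
	if end > len(rows) {
		end = len(rows)
	}
	return rows[start:end], totalRows
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	paged, total := paginateRows(rows, 1, 2)
	fmt.Println(paged, total) // [c d] 5
}
```

Had the total been read after slicing, the frontend's pager would have reported at most perPage rows and never offered further pages.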
/hold
The paginated tests API path was bypassing the cache and running the expensive three-layer nested subquery on every page/sort change. Now both paginated and non-paginated paths share the same cached result set (1 hour TTL), with sorting and pagination applied in memory. The collapsed result set is ~5k rows, making in-memory operations trivial. This also removes the separate COUNT(*) query that was doubling the DB work per paginated request. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cache the full unfiltered test result set and apply filters in memory. This means any filter, sort, or page change is served instantly from cache without hitting the database. The cache TTL is increased to 4 hours to match the cache primer schedule. Also adds a test-report-cache data loader that can be used with the cache primer cronjob (--loader=test-report-cache) to warm the test results cache for all releases on both default and twoDay periods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sults Only prime cache for OCP development releases (no GA date, has payloadTags capability) to avoid wasting time on GA/OKD releases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🧹 Nitpick comments (2)

pkg/api/tests.go (1)

485-490: 💤 Low value: Consider adding a brief godoc for the `matview()` method.

While the method is simple, a short comment clarifying its purpose (e.g., `// matview returns the materialized view name based on the spec's period.`) would improve readability for unfamiliar readers. As per coding guidelines, include a brief godoc if the name alone would not make the purpose obvious.

pkg/dataloader/testreportcacheloader/testreportcacheloader.go (1)

14-27: 💤 Low value: Consider adding brief godoc for the exported `New` function.

While the implementation is straightforward and follows established loader patterns, the coding guidelines suggest including a brief godoc if the name alone would not make the purpose obvious. A short comment like `// New creates a testReportCacheLoader that primes cache for development releases.` would help unfamiliar readers.
📒 Files selected for processing (4)
cmd/sippy/load.go
pkg/api/tests.go
pkg/dataloader/testreportcacheloader/testreportcacheloader.go
pkg/dataloader/testreportcacheloader/testreportcacheloader_test.go
PrimeTestResultsCache now bypasses the cache read path entirely, always regenerating from the database and writing the fresh result. Previously it went through GetDataFromCacheOrMatview which would return stale cached data if the matview hadn't been refreshed yet. Also adds hack/bench-test-api.sh for comparing prod vs local API response times across various query patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
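The difference between the old read-through path and the new always-regenerate prime path can be sketched with a toy in-memory cache. Function and type names here are illustrative, not sippy's actual cache API:

```go
package main

import "fmt"

// store is a toy stand-in for the Redis cache client.
type store map[string]string

// readThrough returns a cached value if present, otherwise generates and
// caches it. A primer going through this path would return stale data
// instead of refreshing it -- the bug described above.
func readThrough(c store, key string, generate func() string) string {
	if v, ok := c[key]; ok {
		return v
	}
	v := generate()
	c[key] = v
	return v
}

// prime bypasses the read path entirely: it always regenerates from the
// source of truth and overwrites whatever is cached.
func prime(c store, key string, generate func() string) {
	c[key] = generate()
}

func main() {
	c := store{"tests:4.22": "stale"}
	fmt.Println(readThrough(c, "tests:4.22", func() string { return "fresh" })) // stale
	prime(c, "tests:4.22", func() string { return "fresh" })
	fmt.Println(c["tests:4.22"]) // fresh
}
```

The design point: a primer's entire job is to refresh, so a cache hit on the read path is a no-op for it and leaves stale data in place until TTL expiry.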
The production API is too slow to complete within curl's timeout, making the comparison benchmark impractical. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The cache primer runs every 4 hours, so a 4-hour TTL risked cache expiry on the boundary before the next primer run. Extending to 5 hours ensures primed entries never expire between runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🧹 Nitpick comments (2)

pkg/api/tests.go (2)

421-437: 💤 Low value: Consider extracting a pagination helper to reduce duplication.

The pagination logic (lines 421-437) is nearly identical to the Postgres path (lines 377-393). Per coding guidelines, check pkg/util/ for existing helpers or consider extracting a shared pagination function.

♻️ Example helper extraction

```go
// In pkg/util or locally in this file:
func paginate[T any](items []T, pagination *apitype.Pagination) ([]T, int64) {
	totalRows := int64(len(items))
	start := pagination.Page * pagination.PerPage
	end := start + pagination.PerPage
	if start > int(totalRows) {
		start = int(totalRows)
	}
	if end > int(totalRows) {
		end = int(totalRows)
	}
	return items[start:end], totalRows
}
```

230-241: 💤 Low value: Filter errors are silently ignored.

When f.Filter(t) returns an error (e.g., due to an invalid filter field), the test is silently excluded from results. Consider logging at debug level to aid troubleshooting.

♻️ Suggested improvement

```diff
 func (tests TestsAPIResult) filter(f *filter.Filter) TestsAPIResult {
 	if f == nil || len(f.Items) == 0 {
 		return tests
 	}
 	var result TestsAPIResult
 	for _, t := range tests {
-		if match, err := f.Filter(t); err == nil && match {
+		match, err := f.Filter(t)
+		if err != nil {
+			log.WithError(err).Debugf("filter error for test %s", t.Name)
+			continue
+		}
+		if match {
 			result = append(result, t)
 		}
 	}
 	return result
 }
```
📒 Files selected for processing (1)
pkg/api/tests.go
The API defaults IncludeOverall to true when collapse is false, but the primer was leaving it as false. This caused a cache key mismatch, resulting in cache misses for uncollapsed requests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
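The key-mismatch failure mode can be sketched as follows. The field and function names are assumptions for illustration; the point is that primer and server must run identical defaulting before deriving the cache key:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// spec is a stand-in for the test results request spec; fields are assumed.
type spec struct {
	Release        string
	Collapse       bool
	IncludeOverall bool
}

// applyDefaults mirrors the defaulting the API performs, so that primer and
// server end up with identical specs -- and therefore identical cache keys.
func applyDefaults(s spec) spec {
	if !s.Collapse {
		s.IncludeOverall = true
	}
	return s
}

// cacheKey derives a deterministic key from the fully-defaulted spec.
func cacheKey(s spec) string {
	b, _ := json.Marshal(applyDefaults(s))
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	server := spec{Release: "4.22", Collapse: false, IncludeOverall: true}
	primer := spec{Release: "4.22", Collapse: false} // primer left the flag unset
	fmt.Println(cacheKey(server) == cacheKey(primer)) // true
}
```

Without the shared defaulting step, the primer would warm a key the server never reads, turning every uncollapsed request into a cache miss.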
The old code path through GetDataFromCacheOrMatview handled nil cache gracefully, but the direct write path panicked on nil. Return an error instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmark: Local API with cache priming (release 5.0)
Collapsed results see a ~26x improvement from caching. Uncollapsed results are ~12x faster cached vs uncached (2+ minutes down to ~10s). The uncollapsed cached time is dominated by deserializing the ~1GB JSON blob; the uncollapsed uncached query is a full self-joining matview scan that takes over 2 minutes.
Add WithCompression() option to CacheSet that gzip-compresses the JSON before writing to Redis. Both cache read paths auto-detect gzip via magic header bytes and decompress transparently. Only the test results cache primer uses compression for now, as the uncollapsed result set is ~1GB of JSON. All other callers are unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
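The auto-detection relies on the gzip magic header (bytes 0x1f 0x8b, per RFC 1952). Below is a self-contained sketch of the write/read paths; the function names are illustrative rather than sippy's actual cache API:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// maybeDecompress inspects the first two bytes for the gzip magic header and
// decompresses transparently, passing plain entries through unchanged. This
// is what lets compressed and uncompressed writers share one cache.
func maybeDecompress(data []byte) ([]byte, error) {
	if len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b {
		zr, err := gzip.NewReader(bytes.NewReader(data))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		return io.ReadAll(zr)
	}
	return data, nil // plain entry from an uncompressed caller
}

// compress gzips a payload before it is written to the cache.
func compress(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

func main() {
	payload := []byte(`{"rows": [1, 2, 3]}`)
	out, _ := maybeDecompress(compress(payload))
	fmt.Println(string(out) == string(payload)) // true
}
```

Because detection happens on read, only the ~1GB test-results blob needs the compressed write path; every other caller keeps writing plain JSON with no migration.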
This is huge, and that benchmark comment answers the main question I had. Having the option to see the uncompressed list again would be handy; I had to give up on that some time ago.
/lgtm

Just making sure I don't step on TRT's toes if they want a crack at this. Thanks for improving this, it will help daily.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, stbenjam

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
@stbenjam: No Jira issue is referenced in the title of this pull request.

Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
I am not sure this is really the right approach; the caching is clunky and bad. @smg247 is going to look at it. If we can't figure out something better we could try to take this, but it is really putting a tuxedo on a toad.
@stbenjam: all tests passed! Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Cache priming for test results
The test report cache loader now primes both collapsed and non-collapsed results in Redis, eliminating cold-cache latency for the most common test report views. Previously only collapsed results were primed, leaving non-collapsed queries (the detailed per-variant NURP+ view) to hit the database on first access.
To avoid unnecessary work, the cache primer now only targets OCP development releases (identified by having the payloadTags capability and no GA date set). GA releases and non-OCP products (OKD, etc.) are skipped.

Server-side pagination
The /api/tests endpoint previously returned all matching rows (up to 50k+ for uncollapsed views), causing the tests page to barely load. This adds server-side pagination following the existing pattern used by the job runs endpoint.
When perPage/page query parameters are present, the backend:
When pagination params are absent, existing behavior is preserved for backward compatibility.
Frontend changes switch the DataGrid to paginationMode="server" and send perPage/page params, following the JobRunsTable pattern.

Filter pushdown (~830x improvement for filtered queries)
When collapse=false, TestsByNURPAndStandardDeviation builds a query that self-joins prow_test_report_7d_matview 3 times. Name/variant filters were only applied to the outermost query, causing subqueries to scan all rows for the release. Filters are now pushed into the stats and pass_rates subqueries, allowing index use on cache misses and paginated queries.
Replaces #3290 (rebased on main; original e2e failure was an unrelated sippy-load-job timeout).
Summary by CodeRabbit
Release Notes
New Features
- perPage and page parameters for efficient data retrieval

Documentation