
FIX: add support for offset and limit on listing aggregations #25943

Merged
IceS2 merged 11 commits into main from use-limit-and-offset-on-listing-agg on Feb 19, 2026

Conversation

@IceS2
Contributor

IceS2 commented Feb 17, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.


Summary by Gitar

  • Added offset and limit pagination support for aggregated queries:
    - Extended the ListParams SDK model with offset and latest (boolean) fields, enabling pagination parameters
    - Enhanced EntityTimeSeriesRepository.listLatestFromSearch() to accept limit/offset/sortField/sortType for paginated aggregations
  • Implemented a dual-aggregation pagination strategy:
    - Added bucket_sort and stats_bucket pipeline aggregations (Elasticsearch and OpenSearch) to slice result buckets and count totals accurately
    - Built parallel aggregation trees when paginating: byTerms for sliced results and byTermsCount for the exact filtered count (a sketch follows below)
  • Updated the resource layer to support pagination:
    - TestCaseResolutionStatusResource and TestCaseResultResource now pass pagination parameters to the aggregation method
    - Removed documentation stating "offset and limit are ignored"
  • Added comprehensive test coverage:
    - Integration test (IncidentPaginationIT) verifies pagination across multiple pages and edge cases (11 test cases, 5-item pages)
    - Unit tests (EntityTimeSeriesRepositoryPaginationTest and SearchAggregationTest) validate aggregation node building and bucket_sort/stats_bucket creation

This will update automatically on new commits.
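To make the dual-aggregation strategy described in the summary concrete, here is a rough sketch using the Elasticsearch Java aggregation builders. This is an illustrative reconstruction rather than the PR's actual code: the group-by field, the aggregation names, and the fixed terms size are assumptions.

    import java.util.List;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.PipelineAggregatorBuilders;
    import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.sort.FieldSortBuilder;
    import org.elasticsearch.search.sort.SortOrder;

    class PaginatedAggregationSketch {
      // Two parallel trees: "byTerms" is sliced to the requested page with bucket_sort,
      // while "byTermsCount" feeds a sibling stats_bucket whose bucket count gives the total.
      static SearchSourceBuilder build(String groupByField, int offset, int limit) {
        TermsAggregationBuilder byTerms =
            AggregationBuilders.terms("byTerms")
                .field(groupByField)
                .size(10_000) // must hold all candidate buckets before slicing
                .subAggregation(AggregationBuilders.topHits("latest").size(1))
                .subAggregation(
                    PipelineAggregatorBuilders.bucketSort(
                            "pagination",
                            List.of(new FieldSortBuilder("_key").order(SortOrder.ASC)))
                        .from(offset)
                        .size(limit));

        TermsAggregationBuilder byTermsCount =
            AggregationBuilders.terms("byTermsCount").field(groupByField).size(10_000);

        return new SearchSourceBuilder()
            .size(0)
            .aggregation(byTerms)
            .aggregation(byTermsCount)
            // The "count" of this stats_bucket equals the number of byTermsCount buckets,
            // which is what the pagination metadata can report as the total.
            .aggregation(
                PipelineAggregatorBuilders.statsBucket("totalCount", "byTermsCount>_count"));
      }
    }

On the response side, the page would then be read from the byTerms buckets and the total from the count field of the totalCount aggregation.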

TeddyCr previously approved these changes Feb 17, 2026
@github-actions
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Comment on lines 159 to 161
    aggregationsMap.removeIf(
        aggregationMap -> aggregationMap.get("aggType").contains("bucket_selector"));
    for (int j = 0; j < aggregationsMap.size(); j++) {

💡 Bug: bucket_sort not excluded from getAggregationMetadata()

The getAggregationMetadata() method (line 159-161) removes bucket_selector from metadata but does not remove bucket_sort. Since bucket_sort is also a pipeline aggregation (not a dimension or metric), it should be excluded just like bucket_selector.

When bucket_sort appears as a leaf node in metadata, it will:

  1. Add null to the metrics list (line 171, since it has no field key)
  2. Add bucket_sort#pagination to the keys list (line 184)

This could corrupt report metadata if the aggregation tree is processed through the generic aggregation path. Currently this doesn't affect listLatestFromSearch (which doesn't use getAggregationMetadata), but it's a latent bug if the tree structure is reused elsewhere.

Suggested fix:

        aggregationsMap.removeIf(
            aggregationMap -> aggregationMap.get("aggType").contains("bucket_selector")
                || aggregationMap.get("aggType").contains("bucket_sort"));


Comment on lines 158 to 160
    // remove bucket_selector from metadata as it is a filter and neither a dimension nor a metric
    aggregationsMap.removeIf(
        aggregationMap -> aggregationMap.get("aggType").contains("bucket_selector"));

💡 Edge Case: stats_bucket also not excluded from getAggregationMetadata()

The existing finding notes that bucket_sort isn't excluded from getAggregationMetadata(). The same issue applies to stats_bucket, another pipeline aggregation added in this PR.

At line 159, only bucket_selector is removed:

aggregationsMap.removeIf(
    aggregationMap -> aggregationMap.get("aggType").contains("bucket_selector"));

If the pagination aggregation tree is ever passed through the data quality report path, both bucket_sort and stats_bucket nodes would leak into the metadata, potentially adding null to the metrics list (since neither has a "field" key).

While this isn't currently triggered (the pagination aggregation doesn't go through getAggregationMetadata()), it's a defensive fix worth making alongside the existing finding. Consider generalizing the exclusion to all pipeline aggregations.

Suggested fix:

        // remove pipeline aggregations from metadata as they are neither a dimension nor a metric
        aggregationsMap.removeIf(
            aggregationMap -> {
              String aggType = aggregationMap.get("aggType");
              return aggType.contains("bucket_selector")
                  || aggType.contains("bucket_sort")
                  || aggType.contains("stats_bucket");
            });


@gitar-bot

gitar-bot Bot commented Feb 19, 2026

🔍 CI failure analysis for 6c8f33d: maven-collate-ci failed again (5th consecutive occurrence) due to external Collate workflow failure. Cascading from flaky tests. Pagination code remains correct.

Issue

maven-collate-ci job failed - 5th consecutive occurrence of this cascading failure pattern from external Collate workflow.

Root Cause

Cascading failure from external Collate workflow encountering flaky integration tests.

Failed Job (Current Run - Commit 6c8f33d)

maven-collate-ci (64137552240): External workflow failure

Details

Job Flow:

  1. ✅ Verified PR labels
  2. ✅ Triggered Collate workflow (commit 6c8f33d)
  3. ⏳ Waited 27 minutes 43 seconds
  4. ❌ Workflow failed: conclusion=failure
  5. ❌ Job failed: ##[error]Workflow run has failed

Pattern Analysis

maven-collate-ci failures - all 5 consecutive occurrences:

  1. Commit 774bad0 (64042523505)
  2. Commit c14c6f6 (64120353941)
  3. Commit 6c8f33d (64123966800)
  4. Commit 6c8f33d (64127470908)
  5. Commit 6c8f33d (64137552240, this run)

Conclusion: Persistent pattern of external Collate workflow failures across all CI runs for this PR.

Why This Is Unrelated to the PR

  1. Wrapper job triggering external workflow
  2. Collate workflow runs same test suite in separate environment
  3. Encountering same flaky infrastructure issues
  4. PR modifies only pagination logic
  5. IncidentPaginationIT continues to pass in direct CI runs

Conclusion

Fifth consecutive maven-collate-ci cascading failure, caused by environmental flakiness in the external Collate workflow. The pagination functionality is working correctly and the PR is ready from a code perspective.

Code Review: 👍 Approved with suggestions (4 resolved / 6 findings)

Well-implemented pagination feature with solid test coverage. The dual-aggregation strategy for accurate total counts is sound. One prior finding about pipeline aggregation exclusion in metadata remains unresolved, and stats_bucket has the same issue.

💡 Edge Case: stats_bucket also not excluded from getAggregationMetadata()

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchAggregation.java:158-160

The existing finding notes that bucket_sort isn't excluded from getAggregationMetadata(). The same issue applies to stats_bucket, another pipeline aggregation added in this PR.

At line 159, only bucket_selector is removed:

aggregationsMap.removeIf(
    aggregationMap -> aggregationMap.get("aggType").contains("bucket_selector"));

If the pagination aggregation tree is ever passed through the data quality report path, both bucket_sort and stats_bucket nodes would leak into the metadata, potentially adding null to the metrics list (since neither has a "field" key).

While this isn't currently triggered (the pagination aggregation doesn't go through getAggregationMetadata()), it's a defensive fix worth making alongside the existing finding. Consider generalizing the exclusion to all pipeline aggregations.

Suggested fix
        // remove pipeline aggregations from metadata as they are neither a dimension nor a metric
        aggregationsMap.removeIf(
            aggregationMap -> {
              String aggType = aggregationMap.get("aggType");
              return aggType.contains("bucket_selector")
                  || aggType.contains("bucket_sort")
                  || aggType.contains("stats_bucket");
            });
💡 Bug: bucket_sort not excluded from getAggregationMetadata()

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchAggregation.java:159-161

The getAggregationMetadata() method (line 159-161) removes bucket_selector from metadata but does not remove bucket_sort. Since bucket_sort is also a pipeline aggregation (not a dimension or metric), it should be excluded just like bucket_selector.

When bucket_sort appears as a leaf node in metadata, it will:

  1. Add null to the metrics list (line 171, since it has no field key)
  2. Add bucket_sort#pagination to the keys list (line 184)

This could corrupt report metadata if the aggregation tree is processed through the generic aggregation path. Currently this doesn't affect listLatestFromSearch (which doesn't use getAggregationMetadata), but it's a latent bug if the tree structure is reused elsewhere.

Suggested fix
        aggregationsMap.removeIf(
            aggregationMap -> aggregationMap.get("aggType").contains("bucket_selector")
                || aggregationMap.get("aggType").contains("bucket_sort"));
✅ 4 resolved
Bug: Cardinality total count ignores bucket_selector filtering

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesRepository.java:511
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesRepository.java:441-452
The cardinality aggregation counts all unique values of the groupBy field across the entire index (or query scope), but the bucket_selector pipeline aggregation filters out buckets that don't match content filters (i.e., where the latest timestamp doesn't match the matching timestamp). This means the cardinality-reported total count will be higher than the actual number of valid buckets that pass the bucket_selector filter.

For example, if there are 200 unique test cases (cardinality = 200), but only 150 match the content filters, the API will report total: 200 while only 150 results are actually pageable. This causes the UI to show more pages than actually exist, with later pages returning empty results.

The cardinality aggregation doesn't account for the bucket_selector filtering logic. To get an accurate count, you'd need a different approach — for example, running a separate query with the content filters applied, or using the terms aggregation with a large enough size to get all buckets and counting the results post-filter.
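
For illustration, here is one way the second approach could look: count the buckets that survive the selector rather than the raw cardinality. This is a hedged sketch, not the PR's code; the field names, aggregation names, and the stand-in selector script are assumptions.

    import java.util.Map;
    import org.elasticsearch.script.Script;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.PipelineAggregatorBuilders;
    import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    class FilteredTotalCountSketch {
      static SearchSourceBuilder build(String groupByField) {
        TermsAggregationBuilder byTermsCount =
            AggregationBuilders.terms("byTermsCount")
                .field(groupByField)
                .size(10_000)
                .subAggregation(AggregationBuilders.max("latestTimestamp").field("timestamp"))
                .subAggregation(
                    PipelineAggregatorBuilders.bucketSelector(
                        "contentFilter",
                        Map.of("latest", "latestTimestamp"),
                        // Stand-in for the real content-filter script used in the results tree.
                        new Script("params.latest > 0")));

        return new SearchSourceBuilder()
            .size(0)
            .aggregation(byTermsCount)
            // bucket_selector prunes non-matching buckets before the sibling pipeline runs,
            // so this stats_bucket's "count" is the number of buckets that actually match,
            // unlike a cardinality aggregation, which ignores the selector.
            .aggregation(
                PipelineAggregatorBuilders.statsBucket(
                    "totalCount", "byTermsCount>latestTimestamp"));
      }
    }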

Bug: Terms aggregation size=100 silently truncates bucket_sort pagination

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesRepository.java:467
📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchAggregation.java:38
The terms aggregation created in buildComplexAggregation() uses SearchAggregation.terms("byTerms", groupBy) which hardcodes size: 100 (see SearchAggregation.java:38). The bucket_sort pipeline aggregation operates on the buckets already returned by the parent terms aggregation, so it can only paginate within those 100 buckets.

If a user has more than 100 unique groups (e.g., 500 test cases) and requests offset=50, limit=100, the terms aggregation returns only 100 buckets, the bucket_selector filters some out, and then bucket_sort tries to skip 50 and return 100 — but there aren't enough buckets. Meanwhile, the cardinality aggregation correctly reports the true total (500), creating a mismatch between reported total and available pages.

The terms aggregation size needs to be increased to accommodate pagination. At minimum, it should be Math.min(offset + limit, MAX_AGGREGATE_SIZE) when pagination parameters are provided, or set to MAX_AGGREGATE_SIZE to ensure all buckets are available for the bucket_selector filter and subsequent bucket_sort.
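
A minimal sketch of that sizing rule, assuming pagination parameters arrive as nullable integers (the helper and its name are hypothetical; MAX_AGGREGATE_SIZE is the existing cap referenced above):

    class TermsSizeSketch {
      // Choose the terms-aggregation size so bucket_sort has enough buckets to slice.
      static int termsSizeFor(Integer offset, Integer limit, int maxAggregateSize) {
        if (offset == null || limit == null) {
          // No pagination requested: keep the current behaviour.
          return maxAggregateSize;
        }
        // bucket_sort can only slice buckets the terms aggregation returned, so the size
        // must cover at least offset + limit. When a bucket_selector also drops buckets,
        // returning maxAggregateSize unconditionally is the safer choice.
        return Math.min(offset + limit, maxAggregateSize);
      }
    }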

Quality: Swallowed exception hides failures in cardinality extraction

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesRepository.java:450
The catch block at lines 450-452 silently swallows all exceptions during cardinality extraction. When pagination is in effect, falling back to entityList.size() will return the size of the current page rather than the true total, which would break pagination metadata. At minimum, the exception should be logged at WARN level so failures are observable in production.

} catch (Exception e) {
  LOG.warn("Failed to extract cardinality from aggregation response, falling back to entity list size", e);
}
Quality: precision_threshold in cardinality builder is never applied

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchAggregation.java:134
The precision_threshold parameter is set in SearchAggregation.cardinality() at line 134 (value.put("precision_threshold", "3000")), but the existing ElasticCardinalityAggregations and OpenCardinalityAggregations classes do not read or apply this parameter when building the aggregation — they only read params.get("field"). This means the precision threshold setting is dead code that may mislead maintainers into thinking it's being applied. Either remove it from the builder, or update the cardinality aggregation implementations to use it.
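
If the parameter is kept, here is a minimal sketch of how a cardinality builder could honour it, assuming a params map shaped like the one SearchAggregation.cardinality() produces (the class and method names are illustrative, not the existing ElasticCardinalityAggregations API):

    import java.util.Map;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.metrics.CardinalityAggregationBuilder;

    class CardinalitySketch {
      static CardinalityAggregationBuilder build(String name, Map<String, String> params) {
        CardinalityAggregationBuilder builder =
            AggregationBuilders.cardinality(name).field(params.get("field"));
        // Read precision_threshold if present; today the builder sets it but nothing reads it,
        // which is the dead code flagged above.
        String precision = params.get("precision_threshold");
        if (precision != null) {
          builder.precisionThreshold(Long.parseLong(precision));
        }
        return builder;
      }
    }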


@sonarqubecloud

Labels

Ingestion
safe to test (Add this label to run secure Github workflows on PRs)


4 participants