ISSUE #2032-CLT: Entity History Endpoint #25410

TeddyCr · 2026-01-20T15:18:27Z

Describe your changes:

Summary

Add new /history endpoint to list all entity versions (current + historical) within a time range with cursor-based pagination
Implement Guava LoadingCache for count queries with 1-minute TTL to reduce database load
Add database migrations for updatedAt generated column and composite index on entity_extension table

Description

This PR introduces a new API endpoint that allows clients to retrieve all entity versions (both current entities and their historical versions from
entity_extension) within a specified timestamp range. This is useful for auditing, change tracking, and data synchronization use cases.

Key Changes

New Endpoint: GET /v1/{entityType}/history

Query parameters: startTs, endTs, limit, before, after
Returns paginated ResultList with cursor-based navigation
Implemented once in base EntityResource.java, automatically available for all entity types

Repository Layer (EntityRepository.java):

listEntityHistoryByTimestamp() - Main method handling pagination and cursor logic
getVersionCountCached() - Cached count retrieval using Guava LoadingCache
decodeAndValidateCursor() - Cursor validation ensuring proper format (Long:UUID)

DAO Layer (CollectionDAO.java):

getEntityHistoryByTimestampRange() - UNION ALL query combining entity_extension and main entity table
getEntityHistoryByTimestampRangeCount() - Optimized count query for pagination totals

Database Migrations:

MySQL: Add VIRTUAL generated column updatedAt extracted from JSON
PostgreSQL: Add STORED generated column updatedAt extracted from JSON
Both: Add composite index idx_entity_extension_updated_at_id for efficient time-range queries

Caching Strategy

Count queries are cached using Guava LoadingCache with:

Max size: 500 entries
TTL: 1 minute
Cache key: tableName:startTs:endTs:extensionPrefix

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

New feature

The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion
or decision-making process is reflected in the issue.
I have updated the documentation.
I have added tests around the new logic.

…for a specific entity type

…2032/clt

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java

openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java

harshach · 2026-01-20T21:40:09Z

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java

+  private static final LoadingCache<String, Integer> COUNT_CACHE =
+      CacheBuilder.newBuilder()
+          .maximumSize(500)
+          .expireAfterWrite(1, TimeUnit.MINUTES)


.expireAfterAccess(5, TimeUnit.MINUTES) lets use this instead 1 min expiry. The data is minimal we can keep it slightly longer

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java

gitar-bot · 2026-01-20T22:06:22Z

🔍 CI failure analysis for 24a5a8e: Five CI job failures (92 total tests): (1) Python lineage - Elasticsearch infrastructure failure, (2-3) Playwright - timing/timeout issues with 13 flaky tests, (4-5) Maven postgresql & sonarcloud - IDENTICAL 46 bulk loading field fetcher failures proving systematic pre-existing bug. All unrelated to PR #25410's entity history endpoint.

Issue

Five separate CI job failures occurred:

1. py-run-tests (3.11) - 10 Test Failures

The Python ingestion tests failed with lineage-related test failures where lineage queries return empty source tables.

Root Cause: Elasticsearch infrastructure failure in the test environment.

Evidence:

All 10 failures show identical symptom: Expected sources: {'SOURCE_TABLE_2', 'BASE_TABLE', ...} but Actual sources: set()
Repeated Elasticsearch errors: Search failed due to [es/search] failed: [search_phase_execution_exception] all shards failed
Additional infrastructure issues: "Failed to get access token" (6+ times), "ModuleNotFoundError: No module named 'cachetools'"

2. playwright-ci-postgresql (4, 6) - 2 Test Failures + 8 Flaky Tests

Playwright E2E tests had failures in domain rename tests.

Failures:

Domain Rename Comprehensive Tests › Rename domain with assets (tables, topics, dashboards) preserves associations
Domain Rename Comprehensive Tests › Multiple consecutive domain renames preserve all associations

Error Pattern: Both tests expected 3 assets in domain count but received 2.

Root Cause: Test flakiness/timing issue where domain rename operations complete asynchronously.

Additional Context: 8 other tests were flaky but passed on retry.

3. playwright-ci-postgresql (2, 6) - 2 Test Failures + 5 Flaky Tests

Playwright E2E tests had browser crash failures in Test Case Import/Export tests.

Failures:

Test Case Import/Export/Edit - End-to-End Flow with Admin › Admin: Complete export-import-validate flow
Test Case Import/Export/Edit - End-to-End Flow with EditAll User on TestCase resource › EditAll User: Complete export-import-validate flow

Error Pattern: Browser crash after 3-minute timeout: Error: page.waitForSelector: Target page, context or browser has been closed

Root Cause: Browser/page crash during long-running operations.

Additional Context: 5 other tests were flaky but passed on retry.

4. maven-postgresql-ci - 46 Test Failures

Java integration tests failed with bulk loading efficiency issues.

Failure Pattern:

test_bulkLoadingEfficiency - 30 failures across different entity types
test_fieldFetchersEfficiency - 15 failures across different entity types
testChartWithDashboards - 1 failure

Error Pattern: All failures show: Owners should be loaded consistently in bulk operations ==> expected: <1> but was: <0>

Root Cause: Bulk entity listing not populating the owners field when using fields query parameter, while individual entity fetching works correctly.

5. maven-sonarcloud-ci - 46 Test Failures (DUPLICATE)

Java integration tests with SonarCloud analysis - IDENTICAL failures to maven-postgresql-ci.

Failure Pattern: Exactly the same 46 test failures:

Same test names: test_bulkLoadingEfficiency, test_fieldFetchersEfficiency
Same affected resources: WorkflowResource, DashboardServiceResource, PipelineServiceResource, LLMServiceResource, IngestionPipelineResource, StorageServiceResource, DatabaseServiceResource, APIServiceResource, MetadataServiceResource, SecurityServiceResource, MlModelServiceResource, MessagingServiceResource, DriveServiceResource, SearchServiceResource, ChartResource, StoredProcedureResource, SearchIndexResource
Same error message: Owners should be loaded consistently in bulk operations ==> expected: <1> but was: <0>
Same locations: Lines 887 and 1236 in EntityResourceTest.java

Analysis: This is the same CI job as maven-postgresql-ci but with SonarCloud static analysis added. Both run the full test suite, and both encounter the identical bulk loading regression. This confirms the issue is NOT database-specific but affects the test environment uniformly.

Details

Why all five failures are unrelated to PR #25410:

PR Scope: This PR adds a /history endpoint for querying entity versions within a timestamp range:
- Database migrations for updatedAt column and composite indexes
- New REST endpoint GET /v1/{entityType}/history in EntityResource.java base class
- Repository method listEntityHistoryByTimestamp() for timestamp-based queries
- DAO methods using UNION of entity_extension and main entity tables
- Guava LoadingCache for count queries (5-minute TTL, 500 max entries)
- REMOVES obsolete /versions endpoints from SearchIndex, SearchService, Glossary, and Tag resources
- Method rename: listAllVersionsByTimestamp → listEntityHistoryByTimestamp
No Bulk Loading Code Modified: The PR does not touch:
- Entity listing endpoints or bulk loading logic
- Field fetcher implementation or field population code
- listEntities() or listInternal() methods
- Owner loading or relationship fetching code
- Any code path used by the fields query parameter
Changes Are Isolated: All PR changes are isolated to:
- New /history endpoint (completely new code path)
- Internal repository method for timestamp-based version queries
- DAO layer for querying entity_extension table by timestamp
- Count query caching (only affects /history endpoint)
- Minor refactoring: .get(0) → .getFirst() and .get(size-1) → .getLast()
No Lineage/Elasticsearch Code Modified: The PR does not touch lineage or Elasticsearch code.
No Domain/UI Code Modified: The PR does not touch domain rename logic or UI components.
No Test Case Import/Export Code Modified: The PR does not touch data quality or import/export functionality.
Duplicate Failure Analysis: The fact that both maven-postgresql-ci AND maven-sonarcloud-ci have IDENTICAL failures (same 46 tests, same error messages) proves:
- This is NOT a database-specific issue (not PostgreSQL-specific)
- This is NOT a SonarCloud-specific issue
- This is a systematic test environment issue affecting all Maven CI jobs
- The bulk loading field fetching bug is consistent and reproducible across CI runs
- The issue is NOT introduced by this PR (which doesn't touch bulk loading code)

Why Maven failures are pre-existing/environmental:

The PR makes NO changes to bulk loading, field fetching, or owner population code
All 46 failures follow the exact same pattern across unrelated entity types
The new /history endpoint uses a completely different code path
The existing listEntities() method (used by failing tests) is untouched
The failures occur uniformly across BOTH maven-postgresql-ci and maven-sonarcloud-ci
Possible causes: recent main branch regression, test ordering dependency, or CI environment configuration issue

Root Cause Summary

All five CI job failures (92 total test failures) stem from test environment issues and pre-existing bugs, not from code changes in this PR:

Elasticsearch cluster failure - infrastructure issue
Domain rename timing issues - pre-existing test flakiness
Browser crashes on long-running tests - resource/timeout constraints
Bulk loading field fetching regression - pre-existing bug affecting ALL Maven CI jobs uniformly (not introduced by this PR)
Duplicate of Create OpenMetadata website and required content #4 - confirms the bulk loading issue is systematic, not random

Recommended Actions

For Python tests: Retry the job to confirm Elasticsearch infrastructure issue is transient.

For Playwright tests (domain rename): Improve test reliability by adding explicit waits and retry logic.

For Playwright tests (import/export): Increase timeout and add intermediate checkpoints.

For Maven tests (bulk loading - affects both postgresql and sonarcloud jobs): The solution is to:

Investigate recent changes to entity listing or field fetching code in the main branch
Check if there was a recent merge that introduced this bulk loading regression
Verify that the fields query parameter is properly propagated to field fetchers
Review the entity listing implementation to ensure owners field is included in bulk queries
Consider if CI environment configuration changed recently

These issues should be tracked separately. None are blockers for this PR as all failures are unrelated to the entity history endpoint implementation. The PR's changes are isolated to a new code path (/history endpoint) that doesn't intersect with any of the failing test code paths.

Code Review 👍 Approved with suggestions 7 resolved / 10 findings

Well-implemented history endpoint with minor documentation inconsistency. The UNION semantics issue from previous review has been fixed.

⚠️

Edge Case: Cursor validation allows any UUID, not just existing entity IDs

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:275

The decodeAndValidateCursor method validates that the cursor contains a valid UUID format but doesn't verify the UUID actually corresponds to an existing entity:

private CursorInfo decodeAndValidateCursor(String cursor) {
  // ...
  try {
    UUID.fromString(parts[1]);
  } catch (IllegalArgumentException e) {
    throw new IllegalArgumentException("Invalid UUID in cursor");
  }
  return new CursorInfo(Long.parseLong(parts[0]), parts[1]);
}

A malicious user could craft a cursor with a valid UUID that doesn't exist, potentially causing unexpected behavior in the pagination logic. While this may not be a critical security issue, it could lead to empty result sets or confusing pagination behavior.

Recommendation: Consider adding a check that the cursor ID exists in the database, or document that invalid cursors may produce empty/unexpected results as acceptable behavior.

⚠️

Bug: @max validation message contradicts actual max value

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:1628-1631 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:1095

The @Max annotation on line 1630-1631 has a value of 500 but the error message says "must be less than or equal to 1000000". This misleading validation message could confuse API consumers when they exceed the limit.

Current code:

@Max(value = 500, message = "must be less than or equal to 1000000")

Suggested fix:

@Max(value = 500, message = "must be less than or equal to 500")

Or, if 1000000 is the intended limit, update the value accordingly.

More details 💡 1 suggestion ✅ 7 resolved

💡 Bug: Validation description contradicts @max constraint

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:1095

The @Parameter description says "Limit the number of entity returned (1 to 1000000, default = 10)" but the actual @Max constraint is set to 500. This inconsistency between documentation and implementation will confuse API users.

Impact: API users may expect to be able to pass values up to 1000000 based on the documentation, only to receive validation errors.

Suggested fix: Update the description to match the actual constraint:

@Parameter(description = "Limit the number of entities returned (1 to 500, default = 10)")

✅ Bug: MySQL and PostgreSQL queries use different UNION semantics

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1356 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1333 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1382 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1368
The getEntityHistoryByTimestampRange method uses UNION ALL for MySQL (line 1347) but UNION for PostgreSQL (line 1362). This creates inconsistent behavior:

MySQL (UNION ALL): Returns all rows including duplicates

PostgreSQL (UNION): Removes duplicate rows

This means the same API call could return different results depending on the database backend, which is unexpected behavior for users.

Impact:

If an entity has the same ID, updatedAt, and json in both tables (edge case), PostgreSQL will return one row while MySQL returns two.

This inconsistency makes the API behavior unpredictable across deployments.

Suggested fix: Use UNION ALL consistently for both database types, or use UNION for both if duplicate elimination is desired. Based on the PR description mentioning "current + historical versions", UNION ALL seems more appropriate to ensure all versions are returned.

✅ Bug: PostgreSQL query missing UNION keyword between subqueries

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1351
The PostgreSQL version of the getEntityHistoryByTimestampRange query is missing the UNION keyword between the entity_extension subquery and the main table subquery. This will cause a SQL syntax error when running on PostgreSQL.

Looking at the MySQL query:
"SELECT id, updatedAt, json FROM entity_extension ... "
+ "UNION "  // ← UNION is present
+ "SELECT id, updatedAt, json FROM <table> ..."
But the PostgreSQL query shows:
"AND jsonSchema = :entityType "
+ "SELECT id, updatedAt, json::jsonb FROM <table> ..."  // ← Missing UNION
The same issue exists in the count query for PostgreSQL.

Fix: Add + "UNION " before the second SELECT in both PostgreSQL queries.

✅ Performance: Count query includes UNION ALL but may be slow on large tables

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1099
The count query getEntityHistoryByTimestampRangeCount performs a COUNT on the UNION of both tables:
SELECT COUNT(*) FROM (
  SELECT id FROM <tableName> WHERE ... extension LIKE :extensionPrefix
  UNION ALL
  SELECT id FROM :tableName WHERE ...
) AS combined
For large datasets with many historical versions, this count query could be expensive, even with the caching strategy. The LIKE prefix query on the extension column may not use an index efficiently.

Recommendation: Consider:

Adding an index on the extension column if not already present

For very large datasets, consider approximate counts or removing total count from pagination (returning only has-more indicators)

✅ Bug: UNION ALL query may return duplicates between extension and main table

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1076
The getEntityHistoryByTimestampRange SQL query uses UNION ALL to combine results from entity_extension and the main entity table:
SELECT json FROM <tableName>
WHERE updatedAt >= :startTs AND updatedAt <= :endTs
  AND extension LIKE :extensionPrefix
  <cursorCondition>
ORDER BY updatedAt DESC, id DESC
LIMIT :limit
) AS combined
UNION ALL
(SELECT json FROM :tableName
WHERE updatedAt >= :startTs AND updatedAt <= :endTs
  <cursorCondition>
ORDER BY updatedAt DESC, id DESC
LIMIT :limit
)
If the same entity exists in both tables (current version in main table, and a historical version with the same updatedAt in extension table), they would both be returned. While this might be intentional (to show all versions), the UNION ALL doesn't deduplicate, and the final ORDER BY and LIMIT applied to the outer query may produce inconsistent results across pages.

Additionally, the outer ORDER BY updatedAt DESC, id DESC LIMIT :limit is applied after the UNION ALL, but each subquery already applies its own LIMIT, which could cause pagination issues when the result set spans both tables.

Recommendation: Clarify if duplicates are expected. If not, consider using UNION instead of UNION ALL, or restructure to handle this case explicitly.

✅ Bug: getResourcePath() returns pluralized path that may be incorrect

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/BaseEntityIT.java:116
The getResourcePath() method in BaseEntityIT.java appends "s" to pluralize the entity type:
protected String getResourcePath() {
  return "/v1/" + getEntityType() + "s/";
}
This simple pluralization may not work correctly for all entity types. For example:

policy → /v1/policys/ (incorrect, should be /v1/policies/)

glossary → /v1/glossarys/ (incorrect, should be /v1/glossaries/)

This could cause test failures for entities with irregular pluralization.

Recommendation: Override getResourcePath() in subclasses with irregular pluralization, or use a proper pluralization utility.

...and 2 more from earlier reviews

What Works Well

Clean implementation of cursor-based pagination with proper cache strategy for count queries. Database migrations handle both MySQL and PostgreSQL with appropriate generated columns and indexes. Comprehensive test coverage across multiple entity types.

Recommendations

The cursor validation intentionally only validates format (Long:UUID) without checking entity existence, which is standard for cursor pagination. Consider documenting this behavior for API consumers.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off Gitar will not commit updates to this branch.
Display: compact Hiding non-applicable rules.

Comment with these commands to change:

`Auto-apply`	`Compact`
`gitar auto-apply:on`	`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | This comment will update automatically (Docs)}

openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java

sonarqubecloud · 2026-01-21T00:24:42Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

TeddyCr added 9 commits January 20, 2026 11:51

feat: added repository logic to list all versions (including latest) …

6e13f33

…for a specific entity type

Merge commit '8b20f6c23aabcfd6099af8120d1802fa709701b7' into feature/…

dd4c519

…2032/clt

feat: added list all versions for all the entity resources

1a3fba8

Merge commit 'b0335b63a7438f39bbf3c162bf44171330915ba9' into feature/…

42264b7

…2032/clt

feat: moved endpoint to EntityResource

c29e53e

Merge remote-tracking branch 'origin/main' into feature/2032/clt

12e01c3

feat: renamed endpoint to /history and methods to EntityHistory

781f8be

Merge remote-tracking branch 'origin/main' into ISSUE-2032-CLT

29d5238

feat: ran java linting

6ee8a6f

TeddyCr requested a review from a team as a code owner January 20, 2026 15:18

TeddyCr had a problem deploying to test January 20, 2026 15:18 — with GitHub Actions Error

github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Jan 20, 2026

feat: remove v1 implementation left over code

df851c0

TeddyCr had a problem deploying to test January 20, 2026 15:22 — with GitHub Actions Failure

TeddyCr temporarily deployed to test January 20, 2026 15:22 — with GitHub Actions Inactive

feat: fix failing tests

42e1b69

TeddyCr had a problem deploying to test January 20, 2026 17:08 — with GitHub Actions Error

TeddyCr had a problem deploying to test January 20, 2026 21:32 — with GitHub Actions Error

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Show resolved Hide resolved

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Show resolved Hide resolved

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Show resolved Hide resolved

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java Show resolved Hide resolved

harshach reviewed Jan 20, 2026

View reviewed changes

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Show resolved Hide resolved

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Show resolved Hide resolved

feat: increase cache duration

bae4c85

TeddyCr had a problem deploying to test January 20, 2026 21:45 — with GitHub Actions Error

feat: address query edge cases

24a5a8e

TeddyCr temporarily deployed to test January 20, 2026 22:06 — with GitHub Actions Inactive

TeddyCr had a problem deploying to test January 20, 2026 22:06 — with GitHub Actions Failure

harshach approved these changes Jan 20, 2026

View reviewed changes

gitar-bot bot reviewed Jan 20, 2026

View reviewed changes

openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java Show resolved Hide resolved

TeddyCr merged commit 83143d5 into main Jan 21, 2026
26 of 35 checks passed

TeddyCr deleted the ISSUE-2032-CLT branch January 21, 2026 05:52

gitar-bot bot mentioned this pull request Jan 21, 2026

FIX #24374 - Data Contract at Data Product level #25314

Open

9 tasks

ISSUE #2032-CLT: Entity History Endpoint #25410

ISSUE #2032-CLT: Entity History Endpoint #25410

Uh oh!

Conversation

TeddyCr commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Type of change:

Checklist:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harshach Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gitar-bot bot commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

1. py-run-tests (3.11) - 10 Test Failures

2. playwright-ci-postgresql (4, 6) - 2 Test Failures + 8 Flaky Tests

3. playwright-ci-postgresql (2, 6) - 2 Test Failures + 5 Flaky Tests

4. maven-postgresql-ci - 46 Test Failures

5. maven-sonarcloud-ci - 46 Test Failures (DUPLICATE)

Details

Root Cause Summary

Recommended Actions

What Works Well

Recommendations

Uh oh!

Uh oh!

sonarqubecloud bot commented Jan 21, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TeddyCr commented Jan 20, 2026 •

edited

Loading

gitar-bot bot commented Jan 20, 2026 •

edited

Loading