Lineage Improvements #24919
1. Add path preservation 2. Add Column Filter
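One way to picture how a column filter and path preservation could combine is sketched below. It assumes that "preserving paths" means keeping every node that lies on some path from the queried root to a node matching the filter. That reading, the class name `PathPreservingFilterSketch`, and the method `nodesToKeep` are all hypothetical and are not taken from the PR's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the PR's LineagePathPreserver.
final class PathPreservingFilterSketch {

    // Returns the nodes on at least one root -> matched-node path.
    // `downstream` maps a node to the nodes it feeds; `matched` holds the
    // nodes that passed the column filter (how they matched is out of scope).
    static Set<String> nodesToKeep(Map<String, List<String>> downstream,
                                   String root,
                                   Set<String> matched) {
        // 1. Everything reachable from the root.
        Set<String> reachable = bfs(downstream, List.of(root));

        // 2. Reverse the edges so we can walk back from matched nodes.
        Map<String, List<String>> upstream = new HashMap<>();
        downstream.forEach((from, targets) -> {
            for (String to : targets) {
                upstream.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
            }
        });

        // 3. Walk upstream from the matched nodes that are reachable at all.
        List<String> seeds = new ArrayList<>();
        for (String node : matched) {
            if (reachable.contains(node)) {
                seeds.add(node);
            }
        }
        Set<String> canReachMatch = bfs(upstream, seeds);

        // 4. Keep only nodes that are both reachable and lead to a match.
        canReachMatch.retainAll(reachable);
        return canReachMatch;
    }

    // Plain breadth-first search; the `seen` set makes it safe on cycles.
    private static Set<String> bfs(Map<String, List<String>> edges,
                                   Collection<String> starts) {
        Set<String> seen = new LinkedHashSet<>(starts);
        Deque<String> queue = new ArrayDeque<>(starts);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : edges.getOrDefault(node, List.of())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }
}
```

With edges a -> b -> c and a -> d -> e, root a, and c as the only match, the sketch keeps a, b, and c and drops the d/e branch. Intermediate node b survives even though it does not match the filter itself, which is the point of preserving the path.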
TypeScript types have been updated based on the JSON schema changes in the PR
🔍 CI failure analysis for c6c3863: Maven build failed due to a 502 Bad Gateway error from the yarn registry while fetching an @emotion package; this is a transient infrastructure issue unrelated to PR code changes.

Issue
Maven build failure in

Root Cause
The yarn package registry (registry.yarnpkg.com) returned a 502 Bad Gateway error when trying to fetch:
This is a temporary infrastructure/network issue on the registry's side.

Details
Build Context:
Why This Is Unrelated to PR #24919:
Combined with Previous Python Test Failure:

Solution
The fix is to retry the CI job. This is a flaky infrastructure issue that should resolve once the yarn registry recovers from the temporary outage. The 502 Bad Gateway indicates the registry service was temporarily unavailable.

Code Review 👍 Approved with suggestions
Comprehensive lineage performance enhancement with column filtering. Well-architected strategy pattern with good test coverage. Three previous findings remain unresolved.

Resolved ✅ 3 resolved
Performance: GuavaLineageGraphCache weight calculation creates redundant lists
Edge Case: shouldCacheGraph returns false for zero nodes
Bug: Integer overflow possible in calculateGeometricProgression
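
For context on the overflow finding, here is a minimal sketch of an overflow-safe geometric sum. The real signature and purpose of calculateGeometricProgression are not shown in this conversation, so the class `GeometricSumSketch`, the method `saturatingGeometricSum`, and the choice to saturate at `Long.MAX_VALUE` are assumptions made for illustration only.

```java
// Hypothetical sketch, not the PR's calculateGeometricProgression.
final class GeometricSumSketch {

    // Sums first + first*ratio + first*ratio^2 + ... over `terms` terms.
    // Uses exact long arithmetic and returns Long.MAX_VALUE instead of
    // silently wrapping around when the result no longer fits.
    static long saturatingGeometricSum(long first, long ratio, int terms) {
        if (first < 0 || ratio < 0 || terms < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        long sum = 0;
        long term = first;
        try {
            for (int i = 0; i < terms; i++) {
                sum = Math.addExact(sum, term);
                // Only compute the next term if it will actually be used.
                if (i + 1 < terms) {
                    term = Math.multiplyExact(term, ratio);
                }
            }
        } catch (ArithmeticException overflow) {
            // Math.addExact / Math.multiplyExact throw on overflow.
            return Long.MAX_VALUE;
        }
        return sum;
    }
}
```

The same loop written with plain `int` multiplication would wrap to a negative or otherwise meaningless value once the running term passes `Integer.MAX_VALUE`, which is the class of bug the finding describes.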
What Works Well
Recommendations
Status of Previous Findings
* Lineage Improvements
  1. Add path preservation
  2. Add Column Filter
* Remove impl doc
* Add Lineage Strategies for efficient loading of graphs
* Update getByEntityCount
* Update generated TypeScript types
* Add Builder
* Fix Build Issue
* Make NodOpProgressTracker have empty constructor

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>



Describe your changes:
Summary by Gitar
* Lineage strategies (SmallGraphStrategy, MediumGraphStrategy, LargeGraphStrategy, StreamingGraphStrategy) handle graphs from 1K to 500K+ nodes
* LineageStrategySelector dynamically chooses the optimal approach based on estimated graph size
* columnFilter and preservePaths API parameters in LineageResource enable filtering by column names, tags, or glossary terms
* ColumnFilterMatcher and LineagePathPreserver classes implement filtering logic while preserving complete lineage paths
* GuavaLineageGraphCache with LRU eviction and 5-minute TTL caches small/medium graphs (<50K nodes)
* LineageCacheKey provides composite cache keys based on query parameters
* AbstractLineageGraphBuilder base class eliminates 95% code duplication between Elasticsearch and OpenSearch implementations
* lineageSettings.json extended with graphPerformanceConfig defining thresholds (5K, 50K, 100K nodes) and batch sizes for each strategy
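
The size-based selection described in the summary could look roughly like the sketch below. Only the three threshold values (5K, 50K, 100K nodes) and the "<50K nodes" caching limit come from the summary. Treating the thresholds as exclusive upper bounds, and the names `StrategySelectorSketch`, `Strategy`, `select`, and `isCacheable`, are assumptions; the PR's LineageStrategySelector may be structured differently.

```java
// Hypothetical sketch, not the PR's LineageStrategySelector.
final class StrategySelectorSketch {

    enum Strategy { SMALL, MEDIUM, LARGE, STREAMING }

    // Threshold values quoted in the summary; using them as exclusive
    // upper bounds is an assumption of this sketch.
    static final long SMALL_MAX_NODES = 5_000;
    static final long MEDIUM_MAX_NODES = 50_000;
    static final long LARGE_MAX_NODES = 100_000;

    // Picks a strategy from the estimated node count of the lineage graph.
    static Strategy select(long estimatedNodes) {
        if (estimatedNodes < 0) {
            throw new IllegalArgumentException("estimatedNodes must be >= 0");
        }
        if (estimatedNodes < SMALL_MAX_NODES) {
            return Strategy.SMALL;
        }
        if (estimatedNodes < MEDIUM_MAX_NODES) {
            return Strategy.MEDIUM;
        }
        if (estimatedNodes < LARGE_MAX_NODES) {
            return Strategy.LARGE;
        }
        return Strategy.STREAMING;
    }

    // Per the summary, only small/medium graphs (<50K nodes) are cached.
    // An empty graph (zero nodes) counts as cacheable here.
    static boolean isCacheable(long estimatedNodes) {
        return estimatedNodes >= 0 && estimatedNodes < MEDIUM_MAX_NODES;
    }
}
```

Keeping the thresholds as named constants mirrors the idea of reading them from graphPerformanceConfig in lineageSettings.json, though how the PR wires that configuration in is not shown here.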