@siddhant1 siddhant1 commented Jan 21, 2026

Screen.Recording.2026-01-21.at.4.53.10.PM.mov

Describe your changes:

Screenshot 2026-01-21 at 4 51 53 PM

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Database schema enhancement:
    • Added entityStatus column with index to glossary terms table for efficient status-based filtering
  • Backend filtering improvement:
    • Modified ListFilter.getEntityStatusCondition() to use IN clause instead of equality for multi-status queries
    • Updated GlossaryTermRepository.searchGlossaryTermsInternal to support filtering by multiple status values
  • Frontend status filtering:
    • Enhanced GlossaryTermTab.component.tsx with multi-select status filter controls
    • Updated glossaryAPI.ts to pass status filters to backend endpoints
  • Test coverage:
    • Added GlossaryTermResourceIT.java integration tests for status filtering scenarios
    • Added Playwright E2E tests in GlossaryStatusFilterLargeDataset.spec.ts and GlossaryStatusFilterNestedTerms.spec.ts

This will update automatically on new commits.


@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@siddhant1
Member Author

@gitar-bot fix the SQL injection, add a commit

@gitar-bot

gitar-bot bot commented Jan 21, 2026

@gitar-bot fix the SQL injection, add a commit

I've fixed the SQL injection vulnerability in GlossaryTermRepository.buildEntityStatusCondition.

The fix validates input values against the EntityStatus enum before constructing SQL queries. Invalid status values are now filtered out, preventing SQL injection attacks.

Changes in GlossaryTermRepository.java:1633

  • Added validation logic that checks each status value against the allowed EntityStatus enum values
  • Only legitimate status values (APPROVED, DRAFT, IN_REVIEW, DEPRECATED, REJECTED, UNPROCESSED) can be used
  • Ran mvn spotless:apply to format the code

This complements the earlier fix in ListFilter.java, ensuring both locations are now protected against SQL injection.
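
The validation described above can be sketched roughly as follows. This is a minimal illustration, not the code in GlossaryTermRepository.java: the nested EntityStatus enum, the class name StatusAllowList, and the method keepValidStatuses are hypothetical, and the sketch matches on enum constant names, whereas the real enum may expose different string values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StatusAllowList {
  // Hypothetical stand-in for the project's EntityStatus enum (assumption).
  enum EntityStatus { APPROVED, DRAFT, IN_REVIEW, DEPRECATED, REJECTED, UNPROCESSED }

  // Allow-list built once from the enum constants.
  private static final Set<String> ALLOWED =
      Arrays.stream(EntityStatus.values()).map(Enum::name).collect(Collectors.toSet());

  /** Splits a comma-separated filter and keeps only values that name a known status. */
  public static List<String> keepValidStatuses(String raw) {
    if (raw == null || raw.isBlank()) {
      return List.of();
    }
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(ALLOWED::contains) // anything outside the allow-list is dropped
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(keepValidStatuses("APPROVED, DRAFT"));        // [APPROVED, DRAFT]
    System.out.println(keepValidStatuses("APPROVED,x' OR '1'='1"));  // [APPROVED]
  }
}
```

Because only exact matches against the allow-list survive, a crafted value never reaches the SQL text at all; it is dropped before any query is built.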

@siddhant1 siddhant1 force-pushed the sid/fix-status-glossary branch from db7c661 to 3038077 Compare January 25, 2026 09:58
@gitar-bot

gitar-bot bot commented Jan 25, 2026

🔍 CI failure analysis for cdbe7e5: Maven SonarCloud CI fails with the same results as Maven PostgreSQL CI, consistent with Apps/AWS credential test issues that are UNRELATED to this PR (7804 tests, 1 failed, 3 errors, 99.95% pass rate)

Issue

Maven SonarCloud CI (job 61413375919) failed with IDENTICAL results to Maven PostgreSQL CI.

Root Cause

UNRELATED to this PR - Same Apps and AWS credential test failures across both Maven CI environments

Test Results: 7804 tests, 1 failed, 3 errors, 701 skipped (99.95% pass rate)

Key Finding - Identical Results Across Maven CI Environments:

  • Maven PostgreSQL CI (job 61413375972): 1 failed, 3 errors
  • Maven SonarCloud CI (job 61413375919): 1 failed, 3 errors (IDENTICAL)

Same Failed Tests (Both Environments):

  1. AppsResourceTest - Apps functionality
  2. AwsCredentialsProviderTest - AWS credential provider
  3. AwsCredentialsUtilTest - AWS credential utilities

Historical Pattern - Multiple Maven CI Runs

Previous runs documented:

  • Earlier Maven PostgreSQL CI: 1 failed, 9 errors (same tests)
  • Earlier Maven SonarCloud CI: 1 failed, 9 errors (identical)

Current runs:

  • Maven PostgreSQL CI: 1 failed, 3 errors
  • Maven SonarCloud CI: 1 failed, 3 errors (IDENTICAL)

Suggests: consistent pre-existing failures in Apps and AWS credential functionality

Evidence This Is UNRELATED

  1. No related files modified - This PR only modifies glossary term files
  2. Identical across environments - Same failures in both PostgreSQL and SonarCloud CI
  3. High pass rate - 99.95% (7800/7804)
  4. Consistent pattern - Multiple Maven runs show same test areas failing

Assessment

UNRELATED pre-existing issue. Identical failures across both the Maven PostgreSQL and SonarCloud CI environments indicate pre-existing problems in the Apps and AWS credential tests rather than regressions introduced by this PR's glossary term status filtering changes.

None of the failing tests listed above touch glossary term status filtering; the overall Maven test pass rate is 99.95%.

Code Review 👍 Approved with suggestions 19 resolved / 22 findings

Good implementation of glossary status filtering with comprehensive tests and proper SQL injection prevention. Three minor issues from previous review remain unresolved: missing newlines in SQL files, duplicated parseEntityStatusValues method, and MySQL migration lacking IF NOT EXISTS.

💡 Edge Case: MySQL migration missing IF NOT EXISTS for column addition

📄 bootstrap/sql/migrations/native/1.11.7/mysql/schemaChanges.sql:3-5 📄 bootstrap/sql/migrations/native/1.11.7/postgres/schemaChanges.sql:3-5

The MySQL migration script uses a bare ALTER TABLE ... ADD COLUMN without any idempotency check:

ALTER TABLE glossary_term_entity
  ADD COLUMN entityStatus VARCHAR(32)

While the PostgreSQL migration correctly uses IF NOT EXISTS:

ALTER TABLE glossary_term_entity
  ADD COLUMN IF NOT EXISTS entityStatus VARCHAR(32)

If the MySQL migration is run twice (e.g., during a failed migration retry or in certain deployment scenarios), it will fail with a "Duplicate column name" error. MySQL 8.0 doesn't support IF NOT EXISTS for ADD COLUMN natively, so this typically requires a stored procedure or checking information_schema.

Note: The PostgreSQL migration handles this correctly with IF NOT EXISTS for both the column addition and the index creation. The MySQL migration also uses CREATE INDEX without a conditional check, and in MySQL that statement fails with a "Duplicate key name" error if an index with the same name already exists.

Suggested fix: Consider using a conditional check pattern for MySQL, such as:

-- Check information_schema first; add the column only if it is missing
SELECT COUNT(*) INTO @col_exists FROM information_schema.columns
WHERE table_schema = DATABASE() AND table_name = 'glossary_term_entity' AND column_name = 'entityStatus';
SET @query = IF(@col_exists = 0, 'ALTER TABLE glossary_term_entity ADD COLUMN entityStatus VARCHAR(32) GENERATED ALWAYS AS (json ->> ''$.entityStatus'') STORED', 'SELECT 1');
PREPARE stmt FROM @query;
EXECUTE stmt;
-- Release the prepared statement once it has run
DEALLOCATE PREPARE stmt;

Or wrap in a stored procedure that checks for column existence first.

💡 Edge Case: Missing newline at end of migration SQL files

📄 bootstrap/sql/migrations/native/1.11.7/mysql/schemaChanges.sql:9 📄 bootstrap/sql/migrations/native/1.11.7/postgres/schemaChanges.sql:9 📄 bootstrap/sql/migrations/native/1.11.7/mysql/schemaChanges.sql:15

Both the MySQL and PostgreSQL migration SQL files are missing a newline at the end of the file. While this is a minor issue, many code conventions and tools (like POSIX standards, git diffs, and various linters) expect files to end with a newline character.

The files currently end with the CREATE INDEX statement without a trailing newline, which can cause issues with file concatenation tools and may result in "\ No newline at end of file" warnings in diffs.

Suggested fix: Add a newline at the end of both:

  • bootstrap/sql/migrations/native/1.11.7/mysql/schemaChanges.sql
  • bootstrap/sql/migrations/native/1.11.7/postgres/schemaChanges.sql

💡 Quality: Duplicated parseEntityStatusValues method across classes

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java:165-176 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java:1632-1643

The parseEntityStatusValues method is duplicated identically in two places:

  1. ListFilter.java (lines 165-176)
  2. GlossaryTermRepository.java (lines 1632-1643)

Both methods have the exact same logic:

  • Check for null/empty input
  • Validate against EntityStatus enum values
  • Split comma-separated values, trim, and filter valid statuses

This violates the DRY (Don't Repeat Yourself) principle and creates a maintenance burden: if the parsing logic needs to change, it must be updated in multiple places.

Suggested fix: Extract this common method to a shared utility class (e.g., EntityStatusUtils) that both classes can reference. This would centralize the validation logic and make future changes easier to maintain.

✅ 19 resolved
Bug: Comment typo: migration version 1.1.7 should be 1.11.7

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:3871
The comment in CollectionDAO.java on line 3871 references migration version 1.1.7 but the actual migration files are located in the 1.11.7 folder. This could confuse future developers looking for the migration that added the entityStatus column.

Current comment:

// entityStatus filtering uses generated column added in migration 1.1.7

Suggested fix:

// entityStatus filtering uses generated column added in migration 1.11.7

Bug: Comment typo: migration version 1.1.7 should be 1.11.7

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java
The comment on line 237 (in the diff context around line 240) says "entityStatus filtering uses generated column added in migration 1.1.7" but the actual migration files are in the 1.11.7 directory, not 1.1.7. This could cause confusion for developers looking for the migration.

Suggestion: Update the comment to reference "migration 1.11.7".

Security: SQL injection via string concatenation in IN clause

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java
The getEntityStatusCondition() method builds an SQL IN clause by string concatenation. While the code attempts to sanitize input by:

  1. Filtering against valid EntityStatus values
  2. Escaping single quotes via replace("'", "''")

This is still problematic because:

  1. The sanitization relies on matching against EntityStatus.value() strings, but if a malicious value somehow passes validation, the SQL is directly concatenated.
  2. The approach mixes parameterized queries (used elsewhere) with string concatenation, which is a security anti-pattern.

While the current validation against EntityStatus.values() provides protection, this pattern is fragile - if validation is bypassed or modified, it becomes exploitable.

Recommendation: Use parameterized queries with @BindList annotation (as done in the new searchGlossaryTermsWithStatus DAO method) instead of string concatenation. This would be consistent with the approach used in GlossaryTermRepository.searchGlossaryTermsInternal() which properly uses @BindList("entityStatusValues").
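
To illustrate the idea behind the recommendation, the sketch below builds an IN clause that contains only `?` placeholders and returns the values separately for binding. It is a hand-rolled approximation of what list binding achieves, not JDBI's implementation and not code from this PR; the class InClauseSketch, the BoundCondition holder, and the statusCondition method are hypothetical names, and the column name is assumed to be a trusted constant rather than user input.

```java
import java.util.Collections;
import java.util.List;

public class InClauseSketch {
  /** SQL text containing only placeholders, plus the values to bind separately. */
  public static final class BoundCondition {
    public final String sql;
    public final List<String> params;

    BoundCondition(String sql, List<String> params) {
      this.sql = sql;
      this.params = params;
    }
  }

  /** trustedColumn must be a hard-coded identifier; only the statuses come from the caller. */
  public static BoundCondition statusCondition(String trustedColumn, List<String> statuses) {
    if (statuses == null || statuses.isEmpty()) {
      return new BoundCondition("", List.of());
    }
    // One "?" per value; the values themselves never appear in the SQL string.
    String placeholders = String.join(", ", Collections.nCopies(statuses.size(), "?"));
    return new BoundCondition(
        trustedColumn + " IN (" + placeholders + ")", List.copyOf(statuses));
  }

  public static void main(String[] args) {
    BoundCondition c = statusCondition("entityStatus", List.of("Approved", "x' OR '1'='1"));
    System.out.println(c.sql);    // entityStatus IN (?, ?)
    System.out.println(c.params); // [Approved, x' OR '1'='1]
  }
}
```

With this shape the hostile string is passed to the driver as data, so the query stays safe even if the allow-list check were later weakened or removed.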

Edge Case: Missing dependency in useEffect may cause stale closures

📄 openmetadata-ui/src/main/resources/ui/src/components/Glossary/GlossaryTermTab/GlossaryTermTab.component.tsx
The useEffect hook that triggers fetch when search term or status filter changes has dependencies [searchTerm, selectedStatus] but calls fetchAllTerms() without including it in the dependency array. If fetchAllTerms uses stale closure values, this could lead to bugs.

Looking at the useEffect:

useEffect(() => {
  if (activeGlossary && previousGlossaryFQN === activeGlossary?.fullyQualifiedName) {
    setSearchPaging({ offset: 0, total: undefined, hasMore: true });
    fetchAllTerms();
  }
}, [searchTerm, selectedStatus]);

The function fetchAllTerms is referenced but not in the dependency array, and activeGlossary, previousGlossaryFQN are also used but not in dependencies.

Recommendation: Either:

  1. Add fetchAllTerms, activeGlossary, and previousGlossaryFQN to the dependency array
  2. Or wrap fetchAllTerms in useCallback with proper dependencies and include it

Quality: Duplicated status validation logic in two places

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java
The status validation and IN clause building logic is duplicated:

  1. GlossaryTermRepository.buildEntityStatusCondition (lines 1633-1648)
  2. ListFilter.getValidatedInConditionFromString (lines 603-614)

Both methods implement nearly identical logic for:

  • Splitting comma-separated values
  • Trimming whitespace
  • Filtering against valid enum values
  • Building a quoted IN clause

Impact: Code duplication leads to maintenance burden and risk of the implementations diverging (one could have a bug fix while the other doesn't).

Suggested fix: Consolidate the validation logic into a single shared utility method. ListFilter.getValidatedInConditionFromString is already generic (accepts Class<T> enumClass). Reuse this method in GlossaryTermRepository instead of duplicating the logic.
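
A single enum-generic helper of the kind suggested here might look like the sketch below. It is an assumption-laden illustration rather than the actual ListFilter.getValidatedInConditionFromString: the class EnumFilterSketch, the method validValues, and both nested enums are hypothetical, and matching is done on enum constant names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class EnumFilterSketch {
  // Hypothetical enums used only to show that one helper serves several callers.
  enum EntityStatus { APPROVED, DRAFT, IN_REVIEW, DEPRECATED, REJECTED, UNPROCESSED }
  enum Visibility { PUBLIC, PRIVATE }

  /** Returns the distinct comma-separated values in raw that name a constant of enumClass. */
  public static <T extends Enum<T>> List<String> validValues(String raw, Class<T> enumClass) {
    if (raw == null || raw.isBlank()) {
      return List.of();
    }
    Set<String> allowed =
        Arrays.stream(enumClass.getEnumConstants()).map(Enum::name).collect(Collectors.toSet());
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(allowed::contains)
        .distinct()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Both call sites share one implementation instead of each keeping a copy.
    System.out.println(validValues("DRAFT, DRAFT, bogus", EntityStatus.class)); // [DRAFT]
    System.out.println(validValues("PRIVATE,PUBLIC", Visibility.class));       // [PRIVATE, PUBLIC]
  }
}
```

Keeping the enum type as a parameter means a fix to the splitting or trimming rules lands in one place for every caller.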

...and 14 more resolved from earlier reviews


Labels

  • safe to test: Add this label to run secure GitHub workflows on PRs
  • UI: UI specific issues
