Skip to content

feat(test): add WHERE-prefix view infrastructure for cross-command testing (#5505)#5508

Open
gingeekrishna wants to merge 3 commits into
opensearch-project:mainfrom
gingeekrishna:feat/5505-where-view-test-infrastructure
Open

feat(test): add WHERE-prefix view infrastructure for cross-command testing (#5505)#5508
gingeekrishna wants to merge 3 commits into
opensearch-project:mainfrom
gingeekrishna:feat/5505-where-view-test-infrastructure

Conversation

@gingeekrishna

@gingeekrishna gingeekrishna commented Jun 3, 2026

Copy link
Copy Markdown

Part of #5505

What does this PR do?

Implements phase 1 of the RFC: foundational infrastructure for WHERE-prefix view testing, with a proof-of-concept parametrization of FieldsCommandIT.

The core idea: extend the top test indices with a _is_real boolean flag, then run the same test assertions through:

source=<extended_index> | where _is_real | fields - _is_real | <original query>

This catches bugs where commands fail only when preceded by a where filter — a gap identified in issues like #5482 and #5483.

Changes

New extended test datasets

  • accounts_extended.json — 1000 real account records (_is_real: true) + 3 synthetic records (_is_real: false)
  • bank_extended.json — 7 real bank records (_is_real: true) + 3 synthetic records (_is_real: false)
  • Corresponding index mappings with _is_real: boolean

Infrastructure

File Change
TestsConstants.java TEST_INDEX_ACCOUNT_EXTENDED, TEST_INDEX_BANK_EXTENDED constants
TestUtils.java getAccountExtendedIndexMapping(), getBankExtendedIndexMapping() helpers
SQLIntegTestCase.Index ACCOUNT_EXTENDED, BANK_EXTENDED enum entries
PPLIntegTestCase.java sourceView(extendedIndex, query) helper method

Proof-of-concept: FieldsCommandIT

Three tests converted to @ParameterizedTest running on both the direct index and the WHERE-prefix view:

  • testBasicFieldSelection (schema assertions)
  • testMultipleFieldSelection (row assertions with head 3)
  • testSpecialDataTypes (all 7 bank birthdate rows)

Why fields - fieldname works without Calcite

hasEnhancedFieldFeatures() in AstBuilder.java gates only wildcards and space-delimited fields — not the MINUS/exclusion form — so the view chain is safe in the default (non-Calcite) test mode.

…sting

Introduces the foundational infrastructure proposed in RFC opensearch-project#5505 to catch
bugs where PPL commands break when preceded by a `where` filter — a common
gap since individual-command tests pass but combined queries fail.

## What changed

### New extended test indices
- `accounts_extended.json` / `bank_extended.json`: copies of the top two
  test datasets with `"_is_real":true` on every real record plus 3 synthetic
  records with `"_is_real":false` that should never appear in results
- Corresponding index mapping files that add a `_is_real` boolean field
  alongside the existing schema

### Infrastructure additions
- `TestsConstants`: new `TEST_INDEX_ACCOUNT_EXTENDED` and
  `TEST_INDEX_BANK_EXTENDED` constants
- `TestUtils`: `getAccountExtendedIndexMapping()` and
  `getBankExtendedIndexMapping()` mapping helpers
- `SQLIntegTestCase.Index`: `ACCOUNT_EXTENDED` and `BANK_EXTENDED` enum
  entries wiring constants → mapping → data file
- `PPLIntegTestCase.sourceView(extendedIndex, query)`: helper that builds
  the view chain
  `source=<ext> | where _is_real | fields - _is_real | <query>`

### Proof-of-concept parametrization in `FieldsCommandIT`
`testBasicFieldSelection`, `testMultipleFieldSelection`, and
`testSpecialDataTypes` are converted to `@ParameterizedTest` tests that
run twice: once directly against the base index, and once through the
WHERE-prefix view of the extended index. The assertions are identical —
the fake `_is_real=false` rows are filtered before reaching the user query.

## Why `fields - fieldname` works without Calcite
`hasEnhancedFieldFeatures` (AstBuilder.java) gates only wildcards and
space-delimited fields — not the MINUS/exclusion form — so the view chain
is safe to use in the default (non-Calcite) test mode.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds “extended” versions of the Account and Bank test indices (with an _is_real flag) and updates PPL integration tests to run the same assertions against both the base indices and a where _is_real-prefixed view to better catch multi-command chain issues.

Changes:

  • Added extended index mappings and NDJSON datasets for account and bank.
  • Introduced constants and index enum entries for the new extended indices.
  • Parameterized FieldsCommandIT to run tests against both direct sources and a WHERE-prefixed view that filters out synthetic rows.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
integ-test/src/test/resources/indexDefinitions/bank_extended_index_mapping.json New mapping for extended bank index including _is_real and extra fields used by extended dataset.
integ-test/src/test/resources/indexDefinitions/account_extended_index_mapping.json New mapping for extended account index including _is_real and fielddata/keyword subfields for test use.
integ-test/src/test/resources/bank_extended.json New NDJSON bulk dataset for the extended bank index with real + synthetic rows.
integ-test/src/test/resources/accounts_extended.json New large NDJSON bulk dataset for the extended account index with real + synthetic rows.
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java Adds a helper to build a WHERE-prefixed “view” source over an extended index.
integ-test/src/test/java/org/opensearch/sql/ppl/FieldsCommandIT.java Parameterizes tests to run against both base indices and an extended-index filtered view.
integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java Adds constants for extended index names.
integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java Adds mapping loaders for the new extended index mapping files.
integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java Registers ACCOUNT_EXTENDED and BANK_EXTENDED in the Index enum for test setup/loading.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread integ-test/src/test/resources/accounts_extended.json Outdated
Comment thread integ-test/src/test/resources/bank_extended.json
Comment thread integ-test/src/test/java/org/opensearch/sql/ppl/FieldsCommandIT.java Outdated
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 8bd33a8)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The sourceView method is public but unused in this PR. The actual parameterized tests call sourceViews (plural), which internally calls sourceView. If sourceView is intended as a public API for future use, this is fine. However, if it was meant to be used directly in FieldsCommandIT, the tests are not exercising it as a standalone method, only indirectly through sourceViews.

protected static String sourceView(String extendedIndex) {
  return String.format("source=%s | where _is_real | fields - _is_real", extendedIndex);
}
Data Inconsistency

Line 8 uses a numeric timestamp (1542152000000) for birthdate while all other records use string dates like "2017-10-23". This inconsistency could cause issues if the test logic or mapping expects uniform date formats. The original bank.json likely used strings consistently, so this deviation may trigger unexpected behavior in date parsing or comparison operations.

{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD","male":true,"birthdate":1542152000000,"_is_real":true}

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 8bd33a8
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix parameter mismatch with documentation

The sourceView method receives extendedIndex as a parameter but the Javadoc states
it should receive the base index and look up the extended index from INDEX_VIEWS.
Either update the implementation to accept baseIndex and perform the lookup, or
correct the Javadoc to match the current signature.

integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java [170-172]

-protected static String sourceView(String extendedIndex) {
+protected static String sourceView(String baseIndex) {
+  String extendedIndex = INDEX_VIEWS.get(baseIndex);
+  if (extendedIndex == null) {
+    throw new IllegalArgumentException("No extended index found for: " + baseIndex);
+  }
   return String.format("source=%s | where _is_real | fields - _is_real", extendedIndex);
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion identifies a documentation-implementation mismatch. However, the current implementation appears intentional: sourceView takes an extendedIndex directly, while sourceViews performs the lookup. The Javadoc could be clearer, but changing the implementation would require updating the caller in sourceViews as well. This is a minor documentation clarity issue rather than a functional bug.

Low

Previous suggestions

Suggestions up to commit 006643b
CategorySuggestion                                                                                                                                    Impact
General
Remove redundant field exclusion clause

The fields - _is_real removal is redundant because the test already calls fields
firstname, lastname which explicitly selects only those columns. The _is_real field
won't appear in the final result anyway. Remove the fields - _is_real clause to
simplify the query and avoid unnecessary processing.

integ-test/src/test/java/org/opensearch/sql/ppl/FieldsCommandIT.java [42-48]

 static Stream<Arguments> basicFieldSelectionSources() {
   return Stream.of(
       Arguments.of("direct", String.format("source=%s", TEST_INDEX_ACCOUNT)),
       Arguments.of(
           "where-prefix-view",
           String.format(
-              "source=%s | where _is_real | fields - _is_real", TEST_INDEX_ACCOUNT_EXTENDED)));
+              "source=%s | where _is_real", TEST_INDEX_ACCOUNT_EXTENDED)));
 }
Suggestion importance[1-10]: 3

__

Why: While the suggestion correctly identifies that fields - _is_real may seem redundant when followed by explicit field selection, it overlooks the purpose of the WHERE-prefix view pattern. The fields - _is_real clause ensures the synthetic _is_real field is removed before the actual test query runs, maintaining consistency with the base index schema. However, the suggestion has merit in that explicit field selection would override this, making it somewhat redundant in practice.

Low

- Fix UTF-8 BOM in accounts_extended.json: PowerShell Out-File added a
  BOM (0xEF 0xBB 0xBF) which breaks bulk NDJSON ingestion; regenerated
  with UTF8Encoding(false) so the file starts with the literal {
- Deduplicate @MethodSource providers: basicFieldSelectionSources and
  multipleFieldSelectionSources were identical; replaced both with a
  shared accountIndexSources() stream; added bankIndexSources() for bank
  tests; both reuse the sourceView() helper
- Remove unused label parameter: dropped the label string from
  Arguments.of() and test method signatures; the display name now shows
  the full query source string via @ParameterizedTest(name="querySource={0}")
- Add sourceView(String) overload to PPLIntegTestCase: the single-arg
  form returns the view prefix without a trailing pipe so callers can
  append commands with " | <command>"; the two-arg form delegates to it

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ec626fb

@Swiddis Swiddis added infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. PPL Piped processing language labels Jun 8, 2026

@Swiddis Swiddis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Comment thread integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java Outdated
Comment thread integ-test/src/test/java/org/opensearch/sql/ppl/FieldsCommandIT.java Outdated
Comment thread integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java Outdated
…dex type names

- Add INDEX_VIEWS map and sourceViews(baseIndex) to PPLIntegTestCase so
  new views only need to be registered in one place
- Remove unused sourceView(extendedIndex, query) overload
- Update FieldsCommandIT to delegate to sourceViews() instead of
  building streams inline
- Fix ACCOUNT_EXTENDED and BANK_EXTENDED type names to avoid collisions
  with ACCOUNT ("account") and BANK ("account")
@gingeekrishna

gingeekrishna commented Jun 8, 2026

Copy link
Copy Markdown
Author

Thanks for the review @Swiddis! Addressed all four comments in the latest commit:

  • Removed the unused sourceView(extendedIndex, query) overload
  • Centralized stream generation into PPLIntegTestCase.sourceViews(baseIndex) backed by an INDEX_VIEWS map — adding new views in future only requires updating that one map
  • Updated FieldsCommandIT source methods to delegate to sourceViews()
  • Fixed ACCOUNT_EXTENDED and BANK_EXTENDED type names ("account_extended" and "bank_extended" respectively) to avoid collision with existing entries

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8bd33a8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants