Support aggregation/window commands with dynamic fields by ykmr1224 · Pull Request #4743 · opensearch-project/sql

ykmr1224 · 2025-11-05T00:11:25Z

This PR is for feature branch feature/permissive

Description

Support aggregation/window commands with dynamic fields
- stats, eventstats, timechart, trendline
DebugUtils and JsonUtils are utility classes intended mainly for tests and debugging.

Related Issues

Permissive mode RFC: #4349
Dynamic fields RFC: #4433

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ykmr1224 · 2025-11-07T00:36:57Z

Updated to utilize type coercion.

penghuo · 2025-11-07T16:19:08Z

+    if (!context.fieldBuilder.isFieldSpecificType(byFieldName)) {
+      throw new IllegalArgumentException(
+          String.format(
+              "By field `%s` needs to be specific type. Please cast explicitly.", byFieldName));
+    }


Can we cast the groupBy field to string?

I realized timechart requires a bigger change because of the type assigned to the span function, which prevents automatic type coercion from working properly.
Let me address this in a separate PR.

Found a simpler way to solve the problem and included the change in this PR.
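To illustrate why a by field without a specific type is a problem for grouping, and what casting it to string buys, here is a minimal sketch. It is not code from this PR: the class `GroupByAsStringSketch`, the method `countByAsString`, and the row representation (`Map<String, Object>`) are hypothetical. Rows without the field are skipped only to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not part of the PR. */
final class GroupByAsStringSketch {

  /** Counts rows per by-field value after converting each value to a string key. */
  static Map<String, Long> countByAsString(List<Map<String, Object>> rows, String byField) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (Map<String, Object> row : rows) {
      Object value = row.get(byField);
      if (value == null) {
        continue; // assumption for brevity: rows without the field are ignored
      }
      // Integer 200 and String "200" would be different keys if grouped as Objects;
      // as strings they fall into the same group.
      counts.merge(String.valueOf(value), 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows =
        List.of(Map.of("status", 200), Map.of("status", "200"), Map.of("status", 404L));
    System.out.println(countByAsString(rows, "status"));
  }
}
```

A dynamic field can carry values of different runtime types from row to row, so a single string key type gives the aggregation a well-defined grouping key.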

penghuo · 2025-11-13T23:34:26Z

+    projectDynamicFieldAsString(node.getBinExpression(), context);
+    projectDynamicFieldAsString(node.getByField(), context);
+


Is it required for all visitors?

Could you add a test in CalciteDynamicFieldsTimechartIT to help understand the corresponding logical plan / SQL?

Added CalcitePPLDynamicFieldsTest.java for Spark SQL verification. Added explain output checks in the IT.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

penghuo · 2025-11-17T21:54:23Z

+    verifyLogical(root, expectedLogical);
+
+    String expectedSparkSql =
+        "SELECT `id`, `name`, `_MAP`\n"


Does the output always include the _MAP column?
@dai-chen does it work with unified PPL in Spark?

Sure, let me check.

It contains _MAP when the query does not explicitly select fields, since it should output all the dynamic fields along with the static fields. (You can refer to the test case testProjectStaticFields.)

As I understand it, if we submit such a SQL query on an S3 table to Spark directly, the required changes include at least:

- Add _MAP to the Spark table schema
- Add result-expanding logic similar to DynamicFieldsResultProcessor.expandDynamicFields()

Do you have an example for writing _MAP? I want to check whether more changes are required.

@dai-chen
_MAP should be automatically added to the table schema when permissive mode is enabled, or when a command generates dynamic fields (like the spath command without the output param).

_MAP is collected here
Refer to this PR for further context.

@ykmr1224 I just want to make sure I’m understanding this correctly.

Case 1: For _MAP generated from a table, do we need to update the Spark catalog to add it when permissive mode is enabled? When you say "automatically added to the table", do you mean the current OpenSearch schema?

Case 2: For _MAP generated dynamically by a command like spath, could you share a concrete example, including:

- the PPL query, and
- the Spark SQL query generated?

Since our approach is to transpile PPL into Spark SQL, I’d like to ensure that all required semantics are encoded in the SQL we generate. Otherwise, we’ll need to estimate the effort for any changes required in the Spark SQL engine.

@dai-chen
Case 1: Yes, it is added to the OpenSearch schema (specifically to the metadata fields). I am not sure how the Spark catalog works, but I suppose we need to add _MAP to the catalog schema.

Case 2: Here is the sample SQL for the PPL query source=EMP | fields ENAME | spath input=ENAME

SELECT `mvappend`(`ENAME`, `JSON_EXTRACT_ALL`(`ENAME`)['ENAME']) `ENAME`, `MAP_REMOVE`(`JSON_EXTRACT_ALL`(`ENAME`), ARRAY ('ENAME')) `_MAP` FROM `scott`.`EMP`

`MAP_REMOVE`(`JSON_EXTRACT_ALL`(`ENAME`), ARRAY ('ENAME')) `_MAP` is where _MAP is assigned. (MAP_REMOVE removes the keys that already exist as static fields, so those fields are not duplicated in _MAP.)
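The dedupe step described above can be sketched in plain Java. This is only an illustration of the semantics as the comment describes them, not the implementation of the `MAP_REMOVE` function used by the PR; the class `MapRemoveSketch` and the method `mapRemove` are hypothetical names, and returning a copy rather than mutating the input is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the dedupe semantics; not the PR's MAP_REMOVE. */
final class MapRemoveSketch {

  /** Returns a copy of {@code map} without the given keys; the input map is left unchanged. */
  static Map<String, Object> mapRemove(Map<String, Object> map, List<String> keys) {
    Map<String, Object> result = new LinkedHashMap<>(map);
    for (String key : keys) {
      result.remove(key);
    }
    return result;
  }

  public static void main(String[] args) {
    // Stand-in for the map extracted from a JSON value (values are made up).
    Map<String, Object> extracted = new LinkedHashMap<>();
    extracted.put("ENAME", "SMITH");
    extracted.put("dept", "RESEARCH");

    // ENAME is already a static column, so it is dropped from the dynamic map.
    System.out.println(mapRemove(extracted, List.of("ENAME")));
  }
}
```

With this shape, a field name appears either as a static column or as a key in _MAP, never in both.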

Yes, case 1 may need some changes, and we can focus on case 2. Below is the Spark SQL query that I understand would be generated.

# Test data
search source=test_events;
25/11/19 11:10:06 WARN UnifiedQueryParser: PPL translated to Spark SQL: SELECT * FROM `spark_catalog`.`default`.`test_events`
@timestamp           host     packets  message
2025-09-08 10:00:00  server1  60       {"category":1, "resource":"A"}
2025-09-08 10:01:00  server1  120      {"category":2, "resource":"B"}
2025-09-08 10:02:00  server1  60       {"category":3, "resource":"C"}
2025-09-08 10:02:30  server2  180      {"category":4, "resource":"D"}

# PPL query
# source=test_events | spath input=message | eval cat = abs(category) * 10

# Spark SQL query expected
spark-sql (default)>
> SELECT
>   ABS(TRY_CAST(`_MAP`['category'] AS INT) * 10) AS `cat`
> FROM (
>   SELECT `JSON_EXTRACT_ALL`(`message`) AS `_MAP`
>   FROM `test_events`
> );
line 2:14 missing ')' at '('
cat
10
20
30
40

If this is correct, the only open question is the expansion logic in DynamicFieldsResultProcessor.expandDynamicFields()
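For readers unfamiliar with the expansion step under discussion, the following is a minimal sketch of what expanding a _MAP column into top-level columns could look like. It is not the code of DynamicFieldsResultProcessor.expandDynamicFields(); the class `DynamicFieldExpansionSketch`, the method `expand`, the row representation, and the rule that a static column wins over a dynamic field of the same name are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not the DynamicFieldsResultProcessor from the PR. */
final class DynamicFieldExpansionSketch {
  static final String MAP_COLUMN = "_MAP";

  /** Copies each row, replacing the _MAP column with one top-level column per map entry. */
  static List<Map<String, Object>> expand(List<Map<String, Object>> rows) {
    List<Map<String, Object>> expanded = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      Map<String, Object> out = new LinkedHashMap<>();
      Map<?, ?> dynamic = null;
      for (Map.Entry<String, Object> e : row.entrySet()) {
        if (MAP_COLUMN.equals(e.getKey())) {
          // The _MAP column itself is never copied to the output row.
          if (e.getValue() instanceof Map) {
            dynamic = (Map<?, ?>) e.getValue();
          }
        } else {
          out.put(e.getKey(), e.getValue());
        }
      }
      if (dynamic != null) {
        for (Map.Entry<?, ?> e : dynamic.entrySet()) {
          String name = String.valueOf(e.getKey());
          // Assumption: a static column wins when names clash.
          if (!out.containsKey(name)) {
            out.put(name, e.getValue());
          }
        }
      }
      expanded.add(out);
    }
    return expanded;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", 1);
    row.put("name", "a");
    row.put(MAP_COLUMN, Map.of("category", "1"));
    System.out.println(expand(List.of(row)));
  }
}
```

Different rows may carry different keys in _MAP, so a real result processor would also need to reconcile the output schema across rows; the sketch leaves that out.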

dai-chen · 2025-11-18T17:00:44Z

+
+    JSONObject result = executeQuery(query);
+
+    assertExplainYaml(


Can we assert only the part we're interested in?

I added this at @penghuo's request for explain verification, and I think it is better to keep the whole plan so that we detect when it changes.
I will migrate it to a separate file once I merge the change and enable permissive mode in the main branch. (It is currently enabled only in the integration test and cannot use the same test base class.)

dai-chen · 2025-11-18T17:09:08Z

+    verifyLogical(root, expectedLogical);
+
+    String expectedSparkSql =
+        "SELECT `id`, `name`, `_MAP`\n"


As I understand it, if we submit such a SQL query on an S3 table to Spark directly, the required changes include at least:

- Add _MAP to the Spark table schema
- Add result-expanding logic similar to DynamicFieldsResultProcessor.expandDynamicFields()

Do you have an example for writing _MAP? I want to check whether more changes are required.

ykmr1224 added the PPL (Piped processing language) and calcite (calcite migration related) labels Nov 5, 2025

ykmr1224 marked this pull request as ready for review November 5, 2025 16:52

ykmr1224 self-assigned this Nov 5, 2025

ykmr1224 added the enhancement New feature or request label Nov 5, 2025

penghuo reviewed Nov 6, 2025


ykmr1224 force-pushed the dynamic-aggregation branch 2 times, most recently from 33bab3f to 6b2e491 on November 7, 2025 00:27

penghuo reviewed Nov 7, 2025


ykmr1224 force-pushed the dynamic-aggregation branch from 44e2d10 to 0e2d036 on November 12, 2025 00:10

penghuo reviewed Nov 13, 2025


ykmr1224 added 9 commits November 14, 2025 16:00

Fix aggregation for dynamic fields

ed37afb

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Address comments

f49b645

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Utilize coercion

bb5e543

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Fix timechart and trendline

910a1b4

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

minor fix

07d1cf8

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Minor refactoring

5e4f6bc

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Add tests for spark sql verification

89c1b86

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Add comment

0e191dd

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

Add explain verification

d552010

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

ykmr1224 force-pushed the dynamic-aggregation branch from d6acee2 to d552010 on November 15, 2025 00:00

penghuo reviewed Nov 17, 2025


dai-chen reviewed Nov 18, 2025


penghuo approved these changes Nov 19, 2025


dai-chen approved these changes Nov 19, 2025


ykmr1224 merged commit 990346a into opensearch-project:feature/permissive Nov 19, 2025
33 of 34 checks passed

3 participants
