Skip to content

fix: use SubstraitTypeSystem when parsing SQL#350

Merged
vbarua merged 2 commits into
mainfrom
vbarua/use-substrait-type-sytem-for-sql-parsing
Mar 21, 2025
Merged

fix: use SubstraitTypeSystem when parsing SQL#350
vbarua merged 2 commits into
mainfrom
vbarua/use-substrait-type-sytem-for-sql-parsing

Conversation

@vbarua
Copy link
Copy Markdown
Member

@vbarua vbarua commented Mar 21, 2025

BREAKING CHANGE: SQL parsing no longer uses the default Calcite type system

@vbarua vbarua force-pushed the vbarua/use-substrait-type-sytem-for-sql-parsing branch from 99bbb4a to f316f5a Compare March 21, 2025 00:23
BREAKING CHANGE: default Calcite type system is no longer used

public class SubstraitTypeSystem extends RelDataTypeSystemImpl {
static final org.slf4j.Logger logger =
org.slf4j.LoggerFactory.getLogger(SubstraitTypeSystem.class);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup: opportunistic removal of unused logger

Rel pojo1 = SubstraitRelVisitor.convert(relRoot, EXTENSION_COLLECTION);

// 2. substrait rel -> Calcite Rel
RelNode relnodeRoot = new SubstraitToSql().substraitRelToCalciteRel(pojoRel, creates);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SqlToSubstrait conversion uses the schema as encoded by the creates, which is a list of SQL CREATE statements. These are also used when parsing the original query above.

Using the SubstraitToCalcite convert method, whichs generates a Calcite schema from the given Substrait plan, works as an extra check to make sure that the schema was encoded correctly into the plan, instead of just assuming the same schema.

Switching to SubstraitToCalcite here also exposed the bug that I'm fixing, because SubstraitToCalcite uses the SubstraitTypeSystem, while SqlToCalcite uses the default Calcite type system which has a low maximum precision.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering due to my recent PR for converting Substrait's Plan.Root to Calcite's RelRoot #339 whether the even more complete test would be to compare the Calcite RelRoot after SQL parsing to the the Calcite RelRoot generated from Substrait's Plan.Root.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be, but at the moment the SubraitRelVisitor#convert(RelRoot root, ...) method currently generates a Substrait Rel from the Calcite RelRoot.

Updating that conversion to generate a Plan.Root would make sense, but I would consider it beyond the scope of this.

@vbarua
Copy link
Copy Markdown
Member Author

vbarua commented Mar 21, 2025

Was doing some work related to SQL parsing when I noticed some weirdness around decimals. Specifically, some tests failed because the precision between expected and actual results didn't line up. The number 19 kept showing up, and once I figured out that was the default Calcite precision I realized that we weren't using the SubstraitTypeSystem when parsing SQL which has different limits.

@vbarua vbarua marked this pull request as ready for review March 21, 2025 00:36
@vbarua vbarua merged commit 128d497 into main Mar 21, 2025
@vbarua vbarua deleted the vbarua/use-substrait-type-sytem-for-sql-parsing branch March 21, 2025 17:47
ingomueller-net added a commit to ingomueller-net/substrait-consumer-testing that referenced this pull request Mar 24, 2025
This is the latest version that runs with just one fix. The breaking
change until there is substrait-io/substrait-java#350, which changes
which types are used in some places. Consequently, some of the plans
generated by Isthmus change: in several occasions, the precision and/or
scale of decimal types changes and, in one instance, a new cast to a
decimal type with a different precision is inserted.

Signed-off-by: Ingo Müller <ingomueller@google.com>
ingomueller-net added a commit to substrait-io/consumer-testing that referenced this pull request Mar 27, 2025
This is the latest version that runs with just one fix. The breaking
change until there is substrait-io/substrait-java#350, which changes
which types are used in some places. Consequently, some of the plans
generated by Isthmus change: in several occasions, the precision and/or
scale of decimal types changes and, in one instance, a new cast to a
decimal type with a different precision is inserted.

Signed-off-by: Ingo Müller <ingomueller@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants