Spark3 join types #1942

Merged: 6 commits merged into sqlfluff:main from the spark3_join_types branch on Nov 19, 2021

Conversation

@jpy-git (Contributor) commented Nov 19, 2021

Brief summary of the change made

This PR aims to add [LEFT] ANTI and [LEFT] SEMI joins to the Spark3 dialect. Fixes #1933.
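
As a quick illustration of the syntax in question, the sketch below (not part of the PR itself) lints the four join forms against the spark3 dialect via sqlfluff's simple Python API; it assumes sqlfluff.lint accepts a dialect argument and surfaces parse failures as violations with the "PRS" code.

import sqlfluff

# The four join forms this PR teaches the spark3 dialect to parse.
queries = [
    "SELECT * FROM a SEMI JOIN b ON a.id = b.id\n",
    "SELECT * FROM a LEFT SEMI JOIN b ON a.id = b.id\n",
    "SELECT * FROM a ANTI JOIN b ON a.id = b.id\n",
    "SELECT * FROM a LEFT ANTI JOIN b ON a.id = b.id\n",
]

for sql in queries:
    violations = sqlfluff.lint(sql, dialect="spark3")
    # "PRS" violations indicate the dialect could not parse the statement;
    # anything else is an ordinary style finding and can be ignored here.
    parse_errors = [v for v in violations if v["code"] == "PRS"]
    print(sql.strip(), "->", "parse error" if parse_errors else "parses")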

Are there any other side effects of this change that we should be aware of?

No

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with python test/generate_parse_fixture_yml.py or by running tox locally).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

codecov bot commented Nov 19, 2021

Codecov Report

Merging #1942 (16022f5) into main (d3c40ff) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #1942   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          147       147           
  Lines        10214     10219    +5     
=========================================
+ Hits         10214     10219    +5     
Impacted Files Coverage Δ
src/sqlfluff/dialects/dialect_spark3.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d3c40ff...16022f5.



@spark3_dialect.segment(replace=True)
class JoinClauseSegment(BaseSegment):
tunetheweb (Member) commented on the lines above:

Can't you just do this:

spark3_dialect.replace(
    JoinKeywords=Sequence(OneOf("SEMI", "ANTI", optional=True), "JOIN"),
)
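
For context, a self-contained sketch of that suggestion, assuming the dialect setup mirrors the top of dialect_spark3.py (where the spark3 dialect is copied from ansi) and that the grammar classes come from sqlfluff.core.parser:

from sqlfluff.core.dialects import load_raw_dialect
from sqlfluff.core.parser import OneOf, Sequence

# Assumed to mirror the existing setup in dialect_spark3.py.
ansi_dialect = load_raw_dialect("ansi")
spark3_dialect = ansi_dialect.copy_as("spark3")

spark3_dialect.replace(
    # SEMI/ANTI become an optional keyword immediately before JOIN, so every
    # grammar element that references JoinKeywords (including the ANSI
    # JoinClauseSegment) picks them up without being copied.
    JoinKeywords=Sequence(OneOf("SEMI", "ANTI", optional=True), "JOIN"),
)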

jpy-git (Contributor, Author):
@tunetheweb looking in ansi, JoinKeywords is just the word JOIN; the JOIN types (LEFT, RIGHT, INNER, etc.) are handled the same way as I've done here?

jpy-git (Contributor, Author):
Think adding those to JoinKeywords would allow things like CROSS SEMI JOIN, which isn't valid syntax.

tunetheweb (Member):
Yes, but you don’t need to override the JoinClauseSegment at all, just the JoinKeywords. So your 80-line change becomes just the above 3 lines.

tunetheweb (Member):
"Think adding those to JoinKeywords would allow things like CROSS SEMI JOIN, which isn't valid syntax."

That is true. Your change is more technically accurate since it won’t allow that.

On the other hand, none of our dialects are 100% accurate to the SQL syntax, and how likely is it that the above would actually be used?

To me the point of implementing the parsing is so we can implement rules that depend on understanding a SQL component, rather than necessarily being 100% accurate to the syntax. We should look to enforce coding standards and pick up silent syntax errors wherever possible. That SQL would not run, so it will be picked up anyway. Yes, it would be nice if the linter picked up these things, but I don’t think it’s essential.

The question is whether the simple 3-line fix, compared with the 80-line one, is simpler and sufficient, and whether its downsides are acceptable. Or do we go the whole copying route and duplicate the code?

I could go either way to be honest. Obviously we override lots of things so another one isn’t an issue. Just saying I probably would have gone the simpler route myself if I’d done this. But maybe I’m just lazy 😀

jpy-git (Contributor, Author) commented Nov 19, 2021:
I guess, given it's more accurate to the true syntax and also consistent with the placement of the other join types, the 80 lines is fine with me.
Remember you may want to add other non-ansi segments (e.g. natural joins) to this segment in the future, so I think it pays to be explicit with the syntax when we can.
I tend not to worry too much about low line count for the sake of it.
"Explicit is better than implicit" 😄

@tunetheweb tunetheweb merged commit 2d88f8a into sqlfluff:main Nov 19, 2021
@jpy-git jpy-git deleted the spark3_join_types branch November 19, 2021 23:03
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close this issue: Spark3 semi/anti join not parsing optional "left" token
3 participants