New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark3 join types #1942
Spark3 join types #1942
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1942 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 147 147
Lines 10214 10219 +5
=========================================
+ Hits 10214 10219 +5
Continue to review full report at Codecov.
|
|
||
|
||
@spark3_dialect.segment(replace=True) | ||
class JoinClauseSegment(BaseSegment): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just do this:
spark3_dialect.replace(
JoinKeywords=Sequence(OneOf("SEMI", "ANTI", optional=True), "JOIN"),
)
`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tunetheweb looking in ansi JoinKeywords is just the word JOIN,the JOIN types (LEFT, RIGHT, INNER etc) are the same as I've done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think adding those to JOIN keywords would allow things like CROSS SEMI JOIN which isn't possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but you don’t need to override the JoinClauseSegment
at all - just the JoinKeywords
So your 80 line change becomes just the above 3 lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think adding those to JOIN keywords would allow things like CROSS SEMI JOIN which isn't possible
That is true. Your change is more technically accurate since it won’t allow that.
On the other hand, none of our dialects are 100% accurate to the SQL syntax and how likely is it above is used?
To me the point of implementing the parse is so we can implement rules that depend on understanding a SQL component rather than necessarily being 100% accurate to to the syntax. We should look to enforce coding standards and pick up silent syntax errors where ever possible. That SQL would not run so will be picked up. Yes it would be nice if linter picked up these things but I don’t think it’s essential.
The question is, whether the simple 3 line fix over the 80 line one, is simpler, sufficient and the downsides are acceptable? Or do we go the whole copying route and duplicate the code?
I could go either way to be honest. Obviously we override lots of things so another one isn’t an issue. Just saying I probably would have gone the simpler route myself if I’d done this. But maybe I’m just lazy 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess for me given it's more accurate to the true syntax and also consistent with the placement of the other join types the 80 lines is fine with me.
Remember you may want to add other non-ansi segments (e.g. natural joins) to this segment in the future so I think it pays to be explicit with the syntax when we can.
I tend not to worry too much about low line count for the sake of it.
"Explicit is better than implicit" 😄
Brief summary of the change made
This PR aims to add
[LEFT] ANTI
and[LEFT] SEMI
joins to the Spark3 dialect. Fixes #1933.Are there any other side effects of this change that we should be aware of?
No
Pull Request checklist
Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
.yml
rule test cases intest/fixtures/rules/std_rule_cases
..sql
/.yml
parser test cases intest/fixtures/dialects
(note YML files can be auto generated withpython test/generate_parse_fixture_yml.py
or by runningtox
locally).test/fixtures/linter/autofix
.Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.