Spark3 join types #1942

Merged: 6 commits merged into sqlfluff:main from the spark3_join_types branch on Nov 19, 2021

Conversation

@jpy-git (Contributor) commented Nov 19, 2021

Brief summary of the change made

This PR aims to add [LEFT] ANTI and [LEFT] SEMI joins to the Spark3 dialect. Fixes #1933.
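
As a quick illustration of the syntax in question, the sketch below (not part of the PR itself) lints the four join forms against the spark3 dialect via sqlfluff's simple Python API; it assumes sqlfluff.lint accepts a dialect argument and surfaces parse failures as violations with the "PRS" code.

import sqlfluff

# The four join forms this PR teaches the spark3 dialect to parse.
queries = [
    "SELECT * FROM a SEMI JOIN b ON a.id = b.id\n",
    "SELECT * FROM a LEFT SEMI JOIN b ON a.id = b.id\n",
    "SELECT * FROM a ANTI JOIN b ON a.id = b.id\n",
    "SELECT * FROM a LEFT ANTI JOIN b ON a.id = b.id\n",
]

for sql in queries:
    violations = sqlfluff.lint(sql, dialect="spark3")
    # "PRS" violations indicate the dialect could not parse the statement;
    # anything else is an ordinary style finding and can be ignored here.
    parse_errors = [v for v in violations if v["code"] == "PRS"]
    print(sql.strip(), "->", "parse error" if parse_errors else "parses")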

Are there any other side effects of this change that we should be aware of?

No

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with python test/generate_parse_fixture_yml.py or by running tox locally).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

codecov bot commented Nov 19, 2021

Codecov Report

Merging #1942 (16022f5) into main (d3c40ff) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #1942   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          147       147           
  Lines        10214     10219    +5     
=========================================
+ Hits         10214     10219    +5     
Impacted Files Coverage Δ
src/sqlfluff/dialects/dialect_spark3.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d3c40ff...16022f5.



@spark3_dialect.segment(replace=True)
class JoinClauseSegment(BaseSegment):
tunetheweb (Member) commented on the lines above:

Can't you just do this:

spark3_dialect.replace(
    JoinKeywords=Sequence(OneOf("SEMI", "ANTI", optional=True), "JOIN"),
)
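
For context, a self-contained sketch of that suggestion, assuming the dialect setup mirrors the top of dialect_spark3.py (where the spark3 dialect is copied from ansi) and that the grammar classes come from sqlfluff.core.parser:

from sqlfluff.core.dialects import load_raw_dialect
from sqlfluff.core.parser import OneOf, Sequence

# Assumed to mirror the existing setup in dialect_spark3.py.
ansi_dialect = load_raw_dialect("ansi")
spark3_dialect = ansi_dialect.copy_as("spark3")

spark3_dialect.replace(
    # SEMI/ANTI become an optional keyword immediately before JOIN, so every
    # grammar element that references JoinKeywords (including the ANSI
    # JoinClauseSegment) picks them up without being copied.
    JoinKeywords=Sequence(OneOf("SEMI", "ANTI", optional=True), "JOIN"),
)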

jpy-git (Contributor, Author):
@tunetheweb looking in ansi, JoinKeywords is just the word JOIN; the JOIN types (LEFT, RIGHT, INNER, etc.) are handled the same way as I've done here?

jpy-git (Contributor, Author):
Think adding those to JoinKeywords would allow things like CROSS SEMI JOIN, which isn't valid syntax.

tunetheweb (Member):
Yes, but you don’t need to override the JoinClauseSegment at all, just the JoinKeywords. So your 80-line change becomes just the above 3 lines.

tunetheweb (Member):
"Think adding those to JoinKeywords would allow things like CROSS SEMI JOIN, which isn't valid syntax."

That is true. Your change is more technically accurate since it won’t allow that.

On the other hand, none of our dialects are 100% accurate to the SQL syntax, and how likely is it that the above would actually be used?

To me the point of implementing the parsing is so we can implement rules that depend on understanding a SQL component, rather than necessarily being 100% accurate to the syntax. We should look to enforce coding standards and pick up silent syntax errors wherever possible. That SQL would not run, so it will be picked up anyway. Yes, it would be nice if the linter picked up these things, but I don’t think it’s essential.

The question is whether the simple 3-line fix, compared with the 80-line one, is simpler and sufficient, and whether its downsides are acceptable. Or do we go the whole copying route and duplicate the code?

I could go either way to be honest. Obviously we override lots of things so another one isn’t an issue. Just saying I probably would have gone the simpler route myself if I’d done this. But maybe I’m just lazy 😀

jpy-git (Contributor, Author) commented Nov 19, 2021:
I guess, given it's more accurate to the true syntax and also consistent with the placement of the other join types, the 80 lines is fine with me.
Remember you may want to add other non-ansi segments (e.g. natural joins) to this segment in the future, so I think it pays to be explicit with the syntax when we can.
I tend not to worry too much about low line count for the sake of it.
"Explicit is better than implicit" 😄

@tunetheweb tunetheweb merged commit 2d88f8a into sqlfluff:main Nov 19, 2021
@jpy-git jpy-git deleted the spark3_join_types branch November 19, 2021 23:03
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close this issue: Spark3 semi/anti join not parsing optional "left" token
3 participants