-
-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark3: JOIN
clause enhancements
#2570
Spark3: JOIN
clause enhancements
#2570
Conversation
Output from running SQLFluff below but hitting a few problems. The updates to For some reason though, it works with every keyword BUT Even with disabling L031 through noqa, the unparsable section still occurs. I cannot for the life of me figure out why this is ONLY happening with > sqlfluff lint --dialect spark3 -v test/fixtures/dialects/spark3/join_types.sql
==== sqlfluff ====
sqlfluff: 0.9.4 python: 3.9.10
implementation: cpython dialect: spark3
verbosity: 1 templater: jinja
==== readout ====
=== [ path: test/fixtures/dialects/spark3/join_types.sql ] ===
CRITICAL [L031] Applying rule L031 threw an Exception: local variable 'alias_name' referenced before assignment
Traceback (most recent call last):
File "/Users/lorin.dawson/repos/sqlfluff/src/sqlfluff/core/rules/base.py", line 519, in crawl
res = self._eval(
File "/Users/lorin.dawson/repos/sqlfluff/src/sqlfluff/rules/L031.py", line 110, in _eval
self._lint_aliases_in_join(
File "/Users/lorin.dawson/repos/sqlfluff/src/sqlfluff/rules/L031.py", line 210, in _lint_aliases_in_join
and used_alias_ref.raw == alias_name
UnboundLocalError: local variable 'alias_name' referenced before assignment
CRITICAL [L031] Applying rule L031 threw an Exception: local variable 'alias_name' referenced before assignment
Traceback (most recent call last):
File "/Users/lorin.dawson/repos/sqlfluff/src/sqlfluff/core/rules/base.py", line 519, in crawl
res = self._eval(
File "/Users/lorin.dawson/repos/sqlfluff/src/sqlfluff/rules/L031.py", line 110, in _eval
self._lint_aliases_in_join(
File "/Users/lorin.dawson/repos/sqlfluff/src/sqlfluff/rules/L031.py", line 210, in _lint_aliases_in_join
and used_alias_ref.raw == alias_name
UnboundLocalError: local variable 'alias_name' referenced before assignment
== [test/fixtures/dialects/spark3/join_types.sql] FAIL
L: 28 | P: 1 | L055 | Use 'LEFT JOIN' instead of 'RIGHT JOIN'.
L: 61 | P: 1 | PRS | Line 61, Position 1: Found unparsable section: 'SEMI'
L: 61 | P: 1 | L011 | Implicit/explicit aliasing of table.
L: 61 | P: 1 | L014 | Unquoted identifiers must be consistently lower case.
L: 61 | P: 6 | L003 | Indentation not consistent with line #60
L: 61 | P: 6 | L051 | 'INNER JOIN' must be fully qualified.
L: 70 | P: 1 | L031 | Unexpected exception: local variable 'alias_name'
| referenced before assignment; Could you open an issue at
| https://github.com/sqlfluff/sqlfluff/issues ? You can
| ignore this exception for now, by adding '-- noqa: L031'
| at the end of line 70
L: 70 | P: 7 | L039 | Unnecessary whitespace found.
L: 72 | P: 1 | PRS | Line 72, Position 1: Found unparsable section: 'ANTI'
L: 72 | P: 1 | L011 | Implicit/explicit aliasing of table.
L: 72 | P: 1 | L014 | Unquoted identifiers must be consistently lower case.
L: 72 | P: 6 | L003 | Indentation not consistent with line #71
L: 72 | P: 6 | L051 | 'INNER JOIN' must be fully qualified.
L: 100 | P: 15 | L055 | Use 'LEFT JOIN' instead of 'RIGHT JOIN'.
==== summary ====
violations: 14 status: FAIL
All Finished 📜 🎉! |
It's because This is due to a mistake in the @spark3_dialect.segment(replace=True)
class AliasExpressionSegment(BaseSegment):
...
parse_grammar = StartsWith(
Sequence(
Ref.keyword("AS", optional=True),
OneOf(
# maybe table alias and column aliases
Sequence(
Ref("SingleIdentifierGrammar", optional=True),
Bracketed(Ref("SingleIdentifierListSegment")),
),
# just a table alias
Ref("SingleIdentifierGrammar"),
),
),
terminator=OneOf(
Ref("JoinTypeKeywords"),
Ref("FromClauseTerminatorGrammar"),
)
) |
This removes the unparsable section error but [L: 69, P: 1] | comment: '-- anti join'
[L: 69, P: 13] | newline: '\n'
[L: 70, P: 1] | base:
[L: 70, P: 1] | select_statement:
[L: 70, P: 1] | select_clause:
[L: 70, P: 1] | keyword: 'SELECT'
[L: 70, P: 7] | [META] indent:
[L: 70, P: 7] | whitespace: ' '
[L: 70, P: 9] | select_clause_element:
[L: 70, P: 9] | column_reference:
[L: 70, P: 9] | identifier: 'employee'
[L: 70, P: 17] | dot: '.'
[L: 70, P: 18] | identifier: 'id'
[L: 70, P: 20] | newline: '\n'
[L: 71, P: 1] | [META] dedent:
[L: 71, P: 1] | from_clause:
[L: 71, P: 1] | keyword: 'FROM'
[L: 71, P: 5] | whitespace: ' '
[L: 71, P: 6] | from_expression:
[L: 71, P: 6] | [META] indent:
[L: 71, P: 6] | from_expression_element:
[L: 71, P: 6] | table_expression:
[L: 71, P: 6] | table_reference:
[L: 71, P: 6] | identifier: 'employee'
[L: 71, P: 14] | newline: '\n'
[L: 72, P: 1] | alias_expression:
[L: 72, P: 1] | identifier: 'ANTI'
[L: 72, P: 5] | whitespace: ' '
[L: 72, P: 6] | [META] dedent:
[L: 72, P: 6] | join_clause:
[L: 72, P: 6] | keyword: 'JOIN'
[L: 72, P: 10] | [META] indent:
[L: 72, P: 10] | whitespace: ' '
[L: 72, P: 11] | from_expression_element:
[L: 72, P: 11] | table_expression:
[L: 72, P: 11] | table_reference:
[L: 72, P: 11] | identifier: 'department'
[L: 72, P: 21] | newline: '\n'
[L: 73, P: 1] | whitespace: ' '
[L: 73, P: 5] | join_on_condition:
[L: 73, P: 5] | keyword: 'ON'
[L: 73, P: 7] | [META] indent:
[L: 73, P: 7] | whitespace: ' '
[L: 73, P: 8] | expression:
[L: 73, P: 8] | column_reference:
[L: 73, P: 8] | identifier: 'employee'
[L: 73, P: 16] | dot: '.'
[L: 73, P: 17] | identifier: 'deptno'
[L: 73, P: 23] | whitespace: ' '
[L: 73, P: 24] | comparison_operator:
[L: 73, P: 24] | raw_comparison_operator: '='
[L: 73, P: 25] | whitespace: ' '
[L: 73, P: 26] | column_reference:
[L: 73, P: 26] | identifier: 'department'
[L: 73, P: 36] | dot: '.'
[L: 73, P: 37] | identifier: 'deptno'
[L: 73, P: 43] | [META] dedent:
[L: 73, P: 43] | [META] dedent:
[L: 73, P: 43] | statement_terminator: ';' Sounds like a simple fix would be to add these to the reserved keyword list but they are not technically reserved keywords. I had a previous iteration on I'm still struggling to understand why these are being swallowed as table aliases if they're included in the |
Codecov Report
@@ Coverage Diff @@
## main #2570 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 163 163
Lines 11862 11863 +1
=========================================
+ Hits 11862 11863 +1
Continue to review full report at Codecov.
|
I see a couple of issues:
I think the easiest thing is to change the Alias expression to this: type = "alias_expression"
match_grammar = Sequence(
Ref.keyword("AS", optional=True),
OneOf(
# maybe table alias and column aliases
Sequence(
Ref("SingleIdentifierGrammar", optional=True),
Bracketed(Ref("SingleIdentifierListSegment")),
),
# just a table alias
Ref("SingleIdentifierGrammar"),
exclude=Ref("JoinTypeKeywords"),
),
) That is, add an explicit And just lose the That seems to work for me. |
This worked like a charm! Thank you I'll clean everything up and take it out of Draft once all checks pass. |
JOIN
clause enhancementsJOIN
clause enhancements
Brief summary of the change made
Enhancements to joins in Spark3
Are there any other side effects of this change that we should be aware of?
N/A for now; will update this section once ready for review.
Pull Request checklist
Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
.yml
rule test cases intest/fixtures/rules/std_rule_cases
..sql
/.yml
parser test cases intest/fixtures/dialects
(note YML files can be auto generated withtox -e generate-fixture-yml
).test/fixtures/linter/autofix
.Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.