-
-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark3: parse some VALUES clauses #2245
Conversation
test/dialects/spark3_grammar_test.py
Outdated
logger = logging.getLogger() | ||
logger.addHandler(logging.StreamHandler()) | ||
for _, _, message in caplog.record_tuples: | ||
logger.warning(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my debugging purposes.
i wanted to ask, is there a good place that explains how to read the detailed parsing output?
just by staring at it i could see it was only picking up part of from values 1,2
as the FromExpressionSegment
, but beyond that i was having trouble reading it:
[PD:0 MD:1 ] .None Ref.match OUT + [match=<MatchResult 1/9: 'from'>, seg="'from values 1 , 2'"]
[PD:0 MD:2 ] ..None Delim._look_ahead_match IN [ls=7, seg='values 1 , 2']
[PD:0 MD:2 ] ..None Delim._look_ahead_match SI [mq=[], sb=['VALUES', '', '1', '', ',', '', '2']]
[PD:0 MD:2 ] ..None Ref.match OUT + [match=<MatchResult 1/3: ','>, seg="', 2'"]
[PD:0 MD:2 ] ..None Delim._look_ahead_match SC [bsm=(4, 1, <Ref: CommaSegment>)]
[PD:0 MD:4 ] ....FromExpressionSegment OneOf.match PRN [ns=1, ps=1, ms=0, pruned=[<Ref: MLTableExpressionSegment>], opts=[<Ref: FromExpressionElementSegment>]]
[PD:0 MD:7 ] .......FromExpressionElementSegment Ref.match OUT [match=<MatchResult 0/3: ''>, seg="'values 1'"]
[PD:0 MD:7 ] .......FromExpressionElementSegment Optio.match PRN [ns=1, ps=1, ms=0, pruned=[<Bracketed: [<Ref: TableExpressionSegment>]>], opts=[<Ref: TableExpressionSegment>]]
[PD:0 MD:9 ] .........TableExpressionSegment OneOf.match PRN [ns=3, ps=1, ms=1, pruned=[<Bracketed: [<Ref: SelectableGrammar>]>], opts=[<Ref: ValuesClauseSegment>, <Ref: BareFunctionSegment>, <Ref: FunctionSegment>, <Ref: TableReferenceSegment>]]
[PD:0 MD:12] ............ValuesClauseSegment Ref.match OUT + [match=<MatchResult 1/3: 'values'>, seg="'values 1'"]
[PD:0 MD:11] ...........ValuesClauseSegment Start.match OUT ++ [match=<MatchResult 3/3: 'values 1'>, seg="'values 1'"]
[PD:0 MD:10] ..........ValuesClauseSegment Value.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:10] ..........TableExpressionSegment Ref.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:9 ] .........TableExpressionSegment OneOf.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:8 ] ........TableExpressionSegment Table.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:8 ] ........FromExpressionElementSegment Ref.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:7 ] .......FromExpressionElementSegment Optio.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:6 ] ......FromExpressionElementSegment Seque.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:5 ] .....FromExpressionElementSegment FromE.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:5 ] .....FromExpressionSegment Ref.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:4 ] ....FromExpressionSegment OneOf.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:3 ] ...FromExpressionSegment Condi.match OUT ++ [match=<MatchResult 1/1: ''>, seg="''"]
[PD:0 MD:3 ] ...FromExpressionSegment Seque.match OUT ++ [match=<MatchResult 3/3: 'values 1'>, seg="'values 1'"]
[PD:0 MD:2 ] ..FromExpressionSegment FromE.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:2 ] ..None Ref.match OUT ++ [match=<MatchResult 1/1: 'values 1'>, seg="'values 1'"]
[PD:0 MD:2 ] ..None Delim._look_ahead_match IN [ls=2, seg=' 2']
[PD:0 MD:2 ] ..None Delim._look_ahead_match SI [mq=[], sb=['', '2']]
[PD:0 MD:2 ] ..None Delim._look_ahead_match SC [bsm=None]
[PD:0 MD:4 ] ....FromExpressionSegment OneOf.match PRN [ns=1, ps=1, ms=0, pruned=[<Ref: MLTableExpressionSegment>], opts=[<Ref: FromExpressionElementSegment>]]
[PD:0 MD:7 ] .......FromExpressionElementSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:7 ] .......FromExpressionElementSegment Optio.match PRN [ns=1, ps=1, ms=0, pruned=[<Bracketed: [<Ref: TableExpressionSegment>]>], opts=[<Ref: TableExpressionSegment>]]
[PD:0 MD:9 ] .........TableExpressionSegment OneOf.match PRN [ns=3, ps=2, ms=0, pruned=[<Ref: ValuesClauseSegment>, <Bracketed: [<Ref: SelectableGrammar>]>], opts=[<Ref: BareFunctionSegment>, <Ref: FunctionSegment>, <Ref: TableReferenceSegment>]]
[PD:0 MD:10] ..........TableExpressionSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:11] ...........FunctionSegment OneOf.match PRN [ns=1, ps=1, ms=0, pruned=[<Sequence: [<Sequence: [<Ref: DatePartFunctionNameSe...]>], opts=[<Sequence: [<Sequence: [<AnyNumberOf: [<Ref: Functio..., <Ref: PostFunctionGrammar [opt]>]>]]
[PD:0 MD:17] .................DatePartFunctionNameSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:16] ................DatePartFunctionNameSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:15] ...............DatePartFunctionNameSegment DateP.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:15] ...............FunctionSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:14] ..............FunctionSegment AnyNu.match PRN [ns=1, ps=0, ms=0, pruned=[], opts=[<Ref: FunctionNameSegment>]]
[PD:0 MD:17] .................FunctionNameSegment AnyNu.match PRN [ns=1, ps=0, ms=0, pruned=[], opts=[<Sequence: [<Ref: SingleIdentifierGrammar>, <Ref: DotSegment>]>]]
[PD:0 MD:19] ...................SingleIdentifierGrammar OneOf.match PRN [ns=2, ps=0, ms=0, pruned=[], opts=[<Ref: NakedIdentifierSegment>, <Ref: QuotedIdentifierSegment>]]
[PD:0 MD:20] ....................SingleIdentifierGrammar Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:20] ....................SingleIdentifierGrammar Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:19] ...................SingleIdentifierGrammar OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:19] ...................FunctionNameSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:18] ..................FunctionNameSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:17] .................FunctionNameSegment AnyNu.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:17] .................FunctionNameSegment OneOf.match PRN [ns=2, ps=0, ms=0, pruned=[], opts=[<Ref: FunctionNameIdentifierSegment>, <Ref: QuotedIdentifierSegment>]]
[PD:0 MD:18] ..................FunctionNameSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:18] ..................FunctionNameSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:17] .................FunctionNameSegment OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:16] ................FunctionNameSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:15] ...............FunctionNameSegment Funct.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:15] ...............FunctionSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:14] ..............FunctionSegment AnyNu.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:13] .............FunctionSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............FunctionSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:11] ...........FunctionSegment OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:10] ..........FunctionSegment Funct.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:10] ..........TableExpressionSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............TableReferenceSegment Delim._look_ahead_match IN [ls=1, seg='2']
[PD:0 MD:12] ............TableReferenceSegment Delim._look_ahead_match SI [mq=[], sb=['2']]
[PD:0 MD:12] ............TableReferenceSegment OneOf.match PRN [ns=1, ps=10, ms=0, pruned=[<Ref: OnKeywordSegment>, <Ref: AsKeywordSegment>, <Ref: UsingKeywordSegment>, <Ref: CommaSegment>, <Ref: CastOperatorSegment>, <Ref: StartSquareBracketSegment>, <Ref: StartBracketSegment>, <Ref: ColonSegment>, <Ref: DelimiterSegment>, <class 'sqlfluff.core.parser.segments.base.BracketedSegment'>], opts=[<Ref: BinaryOperatorGrammar>]]
[PD:0 MD:13] .............BinaryOperatorGrammar OneOf.match PRN [ns=1, ps=3, ms=0, pruned=[<Ref: ArithmeticBinaryOperatorGrammar>, <Ref: StringBinaryOperatorGrammar>, <Ref: BooleanBinaryOperatorGrammar>], opts=[<Ref: ComparisonOperatorGrammar>]]
[PD:0 MD:14] ..............ComparisonOperatorGrammar OneOf.match PRN [ns=1, ps=9, ms=0, pruned=[<Ref: EqualsSegment>, <Ref: EqualsSegment_a>, <Ref: EqualsSegment_b>, <Ref: GreaterThanSegment>, <Ref: LessThanSegment>, <Ref: GreaterThanOrEqualToSegment>, <Ref: LessThanOrEqualToSegment>, <Ref: NotEqualToSegment_a>, <Ref: NotEqualToSegment_b>], opts=[<Ref: LikeOperatorSegment>]]
[PD:0 MD:15] ...............ComparisonOperatorGrammar Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:14] ..............ComparisonOperatorGrammar OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:14] ..............BinaryOperatorGrammar Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:13] .............BinaryOperatorGrammar OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:13] .............TableReferenceSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............TableReferenceSegment OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............TableReferenceSegment NonCo.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............SingleIdentifierGrammar OneOf.match PRN [ns=2, ps=0, ms=0, pruned=[], opts=[<Ref: NakedIdentifierSegment>, <Ref: QuotedIdentifierSegment>]]
[PD:0 MD:13] .............SingleIdentifierGrammar Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:13] .............SingleIdentifierGrammar Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............SingleIdentifierGrammar OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:12] ............TableReferenceSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:11] ...........TableReferenceSegment Delim.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:10] ..........TableReferenceSegment Table.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:10] ..........TableExpressionSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:9 ] .........TableExpressionSegment OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:8 ] ........TableExpressionSegment Table.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:8 ] ........FromExpressionElementSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:7 ] .......FromExpressionElementSegment Optio.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:6 ] ......FromExpressionElementSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:5 ] .....FromExpressionElementSegment FromE.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:5 ] .....FromExpressionSegment Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:4 ] ....FromExpressionSegment OneOf.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:3 ] ...FromExpressionSegment Seque.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:2 ] ..FromExpressionSegment FromE.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:2 ] ..None Ref.match OUT [match=<MatchResult 0/1: ''>, seg="'2'"]
[PD:0 MD:1 ] .None Delim.match OUT [match=<MatchResult 0/2: ''>, seg="'values 1 , 2'"]
[PD:0 MD:0 ] None Seque.match OUT [match=<MatchResult 0/9: ''>, seg="'from values 1 , 2'"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing I'm aware of. @alanmcruickshank wrote the parser; maybe he has something to add?
…park3, per suggestion: sqlfluff#2245 (comment)
@barrywhart I've copied tests from your PR as you suggest @jpy-git your PR allowed me to reuse the ValuesClauseSegment from ansi, which is great I'm going to take it out of draft, ought not be more than a couple small items outstanding. |
@mcannamela: A couple of the test suites have failures. One is the normal Python tests. Another is a new test added yesterday which runs the full
Can you look at the first? I'll take an initial look at the second one. Many rules are failing, but the error message is the same for each, and I think several of them reuse some of the same underlying components, so it may be essentially the same error repeated multiple times. |
I have some possible fixes for the rule issues. I'll create a PR against your PR later today with those fixes. |
Would a separate PR be better? Can then merge main into this branch after? |
@tunetheweb: Good idea! I created it as a standalone PR. It's similar to some of your fixes yesterday. #2488 |
…into mc-fix-spark-values
Thanks everybody for reviewing; I think I have addressed all feedback so assuming checks pass I'm hunting for ✅ s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK think this is good to merge. Thanks for all your work on it. Realise there's still more to do, but had a quick look and can't see a simple solution so let's merge as is and tackle later.
One suggestion to correct docstring back ticks that I'll merge in.
Brief summary of the change made
Makes progress on #2152.
VALUES
clauses fail to parse in the spark3 dialect I believe becauseTableExpressionSegment
does not includeValuesClauseSegment
. I address that only for the spark3 dialect by including it, and also redefining some segments to address subsequent parsing failures.At present, I believe I am covering most simple cases of
VALUES
used inFROM
clauses, but not any of the optional directives likeORDER BY
,LIMIT
,OFFSET
that you can see in e.g. the postgres docs (Spark SQL docs don't mentionVALUES
except inINSERT
statements). I suspect this should all apply to the ansi dialect but I must admit I gave up trying to find the spec after not that much looking.Also, it only works reliably when the
VALUES
clause is in parentheses:I think this is because the
parse_grammar
ofFromClauseSegment
isDelimited(Ref("FromExpressionSegment"))
, which splits on the first comma it finds rather than looking for the longestFromExpressionSegment
. I don't see a good way around that. I could live with this, but I wish I had a way to hint to the user that this is what went wrong, since the output ofparse
does not make this at all obvious that just wrapping with parens will fix it.Are there any other side effects of this change that we should be aware of?
The only dialect tests that fail for spark3 are ones that I added, but since I do override
TableExpressionSegment
for spark3 I imagine there are tests we were relying on in ansi that we would want to make sure spark3 passes. Before going nuts with that, I wanted to see whether I shouldn't just be pushing all this down into the ansi dialect anyway.Pull Request checklist
Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
.yml
rule test cases intest/fixtures/rules/std_rule_cases
..sql
/.yml
parser test cases intest/fixtures/dialects
(note YML files can be auto generated withtox -e generate-fixture-yml
).test/fixtures/linter/autofix
.Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.