Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comma is removed when fixing query with multiple CTEs #939

Closed
plazar opened this issue Apr 8, 2021 · 5 comments · Fixed by #1239
Closed

Comma is removed when fixing query with multiple CTEs #939

plazar opened this issue Apr 8, 2021 · 5 comments · Fixed by #1239
Assignees
Labels
bug Something isn't working

Comments

@plazar
Copy link

plazar commented Apr 8, 2021

When fixing a query with multiple CTEs I find that the comma separating them is occasionally removed, resulting in invalid SQL. I'm using dialect = snowflake and a simple example query is:

WITH first_cte AS
(SELECT
    id
    , one
    FROM first),

second_cte AS
(SELECT
    id
    , two
    FROM {{ source('schema', 'table') }} )

SELECT
    id
    , one
    , two
FROM first_cte
LEFT JOIN second_cte
    ON first_cte.id = second_cte.id;

I've done some detective work and the comma is removed only when I set comma_style=leading for rule L019.

Interestingly, if I manually replace the jinja macro {{ source('schema', 'table') }} with schema_table then the leading comma is correctly inserted and the fixed query is valid!

Expected Behaviour

I'm expect that sqlfluff fix ... correctly removes the trailing command after the first CTE in the query above and places a leading comma before the second CTE.

Observed Behaviour

The trailing comma is removed and no leading comma is inserted.

Steps to Reproduce

  1. Copy/paste the query into "bug.sql"
  2. Create .sqlfluff file (see below) in the same directory
  3. Run sqlfluff fix bug.sql -> trailing comma is removed, no leading comma is inserted

And to reproduce some of the odd behaviour I saw (in case it's helpful to locate the issue):

  • Replace {{ source('schema', 'table') }} with schema_table in the query

    • Trailing comma is correctly replaced with a leading comma
  • Reset {{ source('schema', 'table') }}, change comma_style = leading with comma_style = trailing in .sqlfluff

    • Trailing comma is correctly replaced with a leading comma

Version

$ sqlfluff --version
sqlfluff, version 0.5.0

$ python --version
Python 3.8.5

Configuration

[sqlfluff]
dialect = snowflake
rules = L019

[sqlfluff:rules]
comma_style = leading

Extra info

Output from running sqlfluff fix:

$ sqlfluff fix bug.sql
==== finding fixable violations ====
== [bug.sql] FAIL
L:   5 | P:  13 | L019 | Found trailing comma. Expected only leading.
==== fixing violations ====
1 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n] ...
Attempting fixes...
Persisting Changes...
== [bug.sql] PASS
Done. Please check your files to confirm.

Resulting .sql file (notice the comma missing between the first and second CTE):

WITH first_cte AS
(SELECT
        id
        , one
        FROM first)

second_cte AS
  (SELECT
    id
    , two
    FROM {{ source('schema', 'table') }} )

SELECT id
        , one
        , two
FROM first_cte
LEFT JOIN second_cte
        ON first_cte.id = second_cte.id;

Finally

I'd be willing to try to fix the code and open a PR. However, I'm very new to sqlfluff and I don't know if this is an easy fix or a hard one. Also, I would need some guidance about where to look and what to investigate.

@plazar plazar added the bug Something isn't working label Apr 8, 2021
@pwildenhain
Copy link
Member

🤔 Weird -- I'm currently working on comma stuff in #934 so I'll take this on and hopefully solve it in that issue

@pwildenhain pwildenhain self-assigned this Apr 8, 2021
@pwildenhain
Copy link
Member

pwildenhain commented Jun 11, 2021

Also tagging #1085 since it's definitely related -- and shows that the issue stands for trailing and leading commas.

@pwildenhain
Copy link
Member

pwildenhain commented Jun 11, 2021

I don't have a solid answer to this, but I've been playing around a little bit in the debugger, and have something to show for it.

I thought the issue would be with the rule itself. But maybe it's with the underlying fix engine?

Here is the state of the fix while we're still crawling looking for fixes.

(Pdb) c
> c:\users\wildenhaip\sqlfluff\src\sqlfluff\core\rules\base.py(395)crawl()
-> return vs, raw_stack, fixes, memory
(Pdb) fixes
[<LintFix: delete @[L:  5, P: 13] delete:','>, <LintFix: edit @[L:  7, P:  1] edt:'second_cte AS\n  (SELECT\n    id\n   
 , two\n    FROM schema_table )'->', second_cte AS\n  (SELECT\n    id\n    , two\n    FROM schema_table )'>]

Looks like the leading comma is in there as expected 👍🏻

But by the time we get to the linter, it seems like we've lost it 😢

> c:\users\wildenhaip\sqlfluff\src\sqlfluff\core\linter\linter.py(491)lint_parsed()
-> if parsed.config.get("dialect") == "ansi" and linted_file.get_violations(
(Pdb) linted_file.fix_string()
("WITH first_cte AS\n(SELECT\n\tid\n\t, one\n\tFROM first)\n\nsecond_cte AS\n  (SELECT\n    id\n    , two\n    FROM {{ source('schema', 'table') }} )\n\nSELECT id\n\t, one\n\t, two\nFROM first_cte\nLEFT JOIN second_cte\n\tON first_cte.id = 
second_cte.id;", True)

Tagging @alanmcruickshank or @barrywhart to see if they have any thoughts on how the fix would get lost in translation. As @plazar pointed out, it likely has something to do with the jinja templating, since this isn't a problem when we drop the jinja templating (which I can confirm).

sqlfluff parse output

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    statement:
[L:  1, P:  1]      |        with_compound_statement:
[L:  1, P:  1]      |            keyword:                                          'WITH'
[L:  1, P:  5]      |            whitespace:                                       ' '
[L:  1, P:  6]      |            common_table_expression:
[L:  1, P:  6]      |                identifier:                                   'first_cte'
[L:  1, P: 15]      |                whitespace:                                   ' '
[L:  1, P: 16]      |                keyword:                                      'AS'
[L:  1, P: 18]      |                newline:                                      '\n'
[L:  2, P:  1]      |                bracketed:
[L:  2, P:  1]      |                    start_bracket:                            '('
[L:  2, P:  2]      |                    [META] indent:
[L:  2, P:  2]      |                    select_statement:
[L:  2, P:  2]      |                        select_clause:
[L:  2, P:  2]      |                            keyword:                          'SELECT'
[L:  2, P:  8]      |                            [META] indent:
[L:  2, P:  8]      |                            newline:                          '\n'
[L:  3, P:  1]      |                            whitespace:                       '\t'
[L:  3, P:  2]      |                            select_clause_element:
[L:  3, P:  2]      |                                column_reference:
[L:  3, P:  2]      |                                    identifier:               'id'
[L:  3, P:  4]      |                            newline:                          '\n'
[L:  4, P:  1]      |                            whitespace:                       '\t'
[L:  4, P:  2]      |                            comma:                            ','
[L:  4, P:  3]      |                            whitespace:                       ' '
[L:  4, P:  4]      |                            select_clause_element:
[L:  4, P:  4]      |                                column_reference:
[L:  4, P:  4]      |                                    identifier:               'one'
[L:  4, P:  7]      |                        newline:                              '\n'
[L:  5, P:  1]      |                        whitespace:                           '\t'
[L:  5, P:  2]      |                        [META] dedent:
[L:  5, P:  2]      |                        from_clause:
[L:  5, P:  2]      |                            keyword:                          'FROM'
[L:  5, P:  6]      |                            whitespace:                       ' '
[L:  5, P:  7]      |                            from_expression:
[L:  5, P:  7]      |                                [META] indent:
[L:  5, P:  7]      |                                from_expression_element:
[L:  5, P:  7]      |                                    table_expression:
[L:  5, P:  7]      |                                        table_reference:
[L:  5, P:  7]      |                                            identifier:       'first'
[L:  5, P: 12]      |                                [META] dedent:
[L:  5, P: 12]      |                    [META] dedent:
[L:  5, P: 12]      |                    end_bracket:                              ')'
[L:  5, P: 13]      |            comma:                                            ','
[L:  5, P: 14]      |            newline:                                          '\n'
[L:  6, P:  1]      |            newline:                                          '\n'
[L:  7, P:  1]      |            common_table_expression:
[L:  7, P:  1]      |                identifier:                                   'second_cte'
[L:  7, P: 11]      |                whitespace:                                   ' '
[L:  7, P: 12]      |                keyword:                                      'AS'
[L:  7, P: 14]      |                newline:                                      '\n'
[L:  8, P:  1]      |                whitespace:                                   '  '
[L:  8, P:  3]      |                bracketed:
[L:  8, P:  3]      |                    start_bracket:                            '('
[L:  8, P:  4]      |                    [META] indent:
[L:  8, P:  4]      |                    select_statement:
[L:  8, P:  4]      |                        select_clause:
[L:  8, P:  4]      |                            keyword:                          'SELECT'
[L:  8, P: 10]      |                            [META] indent:
[L:  8, P: 10]      |                            newline:                          '\n'
[L:  9, P:  1]      |                            whitespace:                       '    '
[L:  9, P:  5]      |                            select_clause_element:
[L:  9, P:  5]      |                                column_reference:
[L:  9, P:  5]      |                                    identifier:               'id'
[L:  9, P:  7]      |                            newline:                          '\n'
[L: 10, P:  1]      |                            whitespace:                       '    '
[L: 10, P:  5]      |                            comma:                            ','
[L: 10, P:  6]      |                            whitespace:                       ' '
[L: 10, P:  7]      |                            select_clause_element:
[L: 10, P:  7]      |                                column_reference:
[L: 10, P:  7]      |                                    identifier:               'two'
[L: 10, P: 10]      |                        newline:                              '\n'
[L: 11, P:  1]      |                        whitespace:                           '    '
[L: 11, P:  5]      |                        [META] dedent:
[L: 11, P:  5]      |                        from_clause:
[L: 11, P:  5]      |                            keyword:                          'FROM'
[L: 11, P:  9]      |                            whitespace:                       ' '
[L: 11, P: 10]      |                            from_expression:
[L: 11, P: 10]      |                                [META] indent:
[L: 11, P: 10]      |                                from_expression_element:
[L: 11, P: 10]      |                                    table_expression:
[L: 11, P: 10]      |                                        table_reference:
[L: 11, P: 10]      |                                            identifier:       'schema_table'
[L: 11, P: 22]      |                                [META] dedent:
[L: 11, P: 22]      |                    [META] dedent:
[L: 11, P: 22]      |                    end_bracket:                              ')'
[L: 11, P: 23]      |            newline:                                          '\n'
[L: 12, P:  1]      |            newline:                                          '\n'
[L: 13, P:  1]      |            select_statement:
[L: 13, P:  1]      |                select_clause:
[L: 13, P:  1]      |                    keyword:                                  'SELECT'
[L: 13, P:  7]      |                    [META] indent:
[L: 13, P:  7]      |                    whitespace:                               ' '
[L: 13, P:  8]      |                    select_clause_element:
[L: 13, P:  8]      |                        column_reference:
[L: 13, P:  8]      |                            identifier:                       'id'
[L: 13, P: 10]      |                    newline:                                  '\n'
[L: 14, P:  1]      |                    whitespace:                               '\t'
[L: 14, P:  2]      |                    comma:                                    ','
[L: 14, P:  3]      |                    whitespace:                               ' '
[L: 14, P:  4]      |                    select_clause_element:
[L: 14, P:  4]      |                        column_reference:
[L: 14, P:  4]      |                            identifier:                       'one'
[L: 14, P:  7]      |                    newline:                                  '\n'
[L: 15, P:  1]      |                    whitespace:                               '\t'
[L: 15, P:  2]      |                    comma:                                    ','
[L: 15, P:  3]      |                    whitespace:                               ' '
[L: 15, P:  4]      |                    select_clause_element:
[L: 15, P:  4]      |                        column_reference:
[L: 15, P:  4]      |                            identifier:                       'two'
[L: 15, P:  7]      |                newline:                                      '\n'
[L: 16, P:  1]      |                [META] dedent:
[L: 16, P:  1]      |                from_clause:
[L: 16, P:  1]      |                    keyword:                                  'FROM'
[L: 16, P:  5]      |                    whitespace:                               ' '
[L: 16, P:  6]      |                    from_expression:
[L: 16, P:  6]      |                        [META] indent:
[L: 16, P:  6]      |                        from_expression_element:
[L: 16, P:  6]      |                            table_expression:
[L: 16, P:  6]      |                                table_reference:
[L: 16, P:  6]      |                                    identifier:               'first_cte'
[L: 16, P: 15]      |                        newline:                              '\n'
[L: 17, P:  1]      |                        [META] dedent:
[L: 17, P:  1]      |                        join_clause:
[L: 17, P:  1]      |                            keyword:                          'LEFT'
[L: 17, P:  5]      |                            whitespace:                       ' '
[L: 17, P:  6]      |                            keyword:                          'JOIN'
[L: 17, P: 10]      |                            [META] indent:
[L: 17, P: 10]      |                            whitespace:                       ' '
[L: 17, P: 11]      |                            from_expression_element:
[L: 17, P: 11]      |                                table_expression:
[L: 17, P: 11]      |                                    table_reference:
[L: 17, P: 11]      |                                        identifier:           'second_cte'
[L: 17, P: 21]      |                            newline:                          '\n'
[L: 18, P:  1]      |                            whitespace:                       '\t'
[L: 18, P:  2]      |                            join_on_condition:
[L: 18, P:  2]      |                                keyword:                      'ON'
[L: 18, P:  4]      |                                [META] indent:
[L: 18, P:  4]      |                                whitespace:                   ' '
[L: 18, P:  5]      |                                expression:
[L: 18, P:  5]      |                                    column_reference:
[L: 18, P:  5]      |                                        identifier:           'first_cte'
[L: 18, P: 14]      |                                        dot:                  '.'
[L: 18, P: 15]      |                                        identifier:           'id'
[L: 18, P: 17]      |                                    whitespace:               ' '
[L: 18, P: 18]      |                                    comparison_operator:      '='
[L: 18, P: 19]      |                                    whitespace:               ' '
[L: 18, P: 20]      |                                    column_reference:
[L: 18, P: 20]      |                                        identifier:           'second_cte'
[L: 18, P: 30]      |                                        dot:                  '.'
[L: 18, P: 31]      |                                        identifier:           'id'
[L: 18, P: 33]      |                                [META] dedent:
[L: 18, P: 33]      |                            [META] dedent:
[L: 18, P: 33]      |    statement_terminator:                                     ';'
[L: 18, P: 34]      |    newline:                                                  '\n'

@barrywhart
Copy link
Member

NOTE: Updated the SQL example in the description to address other linting issues, leaving only the L019 issue. This may make it easier to investigate the issue.

@barrywhart
Copy link
Member

@pwildenhain: I have a PR now which fixes this. I've noticed with other rules that certain kinds of fixes cause the file to be corrupted in various ways. In this case, an "edit" fix was replacing a segment with a list of segments that included itself. This works sometimes but not others. TBH, this should probably be flagged as a "bad fix" by the fix engine, rather than allowing unpredictable behavior. My PR only addresses the specific rule issue, not the deeper issue.

With this fix, I don't think #1085 is related after all, but I'm not sure because I don't use dbt, nor have I worked on the dbt template.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants