Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

L025 rule firing in fix (but not lint) and breaking SQL for BigQuery UNNEST statement. #1302

Closed
tunetheweb opened this issue Aug 19, 2021 · 5 comments · Fixed by #1303
Closed
Labels
bug Something isn't working

Comments

@tunetheweb
Copy link
Member

tunetheweb commented Aug 19, 2021

Consider the following SQL in BigQuery dialect:

SELECT
    category,
    value
FROM
    table1,
UNNEST(1, 2, 3) AS category

If I lint it, it correct picks up that the UNNEST statement has bad indentation (should be the same indentation as table1):

% sqlfluff lint --dialect bigquery test.sql
== [test.sql] FAIL
L:   6 | P:   1 | L003 | Indentation not consistent with line #5
All Finished 📜 🎉!

However if I run fix on this it identifies two errors:

% sqlfluff fix --dialect bigquery test.sql
==== finding fixable violations ====
== [test.sql] FAIL
L:   6 | P:   1 | L003 | Indentation not consistent with line #5
L:   6 | P:  20 | L025 | Alias 'category' is never used in SELECT statement.
==== fixing violations ====
2 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n] ...

The alias category IS being used - so why does fix think it's not being used? And why is it different than what lint found? I think it's because it sees UNNEST as a table and thinks the category is a table alias rather than a special column alias just for UNNEST?

And if I then fix this, it (correctly!) fixes the indentation but (incorrectly!) removes the category alias:

SELECT
    category,
    value
FROM
    table1,
    UNNEST(1, 2, 3)

This breaks the query as the category in the SELECT clause now refers to nothing!

So it's almost like fix for rule L025 is not aware of this special syntax for BigQuery but lint is?

What's even stranger (and might point us in the direction of the fix) is that if I lint it without passing a dialect (so using the standard ANSI dialect) then even lint picks up the "error":

% sqlfluff lint test.sql
== [test.sql] FAIL
L:   2 | P:   5 | L027 | Unqualified reference 'category' found in select with
                       | more than one referenced table/view.
L:   3 | P:   5 | L027 | Unqualified reference 'value' found in select with more
                       | than one referenced table/view.
L:   6 | P:   1 | L003 | Indentation not consistent with line #5
L:   6 | P:  20 | L025 | Alias 'category' is never used in SELECT statement.
All Finished 📜 🎉!

Expected Behaviour

Fix should not attempt to fix non-linting errors and shouldn't break the SQL.

Observed Behaviour

Fix broke the SQL.

fix identifying more issues than lint was also identified in #1149 but have opened this as a separate issue as it's a more specific example and that issue didn't say whether fix would break something or not.

Steps to Reproduce

As per above

Dialect

BigQuery

Version

Include the output of sqlfluff --version along with your Python version

sqlfluff, version 0.6.3

Configuration

No config (other than --dialect bigquery in command line), so default.

Will try to figure out what's going on here, and submit a fix if I do, but any pointers greatly appreciated as bit confused how lint and fix can be different...

@tunetheweb tunetheweb added the bug Something isn't working label Aug 19, 2021
@tunetheweb
Copy link
Member Author

Oh and one more piece of information. Once the indentation is fixed, fix no longer flags this as an error and no longer tries to “correct” it (and breaking it in the process). This bug only happens in combination with rhe indentation error.

@barrywhart
Copy link
Member

I don't know exactly, but UNNEST is an odd construct. The rule probably only understands regular old table syntax.

@tunetheweb
Copy link
Member Author

I don't know exactly, but UNNEST is an odd construct. The rule probably only understands regular old table syntax.

Except it doesn’t complain about it when indentation is corrected as per my follow up comment:

Oh and one more piece of information. Once the indentation is fixed, fix no longer flags this as an error and no longer tries to “correct” it (and breaking it in the process). This bug only happens in combination with the indentation error.

But yeah I agree UNNEST is an odd construct. 😁

@barrywhart
Copy link
Member

Also, working on lint rules in SQLFluff is a very different experience than working on the dialects. Chaos vs order. Alchemy vs science. Rewarding, though.

@tunetheweb
Copy link
Member Author

tunetheweb commented Aug 20, 2021

Challenge accepted!

So going with the tried and tested method of throwing a lot of print statements in there I found the issue.

It all boiled down to this statement:

for function_name in table_expr.recursive_crawl("function_name"):
if function_name.raw.lower() in dialect.sets("value_table_functions"):
return True
return False

When the spacing was added by the other rule, this resulted in the function name changing from "UNEST" to " UNEST" (with lots of spaces before the UNNEST) and so wasn't matched as one of the special value_table_functions (which is only used in BigQuery for the dialects we support).

I'm not sure that adding spacing should technically change the function name, and whether that does (or will) cause any other issues, but the simplest fix here is to add a strip() function:

if function_name.raw.lower().strip() in dialect.sets("value_table_functions"):

Did a search for any other instances of function_name.* in but this looks to be the only one.

Rewarding indeed to find that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants