-
-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
L025 bug fix: stop incorrectly flagging on nested inner joins #3145
L025 bug fix: stop incorrectly flagging on nested inner joins #3145
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3145 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 165 165
Lines 12278 12279 +1
=========================================
+ Hits 12278 12279 +1
Continue to review full report at Codecov.
|
@@ -1460,7 +1460,10 @@ def get_eventual_aliases(self) -> List[Tuple[BaseSegment, AliasInfo]]: | |||
buff.append((from_expression, alias)) | |||
|
|||
# In some dialects, like TSQL, join clauses can have nested join clauses | |||
for join_clause in self.recursive_crawl("join_clause"): | |||
# recurse into them - but not if part of a sub-select statement (see #3144) | |||
for join_clause in self.recursive_crawl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we're fixing the real issue here. It seems to me that we don't need to ignore select_statement
. Instead, we need to detect that the on
clause is using the alias, therefore it's not unused.
Do we know why the behavior changed from previous versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discussed more in this comment in #3144
Basically, previously it was this:
for join_clause in self.get_children("join_clause"):
That is, we only got the direct join children, rather than using recursive_crawl
.
However #2993 was added to make it recursive to allow SQL like this:
SELECT
1 AS RegionCode
FROM BA
LEFT OUTER JOIN (
I LEFT OUTER JOIN P
ON I.Pcd = P.Iid
) ON BA.Iid = I.Bcd
Note that the outermost ON
clause is referencing the inner table I
. This is a specific syntax allowed in T-SQL (but no where else I don't think?), so the above code needed to change to be recursive_crawl
instead of get_children
.
Unfortunately there is no easier way of identifying just this case as there will be brackets, and there could be nested, nested calls. But recursive_crawl
will find them so all good.
That solve that problem, but then broke it for this SQL - hence this PR to partially revert that:
select col1
from table1 as abg
inner join (
select
ceb.emailaddresskey,
dac.accountkey
from table2 as dac
inner join table3 as ceb
on ceb.col2 = dac.col2
) as abh
on abg.col1 = abh.col1
Here the subquery should not be accessible at all to the outer query, and only the results of that (as abh
) as it's not that specific T-SQL instance above, which is the only one that should allow nesting. So effectively it should be a black box, and should look like this (at least while while the rule process the outer select - it'll get to the nested one later):
select col1
from table1 as abg
inner join (
---something we don't need to worry about now
) as abh
on abg.col1 = abh.col1
Then we process that inner join later in the rule.
That is what used to happen before #2993 , and then, while processing the outer select statement, it would see table references abg
and abh
and then uses of both those in the ON
clause - great, no rule to fire!
Instead what happens (and why we need this PR) is we now pick up recursive "joins" so we ALSO pick up table reference ceb
when we shouldn't. And there is no reference to ceb
in the outer select, hence the rule fires complaining about that.
Note that we don't pick up the inner table reference dac
btw cause that is an FROM
clause and not a JOIN
clause and the bug is specific to JOIN
clauses. Again, showing that this is wrong to traverse into the inner select for just JOIN
s.
As the nested SELECT
should be a black box (except in very specific use cases for T-SQL, when it's an nested JOIN
), I've added functionality to not recurse into it anymore, if it is it's own select statement (which is exactly what happened before), but still recurse for anything else (so the original T-SQL bug is still resolved). There may be other instances we might need to do that, but that's the only one I can think of now.
The alternative is to gather ALL the table references for the whole query (even the inner, nested ones that should not be referenceable, and including dac
in this example) and ALL the column references, and then decide which are used and which are not. But that is not the correct thing to do here, since the inner, nested table references are not be accessible outside of it.
Here's a BigQuery example showing the inner table canot be referenced in the outer select:
Only via the outer alias:
Does that explain it all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, but I'll leave it to @barrywhart to ✅ if he is happy with the explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
In general, visibility goes upwards, but not downwards.
* L025 bug fix: stop incorrectly flagging on nested inner joins * Comment
Brief summary of the change made
Fixes #3144
Are there any other side effects of this change that we should be aware of?
Adds a new, optional, parameter to
recursive_crawl
to allow you to provide a type that should not be recursed into.Pull Request checklist
Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
.yml
rule test cases intest/fixtures/rules/std_rule_cases
..sql
/.yml
parser test cases intest/fixtures/dialects
(note YML files can be auto generated withtox -e generate-fixture-yml
).test/fixtures/linter/autofix
.Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.