This repository has been archived by the owner on Jan 28, 2021. It is now read-only.
sql/analyzer: refactor and fix bugs in qualify_columns rule #706
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
qualify_columns rule has been a source of bugs for quite a long time
due to the way we used to look for columns. Before, we looked for
all available schemas in all the tree of a query (excluding subqueries).
This required a lot of exceptions and treatments for special cases
that have been added over time in order to patch the bugs that kept
appearing.
It had special cases for aliases, for GroupBy, etc that kept complicating
the code and making the rule harder to follow and confusing.
This refactor simplifies the logic of the rule and treats all nodes
in the exact same way so it's simpler, more obvious and easier to
reason about.
Now, a node only has knowledge of the columns (aliases or not) defined
until it reaches the first Project, GroupBy, ResolvedTable or subquery
in each branch of the tree. This way, we can gather all the available
columns and infer the schema (which we cannot just call using the Schema
method because the tree is not resolved yet). Then, qualifying columns
becomes a trivial job once you have the schema.
All the tests of go-mysql-server and gitbase pass with this new
implementation of the rule.
TL;DR: got sick of this rule while debugging a gitbase issue and rewrote it so we don't have to get sick of it anymore while debugging