This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

sql/analyzer: refactor and fix bugs in qualify_columns rule #706

Merged 1 commit on May 13, 2019

erizocosmico (Contributor) commented on May 10, 2019

The qualify_columns rule has long been a source of bugs because of
the way it looked for columns. Previously, it gathered every available
schema from the whole tree of a query (excluding subqueries).
That required many exceptions and special-case handling, added over
time to patch the bugs that kept appearing.
The special cases for aliases, GroupBy, and so on kept complicating
the code and made the rule harder to follow.

This refactor simplifies the logic of the rule and treats all nodes
in exactly the same way, so it is simpler, more obvious and easier to
reason about.
Now, a node only knows about the columns (aliased or not) defined
below it, up to the first Project, GroupBy, ResolvedTable or subquery
in each branch of the tree. This way, we can gather all the available
columns and infer the schema (which we cannot obtain by simply calling
the Schema method, because the tree is not resolved yet). Once we have
the schema, qualifying columns becomes trivial.
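
To make the traversal described above concrete, here is a minimal, self-contained sketch. It is not the code from this PR and does not use the real go-mysql-server types: `Node`, `Column`, `availableColumns` and `qualify` are hypothetical names invented for illustration, and node kinds are plain strings (with `SubqueryAlias` standing in for "subquery"). It only shows the idea of collecting columns from each branch until the first boundary node, then qualifying a bare column name against that list.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical (table, name) pair. An empty Table means the
// column is an alias defined by a projection rather than a table column.
type Column struct {
	Table string
	Name  string
}

// Node is a hypothetical, simplified plan node (not the library's sql.Node).
type Node struct {
	Kind     string   // e.g. "Project", "GroupBy", "ResolvedTable", "SubqueryAlias", "Filter", "Sort"
	Columns  []Column // columns this node defines (projections, aliases or table columns)
	Children []*Node
}

// isBoundary reports whether the search for columns stops at this kind of node.
func isBoundary(kind string) bool {
	switch kind {
	case "Project", "GroupBy", "ResolvedTable", "SubqueryAlias":
		return true
	}
	return false
}

// availableColumns walks each branch below n and collects the columns of the
// first boundary node it finds, without descending any further in that branch.
func availableColumns(n *Node) []Column {
	var cols []Column
	for _, child := range n.Children {
		if isBoundary(child.Kind) {
			cols = append(cols, child.Columns...)
			continue
		}
		cols = append(cols, availableColumns(child)...)
	}
	return cols
}

// qualify resolves a bare column name against the available columns.
// It fails if the name is unknown or matches more than one column.
func qualify(name string, cols []Column) (Column, error) {
	var matches []Column
	for _, c := range cols {
		if strings.EqualFold(c.Name, name) {
			matches = append(matches, c)
		}
	}
	switch len(matches) {
	case 0:
		return Column{}, fmt.Errorf("column %q not found", name)
	case 1:
		return matches[0], nil
	default:
		return Column{}, fmt.Errorf("ambiguous column name %q", name)
	}
}

func main() {
	// Sort -> Filter -> Project(foo.a, alias total) -> ResolvedTable foo(a, b)
	table := &Node{Kind: "ResolvedTable", Columns: []Column{{"foo", "a"}, {"foo", "b"}}}
	project := &Node{Kind: "Project", Columns: []Column{{"foo", "a"}, {"", "total"}}, Children: []*Node{table}}
	filter := &Node{Kind: "Filter", Children: []*Node{project}}
	sortNode := &Node{Kind: "Sort", Children: []*Node{filter}}

	cols := availableColumns(sortNode)
	fmt.Println(qualify("a", cols)) // visible through the Project
	fmt.Println(qualify("b", cols)) // hidden: the Project does not expose it
}
```

In this sketch the `Sort` node sees only what the `Project` exposes, so the table column `b` is not available to it, while the alias `total` is. Every node is handled by the same two functions, with no per-node special cases.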

All the tests of go-mysql-server and gitbase pass with this new
implementation of the rule.

TL;DR: got sick of this rule while debugging a gitbase issue and rewrote it so we don't have to get sick of it anymore while debugging

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
@erizocosmico added the enhancement label on May 10, 2019
@erizocosmico requested a review from a team on May 10, 2019, 14:57
@erizocosmico self-assigned this on May 10, 2019
@ajnavarro merged commit 33c1da4 into src-d:master on May 13, 2019