sql/analyzer: refactor and fix bugs in qualify_columns rule #706

erizocosmico · 2019-05-10T14:57:04Z

The qualify_columns rule has been a source of bugs for quite a long time
due to the way we used to look for columns. Before, we looked for
all available schemas in the whole tree of a query (excluding subqueries).
This required a lot of exceptions and special-case handling
that had been added over time to patch the bugs that kept
appearing.
It had special cases for aliases, for GroupBy, etc., that kept complicating
the code and made the rule confusing and harder to follow.

This refactor simplifies the logic of the rule and treats all nodes
in exactly the same way, so it's simpler, more obvious and easier to
reason about.
Now, a node only knows about the columns (aliased or not) defined
beneath it, down to the first Project, GroupBy, ResolvedTable or subquery
in each branch of the tree. This way, we can gather all the available
columns and infer the schema (we cannot simply call the Schema
method because the tree is not resolved yet). Then, qualifying columns
becomes a trivial job once you have the schema.

All the tests of go-mysql-server and gitbase pass with this new
implementation of the rule.

TL;DR: got sick of this rule while debugging a gitbase issue and rewrote it so we don't have to get sick of it anymore while debugging

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>

erizocosmico added the enhancement New feature or request label May 10, 2019

erizocosmico requested a review from a team May 10, 2019 14:57

erizocosmico self-assigned this May 10, 2019

erizocosmico mentioned this pull request May 10, 2019

Ambiguous column present in multiple tables src-d/gitbase#812 (Closed)

kuba-- approved these changes May 10, 2019

juanjux approved these changes May 10, 2019

ajnavarro approved these changes May 13, 2019

ajnavarro merged commit 33c1da4 into src-d:master May 13, 2019
