Resolving subqueries columns #154

Merged: 17 commits, Jun 9, 2021

collerek (Collaborator) commented Jun 1, 2021

This is the final piece of work on resolving aliases and columns coming from sub-queries to the actual database columns.
All aliases should now resolve, and only real columns should be included in columns and columns_dict.

As in the updated README, note how the sub-queries (derived tables) in the join are handled:

from sql_metadata import Parser

parser = Parser(
"""
SELECT COUNT(1) FROM
(SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1) a
JOIN (SELECT st.task_id FROM some_task st WHERE task_type_id = 80) b
ON a.task_id = b.task_id;
"""
)

# get sub-queries dictionary
parser.subqueries
# {"a": "SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1",
#  "b": "SELECT st.task_id FROM some_task st WHERE task_type_id = 80"}


# get names / aliases of sub-queries / derived tables
parser.subqueries_names
# ["a", "b"]

# note that columns coming from sub-queries are resolved to real columns
parser.columns
# ["some_task_detail.task_id", "some_task_detail.STATUS", "some_task.task_id",
#  "task_type_id"]

# same applies for columns_dict, note the join columns are resolved
parser.columns_dict
#{'join': ['some_task_detail.task_id', 'some_task.task_id'],
# 'select': ['some_task_detail.task_id', 'some_task.task_id'],
# 'where': ['some_task_detail.STATUS', 'task_type_id']}
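
As a rough illustration of the idea only: the sketch below shows how a column qualified with a sub-query alias can be mapped back to the real column it exposes. It is hand-written for the query above and is not the code in sql_metadata/parser.py; `SUBQUERY_COLUMNS`, `resolve_column` and `resolve_all` are hypothetical names, and the lookup table is typed in manually, whereas the parser has to derive that information from the query itself.

```python
from typing import Dict, List

# Hypothetical, hand-written lookup for the query above:
# sub-query alias -> {column exposed by the sub-query: real database column}.
SUBQUERY_COLUMNS: Dict[str, Dict[str, str]] = {
    "a": {"task_id": "some_task_detail.task_id"},
    "b": {"task_id": "some_task.task_id"},
}


def resolve_column(column: str, subquery_columns: Dict[str, Dict[str, str]]) -> str:
    """Map 'alias.column' to the real column when alias names a sub-query."""
    alias, sep, name = column.partition(".")
    if not sep:
        return column  # unqualified column, nothing to resolve
    exposed = subquery_columns.get(alias)
    if exposed is None:
        return column  # prefix is not a sub-query alias, keep as-is
    return exposed.get(name, column)


def resolve_all(columns: List[str], subquery_columns: Dict[str, Dict[str, str]]) -> List[str]:
    """Resolve every column, keeping first-seen order and dropping duplicates."""
    resolved: List[str] = []
    for column in columns:
        real = resolve_column(column, subquery_columns)
        if real not in resolved:
            resolved.append(real)
    return resolved


# The join condition "a.task_id = b.task_id" from the query above.
join_columns = resolve_all(["a.task_id", "b.task_id"], SUBQUERY_COLUMNS)
print(join_columns)
# ['some_task_detail.task_id', 'some_task.task_id']
```

Columns without a sub-query prefix (such as task_type_id) pass through unchanged in this sketch, which matches how they appear in the output above.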

Let me know what you think :)

@collerek collerek added this to the v2.2 milestone Jun 1, 2021
or self.previous_token.normalized == "AS"
or self.is_in_with_columns
)

Owner commented:

👌🏻

or (self.previous_token.value == "(")
and self.next_token.value == ")"
)

Owner commented:

👌🏻

sql_metadata/parser.py (outdated review thread, resolved)
collerek (Collaborator, Author) commented Jun 3, 2021

I realized that I forgot about WITH queries, so resolving columns that come from WITH queries is also something that needs to be developed later.

**Extracts column names and tables** used by the query.
Automatically conduct **column alias resolution**, **sub queries aliases resolution** as well as **tables aliases resolving**.

Provides also a helper for **normalization of SQL queries**.
Owner commented:

👌🏻

collerek (Collaborator, Author) commented Jun 4, 2021

@macbre - now that coverage has been added, the test pipeline fails because GITHUB_TOKEN is missing.

collerek (Collaborator, Author) commented Jun 7, 2021

@macbre - OK, I fixed the Coveralls runs. Do you need me to do anything more for this PR?

sql_metadata/utils.py (outdated review thread, resolved)
Co-authored-by: Maciej Brencz <maciej.brencz@gmail.com>
@macbre macbre merged commit bb9c36d into macbre:master Jun 9, 2021
@collerek collerek deleted the resolve_subquery branch June 9, 2021 15:02
2 participants