Resolving table/column for UNION queries #13
Comments
The visitor needs to be made more complicated so that only the tables for the query in context are checked. I've tested the following snippet out a bit. Some of the logic from

Another note: This doesn't currently handle subqueries, though it could (and probably should). I should probably work on adding a few other examples and clarifying the scope of each use case.

    from collections import OrderedDict

    # DefaultTraversalVisitor, Join, Table, AliasedRelation, SingleColumn,
    # QualifiedNameReference and parser come from this project's package.

    def check_extracted_columns(query, with_resolution=False):
        class ColumnGroup(object):
            def __init__(self, columns, tables):
                self.columns = columns
                self.tables = tables

            def __repr__(self):
                return repr(self.columns)

            def __str__(self):
                return str(self.columns)

        class TableAndColumnExtractor(DefaultTraversalVisitor):
            def __init__(self):
                self.column_groups = []

            def visit_query_specification(self, node, context):
                columns = []
                maybe_tables = []
                tables = OrderedDict()
                if node.from_:
                    if isinstance(node.from_, Join):
                        relation = node.from_
                        maybe_tables.append(relation.right)
                        while isinstance(relation.left, Join):
                            relation = relation.left
                            maybe_tables.append(relation.right)
                        maybe_tables.append(relation.left)
                    else:
                        maybe_tables.append(node.from_)
                    maybe_tables.reverse()
                # Make it easy to refer to tables
                for table in maybe_tables:
                    handle, table = self._handle_and_table(table)
                    if handle:
                        tables[handle] = table
                for item in node.select.select_items:
                    if isinstance(item, SingleColumn):
                        columns.append(item)
                self.column_groups.append(ColumnGroup(columns, tables))

            def _handle_and_table(self, relation):
                handle = None
                table = None
                if isinstance(relation, AliasedRelation):
                    if isinstance(relation.relation, Table):
                        handle = relation.alias
                        table = relation.relation
                    else:
                        print("WARNING: Aliased Relation is not a table and "
                              "is omitted")
                else:
                    handle = ".".join(relation.name.parts)
                    table = relation
                return handle, table

        def print_column_resolution_order(column_group):
            print("\nTable Column Resolution for column group:")
            tables = column_group.tables
            for i in range(len(column_group.columns)):
                column = column_group.columns[i]
                if not isinstance(column.expression, QualifiedNameReference):
                    print("Warning: Skipping column at ordinal %d" % i)
                    continue
                names = column.expression.name.parts
                resolution = []
                if len(names) > 1:
                    qualified_table_name = ".".join(names[:-1])
                    if qualified_table_name in tables:
                        resolution.append(tables[qualified_table_name])
                else:
                    resolution = [v for v in tables.values()]
                print(repr(column) + ": " + str(resolution))

        print("\n\nChecking query:\n" + query)
        visitor = TableAndColumnExtractor()
        visitor.process(parser.parse(query), None)
        print(visitor.column_groups)
        if with_resolution:
            for column_group in visitor.column_groups:
                print_column_resolution_order(column_group)
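
For anyone following the FROM-clause handling in the snippet, here is a minimal standalone sketch of just the join-flattening step. It assumes joins form a left-deep tree, which is what the loop over relation.left suggests. The Table and Join classes and the flatten_join helper below are hypothetical stand-ins for illustration, not the project's node types.

```python
from collections import namedtuple

# Hypothetical stand-ins for the parser's node types (illustration only).
Table = namedtuple("Table", ["name"])
Join = namedtuple("Join", ["left", "right"])


def flatten_join(relation):
    """Return the relations of a left-deep join tree in source order."""
    relations = []
    # Walk down the left spine, collecting each right-hand relation.
    while isinstance(relation, Join):
        relations.append(relation.right)
        relation = relation.left
    # The innermost left relation is the first one named in the FROM clause.
    relations.append(relation)
    relations.reverse()
    return relations


# Assumed shape for "a JOIN b JOIN c": Join(Join(a, b), c)
tree = Join(Join(Table("a"), Table("b")), Table("c"))
print([t.name for t in flatten_join(tree)])  # ['a', 'b', 'c']
```

The tables are collected innermost-last and then reversed, so the resulting order matches the order in which they appear in the query.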
Thanks @brianv0, this was super helpful.
I'm going to close this for now. I've created a new issue as a placeholder to help determine whether we need a simple API that can perform a canned set of query analysis jobs (like extracting columns) and how that might be implemented.
@brianv0 I was leveraging the logic you provided in the examples/gather_columns.py script, and agree with your logic for a query of the form,

which results in a table/column resolution of:

i.e., from the query one cannot decipher whether the column foo is from a or b. However, for a UNION type query

the table/column resolution is:

although it's apparent from the query that the foo column is from table a and the bar column is from table b. Do you know of any resolution to this issue?
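
To make the expected behaviour concrete, here is a rough sketch of per-SELECT resolution using plain strings instead of parsed nodes. It follows the rule used by print_column_resolution_order in this thread: a qualified column resolves to its named table, and an unqualified column resolves to every table in the same SELECT. The resolve_columns helper and both example queries in the comments are assumptions for illustration, not the queries from the original report.

```python
from collections import OrderedDict


def resolve_columns(column_names, tables):
    """Map each column name to the tables it could come from.

    `tables` maps a handle (alias or table name) to a table and holds only
    the tables from the FROM clause of the same SELECT as the columns.
    """
    resolution = OrderedDict()
    for name in column_names:
        parts = name.split(".")
        if len(parts) > 1:
            # Qualified name: look up the table by its handle.
            handle = ".".join(parts[:-1])
            resolution[name] = [tables[handle]] if handle in tables else []
        else:
            # Unqualified name: any table in this SELECT is a candidate.
            resolution[name] = list(tables.values())
    return resolution


# Hypothetical join query, e.g. SELECT foo FROM a JOIN b: one group, two tables.
join_group = (["foo"], OrderedDict([("a", "a"), ("b", "b")]))

# Hypothetical union query, e.g. SELECT foo FROM a UNION SELECT bar FROM b:
# one group per SELECT, each with only its own table.
union_groups = [
    (["foo"], OrderedDict([("a", "a")])),
    (["bar"], OrderedDict([("b", "b")])),
]

print(dict(resolve_columns(*join_group)))
for columns, tables in union_groups:
    print(dict(resolve_columns(columns, tables)))
```

Keeping one table mapping per SELECT is what lets foo resolve to a alone and bar to b alone in the union case, while the join case stays ambiguous.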