Resolving table/column for UNION queries #13

john-bodley · 2017-03-24T21:57:34Z

@brianv0 I was leveraging the logic you provided in the examples/gather_columns.py script, and agree with your logic for a query of the form,

SELECT foo, b.bar FROM a JOIN b ON a.id = b.id

which results in a table/column resolution of:

SingleColumn(expression=QualifiedNameReference(name=QualifiedName(foo))): [Table(name=QualifiedName(a)), Table(name=QualifiedName(b))]
SingleColumn(expression=QualifiedNameReference(name=QualifiedName(b.bar))): [Table(name=QualifiedName(b))]

i.e., from the query one cannot decipher whether the column foo is from a or b, however for a UNION type query

SELECT foo AS bar FROM a UNION SELECT bar FROM b

the table/column resolution is:

SingleColumn(alias='bar', expression=QualifiedNameReference(name=QualifiedName(foo))): [Table(name=QualifiedName(b)), Table(name=QualifiedName(a))]
SingleColumn(expression=QualifiedNameReference(name=QualifiedName(bar))): [Table(name=QualifiedName(b)), Table(name=QualifiedName(a))]

although it's apparent from the query that the foo column is from table a and the bar column is from table b. Do you know of any resolution to this issue?

The text was updated successfully, but these errors were encountered:

brianv0 · 2017-03-24T23:44:15Z

The visitor needs to be made more complicated so that only the tables for the query in context are checked. I've tested the following snippet out a bit. Some of the logic from print_column_resolution_order could be put into the visitor, I did some of that myself.

Another note: This doesn't currently handle subqueries, though it could (and probably should). I maybe should work on adding a few other examples and clarifying the scope of each use case.

def check_extracted_columns(query, with_resolution=False):

    class ColumnGroup(object):
        def __init__(self, columns, tables):
            self.columns = columns
            self.tables = tables

        def __repr__(self):
            return repr(self.columns)

        def __str__(self):
            return str(self.columns)

    class TableAndColumnExtractor(DefaultTraversalVisitor):

        def __init__(self):
            self.column_groups = []

        def visit_query_specification(self, node, context):
            columns = []
            maybe_tables = []
            tables = OrderedDict()

            if node.from_:
                if isinstance(node.from_, Join):
                    relation = node.from_
                    maybe_tables.append(relation.right)
                    while isinstance(relation.left, Join):
                        relation = relation.left
                        maybe_tables.append(relation.right)
                    maybe_tables.append(relation.left)
                else:
                    maybe_tables.append(node.from_)
                maybe_tables.reverse()

            # Make it easy to refer to tables
            for table in maybe_tables:
                handle, table = self._handle_and_table(table)
                if handle:
                    tables[handle] = table
        
            for item in node.select.select_items:
                if isinstance(item, SingleColumn):
                    columns.append(item)

            self.column_groups.append(ColumnGroup(columns, tables))

        def _handle_and_table(self, relation):
            handle = None
            table = None
            if isinstance(relation, AliasedRelation):
                if isinstance(relation.relation, Table):
                    handle = relation.alias
                    table = relation.relation
                else:
                    print("WARNING: Aliased Relation is not a table and "
                          "is omitted")
            else:
                handle = ".".join(relation.name.parts)
                table = relation
            return handle, table

    def print_column_resolution_order(column_group):
        print("\nTable Column Resolution for column group:")
        tables = column_group.tables
        for i in range(len(column_group.columns)):
            column = column_group.columns[i]
            if not isinstance(column.expression, QualifiedNameReference):
                print("Warning: Skipping column at ordinal %d" % i)
                continue
            names = column.expression.name.parts
            resolution = []
            if len(names) > 1:
                qualified_table_name = ".".join(names[:-1])
                if qualified_table_name in tables:
                    resolution.append(tables[qualified_table_name])
            else:
                resolution = [v for v in tables.values()]
            print(repr(column) + ": " + str(resolution))

    print("\n\nChecking query:\n" + query)
    visitor = TableAndColumnExtractor()
    visitor.process(parser.parse(query), None)
    print(visitor.column_groups)
    if with_resolution:
        for column_group in visitor.column_groups:
            print_column_resolution_order(column_group)

john-bodley · 2017-03-25T19:01:16Z

Thanks @brianv0 this was super helpful.

brianv0 · 2017-03-31T20:19:39Z

I'm going to close this for now. I've created a new issue as a placeholder to help determine if we need an simple API that can perform a canned set of query analysis jobs (like extracting columns) and how that might be implemented.

brianv0 mentioned this issue Mar 31, 2017

Create a query analysis module/api for common operations #18

Open

brianv0 closed this as completed Mar 31, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resolving table/column for UNION queries #13

Resolving table/column for UNION queries #13

john-bodley commented Mar 24, 2017

brianv0 commented Mar 24, 2017

john-bodley commented Mar 25, 2017

brianv0 commented Mar 31, 2017

Resolving table/column for UNION queries #13

Resolving table/column for UNION queries #13

Comments

john-bodley commented Mar 24, 2017

brianv0 commented Mar 24, 2017

john-bodley commented Mar 25, 2017

brianv0 commented Mar 31, 2017