# Querying Hierarchical Data
## Using alias to handle same-table joins
Often, you'll have tables that contain hierarchical data, such as employees and managers who are also employees. For this reason, you may wish to join a table to itself on different columns. The `.alias()` method helps accomplish this task: it gives you a second reference to the same table under a different name, without copying any data. Because both references point to the same table, a where clause is all you need to specify the join condition.

Here, you'll use the `.alias()` method to build a query to join the `employees` table against itself to determine to whom everyone reports.
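
In SQL terms, an alias gives the same table a second name within one query, so that its columns can be compared against each other. The sketch below illustrates the idea with Python's built-in `sqlite3` module rather than SQLAlchemy. The three-row in-memory table and the names in it are made up for illustration and are not the contents of `employees.sqlite`.

```python
import sqlite3

# Hypothetical in-memory table; not the data in employees.sqlite
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, mgr INTEGER)")
conn.executemany(
    "INSERT INTO employees (id, name, mgr) VALUES (?, ?, ?)",
    [(1, "BOSS", None), (2, "ANN", 1), (3, "BOB", 1)],
)

# The same table appears twice under two names; the WHERE clause
# links each employee's mgr value to a manager's id
sql = """
    SELECT managers.name AS manager, employees.name AS employee
    FROM employees AS managers, employees
    WHERE managers.id = employees.mgr
    ORDER BY employees.name
"""
pairs = conn.execute(sql).fetchall()
print(pairs)
```

BOSS has no manager, so it shows up only on the manager side of each pair.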

In [6]:
# Import create_engine, MetaData, Table, select, desc and func
from sqlalchemy import create_engine, MetaData, Table, select, desc, func

# Metadata
metadata = MetaData()

# Create an engine that connects to the employees.sqlite file: engine
engine = create_engine('sqlite:///employees.sqlite')

# create connection
connection = engine.connect()

# Print table names
print(engine.table_names())

['employees']


In [8]:
# Reflect the employees table from the engine: employees
employees = Table('employees', metadata, autoload=True, autoload_with=engine)

# Print employees table metadata
print(repr(employees))

Table('employees', MetaData(bind=None), Column('id', INTEGER(), table=<employees>, primary_key=True, nullable=False), Column('name', VARCHAR(length=20), table=<employees>), Column('job', VARCHAR(length=20), table=<employees>), Column('mgr', INTEGER(), table=<employees>), Column('hiredate', DATETIME(), table=<employees>), Column('sal', NUMERIC(precision=7, scale=2), table=<employees>), Column('comm', NUMERIC(precision=7, scale=2), table=<employees>), Column('dept', INTEGER(), table=<employees>), schema=None)


In [10]:
# Print the column names
print(employees.columns.keys())

['id', 'name', 'job', 'mgr', 'hiredate', 'sal', 'comm', 'dept']


In [12]:
# Make an alias of the employees table: managers
managers = employees.alias()
managers

<sqlalchemy.sql.selectable.Alias at 0xe89830; %(15243312 employees)s>

In [14]:
# Build a query to select managers' and their employees' names: stmt
stmt = select(
    [managers.columns.name.label('manager'),
     employees.columns.name.label('employee')]
)

# Match managers id with employees mgr: stmt
stmt = stmt.where(managers.columns.id == employees.columns.mgr)

# Order the statement by the managers name: stmt
stmt = stmt.order_by(managers.columns.name)

# Execute statement: results
results = connection.execute(stmt).fetchall()

# Print records
for record in results:
    print(record)

('FILLMORE', 'GRANT')
('FILLMORE', 'ADAMS')
('FILLMORE', 'MONROE')
('GARFIELD', 'JOHNSON')
('GARFIELD', 'LINCOLN')
('GARFIELD', 'POLK')
('GARFIELD', 'WASHINGTON')
('HARDING', 'TAFT')
('HARDING', 'HOOVER')
('JACKSON', 'HARDING')
('JACKSON', 'GARFIELD')
('JACKSON', 'FILLMORE')
('JACKSON', 'ROOSEVELT')
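
Note that the results list only employees whose `mgr` matches some manager's `id`. JACKSON, for example, appears as a manager but never as an employee. If such rows should be kept, an outer join is one option. The following is a minimal sketch using the built-in `sqlite3` module on a made-up in-memory table (hypothetical rows, not the contents of `employees.sqlite`):

```python
import sqlite3

# Hypothetical in-memory table; not the data in employees.sqlite
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, mgr INTEGER)")
conn.executemany(
    "INSERT INTO employees (id, name, mgr) VALUES (?, ?, ?)",
    [(1, "BOSS", None), (2, "ANN", 1), (3, "BOB", 2)],
)

# LEFT JOIN keeps every employee row; the manager column is NULL (None)
# when no manager row matches the employee's mgr value
sql = """
    SELECT employees.name AS employee, managers.name AS manager
    FROM employees
    LEFT JOIN employees AS managers ON managers.id = employees.mgr
    ORDER BY employees.id
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

In SQLAlchemy, outer joins are available through the `outerjoin()` method on tables and aliases, which could be combined with the alias shown above in a similar way.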


## Leveraging Functions and Group_bys with Hierarchical Data
It's also common to want to roll up data stored in a hierarchical table. When rolling up, take care over which alias you group by and which table you pass to the aggregate function: to count employees per manager, group by a column of the `managers` alias and count a column of the `employees` table.

Here, your job is to get a count of employees for each manager.

In [15]:
# Make an alias of the employees table: managers
managers = employees.alias()

# Build a query to select managers and counts of their employees: stmt
stmt = select([managers.columns.name, func.count(employees.columns.id)])

# Append a where clause that ensures the manager id and employee mgr are equal
stmt = stmt.where(managers.columns.id==employees.columns.mgr)

# Group by Managers Name
stmt = stmt.group_by(managers.columns.name)

# Execute statement: results
results = connection.execute(stmt).fetchall()

# import pandas
import pandas as pd

# Create a DataFrame from the results: df
df = pd.DataFrame(results)

# Set column names
df.columns = results[0].keys()

# Print the DataFrame
df

| | name | count_1 |
|---|---|---|
| 0 | FILLMORE | 3 |
| 1 | GARFIELD | 4 |
| 2 | HARDING | 2 |
| 3 | JACKSON | 4 |
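
To see why the choice of alias in the group by matters, the sketch below runs the same self-join twice in plain SQL with the built-in `sqlite3` module: once grouped by the manager side and once grouped by the employee side. The four-row in-memory table is made up for illustration and is not the contents of `employees.sqlite`.

```python
import sqlite3

# Hypothetical in-memory table; not the data in employees.sqlite
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, mgr INTEGER)")
conn.executemany(
    "INSERT INTO employees (id, name, mgr) VALUES (?, ?, ?)",
    [(1, "BOSS", None), (2, "ANN", 1), (3, "BOB", 1), (4, "CAL", 2)],
)

# Grouping by the managers alias gives one row per manager,
# with the number of employees reporting to that manager
by_manager = conn.execute("""
    SELECT managers.name, COUNT(employees.id)
    FROM employees AS managers, employees
    WHERE managers.id = employees.mgr
    GROUP BY managers.name
    ORDER BY managers.name
""").fetchall()

# Grouping by the employees side instead gives one row per employee,
# so the count is simply 1 for each distinct employee name
by_employee = conn.execute("""
    SELECT employees.name, COUNT(employees.id)
    FROM employees AS managers, employees
    WHERE managers.id = employees.mgr
    GROUP BY employees.name
    ORDER BY employees.name
""").fetchall()

print(by_manager)
print(by_employee)
```

Only the first query answers the question "how many employees does each manager have?"; the second is valid SQL but rolls up along the wrong side of the join.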
