# Concatenating Tables with Set-Like Operations in `sqlalchemy`

Finally, we look at combining tables with `union`, `intersect`, and `except_` in `sqlalchemy`.

## Example - Auto Sales in SQL

In [41]:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import select as selectq
from sqlalchemy import create_engine, func
import pandas as pd

sales_eng = create_engine("sqlite:///databases/sales_2_8.db") 
Base = automap_base()
Base.prepare(sales_eng, reflect=True)
SalesApr = Base.classes.sales_apr
salesAprTbl = SalesApr.__table__
SalesMay = Base.classes.sales_may
salesMayTbl = SalesMay.__table__

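The setup cell above uses the older `prepare(engine, reflect=True)` and `select([...])` spellings. As a hedged sketch, assuming SQLAlchemy 1.4 or later, the same reflection can be written with `autoload_with` and the list-free `select(table)`. The in-memory database, the two-row table, and the names `demo_eng`, `DemoBase`, and `demo_tbl` are stand-ins invented for illustration, not the contents of `sales_2_8.db`.

```python
from sqlalchemy import create_engine, select
from sqlalchemy.ext.automap import automap_base

# Hypothetical in-memory database standing in for sales_2_8.db.
demo_eng = create_engine("sqlite://")
with demo_eng.begin() as conn:
    # automap only maps tables that have a primary key.
    conn.exec_driver_sql(
        'CREATE TABLE sales_apr ("Salesperson" TEXT, "Compact" INTEGER, id INTEGER PRIMARY KEY)'
    )
    conn.exec_driver_sql("INSERT INTO sales_apr VALUES ('Ann', 22, 0), ('Bob', 20, 1)")

DemoBase = automap_base()
DemoBase.prepare(autoload_with=demo_eng)  # newer spelling of prepare(engine, reflect=True)
demo_tbl = DemoBase.classes.sales_apr.__table__

with demo_eng.connect() as conn:
    # select(table) is the newer spelling of select([table]).
    demo_rows = [tuple(r) for r in conn.execute(select(demo_tbl))]
print(demo_rows)
```

If a table name is missing from `Base.classes`, it is worth checking that the database path is correct and that the table has a primary key.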

In [3]:
pd.read_sql_query(selectq([SalesApr]), con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,20,14,6,24,1
2,Yolanda,19,10,28,17,2
3,Xerxes,11,27,17,9,3


In [4]:
pd.read_sql_query(selectq([SalesMay]), con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,19,12,17,20,1
2,Yolanda,19,8,32,15,2
3,Xerxes,12,23,18,9,3


## Notes on set concatenation in `sqlalchemy`

* Available functions: `union, union_all, intersect, intersect_all, except_, except_all`
* Used to combine full select statements
    * Example: `(SELECT * FROM T1) UNION (SELECT * FROM T2)`
    
**Consequence:** You need to

1. Make two or more select statements
2. *Then* combine them with `union` etc.
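The two steps can be sketched without touching a database, since SQLAlchemy only builds the SQL text at this point. The tables `T1` and `T2` and their single column `x` are hypothetical; `Table.select()` is used because it behaves the same way in old and new SQLAlchemy releases.

```python
from sqlalchemy import Column, Integer, MetaData, Table, union

meta = MetaData()
# Hypothetical tables T1 and T2 with matching columns.
t1 = Table("T1", meta, Column("x", Integer))
t2 = Table("T2", meta, Column("x", Integer))

# Step 1: build each SELECT on its own.
sel1 = t1.select()
sel2 = t2.select()

# Step 2: combine the finished statements.
combined = union(sel1, sel2)
print(combined)
```

Printing the statement is a cheap way to confirm that both selects ended up on either side of the `UNION` keyword before running anything.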

In [5]:
from sqlalchemy import union, union_all, intersect, intersect_all, except_, except_all

## Performing a `union`

In [6]:
sales_union = union(selectq([salesAprTbl]), selectq([salesMayTbl]))
print(sales_union)

SELECT sales_apr."Salesperson", sales_apr."Compact", sales_apr."Sedan", sales_apr."SUV", sales_apr."Truck", sales_apr.id 
FROM sales_apr UNION SELECT sales_may."Salesperson", sales_may."Compact", sales_may."Sedan", sales_may."SUV", sales_may."Truck", sales_may.id 
FROM sales_may


In [7]:
pd.read_sql_query(sales_union, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,19,12,17,20,1
2,Bob,20,14,6,24,1
3,Xerxes,11,27,17,9,3
4,Xerxes,12,23,18,9,3
5,Yolanda,19,8,32,15,2
6,Yolanda,19,10,28,17,2


## Performing a `union_all`

In [8]:
sales_union_all = union_all(selectq([salesAprTbl]), selectq([salesMayTbl]))
pd.read_sql_query(sales_union_all, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,20,14,6,24,1
2,Yolanda,19,10,28,17,2
3,Xerxes,11,27,17,9,3
4,Ann,22,18,15,12,0
5,Bob,19,12,17,20,1
6,Yolanda,19,8,32,15,2
7,Xerxes,12,23,18,9,3
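The two outputs above differ in Ann's row: `union` drops exact duplicate rows, while `union_all` keeps every row from every select. The sketch below reproduces this on a throwaway in-memory database, using only the `Salesperson` and `Sedan` columns from the tables above; the helper `make_table` and the engine `eng` are names made up for this illustration.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, union, union_all)

eng = create_engine("sqlite://")  # throwaway in-memory database
meta = MetaData()

def make_table(name):
    """Hypothetical helper: a two-column cut of the sales tables."""
    return Table(name, meta, Column("Salesperson", String), Column("Sedan", Integer))

apr, may = make_table("sales_apr"), make_table("sales_may")
meta.create_all(eng)

people = ["Ann", "Bob", "Yolanda", "Xerxes"]
with eng.begin() as conn:
    conn.execute(apr.insert(), [{"Salesperson": p, "Sedan": s}
                                for p, s in zip(people, [18, 14, 10, 27])])
    conn.execute(may.insert(), [{"Salesperson": p, "Sedan": s}
                                for p, s in zip(people, [18, 12, 8, 23])])

with eng.connect() as conn:
    union_rows = conn.execute(union(apr.select(), may.select())).fetchall()
    union_all_rows = conn.execute(union_all(apr.select(), may.select())).fetchall()

# Ann's row is identical in both months, so union keeps it once.
print(len(union_rows), len(union_all_rows))
```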


## `union_all` and friends take any number of select statements

In [9]:
sales_union_all3 = union_all(selectq([salesAprTbl]), 
                             selectq([salesAprTbl]), 
                             selectq([salesMayTbl]))
pd.read_sql_query(sales_union_all3, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,20,14,6,24,1
2,Yolanda,19,10,28,17,2
3,Xerxes,11,27,17,9,3
4,Ann,22,18,15,12,0
5,Bob,20,14,6,24,1
6,Yolanda,19,10,28,17,2
7,Xerxes,11,27,17,9,3
8,Ann,22,18,15,12,0
9,Bob,19,12,17,20,1
10,Yolanda,19,8,32,15,2
11,Xerxes,12,23,18,9,3
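Because these functions accept any number of selects, a list of tables can be unpacked into a single call. This is a minimal sketch with hypothetical month tables that share a single `id` column; nothing is executed, and the statement is only compiled to SQL text.

```python
from sqlalchemy import Column, Integer, MetaData, Table, union_all

meta = MetaData()
# Hypothetical month tables that all share one column layout.
months = ["apr", "may", "jun"]
tables = [Table(f"sales_{m}", meta, Column("id", Integer)) for m in months]

# Unpack one select per table into a single union_all call.
stacked = union_all(*[t.select() for t in tables])
print(stacked)
```

The same pattern scales to many tables without writing each select by hand.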


## Performing an `intersect`

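Note that `intersect` returns only distinct rows. `intersect_all` keeps repeated matches (SQL `INTERSECT ALL`), so the two agree only when the inputs contain no duplicate rows. SQLite, used here, does not support `INTERSECT ALL`.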

In [10]:
sales_inter = intersect(selectq([salesAprTbl]), selectq([salesMayTbl]))
pd.read_sql_query(sales_inter, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
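To see how `intersect` and `intersect_all` relate, the sketch below uses two hypothetical one-column tables, `a` and `b`, that contain a repeated value. On SQLite, `intersect` returns the shared value once, and the `INTERSECT ALL` statement is rejected by the database. On a backend that supports it, such as PostgreSQL, `intersect_all` would be expected to return the repeated value twice.

```python
from sqlalchemy import (Column, Integer, MetaData, Table, create_engine,
                        intersect, intersect_all)
from sqlalchemy.exc import DBAPIError

eng = create_engine("sqlite://")  # throwaway in-memory database
meta = MetaData()
# Hypothetical tables with duplicate values on both sides.
a = Table("a", meta, Column("x", Integer))
b = Table("b", meta, Column("x", Integer))
meta.create_all(eng)

with eng.begin() as conn:
    conn.execute(a.insert(), [{"x": 1}, {"x": 1}, {"x": 2}])
    conn.execute(b.insert(), [{"x": 1}, {"x": 1}, {"x": 3}])

with eng.connect() as conn:
    # INTERSECT removes duplicates from the result.
    distinct_rows = [tuple(r) for r in conn.execute(intersect(a.select(), b.select()))]

# SQLAlchemy happily builds INTERSECT ALL; the database decides whether it runs.
all_stmt = intersect_all(a.select(), b.select())
try:
    with eng.connect() as conn:
        conn.execute(all_stmt)
    intersect_all_supported = True
except DBAPIError:
    intersect_all_supported = False

print(distinct_rows, intersect_all_supported)
```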


## Performing an `except_`

Note that the trailing `_` is needed because `except` is a reserved keyword in Python.

In [11]:
sales_except = except_(selectq([salesAprTbl]), selectq([salesMayTbl]))
pd.read_sql_query(sales_except, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Bob,20,14,6,24,1
1,Xerxes,11,27,17,9,3
2,Yolanda,19,10,28,17,2
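Unlike `union` and `intersect`, `except_` depends on argument order: it keeps rows from the first select that do not appear in the second. The sketch below swaps the arguments on a two-column cut (`Salesperson`, `Sedan`) of the tables above, loaded into a throwaway in-memory database. The helper `make_table` is a name invented for this illustration.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, except_)

eng = create_engine("sqlite://")  # throwaway in-memory database
meta = MetaData()

def make_table(name):
    """Hypothetical helper: a two-column cut of the sales tables."""
    return Table(name, meta, Column("Salesperson", String), Column("Sedan", Integer))

apr, may = make_table("sales_apr"), make_table("sales_may")
meta.create_all(eng)

people = ["Ann", "Bob", "Yolanda", "Xerxes"]
with eng.begin() as conn:
    conn.execute(apr.insert(), [{"Salesperson": p, "Sedan": s}
                                for p, s in zip(people, [18, 14, 10, 27])])
    conn.execute(may.insert(), [{"Salesperson": p, "Sedan": s}
                                for p, s in zip(people, [18, 12, 8, 23])])

with eng.connect() as conn:
    # Rows in April that are not in May, and the reverse.
    apr_only = {tuple(r) for r in conn.execute(except_(apr.select(), may.select()))}
    may_only = {tuple(r) for r in conn.execute(except_(may.select(), apr.select()))}

print(sorted(apr_only))
print(sorted(may_only))
```

Ann's row is identical in both months, so it drops out of both results.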


## <font color="red"> Exercise 3 </font>

In the `databases` folder, you will find a database named `uber_samples_2_8.db` that contains the sample tables from the last 2 examples. The tables are named `sales_jun`, `sales_apr`, `sales_may`, `sales_sep`, `sales_aug`, and `sales_jul`.

1. Use `union_all` to create a `stmt` that combines these tables into one table.
2. Use `pandas` and `limit(5)` to get the first 5 rows of the table.
3. Use `selectq([func.count('*')]).select_from(stmt)` to count the total number of rows in the new table.

In [45]:
uber_eng = create_engine("sqlite:///databases/uber_samples_2_8.db") 
Base2 = automap_base()
Base2.prepare(uber_eng, reflect=True)

SalesJun = Base2.classes.sales_jun
salesJunTbl = SalesJun.__table__
SalesApr2 = Base2.classes.sales_apr
salesAprTbl2 = SalesApr2.__table__
SalesMay2 = Base2.classes.sales_may
salesMayTbl2 = SalesMay2.__table__
SalesSep = Base2.classes.sales_sep
salesSepTbl = SalesSep.__table__
SalesAug = Base2.classes.sales_aug
salesAugTbl = SalesAug.__table__
SalesJul = Base2.classes.sales_jul
salesJulTbl = SalesJul.__table__

In [62]:
sales_union_uber = union_all(selectq([salesJunTbl]),
                             selectq([salesAprTbl2]), 
                             selectq([salesMayTbl2]),
                             selectq([salesSepTbl]),
                             selectq([salesAugTbl]),
                             selectq([salesJulTbl]),
                            )

In [67]:
pd.read_sql_query(sales_union_uber.limit(5), con=uber_eng)

Unnamed: 0,date,Lat,Lon,Base,month,id
0,6/19/2014 16:49:00,40.7568,-73.9701,B02682,jun,0
1,6/12/2014 21:25:00,40.6463,-73.7768,B02598,jun,1
2,6/15/2014 22:23:00,40.7205,-73.9575,B02512,jun,2
3,6/14/2014 20:34:00,40.7639,-73.9624,B02617,jun,3
4,6/13/2014 14:36:00,40.7665,-73.9667,B02598,jun,4


In [66]:
pd.read_sql_query(selectq([func.count('*')]).select_from(sales_union_uber), con=uber_eng)


Unnamed: 0,count_1
0,600000
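The counting pattern above relies on SQLAlchemy wrapping the compound statement in a subquery implicitly. As a hedged sketch, assuming SQLAlchemy 1.4 or later, the wrapping can be made explicit with `.subquery()`. The tables `t1` and `t2` and their five rows are toy data invented for illustration, not the Uber samples.

```python
from sqlalchemy import (Column, Integer, MetaData, Table, create_engine,
                        func, select, union_all)

eng = create_engine("sqlite://")  # throwaway in-memory database
meta = MetaData()
# Hypothetical tables with 2 and 3 rows.
t1 = Table("t1", meta, Column("x", Integer))
t2 = Table("t2", meta, Column("x", Integer))
meta.create_all(eng)

with eng.begin() as conn:
    conn.execute(t1.insert(), [{"x": 1}, {"x": 2}])
    conn.execute(t2.insert(), [{"x": 2}, {"x": 3}, {"x": 4}])

stmt = union_all(select(t1), select(t2))

# Wrap the compound select in a named subquery before counting its rows.
count_stmt = select(func.count()).select_from(stmt.subquery())

with eng.connect() as conn:
    total = conn.execute(count_stmt).scalar()
    first_two = [tuple(r) for r in conn.execute(stmt.limit(2))]

print(total, first_two)
```

Counting in the database avoids pulling every row into `pandas` just to measure the size of the combined table.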
