# Create Temporary Tables Using SQL Files

This tutorial demonstrates how to create temporary tables in Athena using `pydbtools.read_sql_queries`. It is an amended version of [create_temporary_version.ipynb](create_temporary_version.ipynb).

## Setup

Run the cells below to create the source database used in this example.

In [None]:
import os
import pandas as pd
import awswrangler as wr
import pydbtools as pydb

In [None]:
# setup your own testing area (set foldername = GH username)
foldername = "mratford"  # GH username
foldername = foldername.lower().replace("-", "_")
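
The two normalisation steps above can be wrapped in a small helper. This is only a sketch: `make_db_name` and the sample username are hypothetical, and the likely motivation (Athena database names are expected to use lowercase letters, digits and underscores) is an assumption that the tutorial itself does not state.

```python
def make_db_name(username: str, prefix: str = "aws_example_") -> str:
    """Hypothetical helper mirroring the tutorial's normalisation steps."""
    # Lowercase the name and swap hyphens for underscores, as in the cell above.
    folder = username.lower().replace("-", "_")
    return f"{prefix}{folder}"


print(make_db_name("Some-User"))  # "Some-User" is a made-up username
```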

In [None]:
bucketname = "alpha-everyone"
s3_base_path = f"s3://{bucketname}/{foldername}/"

db_name = f"aws_example_{foldername}"
source_db_base_path = f"s3://{bucketname}/{foldername}/source_db/"

# Remove any database and data left over from a previous run
pydb.delete_database_and_data(db_name)

# Setup source database
# Create the database
pydb.create_database(db_name)

# Iterate through the CSV files in data/ and write each one to our database as a table
for table_name in ["department", "employees", "sales"]:
    table_path = pydb.s3_path_join(source_db_base_path, f"{table_name}/")
    pydb.file_to_table(
        path=f"data/{table_name}.csv",
        database=db_name,
        table=table_name,
        location=table_path,
    )

## Task

We are going to create a table that shows total sales per employee using all 3 tables.

In [None]:
pydb.read_sql_query(f"SELECT * FROM {db_name}.employees LIMIT 5", ctas_approach=False)

In [None]:
pydb.read_sql_query(f"SELECT * FROM {db_name}.department LIMIT 5", ctas_approach=False)

In [None]:
pydb.read_sql_query(f"SELECT * FROM {db_name}.sales LIMIT 5", ctas_approach=False)

pydbtools has `read_sql_queries` and `read_sql_queries_gen` functions that let you create temporary tables within SQL, which you can then refer to in the `__temp__` database.
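
As a rough local analogy (SQLite rather than Athena, and not how pydbtools works internally), the sketch below creates a temporary table and reads it back through SQLite's built-in `temp` schema, which plays a role similar to `__temp__` here. The table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Athena (local analogue only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (employee_id INTEGER, sales INTEGER);
    INSERT INTO sales VALUES (1, 10), (1, 5), (2, 7);  -- made-up rows

    CREATE TEMP TABLE total_sales AS
    SELECT employee_id, sum(sales) AS total_sales
    FROM sales
    GROUP BY employee_id;
    """
)

# Temporary tables are addressed through the `temp` schema in SQLite.
rows = conn.execute(
    "SELECT employee_id, total_sales FROM temp.total_sales ORDER BY employee_id"
).fetchall()
print(rows)
```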

**First create a total_sales table:**

In [None]:
sql = f"""
CREATE TEMP TABLE total_sales AS
SELECT employee_id, sum(sales) as total_sales
FROM {db_name}.sales
GROUP BY employee_id;
"""
print(sql)

**Then create a table of employee names from the sales department:**

In [None]:
sql += f"""
CREATE TEMP TABLE sales_employees AS
SELECT e.employee_id, e.forename, e.surname, d.department_name
FROM {db_name}.employees AS e
LEFT JOIN {db_name}.department AS d
ON e.department_id = d.department_id
WHERE e.department_id = 1;
"""
print(sql)

**Finally, return the final table:**

Here the SQL ends with a single `SELECT` statement, whose result `read_sql_queries` returns as a dataframe.

In [None]:
sql += f"""
SELECT se.*, ts.total_sales
FROM __temp__.sales_employees AS se
INNER JOIN __temp__.total_sales AS ts
ON se.employee_id = ts.employee_id;
"""
print(sql)

In [None]:
total_sales = pydb.read_sql_queries(sql)

In [None]:
total_sales

The `read_sql_queries_gen` function allows you to use more than one `SELECT` statement, returning an iterator of dataframes, one per `SELECT`.

In [None]:
sql += f"""
SELECT forename, surname, sum(s.sales) as q1_sales
FROM __temp__.sales_employees AS se
LEFT JOIN {db_name}.sales AS s
ON se.employee_id = s.employee_id
GROUP BY forename, surname;
"""
print(sql)

In [None]:
total_sales, q1_sales = tuple(pydb.read_sql_queries_gen(sql))
q1_sales
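
To illustrate the idea of yielding one result per `SELECT`, here is a minimal sketch using Python's built-in `sqlite3` module instead of Athena. `run_statements_gen` is a hypothetical helper written for this example; it is not pydbtools' implementation, and its naive split on `;` assumes no semicolons appear inside string literals. The rows are made up.

```python
import sqlite3
from typing import Iterator, List, Tuple


def run_statements_gen(conn: sqlite3.Connection, sql: str) -> Iterator[List[Tuple]]:
    """Hypothetical helper: run each statement, yielding rows for every SELECT."""
    # Naive statement split (assumes no ';' inside string literals).
    for statement in (s.strip() for s in sql.split(";")):
        if not statement:
            continue
        cursor = conn.execute(statement)
        # Only SELECT statements produce a result worth yielding.
        if statement.upper().startswith("SELECT"):
            yield cursor.fetchall()


conn = sqlite3.connect(":memory:")
sql = """
CREATE TABLE sales (employee_id INTEGER, sales INTEGER);
INSERT INTO sales VALUES (1, 10), (1, 5), (2, 7);
SELECT employee_id, sum(sales) FROM sales GROUP BY employee_id ORDER BY employee_id;
SELECT count(*) FROM sales;
"""

# Two SELECT statements, so the generator yields two results.
totals, row_count = tuple(run_statements_gen(conn, sql))
print(totals)
print(row_count)
```

Because the helper is a generator, statements only run as results are requested; wrapping it in `tuple(...)`, as in the cell above, forces all of them to run.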

### Clean up

In [None]:
pydb.delete_database_and_data(db_name)