<a href="https://colab.research.google.com/github/gt-cse-6040/bootcamp/blob/main/SQL/syllabus/SQL3nb4_SQL_TempTables_FA25.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SQL Temp Tables

In [None]:
!wget https://github.com/gt-cse-6040/bootcamp/raw/main/SQL/syllabus/NYC-311-2M_small.db

In [None]:
# create a connection to the database
import sqlite3 as db
import pandas as pd

# Connect to a database (or create one if it doesn't exist)
conn_nyc = db.connect('NYC-311-2M_small.db')

## Temp Tables -- Quick Review

In SQLite, temporary tables (or temp tables) are used to store data temporarily during the course of a session.

These tables are very useful for intermediate results, complex queries, or storing temporary data without affecting the main schema of the database.

The temporary table, along with its data, is automatically deleted when the session ends, that is, when the database connection is closed.

Temporary Tables Features:

*    `Temporary Scope`: Temporary tables only exist for the duration of the database session. Once the connection is closed, the temporary tables are dropped automatically.

*    `Session-Specific`: They are available only to the database connection that created them. Other connections cannot access the temporary tables.

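*    `Keyword`: Temporary tables are created with the keyword `TEMP` or `TEMPORARY`. The two are interchangeable in SQLite, but one of them is required: without it, `CREATE TABLE` creates a permanent table. Where the rows are stored depends on how SQLite was built and on `PRAGMA temp_store`. A default build typically uses a temporary file on disk, and `PRAGMA temp_store = MEMORY` keeps temp tables in memory.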

*    `No Impact on Schema`: Temporary tables are separate from the permanent database schema, so they do not affect the structure or data of the main tables.
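The session-specific and schema-separation features above can be checked with a short sketch. It does not use the course database: the file name `scratch_demo.db` and the table name `scratch` are hypothetical, chosen only for illustration. `sqlite_master` and `sqlite_temp_master` are SQLite's catalogs for the main and temp schemas.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database file, so two connections can share it
db_path = os.path.join(tempfile.mkdtemp(), "scratch_demo.db")

conn_a = sqlite3.connect(db_path)
conn_b = sqlite3.connect(db_path)

# conn_a creates a temp table; it lives in conn_a's private "temp" schema
conn_a.execute("CREATE TEMP TABLE scratch AS SELECT 1 AS x")

# Visible to the connection that created it
rows_a = conn_a.execute("SELECT x FROM scratch").fetchall()

# Not visible to any other connection
try:
    conn_b.execute("SELECT x FROM scratch")
    visible_to_b = True
except sqlite3.OperationalError:
    visible_to_b = False

# The main schema catalog is untouched; the temp table is listed separately
main_tables = conn_a.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
temp_tables = conn_a.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
).fetchall()

print(rows_a, visible_to_b, main_tables, temp_tables)

conn_a.close()
conn_b.close()
```

The second connection raises `sqlite3.OperationalError` because, from its point of view, no table named `scratch` exists.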

## Temp Tables

- Are created in their own `CREATE TEMP TABLE` statement, which runs before the query that uses them.


- The temp table is populated once, when the `CREATE TEMP TABLE ... AS SELECT` statement runs. Subsequent joins read the stored rows instead of re-running the defining query.


- Temp tables can be very efficient when the defining query is complex, or when you need its result several times in your SQL program, because the result is computed once and then referenced like any other table.


- Temp tables can be memory-intensive (or use a lot of temporary disk space), particularly if they hold a lot of data (many rows, many columns, or both).

https://w3schools.tech/tutorial/sql/sql-temporary-tables
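As a minimal sketch of the "create once, reference many times" idea, the example below builds a temp table from a small in-memory table and then queries it twice. The table `events`, its rows, and the temp table name `daily_counts` are made up for illustration and are not part of the NYC 311 database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy data standing in for a larger table
conn.execute("CREATE TABLE events (day TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2014-11-17", "BRONX"),
        ("2014-11-18", "BRONX"),
        ("2014-11-18", "BROOKLYN"),
        ("2014-11-18", "BROOKLYN"),
        ("2014-11-19", "QUEENS"),
    ],
)

# Step 1: run the aggregation once, in its own statement, and store the result
conn.execute("DROP TABLE IF EXISTS daily_counts")
conn.execute(
    """
    CREATE TEMP TABLE daily_counts AS
    SELECT day, COUNT(*) AS total
    FROM events
    GROUP BY day
    """
)

# Step 2: later queries read the stored rows like any other table
busiest = conn.execute(
    "SELECT day, total FROM daily_counts ORDER BY total DESC LIMIT 1"
).fetchone()
quiet_days = conn.execute(
    "SELECT day FROM daily_counts WHERE total = 1 ORDER BY day"
).fetchall()

print(busiest)     # ('2014-11-18', 3)
print(quiet_days)  # [('2014-11-17',), ('2014-11-19',)]
conn.close()
```

Both follow-up queries read `daily_counts` directly, so the `GROUP BY` over `events` runs only once.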

### EXAMPLE -- INNER JOIN WITH Temp Table (similar to before, with a subquery/CTE)

**Requirement**

From the `data` table, for each `city`, return the count of tickets per `hour` on the busiest day, that is, the `createdDate` day with the most events.

Hint: it's `2014-11-18` (8466 events), but how do we find this date dynamically in code?

*    Columns
    *    `City`
    *    `createdHour`
    *    `countoccur`: the count of events

*    Exclude NULL cities i.e. `WHERE city IS NOT NULL`

*    Sort
    *   `City` in ascending order
    *   `createdHour` in ascending order

**Pseudocode:**
*    Need to find the biggest day. Store as TEMP TABLE `temptopymd`
*    JOIN `temptopymd` to the `data` table
*    produce `SELECT` statement
*    include `WHERE` statement
*    `GROUP BY`
*    `ORDER BY`

In [None]:
def inner_join_example():

    def drop_table():
        drop_table_query = '''
                    DROP TABLE IF EXISTS temptopymd;
                    '''
        return drop_table_query

    def create_table():
        create_table_query = '''
                    CREATE TEMP TABLE temptopymd AS
                              SELECT strftime('%Y-%m-%d',CreatedDate) createdymd
                                      ,count(*) totalymd
                              FROM data
                              group by 1
                              order by 2 desc
                              limit 1;
                    '''
        return create_table_query
    # create the cursor, then drop and re-create the temp table
    cursor=conn_nyc.cursor()
    cursor.execute(drop_table())
    cursor.execute(create_table())


    # build the main query: ticket counts per city and hour on the busiest day
    query_inner_join = '''
                        SELECT a.city
                                    ,strftime('%H',CreatedDate) createdhour
                                    ,count(*) countoccur
                                FROM data a
                                --this join gets the date with the most events from temptable
                                INNER JOIN temptopymd b
                                    on strftime('%Y-%m-%d',a.CreatedDate)=b.createdymd
                                WHERE a.city IS NOT NULL
                                GROUP BY 1,2
                                ORDER BY 1,2
                '''


    return query_inner_join

df_inner_join_example = pd.read_sql(inner_join_example(),conn_nyc)
display(df_inner_join_example)

### So what did we do here?

#### For the temp table itself:

1. We dropped the temp table if it already existed (`DROP TABLE IF EXISTS`), so that re-running the cell does not fail on the `CREATE` statement.

#### We then created the temp table:

2. The query counted the number of rows (complaints) for each date.

3. The query sorted by the number in descending order.

4. The temp table was created with one row (`LIMIT 1`), which is the date with the most complaints.

#### Next, in the main query:

1. The main query inner joined the `data` table to the temp table on the date.

2. Because the join to the temp table is an inner join, it ensures that the only rows included/returned are those with that date.
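To see the filtering effect of the inner join on data small enough to check by hand, here is a sketch that repeats the same steps against an in-memory table. The six rows are invented for illustration; only the table and column names (`data`, `CreatedDate`, `City`, `temptopymd`) follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy rows; the busiest day is 2014-11-18 with four rows
conn.execute("CREATE TABLE data (CreatedDate TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("2014-11-17 09:15:00", "BRONX"),
        ("2014-11-18 08:05:00", "BRONX"),
        ("2014-11-18 08:40:00", "BRONX"),
        ("2014-11-18 13:20:00", "BROOKLYN"),
        ("2014-11-18 13:55:00", None),  # NULL city, filtered out below
        ("2014-11-19 10:00:00", "QUEENS"),
    ],
)

# Temp table: one row holding the date with the most rows
conn.execute("DROP TABLE IF EXISTS temptopymd")
conn.execute(
    """
    CREATE TEMP TABLE temptopymd AS
    SELECT strftime('%Y-%m-%d', CreatedDate) AS createdymd,
           COUNT(*) AS totalymd
    FROM data
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1
    """
)

# Main query: the inner join keeps only rows from that one date
toy_result = conn.execute(
    """
    SELECT a.City,
           strftime('%H', a.CreatedDate) AS createdhour,
           COUNT(*) AS countoccur
    FROM data a
    INNER JOIN temptopymd b
        ON strftime('%Y-%m-%d', a.CreatedDate) = b.createdymd
    WHERE a.City IS NOT NULL
    GROUP BY 1, 2
    ORDER BY 1, 2
    """
).fetchall()

print(toy_result)  # [('BRONX', '08', 2), ('BROOKLYN', '13', 1)]
conn.close()
```

The rows from `2014-11-17` and `2014-11-19` never reach the `GROUP BY`, because they have no matching row in `temptopymd`.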

### As discussed before, a temp table is also very good if it is to be called multiple times in the session, and/or it has a lot of data in it.

#### Same as subqueries and CTEs, troubleshooting temp tables is fairly straightforward.

#### For example, to ensure that the temp table is returning the correct date, all we have to do is call it in its own query.

Let's see how this is done.

In [None]:
def temptableexample():

    query_temp_table = '''
                SELECT * FROM temptopymd LIMIT 10
                '''

    return query_temp_table

d=pd.read_sql(temptableexample(),conn_nyc)
display(d)

## Remember:  

#### Temporary tables `only` exist for the duration of the database session.

#### Once the connection is closed, the temporary tables are dropped automatically.

### In this case, the connection `conn_nyc` is still open (we have not closed it or shut down the notebook), so we can query the temporary table on its own.
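The sketch below shows the opposite case, where the connection is closed and reopened. It uses a hypothetical throwaway database file (`lifetime_demo.db`) with made-up table names `kept` and `scratch`, not the course database.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database file for the demonstration
db_path = os.path.join(tempfile.mkdtemp(), "lifetime_demo.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE kept (x INTEGER)")          # permanent table
conn.execute("CREATE TEMP TABLE scratch (x INTEGER)")  # temporary table
conn.commit()
conn.close()  # ends the session; scratch is dropped here

# A new connection to the same file starts a new session
conn = sqlite3.connect(db_path)
kept_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'kept'"
).fetchone()[0] == 1

try:
    conn.execute("SELECT * FROM scratch")
    scratch_exists = True
except sqlite3.OperationalError:
    scratch_exists = False

print(kept_exists, scratch_exists)  # True False
conn.close()
```

The permanent table survives the reconnect, while the query against `scratch` fails because the temp table went away with the first connection.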

## Because of the complexity in creating temporary tables for the LEFT JOIN example, we will not be showing it with Temporary Tables.

#### You will not be asked to execute a query that complex, using temp tables, in this class.

## What questions do you have on Temporary Tables?