##### Tutorial 08: SQLite Window Functions 

SQLite has supported window functions since version 3.25.0. A window function computes a value for each row from a set of rows related to that row. This set, the "window", is defined in the OVER clause: PARTITION BY splits the rows into groups, ORDER BY orders the rows within each group, and an optional frame specification narrows the window further. Window functions resemble aggregate functions such as SUM or AVG, but they do not collapse multiple rows into a single result: every input row still produces an output row.
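
To make the contrast with aggregates concrete, here is a minimal sketch on an in-memory database. The `scores` table and its columns are invented for illustration and are not part of the tutorial database; the sketch also assumes the SQLite library bundled with Python is version 3.25 or newer.

```python
import sqlite3

# Hypothetical toy table, not part of the tutorial database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, region TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 20), (3, "B", 30)],
)

# Aggregate with GROUP BY: one output row per region.
grouped = conn.execute(
    "SELECT region, AVG(score) FROM scores GROUP BY region ORDER BY region"
).fetchall()

# The same aggregate with an OVER clause: every input row is kept and
# carries the average of its own partition.
windowed = conn.execute(
    "SELECT id, region, AVG(score) OVER (PARTITION BY region) "
    "FROM scores ORDER BY id"
).fetchall()

print(grouped)   # [('A', 15.0), ('B', 30.0)]
print(windowed)  # [(1, 'A', 15.0), (2, 'A', 15.0), (3, 'B', 30.0)]
```

The grouped query returns two rows, while the windowed query returns all three.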

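This tutorial covers how to use window functions in SQLite. SQLite provides the built-in window functions listed below, and ordinary aggregate functions such as COUNT and AVG can also be used with an OVER clause. The examples that follow use several of them.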

- row_number()
- rank()
- dense_rank()
- percent_rank()
- cume_dist()
- ntile(N)
- lag(expr), lag(expr, offset), lag(expr, offset, default)
- lead(expr), lead(expr, offset), lead(expr, offset, default)
- first_value(expr)
- last_value(expr)
- nth_value(expr, N)
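
The three ranking functions at the top of the list differ only in how they treat ties. The sketch below, which uses a made-up `demo` table rather than the tutorial data, shows them side by side.

```python
import sqlite3

# Hypothetical toy table with one tie (two rows with score 20).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (score INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?)", [(10,), (20,), (20,), (30,)])

rows = conn.execute(
    """
    SELECT score,
           ROW_NUMBER() OVER (ORDER BY score) AS rn,     -- always unique
           RANK()       OVER (ORDER BY score) AS rnk,    -- ties share a rank, gap follows
           DENSE_RANK() OVER (ORDER BY score) AS dense   -- ties share a rank, no gap
    FROM demo
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
# (10, 1, 1, 1)
# (20, 2, 2, 2)
# (20, 3, 2, 2)
# (30, 4, 4, 3)
```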

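**Example 1**: Rank students by their `Pre_Knowledge_Data` (knowledge score). The query ranks across all participants; ranking within each city would additionally require a PARTITION BY clause on a city column.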

**Example 2**: Approximate the cumulative distribution of `Pre_Knowledge_Data` for each student as the student's rank divided by the number of students in the same state or region (the query is filtered to Mandalay).

**Example 3**: Calculate a three-row moving average of `Pre_Knowledge_Data`, with students ordered by their time of joining.

**Example 4**: Compare each participant's `Pre_Knowledge_Data` with that of the participant who joined immediately before them (using the LAG() function).

**Example 5**: Divide students into quartiles based on their `Pre_Knowledge_Data` score using the NTILE() window function.



In [None]:
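import sqlite3
import pandas as pd

# Path to the tutorial database. The cells below pass it to pandas as a
# "sqlite:///..." URI string, which requires SQLAlchemy to be installed.
db_path = './database/mmdt.db3'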

In [None]:
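# Example 1: rank all participants by score. COALESCE falls back to the
# bhutan table when the participants value is NULL. The ORDER BY is
# ascending, so lower scores receive lower rank numbers.
query = """
        SELECT
            p.ID,
            COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data) AS score,
            RANK() OVER (ORDER BY COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data)) AS level
        FROM participants AS p
        LEFT JOIN bhutan AS b
        USING (ID);
        """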

df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df.tail(20)
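
Ranking within a group, rather than across the whole table, needs a PARTITION BY clause. The sketch below illustrates this on an invented `students` table with a hypothetical `city` column; neither name is taken from the tutorial database. It ranks in descending order so that the highest score in each city gets rank 1.

```python
import sqlite3

# Hypothetical toy table; the `city` column is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, city TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "X", 50), (2, "X", 70), (3, "Y", 60), (4, "Y", 40)],
)

rows = conn.execute(
    """
    SELECT id, city, score,
           RANK() OVER (PARTITION BY city ORDER BY score DESC) AS city_rank
    FROM students
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'X', 50, 2)
# (2, 'X', 70, 1)
# (3, 'Y', 60, 1)
# (4, 'Y', 40, 2)
```

The rank restarts at 1 in each partition, so both cities have their own top-ranked student.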

In [None]:
# Example 2: approximate the cumulative distribution as rank / rows per region.
# WHERE is applied before the window functions, so both only see Mandalay rows.
query = """
        SELECT
            p.ID, COALESCE(p.State_Region, b.State_Region) AS region,
            COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data) AS score,
            RANK() OVER (ORDER BY COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data)) AS level,
            COUNT(*) OVER (PARTITION BY COALESCE(p.State_Region, b.State_Region)) AS numb_region,
            CAST(RANK() OVER (ORDER BY COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data))
                AS FLOAT) / COUNT(*) OVER (PARTITION BY COALESCE(p.State_Region, b.State_Region)) AS cmd
        FROM participants AS p
        LEFT JOIN bhutan AS b
        USING (ID)
        WHERE p.State_Region LIKE '%Mandalay%';
        """

df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df.head(20)
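
SQLite also has a built-in `cume_dist()` function. Dividing the rank by the row count gives the same result only when there are no ties, as the following sketch on a made-up `demo` table suggests: `cume_dist()` counts every row tied with the current one, while `RANK()` assigns tied rows the lowest position in the tie.

```python
import sqlite3

# Hypothetical toy table with one tie (two rows with score 20).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (score INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?)", [(10,), (20,), (20,), (30,)])

rows = conn.execute(
    """
    SELECT score,
           CUME_DIST() OVER (ORDER BY score) AS builtin,
           CAST(RANK() OVER (ORDER BY score) AS REAL) / COUNT(*) OVER () AS rank_ratio
    FROM demo
    ORDER BY score
    """
).fetchall()

for row in rows:
    print(row)
# (10, 0.25, 0.25)
# (20, 0.75, 0.5)
# (20, 0.75, 0.5)
# (30, 1.0, 1.0)
```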

In [None]:
# Example 3: moving average over the current row and the two rows before it,
# with rows ordered by joining time.
query = """
        SELECT
            p.ID, COALESCE(p.Time, b.Time) AS time,
            COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data) AS score,
            AVG(COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data))
                OVER (ORDER BY COALESCE(p.Time, b.Time) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS m_avg_score
        FROM participants AS p
        LEFT JOIN bhutan AS b
        USING (ID);
        """

df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df.head(20)
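
The frame `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` can be checked by hand on a small input. The sketch below uses an invented `signups` table; note that the first two rows average fewer than three values because fewer preceding rows exist.

```python
import sqlite3

# Hypothetical toy table; names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (joined_at INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

rows = conn.execute(
    """
    SELECT joined_at,
           AVG(score) OVER (
               ORDER BY joined_at
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS m_avg
    FROM signups
    ORDER BY joined_at
    """
).fetchall()

print(rows)  # [(1, 10.0), (2, 15.0), (3, 20.0), (4, 30.0)]
```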

In [None]:
# Example 4: LAG returns the score of the previous row in joining-time order;
# the first row has no predecessor, so its pre_score is NULL.
query = """
        SELECT
            p.ID, COALESCE(p.Time, b.Time) AS time,
            COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data) AS score,
            LAG(COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data))
                OVER (ORDER BY COALESCE(p.Time, b.Time)) AS pre_score
        FROM participants AS p
        LEFT JOIN bhutan AS b
        USING (ID);
        """

df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df.head(20)
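
`LAG()` is often combined with arithmetic to get a row-to-row difference, and `LEAD()` looks in the opposite direction. The sketch below shows both on an invented `signups` table, using a named window (the `WINDOW` clause) so that the ordering is written only once.

```python
import sqlite3

# Hypothetical toy table; names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (joined_at INTEGER, score INTEGER)")
conn.executemany("INSERT INTO signups VALUES (?, ?)", [(1, 10), (2, 25), (3, 15)])

rows = conn.execute(
    """
    SELECT joined_at,
           score,
           LAG(score)  OVER w AS prev_score,       -- NULL on the first row
           score - LAG(score) OVER w AS delta,     -- NULL when prev_score is NULL
           LEAD(score) OVER w AS next_score        -- NULL on the last row
    FROM signups
    WINDOW w AS (ORDER BY joined_at)
    ORDER BY joined_at
    """
).fetchall()

for row in rows:
    print(row)
# (1, 10, None, None, 25)
# (2, 25, 10, 15, 15)
# (3, 15, 25, -10, None)
```

If a NULL on the first row is inconvenient, the three-argument form `lag(expr, offset, default)` from the list above supplies a fallback value instead.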

In [None]:
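# Example 5: NTILE(4) splits the rows, ordered by score, into four buckets.
# The alias is "quartile" rather than "range", since RANGE is an SQL keyword.
query = """
        SELECT
            p.ID, COALESCE(p.Time, b.Time) AS time,
            COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data) AS score,
            NTILE(4) OVER (ORDER BY COALESCE(p.Pre_Knowledge_Data, b.Pre_Knowledge_Data)) AS quartile
        FROM participants AS p
        LEFT JOIN bhutan AS b
        USING (ID);
        """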

df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df.tail(50)
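
When the number of rows is not divisible by N, `ntile(N)` makes the groups as even as possible and puts the larger groups first. A small sketch on a made-up `demo` table with six rows and four buckets:

```python
import sqlite3

# Hypothetical toy table with six rows, split into four buckets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (score INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?)",
    [(10,), (20,), (30,), (40,), (50,), (60,)],
)

rows = conn.execute(
    """
    SELECT score,
           NTILE(4) OVER (ORDER BY score) AS quartile
    FROM demo
    ORDER BY score
    """
).fetchall()

print(rows)
# [(10, 1), (20, 1), (30, 2), (40, 2), (50, 3), (60, 4)]
```

The first two buckets hold two rows each and the last two hold one row each.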