### Tutorial 06: Working with Time Data

In this tutorial, we will explore how to work with timestamps, dates, and other time-related columns in SQLite. 

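**Working with Timestamps and Dates in SQLite**
SQLite has no dedicated date or time storage class. A column can be declared with any of the type names below, but the values are stored as TEXT (ISO-8601 strings), REAL (Julian day numbers), or INTEGER (Unix time), and the built-in date and time functions interpret them. By convention:

- **DATE**: A date in the format YYYY-MM-DD.
- **TIME**: A time in the format HH:MM:SS.
- **DATETIME**: Both the date and the time in the format YYYY-MM-DD HH:MM:SS.
- **TIMESTAMP**: The same information as DATETIME, but the name usually refers to the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).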
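
As a minimal sketch of the three storage options, the snippet below stores one moment as text, as Unix seconds, and as a Julian day in an in-memory database, then reads each back with `datetime()`. The `events` table and its column names are made up for illustration and are not part of the tutorial database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one column per storage option
conn.execute("CREATE TABLE events (as_text TEXT, as_unix INTEGER, as_julian REAL)")
conn.execute(
    "INSERT INTO events VALUES ("
    "'2024-08-01 09:30:00', "
    "CAST(strftime('%s', '2024-08-01 09:30:00') AS INTEGER), "
    "julianday('2024-08-01 09:30:00'))"
)

# typeof() reports the storage class actually used for each value
storage = conn.execute(
    "SELECT typeof(as_text), typeof(as_unix), typeof(as_julian) FROM events"
).fetchone()

# datetime() turns each representation back into an ISO-8601 string
decoded = conn.execute(
    "SELECT datetime(as_text), datetime(as_unix, 'unixepoch'), datetime(as_julian) "
    "FROM events"
).fetchone()
conn.close()

print(storage)
print(decoded)
```

The rest of this tutorial uses the text representation, which is why the source columns are rewritten into YYYY-MM-DD form before they are queried.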

SQLite provides several built-in functions for manipulating time. Each one expects its input in ISO-8601 form (for example YYYY-MM-DD, HH:MM:SS, or YYYY-MM-DD HH:MM:SS) and returns NULL when the input cannot be parsed, so a string such as 8/1/2024 must be reformatted first:

- `DATE()`: Returns the date part of a time value as YYYY-MM-DD.
- `TIME()`: Returns the time part of a time value as HH:MM:SS.
- `DATETIME()`: Returns both the date and the time as YYYY-MM-DD HH:MM:SS.
- `STRFTIME()`: Formats a date and time according to a format string, for example `%Y` for the four-digit year.
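
The following sketch calls each of these functions on a single sample timestamp in an in-memory database. The timestamp and the `+1 day` modifier are illustrative choices only. The last column shows that a slash-separated date is not recognized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
stamp = "2024-08-01 09:30:15"  # sample value, not taken from the tutorial data

row = conn.execute(
    """
    SELECT date(?),
           time(?),
           datetime(?, '+1 day'),
           strftime('%Y-%m-%d %H', ?),
           date('8/1/2024')
    """,
    (stamp, stamp, stamp, stamp),
).fetchone()
conn.close()

print(row)
# ('2024-08-01', '09:30:15', '2024-08-02 09:30:15', '2024-08-01 09', None)
```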

**Example 1**: What is the status of participants who have left the country?

**Example 2**: What is the average age of participants who applied on the same day and in the same hour?

**Example 3**: How many participants left the country between February 1, 2021 and August 1, 2024?

**Example 4**: What is the status distribution of participants who applied on the same day?

**Example 5**: Extract the ID and age of participants who are still in progress and left the country between February 1, 2021 and August 1, 2024.
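
The date-range examples compare dates stored as text, which only works when the strings sort chronologically. The sketch below, using a made-up `demo` table with the same dates in both ISO and M/D/YYYY form, suggests why the columns need to be in YYYY-MM-DD form before `BETWEEN` is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: the same date written in ISO form and in M/D/YYYY form
conn.execute("CREATE TABLE demo (id INTEGER, iso_date TEXT, us_date TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        (1, "2019-03-05", "3/5/2019"),    # before the range
        (2, "2021-02-01", "2/1/2021"),    # first day of the range
        (3, "2022-11-20", "11/20/2022"),  # inside the range
        (4, "2024-08-02", "8/2/2024"),    # one day after the range
    ],
)

# ISO strings sort chronologically, so a text BETWEEN gives the right rows
iso_hits = [r[0] for r in conn.execute(
    "SELECT id FROM demo "
    "WHERE iso_date BETWEEN '2021-02-01' AND '2024-08-01' ORDER BY id"
)]

# M/D/YYYY strings are compared character by character, which is not chronological
us_hits = [r[0] for r in conn.execute(
    "SELECT id FROM demo "
    "WHERE us_date BETWEEN '2/1/2021' AND '8/1/2024' ORDER BY id"
)]
conn.close()

print(iso_hits)  # [2, 3]
print(us_hits)   # [1, 2]
```

With the M/D/YYYY strings, row 1 is wrongly included and row 3 is wrongly dropped.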

---

In [None]:
import pandas as pd
import sqlite3

db_path = './database/mmdt.db3'

In [None]:
# Preview of the reformatted dates; meaningful only while the column still holds M/D/YYYY values
query = """
WITH formatedDate AS(
        SELECT 
            ID, 
            substr(Date_leave_country, -4, 4) as year,        
            REPLACE(substr(Date_leave_country, -7, 2), '/','0') as day, 
            CASE
            WHEN  instr(Date_leave_country, '/') = 2 
            THEN  '0'||substr(Date_leave_country, 1, instr(Date_leave_country, '/')-1) 
            WHEN  instr(Date_leave_country, '/') = 3 
            THEN  substr(Date_leave_country, 1, instr(Date_leave_country, '/')-1) 
            END as month    
        FROM participants)
        
    SELECT 
        fd.ID, 
        fd.year ||'-' || fd.month || '-' || fd.day as date
    FROM formatedDate as fd;
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df


In [None]:
# Rewrites M/D/YYYY values as YYYY-MM-DD.
# The WHERE clause skips rows that no longer contain '/', so rerunning is harmless.
update_query = """
            UPDATE participants 
            SET Date_leave_country = 
            substr(Date_leave_country, -4, 4) ||'-' || 
            CASE
            WHEN  instr(Date_leave_country, '/') = 2 
            THEN  '0'||substr(Date_leave_country, 1, instr(Date_leave_country, '/')-1) 
            WHEN  instr(Date_leave_country, '/') = 3 
            THEN  substr(Date_leave_country, 1, instr(Date_leave_country, '/')-1) 
            END  || '-' || 
            REPLACE(substr(Date_leave_country, -7, 2), '/','0')
            WHERE Date_leave_country LIKE '%/%'
            """
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute(update_query)
conn.commit()
conn.close()
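
To check the string arithmetic without touching the real database, the same expression can be tried on a throwaway in-memory table. This is only a sketch: the `demo` table and its four sample dates are invented for illustration, and the `WHERE ... LIKE '%/%'` guard is an added assumption that makes a second run a no-op.

```python
import sqlite3

# Hypothetical sample rows in M/D/YYYY form (not taken from mmdt.db3)
samples = [(1, "8/1/2024"), (2, "12/25/2023"), (3, "2/14/2021"), (4, "10/5/2022")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (ID INTEGER, Date_leave_country TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", samples)

update_sql = """
    UPDATE demo
    SET Date_leave_country =
        substr(Date_leave_country, -4, 4) || '-' ||
        CASE
            WHEN instr(Date_leave_country, '/') = 2
            THEN '0' || substr(Date_leave_country, 1, 1)
            WHEN instr(Date_leave_country, '/') = 3
            THEN substr(Date_leave_country, 1, 2)
        END || '-' ||
        REPLACE(substr(Date_leave_country, -7, 2), '/', '0')
    WHERE Date_leave_country LIKE '%/%'  -- skip rows already converted
"""
select_sql = "SELECT Date_leave_country FROM demo ORDER BY ID"

conn.execute(update_sql)
first_pass = [r[0] for r in conn.execute(select_sql)]

conn.execute(update_sql)  # second run matches no rows
second_pass = [r[0] for r in conn.execute(select_sql)]
conn.close()

print(first_pass)
# ['2024-08-01', '2023-12-25', '2021-02-14', '2022-10-05']
```

The day is taken from a fixed offset from the end of the string: for a single-digit day that two-character slice starts with `/`, and `REPLACE` turns the slash into the leading zero.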

In [None]:
query = """
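        SELECT p.ID, s.status, p.Date_leave_country
        FROM participants as p
        LEFT JOIN status as s
        ON p.ID = s.PARTICIPANT_ID
        -- keep only participants with a recorded departure date
        WHERE p.Date_leave_country IS NOT NULL
        AND p.Date_leave_country != ''
        ;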
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df

In [None]:
# done only once before update
query = """
WITH formatedDate AS(
        SELECT 
            ID, Time,
            substr(Time, instr(Time, ' ')-4, 4) as year,        
            REPLACE(substr(Time, instr(Time, ' ')-7, 2), '/','0') as day, 
            CASE
            WHEN  instr(Time, '/') = 2 
            THEN  '0'||substr(Time, 1, instr(Time, '/')-1) 
            WHEN  instr(Time, '/') = 3 
            THEN  substr(Time, 1, instr(Time, '/')-1) 
            END as month,  
            
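            -- Zero-pad single-digit hours. The hour width is measured from the
            -- space, because the colon sits at a fixed offset from the end.
            CASE
            WHEN instr(substr(Time, instr(Time, ' ')+1), ':') = 2
            THEN '0'||substr(Time, instr(Time, ' ')+1)
            WHEN instr(substr(Time, instr(Time, ' ')+1), ':') = 3
            THEN substr(Time, instr(Time, ' ')+1)
            END as h_m_s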
        FROM participants)
        
    SELECT 
        fd.ID, fd.Time,
        fd.year ||'-' || fd.month || '-' || fd.day as date,
        fd.h_m_s
    FROM formatedDate as fd;
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df



In [None]:
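# Rewrites M/D/YYYY H:MM:SS values as YYYY-MM-DD HH:MM:SS.
# The WHERE clause skips rows that no longer contain '/', so rerunning is harmless.
update_query = """
            UPDATE participants 
            SET Time = 
            substr(Time, instr(Time, ' ')-4, 4) ||'-' || 
            CASE
            WHEN  instr(Time, '/') = 2 
            THEN  '0'||substr(Time, 1, instr(Time, '/')-1) 
            WHEN  instr(Time, '/') = 3 
            THEN  substr(Time, 1, instr(Time, '/')-1) 
            END  || '-' || 
            REPLACE(substr(Time, instr(Time, ' ')-7, 2), '/','0') || ' ' || 
            -- zero-pad single-digit hours, measured from the space
            CASE
            WHEN instr(substr(Time, instr(Time, ' ')+1), ':') = 2
            THEN '0'||substr(Time, instr(Time, ' ')+1)
            WHEN instr(substr(Time, instr(Time, ' ')+1), ':') = 3
            THEN substr(Time, instr(Time, ' ')+1)
            END
            WHERE Time LIKE '%/%'
            """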

conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute(update_query)
conn.commit()
conn.close()

In [None]:
query = """
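        SELECT 
        -- the first 13 characters are 'YYYY-MM-DD HH', i.e. day and hour;
        -- rows without a Time value fall back to the placeholder label
        COALESCE(substr(Time, 1,13),'2024-12-30') as date_hour, 
        COUNT(*) as num_applicants,
        -- age is approximated as 2024 minus the value in BOD
        ROUND(AVG(2024-BOD),2) as average_age        
        FROM participants
        GROUP BY date_hour;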
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df

In [None]:
query = """
        SELECT ID, Date_leave_country
        FROM participants
        WHERE Date_leave_country BETWEEN '2021-02-01' AND '2024-08-01';
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df

In [None]:
query = """
        SELECT 
        COALESCE(substr(Time, 1,10),'2024-07-25') as date_applied, 
        s.status AS Status,  -- explicit alias so the plot can rely on a 'Status' column
        COUNT(*) as num_applicants            
        FROM participants as p
        LEFT JOIN status as s
        ON p.ID = s.PARTICIPANT_ID
        GROUP BY date_applied, status;
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Calculate the total number of applicants per 'date_applied'
df['total_applicants'] = df.groupby('date_applied')['num_applicants'].transform('sum')

# Calculate the percentage of applicants for each 'Status' by dividing by total applicants
df['percentage'] = (df['num_applicants'] / df['total_applicants']) * 100

plt.figure(figsize = (12,6))
sns.barplot(data = df, x = 'date_applied', y = 'percentage', hue = 'Status')
plt.ylabel('percentage of applicants (%)')
plt.show()

In [None]:
query = """
        SELECT 
        ID,
        strftime('%Y', CURRENT_DATE)-BOD as age,
        s.status                 
        FROM participants as p
        LEFT JOIN status as s
        ON p.ID = s.PARTICIPANT_ID
        WHERE Date_leave_country BETWEEN '2021-02-01' AND '2024-08-01'
        AND s.status = 'In progress';
        """

df = pd.read_sql_query(query, f'sqlite:///{db_path}')
df