##### Tutorial 2: Retrieving Data
This tutorial will teach you how to retrieve data. You will learn how to:


######## Objectives
- Perform basic `SELECT` queries
- Filter data using the `WHERE` clause
- Retrieve specific columns
- Sort and limit results
- Use various operators to refine queries
---


In [1]:
import sqlite3
import pandas as pd

db_path = './database/mmdt.db3'


In this tutorial, I use `pandas` to read an SQLite database. The SQL syntax of the queries is the same whichever client you use to run them.
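
To illustrate that the SQL itself does not depend on the client, here is a minimal sketch that runs a `SELECT` through Python's built-in `sqlite3` module instead of `pandas`. It assumes a throwaway in-memory database with a made-up `participants` table; the rows are placeholders, not the real `mmdt.db3` data.

```python
import sqlite3

# Assumption: an in-memory table standing in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [("p001", "Myanmar"), ("p002", "Outside Myanmar"), ("p003", "Myanmar")],
)

# The same SQL string could be handed to pd.read_sql_query unchanged.
query = "SELECT count(*) FROM participants;"
total = conn.execute(query).fetchone()[0]
print(total)

conn.close()
```

Only the way the result comes back differs: `sqlite3` returns tuples, while `pandas` wraps the same rows in a DataFrame.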

######## Install Necessary Libraries
Ensure you have the required libraries installed. Use the following commands to install them if needed:
```bash
pip install pandas sqlalchemy
```

######## Retrieve Data, Filter Query with WHERE, Aggregate Data with GROUP BY and ORDER BY
- **Q.1**: Count the number of `participants`.  
- **Q.2**: Retrieve the `ID`, `City`, and `State_Region` of all applicants who are currently in Myanmar.
- **Q.3**: Identify all the distinct values in the `Selected` column.
- **Q.4**: Retrieve the `ID` of participants who are substitutes (`Selected = 'waiting list'`) and are located in Myanmar.
- **Q.5**: Retrieve the `ID` of participants who did not provide `Gender` information.
- **Q.6**: Group the participants by `State_Region` and count the number of participants in each state. Sort the results in descending order based on the count.
- **Q.7**: Group the applicants by `Gender` and count the number of applicants in each group. Sort the results in descending order based on the count.
- **Q.8**: Retrieve the `ID` of participants who are Gen Z, defined here as having a year of birth (`BOD`) between 1997 and 2000.
- **Q.9**: Calculate the age of each participant from their year of birth and include the result in the output.
- **Q.10**: Compute summary statistics (such as the mean, minimum, maximum, and standard deviation) of age.

In [2]:
db_path = './database/mmdt.db3'

# Q.1: count all rows in the participants table
query = "SELECT count(*) FROM participants;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df


Unnamed: 0,count(*)
0,100


In [3]:
# Q.2: applicants located in Myanmar (LIMIT 10 shows only the first ten rows)
query = "SELECT ID, City, State_Region, Country FROM participants WHERE Country LIKE 'Myanmar%' LIMIT 10;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

Unnamed: 0,ID,City,State_Region,Country
0,mmdt2024.001,Mandalay,Mandalay,Myanmar
1,mmdt2024.002,Yangon,Yangon,Myanmar
2,mmdt2024.003,Taungoo,Bago,Myanmar
3,mmdt2024.005,Yangon,Yangon,Myanmar
4,mmdt2024.007,Mandalay,Mandalay,Myanmar
5,mmdt2024.009,Taunggyi,Others,Myanmar
6,mmdt2024.012,Yangon,Yangon,Myanmar
7,mmdt2024.014,Mandalay,Mandalay,Myanmar
8,mmdt2024.015,Yangon,Yangon,Myanmar
9,mmdt2024.016,Yangon,Yangon,Myanmar
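
When the filter value comes from a variable rather than being typed into the query, it is generally safer to bind it as a parameter than to build the SQL string by hand. The sketch below shows this with `sqlite3` and `?` placeholders on a made-up in-memory table (the rows are illustrative assumptions, not the tutorial's data).

```python
import sqlite3

# Assumption: a small stand-in table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?)",
    [
        ("p001", "Mandalay", "Myanmar"),
        ("p002", "Bangkok", "Outside Myanmar"),
        ("p003", "Yangon", "Myanmar"),
    ],
)

# Each '?' is filled in by the driver, so values are never pasted into the SQL text.
query = "SELECT ID, City FROM participants WHERE Country LIKE ? ORDER BY ID LIMIT ?;"
rows = conn.execute(query, ("Myanmar%", 10)).fetchall()
print(rows)

conn.close()
```

`pd.read_sql_query` has a `params` argument for the same purpose; the placeholder style it expects depends on the underlying driver.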


In [4]:
# Q.3: distinct values of the Selected column
query = "SELECT DISTINCT selected FROM participants;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

Unnamed: 0,Selected
0,Yes
1,waiting list
2,


In [5]:
# Participants not marked 'Yes'; NULL is tested separately because NOT LIKE never matches NULL
query = "SELECT ID, country FROM participants WHERE selected NOT LIKE '%Yes%' OR selected IS NULL;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

Unnamed: 0,ID,Country
0,mmdt2024.006,Outside Myanmar
1,mmdt2024.031,Outside Myanmar
2,mmdt2024.071,Myanmar
3,mmdt2024.082,
4,mmdt2024.084,Outside Myanmar
5,mmdt2024.085,
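
The `OR selected IS NULL` part of the query above matters because a comparison against `NULL` evaluates to `NULL` rather than true, so `NOT LIKE` on its own silently drops rows with a missing value. Here is a minimal sketch of the difference, using a made-up three-row table (an assumption for illustration only).

```python
import sqlite3

# Assumption: invented rows covering the three cases seen in the Selected column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, Selected TEXT)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [("p001", "Yes"), ("p002", "waiting list"), ("p003", None)],
)

# NOT LIKE alone: the NULL row is filtered out.
without_null = conn.execute(
    "SELECT ID FROM participants WHERE Selected NOT LIKE '%Yes%' ORDER BY ID;"
).fetchall()

# Adding IS NULL brings the missing-value row back.
with_null = conn.execute(
    "SELECT ID FROM participants "
    "WHERE Selected NOT LIKE '%Yes%' OR Selected IS NULL ORDER BY ID;"
).fetchall()

print(without_null)
print(with_null)
conn.close()
```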


In [6]:
# Q.4: waiting-list participants located in Myanmar
query = "SELECT ID, country FROM participants WHERE selected LIKE '%waiting list%' AND country = 'Myanmar';"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

Unnamed: 0,ID,Country
0,mmdt2024.071,Myanmar


In [7]:
# Q.5: participants with no Gender value
query = "SELECT ID, gender FROM participants WHERE gender IS NULL;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

Unnamed: 0,ID,Gender
0,mmdt2024.082,
1,mmdt2024.085,


In [8]:
# Q.6: participants per State_Region, largest group first
query = "SELECT state_region, count(*) as number FROM participants GROUP BY state_region ORDER BY number DESC;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

Unnamed: 0,State_Region,number
0,Yangon,50
1,Others,22
2,Mandalay,17
3,Shan,2
4,Bago,2
5,,2
6,Rakhine,1
7,Nay Pyi Taw,1
8,Mon,1
9,Kayin,1


In [9]:
# Q.7: participants per Gender, largest group first
query = "SELECT gender, count(*) as number FROM participants GROUP BY gender ORDER BY number DESC;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

Unnamed: 0,Gender,number
0,Female,73
1,Male,23
2,,2
3,male,1
4,Man,1
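
The result above splits what is presumably one group across `Male`, `male`, and `Man`. One way to merge such spellings is to normalise the value inside the query with a `CASE` expression. The sketch below does this on a made-up table; the mapping of spellings to labels is an assumption about the data, and `gender_clean` is a hypothetical alias.

```python
import sqlite3

# Assumption: invented rows that mimic the inconsistent spellings seen above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, Gender TEXT)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [
        ("p001", "Female"), ("p002", "Female"), ("p003", "Female"),
        ("p004", "Female"), ("p005", "Male"), ("p006", "male"),
        ("p007", "Man"), ("p008", None),
    ],
)

# Map known spellings to one label; anything else (including NULL) is left as-is.
query = """
SELECT CASE
         WHEN lower(trim(Gender)) IN ('male', 'man') THEN 'Male'
         WHEN lower(trim(Gender)) IN ('female', 'woman') THEN 'Female'
         ELSE Gender
       END AS gender_clean,
       count(*) AS number
FROM participants
GROUP BY gender_clean
ORDER BY number DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)

conn.close()
```

Cleaning the values at query time leaves the stored data untouched, which is convenient for exploration; fixing the source table would be the more durable option.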


In [10]:
# Q.8: Gen Z participants (BETWEEN is inclusive at both ends)
query = "SELECT ID, BOD as Year_born FROM participants WHERE BOD BETWEEN 1997 AND 2000;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

Unnamed: 0,ID,Year_born
0,mmdt2024.002,1999.0
1,mmdt2024.004,1997.0
2,mmdt2024.005,1999.0
3,mmdt2024.008,1998.0
4,mmdt2024.013,1998.0
5,mmdt2024.015,1998.0
6,mmdt2024.016,2000.0
7,mmdt2024.018,1999.0
8,mmdt2024.020,1997.0
9,mmdt2024.021,1999.0


In [11]:
# Q.9: age as of 2024, the year hard-coded in the query
query = "SELECT ID, 2024-BOD as Age FROM participants;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

Unnamed: 0,ID,Age
0,mmdt2024.001,32.0
1,mmdt2024.002,25.0
2,mmdt2024.003,38.0
3,mmdt2024.004,27.0
4,mmdt2024.005,25.0
...,...,...
95,mmdt2024.096,25.0
96,mmdt2024.097,24.0
97,mmdt2024.098,27.0
98,mmdt2024.099,25.0
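
The query above fixes the reference year at 2024. If ages should instead follow the current date, SQLite's `strftime('%Y', 'now')` can supply the year at query time. This is a minimal sketch on a made-up table (the rows are assumptions); note that `'now'` is evaluated in UTC.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: invented birth years, including one missing value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, BOD REAL)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [("p001", 1992.0), ("p002", 1999.0), ("p003", None)],
)

# strftime returns text, so cast it to an integer before subtracting.
query = """
SELECT ID,
       CAST(strftime('%Y', 'now') AS INTEGER) - BOD AS Age
FROM participants
ORDER BY ID;
"""
rows = conn.execute(query).fetchall()
print(rows)

# Reference value computed in Python for comparison.
this_year = datetime.now(timezone.utc).year
conn.close()
```

A missing `BOD` yields a `NULL` age, since arithmetic on `NULL` stays `NULL`.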


In [12]:
# Q.10: mean, maximum, and minimum age as of 2024
query = "SELECT AVG(2024-BOD) as Mean_Age, MAX(2024-BOD) as Max_Age, MIN(2024-BOD) as Min_Age FROM participants;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

Unnamed: 0,Mean_Age,Max_Age,Min_Age
0,19.765306,47.0,-959.0
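
The `Min_Age` of -959 above points to at least one implausible `BOD` value (a birth year of 2983 would produce it), which also pulls the mean down. Q.10 additionally asks for the standard deviation, which SQLite's built-in aggregate functions do not include. The sketch below addresses both under stated assumptions: it keeps only birth years between 1900 and 2024 (an arbitrary plausibility range) and registers a hypothetical `stddev` aggregate through `sqlite3`'s `create_aggregate`, backed by `statistics.stdev`. The table rows are made up.

```python
import sqlite3
import statistics


class StdDev:
    """Sample standard deviation as a user-defined SQLite aggregate."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row; skip NULLs as the built-in aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # A sample standard deviation needs at least two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("stddev", 1, StdDev)  # hypothetical name, one argument

# Assumption: invented rows, including one bad year and one missing value.
conn.execute("CREATE TABLE participants (ID TEXT, BOD REAL)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [("p001", 1992.0), ("p002", 1999.0), ("p003", 2983.0),
     ("p004", None), ("p005", 1996.0)],
)

query = """
SELECT AVG(2024 - BOD)    AS Mean_Age,
       MIN(2024 - BOD)    AS Min_Age,
       MAX(2024 - BOD)    AS Max_Age,
       stddev(2024 - BOD) AS Std_Age
FROM participants
WHERE BOD BETWEEN 1900 AND 2024;
"""
mean_age, min_age, max_age, std_age = conn.execute(query).fetchone()
print(mean_age, min_age, max_age, std_age)

conn.close()
```

An alternative is to load the ages into `pandas` and call `describe()` on the column; the SQL route keeps the filtering and the statistics in a single query.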
