### Tutorial 2: Retrieving Data
This tutorial shows how to retrieve data from a database table with SQL queries.


#### Objectives
- Perform basic `SELECT` queries
- Filter data using the `WHERE` clause
- Retrieve specific columns
- Sort and limit results
- Use various operators to refine queries
---


In [1]:
import sqlite3
import pandas as pd

In this tutorial, I use `pandas` to query an SQLite database by passing each SQL string to `pd.read_sql_query`. The SQL syntax is the same whichever client runs it, so the queries carry over unchanged to other tools.
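To illustrate that the SQL itself does not depend on the client, here is a minimal sketch that runs a query through the standard-library `sqlite3` module instead of `pandas`. It assumes a throwaway in-memory database with a hypothetical `demo` table and made-up rows, not the tutorial's `mmdt.db3` file.

```python
import sqlite3

# Hypothetical in-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (ID TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("a1", "Myanmar"), ("a2", "Thailand"), ("a3", "Myanmar")],
)

# The query is plain SQL; the same string could be handed to pd.read_sql_query.
query = "SELECT ID FROM demo WHERE Country = 'Myanmar' ORDER BY ID"
rows = conn.execute(query).fetchall()  # list of tuples, one per matching row
print(rows)

conn.close()
```

The only difference from the `pandas` route is the shape of the result: a list of tuples here, a `DataFrame` there.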

#### Install Necessary Libraries
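Make sure the required libraries are installed. `sqlite3` ships with Python's standard library; `pandas` and `sqlalchemy` (needed because the connection is given as a `sqlite:///` URI) can be installed with:
```bash
pip install pandas sqlalchemy
```
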
#### Retrieve Data, Filter Query with WHERE, Aggregate Data with GROUP BY and ORDER BY
- **Q.1**: Count the number of participants in the `participants` table.
- **Q.2**: Retrieve the `ID`, `City`, and `State_Region` of all applicants who are currently in Myanmar.
- **Q.3**: Identify all the distinct values in the `Selected` column.
- **Q.4**: Retrieve the `ID` of participants who are substitutes (`Selected = 'waiting list'`) and are located in Myanmar.
- **Q.5**: Retrieve the `ID` of participants who did not provide `Gender` information.
- **Q.6**: Group the participants by `State_Region` and count the number of participants in each state or region. Sort the results in ascending order of the count.
- **Q.7**: Group the applicants by `Gender` and count the number of applicants in each group. Sort the results in descending order of the count.
- **Q.8**: Retrieve the `ID` of participants who are Gen Z, defined here as a year of birth (`BOD`) between 1997 and 2000.
- **Q.9**: Calculate the `age` of each participant from their year of birth and include it in the output.
- **Q.10**: Compute summary statistics (mean, minimum, maximum, and standard deviation) of `age`.

In [11]:
db_path = '../Projects/database/mmdt.db3'
query = "SELECT COUNT(*) FROM participants"

df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

| | COUNT(*) |
|---|---|
| 0 | 100 |
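The result column above carries the raw expression `COUNT(*)` as its name. A column alias gives it a friendlier label. The sketch below assumes a hypothetical three-row in-memory `participants` table and an illustrative alias, `n_participants`; neither comes from the tutorial's database.

```python
import sqlite3

# Hypothetical stand-in for the participants table (three made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT)")
conn.executemany("INSERT INTO participants VALUES (?)", [("p1",), ("p2",), ("p3",)])

# AS renames the result column; n_participants is an illustrative name.
cur = conn.execute("SELECT COUNT(*) AS n_participants FROM participants")
column_name = cur.description[0][0]  # name of the first result column
count = cur.fetchone()[0]            # the single aggregated value
print(column_name, count)

conn.close()
```

With `pd.read_sql_query`, the alias would become the DataFrame column name in the same way.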


In [12]:
query = "SELECT * FROM participants"

df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df.columns

Index(['ID', 'Time', 'Selected', 'BOD', 'City', 'State_Region', 'Country',
       'Date_Leave_Country', 'Gender', 'Current_Situation', 'Type_of_Internet',
       'Device_used', 'School_Name', 'Academic_career', 'Pre_Knowledge_Data',
       'Course_Wish_Join', 'Dedicate_Learning_Time',
       'Personal_Professional_Goals', 'Reason_Right_Person',
       'Personal_Professional_Challenges', 'Others'],
      dtype='object')
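`SELECT *` fetches every row just to read the column names. As a lighter-weight sketch, SQLite's `PRAGMA table_info` or a `LIMIT 0` query returns the same names without loading data. The table below is a hypothetical three-column stand-in, not the real `participants` schema.

```python
import sqlite3

# Hypothetical table with a reduced set of columns, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, City TEXT, Country TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
info = conn.execute("PRAGMA table_info(participants)").fetchall()
column_names = [row[1] for row in info]

# Alternative: LIMIT 0 returns no rows but still exposes column metadata.
cur = conn.execute("SELECT * FROM participants LIMIT 0")
names_from_cursor = [d[0] for d in cur.description]

print(column_names)
print(names_from_cursor)
conn.close()
```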

In [13]:
query = """
        SELECT ID, City, State_Region 
        FROM participants 
        WHERE Country = 'Myanmar';
        """

df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

| | ID | City | State_Region |
|---|---|---|---|
| 0 | mmdt2024.001 | Mandalay | Mandalay |
| 1 | mmdt2024.002 | Yangon | Yangon |
| 2 | mmdt2024.003 | Taungoo | Bago |
| 3 | mmdt2024.005 | Yangon | Yangon |
| 4 | mmdt2024.007 | Mandalay | Mandalay |
| ... | ... | ... | ... |
| 70 | mmdt2024.094 | Pyay | Bago |
| 71 | mmdt2024.095 | Myanmar | Mandalay |
| 72 | mmdt2024.096 | Pyin Oo Lwin | Yangon |
| 73 | mmdt2024.098 | Yangon | Yangon |
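The filter value above is written directly into the SQL string. When the value comes from a variable, a `?` placeholder keeps it separate from the query text. This is a minimal sketch using `sqlite3` with a hypothetical in-memory table and made-up rows; `pd.read_sql_query` accepts a `params` argument for the same purpose, though the placeholder style depends on the driver in use.

```python
import sqlite3

# Hypothetical rows; IDs and cities are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants (ID TEXT, City TEXT, State_Region TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?, ?)",
    [
        ("x01", "Yangon", "Yangon", "Myanmar"),
        ("x02", "Bangkok", "Bangkok", "Thailand"),
        ("x03", "Pyay", "Bago", "Myanmar"),
    ],
)

country = "Myanmar"
# The ? placeholder is bound to the value in the tuple at execution time.
rows = conn.execute(
    "SELECT ID, City, State_Region FROM participants WHERE Country = ? ORDER BY ID",
    (country,),
).fetchall()
print(rows)

conn.close()
```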


In [15]:
query = """
        SELECT DISTINCT Selected
        FROM participants;
        """

df = pd.read_sql_query(query, f"sqlite:///{db_path}")

df

| | Selected |
|---|---|
| 0 | Yes |
| 1 | waiting list |
| 2 | |
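Two details in this output matter for later filters: `waiting list` is stored in lowercase, and one distinct value is blank. The sketch below shows how SQLite treats both cases, assuming the blank is a `NULL` (the output alone does not confirm this) and using a hypothetical four-row table.

```python
import sqlite3

# Hypothetical rows mirroring the three distinct values shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (ID TEXT, Selected TEXT)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [("p1", "Yes"), ("p2", "waiting list"), ("p3", None), ("p4", "waiting list")],
)

def ids(where):
    # Helper for this demo only; the WHERE text is a fixed literal, not user input.
    sql = f"SELECT ID FROM participants WHERE {where} ORDER BY ID"
    return [r[0] for r in conn.execute(sql)]

exact = ids("Selected = 'Waiting List'")                  # default comparison is case-sensitive
nocase = ids("Selected = 'Waiting List' COLLATE NOCASE")  # case-insensitive comparison
eq_null = ids("Selected = NULL")                          # comparing with NULL never matches
is_null = ids("Selected IS NULL")                         # the correct test for missing values

print(exact, nocase, eq_null, is_null)
conn.close()
```

In short, string literals in a `WHERE` clause need to match the stored case (or use `COLLATE NOCASE`), and missing values need `IS NULL` rather than `=`.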
