# Advanced Query Techniques of CASE and Subquery
The **CASE** expression evaluates a list of conditions in order and returns the result expression of the first condition that is true. It is similar to the IF-THEN-ELSE statement in other programming languages. You can use a CASE expression anywhere a valid expression is accepted: in clauses such as SELECT, WHERE, HAVING, and ORDER BY, inside an IN list, and in statements such as SELECT, UPDATE, and DELETE.
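As a minimal sketch of a searched CASE expression, the cell below classifies three countries by population. It assumes an in-memory SQLite database instead of the MySQL server used in the rest of this notebook, so that it runs on its own; the `SizeClass` label and the two thresholds are hypothetical and chosen only for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; this CASE syntax is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("Aruba", 103000), ("Afghanistan", 22720000), ("Anguilla", 8000)],
)

# Conditions are tested top to bottom; the first one that is true decides the result.
case_rows = conn.execute("""
    SELECT Name,
           CASE
               WHEN Population >= 10000000 THEN 'large'   -- hypothetical threshold
               WHEN Population >= 100000   THEN 'medium'  -- hypothetical threshold
               ELSE 'small'
           END AS SizeClass
    FROM country
    ORDER BY Name
""").fetchall()

print(case_rows)
```

Because the branches are checked in order, a population of 22720000 is labeled 'large' even though it also satisfies the second condition.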

A **subquery**, simply put, is a query written as a part of a bigger statement. Think of it as a SELECT statement inside another one. The result of the inner SELECT can then be used in the outer query.
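The two most common placements of a subquery can be sketched as follows: a derived table in the FROM clause and a scalar subquery in the WHERE clause. The example assumes an in-memory SQLite database rather than this notebook's MySQL server, and the alias `big` and the one-million cutoff are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the cell is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [
        ("Aruba", 103000),
        ("Afghanistan", 22720000),
        ("Angola", 12878000),
        ("Anguilla", 8000),
        ("Albania", 3401200),
    ],
)

# Derived table: the inner SELECT runs first and the outer query reads its result.
derived = conn.execute("""
    SELECT big.Name
    FROM (SELECT Name, Population
          FROM country
          WHERE Population > 1000000) AS big  -- hypothetical cutoff
    ORDER BY big.Population DESC
""").fetchall()

# Scalar subquery: the inner SELECT returns one value used in the comparison.
above_avg = conn.execute("""
    SELECT Name
    FROM country
    WHERE Population > (SELECT AVG(Population) FROM country)
    ORDER BY Name
""").fetchall()

print(derived)
print(above_avg)
```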

In this notebook, we combine the two techniques to assign each country in the world.country table to the decade of its independence year and then average population over those groups.

In [15]:
import mysql.connector as sql
import pandas as pd
import os

In [16]:
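# Read the connection settings from environment variables
# so that credentials are not stored in the notebook.
connection = sql.connect(
    host=os.environ.get('mysql_host'),
    user=os.environ.get('mysql_user'),
    password=os.environ.get('mysql_password')
)

cursor = connection.cursor()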

In [17]:
pd.read_sql_query("""
    SHOW TABLES
    FROM world""",
    connection)

Unnamed: 0,Tables_in_world
0,city
1,country
2,countrylanguage


In [18]:
pd.read_sql_query("""
    DESCRIBE world.country""",
    connection)

Unnamed: 0,Field,Type,Null,Key,Default,Extra
0,Code,b'char(3)',NO,PRI,b'',
1,Name,b'char(52)',NO,,b'',
2,Continent,"b""enum('Asia','Europe','North America','Africa...",NO,,b'Asia',
3,Region,b'char(26)',NO,,b'',
4,SurfaceArea,"b'float(10,2)'",NO,,b'0.00',
5,IndepYear,b'smallint',YES,,,
6,Population,b'int',NO,,b'0',
7,LifeExpectancy,"b'float(3,1)'",YES,,,
8,GNP,"b'float(10,2)'",YES,,,
9,GNPOld,"b'float(10,2)'",YES,,,


## 1. Check the country table

In [19]:
pd.read_sql_query("""
    SELECT *
    FROM world.country
    LIMIT 5
    """,
    connection)

Unnamed: 0,Code,Name,Continent,Region,SurfaceArea,IndepYear,Population,LifeExpectancy,GNP,GNPOld,LocalName,GovernmentForm,HeadOfState,Capital,Code2
0,ABW,Aruba,North America,Caribbean,193.0,,103000,78.4,828.0,793.0,Aruba,Nonmetropolitan Territory of The Netherlands,Beatrix,129,AW
1,AFG,Afghanistan,Asia,Southern and Central Asia,652090.0,1919.0,22720000,45.9,5976.0,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
2,AGO,Angola,Africa,Central Africa,1246700.0,1975.0,12878000,38.3,6648.0,7984.0,Angola,Republic,José Eduardo dos Santos,56,AO
3,AIA,Anguilla,North America,Caribbean,96.0,,8000,76.1,63.2,,Anguilla,Dependent Territory of the UK,Elisabeth II,62,AI
4,ALB,Albania,Europe,Southern Europe,28748.0,1912.0,3401200,71.6,3205.0,2500.0,Shqipëria,Republic,Rexhep Mejdani,34,AL


## 2. Calculate decade averages
There are two key steps:

1. Use CASE inside a subquery to convert independence years into named decades;
2. Calculate the decade mean with an aggregate function over groups.

We also use the BETWEEN operator, whose bounds are inclusive, to map each year to its decade.
Only the 20th century is covered; any other year, or a missing one, is labeled 'Other'.
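The boundary behavior of BETWEEN inside CASE can be checked with a few literal years. This is a sketch under the assumption that an in-memory SQLite database behaves like MySQL for this expression; the inline `years` table is hypothetical, and its NULL row mimics a country with no IndepYear.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; no real table is needed here.
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    SELECT y,
           CASE
               WHEN y BETWEEN 1910 AND 1919 THEN '10s'
               ELSE 'Other'
           END AS Decade
    FROM (SELECT 1909 AS y
          UNION ALL SELECT 1910
          UNION ALL SELECT 1919
          UNION ALL SELECT 1920
          UNION ALL SELECT NULL) AS years
""").fetchall()

# Map each probed year to its label so the result does not depend on row order.
decade_by_year = dict(rows)
print(decade_by_year)
```

Both 1910 and 1919 receive the label '10s' because the bounds are inclusive. The NULL year falls through to ELSE, since a comparison with NULL is never true.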

In [37]:
pd.read_sql_query("""
    SELECT c.Name, c.Continent, c.Region, c.Decade, AVG(c.Population)
    FROM (
        -- Subquery: label each country's independence year with a decade
        SELECT w.Name, w.Continent, w.Region,
            CASE
                WHEN w.IndepYear BETWEEN 1900 AND 1909 THEN '00s'
                WHEN w.IndepYear BETWEEN 1910 AND 1919 THEN '10s'
                WHEN w.IndepYear BETWEEN 1920 AND 1929 THEN '20s'
                WHEN w.IndepYear BETWEEN 1930 AND 1939 THEN '30s'
                WHEN w.IndepYear BETWEEN 1940 AND 1949 THEN '40s'
                WHEN w.IndepYear BETWEEN 1950 AND 1959 THEN '50s'
                WHEN w.IndepYear BETWEEN 1960 AND 1969 THEN '60s'
                WHEN w.IndepYear BETWEEN 1970 AND 1979 THEN '70s'
                WHEN w.IndepYear BETWEEN 1980 AND 1989 THEN '80s'
                WHEN w.IndepYear BETWEEN 1990 AND 1999 THEN '90s'
                ELSE 'Other'  -- years outside 1900-1999 and NULL years
            END AS Decade,
            w.Population
        FROM world.country w
    ) c
    -- Outer query: average the population within each group
    GROUP BY c.Name, c.Continent, c.Region, c.Decade
    """,
    connection)

Unnamed: 0,Name,Continent,Region,Decade,AVG(c.Population)
0,Aruba,North America,Caribbean,Other,103000.0
1,Afghanistan,Asia,Southern and Central Asia,10s,22720000.0
2,Angola,Africa,Central Africa,70s,12878000.0
3,Anguilla,North America,Caribbean,Other,8000.0
4,Albania,Europe,Southern Europe,10s,3401200.0
...,...,...,...,...,...
234,Yemen,Asia,Middle East,10s,18112000.0
235,Yugoslavia,Europe,Southern Europe,10s,10640000.0
236,South Africa,Africa,Southern Africa,10s,40377000.0
237,Zambia,Africa,Eastern Africa,60s,9169000.0
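Because Name is part of the GROUP BY key in the query above, each group appears to contain a single country, so AVG returns that country's own population. A minimal sketch of grouping by Decade alone follows. It assumes an in-memory SQLite table (not this notebook's MySQL server) that holds only the five sample rows shown in section 1, and it keeps just the two CASE branches those rows need; the aliases `MeanPopulation` and `Countries` are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite with the five sample rows from section 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country (Name TEXT, IndepYear INTEGER, Population INTEGER)"
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [
        ("Aruba", None, 103000),
        ("Afghanistan", 1919, 22720000),
        ("Angola", 1975, 12878000),
        ("Anguilla", None, 8000),
        ("Albania", 1912, 3401200),
    ],
)

decade_means = conn.execute("""
    SELECT c.Decade,
           AVG(c.Population) AS MeanPopulation,
           COUNT(*) AS Countries
    FROM (
        -- Only the branches needed by the sample rows are kept here.
        SELECT CASE
                   WHEN IndepYear BETWEEN 1910 AND 1919 THEN '10s'
                   WHEN IndepYear BETWEEN 1970 AND 1979 THEN '70s'
                   ELSE 'Other'
               END AS Decade,
               Population
        FROM country
    ) AS c
    GROUP BY c.Decade      -- one group per decade, not per country
    ORDER BY c.Decade
""").fetchall()

print(decade_means)
```

With Decade as the only grouping column, Afghanistan and Albania share the '10s' group and their populations are averaged together. Adding Continent or Region back to both the SELECT list and the GROUP BY would give a mean per decade within each continent or region.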


## Summary
Sometimes we need to construct complicated queries that go beyond a table join or a basic SELECT query. For example, we might need to write a query that uses the results of other queries as inputs (i.e., a subquery). Or we might need to reclassify numerical values into categories before counting them (i.e., CASE).

In this notebook, we used CASE with BETWEEN inside a subquery to label each row with a decade, then aggregated over those labels in the outer query. Subqueries can be placed in several parts of a statement to filter or preprocess data before the main query analyzes it.

## References
- [Chonghua Yin notebook](https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook/blob/master/ex09-Advanced%20Query%20Techniques%20of%20CASE%20and%20Subquery.ipynb)