# SQL Working with CASEs

In [None]:
!wget https://github.com/gt-cse-6040/bootcamp/raw/main/SQL/syllabus/NYC-311-2M_small.db

In [None]:
# create a connection to the database
import sqlite3 as db
import pandas as pd

# Connect to a database (or create one if it doesn't exist)
conn_nyc = db.connect('NYC-311-2M_small.db')

## CASEs
*    https://www.sqlite.org/lang_expr.html#the_case_expression

In SQLite, the `CASE` expression lets you recode values based on specific conditions. It works like an IF-THEN-ELSE structure in a programming language: the `WHEN` clauses are checked in order, and the result of the first one that matches is returned. With `CASE` you can create new columns, transform existing values, or recategorize data using conditional logic directly within your SQL queries.

Some use-cases include:
*  Conditional formatting: display different outputs based on conditions, such as categorizing employees into bonus tiers.
*  Data transformation: map data into different formats or categories, such as converting numeric status codes into human-readable text.
*  Complex filtering: perform complex conditional checks and manipulate data accordingly.
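As a small illustration of the data-transformation use case, the sketch below maps numeric status codes to text labels. The `tickets` table, its columns, and the code-to-label mapping are hypothetical and live in a throwaway in-memory database, not in the NYC 311 data.

```python
import sqlite3 as db

# Hypothetical demo table in an in-memory database (not the NYC 311 data)
conn_demo = db.connect(':memory:')
conn_demo.execute('CREATE TABLE tickets (id INTEGER, status_code INTEGER)')
conn_demo.executemany('INSERT INTO tickets VALUES (?, ?)',
                      [(1, 0), (2, 1), (3, 2), (4, 9)])

query = '''
    SELECT id
    , CASE status_code
        WHEN 0 THEN 'Open'
        WHEN 1 THEN 'In Progress'
        WHEN 2 THEN 'Closed'
        ELSE 'Unknown'
      END status_label
    FROM tickets
    ORDER BY id
'''
status_rows = conn_demo.execute(query).fetchall()
print(status_rows)
# [(1, 'Open'), (2, 'In Progress'), (3, 'Closed'), (4, 'Unknown')]
```

The code `9` has no matching `WHEN`, so it falls through to the `ELSE` branch.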


Syntax:
1. Simple CASE expression:

```sql 
CASE expression
    WHEN value1 THEN result1
    WHEN value2 THEN result2
    ...
    ELSE resultN
END
```
*  expression: The value or expression that is compared with each WHEN value.
*  value1, value2, ...: The values to compare against the expression.
*  result1, result2, ...: The corresponding result to return when a match is found.
*  ELSE resultN: The value returned if no WHEN conditions match (optional).
*  If no ELSE clause is specified and no conditions match, the result is NULL.


2. Searched CASE expression (more flexible):

```sql 
CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ...
    ELSE resultN
END
```
*  condition1, condition2, ...: The conditions to check.
*  result1, result2, ...: The corresponding results for each condition.
*  ELSE resultN: The value returned if no condition is met (optional).
*  If no conditions match and no ELSE clause is provided, it returns NULL.
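The searched form is not limited to equality tests. The sketch below assigns bonus tiers from range conditions and relies on the `WHEN` clauses being checked in order. The `employees` table, its values, and the tier thresholds are made up for illustration.

```python
import sqlite3 as db

# Hypothetical demo table in an in-memory database
conn_demo = db.connect(':memory:')
conn_demo.execute('CREATE TABLE employees (name TEXT, sales INTEGER)')
conn_demo.executemany('INSERT INTO employees VALUES (?, ?)',
                      [('Ana', 120), ('Ben', 75), ('Cy', 30), ('Di', None)])

query = '''
    SELECT name
    , CASE
        WHEN sales >= 100 THEN 'gold'    -- checked first
        WHEN sales >= 50  THEN 'silver'  -- only reached if sales < 100
        WHEN sales >= 0   THEN 'bronze'
      END bonus_tier                     -- no ELSE: unmatched rows get NULL
    FROM employees
    ORDER BY name
'''
tier_rows = conn_demo.execute(query).fetchall()
print(tier_rows)
# [('Ana', 'gold'), ('Ben', 'silver'), ('Cy', 'bronze'), ('Di', None)]
```

Ana satisfies all three conditions but gets `gold` because the first match wins. Di has a `NULL` sales value, so no comparison is true and the result is `NULL` (shown as `None` in Python).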

### Examples
1.  Simple CASE expression:

In [None]:
def case1_example() ->str:
    query = '''
                SELECT Agency
                , CASE Agency
                    WHEN 'HPD' THEN 1
                    WHEN 'NYPD' THEN 2
                    WHEN 'DOT' THEN 3
                    ELSE 0 
                  END top3
                FROM data
                LIMIT 20
                '''
    return query

# Store the result under a new name so the function case1_example stays callable
case1_df = pd.read_sql(case1_example(), conn_nyc)
display(case1_df)

2. Searched CASE expression: You can test more complex conditions, not just equality. The query below rewrites the previous example in searched form and returns the same result.

In [None]:
def case2_example() ->str:
    query = '''
                SELECT Agency
                , CASE
                    WHEN Agency='HPD' THEN 1
                    WHEN Agency='NYPD' THEN 2
                    WHEN Agency='DOT' THEN 3
                    ELSE 0 
                  END top3
                FROM data
                LIMIT 20
                '''
    return query

# Store the result under a new name so the function case2_example stays callable
case2_df = pd.read_sql(case2_example(), conn_nyc)
display(case2_df)

3. Nested CASE: You can nest `CASE` expressions for more complex logic. The inner `CASE` is evaluated only for rows that match the outer `WHEN` condition.

In [None]:
def case3_example() ->str:
    query = '''
                SELECT Agency
                , CASE
                    WHEN Agency LIKE '%PD' THEN
                        CASE
                            WHEN Agency='HPD' THEN 1
                            WHEN Agency='NYPD' THEN 2
                            ELSE 0
                        END
                    WHEN Agency='DOT' THEN 3
                    ELSE 0 
                  END top3
                FROM data
                LIMIT 20
                '''
    return query

# Store the result under a new name so the function case3_example stays callable
case3_df = pd.read_sql(case3_example(), conn_nyc)
display(case3_df)

4. CASE without ELSE: If you omit `ELSE`, the expression returns `NULL` when no conditions match.

In [None]:
def case4_example() ->str:
    query = '''
                SELECT Agency
                , CASE
                    WHEN Agency='HPD' THEN 1
                    WHEN Agency='NYPD' THEN 2
                    WHEN Agency='DOT' THEN 3
                    -- ELSE 0 
                  END top3
                FROM data
                LIMIT 20
                '''
    return query

# Store the result under a new name so the function case4_example stays callable
case4_df = pd.read_sql(case4_example(), conn_nyc)
display(case4_df)

## Aggregating CASEs
You can also combine any of the aggregate functions (e.g. `SUM()`, `MAX()`, `MIN()`) with a `CASE` expression to perform conditional aggregations. This is helpful when you want to aggregate only the values that meet specific criteria within your data.

Some use-cases include:
*  Conditional aggregation to sum values based on conditions, making it ideal for generating reports and performing complex aggregations.
*  Handling multiple conditions in one query, which would otherwise require multiple `SELECT` statements or complicated joins.
*  Flexibility, in that you can use `SUM(CASE ... END)` for various operations like summing based on date ranges, specific categories, thresholds, etc.
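To illustrate handling multiple conditions in one query, the sketch below counts requests per agency side by side for each city, using one `SUM(CASE ... END)` per output column. The `requests` table and its rows are hypothetical stand-ins for the `data` table.

```python
import sqlite3 as db

# Hypothetical demo table in an in-memory database
conn_demo = db.connect(':memory:')
conn_demo.execute('CREATE TABLE requests (City TEXT, Agency TEXT)')
conn_demo.executemany('INSERT INTO requests VALUES (?, ?)', [
    ('BROOKLYN', 'HPD'), ('BROOKLYN', 'NYPD'), ('BROOKLYN', 'HPD'),
    ('BRONX', 'NYPD'), ('BRONX', 'DOT'),
])

query = '''
    SELECT City
    , SUM(CASE WHEN Agency='HPD'  THEN 1 ELSE 0 END) countHPD
    , SUM(CASE WHEN Agency='NYPD' THEN 1 ELSE 0 END) countNYPD
    , SUM(CASE WHEN Agency='DOT'  THEN 1 ELSE 0 END) countDOT
    FROM requests
    GROUP BY City
    ORDER BY City
'''
pivot_rows = conn_demo.execute(query).fetchall()
print(pivot_rows)
# [('BRONX', 0, 1, 1), ('BROOKLYN', 2, 1, 0)]
```

Each conditional sum scans the same grouped rows, so a single pass produces all three counts without separate queries per agency.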

Syntax:

```sql 
SUM(
    CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ...
    ELSE resultN
    END
)
```
*  condition1, condition2, ...: The conditions to check for each row.
*  result1, result2, ...: The corresponding results to sum for each condition.
*  ELSE resultN: The value returned if no condition is met (optional); usually 0.
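The sketch below shows why `ELSE 0` is the usual choice. When no row matches, `SUM` over a `CASE` without `ELSE` only sees `NULL` values and returns `NULL`, while the version with `ELSE 0` returns 0. `COUNT` over the same `CASE` is a possible alternative, since it counts only non-`NULL` values. The `requests` table is hypothetical.

```python
import sqlite3 as db

# Hypothetical demo table with no 'HPD' rows at all
conn_demo = db.connect(':memory:')
conn_demo.execute('CREATE TABLE requests (Agency TEXT)')
conn_demo.executemany('INSERT INTO requests VALUES (?)',
                      [('NYPD',), ('DOT',), ('NYPD',)])

query = '''
    SELECT
        SUM(CASE WHEN Agency='HPD' THEN 1 ELSE 0 END) sum_with_else
      , SUM(CASE WHEN Agency='HPD' THEN 1 END) sum_without_else
      , COUNT(CASE WHEN Agency='HPD' THEN 1 END) count_without_else
    FROM requests
'''
else_row = conn_demo.execute(query).fetchone()
print(else_row)
# (0, None, 0)
```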

1. Conditional Sum: Count the number of times a particular occurrence happens. Dividing each conditional count by the total count gives an empirical [Probability Mass Function](https://en.wikipedia.org/wiki/Probability_mass_function).

In [None]:
def case5_example() ->str:
    query = '''
                SELECT
                    SUM(
                        CASE
                        WHEN Agency='HPD' THEN 1
                        ELSE 0
                        END
                        ) countHPD
                    , COUNT(*) total_count
                FROM data
                '''
    return query

# Store the result under a new name so the function case5_example stays callable
case5_df = pd.read_sql(case5_example(), conn_nyc)
display(case5_df)
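As a sketch of the probability mass function idea, each conditional count can be divided by `COUNT(*)`. Multiplying by `1.0` forces floating-point division, because SQLite would otherwise divide two integers and truncate the result. The `requests` table and the grouping into three categories are assumptions for illustration.

```python
import sqlite3 as db

# Hypothetical demo table in an in-memory database
conn_demo = db.connect(':memory:')
conn_demo.execute('CREATE TABLE requests (Agency TEXT)')
conn_demo.executemany('INSERT INTO requests VALUES (?)',
                      [('HPD',), ('HPD',), ('NYPD',), ('DOT',)])

query = '''
    SELECT
        1.0 * SUM(CASE WHEN Agency='HPD' THEN 1 ELSE 0 END) / COUNT(*) pHPD
      , 1.0 * SUM(CASE WHEN Agency='NYPD' THEN 1 ELSE 0 END) / COUNT(*) pNYPD
      , 1.0 * SUM(CASE WHEN Agency NOT IN ('HPD', 'NYPD') THEN 1 ELSE 0 END) / COUNT(*) pOther
    FROM requests
'''
pmf_row = conn_demo.execute(query).fetchone()
print(pmf_row)
# (0.5, 0.25, 0.25)
```

The three categories are mutually exclusive and cover every row, so the probabilities sum to 1.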

2. Conditional Max: Determine whether something exists for a particular group. The result is 1 if at least one row in the group meets the condition and 0 otherwise.

In [None]:
def case6_example() ->str:
    query = '''
                SELECT
                City
                ,MAX(
                    CASE
                        WHEN Agency='HPD' THEN 1
                        ELSE 0
                    END
                    ) HPDexists
                FROM data
                GROUP BY City
                ORDER BY 2 DESC
                '''
    return query

# Store the result under a new name so the function case6_example stays callable
case6_df = pd.read_sql(case6_example(), conn_nyc)
display(case6_df)
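A conditional `MIN` is a natural companion to the conditional `MAX`: it is 1 only when every row in the group meets the condition. The sketch below compares the two on a hypothetical `requests` table.

```python
import sqlite3 as db

# Hypothetical demo table in an in-memory database
conn_demo = db.connect(':memory:')
conn_demo.execute('CREATE TABLE requests (City TEXT, Agency TEXT)')
conn_demo.executemany('INSERT INTO requests VALUES (?, ?)', [
    ('BRONX', 'HPD'), ('BRONX', 'HPD'),
    ('BROOKLYN', 'HPD'), ('BROOKLYN', 'NYPD'),
    ('QUEENS', 'DOT'),
])

query = '''
    SELECT City
    , MAX(CASE WHEN Agency='HPD' THEN 1 ELSE 0 END) HPDexists  -- any row is HPD
    , MIN(CASE WHEN Agency='HPD' THEN 1 ELSE 0 END) HPDonly    -- all rows are HPD
    FROM requests
    GROUP BY City
    ORDER BY City
'''
flag_rows = conn_demo.execute(query).fetchall()
print(flag_rows)
# [('BRONX', 1, 1), ('BROOKLYN', 1, 0), ('QUEENS', 0, 0)]
```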

### Example Exercise
From the `data` table, for rows where `Agency='NYPD'`, use the column `ComplaintType` to count how often each complaint type occurs and label the noise and park complaints:
*  Columns
    *  `ComplaintType`
    *  `noiseParkComplaint`:
        *  if "Noise" is present in `ComplaintType`, then `Noise`
        *  elif "Park" is present in `ComplaintType`, then `Park`
        *  else `NULL`
    *  `countoccur`: count of occurrences
*  Sort
    *  `countoccur` in descending order

In [None]:
def casecomplaints() -> str:
  return """
  SELECT ComplaintType
          ,CASE
                WHEN ComplaintType LIKE '%Noise%' THEN 'Noise'
                WHEN ComplaintType LIKE '%Park%' THEN 'Park'
                ELSE NULL
            END noiseParkComplaint
          ,COUNT(*) countoccur
  FROM data
  WHERE Agency='NYPD'
  GROUP BY 1,2
  ORDER BY 3 DESC
  """
# Store the result under a new name so the function casecomplaints stays callable
casecomplaints_df = pd.read_sql(casecomplaints(), conn_nyc)
display(casecomplaints_df)