# 04. Case statement

A **CASE statement** lets us map one or more conditions to a corresponding value for each condition. You start a CASE statement with the keyword `CASE` and conclude it with `END`. Between those keywords, you specify each condition with `WHEN [condition] THEN [value]`, where you supply the `[condition]` and the corresponding `[value]`. After the condition–value pairs, you can add a catch-all value in an `ELSE` clause, which is returned if none of the conditions is met. If the `ELSE` clause is omitted and no condition is met, the result is NULL.


In [2]:
%load_ext sql
%config SqlMagic.autocommit=False
%config SqlMagic.autolimit=20
%config SqlMagic.displaylimit=20
%sql postgresql://user-pengfei:gv8eba5xmsw4kt2uk1mn@postgresql-124499/test

The query below shows how to categorize freight into price_level categories: any freight greater than 60 is 'Expansive', 40 to 60 is 'MODERATE', and less than 40 is 'Cheap'.

In [7]:
%%sql
select order_id, freight, 
case
   when freight > 60 then 'Expansive'
   when freight >=40 and freight <=60 then 'MODERATE'
   else 'Cheap'
end 
as price_level
from orders
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,freight,price_level
10248,32.38,Cheap
10249,11.61,Cheap
10250,65.83,Expansive
10251,41.34,MODERATE
10252,51.3,MODERATE


## 4.1 Order matters in case statement 

We can actually omit the `and freight <=60` condition in the second `when` clause, because **the database evaluates a CASE statement from top to bottom and uses the first condition that is true (it stops evaluating the subsequent conditions)**. So if we have a record with a freight value of 83, we can be certain it will be labeled 'Expansive'. Although 83 is also greater than 40, it will not be labeled 'MODERATE', because evaluation never reaches that clause.

As a result, the query below returns the same result as the one above.

In [8]:
%%sql
select order_id, freight, 
case
   when freight > 60 then 'Expansive'
   when freight >=40 then 'MODERATE'
   else 'Cheap'
end 
as price_level
from orders
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,freight,price_level
10248,32.38,Cheap
10249,11.61,Cheap
10250,65.83,Expansive
10251,41.34,MODERATE
10252,51.3,MODERATE
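To see why the order of the `when` clauses matters, here is a minimal sketch that runs the same CASE statement twice, once as written above and once with the two `when` clauses swapped. It uses Python's built-in `sqlite3` module and a tiny hypothetical `orders` table (three made-up rows), not the PostgreSQL database used in this notebook.

```python
import sqlite3

# Toy data (hypothetical values, not the real orders table).
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, freight real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 83.0), (2, 45.0), (3, 10.0)],
)

# Most restrictive condition first: 83 matches 'freight > 60' and evaluation stops there.
correct_order = conn.execute("""
    select order_id,
           case
               when freight > 60 then 'Expansive'
               when freight >= 40 then 'MODERATE'
               else 'Cheap'
           end as price_level
    from orders
    order by order_id
""").fetchall()

# Clauses swapped: 83 already satisfies 'freight >= 40', so 'Expansive' is never reached.
swapped_order = conn.execute("""
    select order_id,
           case
               when freight >= 40 then 'MODERATE'
               when freight > 60 then 'Expansive'
               else 'Cheap'
           end as price_level
    from orders
    order by order_id
""").fetchall()

print(correct_order)  # order 1 is labeled 'Expansive'
print(swapped_order)  # order 1 is labeled 'MODERATE'
```

When the shorter form of a condition is used, the more restrictive clause therefore has to come first.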


## 4.2 Group by the output of case operator

The output of a case statement can be used in the group by clause, because it is not an aggregated result. The queries below show an example of grouping by **price_level**.

How the case statement can be referenced in group by depends on the database server. Some servers accept the alias "price_level", and some accept **the position index of the case statement in the select list (note that the position index starts at 1)**. Others (e.g. SQL Server) accept neither, so the whole case expression has to be repeated in the group by clause.

PostgreSQL accepts both the alias and the position index.

In the query below, we use the alias "price_level" directly.

In [15]:
%%sql
select ship_via, 
case
   when freight > 60 then 'Expansive'
   when freight >=40 then 'MODERATE'
   else 'Cheap'
end 
as price_level
from orders
group by ship_via, price_level
order by ship_via
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


ship_via,price_level
1,Expansive
1,Cheap
1,MODERATE
2,MODERATE
2,Cheap


In the query below, we use the position index in the group by clause. It should return exactly the same result as the previous query.

In [18]:
%%sql
select ship_via, 
case
   when freight > 60 then 'Expansive'
   when freight >=40 then 'MODERATE'
   else 'Cheap'
end 
as price_level
from orders
group by ship_via, 2
order by ship_via
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


ship_via,price_level
1,Expansive
1,Cheap
1,MODERATE
2,MODERATE
2,Cheap
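The three ways of referring to a case statement in group by (alias, position index, repeated expression) can be compared side by side. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with a small hypothetical `orders` table, and the helper `grouped_counts` is a name invented for this example. A `count(*)` column is added so that the grouping does some actual work.

```python
import sqlite3

# Toy data (hypothetical values, not the real orders table).
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (ship_via integer, freight real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 70.0), (1, 65.0), (1, 45.0), (1, 10.0), (2, 50.0), (2, 5.0)],
)

CASE_EXPR = """case
        when freight > 60 then 'Expansive'
        when freight >= 40 then 'MODERATE'
        else 'Cheap'
    end"""


def grouped_counts(group_by_term):
    """Count orders per (ship_via, price_level), grouping by the given term."""
    query = f"""
        select ship_via, {CASE_EXPR} as price_level, count(*) as order_count
        from orders
        group by ship_via, {group_by_term}
        order by 1, 2
    """
    return conn.execute(query).fetchall()


by_alias = grouped_counts("price_level")   # reference by alias
by_position = grouped_counts("2")          # reference by position in the select list
by_expression = grouped_counts(CASE_EXPR)  # repeat the whole case expression

print(by_alias)
print(by_alias == by_position == by_expression)
```

SQLite happens to accept all three forms, like PostgreSQL. Repeating the expression is the most verbose form, but it is the one least likely to depend on the database server.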


## 4.3 Use column value in then

In the examples above, the `then` value was always a constant. We can also use a column value, or an expression built from one, in its place. Suppose the gain on freight differs between shipping companies (e.g. 1 -> 0.05, 2 -> 0.07, 3 -> 0.08).


In [22]:
%%sql
select order_id, ship_via, freight, 
case
  when ship_via = 1 then freight*0.05
  when ship_via = 2 then freight*0.07
  when ship_via = 3 then freight*0.08
end
as gain
from orders;

 * postgresql://user-pengfei:***@postgresql-124499/test
830 rows affected.


order_id,ship_via,freight,gain
10248,3,32.38,2.590400085449218
10249,1,11.61,0.5804999828338623
10250,2,65.83,4.608100128173828
10251,1,41.34,2.0670000076293946
10252,2,51.3,3.5909999465942386
10253,2,58.17,4.071899871826172
10254,2,22.98,1.608599967956543
10255,3,148.33,11.866400146484375
10256,2,13.97,0.9779000186920168
10257,3,81.91,6.55280029296875
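The query above has no `else` clause, so any row whose `ship_via` matches none of the `when` conditions gets NULL as its gain. The sketch below illustrates this with Python's built-in `sqlite3` module and a hypothetical three-row table in which `ship_via = 4` is a made-up value that has no matching clause. The `gain_or_zero` column is an assumption added for comparison, not part of the original query.

```python
import sqlite3

# Toy data (hypothetical values); ship_via 4 matches none of the when clauses.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, ship_via integer, freight real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, 1, 100.0), (2, 3, 100.0), (3, 4, 100.0)],
)

rows = conn.execute("""
    select order_id,
           case
               when ship_via = 1 then freight * 0.05
               when ship_via = 2 then freight * 0.07
               when ship_via = 3 then freight * 0.08
           end as gain,
           case
               when ship_via = 1 then freight * 0.05
               when ship_via = 2 then freight * 0.07
               when ship_via = 3 then freight * 0.08
               else 0
           end as gain_or_zero
    from orders
    order by order_id
""").fetchall()

for row in rows:
    print(row)  # the gain of order 3 is None (NULL); its gain_or_zero is 0
```

Whether NULL or 0 is the better fallback depends on how the column is used afterwards.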


## 4.4 The “Zero/Null” CASE Trick

The “zero/null” CASE trick allows you to apply different “filters” to different aggregate values, all in a single SELECT query. For instance, suppose you want to aggregate the gain of each shipping company for each year.

If you filter with a where clause, you need three select statements and two inner joins to get the result. The query below shows an example.

In [36]:
%%sql

with gain_1 as (
select extract(year from shipped_date) as ship_year,
sum(freight*0.05) as ship_1_gain
from orders
where ship_via=1
group by extract(year from shipped_date)),

gain_2 as (
select extract(year from shipped_date) as ship_year,
sum(freight*0.07) as ship_2_gain
from orders
where ship_via=2
group by extract(year from shipped_date)),

gain_3 as (
select extract(year from shipped_date) as ship_year,
sum(freight*0.08) as ship_3_gain
from orders
where ship_via=3
group by extract(year from shipped_date))

select g1.ship_year as year, ship_1_gain, ship_2_gain, ship_3_gain 
from gain_1 as g1
join gain_2 as g2
on g1.ship_year=g2.ship_year
join gain_3 as g3
on g1.ship_year=g3.ship_year;

 * postgresql://user-pengfei:***@postgresql-124499/test
3 rows affected.


year,ship_1_gain,ship_2_gain,ship_3_gain
1996.0,102.63749929927287,262.4069010525942,293.40159952878946
1998.0,258.1645004957914,842.2848009056969,410.08239974975584
1997.0,440.9559995688498,824.2814997290077,925.5640021991732


With the “zero/null” CASE trick, we can turn the query above into a much simpler one.

In [38]:
%%sql

select extract(year from shipped_date) as year,
sum(case when ship_via=1 then freight*0.05 else 0 end ) as ship_1_gain,
sum(case when ship_via=2 then freight*0.07 else 0 end ) as ship_2_gain,
sum(case when ship_via=3 then freight*0.08 else 0 end ) as ship_3_gain
from orders
where shipped_date is not null
group by extract(year from shipped_date);

 * postgresql://user-pengfei:***@postgresql-124499/test
3 rows affected.


year,ship_1_gain,ship_2_gain,ship_3_gain
1996.0,102.63749929927287,262.4069010525942,293.40159952878946
1998.0,258.1645004957914,842.2848009056969,410.08239974975584
1997.0,440.9559995688498,824.2814997290077,925.5640021991732
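The two approaches return the same rows here because every shipping company has orders in every year. That is not guaranteed in general. The sketch below compares both approaches on a hypothetical table where shipper 2 has no orders in 1997. It uses Python's built-in `sqlite3` module and made-up data. To keep the numbers exact it uses only two shippers, a precomputed `ship_year` column, and plain freight totals instead of gains.

```python
import sqlite3

# Toy data (hypothetical values); shipper 2 has no orders in 1997.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (ship_year integer, ship_via integer, freight integer)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(1996, 1, 10), (1996, 2, 20), (1997, 1, 30)],
)

# One filtered select per shipper, combined with an inner join.
joined = conn.execute("""
    with total_1 as (
        select ship_year, sum(freight) as ship_1_total
        from orders where ship_via = 1 group by ship_year),
    total_2 as (
        select ship_year, sum(freight) as ship_2_total
        from orders where ship_via = 2 group by ship_year)
    select t1.ship_year, ship_1_total, ship_2_total
    from total_1 as t1
    join total_2 as t2 on t1.ship_year = t2.ship_year
    order by t1.ship_year
""").fetchall()

# Zero/null CASE trick: a single pass over the table.
single_pass = conn.execute("""
    select ship_year,
           sum(case when ship_via = 1 then freight else 0 end) as ship_1_total,
           sum(case when ship_via = 2 then freight else 0 end) as ship_2_total
    from orders
    group by ship_year
    order by ship_year
""").fetchall()

print(joined)       # 1997 is missing: the inner join drops it
print(single_pass)  # 1997 is kept, with 0 for shipper 2
```

In this sketch the inner join silently drops 1997, while the CASE version keeps the year and reports 0 for the shipper without orders.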


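Note that this case trick also works with the other aggregation operators, such as min, max, count, and avg. For these, return null instead of 0 for the rows to exclude (for example, by omitting the else clause): aggregate functions ignore null values, whereas a 0 would still be counted or compared.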
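The fallback value matters once the aggregate is something other than sum. Here is a minimal sketch, assuming a hypothetical three-row table and Python's built-in `sqlite3` module, that computes the minimum and average freight of shipper 1 with both fallbacks.

```python
import sqlite3

# Toy data (hypothetical values): shipper 1 has freights 10 and 30, shipper 2 has 50.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (ship_via integer, freight integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 10), (1, 30), (2, 50)],
)

min_zero, min_null, avg_zero, avg_null = conn.execute("""
    select
        min(case when ship_via = 1 then freight else 0 end),  -- 0 from shipper 2 wins
        min(case when ship_via = 1 then freight end),         -- NULL is ignored
        avg(case when ship_via = 1 then freight else 0 end),  -- averages over 3 rows
        avg(case when ship_via = 1 then freight end)          -- averages over 2 rows
    from orders
""").fetchone()

print(min_zero, min_null)
print(avg_zero, avg_null)
```

With `else 0`, the row of shipper 2 still takes part in the aggregate, so the minimum becomes 0 and the average is taken over all three rows. Without an else clause, that row yields NULL and is skipped.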

## 4.5 Multiple boolean condition in the case operator

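You can use any **Boolean expression (single or multiple) in a CASE statement, including functions and `AND`, `OR`, and `NOT` operators**. The following query is meant to count, for each shipping company, the orders that were shipped to France. Note, however, that `count` counts every non-null value, including the 0 produced by `else 0`. As written, each column therefore returns the total number of orders per `ship_via`, which is why the three columns in the output are identical. To count only the matching rows, drop the `else 0` branch so that non-matching rows yield null, or use `sum` instead of `count`.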


In [40]:
%%sql
select ship_via, 
count(case when (ship_via = 1) and (ship_country='France') then 1 else 0 end) as ship_1_count,
count(case when (ship_via = 2) and (ship_country='France') then 1 else 0 end) as ship_2_count,
count(case when (ship_via = 3) and (ship_country='France') then 1 else 0 end) as ship_3_count
from orders
group by ship_via;

 * postgresql://user-pengfei:***@postgresql-124499/test
3 rows affected.


ship_via,ship_1_count,ship_2_count,ship_3_count
1,249,249,249
3,255,255,255
2,326,326,326
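The effect of `else 0` inside `count` can be reproduced on a small scale. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module and a hypothetical five-row `orders` table, and it compares `count` with `else 0`, `count` without an else clause, and `sum` with `else 0`.

```python
import sqlite3

# Toy data (hypothetical values): shipper 1 has two French orders, shipper 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (ship_via integer, ship_country text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, "France"), (1, "Germany"), (1, "France"), (2, "France"), (2, "Brazil")],
)

all_rows, ship_1_france, ship_2_france = conn.execute("""
    select
        -- else 0 is non-null, so count() counts every row
        count(case when (ship_via = 1) and (ship_country = 'France') then 1 else 0 end),
        -- no else clause: non-matching rows are null and are not counted
        count(case when (ship_via = 1) and (ship_country = 'France') then 1 end),
        -- sum() with else 0 also counts only the matching rows
        sum(case when (ship_via = 2) and (ship_country = 'France') then 1 else 0 end)
    from orders
""").fetchone()

print(all_rows, ship_1_france, ship_2_france)  # 5 2 1
```

The first column is simply the number of rows in the table, while the other two give the counts that the compound condition actually selects.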
