# CASE Statements

A CASE statement lets us map one or more conditions to a corresponding value for each condition. You start a CASE statement with the word CASE and conclude it with END. Between those keywords, you specify each condition-value pair with WHEN [condition] THEN [value].

After specifying the condition-value pairs, you can provide a catch-all value to default to if none of the conditions were met, which is specified in the ELSE.

In [1]:
#!pip install ipython-sql
!git clone https://github.com/thomasnield/oreilly_getting_started_with_sql.git
%load_ext sql
%sql sqlite:///oreilly_getting_started_with_sql/weather_stations.db

Cloning into 'oreilly_getting_started_with_sql'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 60 (delta 0), reused 0 (delta 0), pack-reused 57
Unpacking objects: 100% (60/60), done.


'Connected: @oreilly_getting_started_with_sql/weather_stations.db'

In [8]:
%%sql

SELECT report_code, year, month, day, wind_speed,

CASE
  WHEN wind_speed >= 40 THEN 'HIGH'
  WHEN wind_speed >= 30 AND wind_speed < 40 THEN 'MODERATE'
  ELSE 'LOW'
END as wind_severity
FROM station_data
LIMIT 10;

 * sqlite:///oreilly_getting_started_with_sql/weather_stations.db
Done.


report_code,year,month,day,wind_speed,wind_severity
34DDA7,2002,12,21,0.2,LOW
39537B,1998,10,1,6.7,LOW
C3C6D5,2001,5,18,4.3,LOW
145150,2007,10,14,2.5,LOW
EF616A,1967,7,29,1.2,LOW
1F8A7B,1953,6,18,3.6,LOW
D028D8,1981,6,27,3.0,LOW
C74611,1978,2,5,13.3,LOW
737090,1962,8,14,5.1,LOW
C5C66E,2006,10,15,1.7,LOW
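The WHEN branches of a CASE are tested from top to bottom, and the first one that matches supplies the value. The sketch below illustrates this with Python's built-in `sqlite3` module and a small in-memory table; the table name `readings` and its three rows are made up for illustration and are not part of `weather_stations.db`. It also shows what happens when the ELSE is left out.

```python
import sqlite3

# Hypothetical in-memory table standing in for station_data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (wind_speed REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(45.0,), (32.5,), (6.7,)])

# Branches are checked in order and the first match wins, so the second
# branch does not need an explicit "AND wind_speed < 40".
with_else = conn.execute("""
    SELECT wind_speed,
           CASE
             WHEN wind_speed >= 40 THEN 'HIGH'
             WHEN wind_speed >= 30 THEN 'MODERATE'
             ELSE 'LOW'
           END AS wind_severity
    FROM readings
    ORDER BY wind_speed DESC
""").fetchall()

# With no ELSE, rows that match no branch get NULL (None in Python).
without_else = conn.execute("""
    SELECT CASE WHEN wind_speed >= 40 THEN 'HIGH' END
    FROM readings
    ORDER BY wind_speed DESC
""").fetchall()

print(with_else)     # [(45.0, 'HIGH'), (32.5, 'MODERATE'), (6.7, 'LOW')]
print(without_else)  # [('HIGH',), (None,), (None,)]
conn.close()
```

Because of this ordering, the `AND wind_speed < 40` in the query above is harmless but redundant: any row reaching the second branch has already failed `wind_speed >= 40`.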


## Grouping CASE Statements

When you create CASE statements and group them, you can create some very powerful transformations. Converting values based on one or more conditions before aggregating them gives us even more possibilities to slice data in interesting ways.

In [9]:
%%sql

SELECT year,

CASE
  WHEN wind_speed >= 40 THEN 'HIGH'
  WHEN wind_speed >= 30 THEN 'MODERATE'
  ELSE 'LOW'
END as wind_severity,

COUNT(*) as record_count

FROM station_data
GROUP BY 1, 2

LIMIT 10;

 * sqlite:///oreilly_getting_started_with_sql/weather_stations.db
Done.


year,wind_severity,record_count
1930,LOW,5
1932,LOW,3
1933,LOW,6
1935,LOW,2
1936,LOW,18
1937,LOW,23
1938,LOW,13
1939,LOW,9
1940,LOW,26
1941,LOW,42
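To see the grouping produce more than one category per year, here is a minimal sketch on a hand-made table, again using Python's `sqlite3` with an in-memory database. The `readings` table and its five rows are assumptions chosen so that every severity label appears. `GROUP BY 1, 2` refers to the first and second columns of the SELECT list by position, so the CASE expression does not have to be repeated.

```python
import sqlite3

# Hypothetical data: three readings in 2000, two in 2001.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (year INTEGER, wind_speed REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(2000, 45.0), (2000, 5.0), (2000, 8.0), (2001, 33.0), (2001, 31.0)],
)

# Group on the year and on the label computed by the CASE expression.
severity_counts = conn.execute("""
    SELECT year,
           CASE
             WHEN wind_speed >= 40 THEN 'HIGH'
             WHEN wind_speed >= 30 THEN 'MODERATE'
             ELSE 'LOW'
           END AS wind_severity,
           COUNT(*) AS record_count
    FROM readings
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()

for row in severity_counts:
    print(row)
# (2000, 'HIGH', 1)
# (2000, 'LOW', 2)
# (2001, 'MODERATE', 2)
conn.close()
```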


## The "Zero/Null" CASE Trick

You can use tricks with the CASE statement. One simple but useful pattern is the "zero/null" CASE trick. This allows you to apply different filters for different aggregate values, all in a single SELECT query.

In [10]:
%%sql

SELECT year, month,

round(SUM(CASE WHEN tornado = 1 THEN precipitation ELSE 0 END),2) as tornado_precipitation,

round(SUM(CASE WHEN tornado = 0 THEN precipitation ELSE 0 END),2) as non_tornado_precipitation

FROM station_data
GROUP BY year, month
LIMIT 10;

 * sqlite:///oreilly_getting_started_with_sql/weather_stations.db
Done.


year,month,tornado_precipitation,non_tornado_precipitation
1930,6,0.0,0.0
1930,10,0.0,
1932,3,0.0,0.0
1933,3,0.0,0.0
1933,7,0.0,
1935,7,0.0,0.0
1936,8,0.0,0.64
1936,9,0.0,0.0
1936,10,0.0,0.27
1936,11,0.0,0.06


The CASE statement can do an impressive amount of work, especially in complex aggregation tasks. By leveraging a condition to make a value 0 if the condition is not met, we effectively ignore that value and exclude it from the SUM (since adding 0 has no impact).
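As a rough check of this idea, the sketch below compares the single-query CASE version with a separate query that filters in WHERE. It uses Python's `sqlite3` and a made-up in-memory `readings` table (the table name and rows are assumptions, with values picked to be exactly representable as floats).

```python
import sqlite3

# Hypothetical data: one tornado reading in 2000, none in 2001.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (year INTEGER, tornado INTEGER, precipitation REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(2000, 1, 0.5), (2000, 0, 0.25), (2000, 0, 0.25), (2001, 0, 1.0)],
)

# One pass over the table: each SUM sees 0 for rows that fail its condition.
one_pass = conn.execute("""
    SELECT year,
           SUM(CASE WHEN tornado = 1 THEN precipitation ELSE 0 END),
           SUM(CASE WHEN tornado = 0 THEN precipitation ELSE 0 END)
    FROM readings
    GROUP BY year
    ORDER BY year
""").fetchall()

# The WHERE-based alternative needs one query per filter.
tornado_only = conn.execute("""
    SELECT year, SUM(precipitation)
    FROM readings
    WHERE tornado = 1
    GROUP BY year
    ORDER BY year
""").fetchall()

print(one_pass)
print(tornado_only)
conn.close()
```

In this sketch the tornado total for 2000 is 0.5 in both results, but `tornado_only` has no row for 2001 at all, because WHERE removes every 2001 record before grouping. The CASE version keeps the year and reports a total of 0 for it.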

You could do a similar calculation with MIN or MAX operations, and use a null instead of 0 to make sure values with certain conditions are never considered:

In [20]:
%%sql
SELECT year,

MAX(CASE WHEN tornado = 0 THEN precipitation ELSE NULL END) as max_non_tornado_precipitation,

MAX(CASE WHEN tornado = 1 THEN precipitation ELSE NULL END) as max_tornado_precipitation
FROM station_data
WHERE year >= 1990
GROUP BY year
LIMIT 10;

 * sqlite:///oreilly_getting_started_with_sql/weather_stations.db
Done.


year,max_non_tornado_precipitation,max_tornado_precipitation
1990,2.48,0.59
1991,2.36,1.93
1992,1.5,1.51
1993,1.18,2.13
1994,1.26,1.16
1995,0.91,0.35
1996,3.31,0.68
1997,1.18,0.08
1998,1.22,0.2
1999,2.64,0.25


Just like the WHERE clause, you can use any Boolean expression in a CASE statement, including functions and the AND, OR, and NOT operators. The following query finds the average temperature by month when rain/hail was present versus not present after the year 2000:

In [22]:
%%sql

SELECT month, 

round(AVG(CASE WHEN rain OR hail THEN temperature ELSE null END),2) as avg_precipitation_temp,

round(AVG(CASE WHEN NOT (rain OR hail) THEN temperature ELSE null END),2) as avg_non_precipitation_temp

FROM station_data
WHERE year > 2000
GROUP BY month
LIMIT 10;

 * sqlite:///oreilly_getting_started_with_sql/weather_stations.db
Done.


month,avg_precipitation_temp,avg_non_precipitation_temp
1,35.62,41.79
2,33.8,38.9
3,46.61,49.23
4,49.03,52.33
5,55.9,58.91
6,55.4,64.85
7,66.98,70.02
8,66.68,67.89
9,60.66,62.4
10,53.01,56.36
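With AVG, the choice of NULL matters for the denominator as well: a row turned into NULL is left out of the count, while a row turned into 0 would still be counted and drag the average down. The sketch below illustrates this with Python's `sqlite3`; the `readings` table, its 0/1 `rain` and `hail` flags, and the four rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical data: two readings with rain or hail, two without.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (rain INTEGER, hail INTEGER, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 0, 40.0), (0, 1, 50.0), (0, 0, 70.0), (0, 0, 80.0)],
)

avg_wet, avg_dry, avg_wet_with_zero = conn.execute("""
    SELECT AVG(CASE WHEN rain OR hail THEN temperature ELSE NULL END),
           AVG(CASE WHEN NOT (rain OR hail) THEN temperature ELSE NULL END),
           AVG(CASE WHEN rain OR hail THEN temperature ELSE 0 END)
    FROM readings
""").fetchone()

print(avg_wet)            # 45.0, averaged over the 2 matching rows
print(avg_dry)            # 75.0, averaged over the other 2 rows
print(avg_wet_with_zero)  # 22.5, the same sum divided by all 4 rows
conn.close()
```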
