-
Notifications
You must be signed in to change notification settings - Fork 2
/
09.1 Window LAG, COVID 19.sql
135 lines (102 loc) · 4.06 KB
/
09.1 Window LAG, COVID 19.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
/*
Window LAG - COVID-19 Data
Notes on the data: This data was assembled based on work done by Rodrigo Pombo based on John Hopkins University, based on World Health Organisation.
The data was assembled 21st April 2020 - there are no plans to keep this data set up to date.
*/
/*
#1) Introducing the covid table
The example uses a WHERE clause to show the cases in 'Italy' in March.
Q. Modify the query to show data from Spain.
*/
SELECT name, DAY(whn) day_, confirmed, deaths, recovered
FROM covid
WHERE (name = 'Spain') AND (MONTH(whn) = 3)
ORDER BY whn;
/*
#2) Introducing the LAG function
The LAG function is used to show data from the preceding row or the table.
When lining up rows the data is partitioned by country name and ordered by the data whn.
That means that only data from Italy is considered.
Q. Modify the query to show confirmed for the day before.
*/
SELECT name, DAY(whn) day_, confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) confirmed_yday
FROM covid
WHERE (name = 'Italy') AND (MONTH(whn) = 3)
ORDER BY whn;
/*
#3) Number of new cases
The number of confirmed case is cumulative - but we can use LAG to recover the number of new cases reported for each day.
Q. Show the number of new cases for each day, for Italy, for March.
*/
SELECT name, DAY(whn) day_,
(confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)) AS day_count
FROM covid
WHERE (name = 'Italy') AND (MONTH(whn) = 3)
ORDER BY whn;
/*
#4) Weekly changes
The data gathered are necessarily estimates and are inaccurate.
However by taking a longer time span we can mitigate some of the effects.
You can filter the data to view only Monday's figures WHERE WEEKDAY(whn) = 0.
Q. Show the number of new cases in Italy for each week - show Monday only.
*/
SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date_,
(confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)) AS week_count
FROM covid
WHERE (name = 'Italy') AND (WEEKDAY(whn) = 0)
ORDER BY whn;
/*
#5) LAG using a JOIN
You can JOIN a table using DATE arithmetic.
This will give different results if data is missing.
Q. Show the number of new cases in Italy for each week - show Monday only.
*/
SELECT tw.name, DATE_FORMAT(tw.whn,'%Y-%m-%d') date_,
(tw.confirmed - lw.confirmed) week_count
FROM covid tw LEFT JOIN covid lw ON
(DATE_ADD(lw.whn, INTERVAL 1 WEEK) = tw.whn) AND (tw.name = lw.name)
WHERE (tw.name = 'Italy') AND (WEEKDAY(tw.whn) = 0)
ORDER BY tw.whn;
/*
#6) RANK()
The query shown shows the number of confirmed cases together with the world ranking for cases.
United States has the highest number, Spain is number 2...
Notice that while Spain has the second highest confirmed cases, Italy has the second highest number of deaths due to the virus.
Q. Include the ranking for the number of deaths in the table.
*/
SELECT name, confirmed,
RANK() OVER (ORDER BY confirmed DESC) rnk_confirmed,
deaths,
RANK() OVER (ORDER BY deaths DESC) rnk_deaths
FROM covid
WHERE whn = '2020-04-20'
ORDER BY confirmed DESC;
/*
#7) Infection rate
The query shown includes a JOIN to the world table so we can access the total population of each country and calculate infection rates (in cases per 100,000).
Q. Show the infect rate ranking for each country.
Only include countries with a population of at least 10 million.
*/
SELECT world.name,
ROUND(100000*confirmed/population,0) rate_per_100K,
RANK() OVER (ORDER BY rate_per_100K DESC) rnk
FROM covid JOIN world ON covid.name = world.name
WHERE (whn = '2020-04-20') AND (population > 10000000)
ORDER BY population DESC;
/*
#8) Turning the corner
Q. For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.
*/
WITH temp1 AS (
SELECT *, (confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)) day_count
FROM covid
), temp2 AS (
SELECT name, MAX(day_count) peak_cases
FROM temp1
GROUP BY name
HAVING peak_cases > 1000
)
SELECT temp2.name, DATE_FORMAT(whn, '%Y-%m-%d') date_, peak_cases
FROM temp2 LEFT JOIN temp1 ON (temp2.name = temp1.name) AND (temp2.peak_cases = temp1.day_count)
ORDER BY date_;