---
title: "Preparing the data"
author: "Robert Schlegel"
date: "2020-02-25"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
csl: FMars.csl
bibliography: MHWflux.bib
---
```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
                      echo = TRUE, warning = FALSE, message = FALSE,
                      eval = TRUE, tidy = FALSE)
```
<!-- Look at the NOAA SST instead of GLORYS -->
<!-- Compare the GLORYS and OISST MHW results -->
## Introduction
Much of the code in this vignette is taken entirely or partially from the [study area prep](https://robwschlegel.github.io/MHWNWA/polygon-prep.html), the [MHW prep](https://robwschlegel.github.io/MHWNWA/sst-prep.html), and the [gridded data prep](https://robwschlegel.github.io/MHWNWA/var-prep.html) vignettes from the drivers of MHWs in the [NW Atlantic project](https://robwschlegel.github.io/MHWNWA/index.html). Because this process has already been established, we put it all together here in one streamlined vignette.
All of the libraries and functions used in this vignette, and in the project more broadly, may be found [here](https://github.com/robwschlegel/MHWflux/blob/master/code/functions.R).
```{r startup}
# Get everything up and running in one go
source("code/functions.R")
library(ggpubr)
library(gridExtra)
# NB: This package was removed from CRAN :(
# It may be downloaded manually at: https://cran.r-project.org/src/contrib/Archive/SDMTools/
# library(SDMTools) # For finding points within polygons
```
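Since SDMTools is no longer on CRAN, an equivalent point-in-polygon test is available via `sp::point.in.polygon()`. Below is a minimal sketch (assuming the sp package is installed) with a hypothetical unit-square polygon, showing the return codes it uses:

```{r pnt-in-poly-alt}
# Sketch: point-in-polygon without SDMTools, using the sp package instead
library(sp)

# A hypothetical unit-square polygon and two test points
poly_x <- c(0, 1, 1, 0)
poly_y <- c(0, 0, 1, 1)
pnt_x <- c(0.5, 2)
pnt_y <- c(0.5, 2)

# Returns 0 = outside, 1 = inside, 2 = on an edge, 3 = on a vertex
inside <- sp::point.in.polygon(pnt_x, pnt_y, poly_x, poly_y)
inside
```

Filtering a grid to the points flagged `1` reproduces what `pnt.in.poly()` did with its `pip` column.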
## Study area
A reminder of what the study area looks like. It has been cut into 6 regions, adapted from work by @Richaud2016.
```{r region-fig}
frame_base +
  geom_polygon(data = NWA_coords, alpha = 0.7, size = 2,
               aes(fill = region, colour = region)) +
  geom_polygon(data = map_base, aes(group = group))
```
## Pixels per region
For this study it was decided to use the higher resolution 1/12th degree GLORYS data. This means we must re-calculate which pixels fall within which region so that we can later create the average SST time series per region, as well as the other averaged heat flux term time series.
```{r grid-points, eval=FALSE}
# Load one GLORYS file to extract the lon/lat coords
GLORYS_files <- dir("../data/GLORYS", full.names = T, pattern = "MHWflux")
GLORYS_grid <- tidync(GLORYS_files[1]) %>%
hyper_tibble() %>%
dplyr::rename(lon = longitude, lat = latitude) %>%
dplyr::select(lon, lat) %>%
unique()
# Load one ERA5 file to get the lon/lat coords
ERA5_files <- dir("../../oliver/data/ERA/ERA5/LWR", full.names = T, pattern = "ERA5")
ERA5_grid <- tidync(ERA5_files[1]) %>%
hyper_filter(latitude = dplyr::between(latitude, min(NWA_coords$lat), max(NWA_coords$lat)),
longitude = dplyr::between(longitude, min(NWA_coords$lon)+360, max(NWA_coords$lon)+360),
time = index == 1) %>%
hyper_tibble() %>%
dplyr::rename(lon = longitude, lat = latitude) %>%
dplyr::select(lon, lat) %>%
unique() %>%
mutate(lon = lon-360)
# Function for finding and cleaning up points within a given region polygon
pnts_in_region <- function(region_in, product_grid){
region_sub <- NWA_coords %>%
filter(region == region_in)
coords_in <- pnt.in.poly(pnts = product_grid[,c("lon", "lat")], poly.pnts = region_sub[,c("lon", "lat")]) %>%
filter(pip == 1) %>%
dplyr::select(-pip) %>%
mutate(region = region_in)
return(coords_in)
}
# Run the function
GLORYS_regions <- plyr::ldply(unique(NWA_coords$region), pnts_in_region,
.parallel = T, product_grid = GLORYS_grid)
saveRDS(GLORYS_regions, "data/GLORYS_regions.Rda")
ERA5_regions <- plyr::ldply(unique(NWA_coords$region), pnts_in_region,
.parallel = T, product_grid = ERA5_grid)
saveRDS(ERA5_regions, "data/ERA5_regions.Rda")
```
```{r grid-points-visual}
GLORYS_regions <- readRDS("data/GLORYS_regions.Rda")
ERA5_regions <- readRDS("data/ERA5_regions.Rda")
# Combine for visual
both_regions <- rbind(GLORYS_regions, ERA5_regions) %>%
mutate(product = c(rep("GLORYS", nrow(GLORYS_regions)),
rep("ERA5", nrow(ERA5_regions))))
# Visualise to ensure success
ggplot(NWA_coords, aes(x = lon, y = lat)) +
# geom_polygon(aes(fill = region), alpha = 0.2) +
geom_point(data = both_regions, aes(colour = region)) +
geom_polygon(data = map_base, aes(group = group), show.legend = F) +
coord_cartesian(xlim = NWA_corners[1:2],
ylim = NWA_corners[3:4]) +
labs(x = NULL, y = NULL) +
facet_wrap(~product)
```
## Average time series per region
With our pixels per region sorted we may now go about creating the average time series for each region from the GLORYS and ERA5 data. First we will load a brick of the data constrained roughly to the study area into memory before assigning the correct pixels to their regions. Once the pixels are assigned we will summarise them into one mean time series per variable per region. These mean time series are what the rest of the analyses will depend on.
The code for loading and processing the GLORYS data.
```{r GLORYS-prep, eval=FALSE}
# Set number of cores
# NB: This is very RAM heavy, be careful with core use
doParallel::registerDoParallel(cores = 25)
# The GLORYS file location
GLORYS_files <- dir("../data/GLORYS", full.names = T, pattern = "MHWflux")
system.time(
GLORYS_all_ts <- load_all_GLORYS_region(GLORYS_files) %>%
dplyr::arrange(region, t) %>%
mutate(cur_spd = round(sqrt(u^2 + v^2), 2),
cur_dir = round((270-(atan2(v, u)*(180/pi)))%%360))
) # 187 seconds on 25 cores
saveRDS(GLORYS_all_ts, "data/GLORYS_all_ts.Rda")
```
The code for the ERA5 data. NB: The ERA5 data are on an hourly 0.25x0.25 spatiotemporal grid. This loading process constrains them to a daily 0.25x0.25 grid.
```{r ERA5-prep, eval=FALSE}
# See the code/workflow script for the code used for ERA5 data prep
# There is too much code to run from an RMarkdown document
```
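The hourly-to-daily step in that script amounts to averaging each variable within calendar days. A minimal sketch of that aggregation, using made-up values for one pixel of one variable (the `msnlwrf` name matches the ERA5 long-wave field used below; everything else here is illustrative):

```{r era5-daily-sketch}
# Sketch of the hourly-to-daily aggregation applied to the ERA5 variables
# (illustrative only; the real prep lives in the code/workflow script)
library(dplyr)

# Hypothetical hourly time series for one pixel: two full days of data
hourly <- data.frame(
  time = seq(as.POSIXct("1993-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 48),
  msnlwrf = rnorm(48)
)

# Collapse to daily means, putting ERA5 on the same daily step as GLORYS
daily <- hourly %>%
  mutate(t = as.Date(time)) %>%
  group_by(t) %>%
  summarise(msnlwrf = mean(msnlwrf), .groups = "drop")
nrow(daily)
```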
## MHWs per region
We will be using the SST values from GLORYS to calculate the MHWs, applying the standard Hobday definition with a base period of 1993-01-01 to 2018-12-25. The final year is of uneven length because the data do not quite extend to the end of December. It was decided that the increased climatological accuracy gained by including 2018 outweighed the drawback of a climatology period that excludes the last few days of winter.
```{r MHW-calc, eval=FALSE}
# Load the data
GLORYS_all_ts <- readRDS("data/GLORYS_all_ts.Rda")
# Calculate the MHWs
GLORYS_region_MHW <- GLORYS_all_ts %>%
dplyr::select(region:temp) %>%
group_by(region) %>%
nest() %>%
mutate(clims = map(data, ts2clm,
climatologyPeriod = c("1993-01-01", "2018-12-25")),
events = map(clims, detect_event),
cats = map(events, category, S = FALSE)) %>%
select(-data, -clims)
# Save
saveRDS(GLORYS_region_MHW, "data/GLORYS_region_MHW.Rda")
saveRDS(GLORYS_region_MHW, "shiny/GLORYS_region_MHW.Rda")
```
Ke pointed out, however, that it may be better to use the NOAA OISST data. The reasoning is that, because we are not fully closing the heat budget with GLORYS, there is no particular benefit to using the SST from that modelled ensemble product; the remotely sensed NOAA OISST product would be preferable as a more direct measure of the surface temperature of the ocean. Then again, there is also a benefit to using only two products instead of three, particularly considering that all of the marine variables used here come from GLORYS. To that end, the GLORYS and OISST MHWs must be compared to see if they are markedly different. If not, we will use the GLORYS SST data.
```{r GLORYS-OISST-comp}
# Load the MHW calculations from the NOAA OISST data
OISST_region_MHW <- readRDS("../MHWNWA/data/OISST_region_MHW.Rda")
# Load the GLORYS MHW data
GLORYS_region_MHW <- readRDS("data/GLORYS_region_MHW.Rda")
# Extract the time series
OISST_MHW_clim <- OISST_region_MHW %>%
select(-cats) %>%
unnest(events) %>%
filter(row_number() %% 2 == 1) %>%
unnest(events) %>%
mutate(product = "OISST")
GLORYS_MHW_clim <- GLORYS_region_MHW %>%
select(-cats) %>%
unnest(events) %>%
filter(row_number() %% 2 == 1) %>%
unnest(events) %>%
mutate(product = "GLORYS")
MHW_clim <- rbind(OISST_MHW_clim, GLORYS_MHW_clim) %>%
mutate(anom = temp-seas)
# Extract the events
OISST_MHW_event <- OISST_region_MHW %>%
select(-cats) %>%
unnest(events) %>%
filter(row_number() %% 2 == 0) %>%
unnest(events) %>%
mutate(product = "OISST")
GLORYS_MHW_event <- GLORYS_region_MHW %>%
select(-cats) %>%
unnest(events) %>%
filter(row_number() %% 2 == 0) %>%
unnest(events) %>%
mutate(product = "GLORYS")
MHW_event <- rbind(OISST_MHW_event, GLORYS_MHW_event) %>%
mutate(month_peak = lubridate::month(date_peak, label = T),
season = case_when(month_peak %in% c("Jan", "Feb", "Mar") ~ "Winter",
month_peak %in% c("Apr", "May", "Jun") ~ "Spring",
month_peak %in% c("Jul", "Aug", "Sep") ~ "Summer",
month_peak %in% c("Oct", "Nov", "Dec") ~ "Autumn"),
season = factor(season, levels = c("Spring", "Summer", "Autumn", "Winter"))) %>%
select(-month_peak)
# Compare time series
MHW_clim_wide <- MHW_clim %>%
dplyr::select(product, region, doy, t, anom) %>%
pivot_wider(names_from = product, values_from = anom) %>%
mutate(t_diff = OISST-GLORYS)
MHW_clim_wide_monthly <- MHW_clim_wide %>%
mutate(t = round_date(t, unit = "month")) %>%
group_by(region, t) %>%
summarise(t_diff = mean(t_diff, na.rm = T))
# Plot regional anomaly comparison
ts_comp <- ggplot(data = MHW_clim_wide, aes(x = t, y = t_diff)) +
geom_line(aes(colour = region), alpha = 0.5, show.legend = F) +
geom_line(data = MHW_clim_wide_monthly, show.legend = F,
aes(colour = region)) +
geom_smooth(method = "lm", show.legend = F) +
facet_wrap(~region) +
labs(x = "Date", y = "OISST anom. - GLORYS anom.",
title = "Daily anomaly comparisons",
subtitle = paste0("Faint line shows daily differences, bold line shows monthly.",
"\nStraight blue line shows linear trend in daily differences."))
# Plot the comparison of the seasonal and threshold signals
seas_thresh_comp <- MHW_clim %>%
dplyr::select(product, region, doy, seas, thresh) %>%
unique() %>%
pivot_wider(names_from = product, values_from = c(seas, thresh)) %>%
mutate(seas_diff = seas_OISST-seas_GLORYS,
thresh_diff = thresh_OISST-thresh_GLORYS) %>%
ggplot(aes(x = doy)) +
geom_line(aes(y = seas_diff, colour = region), linetype = "solid", show.legend = F) +
geom_line(aes(y = thresh_diff, colour = region), linetype = "dashed", show.legend = F) +
facet_wrap(~region) +
labs(x = "Day-of-year (doy)", y = "OISST clim. - GLORYS clim.",
title = "Difference per day-of-year (doy)",
subtitle = paste0("Solid line shows seasonal climatology,",
"\ndashed line shows threshold."))
# Plot average doy difference histogram
doy_comp <- MHW_clim_wide %>%
group_by(region, doy) %>%
summarise(doy_diff = mean(t_diff)) %>%
ggplot(aes(x = doy_diff)) +
geom_histogram(aes(fill = region), bins = 20, show.legend = F) +
facet_wrap(~region) +
labs(x = "Mean difference (OISST - GLORYS) per doy",
title = "Distribution of mean differences per doy")
# Combine
OISST_GLORYS_ts_comp <- ggarrange(ts_comp,
ggarrange(seas_thresh_comp, doy_comp, ncol = 2, nrow = 1, align = "hv", labels = c("B", "C")),
nrow = 2, labels = "A", align = "hv")
OISST_GLORYS_ts_comp
# ggsave(plot = OISST_GLORYS_ts_comp, filename = "output/OISST_GLORYS_ts_comp.png", height = 8, width = 10)
# Compare MHW results
MHW_event_comp <- MHW_event %>%
group_by(product, region) %>%
summarise(event_count = n(),
dur = mean(duration),
int_mean = mean(intensity_mean),
int_cum_mean = mean(intensity_cumulative),
int_max = max(intensity_max),
onset = mean(rate_onset),
decline = mean(rate_decline)) %>%
ungroup() %>%
arrange(region, product) %>%
mutate_if(is.numeric, round, 2) #%>%
# pivot_wider(names_from = product, values_from = c(event_count:decline))
# tableGrob(rows = NULL)
event_count_table <- MHW_event_comp %>%
dplyr::select(product:event_count) %>%
pivot_wider(names_from = region, values_from = event_count) %>%
tableGrob(rows = NULL)
# Boxplot of key variables
box_comp <- MHW_event %>%
dplyr::select(product, region, duration, intensity_mean,
intensity_cumulative, intensity_max, rate_onset, rate_decline) %>%
pivot_longer(cols = duration:rate_decline) %>%
ggplot(aes(x = region, y = value, fill = region)) +
geom_boxplot(aes(colour = product), notch = TRUE) +
scale_colour_manual(values = c("black", "red")) +
facet_wrap(~name, scales = "free_y") +
labs(fill = "Region", colour = "Product", x = NULL, y = "Value for given facet",
title = "Boxplots showing range of values for MHWs in each region")
# box_comp
OISST_GLORYS_MHW_comp <- ggarrange(box_comp, event_count_table, ncol = 1, nrow = 2,
heights = c(8, 1), labels = c("A", "B"), align = "hv")
OISST_GLORYS_MHW_comp
# ggsave(plot = OISST_GLORYS_MHW_comp, filename = "output/OISST_GLORYS_MHW_comp.png", height = 8, width = 10)
# Seasons within regions
MHW_event_season_comp <- MHW_event %>%
group_by(product, region, season) %>%
summarise(event_count = n(),
dur = mean(duration),
int_mean = mean(intensity_mean),
int_cum_mean = mean(intensity_cumulative),
int_max = max(intensity_max),
onset = mean(rate_onset),
decline = mean(rate_decline)) %>%
ungroup() %>%
arrange(region, season, product) %>%
mutate_if(is.numeric, round, 2)
knitr::kable(MHW_event_season_comp)
# Compare top 3 events per region
MHW_event_top <- MHW_event %>%
dplyr::select(product, everything()) %>%
group_by(product, region) %>%
dplyr::top_n(3, intensity_cumulative) %>%
ungroup() %>%
arrange(region, product) %>%
dplyr::select(product, region, event_no, date_start, date_peak, date_end, duration,
intensity_mean, intensity_cumulative, intensity_max, rate_onset, rate_decline)
knitr::kable(MHW_event_top)
```
The figures and tables output from this comparison analysis show larger differences than expected. Most importantly, the MHWs in the OISST data are more numerous, more intense, and shorter in duration. It appears that the GLORYS data assimilation methodology smooths the data more than what we see in the remotely sensed SST; this difference in smoothness between OISST and GLORYS will need to be discussed in the peer-reviewed write-up. There are arguments both for and against using the SST from GLORYS or NOAA, but looking at the SOM results with the different SST products it appears that the OISST allows for more meaningful nodes. So we are now swapping out the GLORYS SST for NOAA OISST and will check whether the correlation results still hold up.
<!-- I think it still best to use the GLORYS data as the SST should match more closely to the flux terms considering they are also likely smoothed more than a different more direct sensing would report. -->
## Clims + anoms per variable
The analyses to come are going to be performed on anomaly values, not the original time series. In order to calculate the anomalies we are first going to need the climatologies for each variable. We will use the Hobday definition of climatology creation and then subtract the expected climatology from the observed values. We are again using the 1993-01-01 to 2018-12-25 base period for these calculations to ensure consistency throughout the project.
```{r clims, eval=FALSE}
# Load the data
GLORYS_all_ts <- readRDS("data/GLORYS_all_ts.Rda")
ERA5_all_ts <- readRDS("data/ERA5_all_ts.Rda")
ALL_ts <- left_join(ERA5_all_ts, GLORYS_all_ts, by = c("region", "t"))
# Calculate GLORYS clims and anoms
# Also give better names to the variables
ALL_anom <- ALL_ts %>%
dplyr::rename(lwr = msnlwrf, swr = msnswrf, lhf = mslhf,
shf = msshf, mslp = msl, sst = temp) %>%
dplyr::select(-wind_dir, -cur_dir) %>%
mutate(qnet_mld = qnet/(mld*1042*4000),
lwr_mld = lwr/(mld*1042*4000),
swr_mld = swr/(mld*1042*4000),
lhf_mld = lhf/(mld*1042*4000),
shf_mld = shf/(mld*1042*4000),
mld_1 = 1/mld) %>%
pivot_longer(cols = c(-region, -t), names_to = "var", values_to = "val") %>%
group_by(region, var) %>%
nest() %>%
mutate(clims = map(data, ts2clm, y = val, roundClm = 10,
climatologyPeriod = c("1993-01-01", "2018-12-25"))) %>%
dplyr::select(-data) %>%
unnest(cols = clims) %>%
mutate(anom = val-seas) %>%
ungroup()
# Save
saveRDS(ALL_anom, "data/ALL_anom.Rda")
saveRDS(ALL_anom, "shiny/ALL_anom.Rda")
```
## Cumulative heat flux terms
We also need to create cumulative heat flux terms, as well as a few other choice variables. This is done by taking the first day of the MHW and adding the daily values together cumulatively until the end of the event. The daily values are first divided by the MLD on that day, as seen above. The MLD value used to divide the daily variables accounts for the water density and specific heat constant: Q/(rho x Cp x hmld), where rho = 1042 and Cp ~= 4000. The Qnet term calculated this way approximates the air-sea flux term.
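As a toy illustration of this scaling and cumulative sum (the daily values below are made up for demonstration; rho and Cp are as given above):

```{r cum-flux-toy}
# Toy illustration of the Q/(rho x Cp x hmld) scaling and the cumulative
# sum over an event; all daily values here are hypothetical
rho <- 1042    # seawater density (kg m-3), as above
Cp  <- 4000    # specific heat of seawater (J kg-1 K-1), approximate
qnet <- c(100, 150, -50)  # hypothetical daily Qnet anomalies (W m-2)
mld  <- c(20, 25, 30)     # hypothetical daily mixed layer depths (m)

# Scale each day's flux by that day's MLD, density, and specific heat
qnet_mld <- qnet / (rho * Cp * mld)

# Accumulate from the first day of the event to the last
qnet_mld_cum <- cumsum(qnet_mld)
qnet_mld_cum
```

A deeper mixed layer dilutes the effect of a given flux anomaly, which is why each day is divided by its own MLD before accumulating.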
The movement terms aren't very useful and may not be worth including, as they don't really show advection. Instead, one can argue that the portion of the heating not explained by anything else may be attributed to advection through the process of elimination. For the moment they are left in.
```{r cum-heat-flux, eval=FALSE}
# We're going to switch over to the NOAA OISST data for MHWs now
OISST_region_MHW <- readRDS("../MHWNWA/data/OISST_region_MHW.Rda")
OISST_MHW_clim <- OISST_region_MHW %>%
select(-cats) %>%
unnest(events) %>%
filter(row_number() %% 2 == 1) %>%
unnest(events)
ALL_anom_cum <- ALL_anom %>%
dplyr::select(region, var, t, anom) %>%
pivot_wider(id_cols = c(region, var, t), names_from = var, values_from = anom) %>%
dplyr::select(region:tcc, mslp, qnet, p_e, mld, mld_1, qnet_mld:shf_mld) %>%
# Change this line depending on GLORYS or NOAA use
left_join(OISST_MHW_clim[,c("region", "t", "event_no")], by = c("region", "t")) %>%
filter(event_no > 0) %>%
group_by(region, event_no) %>%
mutate_if(is.numeric, cumsum) %>%
ungroup() %>%
dplyr::select(region, event_no, t, everything()) %>%
pivot_longer(cols = c(-region, -event_no, -t), names_to = "var", values_to = "anom") %>%
mutate(var = paste0(var,"_cum")) %>%
dplyr::select(region, var, event_no, t, anom)
# Save
saveRDS(ALL_anom_cum, "data/ALL_anom_cum.Rda")
saveRDS(ALL_anom_cum, "shiny/ALL_anom_cum.Rda")
```
In the next vignette we will take the periods of time over which MHWs occurred per region and pair those up with the GLORYS and ERA5 data. This will be used to investigate which drivers are best related to the onset and decline of MHWs.
## References