---
title: "Preparing the data"
author: "Robert Schlegel"
date: "2020-02-25"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
csl: FMars.csl
bibliography: MHWflux.bib
---
```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
echo = TRUE, warning = FALSE, message = FALSE,
eval = TRUE, tidy = FALSE)
```
## Introduction
The purpose of this vignette is to walk through the steps taken to prepare the data from the NOAA OISST, ERA5, and GLORYS products for the following analyses. The statistical analyses may be found in the [MHWs vs. heat flux](https://robwschlegel.github.io/MHWflux/mhw-flux.html) vignette, and the SOM analysis may be found in the [SOM](https://robwschlegel.github.io/MHWNWA/som.html) vignette.
All of the libraries and functions used in this vignette, and the project more broadly, may be found [here](https://github.com/robwschlegel/MHWflux/blob/master/code/functions.R).
```{r startup}
# Get everything up and running in one go
source("code/functions.R")
library(ggpubr)
library(gridExtra)
library(FNN)
```
## Study area
A reminder of what the study area looks like. It has been cut into 6 regions, adapted from work by @Richaud2016.
```{r region-fig}
frame_base +
geom_polygon(data = NWA_coords, alpha = 0.7, size = 2,
aes(fill = region, colour = region)) +
geom_polygon(data = map_base, aes(group = group))
```
## Pixels per region
In this study it was decided to use the higher resolution 1/12th degree GLORYS data. This means we need to re-calculate which pixels fall within which region so that we can later create our average SST time series per region, as well as the other averaged heat flux term time series.
For the SOM we will want to have everything on the same 1/4 degree grid. For the ERA5 data this is an easy shift to make, but for the GLORYS 1/12 degree data we want to ensure that these smaller pixels are matched correctly to the nearest centre of an OISST pixel. To do this we will create an index of pairings using a fast nearest neighbour search.
```{r grid-points, eval=FALSE}
# Load one OISST file to extract the lon/lat coords
OISST_grid <- readRDS("../data/OISST/daily/1982/daily.1982-01-01.Rda") %>%
dplyr::select(lon, lat) %>%
filter(lon >= NWA_corners[1], lon <= NWA_corners[2],
lat >= NWA_corners[3], lat <= NWA_corners[4]) %>%
unique() %>%
mutate(OISST_index = 1:n()) # For merging to GLORYS grid
saveRDS(OISST_grid[,1:2], "metadata/OISST_grid.Rda")
# Load one high-res GLORYS file to extract the lon/lat coords
GLORYS_grid_base <- tidync(GLORYS_files[1]) %>%
hyper_tibble() %>%
dplyr::rename(lon = longitude, lat = latitude) %>%
filter(lon >= NWA_corners[1], lon <= NWA_corners[2],
lat >= NWA_corners[3], lat <= NWA_corners[4]) %>%
dplyr::select(lon, lat) %>%
unique()
# Add on the nearest OISST coords
GLORYS_grid <- GLORYS_grid_base %>%
mutate(OISST_index = as.vector(knnx.index(as.matrix(OISST_grid[,c("lon", "lat")]),
as.matrix(GLORYS_grid_base[,c("lon", "lat")]), k = 1))) %>%
left_join(OISST_grid, by = "OISST_index") %>%
dplyr::select(-OISST_index) %>%
dplyr::rename(lon = lon.x, lat = lat.x, lon_OISST = lon.y, lat_OISST = lat.y)
saveRDS(GLORYS_grid, "metadata/GLORYS_grid.Rda")
# Load one ERA5 file to get the lon/lat coords
ERA5_grid <- tidync(ERA5_lhf_files[1]) %>%
hyper_filter(latitude = dplyr::between(latitude, min(NWA_coords$lat), max(NWA_coords$lat)),
longitude = dplyr::between(longitude, min(NWA_coords$lon)+360, max(NWA_coords$lon)+360),
time = index == 1) %>%
hyper_tibble() %>%
dplyr::rename(lon = longitude, lat = latitude) %>%
dplyr::select(lon, lat) %>%
mutate(lon = lon-360) %>% # Change back to +-180 scale
mutate(lon = lon+0.125, lat = lat-0.125) %>% # Regrid to match OISST coords
filter(lon >= NWA_corners[1], lon <= NWA_corners[2],
lat >= NWA_corners[3], lat <= NWA_corners[4]) %>%
unique()
saveRDS(ERA5_grid, "metadata/ERA5_grid.Rda")
# Function for finding and cleaning up points within a given region polygon
points_in_region <- function(region_in, product_grid){
region_sub <- NWA_coords %>%
filter(region == region_in)
coords_in <- product_grid %>%
mutate(in_grid = sp::point.in.polygon(point.x = product_grid[["lon"]], point.y = product_grid[["lat"]],
pol.x = region_sub[["lon"]], pol.y = region_sub[["lat"]])) %>%
filter(in_grid >= 1) %>%
mutate(region = region_in) %>%
dplyr::select(lon, lat, region)
return(coords_in)
}
# Run the function
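# NB: Assumes a parallel back-end has already been registered
# (e.g. via registerDoParallel() in code/functions.R)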
OISST_regions <- plyr::ldply(unique(NWA_coords$region), points_in_region,
.parallel = T, product_grid = OISST_grid)
saveRDS(OISST_regions, "metadata/OISST_regions.Rda")
GLORYS_regions <- plyr::ldply(unique(NWA_coords$region), points_in_region,
.parallel = T, product_grid = GLORYS_grid)
saveRDS(GLORYS_regions, "metadata/GLORYS_regions.Rda")
ERA5_regions <- plyr::ldply(unique(NWA_coords$region), points_in_region,
.parallel = T, product_grid = ERA5_grid)
saveRDS(ERA5_regions, "metadata/ERA5_regions.Rda")
```
```{r grid-points-visual}
OISST_regions <- readRDS("metadata/OISST_regions.Rda")
GLORYS_regions <- readRDS("metadata/GLORYS_regions.Rda")
ERA5_regions <- readRDS("metadata/ERA5_regions.Rda")
# Combine for visual
ALL_regions <- rbind(OISST_regions, GLORYS_regions, ERA5_regions) %>%
mutate(product = c(rep("OISST", nrow(OISST_regions)),
rep("GLORYS", nrow(GLORYS_regions)),
rep("ERA5", nrow(ERA5_regions))))
# Visualise to ensure success
ggplot(NWA_coords, aes(x = lon, y = lat)) +
geom_polygon(data = map_base, aes(group = group), show.legend = F) +
geom_point(data = ALL_regions, aes(colour = region), size = 0.1) +
coord_cartesian(xlim = NWA_corners[1:2], ylim = NWA_corners[3:4]) +
labs(x = NULL, y = NULL) +
facet_wrap(~product)
ggsave(filename = "output/NWA_product_regions.pdf", height = 5, width = 18)
```
With our pixels per region determined, we may now create the average time series for each region from the OISST, GLORYS, and ERA5 data.
## OISST data processing
Up first for processing are the OISST data. The full brick of data within the study area is loaded first. Then the pixels within the regions are pulled out, and an individual time series is made from each region and saved. Finally, the climatologies and anomalies for each pixel in the study area are calculated and saved. The region clims+anoms are calculated for all of the products at the same time near the end of this vignette.
```{r OISST_prep, eval=FALSE}
# The files with data in the study area
OISST_files_sub <- data.frame(files = OISST_files,
lon = c(seq(0.125, 179.875, by = 0.25),
seq(-179.875, -0.125, by = 0.25))) %>%
filter(lon >= min(NWA_coords$lon), lon <= max(NWA_coords$lon)) %>%
mutate(files = as.character(files))
# Load the full grid for the study area
system.time(
OISST_all <- plyr::ldply(OISST_files_sub$files, .fun = load_OISST, .parallel = TRUE)
) # 16 seconds
# test visual
OISST_all %>%
filter(t == "1993-01-01") %>%
ggplot(aes(x = lon, y = lat)) +
geom_raster(aes(fill = temp))
# Create the region time series and save
system.time(
OISST_all_ts <- OISST_all %>%
right_join(OISST_regions, by = c("lon", "lat")) %>%
group_by(region, t) %>%
summarise(temp = mean(temp, na.rm = T), .groups = "drop")
) # 14 seconds
saveRDS(OISST_all_ts, "data/OISST_all_ts.Rda")
# test visual
OISST_all_ts %>%
ggplot(aes(x = t, y = temp)) +
geom_line(aes(colour = region)) +
facet_wrap(~region)
# Calculate the clims and anoms for each pixel
system.time(
OISST_all_anom <- OISST_all %>%
pivot_longer(cols = c(-lon, -lat, -t), names_to = "var", values_to = "val") %>%
plyr::ddply(., c("lon", "lat", "var"), calc_clim_anom, .parallel = T, point_accuracy = 2)
) # 214 seconds
saveRDS(OISST_all_anom, "data/OISST_all_anom.Rda")
```
## GLORYS data processing
We are using the 1/12 degree GLORYS product for calculating the region time series in order to better capture the sub-mesoscale processes that may be associated with driving MHWs. For the broad synoptic-scale data given to the SOM we will be using the 1/4 degree GLORYS product, as it needs to be at the same resolution as the other products being used.
```{r GLORYS-prep, eval=FALSE}
# Set number of cores
# NB: This is very RAM heavy, be careful with core use
registerDoParallel(cores = 50)
# Process and save the region time series
system.time(
GLORYS_all_ts <- plyr::ldply(GLORYS_files, load_GLORYS, .parallel = T, region = T) %>%
dplyr::arrange(region, t) %>%
mutate(cur_spd = round(sqrt(u^2 + v^2), 4),
cur_dir = round((270-(atan2(v, u)*(180/pi)))%%360))
) # 137 seconds on 25 cores
saveRDS(GLORYS_all_ts, "data/GLORYS_all_ts.Rda")
# Load and prep the GLORYS data for the entire study area
system.time(
GLORYS_all <- plyr::ldply(GLORYS_files, load_GLORYS,
.parallel = T, .paropts = c(.inorder = FALSE))
) # 200 seconds on 50 cores
gc()
# Calculate clims+anoms and save
system.time(
GLORYS_all_anom <- GLORYS_all %>%
pivot_longer(cols = c(-lon, -lat, -t), names_to = "var", values_to = "val") %>%
plyr::ddply(., c("lon", "lat", "var"), calc_clim_anom, .parallel = T,
point_accuracy = 6, .paropts = c(.inorder = FALSE))
) # 694 seconds on 50 cores
saveRDS(GLORYS_all_anom, "data/GLORYS_all_anom.Rda")
```
## ERA5 data processing
Note that the ERA5 data are on an hourly 0.25 degree spatiotemporal grid. This loading process constrains them to a daily 0.25 degree grid that matches the OISST data before finally converting them to a single time series per region.
```{r ERA5-prep, eval=FALSE}
# See the code/workflow script for the code used for ERA5 data prep
# There is too much code to run from an R Markdown document, as each variable must be loaded individually
```
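For reference, below is a minimal sketch of what the per-variable ERA5 prep looks like, shown here for a single variable (surface latent heat flux). It assumes the `ERA5_lhf_files` vector and `ERA5_regions` object from above, that the hourly values need only be averaged up to daily means, and that the ERA5 time axis is stored as hours since 1900-01-01; the real workflow script handles each variable individually.
```{r ERA5-prep-sketch, eval=FALSE}
# A hedged sketch for one variable; see code/workflow for the real prep
ERA5_lhf_daily <- plyr::ldply(ERA5_lhf_files, .parallel = T, .fun = function(file_name){
  tidync(file_name) %>%
    hyper_tibble() %>%
    dplyr::rename(lon = longitude, lat = latitude) %>%
    mutate(lon = lon - 360 + 0.125, lat = lat - 0.125, # Regrid to match OISST coords, as above
           # NB: Assumes time is stored as hours since 1900-01-01
           t = as.Date(as.POSIXct(time * 3600, origin = "1900-01-01", tz = "UTC"))) %>%
    filter(lon >= NWA_corners[1], lon <= NWA_corners[2],
           lat >= NWA_corners[3], lat <= NWA_corners[4]) %>%
    group_by(lon, lat, t) %>% # Hourly to daily means
    summarise(mslhf = mean(mslhf, na.rm = T), .groups = "drop")
})
# Average the daily pixels into one time series per region
ERA5_lhf_ts <- ERA5_lhf_daily %>%
  right_join(ERA5_regions, by = c("lon", "lat")) %>%
  group_by(region, t) %>%
  summarise(lhf = mean(mslhf, na.rm = T), .groups = "drop")
```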
## MHWs per region
We will be using the SST values from NOAA OISST for calculating the MHWs, applying the standard Hobday definition with a base period of 1993-01-01 to 2018-12-25. We are using an uneven-length final year as the data do not quite extend to the end of December. It was decided that the increased accuracy of the climatology gained from including 2018 outweighed the drawback of a clim period that excludes a few days of winter.
```{r MHW-calc, eval=FALSE}
# Load the data
OISST_all_ts <- readRDS("data/OISST_all_ts.Rda")
# Calculate the MHWs
OISST_region_MHW <- OISST_all_ts %>%
group_by(region) %>%
nest() %>%
mutate(clims = map(data, ts2clm,
climatologyPeriod = c("1993-01-01", "2018-12-25")),
events = map(clims, detect_event),
cats = map(events, category, S = FALSE)) %>%
select(-data, -clims)
# Save
saveRDS(OISST_region_MHW, "data/OISST_region_MHW.Rda")
saveRDS(OISST_region_MHW, "shiny/OISST_region_MHW.Rda")
```
### MHW results
As this is not really the focus of the paper, I will quickly go over the MHW results here. Specifically, we are looking at the summary stats for MHWs per region and season, and how they may differ between those classifying groups. The stats created here are used in the results section of the manuscript, and some of the musings will be used in the discussion.
```{r MHW-results}
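# NB: OISST_MHW_event (the event results per region) is assumed
# to have been created when sourcing code/functions.R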
MHW_annual_count <- OISST_MHW_event %>%
mutate(year = year(date_peak)) %>%
group_by(year) %>%
summarise(count = n())
```
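The chunk above gives the annual counts. As a hedged sketch of the per-region and per-season summaries described above, assuming `OISST_MHW_event` contains the standard `heatwaveR` event metrics (e.g. `duration`, `intensity_max`, `intensity_cumulative`) alongside `region` and `date_peak` columns:
```{r MHW-results-season, eval=FALSE}
# Summarise events by region and meteorological season (DJF, MAM, JJA, SON)
MHW_region_season <- OISST_MHW_event %>%
  mutate(season = case_when(month(date_peak) %in% c(12, 1, 2) ~ "Winter",
                            month(date_peak) %in% 3:5 ~ "Spring",
                            month(date_peak) %in% 6:8 ~ "Summer",
                            TRUE ~ "Autumn")) %>%
  group_by(region, season) %>%
  summarise(count = n(),
            duration_mean = mean(duration, na.rm = T),
            int_max_mean = mean(intensity_max, na.rm = T),
            int_cum_mean = mean(intensity_cumulative, na.rm = T),
            .groups = "drop")
```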
## Region clims + anoms per variable
The analyses to come are going to be performed on anomaly values, not the original time series. In order to calculate the anomalies we are first going to need the climatologies for each variable. We will use the Hobday definition of climatology creation and then subtract the expected climatology from the observed values. We are again using the 1993-01-01 to 2018-12-25 base period for these calculations to ensure consistency throughout the project.
```{r clims, eval=FALSE}
# Load the data
GLORYS_all_ts <- readRDS("data/GLORYS_all_ts.Rda")
ERA5_all_ts <- readRDS("data/ERA5_all_ts.Rda")
OISST_all_ts <- readRDS("data/OISST_all_ts.Rda")
ALL_ts <- left_join(ERA5_all_ts, GLORYS_all_ts, by = c("region", "t")) %>%
left_join(OISST_all_ts, by = c("region", "t")) %>%
filter(t <= "2018-12-31")
# Calculate clims and anoms for all variables
# Also give better names to the variables
# Also convert the Qx terms from seconds to days by multiplying by 86,400
ALL_ts_anom <- ALL_ts %>%
dplyr::rename(lwr = msnlwrf, swr = msnswrf, lhf = mslhf,
shf = msshf, mslp = msl, sst = temp.y) %>%
dplyr::select(-wind_dir, -cur_dir, -temp.x) %>%
mutate(qnet_mld = (qnet*86400)/(mld*1024*4000),
lwr_mld = (lwr*86400)/(mld*1024*4000),
swr_mld = (swr*86400)/(mld*1024*4000),
lhf_mld = (lhf*86400)/(mld*1024*4000),
shf_mld = (shf*86400)/(mld*1024*4000),
mld_1 = 1/mld) %>%
pivot_longer(cols = c(-region, -t), names_to = "var", values_to = "val") %>%
group_by(region, var) %>%
nest() %>%
mutate(clims = map(data, ts2clm, y = val, roundClm = 10,
climatologyPeriod = c("1993-01-01", "2018-12-25"))) %>%
dplyr::select(-data) %>%
unnest(cols = clims) %>%
mutate(anom = val-seas) %>%
ungroup()
# Save
saveRDS(ALL_ts_anom, "data/ALL_ts_anom.Rda")
saveRDS(ALL_ts_anom, "shiny/ALL_ts_anom.Rda")
```
## Cumulative heat flux terms
We also need to create cumulative heat flux terms, as well as a few other choice variables. This is done by taking the first day of the MHW and adding the daily values together cumulatively until the end of the event. The daily values are first divided by the MLD on that day, as seen above. This MLD division also accounts for the water density and specific heat: $Q/(\rho \times C_p \times h_{mld})$, where $\rho = 1024$ kg m$^{-3}$ and $C_p \approx 4000$ J kg$^{-1}$ °C$^{-1}$. The Qnet term calculated this way approximates the air-sea flux term.
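As a quick sanity check of this conversion, a net surface flux of 100 W m^-2^ acting over a 20 m mixed layer works out to roughly 0.1 °C of warming per day:
```{r flux-conversion-example, eval=FALSE}
# Worked example of Q/(rho x Cp x hmld): W m^-2 to degrees C per day
qnet <- 100 # Net surface heat flux (W m^-2, i.e. J s^-1 m^-2)
mld <- 20   # Mixed layer depth (m)
(qnet * 86400) / (mld * 1024 * 4000) # ~0.105 C day^-1
```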
The movement terms aren't very useful and may not be worth including, as they don't really show advection. Rather, one can say that the portion of the heating not explained by anything else may be attributed to advection through a process of elimination. For the moment they are still left in here.
```{r cum-heat-flux, eval=FALSE}
# We now switch over to the NOAA OISST MHW results to flag the days during events
ALL_ts_anom_cum <- ALL_ts_anom %>%
dplyr::select(region, var, t, anom) %>%
pivot_wider(id_cols = c("region", "t"), names_from = var, values_from = anom) %>%
dplyr::select(region:tcc, mslp, qnet, p_e, mld, mld_1, qnet_mld:shf_mld) %>%
left_join(OISST_MHW_clim[,c("region", "t", "event_no")], by = c("region", "t")) %>%
filter(event_no > 0) %>%
group_by(region, event_no) %>%
mutate_if(is.numeric, cumsum) %>%
ungroup() %>%
dplyr::select(region, event_no, t, everything()) %>%
pivot_longer(cols = c(-region, -event_no, -t), names_to = "var", values_to = "anom") %>%
mutate(var = paste0(var,"_cum")) %>%
dplyr::select(region, var, event_no, t, anom)
# Save
saveRDS(ALL_ts_anom_cum, "data/ALL_ts_anom_cum.Rda")
saveRDS(ALL_ts_anom_cum, "shiny/ALL_ts_anom_cum.Rda")
```
In the [MHWs vs. heat flux](https://robwschlegel.github.io/MHWflux/mhw-flux.html) vignette we will take the periods of time over which MHWs occurred per region and pair those up with the GLORYS and ERA5 data. This will be used to investigate which drivers are best related to the onset and decline of MHWs. In the [SOM](https://robwschlegel.github.io/MHWNWA/som.html) vignette we will look for how the synoptic-scale states of these variables group into common patterns during MHWs.
## References