-
Notifications
You must be signed in to change notification settings - Fork 1
/
data-prep.Rmd
197 lines (159 loc) · 8.24 KB
/
data-prep.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
---
title: "Preparing the data"
author: "Robert Schlegel"
date: "2020-02-25"
output: workflowr::wflow_html
editor_options:
chunk_output_type: console
csl: FMars.csl
bibliography: MHWNWA.bib
---
```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
echo = TRUE, warning = FALSE, message = FALSE,
eval = TRUE, tidy = FALSE)
```
## Introduction
Much of the code in this vignette is taken entirely or partially from the [study area prep](https://robwschlegel.github.io/MHWNWA/polygon-prep.html), the [MHW prep](https://robwschlegel.github.io/MHWNWA/sst-prep.html), and the [gridded data prep](https://robwschlegel.github.io/MHWNWA/var-prep.html) vignettes from the drivers of MHWs in the NW Atlantic [project](https://robwschlegel.github.io/MHWNWA/index.html). Because this process has already been established we are going to put it all together in this one vignette in a more streamlined manner.
All of the libraries and functions used in this vignette, and the project more broadly may be found [here](https://github.com/robwschlegel/MHWflux/blob/master/code/functions.R).
```{r satrtup}
# get everything up and running in one go
source("code/functions.R")
library(SDMTools) # For finding points within polygons
```
## Study area
A reminder of what the study area looks like. It has been cut into 6 regions, adapted from work by @Richaud2016.
```{r region-fig}
frame_base +
geom_polygon(data = NWA_coords, alpha = 0.7, size = 2,
aes(fill = region, colour = region)) +
geom_polygon(data = map_base, aes(group = group))
```
## Pixels per region
In this study it was decided to use the higher resolution 1/12th degree GLORYS data. This means we will need to re-calculate which pixels fall within which region so we can later determine how to create our average SST time series per region as well as the other averaged heat flux term time series.
```{r grid-points, eval=FALSE}
# Load one GLORYS file to extract the lon/lat coords
GLORYS_files <- dir("../data/GLORYS", full.names = T, pattern = "MHWflux")
GLORYS_grid <- tidync(GLORYS_files[1]) %>%
hyper_tibble() %>%
dplyr::rename(lon = longitude, lat = latitude) %>%
dplyr::select(lon, lat) %>%
unique()
# Load one ERA5 file to get the lon/lat coords
ERA5_files <- dir("../../oliver/data/ERA/ERA5/LWR", full.names = T, pattern = "ERA5")
ERA5_grid <- tidync(ERA5_files[1]) %>%
hyper_filter(latitude = dplyr::between(latitude, min(NWA_coords$lat), max(NWA_coords$lat)),
longitude = dplyr::between(longitude, min(NWA_coords$lon)+360, max(NWA_coords$lon)+360),
time = index == 1) %>%
hyper_tibble() %>%
dplyr::rename(lon = longitude, lat = latitude) %>%
dplyr::select(lon, lat) %>%
unique() %>%
mutate(lon = lon-360)
# Function for finding and cleaning up points within a given region polygon
pnts_in_region <- function(region_in, product_grid){
region_sub <- NWA_coords %>%
filter(region == region_in)
coords_in <- pnt.in.poly(pnts = product_grid[,c("lon", "lat")], poly.pnts = region_sub[,c("lon", "lat")]) %>%
filter(pip == 1) %>%
dplyr::select(-pip) %>%
mutate(region = region_in)
return(coords_in)
}
# Run the function
GLORYS_regions <- plyr::ldply(unique(NWA_coords$region), pnts_in_region,
.parallel = T, product_grid = GLORYS_grid)
saveRDS(GLORYS_regions, "data/GLORYS_regions.Rda")
ERA5_regions <- plyr::ldply(unique(NWA_coords$region), pnts_in_region,
.parallel = T, product_grid = ERA5_grid)
saveRDS(ERA5_regions, "data/ERA5_regions.Rda")
```
```{r grid-points-visual}
GLORYS_regions <- readRDS("data/GLORYS_regions.Rda")
ERA5_regions <- readRDS("data/ERA5_regions.Rda")
# Combine for visual
both_regions <- rbind(GLORYS_regions, ERA5_regions) %>%
mutate(product = c(rep("GLORYS", nrow(GLORYS_regions)),
rep("ERA5", nrow(ERA5_regions))))
# Visualise to ensure success
ggplot(NWA_coords, aes(x = lon, y = lat)) +
# geom_polygon(aes(fill = region), alpha = 0.2) +
geom_point(data = both_regions, aes(colour = region)) +
geom_polygon(data = map_base, aes(group = group), show.legend = F) +
coord_cartesian(xlim = NWA_corners[1:2],
ylim = NWA_corners[3:4]) +
labs(x = NULL, y = NULL) +
facet_wrap(~product)
```
## Average time series per region
With our pixels per region sorted we may now go about creating the average time series for each region from the GLORYS and ERA5 data. First we will load a brick of the data constrained roughly to the study area into memory before assigning the correct pixels to their regions. Once the pixels are assigned we will summarise them into one mean time series per variable per region. These mean time series are what the rest of the analyses will depend on.
The code for loading and processing the GLORYS data.
```{r GLORYS-prep, eval=FALSE}
# Set number of cores
# NB: This is very RAM heavy, be carfeul with core use
doParallel::registerDoParallel(cores = 25)
# The GLORYS file location
GLORYS_files <- dir("../data/GLORYS", full.names = T, pattern = "MHWflux")
system.time(
GLORYS_all_ts <- load_all_GLORYS_region(GLORYS_files) %>%
dplyr::arrange(region, t)
) # 187 seconds on 25 cores
saveRDS(GLORYS_all_ts, "data/GLORYS_all_ts.Rda")
```
The code for the ERA5 data. NB: The ERA5 data are on an hourly 0.25x0.25 spatiotemporal grid. This loading process constrains them to a daily 0.25x0.25 grid.
```{r}
# See the code/workflow script for the code used for ERA5 data prep
# There is too much code to run from an RMarkdown document
```
## MHWs per region
We will be using the SST value from GLORYS for calculating the MHWs and will use the standard Hobday definition with a base period of 1993-01-01 to 2018-12-25. We are using an eneven length year as the data do not quite extend to the end of December. It was decided that the increased accuracy of the climatology from the 2018 year outweighed the negative consideration of having a clim period that excludes a few days of winter.
```{r MHW-calc, eval=FALSE}
# Load the data
GLORYS_all_ts <- readRDS("data/GLORYS_all_ts.Rda")
# Calculate the MHWs
GLORYS_region_MHW <- GLORYS_all_ts %>%
dplyr::select(region:temp) %>%
group_by(region) %>%
nest() %>%
mutate(clims = map(data, ts2clm,
climatologyPeriod = c("1993-01-01", "2018-12-25")),
events = map(clims, detect_event),
cats = map(events, category, S = FALSE)) %>%
select(-data, -clims)
# Save
saveRDS(GLORYS_region_MHW, "data/GLORYS_region_MHW.Rda")
```
## Clims + anoms per variable
The analyses to come are going to be performed on anomaly values, not the original time series. In order to calculate the anomalies we are first going to need the climatologies for each variable. We will use the Hobday definition of climatology creation and then subtract the expected climatology from the observed values.
```{r clims}
# Load the data
GLORYS_all_ts <- readRDS("data/GLORYS_all_ts.Rda")
ERA5_all_ts <- readRDS("data/ERA5_all_ts.Rda")
ALL_ts <- rbind(GLORYS_all_ts, ERA5_all_ts) %>%
mutate(product = c(rep("GLORYS")))
# Calculate GLORYS clims and anoms
GLORYS_all_anom <- GLORYS_all_ts %>%
pivot_longer(temp:ssh, names_to = "var", values_to = "val") %>%
group_by(region, var) %>%
nest() %>%
mutate(clims = map(data, ts2clm, y = val, roundClm = 6,
climatologyPeriod = c("1993-01-01", "2018-12-25"))) %>%
dplyr::select(-data) %>%
unnest(cols = clims) %>%
mutate(anom = val-seas)
# Calculate ERA5 clims
ERA5_all_anom <- ERA5_all_ts %>%
pivot_longer(msnlwrf:t2m, names_to = "var", values_to = "val") %>%
group_by(region, var) %>%
nest() %>%
mutate(clims = map(data, ts2clm, y = val, roundClm = 6,
climatologyPeriod = c("1993-01-01", "2018-12-31"))) %>%
dplyr::select(-data) %>%
unnest(cols = clims) %>%
mutate(anom = val-seas)
# Save
GLORYS_all_anom <- saveRDS(GLORYS_all_anom, "data/GLORYS_all_anom.Rda")
ERA5_all_anom <- saveRDS(ERA5_all_anom, "data/ERA5_all_anom.Rda")
```
And that's all there is to it. In the next vignette we will take the periods of time over whiche MHWs occurred per region and pair those up with the GLORYS and ERA5 data. This will be used to investigate which drivers are best related to the onset and decline of MHWs.
## References