/
wave4.Rmd
458 lines (358 loc) · 20.7 KB
/
wave4.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
---
title: Analysis of "Global wave number-4 pattern in the southern subtropical sea surface temperature."
description: |
A recent paper claims to have detected a global wave-4 pattern in subtropical SSTs. But have they really?
author:
- name: Elio Campitelli
url: https://eliocamp.github.io/
affiliation: Centro de Investigaciones del Mar y la Atmósfera
affiliation_url: http://www.cima.fcen.uba.ar/
date: "2021-01-14"
slug: wave4
categories:
- zonal waves
- general circulation
output:
distill::distill_article: default
bibliography: bibliography.bib
compare_updates_url: https://github.com/eliocamp/scrapbook/blob/main/_posts/2021-01-14-wave4/wave4.Rmd
---
```{r setup, include=FALSE}
file <- tools::file_path_sans_ext(knitr::current_input())
knitr::opts_chunk$set(echo = TRUE,
cache = FALSE,
cache.path = file.path("..", "cache", file, "/"),
code_folding = TRUE,
layout = "l-page")
knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)
# https://www.nature.com/articles/s41598-020-80492-x
```
This are some quick notes on [Global wave number-4 pattern in the southern subtropical sea surface temperature](https://www.nature.com/articles/s41598-020-80492-x) [@senapati2021]. The article claims to discover a wave-4 pattern in the southern Sea Surface Temperature (SST). Their basic method is to perform Empirical Orthogonal Function (EOF) to SST between 55°S and 20°S. They discard the first EOF as uninteresting because it's "ENSO-like" (unsurprising) and focus on the second EOF, whose spatial pattern is in Figure \@ref(fig:fig1).
(ref:fig1-cap) Panel a from @senapati2021's Figure 1: Spatial pattern of second leading EOF mode of SST anomaly over the region (20°S-55°S) from HadSST.
```{r fig1, echo=FALSE, fig.cap="(ref:fig1-cap)", out.extra="class=external"}
knitr::include_graphics(file.path("img", "wave4-fig1.png"))
```
First things first. Download the data and try to reproduce their figure.
```{r helpers, cache = FALSE}
# packages
library(magrittr)
library(data.table)
library(metR)
library(ggplot2)
# fast detrend.
Detrend <- function(y, x) {
nas <- is.na(y)
m <- mean(y, na.rm = TRUE)
if (!hasArg(x)) x <- seq_along(y)
y[!nas] <- .lm.fit(cbind(1, x[!nas]), y[!nas])$residuals
return(y + m)
}
theme_set(theme_minimal() +
theme(panel.grid = element_blank()))
# simple and dirty map
map <- ggplot2::map_data("world2") %>%
subset(lat %between% c(-90, -0))
quick_map <- list(geom_polygon(data = map,
aes(long, lat, group = group), fill = "white", color = "black",
size =0.2),
scale_x_longitude(),
scale_y_latitude(ticks = 10),
coord_quickmap(ylim = c(-55, -20)))
```
```{r download-data}
# Download and decompress HadSST data
had_file <- here::here("_data", "hadsst.nc")
if (!file.exists(had_file)) {
hadsst <- "https://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_sst.nc.gz"
had_zip <- tempfile()
download.file(hadsst, had_zip, mode = "wb")
R.utils::gunzip(had_zip, had_file, remove = FALSE)
}
```
```{r sst}
# Read data and detrend
sst <- ReadNetCDF(had_file,
vars = c("sst"),
subset = list(latitude = c(-90, -0),
time = c("1979-01-01", "2018-12-31"))) %>%
setnames(c("longitude", "latitude"), c("lon", "lat")) %>%
na.omit() %>%
.[, lon := ConvertLongitude(lon)] %>%
.[, sst := Detrend(Anomaly(sst)), by = .(lon, lat, month(time))]
```
```{r eofs}
eofs <- sst[lat %between% c(-55, -20)] %>%
copy() %>%
.[, sst := sst*sqrt(cos(lat*pi/180))] %>%
EOF(sst ~ time | lat + lon, n = 1:2, data = .)
```
(ref:eof12-cap) Spatial patterns of the frist and second leading EOFs of detrended monthly anomalies of SST between 55°S and 20°S weighted by the square root of the cosine of latitude.
```{r eof12, fig.cap = "(ref:eof12-cap)", fig.height = 2.1, fig.width=5, preview = TRUE}
ggplot(eofs$right, aes(lon, lat)) +
geom_contour_fill(aes(z = -sst)) +
scale_fill_divergent(guide = "none") +
quick_map +
facet_wrap(PC~., ncol = 1)
```
And indeed. The first EOF is kind of ENSO-like (not really full ENSO, because I'm missing the equatorial SSTs, which is where ENSO really shines) and the second EOF looks pretty much identical to the their Figure 1.a save the different prime meridian. The time series associated with the second EOF is also almost exactly the same.
(ref:eoftime-cap) Temporal pattern of the second EOF in gray with a 5-month running mean in black.
```{r eoftime, fig.cap = "(ref:eoftime-cap)", layout = "l-body", fig.height=2.1}
cut(eofs, 2) %>%
.$left %>%
copy() %>%
.[, sst5 := frollmean(sst, 5, align = "center")] %>%
ggplot(aes(time, -sst)) +
geom_hline(yintercept = 0, size = 0.2, color = "gray50") +
geom_line(aes(y = -sst5)) +
geom_line(color = "gray") +
scale_y_continuous(NULL) +
scale_x_datetime(date_breaks = "5 years", date_labels = "%Y")
```
OK, I'm on the right track.
Now, my main concern with this paper is whether this pattern is actually, as the title of the paper says, a **global** pattern. EOF is a great technique for dimensionality reduction, but it's too easy to end up with statistical patterns that are a mix of actual physical patterns or even just noise.
The authors agree, and they say that..
> In order to examine the synchronization of the W4 pattern among all the basins, point correlation analysis has been performed. For this purpose, eight points [i(37.5°S, 173.5°W), ii(37.5°S, 133.5°W), iii(44.5°S, 90.5°W), iv(39.5°S, 40.5°W), v(29.5°S, 2.5°W), vi(41.5°S, 41.5°E), vii(30.5°S, 86.5°E), viii(35.5°S, 130.5°E)] corresponding to the loading centres are selected (marked by green dots, i-viii, in Fig. 1a). The time series of SST anomaly is computed at each grid point after removing the contributions of the first EOF mode (henceforth, reconstructed SST anomaly). Further, point correlation is performed for the time series at the loading centers (Fig. 1a) with the reconstructed SST anomaly (Fig. 2a–h corresponding respectively to points (i) to (viii) of Fig. 1a).
The problem, IMHO, is that their [Figure 2](https://www.nature.com/articles/s41598-020-80492-x/figures/2) doesn't really show as much synchronisation as they claim. First, let's reproduce it here.
```{r points}
# Define points of interest
points <- tibble::tribble(~lat, ~lon,
-37.5, -173.5,
-37.5, -133.5,
-44.5, -90.5,
-39.5, -40.5,
-29.5, -2.5,
-41.5, 41.5,
-30.5, 86.5,
-35.5, 130.5) %>%
as.data.table() %>%
.[, lon := ConvertLongitude(lon)] %>%
.[, id := tolower(as.roman(seq_len(.N)))] %>%
.[, sign := rep(c(-1, 1), 4)] # sign of original correlation
```
```{r sst_reconstruct}
# Reconstruct SST from leading EOF and add it to
# original data
sst_reconstruct <- predict(eofs, n = 1) %>%
setnames("sst", "sst_reconstructed")
sst <- sst[sst_reconstruct, on = .NATURAL]
rm(sst_reconstruct)
```
```{r corrs}
# Compute correlation maps of points of interest,
# both with the orignial sst and sst with filtered EOF1
# (for compatison)
corrs <- sst[points, on = .NATURAL] %>%
.[, ":="(lon = NULL, lat = NULL)] %>%
setnames(c("sst", "sst_reconstructed"), c("ref", "ref_reconstructed")) %>%
.[sst, on = "time", allow.cartesian = TRUE] %>%
.[, .(correlation = cor(sst, ref),
correlation_reconstructed = cor(sst - sst_reconstructed, ref - ref_reconstructed)),
by = .(lon, lat, id, sign)]
```
(ref:correlations-cap) Correlation maps of SST and SST with the first EOF filterred out with the corresponding SST at each one of the points of interest. Compare with [Figure 2](https://www.nature.com/articles/s41598-020-80492-x/figures/2) of @senapati2021. The sign of the correlation is flipped for the even points as to preserve always the same sign and allow for easier comparison among maps.
```{r correlations, fig.cap = "(ref:correlations-cap)"}
corrs %>%
melt(measure.vars = c("correlation", "correlation_reconstructed")) %>%
ggplot(aes(lon, lat)) +
geom_contour_fill(aes(z = -value*sign),
breaks = AnchorBreaks(anchor = 0, binwidth = 0.1)) +
quick_map +
geom_point(data = copy(points)[, id := NULL], shape = 21, fill = NA) +
geom_point(data = copy(points), shape = 21, color = "black", fill = "#0e9a83") +
scale_fill_divergent() +
facet_grid(id~variable, labeller = labeller(variable = c(correlation = "SST",
correlation_reconstructed = "SST - EOF1")))
```
I purposely computed the two versions. The panels on the right should be a reproduction of @senapati2021's Figure 2. It's hard to compare them because of the different colour palettes and scale limits (the authors chose parameters that enhanced small correlations), but I personally don't see much coherency. The first three rows do show high correlations between the three points in the Pacific, which in the unfiltered data looks very much like the well-known PSA pattern that appears as a response to El Niño Southern Oscillation.
Another way to look at it is with by plotting $r^2$ which directly quantifies the degree of (linear) dependence between points.
(ref:r2-cap) Coefficient of determination ($r^2$) computed from the correlations with the filtered SST in Figure \@ref(fig:correlations). Contours only show areas with $r^2 > 0.1$.
```{r r2, fig.cap = "(ref:r2-cap)"}
corrs %>%
ggplot(aes(lon, lat)) +
geom_contour_fill(aes(z = correlation_reconstructed^2,
fill = ..level..), breaks = seq(.1, 1, by = 0.1)) +
quick_map +
geom_point(data = copy(points)[, id := NULL], shape = 21, fill = NA) +
geom_point(data = copy(points), shape = 21, color = "black", fill = "#de3e80") +
scale_fill_viridis_c(oob = scales::squish,
super = ScaleDiscretised) +
facet_grid(id~.)
```
Again, aside from the points in the Pacific, there is not a lot of long-range relationships between points. As I see it, I strongly suspect that this PC2 pattern is not really robustly **global**.
```{r bootstrap, cache=TRUE}
set.seed(42)
N <- 500
random_points <- unique(corrs[, .(lon, lat)]) %>%
.[sample(.N, N), ] %>%
setkey(lon, lat)
random_cors <- vapply(seq_len(N), function(i) {
r <- random_points[i, ]
ref <- sst[lat == r$lat & lon == r$lon, sst - sst_reconstructed]
sst[, ref_data := ..ref, by = .(lon, lat)]
sst[, .(cor(sst - sst_reconstructed, ref_data)), by = .(lon, lat)] %>%
.[, min(V1)]
}, numeric(1))
```
Ok, but how could I quantify this vague impression that these maps don't show a lot of long-range teleconnections? What I'm going to do is to compute `r N` correlation maps like Figure \@ref(fig:correlations) but using random points. For each map, I'll take the absolute value of the minimum (negative) correlation as the level of connectivity.
(ref:correlation-random-cap) Same as Figure \@ref(fig:correlations) but only for the reconstructed SST. Added in black contours, the 0.5 quantile of random correlations derived from `r N` correlation maps with random points.
```{r correlation-random,fig.cap = "(ref:correlation-random-cap)"}
corrs %>%
ggplot(aes(lon, lat)) +
geom_contour_fill(aes(z = -correlation_reconstructed*sign),
breaks = AnchorBreaks(anchor = 0, binwidth = 0.1)) +
geom_contour2(aes(z = abs(correlation)), breaks = quantile(abs(random_cors), .5), size = 0.25) +
quick_map +
geom_point(data = copy(points)[, id := NULL], shape = 21, fill = NA) +
geom_point(data = copy(points), shape = 21, color = "black", fill = "#0e9a83") +
scale_fill_divergent() +
facet_grid(id~., labeller = labeller(variable = c(correlation = "SST",
correlation_reconstructed = "SST - EOF1")))
```
Figure \@ref(fig:correlation-random) repeats Figure \@ref(fig:correlations) but it marks the 50th percentile of the minimum correlation values derived from the bootstrap procedure, which is `r signif(quantile(abs(random_cors), 0.5), 2)`. 95% of random points had minimum correlations with absolute values larger than `r signif(quantile(abs(random_cors), 0.95), 2)`. With this in mind, correlations of the order of $\pm `r signif(quantile(abs(random_cors), 0.5), 2)`$ are not surprising and what would be expected by chance alone. This, of course, is only a crude measure since it doesn't take into account the size or pattern of the correlations, but I think that it should give some perspective to the real significance to the correlation levels shown here.
Let's try something else. From the correlation maps above (and previous knowledge), it's pretty obvious that the Pacific sector does behave somewhat coherently. So what I'm going to do is to split the spatial pattern into the Pacific basin (between 150°E and 290°E) and the Atlantic-Indian basins (the rest of the hemisphere). Then, I'm going to project each pattern onto the corresponding SST fields to get two indices. If the patterns is really coherent in time, then both indices must be strongly correlated.
```{r series}
series <- eofs$right %>% copy() %>%
.[PC == "PC2"] %>%
.[, basin := ifelse(lon %between% c(150, 290), "pacific", "atlantic")] %>%
setnames("sst", "EOF") %>%
.[sst, on = c("lon", "lat")] %>%
.[, weighted.mean(sst*EOF, cos(lat*pi/180)), by = .(time, basin)]
```
```{r indices, fig.cap = "Relationship between the Pacific index and Atlantic-Indian index.", layout= "l-body"}
# Relationship between the two indices.
series %>%
dcast(time ~ basin, value.var = "V1") %>%
ggplot(aes(pacific, atlantic)) +
geom_point() +
geom_label(data = ~.x[, .(cor(pacific, atlantic))],
aes(label = paste0("cor= ", signif(V1, 2))),
x = -0.0035, y = 0.001, size = 7) +
geom_smooth(method = "lm") +
scale_x_continuous("Pacific index") +
scale_y_continuous("Atlantic-Indian index")
```
I mean... A correlation of 0.44 is not nothing, but it's also not a lot.
Let's do the correlation map of each index. Again, if the pattern is really global, then the correlation map of the Pacific Index should also show th
.e Atlantic-Indian pattern and vice versa.
```{r patterns}
patterns <- eofs$left[PC == "PC2"] %>%
setnames(c("sst", "PC"), c("V1", "basin")) %>%
rbind(., series, use.names = TRUE) %>%
.[sst, on = "time", allow.cartesian = TRUE] %>%
.[, .(correlation = cor(V1, sst)), by = .(lon, lat, basin)]
lines <- CJ(lon = c(150, 290),
basin = c("pacific", "atlantic"))
```
```{r patterns2, fig.cap = "Correlation patterns with the Pacific index, the Atlantic-Indian index and the PC2."}
patterns %>%
ggplot(aes(lon, lat)) +
geom_contour_fill(aes(z = -correlation, fill = ..level..),
breaks = AnchorBreaks()) +
quick_map +
geom_vline(data = lines, aes(xintercept = lon)) +
scale_fill_divergent_discretised(limits = c(-1, 1)) +
facet_wrap(basin~., ncol = 1, labeller = labeller(basi = c(pacific = "Pacific",
atlantic = "Atlantic-Indian")))
```
Again... kinda? For the Atlantic-Indian index, the Pacific signal is barely there and it's actually completely missing in the Western Pacific. And for the Pacific index, the there is *some* signal in the Indian ocean, but barely any signal in the Atlantic.
Now one last test. To extend this analysis, I'll do the same computation but for every longitude. That is, for each longitude, take a 140º wide section of SST centred in that longitude and project the corresponding wave-4 pattern onto it to get a time-varying index. The result, then, is one "local wave-4 index" for each longitude.
```{r lon_eofs, cache = TRUE}
lon_width <- diff(c(150, 290))
lon_halfwidth <- lon_width/2
# "extended" version of geopotential height
#
sst2 <- eofs$right %>% copy() %>%
.[PC == "PC2"] %>%
setnames("sst", "EOF") %>%
sst[., on = .NATURAL] %>%
ggperiodic::qwrap(lon = c(0, 360) ~ c(-lon_halfwidth, 360 + lon_halfwidth))
# For each longitude, compute EOF using a segment of lon_width width
# centered in that longitude.
lon_eofs <- lapply(unique(sst$lon), function(base_lon) {
sst2 %>%
.[lon %between% (base_lon + c(-lon_halfwidth, lon_halfwidth))] %>%
.[, .(eof = weighted.mean(sst*EOF, cos(lat*pi/180))),
by = time] %>%
.[, base_lon := base_lon] %>%
.[]
}) %>%
rbindlist()
```
```{r sections}
k <- lon_eofs %>%
widyr::pairwise_cor(base_lon, time, eof) %>%
as.data.table() %>%
.[, correlation := 1 - abs(correlation)] %>%
dcast(item1 ~ item2, value.var = "correlation") %>%
.[, -1] %>%
as.dist() %>%
hclust() %>%
cutree(3)
sections <- data.table(lon = as.numeric(names(k)), k = k)
cuts <- sections[c(0, diff(k)) != 0]
label_k <- c("1" = "East Pacific & Atlantic",
"2" = "Indian",
"3" = "West Pacific")
cuts_lon <- LonLabel(cuts$lon)
cuts_lon_round <- LonLabel(round(cuts$lon/5)*5)
```
```{r cor-pairwise, fig.cap = "Pairwise correlation of \"local wave-4\" indices.", layout = "l-body"}
lon_eofs %>%
widyr::pairwise_cor(base_lon, time, eof) %>%
ggplot(aes(item1, item2)) +
geom_contour_fill(aes(z = correlation, fill = ..level..), na.fill = 1) +
geom_contour2(aes(z = correlation), size = 0.2) +
geom_text_contour(aes(z = correlation, stroke.color = ..level..), color = "black",
stroke = 0.2) +
scale_fill_divergent("Correlation",
super = ScaleDiscretised) +
scale_color_divergent(aesthetics = "stroke.color", guide = "none") +
scale_x_longitude() +
scale_y_longitude() +
coord_equal()
```
Figure \@ref(fig:cor-pairwise) shows the pairwise correlation between all indices. Correlation drop rapidly with distance, but there are three clearly defined regions of high correlation that could represent various teleconnected areas. Perhaps not coincidentally, they correspond approximately to the three oceanic basins. The Indian between 0º and 120ºE, Western Pacific between 120ºE and 120ºW and Eastern Pacific and Atlantic between 120ºW and 0º.
Based on these correlations, we use hierarchical clustering with `1 - abs(correlation)` as distance measure to classify each longitude into each of 3 groups. The clusters are flanked by the longitudes `r knitr::combine_words(cuts_lon)`, which agree well with the visual interpretation of Figure \@ref(fig:cor-pairwise). Using this classification, I now create three indices by again projecting the corresponding wave-4 pattern into SST anomalies.
```{r series_k}
series_k <- eofs$right %>% copy() %>%
.[PC == "PC2"] %>%
.[sections, on = "lon"] %>%
setnames("sst", "EOF") %>%
.[sst, on = c("lon", "lat")] %>%
.[, .(value = weighted.mean(sst*EOF, cos(lat*pi/180))), by = .(time, k)] %>%
.[, value := as.numeric(scale(value)), by = .(k)]
```
```{r patterns_k}
patterns_k <- series_k %>%
.[sst, on = "time", allow.cartesian = TRUE] %>%
.[, .(correlation = cor(value, sst)), by = .(lon, lat, k)]
```
(ref:patterns-k2-cap) Correlation maps between SST and each of the three basin-dependend wave-4 index. Overlayed in gray, the tree distinct areas of shared variability identified in Figure \@ref(fig:cor-pairwise) by hierarchical clustering and selecting 3 clusters.
```{r patterns-k2, fig.cap = "(ref:patterns-k2-cap)"}
patterns_k %>%
ggplot(aes(lon, lat)) +
geom_contour_fill(aes(z = -correlation, fill = ..level..),
breaks = AnchorBreaks(0)) +
quick_map +
geom_raster(data = unique(patterns_k[, .(lon, lat)])[sections, on = "lon"],
alpha = 0.2) +
scale_fill_divergent_discretised("Correlation", limits = c(-1, 1)) +
facet_wrap(k~., ncol = 1, labeller = labeller(k = label_k))
```
Correlation maps between SST anomalies and each of the three indices are shown in Figure \@ref(fig:patterns-k2). Inside the area used to define each index correlation are high and the pattern is well defined, as expected by construction. However, outside those areas, there is very little signal.
`r emo::ji("shrug")`. Take it as you will. From what I've seen here, I remain unconvinced that this is a global pattern of SST.
# Download code
Click on this button to get the code that generated this document:
```{r echo=FALSE, layout = "l-body"}
dir <- dirname(knitr::current_input(TRUE))
img <- file.path("img", list.files(file.path(dir, "img")))
zipfile <- file.path(tempdir(), paste0(basename(dir), ".zip"))
zip(zipfile,
files = c(knitr::current_input(),
img,
"bibliography.bib"))
downloadthis::download_file(zipfile,
output_name = basename(dir),
button_label = "Download code",
button_type = "primary")
```