_posts/2021-01-14-wave4/wave4.Rmd

---
title: Analysis of "Global wave number-4 pattern in the southern subtropical sea surface temperature."
description: |
  A recent paper claims to have detected a global wave-4 pattern in subtropical SSTs. But have they really?
author:
  - name: Elio Campitelli
    url: https://eliocamp.github.io/
    affiliation: Centro de Investigaciones del Mar y la Atmósfera
    affiliation_url: http://www.cima.fcen.uba.ar/
date: "2021-01-14"
slug: wave4
categories:
  - zonal waves 
  - general circulation
output: 
  distill::distill_article: default
bibliography: bibliography.bib
compare_updates_url: https://github.com/eliocamp/scrapbook/blob/main/_posts/2021-01-14-wave4/wave4.Rmd
---

```{r setup, include=FALSE}
file <- tools::file_path_sans_ext(knitr::current_input())
knitr::opts_chunk$set(echo = TRUE, 
                      cache = FALSE, 
                      cache.path = file.path("..", "cache", file, "/"),
                      code_folding = TRUE,
                      layout = "l-page")
knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)
# https://www.nature.com/articles/s41598-020-80492-x
```


This are some quick notes on [Global wave number-4 pattern in the southern subtropical sea surface temperature](https://www.nature.com/articles/s41598-020-80492-x) [@senapati2021]. The article claims to discover a wave-4 pattern in the southern Sea Surface Temperature (SST). Their basic method is to perform Empirical Orthogonal Function (EOF) to SST between 55°S and 20°S. They discard the first EOF as uninteresting because it's "ENSO-like" (unsurprising) and focus on the second EOF, whose spatial pattern is in Figure \@ref(fig:fig1).

(ref:fig1-cap) Panel a from  @senapati2021's Figure 1: Spatial pattern of second leading EOF mode of SST anomaly over the region (20°S-55°S) from HadSST.

```{r fig1, echo=FALSE, fig.cap="(ref:fig1-cap)",  out.extra="class=external"}
knitr::include_graphics(file.path("img", "wave4-fig1.png"))
```

First things first. Download the data and try to reproduce their figure. 

```{r helpers, cache = FALSE}
# packages
library(magrittr)
library(data.table)
library(metR)
library(ggplot2)


# fast detrend.
Detrend <- function(y, x) {
  nas <- is.na(y)
  m <- mean(y, na.rm = TRUE)
  if (!hasArg(x)) x <- seq_along(y)
  y[!nas] <- .lm.fit(cbind(1, x[!nas]), y[!nas])$residuals
  return(y + m)
}

theme_set(theme_minimal() + 
            theme(panel.grid = element_blank()))
# simple and dirty map
map <- ggplot2::map_data("world2") %>%
  subset(lat %between% c(-90, -0))

quick_map <- list(geom_polygon(data = map,
                               aes(long, lat, group = group), fill = "white", color = "black",
                               size =0.2),
                  scale_x_longitude(),
                  scale_y_latitude(ticks = 10),
                  coord_quickmap(ylim = c(-55, -20)))
```

```{r download-data}
# Download and decompress HadSST data
had_file <- here::here("_data", "hadsst.nc")
if (!file.exists(had_file)) {
  hadsst <- "https://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_sst.nc.gz"
  had_zip <- tempfile()
  
  download.file(hadsst, had_zip, mode = "wb")  
  R.utils::gunzip(had_zip, had_file, remove = FALSE)
}

```

```{r sst}
# Read data and detrend
sst <- ReadNetCDF(had_file, 
                   vars = c("sst"),
                   subset = list(latitude = c(-90, -0),
                                 time = c("1979-01-01", "2018-12-31"))) %>% 
  setnames(c("longitude", "latitude"), c("lon", "lat")) %>% 
  na.omit() %>% 
  .[, lon := ConvertLongitude(lon)] %>% 
  .[, sst := Detrend(Anomaly(sst)), by = .(lon, lat, month(time))]
```

```{r eofs}
eofs <- sst[lat %between% c(-55, -20)] %>%
  copy() %>% 
  .[, sst := sst*sqrt(cos(lat*pi/180))] %>% 
  EOF(sst ~ time | lat + lon, n = 1:2, data = .)
  
```

(ref:eof12-cap) Spatial patterns of the frist and second leading EOFs of detrended monthly anomalies of SST between 55°S and 20°S weighted by the square root of the cosine of latitude. 

```{r eof12, fig.cap = "(ref:eof12-cap)", fig.height = 2.1, fig.width=5, preview = TRUE}
ggplot(eofs$right, aes(lon, lat)) +
  geom_contour_fill(aes(z = -sst)) +
  scale_fill_divergent(guide = "none") +
  quick_map + 
  facet_wrap(PC~., ncol = 1)

```

And indeed. The first EOF is kind of ENSO-like (not really full ENSO, because I'm missing the equatorial SSTs, which is where ENSO really shines) and the second EOF looks pretty much identical to the their Figure 1.a save the different prime meridian. The time series associated with the second EOF is also almost exactly the same. 

(ref:eoftime-cap) Temporal pattern of the second EOF in gray with a 5-month running mean in black. 

```{r eoftime, fig.cap = "(ref:eoftime-cap)", layout = "l-body", fig.height=2.1}
cut(eofs, 2) %>% 
  .$left %>%
  copy() %>% 
  .[, sst5 := frollmean(sst, 5, align = "center")] %>% 
  ggplot(aes(time, -sst)) +
  geom_hline(yintercept = 0, size = 0.2, color = "gray50")  +
  geom_line(aes(y = -sst5))  +
  geom_line(color = "gray") +
  
  scale_y_continuous(NULL) +
  scale_x_datetime(date_breaks = "5 years", date_labels = "%Y")

```
OK, I'm on the right track. 

Now, my main concern with this paper is whether this pattern is actually, as the title of the paper says, a **global** pattern. EOF is a great technique for dimensionality reduction, but it's too easy to end up with statistical patterns that are a mix of actual physical patterns or even just noise. 

The authors agree, and they say that..

> In order to examine the synchronization of the W4 pattern among all the basins, point correlation analysis has been performed. For this purpose, eight points [i(37.5°S, 173.5°W), ii(37.5°S, 133.5°W), iii(44.5°S, 90.5°W), iv(39.5°S, 40.5°W), v(29.5°S, 2.5°W), vi(41.5°S, 41.5°E), vii(30.5°S, 86.5°E), viii(35.5°S, 130.5°E)] corresponding to the loading centres are selected (marked by green dots, i-viii, in Fig. 1a). The time series of SST anomaly is computed at each grid point after removing the contributions of the first EOF mode (henceforth, reconstructed SST anomaly). Further, point correlation is performed for the time series at the loading centers (Fig. 1a) with the reconstructed SST anomaly (Fig. 2a–h corresponding respectively to points (i) to (viii) of Fig. 1a). 

The problem, IMHO, is that their [Figure 2](https://www.nature.com/articles/s41598-020-80492-x/figures/2) doesn't really show as much synchronisation as they claim. First, let's reproduce it here. 

```{r points}
# Define points of interest
points <- tibble::tribble(~lat, ~lon,
                          -37.5, -173.5,
                          -37.5, -133.5,
                          -44.5, -90.5,
                          -39.5, -40.5, 
                          -29.5, -2.5, 
                          -41.5, 41.5,
                          -30.5, 86.5,
                          -35.5, 130.5) %>% 
  as.data.table() %>% 
  .[, lon := ConvertLongitude(lon)] %>% 
  .[, id := tolower(as.roman(seq_len(.N)))] %>% 
  .[, sign := rep(c(-1, 1), 4)]   # sign of original correlation
```


```{r sst_reconstruct}
# Reconstruct SST from leading EOF  and add it to 
# original data                     
sst_reconstruct <- predict(eofs, n = 1) %>% 
  setnames("sst", "sst_reconstructed")

sst <- sst[sst_reconstruct, on = .NATURAL]
rm(sst_reconstruct)
```


```{r corrs}
# Compute correlation maps of  points of interest,
# both with the orignial sst and sst with filtered EOF1
# (for compatison)
corrs <- sst[points, on = .NATURAL] %>% 
  .[, ":="(lon = NULL, lat = NULL)] %>% 
  setnames(c("sst", "sst_reconstructed"), c("ref", "ref_reconstructed")) %>% 
  .[sst, on = "time", allow.cartesian = TRUE] %>% 
  .[, .(correlation = cor(sst, ref),
        correlation_reconstructed = cor(sst - sst_reconstructed, ref - ref_reconstructed)),
    by = .(lon, lat, id, sign)]
```


(ref:correlations-cap) Correlation maps of SST and SST with the first EOF filterred out with the corresponding SST at each one of the points of interest. Compare with [Figure 2](https://www.nature.com/articles/s41598-020-80492-x/figures/2) of @senapati2021. The sign of the correlation is flipped for the even points as to preserve always the same sign and allow for easier comparison among maps.

```{r correlations, fig.cap = "(ref:correlations-cap)"}
corrs %>% 
  melt(measure.vars = c("correlation", "correlation_reconstructed")) %>% 
  ggplot(aes(lon, lat)) +
  geom_contour_fill(aes(z = -value*sign), 
                    breaks = AnchorBreaks(anchor = 0, binwidth = 0.1)) +
  quick_map +
  geom_point(data = copy(points)[, id := NULL], shape = 21, fill = NA) +
  geom_point(data = copy(points), shape = 21, color = "black", fill =  "#0e9a83") +
  scale_fill_divergent() +
  facet_grid(id~variable, labeller = labeller(variable = c(correlation = "SST", 
                                                           correlation_reconstructed = "SST - EOF1")))

```

I purposely computed the two versions. The panels on the right should be a reproduction of @senapati2021's Figure 2. It's hard to compare them because of the different colour palettes and scale limits (the authors chose parameters that enhanced small correlations), but I personally don't see much coherency. The first three rows do show high correlations between the three points in the Pacific, which in the unfiltered data looks very much like the well-known PSA pattern that appears as a response to El Niño Southern Oscillation. 

Another way to look at it is with by plotting $r^2$ which directly quantifies the degree of (linear) dependence between points. 

(ref:r2-cap) Coefficient of determination ($r^2$) computed from the correlations with the filtered SST in Figure \@ref(fig:correlations). Contours only show areas with $r^2 > 0.1$. 

```{r r2, fig.cap = "(ref:r2-cap)"}
corrs %>% 
  ggplot(aes(lon, lat)) +
  geom_contour_fill(aes(z = correlation_reconstructed^2, 
                        fill = ..level..), breaks = seq(.1, 1, by = 0.1)) +
  quick_map +
  geom_point(data = copy(points)[, id := NULL], shape = 21, fill = NA) +
  geom_point(data = copy(points), shape = 21, color = "black", fill =  "#de3e80") +
  scale_fill_viridis_c(oob = scales::squish,
                       super = ScaleDiscretised) +
  facet_grid(id~.)

```

Again, aside from the points in the Pacific, there is not a lot of long-range relationships between points. As I see it, I strongly suspect that this PC2 pattern is not really robustly **global**.


```{r bootstrap, cache=TRUE}
set.seed(42)
N <- 500
random_points <- unique(corrs[, .(lon, lat)]) %>% 
  .[sample(.N, N), ] %>% 
  setkey(lon, lat)

random_cors <- vapply(seq_len(N), function(i) {
  r <- random_points[i, ] 
  ref <- sst[lat == r$lat & lon == r$lon, sst - sst_reconstructed]
  sst[, ref_data := ..ref, by = .(lon, lat)]
  sst[, .(cor(sst - sst_reconstructed, ref_data)), by = .(lon, lat)] %>% 
    .[, min(V1)]
}, numeric(1))
```


Ok, but how could I quantify this vague impression that these maps don't show a lot of long-range teleconnections? What I'm going to do is to compute `r N` correlation maps like Figure \@ref(fig:correlations) but using random points. For each map, I'll take the absolute value of the minimum (negative) correlation as the level of connectivity. 

(ref:correlation-random-cap) Same as Figure \@ref(fig:correlations) but only for the reconstructed SST. Added in black contours, the 0.5 quantile of random correlations derived from `r N` correlation maps with random points. 

```{r correlation-random,fig.cap = "(ref:correlation-random-cap)"}
corrs %>% 
  ggplot(aes(lon, lat)) +
  geom_contour_fill(aes(z = -correlation_reconstructed*sign), 
                    breaks = AnchorBreaks(anchor = 0, binwidth = 0.1)) +
  geom_contour2(aes(z = abs(correlation)), breaks = quantile(abs(random_cors), .5), size = 0.25) +
  quick_map +
  geom_point(data = copy(points)[, id := NULL], shape = 21, fill = NA) +
  geom_point(data = copy(points), shape = 21, color = "black", fill =  "#0e9a83") +
  scale_fill_divergent() +
  facet_grid(id~., labeller = labeller(variable = c(correlation = "SST", 
                                                           correlation_reconstructed = "SST - EOF1")))
```


Figure \@ref(fig:correlation-random) repeats Figure \@ref(fig:correlations) but it marks the 50th percentile of the minimum correlation values derived from the bootstrap procedure, which is `r signif(quantile(abs(random_cors), 0.5), 2)`. 95% of random points had minimum correlations with absolute values larger than `r signif(quantile(abs(random_cors), 0.95), 2)`. With this in mind, correlations of the order of $\pm `r signif(quantile(abs(random_cors), 0.5), 2)`$ are not surprising and what would be expected by chance alone. This, of course, is only a crude measure since it doesn't take into account the size or pattern of the correlations, but I think that it should give some perspective to the real significance to the correlation levels shown here.


Let's try something else. From the correlation maps above (and previous knowledge), it's pretty obvious that the Pacific sector does behave somewhat coherently. So what I'm going to do is to split the spatial pattern into the Pacific basin (between 150°E and 290°E) and the Atlantic-Indian basins (the rest of the hemisphere). Then, I'm going to project each pattern onto the corresponding SST fields to get two indices. If the patterns is really coherent in time, then both indices must be strongly correlated. 


```{r series}
series <- eofs$right %>% copy() %>% 
  .[PC == "PC2"] %>% 
  .[, basin := ifelse(lon %between% c(150, 290), "pacific", "atlantic")] %>% 
  setnames("sst", "EOF") %>% 
  .[sst, on = c("lon", "lat")] %>% 
  .[, weighted.mean(sst*EOF, cos(lat*pi/180)), by = .(time, basin)]
```


```{r indices, fig.cap = "Relationship between the Pacific index and Atlantic-Indian index.", layout= "l-body"}
# Relationship between the two indices. 
series %>% 
  dcast(time ~ basin, value.var = "V1") %>% 
  ggplot(aes(pacific, atlantic)) +
  geom_point() +
  geom_label(data = ~.x[, .(cor(pacific, atlantic))], 
            aes(label = paste0("cor= ", signif(V1, 2))), 
            x = -0.0035, y = 0.001, size = 7) +
  geom_smooth(method = "lm") +
  scale_x_continuous("Pacific index") +
  scale_y_continuous("Atlantic-Indian index")

```

I mean... A correlation of 0.44 is not nothing, but it's also not a lot. 

Let's do the correlation map of each index. Again, if the pattern is really global, then the correlation map of the Pacific Index should also show th
.e Atlantic-Indian pattern and vice versa.


```{r patterns}
patterns <- eofs$left[PC == "PC2"] %>% 
  setnames(c("sst", "PC"), c("V1", "basin")) %>% 
  rbind(., series, use.names = TRUE) %>% 
  .[sst, on = "time", allow.cartesian = TRUE] %>% 
  .[, .(correlation = cor(V1, sst)), by = .(lon, lat, basin)] 

lines <- CJ(lon = c(150, 290),
            basin = c("pacific", "atlantic"))
```


```{r patterns2, fig.cap = "Correlation patterns with the Pacific index, the Atlantic-Indian index and the PC2."}
patterns %>% 
  ggplot(aes(lon, lat)) +
  geom_contour_fill(aes(z = -correlation, fill = ..level..), 
                    breaks = AnchorBreaks()) +
  quick_map + 
  geom_vline(data = lines, aes(xintercept = lon)) +
  scale_fill_divergent_discretised(limits = c(-1, 1)) +
  facet_wrap(basin~., ncol = 1, labeller = labeller(basi = c(pacific = "Pacific", 
                                                             atlantic = "Atlantic-Indian")))

```

Again... kinda? For the Atlantic-Indian index, the Pacific signal is barely there and it's actually completely missing in the Western Pacific. And for the Pacific index, the there is *some* signal in the Indian ocean, but barely any signal in the Atlantic.

Now one last test. To extend this analysis, I'll do the same computation but for every longitude. That is, for each longitude, take a 140º wide section of SST centred in that longitude and project the corresponding wave-4 pattern onto it to get a time-varying index. The result, then, is one "local wave-4 index" for each longitude. 


```{r lon_eofs, cache = TRUE}
lon_width <- diff(c(150, 290))
lon_halfwidth <- lon_width/2

# "extended" version of geopotential height 
# 
sst2 <- eofs$right %>% copy() %>% 
  .[PC == "PC2"] %>% 
  setnames("sst", "EOF") %>% 
  sst[., on = .NATURAL] %>% 
  ggperiodic::qwrap(lon = c(0, 360) ~ c(-lon_halfwidth, 360 + lon_halfwidth))

# For each longitude, compute EOF using a segment of lon_width width
# centered in that longitude.
lon_eofs <- lapply(unique(sst$lon), function(base_lon) {
  sst2 %>% 
    .[lon %between% (base_lon + c(-lon_halfwidth, lon_halfwidth))] %>% 
    .[, .(eof = weighted.mean(sst*EOF, cos(lat*pi/180))),
      by = time] %>% 
    .[, base_lon := base_lon] %>% 
    .[]
}) %>% 
  rbindlist()
```


```{r sections}
k <- lon_eofs %>% 
  widyr::pairwise_cor(base_lon, time, eof) %>% 
  as.data.table() %>% 
  .[, correlation := 1 - abs(correlation)] %>% 
  dcast(item1 ~ item2, value.var = "correlation") %>% 
  .[, -1] %>% 
  as.dist() %>% 
  hclust() %>% 
  cutree(3)

sections <- data.table(lon = as.numeric(names(k)), k = k)
cuts <- sections[c(0, diff(k)) != 0]

label_k <- c("1" = "East Pacific & Atlantic", 
             "2" = "Indian",
             "3" = "West Pacific")


cuts_lon <- LonLabel(cuts$lon)
cuts_lon_round <- LonLabel(round(cuts$lon/5)*5)
```

```{r cor-pairwise, fig.cap = "Pairwise correlation of \"local wave-4\" indices.", layout = "l-body"}
lon_eofs %>% 
  widyr::pairwise_cor(base_lon, time, eof) %>% 
  ggplot(aes(item1, item2)) +
  geom_contour_fill(aes(z = correlation, fill = ..level..), na.fill = 1) +
  geom_contour2(aes(z = correlation), size = 0.2) +
  geom_text_contour(aes(z = correlation, stroke.color = ..level..), color = "black",
                    stroke = 0.2) +
  scale_fill_divergent("Correlation",
                       super = ScaleDiscretised) +
  scale_color_divergent(aesthetics = "stroke.color", guide = "none") +
  scale_x_longitude() +
  scale_y_longitude() +
  coord_equal() 

```

Figure \@ref(fig:cor-pairwise) shows the pairwise correlation between all indices. Correlation drop rapidly with distance, but there are three clearly defined regions of high correlation that could represent various teleconnected areas. Perhaps not coincidentally, they correspond approximately to the three oceanic basins. The Indian between 0º and 120ºE, Western Pacific between 120ºE and 120ºW and Eastern Pacific and Atlantic between 120ºW and 0º.


Based on these correlations, we use hierarchical clustering with `1 - abs(correlation)` as distance measure to classify each longitude into each of 3 groups. The clusters are flanked by the longitudes `r knitr::combine_words(cuts_lon)`, which agree well with the visual interpretation of Figure \@ref(fig:cor-pairwise). Using this classification, I now create three indices by again projecting the corresponding wave-4 pattern into SST anomalies.

```{r series_k}
series_k <- eofs$right %>% copy() %>% 
  .[PC == "PC2"] %>% 
  .[sections, on = "lon"] %>% 
  setnames("sst", "EOF") %>% 
  .[sst, on = c("lon", "lat")] %>% 
  .[, .(value = weighted.mean(sst*EOF, cos(lat*pi/180))), by = .(time, k)] %>% 
  .[, value := as.numeric(scale(value)), by = .(k)]
```

```{r patterns_k}
patterns_k <- series_k %>% 
  .[sst, on = "time", allow.cartesian = TRUE] %>% 
  .[, .(correlation = cor(value, sst)), by = .(lon, lat, k)] 
```

(ref:patterns-k2-cap) Correlation maps between SST and each of the three basin-dependend wave-4 index. Overlayed in gray, the tree distinct areas of shared variability identified in Figure \@ref(fig:cor-pairwise) by hierarchical clustering and selecting 3 clusters.

```{r patterns-k2, fig.cap = "(ref:patterns-k2-cap)"}
patterns_k %>% 
  ggplot(aes(lon, lat)) +
  geom_contour_fill(aes(z = -correlation, fill = ..level..), 
                    breaks = AnchorBreaks(0)) +
  quick_map + 
  geom_raster(data = unique(patterns_k[, .(lon, lat)])[sections, on = "lon"],
              alpha = 0.2) +
  scale_fill_divergent_discretised("Correlation", limits = c(-1, 1)) +
  facet_wrap(k~., ncol = 1, labeller = labeller(k = label_k))
```

Correlation maps between SST anomalies and each of the three indices are shown in Figure \@ref(fig:patterns-k2). Inside the area used to define each index correlation are high and the pattern is well defined, as expected by construction. However, outside those areas, there is very little signal.

`r emo::ji("shrug")`. Take it as you will. From what I've seen here, I remain unconvinced that this is a global pattern of SST. 


# Download code 

Click on this button to get the code that generated this document:

```{r echo=FALSE, layout = "l-body"}
dir <- dirname(knitr::current_input(TRUE))
img <- file.path("img", list.files(file.path(dir, "img")))
zipfile <- file.path(tempdir(), paste0(basename(dir), ".zip"))

zip(zipfile, 
    files = c(knitr::current_input(),
              img, 
              "bibliography.bib"))

downloadthis::download_file(zipfile,
                           output_name = basename(dir),
                           button_label = "Download code",
                           button_type = "primary")
```