---
title: "MHWs vs. heat flux"
author: "Robert Schlegel"
date: "2020-02-25"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
csl: FMars.csl
bibliography: MHWflux.bib
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
                      echo = TRUE, warning = FALSE, message = FALSE, 
                      eval = TRUE, tidy = FALSE)
```

## Introduction

This vignette walks through the reasoning and the process for linking physical variables to their potential role in driving or dissipating MHWs. The primary inspiration for this work was @Chen2016, in which the authors illustrated which parts of the heat budget were most likely driving the anomalous heat content in the surface layer of the ocean. This analysis builds on that methodology by applying the same fundamental concept to ALL of the MHWs detected in the NW Atlantic. In practice we run thousands of correlations between SST anomalies and the co-occurring anomalies of a range of physical variables. The stronger a correlation (whether positive or negative), the more it suggests that the two phenomena are related.

```{r startup}
# All of the libraries and objects used in the project
# Note that this also loads the data we will be using in this vignette
source("code/functions.R")
```

## Correlations

We know when the MHWs occurred and our physical data are prepped, so we can now correlate SST against the full suite of variables over two segments of each event: from the start to the peak (onset) and from the peak to the end (decline). This shows us, for each event, which variables correlated best during the onset AND the decline. We also run the correlations over the full duration of each event.
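
The correlations themselves are computed by `cor_all()` in `code/functions.R`, which is not shown here. As a rough, hypothetical illustration of the idea (not the project's actual implementation), the base R sketch below splits one made-up event at its SST peak and correlates the SST anomaly with another made-up anomaly series (`x_anom`) over the onset, decline, and full event. The column names and the choice to include the peak day in both halves are assumptions.

```{r onset-decline-sketch, eval=FALSE}
# Hypothetical daily anomalies for one 9-day event (made-up values):
# SST rises to a peak and falls again, while x_anom keeps increasing
event_days <- data.frame(
  day = 1:9,
  sst_anom = c(0.5, 1.0, 1.6, 2.1, 2.4, 2.0, 1.5, 1.1, 0.6),
  x_anom = seq(0.1, 0.9, by = 0.1)
)

# Split at the day of peak SST anomaly (assumed here to belong to both halves)
peak <- which.max(event_days$sst_anom)
segments <- list(onset = event_days[1:peak, ],
                 decline = event_days[peak:nrow(event_days), ],
                 full = event_days)

# Pearson r and p-value between the two anomalies within each segment
seg_cor <- do.call(rbind, lapply(names(segments), function(section) {
  seg <- segments[[section]]
  ct <- cor.test(seg$sst_anom, seg$x_anom, method = "pearson")
  data.frame(ts = section, r = unname(ct$estimate),
             p = ct$p.value, n_Obs = nrow(seg))
}))
seg_cor
```

With these toy values the onset r is close to +1 and the decline r close to -1, while the full-event r is near zero: splitting at the peak keeps opposing relationships in the two halves from cancelling each other out.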

```{r MHW-var-cor, eval=FALSE}
# Extract just the event info
GLORYS_MHW_event_index <- GLORYS_MHW_event %>% 
  select(event_no, region, season) %>% 
  ungroup() %>% 
  mutate(row_index = 1:n())

# Run all the stats
ALL_cor <- plyr::ddply(GLORYS_MHW_event_index, .parallel = T,
                       .variables = c("row_index"), .fun = cor_all) %>% 
  left_join(GLORYS_MHW_event_index, by = "row_index") %>% 
  select(region, season, event_no, ts, everything()) %>%
  arrange(region, event_no) %>% 
  mutate(Parameter2 = factor(Parameter2))

# Save
saveRDS(ALL_cor, "data/ALL_cor.Rda")
saveRDS(ALL_cor, "shiny/ALL_cor.Rda")
```

Because these are simple correlations, everything runs quickly. With the method settled for now, we need to look at the results. What we have at the moment is a long data frame containing the correlations of the different variables with the temperature anomalies. Note that these are same-day correlations: no time lag is introduced, which may be important. Below we visualise the range of correlations for each variable to see how skewed each distribution is. This skewness could probably be quantified in a meaningful way, but first let's look at the data.
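
One way to put a number on that skewness is the moment-based sample skewness of the r values for each variable. The sketch below assumes the correlations sit in a long table with `Parameter2` and `r` columns, like `ALL_cor`; the `skew()` helper and all values are made up for illustration.

```{r r-skewness-sketch, eval=FALSE}
# Hypothetical helper: moment-based sample skewness,
# i.e. the mean cubed deviation divided by the cubed (population) SD
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Made-up long table of r values in the shape of ALL_cor
r_long <- data.frame(
  Parameter2 = rep(c("sss", "lwr_mld_cum"), each = 5),
  r = c(0.9, 0.8, 0.7, 0.2, -0.1,
        -0.9, -0.8, -0.7, -0.2, 0.1)
)

# Skewness of the distribution of r for each variable
r_skew <- tapply(r_long$r, r_long$Parameter2, skew)
r_skew
```

A negative skewness means most of a variable's r values sit at the positive end with a tail towards negative values; a positive skewness means the reverse. Other skewness estimators exist and would give slightly different numbers.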

We also want to filter by p-value to highlight the statistically significant correlations.
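
A minimal base R sketch of such a filter, assuming the p-value column is called `p` (as in output from the correlation package's `correlation()`) and using made-up values. Because thousands of correlations are run, some will fall below p = 0.05 by chance alone, so the sketch also shows an optional Benjamini-Hochberg adjustment with `p.adjust()`; whether to adjust, and with which method, is a judgement call.

```{r p-filter-sketch, eval=FALSE}
# Made-up correlation results (one row per event x variable)
cor_res <- data.frame(
  Parameter2 = c("sss", "sss", "lwr_mld_cum", "mslp_cum"),
  r = c(0.85, 0.40, -0.72, 0.10),
  p = c(0.001, 0.20, 0.004, 0.60)
)

# Optional: control the false discovery rate across all of the tests
cor_res$p_adj <- p.adjust(cor_res$p, method = "BH")

# Keep only the correlations that remain significant after adjustment
cor_sig <- cor_res[cor_res$p_adj < 0.05, ]
cor_sig
```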

```{r shiny-histo}
# source("shiny/app.R")
# Or it is live here:
# https://robert-schlegel.shinyapps.io/MHWflux/
```

Some clear patterns emerge from the data. In particular, SSS appears to be strongly related to the onset of MHWs. These data have many nuances, and I think this is a case where a Shiny app is genuinely useful for interrogating them.

The Shiny app also shows that longer events tend not to correlate strongly with any single variable. This is to be expected, and supports the argument that very persistent MHWs are sustained by a confluence of variables. How to disentangle that confluence is an interesting challenge.

## Regions + Seasons

With the correlations calculated for the onset, decline, and full duration of each MHW, we also want to know whether any signals emerge from the regions and/or seasons in which these events occurred. Is the relationship between SSS and MHW onset stronger in winter? Stronger in certain regions? Having manually looked through the Shiny app, there do appear to be some patterns. These are recorded in the results table below.
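
A base R sketch of that kind of cross-tabulation, using made-up onset r values for SSS and hypothetical region labels: the mean r for each region and season combination, alongside the number of events behind each mean, since small groups give noisy averages.

```{r region-season-sketch, eval=FALSE}
# Made-up onset correlations between SST and SSS, one row per event
# (region labels are hypothetical)
sss_onset <- data.frame(
  region = c("nfs", "nfs", "nfs", "mab", "mab", "mab"),
  season = c("Winter", "Winter", "Summer", "Winter", "Summer", "Summer"),
  r = c(0.8, 0.6, 0.2, 0.5, -0.1, 0.1)
)

# Region x season matrix of mean r
mean_r <- tapply(sss_onset$r, list(sss_onset$region, sss_onset$season), mean)

# Number of events behind each mean
n_events <- table(sss_onset$region, sss_onset$season)

mean_r
n_events
```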

## Relationships

With patterns pulled out by region and season, we want to see whether there are MHWs that correlate strongly with one variable during onset and with another during decline. We will look for this within regions and seasons as well. For example, do MHWs that correlate well with an increase in SSS during onset also correlate well with a decrease in long-wave radiation during the decline? I'm not sure how best to do this cleanly.
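
One possible (hypothetical) starting point: for each event, take the variable with the largest absolute r during onset and during decline, then cross-tabulate the two. The sketch below does this in base R on made-up values, assuming a wide table with one row per event and time-series section, like `ALL_cor_wide`. Ranking by absolute r ignores the sign of the relationship, which may itself be informative.

```{r onset-decline-pairs-sketch, eval=FALSE}
# Made-up wide table: one row per event and section, one r column per variable
cor_wide <- data.frame(
  event_no = rep(1:4, each = 2),
  ts = rep(c("onset", "decline"), times = 4),
  sss = c(0.8, 0.1, 0.7, -0.2, 0.1, 0.3, 0.9, 0.2),
  lwr_mld_cum = c(0.2, -0.9, 0.1, -0.8, 0.6, 0.1, 0.3, -0.7),
  lhf_mld_cum = c(-0.3, 0.4, 0.2, 0.5, -0.2, 0.9, 0.1, 0.1)
)
vars <- c("sss", "lwr_mld_cum", "lhf_mld_cum")

# Variable with the largest absolute r in each row
cor_wide$top_var <- vars[apply(abs(cor_wide[, vars]), 1, which.max)]

# Pair each event's strongest onset variable with its strongest decline variable
onset <- cor_wide[cor_wide$ts == "onset", c("event_no", "top_var")]
decline <- cor_wide[cor_wide$ts == "decline", c("event_no", "top_var")]
onset_decline <- merge(onset, decline, by = "event_no",
                       suffixes = c("_onset", "_decline"))

# How often each onset/decline pairing occurs
pair_counts <- table(onset_decline$top_var_onset, onset_decline$top_var_decline)
pair_counts
```

Run separately on each region or season subset, the same table would show whether particular onset/decline pairings dominate there.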

Another thing to consider is whether fast-onset, slow-decline events (and vice versa) have different characteristics from more slowly evolving events. The same question could be asked of long vs. short events and of high- vs. low-intensity events. To begin this investigation we join the MHW metrics to the correlation results and visualise the patterns with heatmaps.

```{r metrics_cor}
ALL_cor_wide <- readRDS("data/ALL_cor.Rda") %>% 
  ungroup() %>% 
  filter(Parameter1 == "sst") %>% 
  dplyr::select(region:ts, Parameter2, r, n_Obs) %>% 
  pivot_wider(values_from = r, names_from = Parameter2)

# Combine MHW metrics and correlation results
events_cor_prep <- GLORYS_MHW_event %>% 
  dplyr::select(region, season, event_no, duration, intensity_mean, intensity_max, 
                intensity_cumulative, rate_onset, rate_decline) %>% 
  left_join(ALL_cor_wide, by = c("region", "season", "event_no")) %>% 
  ungroup() %>% 
  dplyr::select(region:n_Obs, sst, bottomT, sss, mld_cum, mld_1_cum, t2m, tcc_cum, p_e_cum, mslp_cum,
                 lwr_mld_cum, swr_mld_cum, lhf_mld_cum, shf_mld_cum, qnet_mld_cum)

# Heatmap showing average correlations by MHW duration
events_cor_prep %>% 
  mutate(duration = plyr::round_any(duration, 5)) %>% 
  group_by(ts, duration) %>% 
  mutate(count = n()) %>% 
  summarise_if(is.numeric, mean) %>% 
  pivot_longer(cols = sst:qnet_mld_cum) %>% 
  filter(name != "sst",
         ts != "full") %>% 
  ggplot(aes(x = duration, y = name)) +
  geom_tile(aes(fill = value)) +
  geom_label(aes(label = count)) +
  facet_wrap(~ts) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_cartesian(expand = F) +
  labs(y = NULL, x = "Duration (days; 5-day steps)", fill = "r (mean)")

# Heatmap showing average correlations by MHW max intensity
events_cor_prep %>% 
  mutate(intensity_max = plyr::round_any(intensity_max, 0.25)) %>% 
  group_by(ts, intensity_max) %>%
  mutate(count = n()) %>% 
  summarise_if(is.numeric, mean) %>% 
  pivot_longer(cols = sst:qnet_mld_cum) %>% 
  filter(name != "sst",
         ts != "full") %>% 
  ggplot(aes(x = intensity_max, y = name)) +
  geom_tile(aes(fill = value)) +
  geom_label(aes(label = count)) +
  facet_wrap(~ts) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_cartesian(expand = F) +
  labs(y = NULL, x = "Max Intensity (°C; 0.25° steps)", fill = "r (mean)")

# Heatmap showing average correlations by MHW rate onset
events_cor_prep %>% 
  mutate(rate_onset = round(rate_onset, 1)) %>% 
  group_by(ts, rate_onset) %>% 
  mutate(count = n()) %>% 
  summarise_if(is.numeric, mean) %>% 
  pivot_longer(cols = sst:qnet_mld_cum) %>% 
  filter(name != "sst",
         ts != "full") %>% 
  ggplot(aes(x = rate_onset, y = name)) +
  geom_tile(aes(fill = value)) +
  geom_label(aes(label = count)) +
  facet_wrap(~ts) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_cartesian(expand = F) +
  labs(y = NULL, x = "Rate of onset (°C/day; 0.1 steps)", fill = "r (mean)")

# Heatmap showing average correlations by MHW rate decline
events_cor_prep %>% 
  mutate(rate_decline = round(rate_decline, 1)) %>% 
  group_by(ts, rate_decline) %>% 
  mutate(count = n()) %>% 
  summarise_if(is.numeric, mean) %>% 
  pivot_longer(cols = sst:qnet_mld_cum) %>% 
  filter(name != "sst",
         ts != "full") %>% 
  ggplot(aes(x = rate_decline, y = name)) +
  geom_tile(aes(fill = value)) +
  geom_label(aes(label = count)) +
  facet_wrap(~ts) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_cartesian(expand = F) +
  labs(y = NULL, x = "Rate of decline (°C/day; 0.1 steps)", fill = "r (mean)")
```
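
The heatmaps above bin each metric with `plyr::round_any()`, which rounds to the nearest multiple of the given accuracy, then average r within each bin and label each tile with the number of events in it. A minimal base R equivalent for one variable, on made-up values, might look like this.

```{r metric-binning-sketch, eval=FALSE}
# Made-up event durations (days) and onset r values for one variable
events <- data.frame(
  duration = c(6, 9, 12, 13, 22, 24),
  sss = c(0.7, 0.5, 0.4, 0.6, 0.1, 0.3)
)

# Round each duration to the nearest multiple of 5 days
# (the same arithmetic as plyr::round_any(duration, 5))
events$duration_bin <- round(events$duration / 5) * 5

# Mean r and number of events per bin (both sorted by bin)
binned <- aggregate(sss ~ duration_bin, data = events, FUN = mean)
binned$count <- as.vector(table(events$duration_bin))
binned
```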

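The code chunk below correlates the MHW metrics (duration, intensities, and rates of onset and decline) against the correlation results themselves, grouped by region and by season. So far this has not proven very useful.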

```{r metrics_cor_cor, eval=FALSE}
# All correlations by region
events_cor_region <- events_cor_prep %>% 
  group_by(region, ts) %>% 
  correlation(redundant = TRUE) %>% 
  mutate_if(is.numeric, round, 4) %>% 
  filter(Parameter1 %in% c("duration", "intensity_mean", "intensity_max", 
                           "intensity_cumulative", "rate_onset", "rate_decline"),
         !Parameter2 %in% c("duration", "intensity_mean", "intensity_max",
                           "intensity_cumulative", "rate_onset", "rate_decline"))
saveRDS(events_cor_region, "data/events_cor_region.Rds")

# All correlations by season
events_cor_season <- events_cor_prep %>% 
  group_by(season, ts) %>% 
  correlation(redundant = TRUE) %>% 
  mutate_if(is.numeric, round, 4) %>% 
  filter(Parameter1 %in% c("duration", "intensity_mean", "intensity_max", 
                           "intensity_cumulative", "rate_onset", "rate_decline"),
         !Parameter2 %in% c("duration", "intensity_mean", "intensity_max",
                           "intensity_cumulative", "rate_onset", "rate_decline"))
saveRDS(events_cor_season, "data/events_cor_season.Rds")

# All correlations by region+season
# Some groupings don't have enough observations
# This throws an error but it still runs
# events_cor_region_season <- events_cor_prep %>%
#   group_by(region, season, ts) %>%
#   correlation(redundant = TRUE) %>%
#   mutate_if(is.numeric, round, 4) %>%
#   filter(Parameter1 %in% c("duration", "intensity_mean", "intensity_max",
#                            "intensity_cumulative", "rate_onset", "rate_decline"),
#          !Parameter2 %in% c("duration", "intensity_mean", "intensity_max",
#                            "intensity_cumulative", "rate_onset", "rate_decline"))
# saveRDS(events_cor_region_season, "data/events_cor_region_season.Rds")

# test visuals
events_cor_prep %>%
  filter(region == "nfs", ts == "onset") %>%
  ggplot(aes(x = intensity_mean, y = shf_mld_cum)) +
  geom_smooth(method = "lm", se = FALSE, colour = "black") +
  geom_point(aes(colour = season))

```

## Choice events

There are a lot of results to wade through and, although there are clearly important signals in them, they are proving difficult to distil. One idea is to ignore most of the events and focus on the longest/most intense events with strong r values. To do this we first remove all Category I (Moderate) events and any events shorter than 21 days, and then look for strong correlations among the remaining long events. Only a few events should remain.

Once this has been done, we group the events by the heat flux term (Qx) with which they correlate most strongly. We then find each group's strongest relationship with the next tier of variables (e.g. MLD, MSLP, and so on). Ideally this will reveal the top four 'flavours' of MHW.

```{r choice-events, eval=FALSE}
# Filter out smol events
events_cor_cat <- events_cor_prep %>% 
  left_join(GLORYS_MHW_cats[,c("region", "event_no", "category")], by = c("region", "event_no")) %>% 
  filter(category != "I Moderate", duration >= 21)

# Events with high Qlw correlations at onset
events_cor_cat %>% 
  filter(ts == "onset", lwr_mld_cum >= 0.7)

# Melt the data frame and find the q term with the highest correlation
# Those are then used to separate events into groups
```
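
The final step in the chunk above is so far only described in comments. A hypothetical base R sketch of the two-level grouping, on made-up onset r values: each event is assigned to the heat flux term with the largest absolute r, then to the next-tier variable with the largest absolute r, and the resulting combinations are counted. The choice of time-series section, ranking by absolute r, and the column names are all assumptions.

```{r q-flavours-sketch, eval=FALSE}
# Made-up onset r values for five long events
events <- data.frame(
  event_no = 1:5,
  lwr_mld_cum = c(0.8, 0.2, 0.9, 0.1, 0.3),
  lhf_mld_cum = c(0.3, 0.7, 0.2, 0.2, 0.8),
  shf_mld_cum = c(0.1, 0.1, 0.4, 0.9, 0.2),
  mld_cum = c(-0.6, 0.2, -0.7, 0.5, 0.1),
  mslp_cum = c(0.2, -0.8, 0.1, 0.3, -0.6)
)
q_terms <- c("lwr_mld_cum", "lhf_mld_cum", "shf_mld_cum")
next_tier <- c("mld_cum", "mslp_cum")

# First level: the heat flux term with the largest absolute r
events$q_group <- q_terms[apply(abs(events[, q_terms]), 1, which.max)]

# Second level: the next-tier variable with the largest absolute r
events$next_var <- next_tier[apply(abs(events[, next_tier]), 1, which.max)]

# Count each Q term + next-tier combination, most common first
flavours <- sort(table(paste(events$q_group, events$next_var, sep = " + ")),
                 decreasing = TRUE)
flavours
```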

## SOM

The code used for the MHWNWA project was also applied to the GLORYS MHW results to create a self-organising map (SOM) for the GLORYS data. Below, these SOM nodes are used to cluster the correlation results to see how they differ between nodes.

```{r SOM}
# Load the SOM from the MHWNWA
SOM <- readRDS("../MHWNWA/data/SOM/som_GLORYS.Rda")

# Grab only the node info
SOM_info <- SOM$info

# Join to the GLORYS MHW correlation results
events_cor_SOM <- left_join(events_cor_prep, SOM_info, by = c("region", "event_no"))

# Plotting function
plot_func <- function(df, name) {
  ggplot(data = df, aes(x = node, y = ts)) +
    geom_tile(aes(fill = value)) +
    # facet_wrap(~name, scales = "free") +
    scale_fill_gradient2(low = "blue", high = "red", name = name) +
    coord_cartesian(expand = F)
}


# Summary stats per node shown as heatmap
nested_SOM <- events_cor_SOM %>% 
  dplyr::select(-event_no) %>% 
  mutate(node = as.factor(node)) %>% 
  group_by(node, ts) %>% 
  summarise_if(is.numeric, mean) %>% 
  pivot_longer(cols = duration:count) %>%
  filter(name != "temp",
         ts != "full") %>%
  group_by(name) %>% 
  nest() %>% 
  mutate(plots = map2(data, name, plot_func)) 
gridExtra::grid.arrange(grobs = nested_SOM$plots)

# Summary heatmap for correlation values only
events_cor_SOM %>% 
  dplyr::select(node, ts, bottomT:qnet_mld_cum) %>% 
  mutate(node = as.factor(node)) %>% 
  group_by(node, ts) %>% 
  summarise_if(is.numeric, mean) %>% 
  # dplyr::select(duration) %>% 
  pivot_longer(cols = bottomT:qnet_mld_cum) %>%
  filter(name != "temp",
         ts != "full") %>%
  ggplot(aes(x = node, y = ts)) +
  geom_tile(aes(fill = value)) +
  facet_wrap(~name, scales = "free") +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_cartesian(expand = F) +
  labs(y = NULL, fill = "r (mean)")

# Boxplots for correlation values only
events_cor_SOM %>% 
  dplyr::select(node, ts, bottomT:qnet_mld_cum) %>% 
  mutate(node = as.factor(node)) %>% 
  group_by(node, ts) %>%
  # summarise_if(is.numeric, mean) %>% 
  # dplyr::select(duration) %>% 
  pivot_longer(cols = bottomT:qnet_mld_cum) %>%
  filter(name != "temp",
         ts != "full") %>%
  ggplot(aes(x = node, y = value)) +
  geom_boxplot(aes(fill = ts)) +
  facet_wrap(~name, scales = "free") +
  # scale_fill_gradient2(low = "blue", high = "red") +
  coord_cartesian(expand = F) +
  labs(y = NULL, fill = "time series\nsection")
```

Some important patterns come through when the summary correlation and MHW metric results are grouped into their SOM nodes. This is as far as the numerical results go. From here on, a human needs to examine these summaries alongside the SOM node results to discern what the combined results mean.

## Results

In the following table a more concise summary of the results is presented.

```{r, echo=FALSE, message=FALSE}
# NB: This table was created manually by going through the Shiny app one variable at a time.
res_table <- read_csv("data/res_table.csv")
knitr::kable(res_table, caption = "Variables correlated against the temperature anomalies during the onset, decline, and full duration of MHWs. Before the correlations were calculated, the cumulative heat flux terms were scaled by the daily MLD (Q/(rho x Cp x hmld)). Correlations were also run on the cumulative flux terms without this MLD scaling, but as there was little difference those results are not itemised here. The table gives the full name of each variable and the abbreviation used in the code. The 'onset' column describes (in shorthand) the tendency of the correlations during the onset of events; the 'full' and 'decline' columns do the same for the full duration and the decline. The 'season' column briefly states the clearest/most noteworthy pattern(s) when the correlations are divided up by season, and the 'region' column does the same by region. The last column, 'story', is TRUE if I think the variable has a story to tell that is worth pursuing further, particularly to see whether it relates strongly to other variables, not just temperature. This could then provide a framework for determining 'types' of MHWs (e.g. strong SSS change with strong latent heat flux).")
```
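
The caption above describes scaling the cumulative heat flux terms by the daily MLD, Q/(rho x Cp x hmld), which converts a heat flux into the temperature change it would produce if spread evenly through the mixed layer. A minimal sketch of that arithmetic in base R, assuming typical seawater values for rho and Cp, daily-mean fluxes in W m-2, and made-up flux and MLD values; the project's own calculation is not shown here and may differ in detail.

```{r mld-scaling-sketch, eval=FALSE}
# Assumed constants (typical values, not taken from the project code)
rho <- 1025          # seawater density, kg m-3
cp <- 3985           # specific heat capacity of seawater, J kg-1 K-1
sec_per_day <- 86400

# Made-up daily-mean net heat flux into the ocean (W m-2) and MLD (m)
qnet <- c(50, 80, 120, 100, 60)
mld <- c(30, 25, 20, 20, 22)

# Temperature tendency (degC per day) from each day's flux over that day's MLD
dT_daily <- qnet * sec_per_day / (rho * cp * mld)

# One possible reading of the cumulative term: heat input so far (J m-2)
# divided by rho * Cp * the current day's MLD, giving degC
qnet_mld_cum <- cumsum(qnet * sec_per_day) / (rho * cp * mld)

round(dT_daily, 3)
round(qnet_mld_cum, 3)
```

With these constants, 100 W m-2 spread over a 20 m mixed layer works out to roughly 0.1 °C per day.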

With a table organised by each variable, it makes sense to also create a table organised by season, and another by region.

```{r}

```

## Notes

The deepening of the MLD with MHW onset may be due to wind mixing into the deeper warmer water during winter.

Look into the relationship between MSLP and MHW onset. It is odd.

Look into the relationship between the decrease in SSS during the decline and the concurrent decrease in latent heat flux/evaporation.

### NWA 2012

From Chen et al. 2016 (JGR):

Such an extreme event in the MAB was attributed to the anomalous atmospheric forcing, which was linked to the northward shift in the jet stream position [Chen et al., 2014a, 2015]. The anomalously warm atmospheric conditions in the winter of 2011–2012 increased the ocean heat content (increased the ocean heat content anomaly) and facilitated the extreme warm ocean temperature in spring 2012 [Chen et al., 2014a, 2015]. On the other hand, the ocean advection played a secondary role, which partially damped the heat content anomaly created by the air-sea heat flux [Chen et al., 2015].

In both cases, initial temperature and ocean advection are not sufficient to describe the seasonal mean temperature. Additional cooling (warming) in addition to ocean advection is needed to further describe the winter (spring) temperature. In comparison, using the sum of the initial temperature and air-sea flux yields a much better description of seasonal mean temperatures (Figures 5c and 5f).

While the overall role of ocean advection is smaller than that of air-sea flux in determining the winter and spring temperatures, the year-to-year changes in the relative importance is worth investigating.

Normally, given anomalous initial temperature, air will act to damp the temperature anomaly, as in winter 2007 or 2011, or even 2005 to some extent. However, in winter 2012, the air continued to increase the temperature anomaly.

Out of the 12 years 2003–2014, the air-sea flux normally dominated the temperature anomaly in the MAB during winter. In only 3 years was the winter time temperature anomaly primarily controlled by ocean advection.

For spring, ocean advection has more control on the temperature anomalies than air-sea flux does, although the difference is smaller (Table 2). In both seasons, the relative importance of air-sea flux and ocean advection does not seem to be related to either the initial or seasonal mean thermal condition of the shelf water (fourth and fifth columns of Tables 1 and 2).

The correlation coefficients increase from 0.66 in the first half of February to 0.91 in the second half of March. This suggests that estimation of spring temperature anomaly in the MAB based on the thermal condition 2 months before spring is statistically possible.

This suggests that more northerly jet stream positions result in larger heat flux from the atmosphere into the ocean in the MAB. This is likely due to warmer and more humid air overlying the continental shelf, which reduces the heat loss from the ocean during the cooling seasons [Chen et al., 2014a].

In spring and summer, the air-sea flux may be less correlated with the air temperature due to the shallowness of the surface mixed layer, and thus may be disconnected from large-scale atmospheric circulation, i.e., jet stream variability.

## References