analysis/mhw-flux.Rmd

---
title: "MHWs vs. heat flux"
author: "Robert Schlegel"
date: "2020-02-25"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
csl: FMars.csl
bibliography: MHWflux.bib
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
                      echo = TRUE, warning = FALSE, message = FALSE, 
                      eval = TRUE, tidy = FALSE)
```

## Introduction

This vignette walks through the reasoning and process for linking physical variables to their potential role in driving or dissipating marine heatwaves (MHWs). The primary inspiration for this work was @Chen2016, in which the authors illustrated which parts of the heat budget were most likely driving the anomalous heat content in the surface of the ocean. This analysis builds on that methodology by applying the same fundamental concept to ALL of the MHWs detected in the NW Atlantic. Fundamentally, we are running thousands of correlations between SST anomalies and the co-occurring anomalies for a range of physical variables. The stronger the correlation (whether positive or negative), the stronger the indication that the phenomena are related.

```{r startup}
# All of the libraries and objects used in the project
# Note that this also loads the data we will be using in this vignette
source("code/functions.R")
```

## Correlations

We know when the MHWs occurred, and our physical data are prepped, so the next step is to correlate the SST anomalies with the full suite of variables from the start to the peak, and from the peak to the end, of each event. This shows which variables correlated best with temperature during the onset AND decline of each event. We'll also run the correlations on the full time series.

```{r MHW-var-cor, eval=FALSE}
# Extract just the event info
GLORYS_MHW_event_index <- GLORYS_MHW_event %>% 
  select(event_no, region, season) %>% 
  ungroup() %>% 
  mutate(row_index = 1:n())

# Run all the stats
# NB: .parallel = TRUE only has an effect if a foreach backend has been registered
ALL_cor <- plyr::ddply(GLORYS_MHW_event_index, .parallel = TRUE,
                       .variables = c("row_index"), .fun = cor_all) %>% 
  left_join(GLORYS_MHW_event_index, by = "row_index") %>% 
  select(region, season, event_no, ts, everything()) %>%
  arrange(region, event_no) %>% 
  mutate(Parameter2 = factor(Parameter2))

# Save
saveRDS(ALL_cor, "data/ALL_cor.Rda")
saveRDS(ALL_cor, "shiny/ALL_cor.Rda")
```
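
To make the onset/decline/full split concrete, here is a minimal sketch of the idea for a single event and a single variable. It is NOT the `cor_all()` function from `code/functions.R` (which is not shown in this vignette); the helper `cor_phase()`, the synthetic series, and the choice to treat "full" as the whole event are all assumptions made for illustration. It uses base R only so that it runs without the project's packages.

```r
# Hypothetical sketch: this is NOT the project's cor_all() function
set.seed(13)

# Synthetic 30-day event: anomalies rise to a peak on day 12, then decay
n_days <- 30
peak_day <- 12
temp_anom <- c(seq(0.5, 3, length.out = peak_day),
               seq(3, 0.5, length.out = n_days - peak_day + 1)[-1]) +
  rnorm(n_days, sd = 0.1)

# An invented flux anomaly that loosely tracks the temperature anomaly
flux_anom <- 0.8 * temp_anom + rnorm(n_days, sd = 0.3)

# Correlate two series over a subset of days and return a one-row data frame
cor_phase <- function(x, y, idx, ts_label) {
  res <- cor.test(x[idx], y[idx])
  data.frame(ts = ts_label,
             r = unname(res$estimate),
             p = res$p.value,
             n_Obs = length(idx))
}

# Same-day (zero lag) correlations for each phase of the event
phase_cor <- rbind(
  cor_phase(temp_anom, flux_anom, 1:peak_day, "onset"),
  cor_phase(temp_anom, flux_anom, peak_day:n_days, "decline"),
  cor_phase(temp_anom, flux_anom, 1:n_days, "full")
)
phase_cor
```

Repeating this for every event and every variable would give a long table with one row per event, phase, and variable, which is the general shape of the results discussed next.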

Because we're only running correlations at the moment, everything runs pretty quickly. With the method sorted for now, we need to have a look at the results. What we have is a long dataframe containing the correlations of the different variables with the temperature anomaly. Note that these correlations are for the same day; no time lag has been introduced, which may be important. Below we are going to visualise the range of correlations for each variable to see how much each distribution is skewed. This skewness could probably be quantified in a meaningful way... but let's look at the data first.

We also want to filter by p-value to highlight the statistically significant correlations.
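
One possible way to quantify the skewness and apply the p-value filter is sketched below. The data frame `cor_sketch`, the variable names `var_a`/`var_b`, the fixed `n_Obs`, and the 0.05 threshold are all invented for illustration; only the column names `Parameter2`, `r`, and `n_Obs` mirror those used in this project. The p-values are derived from `r` and `n_Obs` with the standard t-test for a Pearson correlation.

```r
set.seed(42)

# Invented long table of correlations; var_a is deliberately skewed towards +1
cor_sketch <- data.frame(
  Parameter2 = rep(c("var_a", "var_b"), each = 1000),
  r = c(2 * rbeta(1000, 8, 2) - 1, runif(1000, -1, 1)),
  n_Obs = 10
)

# Two-sided p-value for a Pearson correlation with n observations
t_stat <- cor_sketch$r * sqrt((cor_sketch$n_Obs - 2) / (1 - cor_sketch$r^2))
cor_sketch$p <- 2 * pt(-abs(t_stat), df = cor_sketch$n_Obs - 2)

# Moment-based sample skewness: negative means a long tail towards -1
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Skewness of the r distribution for each variable
r_skew <- tapply(cor_sketch$r, cor_sketch$Parameter2, skewness)
r_skew

# Keep only the correlations that pass the (assumed) significance threshold
cor_sig <- cor_sketch[cor_sketch$p < 0.05, ]
table(cor_sig$Parameter2)
```

A strongly negative skewness here means most of the r values pile up near +1, so the sign of the skewness needs to be read together with the median r.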

```{r shiny-histo}
# source("shiny/app.R")
# Or it is live here:
# https://robert-schlegel.shinyapps.io/MHWflux/
```

Wow! What a surprise. There are some really clear patterns coming through in the data. In particular SSS seems to be strongly related to the onset of MHWs. There are a lot of nuances in these data and so I think this is actually an example of where a Shiny app is useful to interrogate the data.

The Shiny app also shows that the longer events tend not to correlate strongly with any single variable. This is to be expected and supports the argument that very persistent MHWs are sustained by a confluence of variables. How to parse that out is an interesting challenge.

## Regions + Seasons

With the correlations calculated for the onset, decline, and full extent of each MHW, we also want to know if any signals emerge from the regions and/or seasons in which these events occur. Is the relationship between SSS and MHW onset stronger in the winter? Stronger in certain regions? I'm thinking a linear model may be the way to go on this.
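
As a rough sketch of what such a linear model could look like, the example below fits `r ~ season + region` to invented data with a winter effect built in. The region names, the effect size, and the data frame `r_df` are hypothetical; nothing here comes from the real results.

```r
set.seed(7)

# Invented r values for 25 events in every season x region combination
seasons <- c("Spring", "Summer", "Autumn", "Winter")
regions <- c("region_a", "region_b", "region_c")
r_df <- expand.grid(event = 1:25, season = seasons, region = regions,
                    stringsAsFactors = TRUE)

# Build in a stronger correlation in winter, then clip to the valid range
r_df$r <- 0.2 + 0.4 * (r_df$season == "Winter") + rnorm(nrow(r_df), sd = 0.2)
r_df$r <- pmax(pmin(r_df$r, 1), -1)

# Additive model: does mean r differ by season and/or region?
fit <- lm(r ~ season + region, data = r_df)
season_region_aov <- anova(fit)
season_region_aov
coef(fit)
```

Because r is bounded between -1 and 1, it may be worth modelling a Fisher z-transform (`atanh(r)`) instead of r itself; that is a design choice I have not tested here.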

Having manually looked through the Shiny app it does look like there are some patterns. These will be written down in the table below.

## Relationships

With patterns pulled out by region and season, we want to see if there are any relationships between MHWs that show strong correlations at onset with one particular variable and strong correlations at decline with another. We will look for this within regions and seasons as well. For example, do MHWs that correlate well with an increase in SSS also correlate well with a decrease in long-wave radiation during the decline of the event? I'm not sure how best to go about this in a clean manner. I'll have to think of something clever.

One possibility could be to look at correlations between the r values themselves, and to do so by region + season. It may also work to just directly correlate different values with one another and see if any patterns come out by region and/or season.

```{r relationships}

```
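
A minimal sketch of the "correlate the r values themselves" idea is given below. The columns `r_onset_sss` and `r_decline_lw` are hypothetical stand-ins for the per-event onset correlation with SSS and the per-event decline correlation with long-wave radiation; the values, and the negative relationship between them, are invented.

```r
set.seed(21)

# Invented per-event r values: one onset correlation and one decline correlation
n_events <- 60
r_onset_sss <- runif(n_events, -1, 1)
r_decline_lw <- pmax(pmin(-0.6 * r_onset_sss + rnorm(n_events, sd = 0.3), 1), -1)
r_pairs <- data.frame(season = rep(c("Winter", "Summer"), each = n_events / 2),
                      r_onset_sss, r_decline_lw)

# Across all events: do strong onset correlations go with strong decline ones?
overall <- cor.test(r_pairs$r_onset_sss, r_pairs$r_decline_lw)
overall$estimate

# The same question asked within each season
by_season <- sapply(split(r_pairs, r_pairs$season),
                    function(df) cor(df$r_onset_sss, df$r_decline_lw))
by_season
```

The same `split()` could be done on region, or on region and season together, provided each group keeps enough events for the correlation to mean anything.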

Another thing to consider is whether events with a fast onset and slow decline (and vice versa) have different characteristics to more slowly evolving events. The same question could be asked of long vs. short events, and of those with high vs. low intensities. In order to begin this investigation we must join the MHW results to the correlation results. Yet another thing to look at is the number of days in each section of the MHW and how that relates to the correlations of that event with the variables.

```{r metrics_cor_cor, eval=FALSE}
ALL_cor_wide <- readRDS("data/ALL_cor.Rda") %>% 
  ungroup() %>% 
  filter(Parameter1 == "temp") %>% 
  dplyr::select(region:ts, Parameter2, r, n_Obs) %>% 
  pivot_wider(values_from = r, names_from = Parameter2)

# Combine MHW metrics and correlation results
events_cor_prep <- GLORYS_MHW_event %>% 
  dplyr::select(region, season, event_no, duration, intensity_mean, intensity_max, 
                intensity_cumulative, rate_onset, rate_decline) %>% 
  left_join(ALL_cor_wide, by = c("region", "season", "event_no")) %>% 
  dplyr::select(-event_no)

# All correlations by region
events_cor_region <- events_cor_prep %>% 
  group_by(region, ts) %>% 
  correlation(redundant = TRUE) %>% 
  mutate_if(is.numeric, round, 4) %>% 
  filter(Parameter1 %in% c("duration", "intensity_mean", "intensity_max", 
                           "intensity_cumulative", "rate_onset", "rate_decline"),
         !Parameter2 %in% c("duration", "intensity_mean", "intensity_max",
                           "intensity_cumulative", "rate_onset", "rate_decline"))
saveRDS(events_cor_region, "data/events_cor_region.Rds")

# All correlations by season
events_cor_season <- events_cor_prep %>% 
  group_by(season, ts) %>% 
  correlation(redundant = TRUE) %>% 
  mutate_if(is.numeric, round, 4) %>% 
  filter(Parameter1 %in% c("duration", "intensity_mean", "intensity_max", 
                           "intensity_cumulative", "rate_onset", "rate_decline"),
         !Parameter2 %in% c("duration", "intensity_mean", "intensity_max",
                           "intensity_cumulative", "rate_onset", "rate_decline"))
saveRDS(events_cor_season, "data/events_cor_season.Rds")

# All correlations by region+season
# Some groupings don't have enough observations
# This throws an error but it still runs
# events_cor_region_season <- events_cor_prep %>%
#   group_by(region, season, ts) %>%
#   correlation(redundant = TRUE) %>%
#   mutate_if(is.numeric, round, 4) %>%
#   filter(Parameter1 %in% c("duration", "intensity_mean", "intensity_max",
#                            "intensity_cumulative", "rate_onset", "rate_decline"),
#          !Parameter2 %in% c("duration", "intensity_mean", "intensity_max",
#                            "intensity_cumulative", "rate_onset", "rate_decline"))
# saveRDS(events_cor_region_season, "data/events_cor_region_season.Rds")

# test visuals
events_cor_prep %>% 
  filter(region == "nfs", ts == "onset") %>% 
  ggplot(aes(x = intensity_mean, y = msshf_mld)) +
  geom_smooth(method = "lm", se = FALSE, colour = "black") +
  geom_point(aes(colour = season))
```

These correlations between the MHW metrics and the correlations from the previous round of calculations do show some interesting results, but are a bit difficult to interpret. I think it would also be useful to create summary stats of the r values. These are visible in the Shiny app but are not currently saved in this project.

```{r}

```
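
Something along these lines could produce the summary stats of the r values per variable and phase. The data frame `cor_demo` and its contents are invented; only the column names `ts`, `Parameter2`, and `r` follow the correlation table used above.

```r
set.seed(3)

# Invented long table of r values: two variables x three phases
cor_demo <- data.frame(
  ts = rep(c("onset", "full", "decline"), times = 100),
  Parameter2 = rep(c("var_a", "var_b"), each = 150),
  r = runif(300, -1, 1)
)

# Several summary stats per variable and phase in one pass
r_summary <- aggregate(
  r ~ Parameter2 + ts, data = cor_demo,
  FUN = function(x) c(n = length(x), mean = mean(x),
                      median = median(x), sd = sd(x))
)

# aggregate() returns the stats as a matrix column; flatten it to plain columns
r_summary <- do.call(data.frame, r_summary)
r_summary
```

The result could then be written out with `saveRDS()` alongside the other correlation objects so that it no longer lives only in the Shiny app.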

Another thought now was to show the summary stats of the r results by MHW metric.

```{r}

```
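
One way to summarise r by an MHW metric is to bin the metric and summarise within the bins. The sketch below does this for duration; the class breaks, the data frame `events_demo`, and the built-in weakening of r with duration are all assumptions for illustration only.

```r
set.seed(11)

# Invented events: duration in days and one r value per event
events_demo <- data.frame(duration = sample(5:120, 200, replace = TRUE))
events_demo$r <- 0.8 - 0.005 * events_demo$duration + rnorm(200, sd = 0.15)
events_demo$r <- pmax(pmin(events_demo$r, 1), -1)

# Arbitrary duration classes (the breaks are an assumption, not a project choice)
events_demo$duration_class <- cut(events_demo$duration,
                                  breaks = c(4, 14, 30, 60, 120),
                                  labels = c("5-14", "15-30", "31-60", "61-120"))

# Median r within each duration class
r_by_class <- aggregate(r ~ duration_class, data = events_demo, FUN = median)
r_by_class
```

The same pattern would apply to the other metrics (e.g. `intensity_max` or `rate_onset`) by swapping the column that is passed to `cut()`.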

Here is a heat map of important variables ordered by MHW length.

```{r}

```
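
Until the real figure is in place, this is a rough sketch of how such a heat map could be assembled: one row per event, one column per variable, with events sorted by duration. The matrix `r_mat`, the variable names, and the colour scale are invented, and base graphics are used here only to keep the sketch free of package dependencies.

```r
set.seed(5)

# Invented r values: one row per event, one column per variable
n_events <- 40
vars <- c("var_a", "var_b", "var_c")
duration <- sample(5:100, n_events, replace = TRUE)
r_mat <- matrix(runif(n_events * length(vars), -1, 1),
                nrow = n_events, dimnames = list(NULL, vars))

# Reorder the events from shortest to longest
ord <- order(duration)
r_ordered <- r_mat[ord, , drop = FALSE]

# Diverging colour scale centred on r = 0
pal <- colorRampPalette(c("blue", "white", "red"))(21)
image(x = seq_len(n_events), y = seq_along(vars), z = r_ordered,
      zlim = c(-1, 1), col = pal, yaxt = "n", ylab = "",
      xlab = "Events ordered by duration (shortest to longest)")
axis(2, at = seq_along(vars), labels = vars, las = 1)
```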


I'm starting to think I'm going to need to cluster these results in order to make the results more manageable.
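
As a first idea of what that clustering might look like, the sketch below runs k-means on a wide table of per-event r values. The two invented "types" of event, the variable names, and the choice of two clusters are assumptions; k-means is just one candidate method and has not been chosen for this project.

```r
set.seed(99)

# Invented wide table: per-event r values for two variables, two event "types"
type1 <- cbind(var_a = rnorm(30, 0.7, 0.1), var_b = rnorm(30, -0.5, 0.1))
type2 <- cbind(var_a = rnorm(30, -0.2, 0.1), var_b = rnorm(30, 0.6, 0.1))
r_wide <- rbind(type1, type2)

# k-means with several random starts; the number of centres is an assumption
km <- kmeans(r_wide, centers = 2, nstart = 10)

# Compare the recovered clusters against the types that were built in
table(cluster = km$cluster, truth = rep(c("type1", "type2"), each = 30))
km$centers
```

With real data the number of clusters would need to be chosen rather than assumed, and events with missing r values would have to be dropped or imputed first because `kmeans()` does not accept `NA`.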

## Results

In the following table a more concise summary of the results is presented.

```{r, echo=FALSE, message=FALSE}
# NB: This table was created manually by going through the Shiny app one variable at a time.
res_table <- read_csv("data/res_table.csv")
knitr::kable(res_table, caption = "A summary of most of the variables that have been correlated against the temperature anomalies during the onset, decline, and full duration of MHWs. The cumulative heat flux terms were scaled by the daily MLD (Q/(rho x Cp x hmld)) before the correlations were calculated. Correlations were also run on the cumulative flux terms without scaling by MLD, but there was little difference so the results are not itemised here. This table shows the full names of the variables, as well as the abbreviations used in the code. The 'onset' column describes (in shorthand) the tendency of the correlations during the onset of events. This is repeated for the 'full' and 'decline' columns respectively. The 'season' column briefly states the clearest/most noteworthy pattern(s) when the correlations are divided up by season. The same is done in the 'region' column. The last column, 'story', gives a TRUE/FALSE for whether I think the variable has a story to tell, i.e. something worth pursuing further, particularly to see if the variable relates strongly to other variables, not just temperature. This could then provide a framework for determining 'types' of MHWs (e.g. strong SSS change with strong latent heat flux).")
```
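
For reference, the MLD scaling mentioned in the table caption, Q/(rho x Cp x hmld), converts a heat flux into a rate of temperature change for the mixed layer. A minimal worked sketch follows; the density and specific heat constants are typical textbook values that I am assuming here, not necessarily the ones used in `code/functions.R`, and the function name `flux_to_temp_rate()` is hypothetical.

```r
# Assumed seawater constants (not taken from the project code)
rho_sw <- 1025  # density, kg m-3
cp_sw <- 4000   # specific heat capacity, J kg-1 K-1

# Hypothetical helper: heat flux (W m-2) and MLD (m) to warming rate (K s-1)
flux_to_temp_rate <- function(q, h_mld, rho = rho_sw, cp = cp_sw) {
  q / (rho * cp * h_mld)
}

# Three invented days: the same flux warms a shallow mixed layer faster
q <- c(100, 100, -50)  # daily mean heat flux, W m-2
mld <- c(20, 50, 20)   # daily mixed layer depth, m

# Convert to K per day, then accumulate over the days
rate_per_day <- flux_to_temp_rate(q, mld) * 86400
temp_change_cum <- cumsum(rate_per_day)
rate_per_day
temp_change_cum
```

The point of the scaling is visible in the first two days: the flux is identical, but the implied warming is 2.5 times larger when the mixed layer is 20 m deep rather than 50 m.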

With a table organised by each variable, it makes sense to also create a table organised by season, and another by region.

```{r}

```

## Notes

The deepening of the MLD with MHW onset may be due to wind mixing into the deeper, warmer water during winter.

Look into the relationship between MSLP and MHW onset. It is odd.

Look into the relationship between the decrease in SSS during the decline and the decrease in latent heat flux/evaporation.

### NWA 2012

From Chen et al. 2016 (JGR):

Such an extreme event in the MAB was attributed to the anomalous atmospheric forcing, which was linked to the northward shift in the jet stream position [Chen et al., 2014a, 2015]. The anomalously warm atmospheric conditions in the winter of 2011–2012 increased the ocean heat content (increased the ocean heat content anomaly) and facilitated the extreme warm ocean temperature in spring 2012 [Chen et al., 2014a, 2015]. On the other hand, the ocean advection played a secondary role, which partially damped the heat content anomaly created by the air-sea heat flux [Chen et al., 2015].

In both cases, initial temperature and ocean advection are not sufficient to describe the seasonal mean temperature. Additional cooling (warming) in addition to ocean advection is needed to further describe the winter (spring) temperature. In comparison, using the sum of the initial temperature and air-sea flux yields a much better description of seasonal mean temperatures (Figures 5c and 5f).

While the overall role of ocean advection is smaller than that of air-sea flux in determining the winter and spring temperatures, the year-to-year changes in the relative importance is worth investigating.

Normally, given anomalous initial temperature, air will act to damp the temperature anomaly, as in winter 2007 or 2011, or even 2005 to some extent. However, in winter 2012, the air continued to increase the temperature anomaly.

Out of the 12 years 2003–2014, the air-sea flux normally dominated the temperature anomaly in the MAB during winter. In only 3 years was the winter time temperature anomaly primarily controlled by ocean advection.

For spring, ocean advection has more control on the temperature anomalies than air-sea flux does, although the difference is smaller (Table 2). In both seasons, the relative importance of air-sea flux and ocean advection does not seem to be related to either the initial or seasonal mean thermal condition of the shelf water (fourth and fifth columns of Tables 1 and 2).

The correlation coefficients increase from 0.66 in the first half of February to 0.91 in the second half of March. This suggests that estimation of spring temperature anomaly in the MAB based on the thermal condition 2 months before spring is statistically possible.

This suggests that more northerly jet stream positions result in larger heat flux from the atmosphere into the ocean in the MAB. This is likely due to warmer and more humid air overlying the continental shelf, which reduces the heat loss from the ocean during the cooling seasons [Chen et al., 2014a].

In spring and summer, the air-sea flux may be less correlated with the air temperature due to the shallowness of the surface mixed layer, and thus may be disconnected from large-scale atmospheric circulation, i.e., jet stream variability.

## References