-
Notifications
You must be signed in to change notification settings - Fork 1
/
mhw-flux.Rmd
837 lines (719 loc) · 40.5 KB
/
mhw-flux.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
---
title: "MHWs vs. heat flux"
author: "Robert Schlegel"
date: "2020-02-25"
output: workflowr::wflow_html
editor_options:
chunk_output_type: console
csl: FMars.csl
bibliography: MHWflux.bib
---
```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.align = 'center',
echo = TRUE, warning = FALSE, message = FALSE,
eval = TRUE, tidy = FALSE)
```
## Introduction
This vignette will walk through the thinking and the process for how to link physical variables to their potential effect on driving or dissipating MHWs. The primary source that inspired this work was @Chen2016. In this paper the authors were able to illustrate which parts of the heat budget were most likely driving the anomalous heat content in the surface of the ocean. What this analysis seeks to do is to build on this methodology by applying the fundamental concept to ALL of the MHWs detected in the NW Atlantic. Fundamentally we are running thousands of RMSE and correlation calculations between SST anomalies and the co-occurring anomalies for a range of physical variables. The lower the RMSE and the stronger the correlation (both positive and negative) the more of an indication this is to us that these phenomena are related. We will also be comparing the magnitude of change for Qx terms and SSTa during the onset and decline of MHWs to see how much of these events can be attributed to heat flux.
```{r startup}
# All of the libraries and objects used in the project
# Note that this also loads the data we will be using in this vignette
source("code/functions.R")
```
## Relating SSTa to physical variables
We know when MHWs occurred, and our physical data are prepped, so what we need to do is run magnitude comparisons, RMSE, and correlations between SSTa from the start to peak and peak to end of each event for the full suite of variables. This will show us for each event which values related the best for the onset AND decline of the events. We will run RMSE and correlations on the full time series, but these were decided not to be included in the final manuscript as they don't add much other than to show how much better the calculations for onset or decline are.
The magnitude of change in SSTa and SST_Qx for the onset and decline of MHWs will be calculated by finding how much SSTa and SST_Qx increase from the onset to the peak of a MHW, and how much SSTa and SST_Qx decrease on the decline of the same event. The thinking here is that if SSTa exceeds the SST_Qx increase for the onset, this will indicate that some advective process is aiding the development of the SSTa, but if SST_Qx exceeds SSTa, this indicates that the heat flux term is responsible for the creation of the event. Likewise for the decline of events, if the SST_Qx value decreases more than SSTa this would mean that negative heat flux into the ocean was terminating the event, whereas a greater decrease in SSTa would imply that advection was primarily terminating the event.
Running correlations is useful to see how well certain variables change in relation to SSTa, but these do not show how similar those proportions of change are. In order to do this we need to calculate the RMSE between the SST anomalies and the SST_Qx terms. This may only be done with the SST_Qx terms because they are in the same units as SST. One additional tweak we will need here is to add the cumulative Qx term for the following day to the SSTa from the first day of the MHW. This is done so that we may see more closely how the developing Qx term matches to the development of the SSTa. Effectively this tells us how much the Qx term may be driving SSTa.
```{r MHW-var-cor, eval=FALSE}
# Extract just the event info
OISST_MHW_event_index <- OISST_MHW_event %>%
select(event_no, region, season) %>%
ungroup() %>%
mutate(row_index = 1:n())
# Run all the stats
ALL_cor <- plyr::ddply(OISST_MHW_event_index, .parallel = T,
.variables = c("row_index"), .fun = stats_all,
df_event = OISST_MHW_event) %>%
left_join(OISST_MHW_event_index, by = "row_index") %>%
select(region, season, event_no, ts, everything()) %>%
arrange(region, event_no) %>%
mutate(Parameter2 = factor(Parameter2))
# Save
saveRDS(ALL_cor, "data/ALL_cor.Rda")
saveRDS(ALL_cor, "shiny/ALL_cor.Rda")
```
What we have now is a long dataframe containing the magnitude of change, RMSE, and correlations of different variables with SSTa. It must be pointed out that the correlations for the non-Qx terms are for the same day, there is no time lag introduced, which may be important. Below we are going to visualise the range of correlations for each variable to see how much each distribution is skewed. This skewness could probably be quantified in a meaningful way... but let's look at the data first.
```{r shiny-histo}
# source("shiny/app.R")
# Or it is live here:
# https://robert-schlegel.shinyapps.io/MHWflux/
```
There are some really clear patterns coming through in the data. In particular SSS seems to be strongly related to the onset of MHWs. There are a lot of nuances in these data and so I think this is actually an example of where a Shiny app is useful to interrogate the data.
In the shiny app it also comes out that the longer events tend not to correlate strongly with a single variable. This is to be expected and supports the argument that very persistent MHWs are supported by a confluence of variables. How to parse that out is an interesting challenge.
## Results
### 12 hour offset
While going through the data it was discovered that the OISST temperature values represent an average from midnight to midnight. This therefore puts them off set to the Qx terms, which were also being calculated over the same time period. But to compare the Qx terms to SST there should be a 12 hour offset. So 12 hours were added to the terms before re-running the pipeline and recalculating all of the correlations and RMSE. In the code chunk below we compare the stats pre-12 hour shift and afterwards.
```{r shift-compare}
# Load old and new data
ALL_cor_old <- readRDS("data/ALL_cor_old.Rda") %>%
mutate(region = toupper(region)) %>%
mutate(run = "old")
ALL_cor <- readRDS("data/ALL_cor.Rda") %>%
dplyr::select(region:n_Obs, rmse) %>%
mutate(run = "new")
ALL_cor_stack <- rbind(ALL_cor, ALL_cor_old) %>%
filter(rmse > 0) %>%
mutate(Parameter2 = as.character(Parameter2),
Parameter2 = case_when(Parameter2 == "qnet_budget" ~ "qnet",
Parameter2 == "lhf_budget" ~ "lhf",
Parameter2 == "shf_budget" ~ "shf",
Parameter2 == "lwr_budget" ~ "lwr",
Parameter2 == "swr_budget" ~ "swr",
Parameter2 == "qnet_mld_cum" ~ "qnet",
Parameter2 == "lhf_mld_cum" ~ "lhf",
Parameter2 == "shf_mld_cum" ~ "shf",
Parameter2 == "lwr_mld_cum" ~ "lwr",
Parameter2 == "swr_mld_cum" ~ "swr",
TRUE ~ Parameter2))
# Compare RMSE quantiles
ALL_cor_stack %>%
filter(rmse > 0) %>%
group_by(run, region, ts, Parameter2) %>%
summarise(q10 = quantile(rmse, 0.10),
q50 = quantile(rmse, 0.50),
q90 = quantile(rmse, 0.90)) %>%
pivot_wider(names_from = run, values_from = q10:q90)
# Plot difference
ALL_cor_stack %>%
filter(rmse > 0) %>%
ggplot(aes(x = ts, y = rmse)) +
geom_boxplot(aes(fill = run)) +
facet_grid(Parameter2~region) #+
# scale_y_continuous(limits = c(0, 2))
```
The new an improved method that correctly integrates Qx into SSTa creates much lower RMSE values for almost every variable and region. A smashing success.
### Magnitude of change summary
In this sub-section we will look at how the different magnitudes of change for onset and decline of the SST_Qx terms compare to the same changes in SSTa.
```{r mag-summary}
# Load magnitude data
ALL_mag <- readRDS("data/ALL_cor.Rda") %>%
dplyr::select(region:ts, Parameter2, n_Obs, mag) %>%
na.omit()
# Calculate the proportions of change in magnitudes
ALL_mag_prop <- ALL_mag %>%
filter(Parameter2 != "sst") %>%
left_join(filter(ALL_mag, Parameter2 == "sst"),
by = c("region", "season", "event_no", "ts", "n_Obs")) %>%
filter(ts != "full") %>%
dplyr::select(-Parameter2.y) %>%
dplyr::rename(var = Parameter2.x, mag_Qx = mag.x, mag_SSTa = mag.y) %>%
mutate(prop = mag_Qx/mag_SSTa) %>%
mutate(var = as.character(var),
var = case_when(var == "lwr_budget" ~ "Qlw",
var == "swr_budget" ~ "Qsw",
var == "lhf_budget" ~ "Qlh",
var == "shf_budget" ~ "Qsh",
var == "qnet_budget" ~ "Qnet",
TRUE ~ var),
var = factor(var, levels = c("Qnet", "Qlh", "Qsh", "Qlw", "Qsw")),
region = toupper(region))
# Find an event with good onset and decline match
ALL_mag_prop %>% filter(var == "Qnet", prop >= 1, prop <= 2)
# Create schematic figure showing how magnitudes are calculated
p1 <- fig_mag_func(2, "GSL") +
ggtitle("Good onset, bad decline")
p2 <- fig_mag_func(44, "GM") +
ggtitle("Bad onset, good decline")
p3 <- fig_mag_func(32, "GSL") +
ggtitle("Good onset, good decline")
mag_schematic <- ggpubr::ggarrange(p1, p2, p3, ncol = 3, nrow = 1, common.legend = TRUE)
# mag_schematic
# Scatterplot of SSTa vs SST_Qx (facets per variable)
mag_scatter <- ALL_mag_prop %>%
ggplot(aes(x = mag_SSTa, y = mag_Qx)) +
geom_point(aes(colour = n_Obs, shape = ts)) +
geom_smooth(aes(linetype = ts), method = "lm", colour = "black") +
scale_colour_gradientn(colours = rainbow(10)) +
facet_wrap(~var, nrow = 1) +
labs(x = "ΔSSTa", y = "ΔSST_Qx", shape = "Time series", linetype = "Time series") +
theme(legend.position = "top")
# mag_scatter
# Boxplots of proportions of change for SST_Qx for onset/decline
mag_box <- ALL_mag_prop %>%
ggplot(aes(x = ts, y = prop)) +
geom_hline(aes(yintercept = 0), colour = "red") +
geom_boxplot(aes(fill = season), outlier.shape = NA) +
# geom_boxplot(aes(fill = region), outlier.shape = NA) +
scale_y_continuous(limits = c(-2, 2)) +
facet_wrap(~var, nrow = 1) +
labs(x = NULL, y = "ΔSST_Qx / ΔSSTa")
# mag_box
# Combine for one massive display
mag_full <- ggpubr::ggarrange(mag_schematic, mag_scatter, mag_box, ncol = 1, nrow = 3, labels = c("A)", "B)", "C)"))
ggsave("output/magnitude_schematic.png", height = 8, width = 10)
ggsave("output/magnitude_schematic.pdf", height = 8, width = 10)
mag_full
```
The first thing we want to look at with these magnitude scores is how many MHWs appear to be driven or decayed by Qnet. We determine this if the proportion of change in Qnet is more than half that of the change in SSTa. But also, because the Qx variables can be greater than the sum (Qnet), we will see when individual variables are contributing more than half of the change in SSTa, too.
```{r mag-prop}
# Create TRUE/FALSE table when prop is over 0.5 for any variable
ALL_mag_TF <- ALL_mag_prop %>%
mutate(prop_TF = ifelse(prop > 0.5, TRUE, FALSE)) %>%
dplyr::select(region:n_Obs, prop_TF) %>%
pivot_wider(names_from = var, values_from = prop_TF)
# Total proportion of MHWs forced by Qnet
sum(ALL_mag_TF$Qnet)/nrow(ALL_mag_TF)
# Total number of events driven by Qnet
nrow(filter(ALL_mag_TF, ts == "onset", Qnet == TRUE))
# Total number of events decayed by Qnet
nrow(filter(ALL_mag_TF, ts == "decline", Qnet == TRUE))
# Proportion of MHWs with Qnet inset and decline
ALL_mag_TF %>%
dplyr::select(region:ts, Qnet) %>%
pivot_wider(names_from = ts, values_from = Qnet) %>%
filter(onset == TRUE, decline == TRUE)
# Counts by time series phase
ALL_mag_count_ts <- ALL_mag_prop %>%
group_by(ts, var) %>%
mutate(ts_count = n(),
group = "Total") %>%
filter(prop > 0.5) %>%
mutate(filter_count = n(),
filter_prop = round(filter_count/ts_count, 2)) %>%
dplyr::select(group, ts, var, filter_prop) %>%
distinct() %>%
mutate(var = factor(var, levels = c("Qnet", "Qlh", "Qsh", "Qlw", "Qsw"))) %>%
pivot_wider(names_from = var, values_from = filter_prop) %>%
dplyr::select(group, ts, Qnet, Qlh, Qsh, Qlw, Qsw)
# knitr::kable(ALL_mag_count_ts)
# Counts by time region phase
ALL_mag_count_region <- ALL_mag_prop %>%
group_by(region, ts, var) %>%
mutate(ts_count = n()) %>%
filter(prop > 0.5) %>%
mutate(filter_count = n(),
filter_prop = round(filter_count/ts_count, 2)) %>%
dplyr::select(region, ts, var, filter_prop) %>%
distinct() %>%
pivot_wider(names_from = var, values_from = filter_prop) %>%
dplyr::select(region, ts, Qnet, Qlh, Qsh, Qlw, Qsw) %>%
arrange(region, ts) %>%
dplyr::rename(group = region)
# knitr::kable(ALL_mag_count_region)
# Counts by time series phase
ALL_mag_count_season <- ALL_mag_prop %>%
group_by(season, ts, var) %>%
mutate(ts_count = n()) %>%
filter(prop > 0.5) %>%
mutate(filter_count = n(),
filter_prop = round(filter_count/ts_count, 2)) %>%
dplyr::select(season, ts, var, filter_prop) %>%
distinct() %>%
pivot_wider(names_from = var, values_from = filter_prop) %>%
dplyr::select(season, ts, Qnet, Qlh, Qsh, Qlw, Qsw) %>%
arrange(season, ts) %>%
dplyr::rename(group = season)
# knitr::kable(ALL_mag_count_season)
ALL_mag_count_ts_region_season <- rbind(ALL_mag_count_ts, ALL_mag_count_region, ALL_mag_count_season) %>%
dplyr::select(group:Qnet) %>%
pivot_wider(names_from = ts, values_from = Qnet) %>%
dplyr::select(group, onset, decline) %>%
mutate(onset = paste0(onset*100,"%"),
decline = paste0(decline*100,"%"))
knitr::kable(ALL_mag_count_ts_region_season)
```
We also want a table showing the count of best magnitude match per variable per region/season.
```{r mag-table}
# Get the top results other than Qnet
ALL_mag_top <- ALL_mag_prop %>%
filter(var != "Qnet") %>% # Remove Qnet
group_by(region, season, event_no, ts) %>%
filter(prop == max(prop))
# The top count by region
ALL_mag_region <- ALL_mag_top %>%
group_by(region, ts, var) %>%
summarise(count = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count) %>%
mutate(Qlw = replace_na(Qlw, 0)) %>%
left_join(mag_region_count, by = c("region", "ts")) %>%
dplyr::select(region, ts, count, Qlh:Qsw) %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/count)*100),"%)")) %>%
dplyr::rename(group = region)
# knitr::kable(ALL_mag_region)
# The top count by season
ALL_mag_season <- ALL_mag_top %>%
group_by(season, ts, var) %>%
summarise(count = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count) %>%
mutate(Qsh = replace_na(Qsh, 0)) %>%
left_join(mag_season_count, by = c("season", "ts")) %>%
dplyr::select(season, ts, count, Qlh:Qsw) %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/count)*100),"%)")) %>%
dplyr::rename(group = season)
# knitr::kable(ALL_mag_season)
# Print table
ALL_mag_region_season <- rbind(ALL_mag_region, ALL_mag_season)
knitr::kable(ALL_mag_region_season)
```
These tables help to reiterate how much more important Qsw is for the onset and decline of MHWs over the other variables, and that Qlh often comes in second. For the seasons though we see a much more complex relationship that I won't go into here.
### RMSE summary
Here we will go about summarising the RMSE results. We will also create some boxplots to help with the process.
```{r RMSE-boxplot}
# The base RMSE results
ALL_RMSE <- ALL_cor %>%
filter(rmse > 0 ) %>%
dplyr::rename(var = Parameter2) %>%
dplyr::select(region:ts, var, n_Obs, rmse) %>%
mutate(var = as.character(var),
var = case_when(var == "lwr_budget" ~ "Qlw",
var == "swr_budget" ~ "Qsw",
var == "lhf_budget" ~ "Qlh",
var == "shf_budget" ~ "Qsh",
var == "qnet_budget" ~ "Qnet",
TRUE ~ var),
var = factor(var, levels = c("Qnet", "Qlh", "Qsh", "Qlw", "Qsw")),
region = toupper(region)) %>%
# filter out events where Qnet is not a primary drivers
left_join(ALL_mag_TF, by = c("region", "season", "event_no", "ts", "n_Obs")) %>%
filter(Qnet == TRUE)
# Summaries by region
ALL_RMSE_region <- ALL_RMSE %>%
dplyr::select(-season, -event_no, -n_Obs) %>%
group_by(region, ts, var) %>%
summarise(rmse_min = min(rmse),
rmse_mean = mean(rmse),
rmse_max = max(rmse), .groups = "drop")
# Boxplot of region summaries
ALL_RMSE %>%
dplyr::select(-season, -event_no, -n_Obs) %>%
ggplot(aes(x = var, y = rmse)) +
geom_boxplot(aes(fill = ts)) +
facet_wrap(~region) #+
# theme(axis.text.x = element_text(angle = 45))
# Summaries by season
ALL_RMSE_season <- ALL_RMSE %>%
dplyr::select(-region, -event_no, -n_Obs) %>%
group_by(season, ts, var) %>%
summarise(rmse_min = min(rmse),
rmse_mean = mean(rmse),
rmse_max = max(rmse), .groups = "drop")
# Boxplot of season results
ALL_RMSE %>%
dplyr::select(-region, -event_no, -n_Obs) %>%
ggplot(aes(x = var, y = rmse)) +
geom_boxplot(aes(fill = ts)) +
facet_wrap(~season)
# Boxplot by variable
ALL_RMSE %>%
filter(var != "Qnet") %>%
ggplot(aes(x = region, y = rmse)) +
geom_boxplot(aes(fill = ts)) +
facet_wrap(~var)
ALL_RMSE %>%
filter(var != "Qnet") %>%
ggplot(aes(x = season, y = rmse)) +
geom_boxplot(aes(fill = ts)) +
facet_wrap(~var)
```
These boxplots help to tell a high level story about the importance of the Qx terms with SSTa, but I want a slightly more fine-grain level of results. To do this I am going to count which of the Qx terms has the lowest RMSE per MHW. These will then be grouped by region and season to provide an understanding for how this may vary.
```{r RMSE-table}
# Get the top results other than Qnet
ALL_RMSE_top <- ALL_RMSE %>%
filter(var != "Qnet") %>% # Remove Qnet
group_by(region, season, event_no, ts) %>%
filter(rmse == min(rmse))
# The top count overall
ALL_RMSE_ts <- ALL_RMSE_top %>%
group_by(ts) %>%
mutate(total_count = n(),
group = "Total") %>%
group_by(group, ts, total_count, var) %>%
summarise(count = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(group, ts, total_count, Qlh:Qsw) %>%
filter(ts != "full") %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/total_count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/total_count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/total_count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/total_count)*100),"%)"))
# The top count by region
ALL_RMSE_region <- ALL_RMSE_top %>%
group_by(region, ts) %>%
mutate(total_count = n()) %>%
group_by(region, ts, total_count, var) %>%
summarise(count = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(region, ts, total_count, Qlh:Qlw) %>%
filter(ts != "full") %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/total_count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/total_count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/total_count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/total_count)*100),"%)")) %>%
dplyr::rename(group = region)
# knitr::kable(ALL_RMSE_region)
# The top count by season
ALL_RMSE_season <- ALL_RMSE_top %>%
group_by(season, ts) %>%
mutate(total_count = n()) %>%
group_by(season, ts, total_count, var) %>%
summarise(count = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(season, ts, total_count, Qlh:Qlw) %>%
filter(ts != "full") %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/total_count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/total_count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/total_count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/total_count)*100),"%)")) %>%
dplyr::rename(group = season)
# knitr::kable(ALL_RMSE_season)
# Print table
ALL_RMSE_ts_region_season <- rbind(ALL_RMSE_ts, ALL_RMSE_region, ALL_RMSE_season) %>%
dplyr::rename(count = total_count)
knitr::kable(ALL_RMSE_ts_region_season)
```
I'm pretty happy with how these results are displayed. But I think it would be better to show the regions and seasons in separate tables just to keep them more manageable. In the last chunk in this sub-section we will grab some specific summary stats that are used in the publication.
```{r RMSE-stats}
# Short event best fits
ALL_RMSE_short <- ALL_RMSE %>%
filter(n_Obs <= 14, var != "Qnet") %>%
group_by(region, season, event_no, ts) %>%
filter(rmse == min(rmse))
# Short event index
ALL_RMSE_long <- ALL_RMSE %>%
filter(n_Obs > 14, var != "Qnet") %>%
group_by(region, season, event_no, ts) %>%
filter(rmse == min(rmse))
# The top Qx for events of 14 days or less by region
ALL_RMSE_short_region <- ALL_RMSE_short %>%
group_by(region, ts) %>%
mutate(count = n()) %>%
group_by(region, ts, count, var) %>%
summarise(count_Q = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count_Q) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(region, ts, count, Qlh:Qlw) %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/count)*100),"%)"))
knitr::kable(ALL_RMSE_short_region)
# The top Qx for events of more than 14 days by region
ALL_RMSE_long_region <- ALL_RMSE_long %>%
group_by(region, ts) %>%
mutate(count = n()) %>%
group_by(region, ts, count, var) %>%
summarise(count_Q = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count_Q) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(region, ts, count, Qlh:Qlw) %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/count)*100),"%)"))
knitr::kable(ALL_RMSE_long_region)
# The top Qx for events of 14 days or less by season
ALL_RMSE_short_season <- ALL_RMSE_short %>%
group_by(season, ts) %>%
mutate(count = n()) %>%
group_by(season, ts, count, var) %>%
summarise(count_Q = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count_Q) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(season, ts, count, Qlh:Qlw) %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/count)*100),"%)"))
knitr::kable(ALL_RMSE_short_season)
# The top Qx for events of more than 14 days by season
ALL_RMSE_long_season <- ALL_RMSE_long %>%
group_by(season, ts) %>%
mutate(count = n()) %>%
group_by(season, ts, count, var) %>%
summarise(count_Q = n(), .groups = "drop") %>%
pivot_wider(names_from = var, values_from = count_Q) %>%
replace_na(list(Qlh = 0, Qsh = 0, Qlw = 0, Qsw = 0)) %>%
dplyr::select(season, ts, count, Qlh:Qlw) %>%
mutate(Qlh = paste0(Qlh, " (",round((Qlh/count)*100),"%)"),
Qsh = paste0(Qsh, " (",round((Qsh/count)*100),"%)"),
Qlw = paste0(Qlw, " (",round((Qlw/count)*100),"%)"),
Qsw = paste0(Qsw, " (",round((Qsw/count)*100),"%)"))
knitr::kable(ALL_RMSE_long_season)
```
We also want to see how RMSE scores change the shorter the time series portion of the MHW is.
```{r RMSE-short-change}
# Faceted scatter plot showing how RMSE changes based on ts length
ggplot(ALL_RMSE, aes(x = n_Obs, y = rmse)) +
geom_point(aes(colour = ts)) +
geom_smooth(aes(colour = ts), method = "lm") +
facet_wrap(~region)
# Faceted scatter plot showing how RMSE changes based on ts length
ggplot(ALL_RMSE, aes(x = n_Obs, y = rmse)) +
geom_point(aes(colour = ts)) +
geom_smooth(aes(colour = ts), method = "lm") +
facet_wrap(~season)
```
### Correlations: summary, regions, seasons
With the correlations calculated for the onset, decline, and full extent of each MHW, we want to know if any signals emerge from the regions and/or seasons of occurrence of these events. Is the relationship between SSS and MHW onset stronger in the winter? Stronger in certain region? Having manually looked through the Shiny app it does look like there are some patterns. Below is the code used to determine the stats referred to in the paper.
```{r}
# Overall correlation results
cor_summary_all <- ALL_cor %>%
filter(Parameter1 == "sst",
Parameter2 != "sst") %>%
group_by(ts, Parameter2) %>%
summarise(q10_r = quantile(r, 0.1),
mean_r = mean(r, na.rm = T),
q90_r = quantile(r, 0.9),
mean_rmse = mean(rmse, na.rm = T))
# Overall positive only results
cor_positive_all <- ALL_cor %>%
filter(Parameter1 == "sst",
Parameter2 != "sst",
r > 0) %>%
group_by(ts, Parameter2) %>%
summarise(q10_r = quantile(r, 0.1),
mean_r = mean(r, na.rm = T),
q90_r = quantile(r, 0.9),
mean_rmse = mean(rmse, na.rm = T))
# Overall negative only results
cor_negative_all <- ALL_cor %>%
filter(Parameter1 == "sst",
Parameter2 != "sst",
r < 0) %>%
group_by(ts, Parameter2) %>%
summarise(q10_r = quantile(r, 0.1),
mean_r = mean(r, na.rm = T),
q90_r = quantile(r, 0.9),
mean_rmse = mean(rmse, na.rm = T))
# region correlation results
cor_summary_region <- ALL_cor %>%
filter(Parameter1 == "sst",
Parameter2 != "sst") %>%
group_by(ts, region, Parameter2) %>%
summarise(q10_r = quantile(r, 0.1),
mean_r = mean(r, na.rm = T),
q90_r = quantile(r, 0.9),
mean_rmse = mean(rmse, na.rm = T))
# Look at specific results
cor_summary_region %>%
filter(Parameter2 == "mld_cum")
# region correlation results
cor_summary_season <- ALL_cor %>%
filter(Parameter1 == "sst",
Parameter2 != "sst") %>%
group_by(ts, season, Parameter2) %>%
summarise(q10_r = quantile(r, 0.1),
mean_r = mean(r, na.rm = T),
q90_r = quantile(r, 0.9),
mean_rmse = mean(rmse, na.rm = T))
# Look at specific results
cor_summary_season %>%
filter(Parameter2 == "mld")
# Test for normality of r distributions
```
### Lead or lag of heat flux
Also could look into lead or lag of Qnet etc.
### Variability in Qx during regins and seasons
See if there is more variance in Qx during certain seasons or regions.
### Relationships
With patterns pulled out by region and season, we want to see if there are any relationships between MHWs that show strong correlations at onset/decline with a particular QX term and strong correlations at onset/decline with an air/sea variable. We will look for this within regions and seasons as well. For example, do MHWs that correlate well with an increase in SSS also correlate well with a decrease in long-wave radiation during the decline of the event? I'm not sure how best to go about this in a clean manner.
Another thing to consider would be if fast onset slow decline (and vice versa) events have different characteristics to slower evolving events. The same question could be posed to long vs short events and those with high intensities vs low. In order to begin this investigation we must join the MHW results to the correlation results. We will visualise these patterns with heatmaps.
```{r metrics_cor}
# Strong positive correlation at onset and strong positive and decline and vice versa
ALL_cor_wide <- readRDS("data/ALL_cor.Rda") %>%
dplyr::rename(var = Parameter2) %>%
ungroup() %>%
filter(Parameter1 == "sst") %>%
dplyr::select(region:ts, var, r, n_Obs) %>%
mutate(var = as.character(var)) %>%
mutate(var = case_when(var == "sst" ~ "SST",
var == "bottomT" ~ "SBT",
var == "sss" ~ "SSS",
var == "mld_cum" ~ "MLD_c",
var == "mld_1_cum" ~ "MLD_1_c",
var == "t2m" ~ "Air_temp",
var == "tcc_cum" ~ "Cloud_cover_c",
var == "p_e_cum" ~ "Precip_Evap_c",
var == "mslp_cum" ~ "MSLP_c",
var == "lwr_budget" ~ "Qlw",
var == "swr_budget" ~ "Qsw",
var == "lhf_budget" ~ "Qlh",
var == "shf_budget" ~ "Qsh",
var == "qnet_budget" ~ "Qnet",
TRUE ~ var)) %>%
filter(var %in% c("SST", "SSS", "SBT", "MLD_c", "MLD_1_c", "Air_temp", "Cloud_cover_c",
"Precip_Evap_c", "MSLP_c", "Qlw", "Qsw", "Qlh", "Qsh", "Qnet")) %>%
pivot_wider(values_from = r, names_from = var)
# Combine MHW metrics and correlation results
events_cor_prep <- OISST_MHW_event %>%
dplyr::select(region, season, event_no, duration, intensity_mean, intensity_max,
intensity_cumulative, rate_onset, rate_decline) %>%
left_join(ALL_cor_wide, by = c("region", "season", "event_no"))
# Heatmap showing average correlations by MHW duration
events_cor_prep %>%
mutate(duration = plyr::round_any(duration, 10)) %>%
group_by(ts, duration) %>%
mutate(count = n()) %>%
summarise_if(is.numeric, mean) %>%
pivot_longer(cols = Air_temp:Qsw) %>%
filter(name != "sst",
ts != "full",
duration <= 50) %>% # Only 1 event is longer than this
ggplot(aes(x = duration, y = name)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = count)) +
facet_wrap(~ts) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_cartesian(expand = F) +
labs(y = NULL, x = "Duration (10 day steps)", fill = "r (mean)") +
theme(legend.position = "bottom")
# Heatmap showing average correlations by MHW max intensity
events_cor_prep %>%
mutate(intensity_max = plyr::round_any(intensity_max, 0.5)) %>%
group_by(ts, intensity_max) %>%
mutate(count = n()) %>%
summarise_if(is.numeric, mean) %>%
pivot_longer(cols = Air_temp:Qsw) %>%
filter(name != "sst",
ts != "full") %>%
ggplot(aes(x = intensity_max, y = name)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = count)) +
facet_wrap(~ts) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_cartesian(expand = F) +
labs(y = NULL, x = "Max Intensity (°C; 0.5° steps)", fill = "r (mean)") +
theme(legend.position = "bottom")
# Heatmap showing average correlations by MHW rate onset
events_cor_prep %>%
mutate(rate_onset = round(rate_onset, 1)) %>%
group_by(ts, rate_onset) %>%
mutate(count = n()) %>%
summarise_if(is.numeric, mean) %>%
pivot_longer(cols = Air_temp:Qsw) %>%
filter(name != "sst",
ts != "full",
rate_onset <= 0.75) %>% # Only two is faster than this
ggplot(aes(x = rate_onset, y = name)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = count)) +
facet_wrap(~ts) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_cartesian(expand = F) +
labs(y = NULL, x = "Rate of onset (°C; 0.1° steps)", fill = "r (mean)") +
theme(legend.position = "bottom")
# Heatmap showing average correlations by MHW rate decline
events_cor_prep %>%
mutate(rate_decline = round(rate_decline, 1)) %>%
filter(rate_decline <= 0.6) %>% # nip off a couple of outliers
group_by(ts, rate_decline) %>%
mutate(count = n()) %>%
summarise_if(is.numeric, mean) %>%
pivot_longer(cols = Air_temp:Qsw) %>%
filter(name != "sst",
ts != "full") %>%
ggplot(aes(x = rate_decline, y = name)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = count)) +
facet_wrap(~ts) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_cartesian(expand = F) +
labs(y = NULL, x = "Rate of decline (°C; 0.1° steps)", fill = "r (mean)") +
theme(legend.position = "bottom")
```
### Linearity of events
Another observation I made while going through the results in the app was how the onset and decline of events tend to have a better RMSE when the SSTa trend is more linear. My hypothesis is that this is because there is less advective pressure on the dispersion of the heat anomaly, therefore more of the heatflux is responsible for the temperature anomaly. THis is a pretty straight forward statement, but I think it is useful in that one may be able to take linearity of SSTa as a proxy for the proportion of heat flux that is responsible for it. So in the chunk below I use simple linear models to create an R2 value for each onset and decline portion of an event. Then I find how close to 1.0 that R2 value is (meaning more linear SSTa) and I see how RMSE changes as R2 improves. I hypothesised that more linear SSTa onset will provide better (lower) RMSE values. The same is likely true for decline but I'm not certain.
```{r linearity}
# R2 for RMSE vs R2
ALL_cor_R2 <- readRDS("data/ALL_cor.Rda") %>%
ungroup() %>%
filter(Parameter1 == "sst", rmse > 0) %>%
nest_by(region, season, ts) %>%
summarise(broom::glance(lm(rmse ~ sst_R2, data = data)))
# Scatterplots
readRDS("data/ALL_cor.Rda") %>%
ungroup() %>%
filter(Parameter1 == "sst", rmse > 0) %>%
ggplot(aes(x = sst_R2, y = rmse)) +
geom_point() +
geom_smooth(method = "lm") +
facet_grid(region ~ Parameter2)
# Boxplots
readRDS("data/ALL_cor.Rda") %>%
ungroup() %>%
filter(Parameter1 == "sst", rmse > 0) %>%
mutate(sst_R2_cut = cut(sst_R2, breaks = c(0, 0.25, 0.5, 0.75,1.0))) %>%
na.omit() %>%
ggplot(aes(x = sst_R2_cut, y = rmse)) +
geom_boxplot() +
facet_grid(region ~ Parameter2)
```
Nope. If there is any relationship it is incredibly noisey. I'll not be pursuing this further for this project, but I'm not entirely dissuaded from pursuing this avenue of research in the future with a deeper learning approach.
### Choice events
There are a lot of results to wade through and though it is clear there are important signals in the results, it is proving difficult to distill them. One thought is that we don't need to look at all of the events, just the longest/most intense events with strong r values. This is first done by cutting out all Cat. I events. We then find strong correlations with long events. There should just be a few.
Once this has been done we group events by their strongest Qx relationship. Then find their strongest relationship with the next level of variables (e.g. MLD, MSLP, and so on). Ideally one may find the top four flavours.
```{r choice-events}
# Filter out smol events
events_cor_cat <- events_cor_prep %>%
left_join(OISST_MHW_cats[,c("region", "event_no", "category")], by = c("region", "event_no")) %>%
filter(category != "I Moderate", duration >= 21)
# Events with high Qlw correlations at onset
events_cor_cat %>%
filter(ts == "onset", Qlw >= 0.7)
# Melt the data frame and find the q term with the highest correlation
# Those are then used to separate events into groups
events_q_onset <- events_cor_cat %>%
filter(ts == "onset") %>%
dplyr::select(region:event_no, ts, Qnet:Qsh) %>%
pivot_longer(cols = Qnet:Qsh, names_to = "var", values_to = "val") %>%
group_by(region, season, event_no) %>%
filter(abs(val) == max(abs(val)))
# With this guide one may then parse out the q groups
events_lhf_p_onset <- events_q_onset %>%
filter(val >= 0, var == "Qlh") %>%
left_join(events_cor_cat) %>%
ungroup() %>%
# unite(region, season, event_no, sep = "_")
mutate(event_no = as.factor(event_no)) %>%
dplyr::select(region:ts, Air_temp:Qsw) %>%
pivot_longer(Air_temp:Qsw) %>%
ggplot(aes(x = event_no, y = name)) +
geom_tile(aes(fill = value)) +
# facet_wrap(~name, scales = "free") +
scale_fill_gradient2(low = "blue", high = "red") +
coord_cartesian(expand = F) +
labs(y = NULL, fill = "r (mean)") +
theme(legend.position = "bottom")
events_lhf_p_onset
```
### Table
In the following table a more concise summary of the results is presented.
```{r, echo=FALSE, message=FALSE}
# NB: This table was created manually by going through the Shiny app one variable at a time.
res_table <- read_csv("data/res_table.csv")
knitr::kable(res_table, caption = "Most of the variables that have been correlated against the temperature anomalies during the onset, decline, and full duration of MHWs. The cumulative heat flux terms were corrected for by the daily MLD (Q/(rho x Cp x hmld)) before the correlations were calculated. Correlations were also run on the cumulative flux terms without correcting for MLD, but there was little difference so the results are not itemised here. This table shows the full names of the variables, as well as the abbreviations used in the code. The 'onset' column describes (in shorthand) what the tendency of correlations for the MHWs is during the onset of events. This is repeated for the 'full' and 'decline' columns respectively. The 'season' column briefly states the most clear/noteworthy pattern(s) when looking at how the correlations are divided up by season. The same is done in the 'region' column. The last column, 'story', gives a TRUE/FALSE if I think the variable has a story to tell. Something worth pursuing further. Particularly to see if the variables relate strongly to other variables, not just temperature. This then could provide a framework for determining 'types' of MHWs (e.g. strong SSS change with strong latent heat flux).")
```
With a table organised by each variable, it makes sense to also create a table organised by season, and another by region.
```{r}
```
### NWA 2012
From Chen et al. 2016 (JGR)
Such an extreme event in the MAB was attributed to the anomalous atmospheric forcing, which was linked to the northward shift in the jet stream position [Chen et al., 2014a, 2015]. The anomalously warm atmospheric conditions in the winter of 2011–2012 increased the ocean heat content (increased the ocean heat content anomaly) and facilitated the extreme warm ocean temperature in spring 2012 [Chen et al., 2014a, 2015]. On the other hand, the ocean advection played a secondary role, which partially damped the heat content anomaly created by the air-sea heat flux [Chen et al., 2015].
In both cases, initial temperature and ocean advection are not sufficient to describe the seasonal mean temperature. Additional cooling (warming) in addition to ocean advection is needed to further describe the winter (spring) temperature. In comparison, using the sum of the initial temperature and air-sea flux yields a much better description of seasonal mean temperatures (Figures 5c and 5f)
While the overall role of ocean advection is smaller than that of air-sea flux in determining the winter and spring temperatures, the year-to-year changes in the relative importance is worth investigating.
Normally, given anomalous initial temperature, air will act to damp the temperature anomaly, as in winter 2007 or 2011, or even 2005 to some extent. However, in winter 2012, the air continued to increase the temperature anomaly.
Out of the 12 years 2003–2014, the air-sea flux normally dominated the temperature anomaly in the MAB during winter. In only 3 years was the winter time temperature anomaly primarily controlled by ocean advection.
For spring, ocean advection has more control on the temperature anomalies than air-sea flux does, although the difference is smaller (Table 2). In both seasons, the relative importance of air-sea flux and ocean advection does not seem to be related to either the initial or seasonal mean thermal condition of the shelf water (fourth and fifth columns of Tables 1 and 2).
The correlation coefficients increase from 0.66 in the first half of February to 0.91 in the second half of March. This suggests that estimation of spring temperature anomaly in the MAB based on the thermal condition 2 months before spring is statistically possible.
This suggests that more northerly jet stream positions result in larger heatflux from the atmosphere into the ocean in the MAB. This is likely due to warmer and more humid air overlying the continental shelf, which reduces the heat loss from the ocean during the cooling seasons [Chenet al., 2014a].
In spring and summer, the air-sea flux may be less correlated with the air temperature due to the shallowness of the surface mixed layer, and thus may be disconnected from large-scale atmospheric circulation, i.e., jet stream variability.
## References