Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query with dates of CMIP6 outputs with heatwaveR - NAs appear after ts_whole <- make_whole_fast(ts_xy) #13

Closed
megrounsley opened this issue Jan 21, 2021 · 6 comments

Comments

@megrounsley
Copy link

Hi there,

In R I am using the heatwaveR package to produce heatwave metrics for CMIP6 sea surface temperature outputs, except with using the ts2clm() function the output is only ever NA whichever CMIP6 model I am using. I’ve found it is at the line
ts_whole=make_whole_fast(ts_xy) that the NAs appear.

In that ts_xy looks like

  ts_x ts_y
1 2010-01-01 27.98661
2 2010-01-02 27.80809

but ts_whole looks like

  doy ts_x ts_y
1 1 2010-01-01 NA
2 2 2010-01-02 NA

and within the make_whole_fast() function I found that the NAs appear at the line
ts_merged <- merge(ts_full, data, all.x = TRUE).

A colleague tried manually playing around with the merge command, as there are options as to how the function deals with missing values or things that don't match perfectly. By tweaking it so it includes all data in all dimensions, using the command: merge(ts_full, data, all = TRUE), you get the following table

  ts_x ts_y
1 2010-01-01 NA
2 2010-01-01 27.98661
3 2010-02-02 NA
4 2010-02-02 27.80809

So strangely, the model 'dates' don't seem to be read in the same way as the dates in the table it is trying to merge with.
For example, rows 1 and 2 of ts_x look to be exactly the same ("2010-01-01"), but if we check what R thinks it disagrees
ts_merged3$ts_x[1] == ts_merged3$ts_x[2]
[1] FALSE

It does correctly recognise that all of the entries in ts_x are dates. E.g. checking if it is a date using the inherits command:
inherits(ts_merged3$ts_x[1],'Date')
[1] TRUE

I don’t have this issue with satellite sea surface temperature datasets (OISST) only ones derived from the CMIP6 models. However, both types of data are dealt with exactly the same to produce a table of lon, lat, t and temp to then be inputted into the heatwaveR workflow. Is actually some time component as well which is not shown and although the model and new table have the same date, they are different times on those dates and therefore are technically not equal? I wonder if you can help or suggest a way to correct this issue I am having.

Reproducible code example:

# Read in and check gridded temperature data
M=readRDS(gzcon(url("https://github.com/megrounsley/mrounsley_dir/raw/main/modelsubsetforheatwaveR.Rds")))
head(M)
#>        lon lat          t     temp
#> 32561  160 0.5 2010-01-01 27.98661
#> 97361  160 0.5 2010-01-02 27.80809
#> 162161 160 0.5 2010-01-03 27.65729
#> 226961 160 0.5 2010-01-04 27.59987
#> 291761 160 0.5 2010-01-05 27.53460
#> 356561 160 0.5 2010-01-06 27.41012

# Attempt heatwaveR workflow
library(heatwaveR)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
clim <- ts2clm(data = M, climatologyPeriod = c("2010-01-01", "2014-12-30"))
# See unexpected output
head(clim)
#>    doy          t temp seas thresh
#> 1:   1 2010-01-01   NA   NA     NA
#> 2:   2 2010-01-02   NA   NA     NA
#> 3:   3 2010-01-03   NA   NA     NA
#> 4:   4 2010-01-04   NA   NA     NA
#> 5:   5 2010-01-05   NA   NA     NA
#> 6:   6 2010-01-06   NA   NA     NA

# Open debug window for ts2clm() 
debugonce(ts2clm)
## See that NA's appear at line 44 in debug window with "ts_whole <- make_whole_fast(ts_xy)"
clim <- ts2clm(data = M, climatologyPeriod = c("2010-01-01", "2014-12-30"))
#> debugging in: ts2clm(data = M, climatologyPeriod = c("2010-01-01", "2014-12-30"))
#> debug: {
#>     if (missing(climatologyPeriod)) 
#>         stop("Oops! Please provide a period (two dates) for calculating the climatology.")
#>     if (length(climatologyPeriod) != 2) 
#>         stop("Bummer! Please provide BOTH start and end dates for the climatology period.")
#>     if (!(is.logical(robust))) 
#>         stop("Please ensure that 'robust' is either TRUE or FALSE.")
#>     if (robust) 
#>         message("The 'robust' argument has been deprecated and will be removed from future versions.")
#>     if (maxPadLength != FALSE & !is.numeric(maxPadLength)) 
#>         stop("Please ensure that 'maxPadLength' is either FALSE or a numeric/integer value.")
#>     if (!(is.numeric(pctile))) 
#>         stop("Please ensure that 'pctile' is a numeric/integer value.")
#>     if (!(is.numeric(windowHalfWidth))) 
#>         stop("Please ensure that 'windowHalfWidth' is a numeric/integer value.")
#>     if (!(is.logical(smoothPercentile))) 
#>         stop("Please ensure that 'smoothPercentile' is either TRUE or FALSE.")
#>     if (!(is.numeric(smoothPercentileWidth))) 
#>         stop("Please ensure that 'smoothPercentileWidth' is a numeric/integer value.")
#>     if (!(is.logical(clmOnly))) 
#>         stop("Please ensure that 'clmOnly' is either TRUE or FALSE.")
#>     if (!(is.numeric(roundClm))) {
#>         if (!roundClm == FALSE) {
#>             stop("Please ensure that 'roundClm' is either a numeric value or FALSE.")
#>         }
#>     }
#>     clim_start <- climatologyPeriod[1]
#>     clim_end <- climatologyPeriod[2]
#>     doy <- temp <- .SD <- NULL
#>     ts_x <- eval(substitute(x), data)
#>     ts_y <- eval(substitute(y), data)
#>     rm(data)
#>     if (!inherits(ts_x[1], "Date")) 
#>         stop("Please ensure your date values are type 'Date'. This may be done with 'as.Date()'.")
#>     if (!is.numeric(ts_y[1])) 
#>         stop("Please ensure the temperature values you are providing are type 'num' for numeric.")
#>     ts_xy <- data.table::data.table(ts_x = ts_x, ts_y = ts_y)[base::order(ts_x)]
#>     rm(ts_x)
#>     rm(ts_y)
#>     ts_whole <- make_whole_fast(ts_xy)
#>     if (length(stats::na.omit(ts_whole$ts_y)) < length(ts_whole$ts_y) & 
#>         is.numeric(maxPadLength)) {
#>         ts_whole <- na_interp(doy = ts_whole$doy, x = ts_whole$ts_x, 
#>             y = ts_whole$ts_y, maxPadLength = maxPadLength)
#>     }
#>     if (ts_whole$ts_x[1] > clim_start) 
#>         stop(paste("The specified start date precedes the first day of series, which is", 
#>             ts_whole$ts_x[1]))
#>     if (clim_end > utils::tail(ts_whole$ts_x, 1)) 
#>         stop(paste("The specified end date follows the last day of series, which is", 
#>             ts_whole$ts_x[nrow(ts_whole)]))
#>     if (as.Date(clim_end) - as.Date(clim_start) < 1095) 
#>         stop("The climatologyPeriod must be at least three years to calculate thresholds")
#>     ts_wide <- clim_spread(ts_whole, clim_start, clim_end, windowHalfWidth)
#>     if (nrow(stats::na.omit(ts_wide)) < nrow(ts_wide) | var) {
#>         ts_mat <- clim_calc(ts_wide, windowHalfWidth, pctile)
#>         ts_mat[is.nan(ts_mat)] <- NA
#>     }
#>     else {
#>         ts_mat <- clim_calc_cpp(ts_wide, windowHalfWidth, pctile)
#>     }
#>     rm(ts_wide)
#>     if (smoothPercentile) {
#>         ts_clim <- smooth_percentile(ts_mat, smoothPercentileWidth, 
#>             var)
#>     }
#>     else {
#>         ts_clim <- data.table::data.table(ts_mat)
#>     }
#>     cols <- names(ts_clim)
#>     if (is.numeric(roundClm)) {
#>         ts_clim[, `:=`((cols), round(.SD, roundClm)), .SDcols = cols]
#>     }
#>     rm(ts_mat)
#>     if (clmOnly) {
#>         return(ts_clim)
#>     }
#>     else {
#>         data.table::setkey(ts_whole, doy)
#>         data.table::setkey(ts_clim, doy)
#>         ts_res <- merge(ts_whole, ts_clim, all = TRUE)
#>         rm(ts_whole)
#>         rm(ts_clim)
#>         data.table::setorder(ts_res, ts_x)
#>         names(ts_res)[2] <- paste(substitute(x))
#>         names(ts_res)[3] <- paste(substitute(y))
#>         return(ts_res)
#>     }
#> }
#> debug: if (missing(climatologyPeriod)) stop("Oops! Please provide a period (two dates) for calculating the climatology.")
#> debug: if (length(climatologyPeriod) != 2) stop("Bummer! Please provide BOTH start and end dates for the climatology period.")
#> debug: if (!(is.logical(robust))) stop("Please ensure that 'robust' is either TRUE or FALSE.")
#> debug: if (robust) message("The 'robust' argument has been deprecated and will be removed from future versions.")
#> debug: if (maxPadLength != FALSE & !is.numeric(maxPadLength)) stop("Please ensure that 'maxPadLength' is either FALSE or a numeric/integer value.")
#> debug: if (!(is.numeric(pctile))) stop("Please ensure that 'pctile' is a numeric/integer value.")
#> debug: if (!(is.numeric(windowHalfWidth))) stop("Please ensure that 'windowHalfWidth' is a numeric/integer value.")
#> debug: if (!(is.logical(smoothPercentile))) stop("Please ensure that 'smoothPercentile' is either TRUE or FALSE.")
#> debug: if (!(is.numeric(smoothPercentileWidth))) stop("Please ensure that 'smoothPercentileWidth' is a numeric/integer value.")
#> debug: if (!(is.logical(clmOnly))) stop("Please ensure that 'clmOnly' is either TRUE or FALSE.")
#> debug: if (!(is.numeric(roundClm))) {
#>     if (!roundClm == FALSE) {
#>         stop("Please ensure that 'roundClm' is either a numeric value or FALSE.")
#>     }
#> }
#> debug: clim_start <- climatologyPeriod[1]
#> debug: clim_end <- climatologyPeriod[2]
#> debug: doy <- temp <- .SD <- NULL
#> debug: ts_x <- eval(substitute(x), data)
#> debug: ts_y <- eval(substitute(y), data)
#> debug: rm(data)
#> debug: if (!inherits(ts_x[1], "Date")) stop("Please ensure your date values are type 'Date'. This may be done with 'as.Date()'.")
#> debug: if (!is.numeric(ts_y[1])) stop("Please ensure the temperature values you are providing are type 'num' for numeric.")
#> debug: ts_xy <- data.table::data.table(ts_x = ts_x, ts_y = ts_y)[base::order(ts_x)]
#> debug: rm(ts_x)
#> debug: rm(ts_y)
#> debug: ts_whole <- make_whole_fast(ts_xy)
#> debug: if (length(stats::na.omit(ts_whole$ts_y)) < length(ts_whole$ts_y) & 
#>     is.numeric(maxPadLength)) {
#>     ts_whole <- na_interp(doy = ts_whole$doy, x = ts_whole$ts_x, 
#>         y = ts_whole$ts_y, maxPadLength = maxPadLength)
#> }
#> debug: if (ts_whole$ts_x[1] > clim_start) stop(paste("The specified start date precedes the first day of series, which is", 
#>     ts_whole$ts_x[1]))
#> debug: if (clim_end > utils::tail(ts_whole$ts_x, 1)) stop(paste("The specified end date follows the last day of series, which is", 
#>     ts_whole$ts_x[nrow(ts_whole)]))
#> debug: if (as.Date(clim_end) - as.Date(clim_start) < 1095) stop("The climatologyPeriod must be at least three years to calculate thresholds")
#> debug: ts_wide <- clim_spread(ts_whole, clim_start, clim_end, windowHalfWidth)
#> debug: if (nrow(stats::na.omit(ts_wide)) < nrow(ts_wide) | var) {
#>     ts_mat <- clim_calc(ts_wide, windowHalfWidth, pctile)
#>     ts_mat[is.nan(ts_mat)] <- NA
#> } else {
#>     ts_mat <- clim_calc_cpp(ts_wide, windowHalfWidth, pctile)
#> }
#> debug: ts_mat <- clim_calc(ts_wide, windowHalfWidth, pctile)
#> debug: ts_mat[is.nan(ts_mat)] <- NA
#> debug: rm(ts_wide)
#> debug: if (smoothPercentile) {
#>     ts_clim <- smooth_percentile(ts_mat, smoothPercentileWidth, 
#>         var)
#> } else {
#>     ts_clim <- data.table::data.table(ts_mat)
#> }
#> debug: ts_clim <- smooth_percentile(ts_mat, smoothPercentileWidth, var)
#> debug: cols <- names(ts_clim)
#> debug: if (is.numeric(roundClm)) {
#>     ts_clim[, `:=`((cols), round(.SD, roundClm)), .SDcols = cols]
#> }
#> debug: ts_clim[, `:=`((cols), round(.SD, roundClm)), .SDcols = cols]
#> debug: rm(ts_mat)
#> debug: if (clmOnly) {
#>     return(ts_clim)
#> } else {
#>     data.table::setkey(ts_whole, doy)
#>     data.table::setkey(ts_clim, doy)
#>     ts_res <- merge(ts_whole, ts_clim, all = TRUE)
#>     rm(ts_whole)
#>     rm(ts_clim)
#>     data.table::setorder(ts_res, ts_x)
#>     names(ts_res)[2] <- paste(substitute(x))
#>     names(ts_res)[3] <- paste(substitute(y))
#>     return(ts_res)
#> }
#> debug: data.table::setkey(ts_whole, doy)
#> debug: data.table::setkey(ts_clim, doy)
#> debug: ts_res <- merge(ts_whole, ts_clim, all = TRUE)
#> debug: rm(ts_whole)
#> debug: rm(ts_clim)
#> debug: data.table::setorder(ts_res, ts_x)
#> debug: names(ts_res)[2] <- paste(substitute(x))
#> debug: names(ts_res)[3] <- paste(substitute(y))
#> debug: return(ts_res)
#> exiting from: ts2clm(data = M, climatologyPeriod = c("2010-01-01", "2014-12-30"))


# Open debugger make_whole_fast() 
debugonce(make_whole_fast)
#> Error in debugonce(make_whole_fast): object 'make_whole_fast' not found
## Receive "Error: object 'make_whole_fast' not found" ?
make_whole_fast=function(data) {feb28 <- 59
date_start <- lubridate::ymd(utils::head(data$ts_x, 1))
date_end <- lubridate::ymd(utils::tail(data$ts_x, 1))
ts_full <- data.table::data.table(ts_x = seq.Date(date_start, date_end, "day"))
ts_merged <- merge(ts_full, data, all.x = TRUE)
rm(ts_full)
v_date <- ts_merged$ts_x
v_doy <- lubridate::yday(v_date)
v_doy <- as.integer(ifelse(
  lubridate::leap_year(lubridate::year(ts_merged$ts_x)) == FALSE,
  ifelse(v_doy > feb28, v_doy + 1, v_doy),
  v_doy)
)
v_ts_y <- as.numeric(ts_merged$ts_y)
t_series <- data.table::data.table(doy = v_doy, ts_x = v_date, ts_y = v_ts_y)
rm(v_date); rm(v_doy); rm(v_ts_y)
return(t_series)}

# Try again to open debug window for make_whole_fast() 
debugonce(make_whole_fast)
data=M
x=data$t
y=data$temp
climatologyPeriod=c("2010-01-01", "2014-12-30")
clim_start <- climatologyPeriod[1]
clim_end <- climatologyPeriod[2]
doy <- temp <- .SD <- NULL
ts_x <- eval(substitute(x), data)
ts_y <- eval(substitute(y), data)
rm(data)
ts_xy <- data.table::data.table(ts_x = ts_x, ts_y = ts_y)[base::order(ts_x)]
rm(ts_x)
rm(ts_y)
## See that NA's appear at line 5 in debug window with ts_merged <- merge(ts_full, data, all.x = TRUE)
ts_whole <- make_whole_fast(ts_xy)
#> debugging in: make_whole_fast(ts_xy)
#> debug at <text>#21: {
#>     feb28 <- 59
#>     date_start <- lubridate::ymd(utils::head(data$ts_x, 1))
#>     date_end <- lubridate::ymd(utils::tail(data$ts_x, 1))
#>     ts_full <- data.table::data.table(ts_x = seq.Date(date_start, 
#>         date_end, "day"))
#>     ts_merged <- merge(ts_full, data, all.x = TRUE)
#>     rm(ts_full)
#>     v_date <- ts_merged$ts_x
#>     v_doy <- lubridate::yday(v_date)
#>     v_doy <- as.integer(ifelse(lubridate::leap_year(lubridate::year(ts_merged$ts_x)) == 
#>         FALSE, ifelse(v_doy > feb28, v_doy + 1, v_doy), v_doy))
#>     v_ts_y <- as.numeric(ts_merged$ts_y)
#>     t_series <- data.table::data.table(doy = v_doy, ts_x = v_date, 
#>         ts_y = v_ts_y)
#>     rm(v_date)
#>     rm(v_doy)
#>     rm(v_ts_y)
#>     return(t_series)
#> }
#> debug at <text>#21: feb28 <- 59
#> debug at <text>#22: date_start <- lubridate::ymd(utils::head(data$ts_x, 1))
#> debug at <text>#23: date_end <- lubridate::ymd(utils::tail(data$ts_x, 1))
#> debug at <text>#24: ts_full <- data.table::data.table(ts_x = seq.Date(date_start, 
#>     date_end, "day"))
#> debug at <text>#25: ts_merged <- merge(ts_full, data, all.x = TRUE)
#> debug at <text>#26: rm(ts_full)
#> debug at <text>#27: v_date <- ts_merged$ts_x
#> debug at <text>#28: v_doy <- lubridate::yday(v_date)
#> debug at <text>#29: v_doy <- as.integer(ifelse(lubridate::leap_year(lubridate::year(ts_merged$ts_x)) == 
#>     FALSE, ifelse(v_doy > feb28, v_doy + 1, v_doy), v_doy))
#> debug at <text>#34: v_ts_y <- as.numeric(ts_merged$ts_y)
#> debug at <text>#35: t_series <- data.table::data.table(doy = v_doy, ts_x = v_date, 
#>     ts_y = v_ts_y)
#> debug at <text>#36: rm(v_date)
#> debug at <text>#36: rm(v_doy)
#> debug at <text>#36: rm(v_ts_y)
#> debug at <text>#37: return(t_series)
#> exiting from: make_whole_fast(ts_xy)
head(ts_xy)
#>          ts_x     ts_y
#> 1: 2010-01-01 27.98661
#> 2: 2010-01-02 27.80809
#> 3: 2010-01-03 27.65729
#> 4: 2010-01-04 27.59987
#> 5: 2010-01-05 27.53460
#> 6: 2010-01-06 27.41012
head(ts_whole)
#>    doy       ts_x ts_y
#> 1:   1 2010-01-01   NA
#> 2:   2 2010-01-02   NA
#> 3:   3 2010-01-03   NA
#> 4:   4 2010-01-04   NA
#> 5:   5 2010-01-05   NA
#> 6:   6 2010-01-06   NA

rm(list=c("clim", "M", "make_whole_fast", "x", "y", "climatologyPeriod", "clim_start", "clim_end", "doy", "temp", ".SD", "ts_xy", "ts_whole"))

Created on 2021-01-21 by the reprex package (v0.3.0)

Meg Rounsley
mr17809@bristol.ac.uk

@robwschlegel
Copy link
Owner

Hello,
Without looking into this in too much depth, my first question is if you have made certain that the date values you are getting from the CMIP6 data are set to date values with the as.Date() function specifically. The issue may be that the CMIP6 data are still some sort of POSIX date value.
Let me know what you find.
All the best,
-Robert

@megrounsley
Copy link
Author

megrounsley commented Jan 21, 2021 via email

@robwschlegel
Copy link
Owner

Hello Meg,
I've seen this issue once before with someone analysing station data. I don't know why converting dates from some file formats sometimes causes this hidden error. I suspect it may actually be some sort of RStudio or OS interference in the date creation. Or perhaps a local date-time problem with operating systems. Which OS are you using? Having recently moved from Canada to France, my new locale is no longer recognised in the time stamp requirement for package creation on Linux, requiring that I set my .Rprofile to ignore this warning.

Regardless of what it may be, there is some sort of hidden date info in your data.frame that is causing the mismatch. Even though it doesn't show up when you look at the structure of the data. The fix is to simply convert your dates to character strings and then back to dates. Everything should work after that. C'est la vie, eh?

library(tidyverse)
library(heatwaveR)

M <- readRDS(gzcon(url("https://github.com/megrounsley/mrounsley_dir/raw/main/modelsubsetforheatwaveR.Rds"))) %>% 
  mutate(t = as.Date(as.character(t)))

clim <- ts2clm(data = M, climatologyPeriod = c("2010-01-01", "2014-12-30"))

All the best,
-Robert

@megrounsley
Copy link
Author

Thats great thank you very much!

Best,
Meg

@megrounsley
Copy link
Author

Regarding which OS I've use I've switched the code between linux, Mac and windows!!

@robwschlegel
Copy link
Owner

Curious. Thanks for letting me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants