# On the use of Deep Generative Models for "Perfect" Prognosis Climate Downscaling

This notebook downloads and preprocesses the data required by the CVAE model. This part of the analysis is written in R and builds on the [climate4R](https://github.com/SantanderMetGroup/climate4R) framework.

First we load the required R libraries

In [2]:
library(loadeR)
library(transformeR)
library(downscaleR)
library(climate4R.value)
library(magrittr)
library(sp)
library(downscaleR.keras)
library(loadeR.2nc)

These datasets are downloaded from the [User Data Gateway - Thredds Access Portal (UDG-TAP)](http://meteo.unican.es/udg-tap/home) maintained by the Santander Meteorology Group. A user account may be required.

In [None]:
loginUDG('***', '***')

Set the spatial and time boundaries

In [3]:
longitude <- c(-10, 32)
latitude <- c(36, 72)
time <- 1979:2008

Select the variables to use as predictors

In [4]:
variables <- c('z@500','z@700','z@850','z@1000',
               'hus@500','hus@700','hus@850','hus@1000',
               'ta@500','ta@700','ta@850','ta@1000',
               'ua@500','ua@700','ua@850','ua@1000',
               'va@500','va@700','va@850','va@1000')

Download the predictor (ERA-Interim)

In [7]:
# Load each predictor variable and combine them into a single multi-grid
x <- lapply(variables, function(v) {
                loadGridData(dataset = 'ECMWF_ERA-Interim-ESD',
                     var = v,
                     lonLim = longitude,
                     latLim = latitude,
                     years = time)
      }) %>% makeMultiGrid()

save(x, file = './data/x.rda')

Download the predictand (E-OBS)

In [8]:
y <- loadGridData(dataset = 'E-OBS_v14_0.50regular',
                  var = 'pr',
                  lonLim = longitude,
                  latLim = latitude,
                  years = time)

save(y, file = './data/y.rda')

Split both datasets into train and test sets

In [9]:
years_train <- 1979:2002
years_test <- 2003:2008

x_train <- subsetGrid(x, years = years_train)
y_train <- subsetGrid(y, years = years_train)

x_test <- subsetGrid(x, years = years_test)
y_test <- subsetGrid(y, years = years_test)

rm(x); rm(y)

Standardize the predictors

In [12]:
# Scale the test set first, while x_train is still unscaled and can serve as the reference
x_test <- scaleGrid(x_test, base = x_train, type = 'standardize')
x_train <- scaleGrid(x_train, type = 'standardize')
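
To illustrate why the test set is scaled with the training statistics, here is a minimal base R sketch on synthetic data. It assumes the statistics are computed per gridpoint along the time axis; the helper `standardize` and the toy arrays are hypothetical and are not part of climate4R, and `scaleGrid` may differ in its details.

```r
# Toy predictor arrays with dimensions (time, lat, lon); values are synthetic
set.seed(1)
train <- array(rnorm(10 * 2 * 3, mean = 5, sd = 2), dim = c(10, 2, 3))
test  <- array(rnorm(4 * 2 * 3, mean = 5, sd = 2), dim = c(4, 2, 3))

# Per-gridpoint mean and standard deviation, computed on the training period only
mu    <- apply(train, MARGIN = c(2, 3), FUN = mean)
sigma <- apply(train, MARGIN = c(2, 3), FUN = sd)

# Hypothetical helper: subtract the mean and divide by the sd at every gridpoint
standardize <- function(a, mu, sigma) {
  a <- sweep(a, MARGIN = c(2, 3), STATS = mu, FUN = '-')
  sweep(a, MARGIN = c(2, 3), STATS = sigma, FUN = '/')
}

# Both sets use the training statistics, so no test information leaks into the scaling
test_std  <- standardize(test, mu, sigma)
train_std <- standardize(train, mu, sigma)
```

After this step the training array has zero mean and unit standard deviation at every gridpoint, while the test array is only approximately standardized because its own statistics were never used.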

Rearrange the predictors' dimensions for PyTorch

In [13]:
x_train <- redim(x_train, drop = TRUE)
x_train$Data <- aperm(x_train$Data, c(2, 1, 3, 4))
x_train <- x_train$Data

x_test <- redim(x_test, drop = TRUE)
x_test$Data <- aperm(x_test$Data, c(2, 1, 3, 4))
x_test <- x_test$Data
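
The effect of the permutation can be checked on a toy array. This sketch assumes that, after dropping singleton dimensions, the data array is ordered as (variable, time, lat, lon), so that swapping the first two dimensions gives the (time, variable, lat, lon) layout that convolutional layers in PyTorch usually expect. The array `a` and its sizes are made up for illustration.

```r
# Toy array with assumed dimensions (variable, time, lat, lon)
a <- array(seq_len(20 * 5 * 3 * 4), dim = c(20, 5, 3, 4))

# Swap the first two dimensions to obtain (time, variable, lat, lon)
b <- aperm(a, c(2, 1, 3, 4))

# Dimensions are now 5, 20, 3, 4
dim(b)

# The value for variable 7 at time step 2 has moved from a[7, 2, , ] to b[2, 7, , ]
same_value <- b[2, 7, 1, 1] == a[7, 2, 1, 1]
```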

Format the predictand for Python

In [14]:
y_train$Data <- array3Dto2Dmat(y_train$Data)
ind_nonNaN <- (!apply(y_train$Data, MARGIN = 2, anyNA)) %>% which()
y_train <- y_train$Data[, ind_nonNaN]
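
The flattening and filtering above can be sketched in base R on a toy array. The names `y`, `y_mat`, `ind_keep` and `y_clean` are illustrative only, and the column order produced here by a plain reshape is an assumption that may differ from the one used by `array3Dto2Dmat`.

```r
# Toy predictand with dimensions (time, lat, lon); one gridpoint has a missing value
y <- array(as.numeric(1:24), dim = c(4, 2, 3))
y[2, 1, 2] <- NA

# Collapse the two spatial dimensions into one: the result is (time, lat * lon)
y_mat <- y
dim(y_mat) <- c(dim(y)[1], prod(dim(y)[-1]))

# Keep only the columns (gridpoints) with no missing value at any time step
ind_keep <- which(!apply(y_mat, MARGIN = 2, anyNA))
y_clean <- y_mat[, ind_keep]
```

A gridpoint is discarded as soon as a single time step is missing, which is why the indices of the retained columns need to be kept for later use.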

Save the processed data, including the indices of the gridpoints that contain no `NaN` values

In [15]:
save(y_test, file = './data/y_test_template.RData')
y_test$Data <- array3Dto2Dmat(y_test$Data)
y_test <- y_test$Data

save(x_train, file = './data/x_train.RData')
save(x_test, file = './data/x_test.RData')
save(y_train, file = './data/y_train.RData')
save(y_test, file = './data/y_test.RData')

save(ind_nonNaN, file = './data/ind_nonNaN.RData')
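
One plausible use of the saved indices is to place model predictions, which only cover the retained gridpoints, back onto the full set of gridpoints. The sketch below assumes that use; `n_points`, `ind_keep`, `pred` and `full` are hypothetical names and the values are placeholders, not model output.

```r
# Assumed setup: 6 gridpoints in total, of which gridpoint 3 was discarded
n_points <- 6
ind_keep <- c(1, 2, 4, 5, 6)

# Placeholder predictions with one column per retained gridpoint
pred <- matrix(0.5, nrow = 4, ncol = length(ind_keep))

# Full matrix initialised with NA, then filled at the retained positions
full <- matrix(NA_real_, nrow = nrow(pred), ncol = n_points)
full[, ind_keep] <- pred
```

The discarded gridpoints stay `NA`, so the resulting matrix has the same number of columns as the flattened `y_test`.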