# NcML dataset creation

## Tools for accessing and processing climate data: Case study with R

****
![c4R](https://github.com/SantanderMetGroup/climate4R/blob/devel/man/figures/climate4R_2.png?raw=true)

------------

This worked example reproduces some of the examples from the paper ["climate4R: An R-based Framework for Climate Data Access, Post-processing and Bias Correction"](https://www.sciencedirect.com/science/article/pii/S1364815218303049).

In [1]:
library(loadeR)
library(magrittr)

Loading required package: rJava

Loading required package: loadeR.java

Java version 22x amd64 by N/A detected

NetCDF Java Library v4.6.0-SNAPSHOT (23 Apr 2015) loaded and ready

Loading required package: climate4R.UDG

climate4R.UDG version 0.2.6 (2023-06-26) is loaded

Please use 'citation("climate4R.UDG")' to cite this package.

loadeR version 1.8.1 (2023-06-22) is loaded

Please use 'citation("loadeR")' to cite this package.



## NcML creation

### Observational reference

In [2]:
var <- "tas"
data.dir <- "../IMPETUS4CHANGE/data"
ncml.dir <- "../data/ncml"
data.dir.obs <- sprintf("%s/BSC/CERRA/daily_mean", data.dir)

In [3]:
lf.obs <- list.files(data.dir.obs, pattern = sprintf("%s_", var), recursive = TRUE)
head(lf.obs)
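
`list.files()` interprets `pattern` as a regular expression matched against file names, not as a shell glob. The sketch below uses hypothetical file names (they are not taken from the CERRA directory) to show what the unanchored pattern `tas_` accepts and how anchoring it with `^` tightens the match.

```r
# Hypothetical file names, for illustration only
files <- c("tas_CERRA_day_1985.nc",
           "tasmax_CERRA_day_1985.nc",
           "mytas_CERRA_day_1985.nc")

var <- "tas"

# Unanchored: accepts any name that contains "tas_"
loose <- grepl(sprintf("%s_", var), files)

# Anchored: the name must start with "tas_"
strict <- grepl(sprintf("^%s_", var), files)

files[loose]
files[strict]
```

If a directory mixes variables whose names end in the same letters, the anchored form is the safer choice.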

In [4]:
dir.obs <- unique(dirname(lf.obs))
dir.obs

In [5]:
ncml.dir.obs <- sprintf("%s/CERRA", ncml.dir)
if (! dir.exists(ncml.dir.obs))
    dir.create(ncml.dir.obs, recursive = TRUE)

In [6]:
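# Join all NetCDF files found in source.dir along the time dimension
# and write the aggregation to a single NcML file
makeAggregatedDataset(
    source.dir = sprintf("%s/%s", data.dir.obs, dir.obs),
    ncml.file = sprintf("%s/%s.ncml", ncml.dir.obs, dir.obs),
    aggr.dim = "time"
) %>% suppressMessages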
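
For orientation, an NcML file is a small XML document that tells the NetCDF-Java library how to present several files as one dataset. The sketch below writes a minimal `joinExisting` aggregation along `time` by hand, using base R only. The file names are hypothetical, and the file that `makeAggregatedDataset` actually writes may be organised differently (for example with additional per-variable elements), so treat this as an illustration of the format rather than a substitute for the call above.

```r
# Hypothetical NetCDF files to be joined along the time dimension
nc.files <- c("tas_1985.nc", "tas_1986.nc")

ncml <- c(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">',
    '  <aggregation type="joinExisting" dimName="time">',
    sprintf('    <netcdf location="%s"/>', nc.files),  # one entry per file
    '  </aggregation>',
    '</netcdf>'
)

# Write to a temporary file so nothing in the project is overwritten
ncml.file <- tempfile(fileext = ".ncml")
writeLines(ncml, ncml.file)
cat(readLines(ncml.file), sep = "\n")
```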

In [7]:
dataset.obs <- list.files(ncml.dir.obs, full.names = TRUE)
dataset.obs

In [8]:
di <- dataInventory(dataset.obs)

[2024-05-16 08:11:05.44064] Doing inventory ...

[2024-05-16 08:11:06.2247] Retrieving info for 'tas' (0 vars remaining)

[2024-05-16 08:11:06.33287] Done.



In [9]:
di$tas$Dimensions$time$Date_range

### Decadal predictions

In [10]:
data.dir.pred <- sprintf("%s/ESGF/CMIP6/DCPP/EC-Earth-Consortium/EC-Earth3/dcppA-hindcast", data.dir)

In [11]:
lf <- list.files(data.dir.pred, recursive = TRUE, pattern = sprintf("%s_.*hindcast", var))
head(lf)

In [12]:
tail(lf)

In [13]:
dir.inits <- unique(dirname(lf))
head(dir.inits)

In [14]:
tail(dir.inits)

In [15]:
ncml.dir.pred <- sprintf("%s/EC-Earth3/dcppA-hindcast", ncml.dir)
if (! dir.exists(ncml.dir.pred))
    dir.create(ncml.dir.pred, recursive = TRUE)

In [16]:
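# One NcML file per directory in dir.inits; "/" in the relative path
# is replaced by "_" to obtain a flat file name
for (d in dir.inits) {
    makeAggregatedDataset(
        source.dir = sprintf("%s/%s", data.dir.pred, d),
        ncml.file = sprintf("%s/%s.ncml", ncml.dir.pred, gsub("/", "_", d)),
        aggr.dim = "time"
    ) %>% suppressMessages
}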
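
The `gsub("/", "_", d)` call flattens each relative directory into a single file name, so every entry of `dir.inits` gets one NcML file directly under `ncml.dir.pred`. A small sketch with a hypothetical directory layout (the real sub-directory names depend on how the ESGF files were stored):

```r
ncml.dir.pred <- "../data/ncml/EC-Earth3/dcppA-hindcast"

# Hypothetical relative directories, one per initialization
dir.inits <- c("s1960-r1i1p1f1/day/tas", "s1961-r1i1p1f1/day/tas")

# Replace path separators so each directory maps to one flat file name
ncml.files <- file.path(ncml.dir.pred,
                        paste0(gsub("/", "_", dir.inits), ".ncml"))
ncml.files
```

Building the path with `file.path()` also avoids doubled separators when the output directory happens to end in `/`.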

In [17]:
datasets <- list.files(ncml.dir.pred, full.names = TRUE)
head(datasets)

### Perform a data inventory

In [18]:
di <- dataInventory(datasets[1])

[2024-05-16 08:11:17.406718] Doing inventory ...

[2024-05-16 08:11:17.458434] Retrieving info for 'tas' (0 vars remaining)

[2024-05-16 08:11:17.532717] Done.



In [19]:
str(di)

List of 1
 $ tas:List of 7
  ..$ Description: chr "Near-Surface Air Temperature"
  ..$ DataType   : chr "float"
  ..$ Shape      : int [1:3] 4017 256 512
  ..$ Units      : chr "K"
  ..$ DataSizeMb : num 2106
  ..$ Version    : logi NA
  ..$ Dimensions :List of 3
  .. ..$ time:List of 4
  .. .. ..$ Type      : chr "Time"
  .. .. ..$ TimeStep  : chr "1.0 days"
  .. .. ..$ Units     : chr "days since 1850-01-01 00:00:00"
  .. .. ..$ Date_range: chr "1960-11-01T12:00:00Z - 1971-10-31T12:00:00Z"
  .. ..$ lat :List of 5
  .. .. ..$ Type       : chr "Lat"
  .. .. ..$ Units      : chr "degrees_north"
  .. .. ..$ Values     : num [1:256] -89.5 -88.8 -88.1 -87.4 -86.7 ...
  .. .. ..$ Shape      : int 256
  .. .. ..$ Coordinates: chr "lat"
  .. ..$ lon :List of 5
  .. .. ..$ Type       : chr "Lon"
  .. .. ..$ Units      : chr "degrees_east"
  .. .. ..$ Values     : num [1:512] 0 0.703 1.406 2.109 2.812 ...
  .. .. ..$ Shape      : int 512
  .. .. ..$ Coordinates: chr "lon"
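
The inventory values can be cross-checked against each other with a little arithmetic. The sketch below copies the numbers from the `str(di)` output above instead of reading them from `di`, so it runs without the data. It assumes a standard Gregorian calendar and 4-byte floats.

```r
# Values copied from the inventory output
date.range <- "1960-11-01T12:00:00Z - 1971-10-31T12:00:00Z"
shape <- c(time = 4017, lat = 256, lon = 512)

# Split the range into start and end and keep the date part (YYYY-MM-DD)
limits <- strsplit(date.range, " - ", fixed = TRUE)[[1]]
limits <- as.Date(substr(limits, 1, 10))

# Daily time step: number of days in the range, both ends included
n.days <- as.numeric(diff(limits), units = "days") + 1

# Array size, assuming 4 bytes per value and 1 MB = 10^6 bytes
size.mb <- prod(shape) * 4 / 1e6

# Longitude spacing of a regular grid with 512 points
lon.step <- 360 / shape[["lon"]]

n.days
size.mb
lon.step
```

The day count equals the length of the `time` dimension, the size rounds to the reported `DataSizeMb` (which suggests megabytes of 10^6 bytes), and the longitude spacing agrees with the first `lon` values.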
