## Remapping training data to the cubed sphere

The novel addition in DLWP-CS is the ability to train convolutional neural networks on data mapped to the cubed sphere. The remapping is performed offline, separately from model training and inference.

#### Required packages

For cubed sphere remapping we use the TempestRemap library, which is available as a pre-compiled conda package. Let's start by installing it.

In [6]:
%conda install -c conda-forge tempest-remap

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.10.1
  latest version: 4.10.3

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2

  added / updated specs:
    - tempest-remap


The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2021.5.25-~ --> conda-forge::ca-certificates-2021.5.30-ha878542_0

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            pkgs/main::certifi-2021.5.30-py37h06a~ --> conda-forge::certifi-2021.5.30-py37h89c1867_0
  openssl              pkgs/main::openssl-1.1.1k-h27cfd23_0 --> conda-forge::openssl-1.1.1k-h7f98852_0


Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Note: you may need to restart the kernel to use updated packages.
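
After restarting the kernel, a quick check can confirm that the TempestRemap executables are visible from Python. The sketch below is not part of DLWP: `missing_tempest_tools` is a hypothetical helper, and the executable names are the ones that appear in the command logs further down in this tutorial.

```python
import shutil

# Executable names taken from the TempestRemap commands logged in this tutorial.
TEMPEST_TOOLS = [
    "GenerateRLLMesh",
    "GenerateCSMesh",
    "GenerateOverlapMesh",
    "GenerateOfflineMap",
    "ApplyOfflineMap",
]


def missing_tempest_tools(tools=TEMPEST_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tempest_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All TempestRemap executables found.")
```

If any names are reported as missing, the conda environment that holds `tempest-remap` is probably not the one backing the running kernel.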


Let's use the DLWP CubeSphereRemap class on the data we processed earlier.

In [7]:
import os
os.chdir(os.pardir)
from DLWP.remap import CubeSphereRemap

data_directory = '/usr/local/google/ilopezgp/ERA5_data_dlwp'
processed_file = '%s/tutorial_z500.nc' % data_directory
remapped_file = '%s/tutorial_z500_CS.nc' % data_directory

csr = CubeSphereRemap()

Generate the offline maps. Since we used 2-degree data, we have 91 latitude points and 180 longitude points. We are mapping to a cubed sphere with 48 points along each side of every cube face. Since data from CDS comes with monotonically decreasing latitudes, we specify the `inverse_lat` option. Newer versions of TempestRemap can read the coordinates from a netCDF file with any latitude/longitude coordinate names, but you'll need to install TempestRemap from source to use the `generate_offline_maps_from_file` method.

In [3]:
csr.generate_offline_maps(lat=91, lon=180, res=48, inverse_lat=True, remove_meshes=False)

Self.map is at path:  ./map_LL91x180_CS48.nc
CubeSphereRemap: generating offline forward map...
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/GenerateRLLMesh --lat 91 --lon 180 --file outLL.g --lat_begin 90 --lat_end -90 --out_format Netcdf4
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/GenerateCSMesh --res 48 --file outCS.g --out_format Netcdf4
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/GenerateOverlapMesh --a outLL.g --b outCS.g --out ov_LL_CS.g --out_format Netcdf4
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/GenerateOfflineMap --in_mesh outLL.g --out_mesh outCS.g --ov_mesh ov_LL_CS.g --in_np 1 --in_type FV --out_type FV --out_map ./map_LL91x180_CS48.nc --out_format Netcdf4
CubeSphereRemap: generating offline inverse map...
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/GenerateOverlapMesh --a outCS.g --b outLL.g --out ov_CS_LL.g --out_format Netcdf4
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/GenerateOfflineMap 
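
The grid sizes passed to `generate_offline_maps` follow from simple arithmetic, sketched here. The helper names are hypothetical and not part of DLWP, and the map file name pattern is inferred from the single path printed in the log above, so treat it as an assumption.

```python
def latlon_grid_size(resolution_deg):
    """Points on a regular lat/lon grid that includes both poles."""
    n_lat = int(round(180 / resolution_deg)) + 1  # -90 to 90 inclusive
    n_lon = int(round(360 / resolution_deg))      # 0 to 360, endpoint excluded
    return n_lat, n_lon


def cubed_sphere_cells(res):
    """Total cells on a cubed sphere with res x res cells per face."""
    return 6 * res * res


n_lat, n_lon = latlon_grid_size(2.0)
res = 48
# Pattern assumed from the logged path ./map_LL91x180_CS48.nc
map_name = "map_LL%dx%d_CS%d.nc" % (n_lat, n_lon, res)

print(n_lat, n_lon)             # 91 180
print(cubed_sphere_cells(res))  # 13824
print(map_name)                 # map_LL91x180_CS48.nc
```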

Apply the forward map, saving to a temporary file. We tell it to operate on the variable `predictors`, which is the only variable in the processed data. TempestRemap is very finicky about metadata in netCDF files, sometimes failing with segmentation faults for no apparent reason. I've found that the most common cause of crashes is the string coordinate values in the `'varlev'` coordinate. If you used the command in the previous tutorial to produce an extra "nocoord" version of this file, you might *have to* use it here.

In [4]:
csr.remap(processed_file + '.nocoord', '%s/temp.nc' % data_directory, '--var', 'predictors')

CubeSphereRemap: applying forward map...
/usr/local/google/home/ilopezgp/anaconda3/envs/dlwp2/bin/ApplyOfflineMap --in_data /usr/local/google/ilopezgp/ERA5_data_dlwp/tutorial_z500.nc.nocoord --out_data /usr/local/google/ilopezgp/ERA5_data_dlwp/temp.nc --map ./map_LL91x180_CS48.nc --var predictors
CubeSphereRemap: successfully remapped data into /usr/local/google/ilopezgp/ERA5_data_dlwp/temp.nc
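
The logged command shows what `remap` ends up running. As a rough sketch, the same argument list can be assembled by hand, which can help when re-running TempestRemap in a terminal to debug a crash. `build_apply_map_command` is a hypothetical helper, not part of DLWP; its flags are copied from the log above, it assumes `ApplyOfflineMap` is on the PATH, and the data paths in the usage lines are placeholders.

```python
import shlex


def build_apply_map_command(in_data, out_data, map_file, variable,
                            executable="ApplyOfflineMap"):
    """Assemble the ApplyOfflineMap argument list seen in the log."""
    return [
        executable,
        "--in_data", in_data,
        "--out_data", out_data,
        "--map", map_file,
        "--var", variable,
    ]


cmd = build_apply_map_command(
    "tutorial_z500.nc.nocoord",  # placeholder input path
    "temp.nc",                   # placeholder output path
    "./map_LL91x180_CS48.nc",
    "predictors",
)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```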


By default, TempestRemap writes its output with a 1-dimensional spatial coordinate (in this case with dimension 13824 = 48 x 48 x 6). We convert the file to 3-dimensional faces (face, height, width) = (6, 48, 48). A few other points here:  
- Even if TempestRemap does not crash, it will probably delete the string coordinates, and sometimes the sample time coordinate as well, so it's a good idea to pass the original processed file as `coord_file` so the coordinates can be taken from it.  
- We also take advantage of the `chunking` parameter to save data with ideal chunking when using the file for training and evaluating models.

In [5]:
csr.convert_to_faces('%s/temp.nc' % data_directory, 
                     remapped_file,
                     coord_file=processed_file,
                     chunking={'sample': 1, 'varlev': 1})

CubeSphereRemap.convert_to_faces: loading data to memory...
CubeSphereRemap.convert_to_faces: assigning new coordinates to dataset
CubeSphereRemap.convert_to_faces: exporting data to file /usr/local/google/ilopezgp/ERA5_data_dlwp/tutorial_z500_CS.nc...
CubeSphereRemap.convert_to_faces: successfully exported reformatted data


| | Array | Chunk |
|---|---|---|
| Bytes | 108.00 kiB | 108.00 kiB |
| Shape | (6, 48, 48) | (6, 48, 48) |
| Count | 1 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |

| | Array | Chunk |
|---|---|---|
| Bytes | 0.90 GiB | 54.00 kiB |
| Shape | (17528, 1, 6, 48, 48) | (1, 1, 6, 48, 48) |
| Count | 17528 Tasks | 17528 Chunks |
| Type | float32 | numpy.ndarray |
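
To make the reshaping and the chunk sizes concrete, here is a small index-arithmetic sketch. It assumes the flat TempestRemap index is ordered face by face, then row by row within a face; that ordering is an assumption made for illustration, not something this tutorial states. `flat_to_face_index` is a hypothetical helper.

```python
RES = 48                      # cells along each cube face edge
CELLS_PER_FACE = RES * RES    # 2304
N_CELLS = 6 * CELLS_PER_FACE  # 13824


def flat_to_face_index(i, res=RES):
    """Map a flat cell index to (face, height, width).

    Assumes face-major, then row-major ordering (an assumption).
    """
    face, within_face = divmod(i, res * res)
    height, width = divmod(within_face, res)
    return face, height, width


print(flat_to_face_index(0))            # (0, 0, 0)
print(flat_to_face_index(N_CELLS - 1))  # (5, 47, 47)

# One chunk with {'sample': 1, 'varlev': 1} holds a full (6, 48, 48) field.
# At 4 bytes per float32 value that is 54 kiB, matching the summary above.
chunk_kib = N_CELLS * 4 / 1024
print(chunk_kib)  # 54.0
```

With one sample and one variable per chunk, reading a single training example touches exactly one chunk on disk.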


In [6]:
import os
# Delete the temporary 1-D remapped file; the faces file is what we keep.
os.remove('%s/temp.nc' % data_directory)