# Converting `.rds` files to `.npy` files

To be compatible with PyTorch, the `.rds` files available at [https://doi.org/10.25919/skw8-yx65](https://doi.org/10.25919/skw8-yx65) must be converted to `.npy` files. This notebook performs that conversion.

In [23]:
import os
import glob
import numpy as np

In [2]:
import rpy2.robjects as robjects
readRDS = robjects.r['readRDS']

*A note about packages:* `rpy2` calls into the R runtime, so `R` must be installed for the import above to work.
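
Because `rpy2` has to locate an R installation, a quick pre-flight check can help explain an import failure. The sketch below is not part of the original notebook and assumes a typical setup: it looks at the `R_HOME` environment variable and for an `R` executable on the `PATH`. The helper name `r_installation_hint` is hypothetical.

```python
import os
import shutil


def r_installation_hint(env=None, which=shutil.which):
    """Return a short description of where R appears to be installed, or None.

    Hypothetical helper: it only inspects R_HOME and the PATH, so a None
    result suggests (but does not prove) that R is missing.
    """
    env = os.environ if env is None else env
    r_home = env.get("R_HOME")
    if r_home:
        return "R_HOME=" + r_home
    r_exe = which("R")
    if r_exe:
        return "R executable at " + r_exe
    return None


hint = r_installation_hint()
print(hint if hint else "No R installation found via R_HOME or PATH")
```

The `env` and `which` parameters exist only so the check can be exercised without a real R installation.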

## Converting Data

In [3]:
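R_data_dir = 'RDSdata'
NPY_data_dir = 'NPYdata'
subdirs = ['train', 'tune', 'test']
# List the files in the first subdirectory; the others hold the same parameters
files = glob.glob(os.path.join('.', R_data_dir, subdirs[0], '*'))
# Drop the directory and extension, e.g. 'PET_scaled_train.rds' -> 'PET_scaled_train'
files = [os.path.basename(f).split('.')[0] for f in files]
# Keep the first two tokens to drop the subset suffix, e.g. 'PET_scaled'
files = ['_'.join(f.split('_')[:2]) for f in files]
files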

['PET_scaled',
 'precip_scaled',
 'GW_depth',
 'year_scaled',
 'DEM_scaled',
 'coordinates_scaled']
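
The listing above keeps the first two underscore-separated tokens of each file name, which works because every parameter name here has exactly two tokens. As a hedged alternative, the sketch below strips the known `_<subdir>` suffix instead, so it would also cope with parameter names of other lengths. The function name `parameter_name` is an illustrative assumption.

```python
import os


def parameter_name(filename, subdir):
    """Strip the extension and the trailing '_<subdir>' from a file name."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    suffix = "_" + subdir
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    return stem


names = [
    parameter_name(f, "train")
    for f in ["RDSdata/train/PET_scaled_train.rds", "RDSdata/train/GW_depth_train.rds"]
]
print(names)  # ['PET_scaled', 'GW_depth']
```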

In [4]:
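for subdir in subdirs:
    # Make sure the output directory exists; np.save does not create it
    os.makedirs(os.path.join(NPY_data_dir, subdir), exist_ok=True)
    for file in files:
        r_file = file + '_' + subdir + '.rds'
        np_file = file + '_' + subdir + '.npy'
        path = os.path.join(R_data_dir, subdir, r_file)
        print('Attempting', path)
        data = readRDS(path)  # R object loaded through rpy2
        nparr = np.array(data)
        # allow_pickle=False makes np.save raise on object arrays instead of pickling them
        np.save(os.path.join(NPY_data_dir, subdir, np_file), nparr, allow_pickle=False)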

print('Completed')

Attempting RDSdata/train/PET_scaled_train.rds
Attempting RDSdata/train/precip_scaled_train.rds
Attempting RDSdata/train/GW_depth_train.rds
Attempting RDSdata/train/year_scaled_train.rds
Attempting RDSdata/train/DEM_scaled_train.rds
Attempting RDSdata/train/coordinates_scaled_train.rds
Attempting RDSdata/tune/PET_scaled_tune.rds
Attempting RDSdata/tune/precip_scaled_tune.rds
Attempting RDSdata/tune/GW_depth_tune.rds
Attempting RDSdata/tune/year_scaled_tune.rds
Attempting RDSdata/tune/DEM_scaled_tune.rds
Attempting RDSdata/tune/coordinates_scaled_tune.rds
Attempting RDSdata/test/PET_scaled_test.rds
Attempting RDSdata/test/precip_scaled_test.rds
Attempting RDSdata/test/GW_depth_test.rds
Attempting RDSdata/test/year_scaled_test.rds
Attempting RDSdata/test/DEM_scaled_test.rds
Attempting RDSdata/test/coordinates_scaled_test.rds
Completed
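
To illustrate what the conversion cell writes, here is a sketch of the save and load round trip on a small placeholder array. It does not touch the real data: the array contents, the temporary directory, and the file name `example_train.npy` are made up for the example.

```python
import os
import tempfile

import numpy as np

# Placeholder standing in for np.array(readRDS(path)); values are made up
nparr = np.arange(6, dtype=np.float64).reshape(3, 2)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "NPYdata", "train")
    os.makedirs(out_dir, exist_ok=True)  # np.save does not create directories
    out_path = os.path.join(out_dir, "example_train.npy")
    np.save(out_path, nparr, allow_pickle=False)
    loaded = np.load(out_path, allow_pickle=False)

print(loaded.shape, loaded.dtype)  # (3, 2) float64
print(np.array_equal(nparr, loaded))  # True
```

An array like `loaded` can then be handed to PyTorch, for example with `torch.from_numpy`.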


## Checking that the `.npy` files have the same shapes as the `.rds` files

In [15]:
RDS_path = os.path.join('..', '..', 'RDSdata', 'train')
NPY_path = os.path.join('..', '..', 'NPYdata', 'train')

### GW Depth

*Each entry contains the observed groundwater depth (in metres) at a well at a particular point in time.  This is a numeric vector object type, with length equal to the number of observations in the training set.*

In [22]:
parameter = 'GW_depth'
file = parameter + '_train'

rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
print('RDS File Shape:', np.shape(rds_file))
npy_file = np.load(os.path.join(NPY_path, file)+'.npy')
print('NPY File Shape:', npy_file.shape)

RDS File Shape: (310143,)
NPY File Shape: (310143,)


### Coordinates Scaled

*This is a numeric matrix where each row corresponds to a groundwater observation and the columns are the normalised (using the range of values in the training set) latitude and longitude.*

In [21]:
parameter = 'coordinates_scaled'
file = parameter + '_train'

rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
print('RDS File Shape:', np.shape(rds_file))
npy_file = np.load(os.path.join(NPY_path, file)+'.npy')
print('NPY File Shape:', npy_file.shape)

RDS File Shape: (310143, 2)
NPY File Shape: (310143, 2)


### Year Scaled

*This is a numeric matrix with a single column.  Each row corresponds to a groundwater observation and the single column contains the normalised calendar year (using the range of values in the training set).*

In [24]:
parameter = 'year_scaled'
file = parameter + '_train'

rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
print('RDS File Shape:', np.shape(rds_file))
npy_file = np.load(os.path.join(NPY_path, file)+'.npy')
print('NPY File Shape:', npy_file.shape)

RDS File Shape: (310143, 1)
NPY File Shape: (310143, 1)


### DEM Scaled

*This is an R array object with four dimensions (number of observations, 1, 9, 9).  The values in the array are normalised elevations for a 9 x 9 patch of pixels, centred around the groundwater observation.  Each pixel is 1500m x 1500m.  We can basically think of this as a tensor containing 9x9 pixel images with only 1 channel.*

In [25]:
parameter = 'DEM_scaled'
file = parameter + '_train'

rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
print('RDS File Shape:', np.shape(rds_file))
npy_file = np.load(os.path.join(NPY_path, file)+'.npy')
print('NPY File Shape:', npy_file.shape)

RDS File Shape: (310143, 1, 9, 9)
NPY File Shape: (310143, 1, 9, 9)
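
To make the "images with one channel" reading concrete, the sketch below builds a small placeholder array with the same axis order (observations, channels, rows, columns) and picks out one patch. The sizes and values are made up; only the axis order follows the description above.

```python
import numpy as np

# Placeholder with 5 observations instead of 310143; values are made up
dem = np.zeros((5, 1, 9, 9), dtype=np.float32)
dem[2, 0, 4, 4] = 1.0  # mark the centre pixel of the third observation

patch = dem[2, 0]     # 9 x 9 elevation patch for one observation
centre = patch[4, 4]  # centre pixel of that patch

print(dem.shape)    # (5, 1, 9, 9)
print(patch.shape)  # (9, 9)
print(centre)       # 1.0
```

This axis order corresponds to the (N, C, H, W) layout that PyTorch's 2D convolution layers take as input.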


### Precip Scaled

*This is an R array object with four dimensions (number of observations, 12, 9, 9).  The values in the array are normalised precipitation for a 9 x 9 patch of pixels over the preceding 12 months, and centred around the groundwater observation.  Each pixel is 1500m x 1500m.  We can basically think of this as a tensor containing 9x9 pixel images with 12 channels.*

In [26]:
parameter = 'precip_scaled'
file = parameter + '_train'

rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
print('RDS File Shape:', np.shape(rds_file))
npy_file = np.load(os.path.join(NPY_path, file)+'.npy')
print('NPY File Shape:', npy_file.shape)

RDS File Shape: (310143, 12, 9, 9)
NPY File Shape: (310143, 12, 9, 9)


### PET Scaled

*This is an R array object with four dimensions (number of observations, 12, 9, 9).  The values in the array are normalised potential evapo-transpiration for a 9 x 9 patch of pixels over the preceding 12 months, and centred around the groundwater observation.  Each pixel is 1500m x 1500m.  We can basically think of this as a tensor containing 9x9 pixel images with 12 channels.*

In [27]:
parameter = 'PET_scaled'
file = parameter + '_train'

rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
print('RDS File Shape:', np.shape(rds_file))
npy_file = np.load(os.path.join(NPY_path, file)+'.npy')
print('NPY File Shape:', npy_file.shape)

RDS File Shape: (310143, 12, 9, 9)
NPY File Shape: (310143, 12, 9, 9)
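
The cells above repeat the same comparison for each parameter. One possible way to condense them is sketched below. It also compares values, since matching shapes alone would not reveal an array whose elements were reordered during conversion. The helper `compare_arrays` is hypothetical, and the demonstration uses small made-up arrays in place of the `readRDS` result and the loaded `.npy` file.

```python
import numpy as np


def compare_arrays(original, converted):
    """Return (same_shape, same_values) for two array-likes."""
    a = np.asarray(original)
    b = np.asarray(converted)
    same_shape = a.shape == b.shape
    # Values are only compared when the shapes agree
    same_values = same_shape and bool(np.array_equal(a, b))
    return same_shape, same_values


# Made-up data standing in for an RDS object and its .npy counterpart
original = np.arange(24).reshape((2, 3, 4))
good_copy = original.copy()
# Same shape, but the elements were filled in column-major (Fortran) order
reordered = np.arange(24).reshape((2, 3, 4), order="F")

print(compare_arrays(original, good_copy))  # (True, True)
print(compare_arrays(original, reordered))  # (True, False)
```

In the notebook, `original` would be the object returned by `readRDS(...)` and `converted` the array returned by `np.load(...)`.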
