# Generate Synthetic Data

This notebook generates synthetic data with the same array shapes as the groundwater data from [https://doi.org/10.25919/skw8-yx65](https://doi.org/10.25919/skw8-yx65). The values are uniform random numbers; only the dimensions match the original dataset.

In [1]:
import os
import glob
import numpy as np

In [2]:
import rpy2.robjects as robjects
readRDS = robjects.r['readRDS']

In [3]:
parameter_list = ['GW_depth', 'coordinates_scaled', 'year_scaled', 'DEM_scaled', 'precip_scaled', 'PET_scaled']
sizes_dict = dict.fromkeys(parameter_list)

**WARNING**: Do not re-run the following cell unless you have access to the original groundwater dataset.

In [4]:
RDS_path = os.path.join('..', '..', 'RDSdata', 'train')
for param in parameter_list:
    file = param + '_train'
    rds_file = readRDS(os.path.join(RDS_path, file)+'.rds')
    sizes_dict[param] = np.shape(rds_file)

In [5]:
sizes_dict

{'GW_depth': (310143,),
 'coordinates_scaled': (310143, 2),
 'year_scaled': (310143, 1),
 'DEM_scaled': (310143, 1, 9, 9),
 'precip_scaled': (310143, 12, 9, 9),
 'PET_scaled': (310143, 12, 9, 9)}
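Readers without access to the original dataset could still reproduce the later cells by hard-coding the shapes printed above. The sketch below is one way to do that; the names `recorded_sizes` and `synthetic_shape` are hypothetical and not part of the original notebook.

```python
# Shapes copied from the printed output above. Only the trailing
# dimensions are needed to generate synthetic data.
recorded_sizes = {
    'GW_depth': (310143,),
    'coordinates_scaled': (310143, 2),
    'year_scaled': (310143, 1),
    'DEM_scaled': (310143, 1, 9, 9),
    'precip_scaled': (310143, 12, 9, 9),
    'PET_scaled': (310143, 12, 9, 9),
}


def synthetic_shape(param, n_samples, sizes=recorded_sizes):
    """Hypothetical helper: swap the sample count, keep the other dimensions."""
    return (n_samples,) + tuple(sizes[param][1:])


print(synthetic_shape('DEM_scaled', 50))
print(synthetic_shape('GW_depth', 100))
```

Assigning `sizes_dict = recorded_sizes` would then stand in for the RDS-reading cell, under the assumption that the recorded shapes are still current.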

In [6]:
data_dir = os.path.join('..', 'synthetic_data')
subdirs = ['train', 'tune', 'test']

The following cell sets how many synthetic data points to generate for each split (train, tune, test).

In [7]:
n_data = {'train': 100,
          'tune': 50,
          'test': 50}

In [8]:
def flatten_tuple(data):
    if isinstance(data, tuple):
        if len(data) == 0:
            return ()
        else:
            return flatten_tuple(data[0]) + flatten_tuple(data[1:])
    else:
        return (data,)
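`flatten_tuple` is used to merge a sample count with the trailing dimensions of an array shape into one flat tuple. A small illustration follows, with the function repeated so the snippet runs on its own; the example inputs are chosen here for demonstration.

```python
def flatten_tuple(data):
    """Recursively flatten nested tuples into one flat tuple."""
    if isinstance(data, tuple):
        if len(data) == 0:
            return ()
        return flatten_tuple(data[0]) + flatten_tuple(data[1:])
    return (data,)


# A sample count paired with the trailing dimensions of precip_scaled.
flat = flatten_tuple((100, (12, 9, 9)))
print(flat)  # (100, 12, 9, 9)

# A one-dimensional array such as GW_depth has no trailing dimensions.
flat_1d = flatten_tuple((100, ()))
print(flat_1d)  # (100,)
```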

In [9]:
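for param in parameter_list:
    print('Generating synthetic', param, 'data ...')
    for sub in subdirs:
        # Create the output directory if needed so np.save does not fail.
        os.makedirs(os.path.join(data_dir, sub), exist_ok=True)
        file_name = 'SYNTHETIC_'+param+'_'+sub+'.npy'
        file_path = os.path.join(data_dir, sub, file_name)
        # Replace the sample count; keep the trailing dimensions of the real data.
        size = n_data[sub], sizes_dict[param][1:]
        size = flatten_tuple(size)
        # Uniform random values in [0, 1).
        data = np.random.rand(*size)
        np.save(file_path, data, allow_pickle=False)
        print('Saved', param, sub, 'data.')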

Generating synthetic GW_depth data ...
Saved GW_depth train data.
Saved GW_depth tune data.
Saved GW_depth test data.
Generating synthetic coordinates_scaled data ...
Saved coordinates_scaled train data.
Saved coordinates_scaled tune data.
Saved coordinates_scaled test data.
Generating synthetic year_scaled data ...
Saved year_scaled train data.
Saved year_scaled tune data.
Saved year_scaled test data.
Generating synthetic DEM_scaled data ...
Saved DEM_scaled train data.
Saved DEM_scaled tune data.
Saved DEM_scaled test data.
Generating synthetic precip_scaled data ...
Saved precip_scaled train data.
Saved precip_scaled tune data.
Saved precip_scaled test data.
Generating synthetic PET_scaled data ...
Saved PET_scaled train data.
Saved PET_scaled tune data.
Saved PET_scaled test data.
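After generation, it may be worth confirming that a saved file has the expected shape and value range. The sketch below uses a hypothetical helper, `check_synthetic_file`, and demonstrates it on a temporary file rather than on the notebook's own output directory.

```python
import os
import tempfile

import numpy as np


def check_synthetic_file(file_path, expected_shape):
    """Hypothetical helper: load a .npy file and check shape and value range."""
    data = np.load(file_path, allow_pickle=False)
    shape_ok = data.shape == tuple(expected_shape)
    # np.random.rand draws from the half-open interval [0, 1).
    range_ok = bool(np.all((data >= 0.0) & (data < 1.0)))
    return shape_ok and range_ok


# Demonstration on a temporary file with the DEM_scaled tune shape.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'SYNTHETIC_DEM_scaled_tune.npy')
    np.save(demo_path, np.random.rand(50, 1, 9, 9), allow_pickle=False)
    matches = check_synthetic_file(demo_path, (50, 1, 9, 9))
    mismatch = check_synthetic_file(demo_path, (100, 1, 9, 9))

print(matches, mismatch)
```

The same helper could be pointed at the files under `synthetic_data`, passing the expected shape for each parameter and split.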
