# Test-time dropout script

The goal is to develop a function with a command line interface that takes a trained model with dropout and returns an ensemble prediction, so I imagine something like:

```
python create_dropout_ensemble.py --exp_id 44-resnet_deeper2 --members 100 ...
```

The script should return and save an xarray dataset just like `create_prediction` but with an added dimension `ens_member`.

You basically already did the work in the starter exercise I gave you. You can also check out my solution. Now it's just a matter of creating a convenient script. For examples of command line scripts I wrote, check out `src/extract_level.py` using `argparse` or `scripts/download_tigge.py` using Google's `fire`. Also, see whether your or my method of implementing the test-time dropout is more convenient. Whatever requires fewer changes to the rest of the code (probably yours).
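As a starting point, the command line interface could be sketched with `argparse` as follows. Only `--exp_id` and `--members` come from the example call above; the `--test_years` option and the function names `build_parser` and `main` are hypothetical, and the actual ensemble creation is left as a placeholder.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical create_dropout_ensemble.py CLI."""
    parser = argparse.ArgumentParser(
        description="Create a test-time dropout ensemble prediction."
    )
    # --exp_id and --members are taken from the example call in this notebook.
    parser.add_argument("--exp_id", required=True,
                        help="Experiment ID, e.g. 44-resnet_deeper2")
    parser.add_argument("--members", type=int, default=100,
                        help="Number of ensemble members")
    # Hypothetical extra option; the real script may expose different ones.
    parser.add_argument("--test_years", nargs=2, default=["2017", "2018"],
                        help="First and last test year")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: the actual ensemble creation would be called here.
    print(f"exp_id={args.exp_id}, members={args.members}, "
          f"test_years={args.test_years}")
    return args


# Example call with an explicit argument list instead of sys.argv.
args = main(["--exp_id", "44-resnet_deeper2", "--members", "5"])
```

With `fire`, the same interface would probably reduce to passing the main function to `Fire`, at the cost of less explicit help texts.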

As mentioned in the WeatherBench paper, testing is done using the years 2017 and 2018. This means the ensemble predictions also have to be created for these two years. The data can be downloaded here: https://mediatum.ub.tum.de/1524895. However, the files, which contain all years, are quite large, so you probably don't want to download them to your laptop. I uploaded just the last two years for each variable here: To come...

Next, you need a trained model. I number my experiments (see Dropbox document). You can find two different models in the link above. 

As mentioned in the Dropbox document, I would suggest developing the main function in the notebook. Once that works, you can create a CLI around it and save the script. 

Also, let's use `tensorflow>=2.0`.

#This notebook is just for testing. Script saved as create_dropout_ensemble.py

ToDo:
- make it work for all networks. (Differences: custom_objects, which can be handled with an if condition around load_model(); output_vars; test_years; lead_time?; anything else?)
- load full data instead of batches; output for the full size of X.
- pass optional arguments such as is_normalized, start_date, end_date, test_years
- what to do if output vars are different? The numpy --> xarray conversion won't work
- solve eager_execution problem

In [None]:
# Here is a useful tip: Using autoreload allows you to make changes to an imported module
# which are then automatically updated in this notebook. This is how I start all my notebooks.
%load_ext autoreload
# Mode 2 reloads all imported modules before each cell is executed
%autoreload 2

In [2]:
import os  # needed below for os.environ
from fire import Fire
import xarray as xr
import numpy as np
from src.data_generator import *
from src.train import *
from src.networks import *
from src.utils import *
from tensorflow.keras import backend as K

In [3]:
# You only need this if you are using a GPU
os.environ["CUDA_VISIBLE_DEVICES"]=str(0)
limit_mem()

In [4]:
#Final Working Script
# exp_id_path='/home/garg/WeatherBench/nn_configs/B/63-resnet_d3_best.yml'
# model_save_dir='/home/garg/data/WeatherBench/predictions/saved_models'
# datadir='/home/garg/data/WeatherBench/5.625deg'
# pred_save_dir='/home/garg/data/WeatherBench/predictions'

# !python create_dropout_ensemble.py 5 {exp_id_path} {datadir} {model_save_dir} {pred_save_dir}

#Everything from below is just for practice. CAN IGNORE!

In [5]:
#use conda-forge
#!conda uninstall tensorflow --y
#!conda install -c conda-forge tensorflow-gpu=2.0.0
#check CUDA compatibility: https://www.tensorflow.org/install/source#tested_build_configurations

In [6]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 3874071107582455291
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 2560370265758005249
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10886145639
locality {
  bus_id: 1
  links {
  }
}
incarnation: 1866733689567031590
physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 8364414005821887952
physical_device_desc: "device: XLA_GPU device"
]


In [7]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Num GPUs Available:  1


In [8]:
import sys
print(sys.version) #python version.

3.7.7 (default, Mar 26 2020, 15:48:22) 
[GCC 7.3.0]


In [9]:
tf.compat.v1.disable_eager_execution() #needed 
tf.__version__

'2.0.0'

In [10]:
# tf.debugging.set_log_device_placement(True)

In [11]:
exp_id_path='/home/garg/WeatherBench/nn_configs/B/82-resnet_d3_dr_0.2.yml'
!ls {exp_id_path}

/home/garg/WeatherBench/nn_configs/B/82-resnet_d3_dr_0.2.yml


In [12]:
# exp_id_path='../nn_configs/B/53-unet_google_dropout_0.2_no_ss.yml'
# !ls {exp_id_path}

In [13]:
# exp_id_path='../nn_configs/B/28-unet_medium_bn_dropout_0.2.yml'
# !ls {exp_id_path}

In [14]:
args=load_args(exp_id_path)
exp_id=args['exp_id']
var_dict=args['var_dict']
batch_size=args['batch_size']
output_vars=args['output_vars']

#Question: how to optionally input data_subsample, norm_subsample, nt_in, dt_in, test_years?
data_subsample=args['data_subsample']
norm_subsample=args['norm_subsample']
nt_in=args['nt_in']
#nt_in=args['nt']
dt_in=args['dt_in']
test_years=args['test_years']
lead_time=args['lead_time']
#changing paths
model_save_dir='/home/garg/data/WeatherBench/predictions/saved_models'
datadir='/home/garg/data/WeatherBench/5.625deg'

In [15]:
nt_in  # Question: what is the difference between nt and nt_in?
# --> A: nt = number of time steps corresponding to the forecast lead time
# nt_in = number of time steps in the input

3

In [16]:
ds = xr.merge([xr.open_mfdataset(f'{datadir}/{var}/*.nc', combine='by_coords') for var in var_dict.keys()])
mean = xr.open_dataarray(f'{model_save_dir}/{exp_id}_mean.nc') 
std = xr.open_dataarray(f'{model_save_dir}/{exp_id}_std.nc')

In [17]:
data_subsample

2

In [18]:
#Question: should shuffle be False, since this is testing?  --> Correct
#Question: Should we input data_subsample, norm_subsample, nt_in, dt_in?
#For instance, dt_in is not always provided in the config file. Is 'nt_in' sometimes called 'nt'?

# nt_in is needed. 
# data_subsample is 1 by default for the test generator but really we don't need every hour.
# So let's set it to 6, that should help with the time it takes to create the ensemble
# predictions for every time step. norm_subsample doesn't matter since we pass an external mean/std file

ds_test= ds.sel(time=slice(test_years[0],test_years[-1]))
dg_test = DataGenerator(ds_test, var_dict, lead_time, batch_size=batch_size, shuffle=False, load=True,
                 mean=mean, std=std, output_vars=output_vars, nt_in=nt_in, dt_in=dt_in, data_subsample = 6) 
# dg_test = DataGenerator(
#     ds_test, var_dict, lead_time, batch_size=batch_size, mean=mean, std=std,
#     shuffle=False, output_vars=output_vars)

DG start 20:36:39.175644
DG normalize 20:36:39.190544
DG load 20:36:39.196613
Loading data into RAM
DG done 20:36:41.768108


In [19]:
#NOT a good idea to load the whole dataset at once. Rather loop over it: load a batch, make a prediction, and so on.

# X,y=dg_test[0]
# for i in range(1, len(dg_test)):
#     X_batch,y_batch=dg_test[i]
#     X=np.append(X,X_batch,axis=0)
#     y=np.append(y,y_batch,axis=0)

In [20]:
#X.shape

In [21]:
# print(X.shape, y.shape) #should not be different if loading full data!
print(dg_test.n_samples, dg_test.batch_size, dg_test.n_samples/dg_test.batch_size)

X,y=dg_test[272]
print(X.shape)
print(272*32+18)
print(len(dg_test.data.time)) 
#QUESTION: why not same? X is 8722. n_samples is 8724, dg_test.data.time is 8760??

2908 32 90.875
(0, 32, 64, 114)
8722
2920


In [22]:
# Number of time steps in the data set
dg_test.data.time.shape

(2920,)

In [23]:
# Number of time steps to forecast
dg_test.nt

12

In [24]:
# Number of samples (because we need a y for every x)
# But this isn't the actual number of samples (yeah, legacy code...)
dg_test.n_samples, dg_test.data.time.shape[0] - dg_test.nt

(2908, 2908)

In [25]:
# For the actual number of samples you also have to subtract the number of input time steps (-1) = nt_offset
# Yeah, this could probably be cleaned up.
len(dg_test.idxs), dg_test.data.time.shape[0] - dg_test.nt - dg_test.nt_offset

(2906, 2906)
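The bookkeeping in the last few cells can be summarised in a small helper. This is only a sketch: `count_samples` is a hypothetical function, not part of `DataGenerator`, and it takes `nt_offset` as an argument rather than deriving it. The numbers are the ones printed in this notebook.

```python
import math


def count_samples(n_time, nt, nt_offset, batch_size):
    """Return (n_usable, n_batches, last_batch_size) for a test generator."""
    # Every sample needs nt_offset earlier steps as input and a target nt steps ahead.
    n_usable = n_time - nt - nt_offset
    n_batches = math.ceil(n_usable / batch_size)
    last_batch_size = n_usable - (n_batches - 1) * batch_size
    return n_usable, n_batches, last_batch_size


print(count_samples(n_time=2920, nt=12, nt_offset=2, batch_size=32))
# (2906, 91, 26)
```

This is consistent with `len(dg_test)` being 91 and the last batch holding 26 samples.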

In [26]:
# dg_test.data.time.isel(time=slice(None,X.shape[0])) #would work for any size of x

In [27]:
#ToDo: add other loss functions to custom_objects. It doesn't matter if a loss is not used in the model itself; it is only there so that load_model() doesn't break.
#Since we don't build the network again, we don't need to pass model params like kernel, filters, activation, dropout, loss and other details to the network?
saved_model_path=f'{model_save_dir}/{exp_id}.h5'
substr=['resnet','unet_google','unet']
assert any(x in exp_id for x in substr)

model=tf.keras.models.load_model(saved_model_path,
                                 custom_objects={'PeriodicConv2D':PeriodicConv2D,'lat_mse': tf.keras.losses.mse})

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
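For the ToDo about making this work for all networks, the substring check could be turned into a helper that returns the network type, which an `if` condition around `load_model()` could then use to choose `custom_objects`. The name `detect_network_type` is hypothetical; the experiment IDs in the usage lines are taken from the config paths in this notebook.

```python
# Order matters: 'unet_google' must be checked before 'unet',
# because 'unet' is a substring of 'unet_google'.
KNOWN_NETWORKS = ("resnet", "unet_google", "unet")


def detect_network_type(exp_id):
    """Return the first known network name contained in exp_id."""
    for name in KNOWN_NETWORKS:
        if name in exp_id:
            return name
    raise ValueError(f"Unknown network type in exp_id: {exp_id!r}")


print(detect_network_type("82-resnet_d3_dr_0.2"))               # resnet
print(detect_network_type("53-unet_google_dropout_0.2_no_ss"))  # unet_google
print(detect_network_type("28-unet_medium_bn_dropout_0.2"))     # unet
```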


In [28]:
# model.summary() #confirm if input layer has same length as input_vars, i.e, X[...,:]

In [29]:
print(len(dg_test))
X,y=dg_test[len(dg_test)-1]
X.shape, y.shape

91


((26, 32, 64, 114), (26, 32, 64, 2))

In [30]:
X, y = dg_test[0]

In [31]:
import tqdm

In [32]:
%%time
p = model.predict(dg_test, verbose=1)

CPU times: user 20.3 s, sys: 2 s, total: 22.3 s
Wall time: 14.7 s


In [34]:
func = K.function(model.inputs + [K.learning_phase()], model.outputs)

In [35]:
preds = []
for X, y in tqdm.tqdm(dg_test): 
    preds.append(np.asarray(func([X] + [1.]), dtype=np.float32).squeeze())

100%|██████████| 91/91 [01:17<00:00,  1.17it/s]


In [36]:
# So unfortunately this is much slower. Not entirely sure why but I think this means that we do not want to use K.function after all. 
# Below is a workaround that allows us to load the model and then change the training attribute afterwards. 
# Super ugly but I really can't think of a better way.

In [37]:
model=tf.keras.models.load_model(saved_model_path,
                                 custom_objects={'PeriodicConv2D':PeriodicConv2D,'lat_mse': tf.keras.losses.mse})

In [38]:
[model.predict(X[:1])[0, 0, 0,0] for _ in range(3)]   # Always the same output, no test time dropout

[-1.1247067, -1.1247067, -1.1247067]

In [39]:
c = model.get_config()

In [40]:
c

{'name': 'model',
 'layers': [{'class_name': 'InputLayer',
   'config': {'batch_input_shape': (None, 32, 64, 114),
    'dtype': 'float32',
    'sparse': False,
    'name': 'input_1'},
   'name': 'input_1',
   'inbound_nodes': []},
  {'class_name': 'PeriodicConv2D',
   'config': {'name': 'periodic_conv2d',
    'trainable': True,
    'dtype': 'float32',
    'filters': 128,
    'kernel_size': 7,
    'conv_kwargs': {'kernel_regularizer': {'class_name': 'L1L2',
      'config': {'l1': 0.0, 'l2': 9.999999747378752e-06}},
     'use_bias': 1}},
   'name': 'periodic_conv2d',
   'inbound_nodes': [[['input_1', 0, 0, {}]]]},
  {'class_name': 'LeakyReLU',
   'config': {'name': 'leaky_re_lu',
    'trainable': True,
    'dtype': 'float32',
    'alpha': 0.30000001192092896},
   'name': 'leaky_re_lu',
   'inbound_nodes': [[['periodic_conv2d', 0, 0, {}]]]},
  {'class_name': 'BatchNormalization',
   'config': {'name': 'batch_normalization',
    'trainable': True,
    'dtype': 'float32',
    'axis': ListWr

In [41]:
for l in c['layers']:
    if l['class_name'] == 'Dropout':
        l['inbound_nodes'][0][0][-1] = {'training': True}

In [42]:
model2 = tf.keras.models.Model.from_config(c, custom_objects={'PeriodicConv2D':PeriodicConv2D,'lat_mse': tf.keras.losses.mse})

In [43]:
model2.set_weights(model.get_weights())

In [44]:
[model2.predict(X[:1])[0, 0, 0,0] for _ in range(3)]   # Different output every time = dropout on :)

[-1.0398799, -1.0418909, -1.124301]
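The config patching step of this workaround can be isolated into a function that works on plain dictionaries, which makes it easy to check without loading a model. This is a sketch under the assumption that the config has the layout printed above (TF 2.0 functional model, one inbound node per Dropout layer); other Keras versions may serialize `inbound_nodes` differently. The name `enable_test_time_dropout` and the toy config are hypothetical.

```python
import copy


def enable_test_time_dropout(config):
    """Return a copy of a model config with training=True on all Dropout layers."""
    patched = copy.deepcopy(config)  # leave the original config untouched
    for layer in patched["layers"]:
        if layer["class_name"] == "Dropout":
            # inbound_nodes[0][0] is [layer_name, node_index, tensor_index, call_kwargs]
            layer["inbound_nodes"][0][0][-1] = {"training": True}
    return patched


toy_config = {
    "name": "model",
    "layers": [
        {"class_name": "InputLayer", "name": "input_1", "inbound_nodes": []},
        {"class_name": "Dropout", "name": "dropout",
         "inbound_nodes": [[["input_1", 0, 0, {}]]]},
    ],
}
patched = enable_test_time_dropout(toy_config)
print(patched["layers"][1]["inbound_nodes"])
# [[['input_1', 0, 0, {'training': True}]]]
```

The patched config would then be passed to `Model.from_config` and the weights copied over with `set_weights`, as in the cells above.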

In [45]:
model = model2

In [46]:
%%time
p = model.predict(dg_test, verbose=1)   # Maybe slightly slower because of dropout

CPU times: user 18.1 s, sys: 7.65 s, total: 25.8 s
Wall time: 17.6 s


In [47]:
from tqdm import tqdm

In [48]:
# Here is a new version without K.function. Much easier thankfully
ensemble_size = 2 # 50
preds = []
for _ in tqdm(range(ensemble_size)):
    preds.append(model.predict(dg_test))

100%|██████████| 2/2 [00:31<00:00, 15.92s/it]


In [49]:
preds = np.array(preds)
preds.shape   # No transposing necessary

(2, 2906, 32, 64, 2)
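If memory becomes a concern for larger ensembles, the list of predictions could be replaced by a pre-allocated array that is filled member by member. The sketch below uses a random stand-in for `model.predict(dg_test)`; `stack_members` and `fake_predict` are hypothetical names, and the array sizes are reduced for illustration.

```python
import numpy as np


def stack_members(predict_fn, n_members, sample_shape):
    """Fill a pre-allocated (member, *sample_shape) array, one stochastic pass per member."""
    out = np.empty((n_members, *sample_shape), dtype=np.float32)
    for m in range(n_members):
        out[m] = predict_fn()
    return out


rng = np.random.default_rng(0)


def fake_predict():
    # Stand-in for model.predict(dg_test): returns a new random array on every call.
    return rng.normal(size=(10, 4, 8, 2))


ens = stack_members(fake_predict, n_members=3, sample_shape=(10, 4, 8, 2))
print(ens.shape)  # (3, 10, 4, 8, 2)
```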

In [None]:
#NOTE: Appending to a list is faster than appending to a numpy array, so ideally append to a list and then convert it to a numpy array. But be careful with the shape.
#If you want to use a numpy array directly, pre-allocate the space.
number_of_forecasts=100
func = K.function(model.inputs + [K.learning_phase()], model.outputs)

#For 1 batch
# X,y=dg_test[0] #currently limiting output due to RAM issues.
# #test-time dropout
# pred_ensemble = np.array([np.asarray(func([X] + [1.]), dtype=np.float32).squeeze() for _ in
#                               range(number_of_forecasts)])
    
#@Stephan- please see!
#For full data. Still takes too long!
#Issue: The last batch is shorter (18 elements instead of 32), so the list entries have different sizes
#and cannot be converted with np.array(preds). Error: can't broadcast from shape (1,32,32,64,2) to (2).
#So an if condition is used to break before the last batch.

#Question: By doing it in batches, we are making 100 new predictions for each batch, not each X. So overall we are making many more predictions than 100. It is more random, but is it theoretically okay?
preds = []
counter=0
for X, y in dg_test: 
    preds.append(np.array([np.asarray(func([X] + [1.]), dtype=np.float32).squeeze() 
                           for _ in range(number_of_forecasts)]))
    
    if (counter%10==0):
            print(counter)
    if counter==len(dg_test)-2:
        print(counter)
        break
    counter=counter+1
    
pred_ensemble=np.array(preds)
#reshaping. Be careful!
shp=pred_ensemble.shape
pred_ensemble=pred_ensemble.transpose(1,0,2,3,4,5).reshape(shp[1],-1,shp[-3],shp[-2],shp[-1])
pred_ensemble.shape

#for last batch (Bad method)
last_element=len(dg_test)-1
X,y=dg_test[last_element]
pred_last=np.array([np.asarray(func([X] + [1.]), dtype=np.float32).squeeze() 
                            for _ in range(number_of_forecasts)])
pred_ensemble=np.append(pred_ensemble,pred_last,axis=1)

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
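One way to avoid the special case for the shorter last batch is to build a `(member, batch, ...)` array per batch and concatenate along the sample axis, which does not require equal batch lengths. The sketch below is NumPy only: `ensemble_over_batches` is a hypothetical helper and `noisy_identity` is a random stand-in for the dropout model.

```python
import numpy as np


def ensemble_over_batches(stochastic_fn, batches, n_members):
    """Return an array of shape (member, time, ...) from batches of unequal length."""
    per_batch = []
    for X in batches:
        # Shape (member, batch, ...): one stochastic forward pass per member.
        per_batch.append(np.stack([stochastic_fn(X) for _ in range(n_members)]))
    # Concatenating along axis 1 works even if the last batch is shorter.
    return np.concatenate(per_batch, axis=1)


rng = np.random.default_rng(0)


def noisy_identity(X):
    # Stand-in for a model with test-time dropout.
    return X + rng.normal(scale=0.1, size=X.shape)


batches = [np.zeros((32, 4, 8, 2)), np.zeros((32, 4, 8, 2)), np.zeros((18, 4, 8, 2))]
ens = ensemble_over_batches(noisy_identity, batches, n_members=5)
print(ens.shape)  # (5, 82, 4, 8, 2)
```

No transpose or reshape is needed afterwards, because the member axis already comes first.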


In [None]:
#pred_ensemble=np.array(preds)

In [None]:
#pred_ensemble.shape

In [None]:
# #for last batch (Bad method)
# last_element=len(dg_test)-1
# X,y=dg_test[last_element]
# pred_last=np.array([np.asarray(func([X] + [1.]), dtype=np.float32).squeeze() 
#                             for _ in range(number_of_forecasts)])
# pred_ensemble=np.append(pred_ensemble,pred_last,axis=1)

In [None]:
pred_ensemble.shape

In [None]:
# shp = pred_ensemble.shape
# out = pred_ensemble.transpose(1,0,2,3,4,5)
# print(out.shape)

In [None]:
# pred_ensemble[:,0,0,0,0,0]

In [None]:
# print(out2[0,0,0,0,0])
# print(out2[0,32,0,0,0])
# print(out2[0,64,0,0,0])

In [None]:
pred_ensemble.shape

In [None]:
pred_ensemble_reserve=pred_ensemble
observation_reserve=y
observation=y

In [None]:
#unnormalize
pred_ensemble=pred_ensemble* dg_test.std.isel(level=dg_test.output_idxs).values+dg_test.mean.isel(level=dg_test.output_idxs).values
observation=observation* dg_test.std.isel(level=dg_test.output_idxs).values+dg_test.mean.isel(level=dg_test.output_idxs).values
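The unnormalization relies on NumPy broadcasting: `mean` and `std` have one entry per output variable and are matched against the last axis of the prediction array. Here is a minimal round-trip check with made-up numbers; the values of `mean`, `std` and `truth` are illustrative only, and `unnormalize` is a hypothetical helper.

```python
import numpy as np


def unnormalize(values, mean, std):
    """Undo (x - mean) / std; mean and std have one entry per output variable."""
    # Broadcasting aligns mean/std with the trailing (variable) axis of values.
    return values * np.asarray(std) + np.asarray(mean)


mean = np.array([54000.0, 274.0])  # illustrative values only
std = np.array([3300.0, 15.0])
# Axes: (member, time, lat, lon, var)
truth = np.full((3, 5, 4, 8, 2), [55000.0, 280.0])
normalized = (truth - mean) / std
restored = unnormalize(normalized, mean, std)
print(np.allclose(restored, truth))  # True
```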

In [None]:
pred_ensemble.shape[1]

In [None]:
preds = xr.Dataset()
for i, var in enumerate(output_vars):
    da = xr.DataArray(
        pred_ensemble[..., i],
        coords={
            'member': np.arange(number_of_forecasts),
            'time': dg_test.data.time.isel(time=slice(None, pred_ensemble.shape[1])),
            'lat': dg_test.data.lat,
            'lon': dg_test.data.lon,
        },
        dims=['member', 'time', 'lat', 'lon'],
    )
    preds[var] = da

In [None]:
preds

In [None]:
# preds = xr.Dataset({
#     'z_500': xr.DataArray(pred_ensemble[...,0],
#         dims=['member', 'time','lat', 'lon'],
#         coords={'member': np.arange(number_of_forecasts),'time': dg_test.data.time.isel(time=slice(None,X.shape[0])), 'lat': dg_test.data.lat, 'lon': dg_test.data.lon,},)
#     ,
#     't_850': xr.DataArray(pred_ensemble[...,1],
#         dims=['member', 'time','lat', 'lon'],
#         coords={'member': np.arange(number_of_forecasts),'time': dg_test.data.time.isel(time=slice(None,X.shape[0])), 'lat': dg_test.data.lat, 'lon': dg_test.data.lon,},)
# })

In [None]:
# #ToDo: make it general for output_vars
# #convert from numpy to xarray
# preds = xr.Dataset({
#     'z_500': xr.DataArray(pred_ensemble[...,0],
#         dims=['member', 'time','lat', 'lon'],
#         coords={'member': np.arange(number_of_forecasts),'time': dg_test.data.time.isel(time=slice(None,X.shape[0])), 'lat': dg_test.data.lat, 'lon': dg_test.data.lon,},)
#     ,
#     't_850': xr.DataArray(pred_ensemble[...,1],
#         dims=['member', 'time','lat', 'lon'],
#         coords={'member': np.arange(number_of_forecasts),'time': dg_test.data.time.isel(time=slice(None,X.shape[0])), 'lat': dg_test.data.lat, 'lon': dg_test.data.lon,},)
# })

# observation= xr.Dataset({
#     'z_500': xr.DataArray(observation[...,0],
#                          dims=['time','lat','lon'],
#                          coords={'time':dg_test.data.time.isel(time=slice(None,X.shape[0])),'lat':dg_test.data.lat,'lon':dg_test.data.lon},)
#     ,
#     't_850': xr.DataArray(observation[...,1],dims=['time','lat','lon'],coords={'time':dg_test.data.time.isel(time=slice(None,X.shape[0])),'lat':dg_test.data.lat,'lon':dg_test.data.lon},)          
# })

In [None]:
#preds

In [None]:
#pred_dataset

In [None]:
#xr.Dataset.equals(pred_dataset,preds)

In [None]:
#observation

In [None]:
#preds.t850.isel(time=0,forecast_number=0,lat=0,lon=0).values

In [None]:
preds.to_netcdf(f'../../data/WeatherBench/predictions/{exp_id}.nc')

In [None]:
preds

In [None]:
from ranky import rankz
import matplotlib.pyplot as plt

# NOTE: `observation` and `preds` must both be xr.Datasets here
# (e.g. via the commented-out conversion cell above).
obs = np.asarray(observation.to_array(), dtype=np.float32).squeeze()
obs_z500 = obs[0, ...].squeeze()
obs_t850 = obs[1, ...].squeeze()

pred = np.asarray(preds.to_array(), dtype=np.float32).squeeze()
pred_z500 = pred[0, ...].squeeze()
pred_t850 = pred[1, ...].squeeze()

mask = np.ones(obs_z500.shape)  # all ones, i.e. no masking
# feed into rankz function
result = rankz(obs_z500, pred_z500, mask)
# plot histogram
plt.bar(range(1, pred_z500.shape[0] + 2), result[0])
# view histogram
plt.show()  # result: overconfident (underdispersive)
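To make clear what the histogram above counts, here is a NumPy-only sketch of a rank histogram. It is not `ranky`'s implementation, and it ignores ties and masking; the function name `rank_histogram` and the synthetic data are assumptions for illustration. For each grid point and time, the rank is the number of ensemble members below the observation, so an ensemble with too little spread piles up counts in the first and last bins.

```python
import numpy as np


def rank_histogram(obs, ens):
    """obs: array of shape (...); ens: array of shape (member, ...).

    Returns counts of length n_members + 1.
    """
    n_members = ens.shape[0]
    # Rank = number of members strictly below the observation (0 .. n_members).
    ranks = (ens < obs[np.newaxis]).sum(axis=0)
    return np.bincount(ranks.ravel(), minlength=n_members + 1)


rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 4, 8))
# Too little spread on purpose, to mimic an underdispersive ensemble.
ens = rng.normal(scale=0.3, size=(10, 200, 4, 8))
counts = rank_histogram(obs, ens)
print(counts.sum(), len(counts))  # 6400 11
```

A well-calibrated ensemble would give a roughly flat histogram, while the U shape produced here corresponds to the overconfident case noted in the cell above.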