In [1]:
import os, json, sys, shutil
from ovejero import model_trainer

2023-09-14 02:34:13.118169: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


# Fitting a Model Using model_trainer

__Author:__ Sebastian Wagner-Carena

__Last Run:__ 08/04/2020

__Goals:__ Learn how to use model_trainer to fit the types of models used by ovejero

__Before running this notebook:__ Run the Generate_Config notebook to understand what goes into the configuration files for ovejero

We'll start by loading up the test configuration file made by Generate_Config and inspecting it.

In [2]:
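# This notebook runs from the demos directory; the test data lives one level up.
json_path = os.path.join(os.path.dirname(os.getcwd()), 'test', 'test_data', 'test.json')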
with open(json_path,'r') as json_f:
    cfg = json.load(json_f)
print(cfg)

{'training_params': {'bnn_type': 'gmm', 'dropout_type': 'standard', 'batch_size': 10, 'n_epochs': 10, 'learning_rate': 0.0001, 'decay': 3e-06, 'kernel_regularizer': 1e-05, 'dropout_rate': 0.1, 'dropout_regularizer': 1e-06, 'root_path': './test_data/', 'tf_record_path': 'tf_record_test', 'final_params': ['external_shear_g1', 'external_shear_g2', 'lens_mass_center_x', 'lens_mass_center_y', 'lens_mass_e1', 'lens_mass_e2', 'lens_mass_gamma', 'lens_mass_theta_E_log'], 'flip_pairs': [], 'img_dim': 128, 'model_weights': './test_data/test_model.h5', 'tensorboard_log_dir': './test_data/test.log', 'random_seed': 1138, 'norm_images': True, 'shift_pixels': 2, 'shift_params': [['lens_mass_center_x'], ['lens_mass_center_y']], 'pixel_scale': 0.051, 'baobab_config_path': './test_data/test_baobab_cfg.py'}, 'validation_params': {'root_path': './test_data/', 'tf_record_path': 'tf_record_test_val'}, 'dataset_params': {'lens_params_path': 'metadata.csv', 'new_param_path': 'new_metadata.csv', 'normalization

A lot of good information there! This is a good config file to start with. Let's go ahead and change a few paths and use it for our toy model.
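The printed dictionary is long, so it can help to list it one entry per line. The sketch below is not part of ovejero: summarize_config is a hypothetical helper, and sample_cfg only copies a few values from the output above so that the example runs on its own.

```python
import json

def summarize_config(cfg, max_len=60):
    """Return one 'section.key = value' line per entry of a two-level config."""
    lines = []
    for section, params in cfg.items():
        if not isinstance(params, dict):
            # Top-level scalars are reported directly.
            lines.append(f'{section} = {json.dumps(params)}')
            continue
        for key, value in params.items():
            text = json.dumps(value)
            if len(text) > max_len:
                # Shorten long lists so each entry stays on one line.
                text = text[:max_len] + '...'
            lines.append(f'{section}.{key} = {text}')
    return lines

# A small stand-in for the real config (values taken from the output above).
sample_cfg = {
    'training_params': {'bnn_type': 'gmm', 'batch_size': 10, 'n_epochs': 10},
    'validation_params': {'tf_record_path': 'tf_record_test_val'},
}
for line in summarize_config(sample_cfg):
    print(line)
```

In the notebook itself you would pass the loaded cfg instead of sample_cfg.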

In [3]:
# Change the model weights to point to the demo directory! Same for log file.
print('old path:')
print(cfg['training_params']['model_weights'])

cfg['training_params']['model_weights'] = os.getcwd() + '/test_model.h5'
cfg['training_params']['tensorboard_log_dir'] = os.getcwd() + '/test_logs'
cfg['training_params']['baobab_config_path'] = os.getcwd() + '/../test/test_data/test_baobab_cfg.py'
cfg['training_params']['root_path'] = os.getcwd() + '/../test/test_data'
cfg['validation_params']['root_path'] = os.getcwd() + '/../test/test_data'
print('new path:')
print(cfg['training_params']['model_weights'])

# Don't want shifts for this easier version of the problem
cfg['training_params']['shift_pixels'] = 0

# Also let's start with the easy diagonal case
cfg['training_params']['bnn_type'] = 'diag'

# Now let's go ahead and save this as our new configuration file
diag_json_path = os.getcwd() + '/diag.json'
with open(diag_json_path,'w') as json_f:
    json.dump(cfg,json_f,indent=4)

old path:
./test_data/test_model.h5
new path:
/home/parlange/ovejero/demos/test_model.h5


All we have to do is call the main function of model_trainer with the path to our config file! You should see the loss go down as the model learns to overfit to the lenses in the very small training set. Because the random seed is set by the configuration file, the final loss should be 1.9520.

In [4]:
# This is equivalent to 'python -m model_trainer diag_json_path' in the terminal where diag_json_path is the path.
sys.argv = ['model_trainer',diag_json_path]
model_trainer.main()

Checking for training data.
TFRecord found at /home/parlange/ovejero/demos/../test/test_data/tf_record_test
Checking for validation data.
TFRecord found at /home/parlange/ovejero/demos/../test/test_data/tf_record_test_val


2023-09-14 02:34:19.575905: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-09-14 02:34:19.606268: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


Initializing the model
Is model built: True
No weights found. Saving new weights to /home/parlange/ovejero/demos/test_model.h5
Epoch 1/10

  saving_api.save_model(


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


At the end of each epoch, we get the loss (which includes the concrete dropout regularization penalty) and the diagonal/full/gmm loss term (essentially a measure of how well our predicted PDF captures the data) on both the training and validation sets.
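To make the second loss term more concrete, here is a minimal sketch of a negative log-likelihood for a Gaussian with diagonal covariance. It assumes the diagonal case has this general form, with one predicted mean and one predicted log-variance per lens parameter. It is an illustration rather than ovejero's implementation, and diag_gaussian_nll is a hypothetical name.

```python
import math

def diag_gaussian_nll(y_true, mean, log_var):
    """Negative log-likelihood of y_true under N(mean, diag(exp(log_var)))."""
    total = 0.0
    for y, m, lv in zip(y_true, mean, log_var):
        # Each parameter contributes independently because the covariance is diagonal.
        total += 0.5 * (math.log(2 * math.pi) + lv + (y - m) ** 2 / math.exp(lv))
    return total

# Two made-up parameter values and two candidate predictions with unit variance.
truth = [0.1, -0.2]
good = diag_gaussian_nll(truth, [0.1, -0.2], [0.0, 0.0])
bad = diag_gaussian_nll(truth, [1.1, 0.8], [0.0, 0.0])
print(good < bad)
```

A prediction whose mean is closer to the truth gets a lower value, which is the sense in which this term measures how well the predicted distribution captures the data.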

In [None]:
# Clean up the files that were created by this notebook.
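# The trailing '' keeps the path separator at the end so file names can be appended.
test_data_path = os.path.join(os.path.dirname(os.getcwd()), 'test', 'test_data', '')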

os.remove(test_data_path+'new_metadata.csv')
os.remove(test_data_path+'norms.csv')
os.remove(test_data_path+'tf_record_test')
os.remove(test_data_path+'tf_record_test_val')
os.remove('test_model.h5')
shutil.rmtree('test_logs')
os.remove('diag.json')

If we want to fit a different type of model, all we have to do is change the bnn_type or dropout_rate entries in the configuration file.
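As a rough sketch of that workflow, the helper below copies a config, overrides the chosen entries, and writes a new JSON file. write_variant_config is a hypothetical helper rather than an ovejero function, and base_cfg is a tiny stand-in for the full configuration. The 'gmm' value is taken from the original test config shown earlier.

```python
import copy
import json
import os
import tempfile

def write_variant_config(cfg, out_path, bnn_type=None, dropout_rate=None):
    """Copy cfg, override selected training_params, and write the result as JSON."""
    new_cfg = copy.deepcopy(cfg)  # Leave the caller's config untouched.
    if bnn_type is not None:
        new_cfg['training_params']['bnn_type'] = bnn_type
    if dropout_rate is not None:
        new_cfg['training_params']['dropout_rate'] = dropout_rate
    with open(out_path, 'w') as json_f:
        json.dump(new_cfg, json_f, indent=4)
    return new_cfg

# Stand-in config; in the notebook you would pass the cfg loaded above.
base_cfg = {'training_params': {'bnn_type': 'diag', 'dropout_rate': 0.1}}
out_path = os.path.join(tempfile.mkdtemp(), 'gmm.json')
gmm_cfg = write_variant_config(base_cfg, out_path, bnn_type='gmm')
print(gmm_cfg['training_params']['bnn_type'])
```

The new file can then be passed to model_trainer the same way as diag.json. Since the trainer output above suggests it looks for an existing weights file before saving a new one, it is probably safest to also give model_weights a fresh path when switching model types.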