In [3]:
import Learn2_new as ln
ut = ln.ut # utilities
ef = ln.ef # ERA_Fields_New

# log to stdout
import logging
import sys
logging.getLogger().setLevel(logging.INFO)
logging.getLogger().handlers = [logging.StreamHandler(sys.stdout)]

## Dealing with parameters

`Learn2_new.py` is deeply nested: functions call other functions, which can make changing a single parameter look difficult.
The easiest way to do so is with the functions `ln.get_default_params` and `ut.set_values_recursive`.

### Beginner approach

The first way to approach the code is to read the documentation of its functions. When a function has an argument named `*_kwargs`, it means that the function `*` will be called with those arguments, so you can then look at that function's documentation, and so on.
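
The `*_kwargs` convention described above can be illustrated with a toy example. The functions `roll` and `prepare` below are hypothetical stand-ins, not part of `Learn2_new.py`; they only show how an outer function might forward a dictionary of keyword arguments to an inner one.

```python
def roll(x, steps=2):
    """Toy inner function (hypothetical): rotate a list by `steps` positions."""
    steps %= len(x)
    return x[-steps:] + x[:-steps]


def prepare(x, scale=1, roll_kwargs=None):
    """Toy outer function (hypothetical): `roll_kwargs` is forwarded to `roll`."""
    if roll_kwargs is None:
        roll_kwargs = {}  # fall back to the defaults of `roll`
    x = [scale * v for v in x]
    return roll(x, **roll_kwargs)


# With no `roll_kwargs`, `roll` uses its own default `steps=2`
default_result = prepare([1, 2, 3, 4])

# To change `steps`, the caller has to know it lives inside `roll_kwargs`
custom_result = prepare([1, 2, 3, 4], roll_kwargs={'steps': 1})
```

Reading the signature of `prepare` tells you that `roll` is involved, and reading the signature of `roll` tells you which parameters you can override.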

Let's say you want to use the function `ln.prepare_data`. If you look at its documentation you get:

In [None]:
help(ln.prepare_data)

This function calls `ln.load_data` and `ln.prepare_XY`.

In [None]:
help(ln.load_data)

In [None]:
help(ln.prepare_XY)

In turn, `ln.prepare_XY` calls `ln.make_XY` and `ln.roll_X`.

In [None]:
help(ln.make_XY)

In [None]:
help(ln.roll_X)

Now let's say you want to call `ln.prepare_data` with just the temperature field, using data from the short 1000-year dataset and rolling `X` by 16 steps, leaving all other values at their defaults. One (cumbersome) way to do it is the following:

In [4]:
X, Y, yp = ln.prepare_data(load_data_kwargs = {'fields': ['t2m'], 'dataset_years': 1000},
                           prepare_XY_kwargs = {'roll_X_kwargs': {'roll_steps': 16}})

prepare_data:
load_data:
load_field:
Loading field tas
Loaded time array
input self.var.shape = (150000, 22, 128)
output self.var.shape = (1000, 150, 22, 128)
load_field: completed in 5.4 s
Set_area_integral:
Set_area_integral: completed in 0.1 s
load_data: completed in 5.4 s
prepare_XY:
make_XY:
make_X:
make_X: completed in 0.3 s
assign_labels:
assign_labels: completed in 0.0 s
make_XY: completed in 0.4 s
roll_X:
roll_X: completed in 0.5 s
Mixing
balance_folds:
Balancing folds
fold 0 done!
fold 1 done!
fold 2 done!
fold 3 done!
fold 4 done!
fold 5 done!
fold 6 done!
fold 7 done!
fold 8 done!
fold 9 done!
Sums of the balanced 10 folds:
[385 385 385 385 385 385 385 385 385 385]
std/avg = 0.0
max relative deviation = 0.0\%
balance_folds: completed in 0.0 s
Mixing completed in 0.3 s

X.shape = (1000, 77, 22, 128, 1), Y.shape = (1000, 77)
Flattened time: X.shape = (77000, 22, 128, 1), Y.shape = (77000,)
prepare_XY: completed in 1.2 s
prepare_data: completed in 6.6 s


### Better approach

If instead you already roughly know how the code works, you can proceed in a more elegant way.

First, create a dictionary of the default parameters using `ln.get_default_params`. Remember to specify `recursive=True`, which gathers the default parameters of all the functions that are called in a nested manner.

In [2]:
prepare_data_kwargs_default = ln.get_default_params(ln.prepare_data, recursive=True)
print(ut.dict2str(prepare_data_kwargs_default)) # a nice way of printing nested dictionaries

{
    "load_data_kwargs": {
        "dataset_years": 8000,
        "year_list": null,
        "sampling": "",
        "Model": "Plasim",
        "area": "France",
        "filter_area": "France",
        "lon_start": 0,
        "lon_end": 128,
        "lat_start": 0,
        "lat_end": 22,
        "mylocal": "/local/gmiloshe/PLASIM/",
        "fields": [
            "t2m",
            "zg500",
            "mrso_filtered"
        ]
    },
    "prepare_XY_kwargs": {
        "do_premix": false,
        "premix_seed": 0,
        "do_balance_folds": true,
        "nfolds": 10,
        "year_permutation": null,
        "flatten_time_axis": true,
        "make_XY_kwargs": {
            "label_field": "t2m",
            "time_start": 30,
            "time_end": 120,
            "T": 14,
            "tau": 0,
            "percent": 5,
            "threshold": null
        },
        "roll_X_kwargs": {
            "roll_axis": "lon",
            "roll_steps": 64
        }
    }
}
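
To give an idea of how such a nested dictionary of defaults can be built, here is a minimal sketch based on `inspect.signature`. It is not the implementation of `ln.get_default_params`: the toy functions, the `registry` dictionary used to resolve `*_kwargs` names, and `collect_defaults_sketch` itself are all assumptions made for illustration.

```python
import inspect


# Toy functions (hypothetical) mimicking the `*_kwargs` nesting convention
def roll(X, roll_axis='lon', roll_steps=64):
    return X


def make(fields, label_field='t2m', T=14):
    return fields


def prepare(fields, nfolds=10, make_kwargs=None, roll_kwargs=None):
    return fields


# Assumption: nested functions are looked up by name in an explicit registry
registry = {'roll': roll, 'make': make, 'prepare': prepare}


def collect_defaults_sketch(func, registry):
    """Gather default arguments of `func`, descending into `*_kwargs` arguments."""
    params = {}
    for name, par in inspect.signature(func).parameters.items():
        target = name[:-len('_kwargs')] if name.endswith('_kwargs') else None
        if target in registry:
            # `<target>_kwargs` is replaced by the defaults of `<target>`
            params[name] = collect_defaults_sketch(registry[target], registry)
        elif par.default is not inspect.Parameter.empty:
            params[name] = par.default
        # arguments without a default (e.g. `fields`) are skipped
    return params


prepare_defaults = collect_defaults_sketch(prepare, registry)
```

In this sketch, arguments without a default value are simply left out, so the resulting dictionary can always be passed back to the function with `**`.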


Now you want to set the parameters you care about (`fields`, `dataset_years` and `roll_steps`) to non-default values. You can do it with `ut.set_values_recursive`, which finds each parameter by name, so you do not need to know how deeply it is nested.

In [7]:
prepare_data_kwargs = ut.set_values_recursive(prepare_data_kwargs_default,
                                              {'fields': ['t2m'], 'dataset_years': 1000, 'roll_steps': 16})
print(ut.dict2str(prepare_data_kwargs))

{
    "load_data_kwargs": {
        "dataset_years": 1000,
        "year_list": null,
        "sampling": "",
        "Model": "Plasim",
        "area": "France",
        "filter_area": "France",
        "lon_start": 0,
        "lon_end": 128,
        "lat_start": 0,
        "lat_end": 22,
        "mylocal": "/local/gmiloshe/PLASIM/",
        "fields": [
            "t2m"
        ]
    },
    "prepare_XY_kwargs": {
        "do_premix": false,
        "premix_seed": 0,
        "do_balance_folds": true,
        "nfolds": 10,
        "year_permutation": null,
        "flatten_time_axis": true,
        "make_XY_kwargs": {
            "label_field": "t2m",
            "time_start": 30,
            "time_end": 120,
            "T": 14,
            "tau": 0,
            "percent": 5,
            "threshold": null
        },
        "roll_X_kwargs": {
            "roll_axis": "lon",
            "roll_steps": 16
        }
    }
}
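
The idea behind this kind of update can be sketched in a few lines. The function `set_values_sketch` below is a hypothetical re-implementation written for illustration, not the code of `ut.set_values_recursive`; in particular, how the real function treats a key that appears at several levels, or a key missing from the nested dictionary, is not covered here.

```python
import copy


def set_values_sketch(nested, updates, inplace=False):
    """Set every non-dict entry of `nested` whose key appears in `updates`."""
    target = nested if inplace else copy.deepcopy(nested)

    def _update(d):
        for key, value in d.items():
            if isinstance(value, dict):
                _update(value)           # descend into nested kwargs
            elif key in updates:
                d[key] = updates[key]    # overwrite the matching value in place

    _update(target)
    return target


# Toy defaults (hypothetical), shaped like the dictionary printed above
defaults = {
    'load_kwargs': {'fields': ['t2m', 'zg500'], 'dataset_years': 8000},
    'prepare_kwargs': {'nfolds': 10, 'roll_kwargs': {'roll_steps': 64}},
}

updated = set_values_sketch(defaults, {'dataset_years': 1000, 'roll_steps': 16})
```

Because the sketch copies the input unless `inplace=True`, the dictionary of defaults stays available for later runs.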


Then you can run:

In [8]:
X, Y, yp = ln.prepare_data(**prepare_data_kwargs)

		Mixing
			Balancing folds
			fold 0 done!
			fold 1 done!
			fold 2 done!
			fold 3 done!
			fold 4 done!
			fold 5 done!
			fold 6 done!
			fold 7 done!
			fold 8 done!
			fold 9 done!
			Sums of the balanced 10 folds:
			[385 385 385 385 385 385 385 385 385 385]
			std/avg = 0.0
			max relative deviation = 0.0\%
		Mixing completed in 0.3 s
		
		X.shape = (1000, 77, 22, 128, 1), Y.shape = (1000, 77)
		Flattened time: X.shape = (77000, 22, 128, 1), Y.shape = (77000,)


If you want to load your default values from a config file, you can proceed as follows:

In [None]:
config_dict = ut.json2dict('example_config.json') # load the config file
print(ut.dict2str(config_dict))
print('\n\n')
config_dict_flat = ut.collapse_dict(config_dict) # flatten the dictionary
print(ut.dict2str(config_dict_flat))
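
Flattening means dropping the nesting and keeping only the innermost key-value pairs. A minimal sketch of that operation is shown below; `collapse_sketch` is a hypothetical helper, not the code of `ut.collapse_dict`, and its behavior when the same key appears in several branches (the last one wins) is an assumption.

```python
def collapse_sketch(nested):
    """Flatten a nested dictionary, keeping only the non-dict leaves."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flat.update(collapse_sketch(value))  # merge the leaves of the sub-dictionary
        else:
            flat[key] = value
    return flat


# Toy config (hypothetical)
config = {
    'load_kwargs': {'dataset_years': 1000},
    'prepare_kwargs': {'nfolds': 10, 'roll_kwargs': {'roll_steps': 16}},
}

flat_config = collapse_sketch(config)
```

A flat dictionary of this form is what a name-based update needs: each parameter is identified by its key alone, with no reference to where it sits in the hierarchy.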

In [None]:
ut.set_values_recursive(prepare_data_kwargs_default, config_dict_flat, inplace=True)
print(ut.dict2str(prepare_data_kwargs_default))

We had to do it this way because `prepare_data_kwargs` is not a key of the config dictionary. If instead you wanted the default arguments of the function `make_XY`, you could simply have done:

In [None]:
make_XY_kwargs_default = ut.extract_nested(config_dict, 'make_XY_kwargs')
print(ut.dict2str(make_XY_kwargs_default))

Here the function `ut.extract_nested` retrieves the value of a key from a nested dictionary, regardless of how deeply the key is nested.
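
A lookup of this kind can be written as a depth-first search. The function `extract_sketch` below is a hypothetical illustration, not the code of `ut.extract_nested`; returning the first match found and raising `KeyError` when the key is absent are assumptions.

```python
def extract_sketch(nested, key):
    """Return the value of the first occurrence of `key`, searching depth-first."""
    if key in nested:
        return nested[key]
    for value in nested.values():
        if isinstance(value, dict):
            try:
                return extract_sketch(value, key)
            except KeyError:
                continue  # not in this branch, try the next one
    raise KeyError(key)


# Toy config (hypothetical)
config = {
    'run_kwargs': {
        'load_kwargs': {'dataset_years': 1000},
        'prepare_kwargs': {'make_kwargs': {'T': 14, 'tau': 0}},
    }
}

make_defaults = extract_sketch(config, 'make_kwargs')
```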
