# GammaBayes Data Classes

One of the more fundamental things within `GammaBayes` is the use of three data classes: `EventData`, `Parameter`, and `ParameterSet`. Here we will go over `Parameter` and `ParameterSet`.

# `Parameter`

The `Parameter` class is a nice wrapper (sensing a theme?) for the important values involved when performing analysis on a parameter. It provides less help in the way of keyword arguments because it is meant to act more like a dictionary with a set of common keys than a conventional class. Two example inputs are below.

In [1]:
import numpy as np

example_discrete_parameter_dict = {
    'discrete': True,
    'name': 'mass',
    'parameter_type': 'spectral',
    'scaling': 'log10',
    'bounds': [1e-1, 1e2],
    'default_value': 1.0,
    'bins': 31,
}

example_continuous_parameter_dict = {
    'discrete': False,
    'name': 'sig|total',
    'scaling': 'linear',
    'bounds': [0,1],
}

So we have two types of parameters: continuous and discrete.

When we analyse discrete parameters we need to have the range of values that the parameter can take. In the case of the mass parameter, based on the information provided, we would construct an axis of values like so.

In [2]:
mass_axis = np.logspace(np.log10(1e-1), np.log10(1e2), 31)

We also need to obviously keep track of some identifier for each parameter, e.g. mass. 

We may also want to keep track of a default or standard value for a parameter (e.g. 0.17 for the alpha parameter in the Einasto dark matter density profile).

And if we plug this parameter into some sort of MC sampler (e.g. nested sampling) we would want a nice way to keep track of the inverse cdf to turn a unit cube into a value of said parameter.
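As a minimal illustration of the inverse-CDF idea, independent of GammaBayes itself, the sketch below maps unit-interval samples to exponentially distributed values. The function names `exponential_cdf` and `exponential_inverse_cdf` and the `rate` argument are hypothetical, chosen only for this example.

```python
import math


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution: F(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)


def exponential_inverse_cdf(u, rate=1.0):
    """Inverse CDF: solve u = F(x) for x, with u in [0, 1)."""
    return -math.log(1.0 - u) / rate


# A sampler proposes points in the unit cube; the inverse CDF turns each
# coordinate into a parameter value distributed according to the prior.
unit_samples = [0.1, 0.5, 0.9]
parameter_values = [exponential_inverse_cdf(u, rate=2.0) for u in unit_samples]

# Round trip: applying the CDF recovers the original unit-interval samples.
recovered = [exponential_cdf(x, rate=2.0) for x in parameter_values]
```

Any prior with a computable inverse CDF can be plugged into a unit-cube sampler in the same way.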

To make both my life and yours easier, this class is just meant to put all of this into a single semi-standardised object.

In [3]:
from gammabayes import Parameter

mass_parameter = Parameter(example_discrete_parameter_dict)

And we can access all these attributes like a dictionary.

In [4]:
mass_parameter['name']

'mass'

Or you can access them as attributes of the class.

In [5]:
mass_parameter.name

'mass'

Quantities derived from these values are also available as attributes/properties.

In [6]:
mass_parameter.axis

array([  0.1       ,   0.12589254,   0.15848932,   0.19952623,
         0.25118864,   0.31622777,   0.39810717,   0.50118723,
         0.63095734,   0.79432823,   1.        ,   1.25892541,
         1.58489319,   1.99526231,   2.51188643,   3.16227766,
         3.98107171,   5.01187234,   6.30957344,   7.94328235,
        10.        ,  12.58925412,  15.84893192,  19.95262315,
        25.11886432,  31.6227766 ,  39.81071706,  50.11872336,
        63.09573445,  79.43282347, 100.        ])

And based on the input parameters we create an inverse cumulative distribution function, exposed through the `transform` method, that can be used within a sampler. This is pretty much the main reason the class exists.

In [7]:
mass_parameter.transform(0.1)

0.19952623149688797

This also allows one to specify all the needed information in a YAML file, as the class just takes in a dictionary. You can store other information about the parameter as well, as it is essentially just a dictionary.

In [8]:
mass_parameter['physics_is_cool'] = True

In [9]:
mass_parameter.physics_is_cool

True
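To make the "dictionary that also exposes its keys as attributes" behaviour concrete, here is a minimal sketch of one way such an object could be built. The `AttrDict` class is hypothetical and is not claimed to be how `Parameter` is implemented.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys can also be read as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() keep working as usual.
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err


param = AttrDict({'name': 'mass', 'bins': 31})
param['physics_is_cool'] = True  # store extra information like any dict
```

Both `param['name']` and `param.name` then refer to the same stored value.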

We also note that we generally keep track of a value called `parameter_type`, which currently takes the values `spectral` or `spatial`. This is because most parameters can be put into one of these two categories, and many models treat them independently (e.g. dark matter's angular vs spectral distributions). So to decrease the number of unneeded computations (which is one of the main focuses of this code), one can read in the parameters relevant to each component without any checks, as they are already stored in a formatted manner.

## Scaling

The `scaling` parameter can take the value `linear` or `log10`. For discrete parameters this means using either `np.linspace` or `np.logspace` to build the axis. For continuous parameters, different transform functions are used, of the kind

`u * transform_scale + bounds[0]`


or

`10**(u * transform_scale + np.log10(bounds[0]))`


Here `u` is a value in the unit interval, and `transform_scale` is the width of the bounds on the chosen scale: `np.diff(bounds)` for `linear`, or the difference of `np.log10(bounds)` for `log10`.
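The two formulas above can be sketched in plain Python as follows. The function names `linear_transform` and `log10_transform` are hypothetical, and the sketch assumes that `transform_scale` is the width of the bounds on the chosen scale, so that `u = 0` maps to `bounds[0]` and `u = 1` maps to `bounds[1]`.

```python
import math


def linear_transform(u, bounds):
    """Map u in [0, 1] linearly onto [bounds[0], bounds[1]]."""
    transform_scale = bounds[1] - bounds[0]
    return u * transform_scale + bounds[0]


def log10_transform(u, bounds):
    """Map u in [0, 1] onto [bounds[0], bounds[1]] uniformly in log10 space."""
    transform_scale = math.log10(bounds[1]) - math.log10(bounds[0])
    return 10 ** (u * transform_scale + math.log10(bounds[0]))


linear_value = linear_transform(0.25, [0, 1])   # a quarter of the way along
log_value = log10_transform(0.1, [1e-1, 1e2])   # 10**(-1 + 0.1 * 3)
```

With `u = 0.1` and bounds `[1e-1, 1e2]`, the log10 version gives roughly 0.1995, which agrees numerically with the `mass_parameter.transform(0.1)` output shown earlier.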

## Prior ID/Likelihood ID

The class also keeps track of identifiers for the prior or likelihood that the parameter belongs to. Each is typically just a string, but it helps keep track of where the parameter belongs within the analysis.

And just a note: likelihood parameter analysis is not inherently supported within the code yet. Currently the focus of the code is to optimise the evaluation of the priors. Later updates may include this functionality.

And if you're interested in doing this yourself or want to help out, please email me at `Liam.Pinchbeck@monash.edu`.

## Save/Load

And just like most classes in this package, you can save and/or load.

In [10]:
mass_parameter.save('mass_param.h5')

In [11]:
loaded_mass_parameter = Parameter.load('mass_param.h5')

{'bins': 31, 'default_value': 1.0, 'discrete': True, 'likelihood_id': nan, 'name': 'mass', 'parameter_type': 'spectral', 'physics_is_cool': True, 'prior_id': nan, 'prob_model': '<scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x118030800>', 'scaling': 'log10', 'transform_scale': 31, 'axis': array([  0.1       ,   0.12589254,   0.15848932,   0.19952623,
         0.25118864,   0.31622777,   0.39810717,   0.50118723,
         0.63095734,   0.79432823,   1.        ,   1.25892541,
         1.58489319,   1.99526231,   2.51188643,   3.16227766,
         3.98107171,   5.01187234,   6.30957344,   7.94328235,
        10.        ,  12.58925412,  15.84893192,  19.95262315,
        25.11886432,  31.6227766 ,  39.81071706,  50.11872336,
        63.09573445,  79.43282347, 100.        ]), 'bounds': array([  0.1, 100. ])}


In [12]:
loaded_mass_parameter.axis

array([  0.1       ,   0.12589254,   0.15848932,   0.19952623,
         0.25118864,   0.31622777,   0.39810717,   0.50118723,
         0.63095734,   0.79432823,   1.        ,   1.25892541,
         1.58489319,   1.99526231,   2.51188643,   3.16227766,
         3.98107171,   5.01187234,   6.30957344,   7.94328235,
        10.        ,  12.58925412,  15.84893192,  19.95262315,
        25.11886432,  31.6227766 ,  39.81071706,  50.11872336,
        63.09573445,  79.43282347, 100.        ])

In [13]:
import os

os.system("rm -rf mass_param.h5")

0

If you need to save a `Parameter` instance with a custom transform function, you will need to use the `save_to_pickle` and `load_from_pickle` methods, or pickle the class instance yourself. You can specify a custom transform via the `custom_parameter_transform` value of the input dictionary, or by setting the method explicitly as I have done below.

We recommend saving to `h5` format to save space, but you do you.

In [14]:
def times2(x):
    return x*2

mass_parameter.transform = times2

mass_parameter.save_to_pickle('pickled_mass.pkl')

loaded_mass_parameter = Parameter.load_from_pickle('pickled_mass.pkl')

os.system("rm -rf pickled_mass.pkl")

0

In [15]:
print(loaded_mass_parameter.transform.__name__)
print(loaded_mass_parameter.transform(1.0))


times2
2.0


# `ParameterSet`

This class is a little more self-explanatory: it contains a set of `Parameter` instances.

The inputs to this class are made to be really flexible to support whatever your use case is. I will give most/all of them below; the `ParameterSet` instances that come from them are all equivalent.

## Example Inputs

### Nested Dictionary Parameter Specifications 

The use case for this one, despite the disgusting syntax, is to fully specify parameters from a config file with next to no effort. This is essentially how the parameters are stored within the configuration files for these tutorials.

In [16]:
# Why do I condone this behaviour...

nested_input_dict = {
        'Z2 dark matter':{
            'spectral_parameters':{
                'mass':{
                    'discrete': True,
                    'scaling': 'log10',
                    # event_dynamic means the range of values 
                        # tested around the default value becomes
                        # smaller i.e. effective bounds become closer
                        # to the default
                    'bounds': 'event_dynamic', 
                    'absolute_bounds': [1e-1, 1e2],
                    'num_events': 1e2,
                    'dynamic_multiplier': 3,
                    'default_value': 1.0,
                    'bins': 31},},
            'spatial_parameters':{
                'alpha':{
                    'discrete': True,
                    'scaling': 'log10',
                    'bounds': 'event_dynamic',
                    'num_events': 1e2,
                    'dynamic_multiplier': 3,
                    'absolute_bounds': [1e-2, 1e1],
                    'bins': 31,
                    'default_value': 0.17},},
                    }
    }


In [17]:
from gammabayes import ParameterSet

nested_input_dict_parameter_set = ParameterSet(nested_input_dict)
list(nested_input_dict_parameter_set.keys())

['mass', 'alpha']

### List of dictionary specifications

This is a more syntax-friendly way to store the parameters within the config file; however, it requires you to re-specify many values, like the `parameter_type` and the `prior_id`.

In [18]:
list_of_dict_input = [{
        'discrete': True,
        'scaling': 'log10',
        'bounds': 'event_dynamic',
        'absolute_bounds': [1e-1, 1e2],
        'num_events': 1e2,
        'dynamic_multiplier': 3,
        'default_value': 1.0,
        'name':'mass',
        'parameter_type':'spectral',
        'bins': 31},

        
        {'name':'alpha',
        'parameter_type':'spatial',
        'discrete': True,
        'scaling': 'log10',
        'bounds': 'event_dynamic',
        'num_events': 1e2,
        'dynamic_multiplier': 3,
        'absolute_bounds': [1e-2, 1e1],
        'bins': 31,
        'default_value': 0.17},
        ]
    

In [19]:
list_of_dict_input_parameter_set = ParameterSet(list_of_dict_input)
list_of_dict_input_parameter_set.axes

[array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
        0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
        0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
        1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
        1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
        1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
        1.99526231]),
 array([0.08520183, 0.08921727, 0.09342195, 0.09782479, 0.10243513,
        0.10726275, 0.11231789, 0.11761127, 0.12315411, 0.12895819,
        0.1350358 , 0.14139984, 0.14806381, 0.15504184, 0.16234874,
        0.17      , 0.17801185, 0.18640129, 0.19518612, 0.20438495,
        0.21401732, 0.22410365, 0.23466532, 0.24572476, 0.25730541,
        0.26943184, 0.28212977, 0.29542614, 0.30934915, 0.32392832,
        0.33919459])]

### List of `Parameter` class instances

In [20]:
list_of_Parameters_input = [Parameter({
        'discrete': True,
        'scaling': 'log10',
        'bounds': 'event_dynamic',
        'absolute_bounds': [1e-1, 1e2],
        'num_events': 1e2,
        'dynamic_multiplier': 3,
        'default_value': 1.0,
        'name':'mass',
        'parameter_type':'spectral',
        'bins': 31}),

        
        Parameter({'name':'alpha',
        'parameter_type':'spatial',
        'discrete': True,
        'scaling': 'log10',
        'bounds': 'event_dynamic',
        'num_events': 1e2,
        'dynamic_multiplier': 3,
        'absolute_bounds': [1e-2, 1e1],
        'bins': 31,
        'default_value': 0.17}),
        ]

In [21]:
list_of_Parameters_input_parameter_set = ParameterSet(list_of_Parameters_input)
print(list_of_Parameters_input_parameter_set.bounds)
print(list_of_Parameters_input_parameter_set.keys())

[[0.5011872336272722, 1.9952623149688795], [0.08520182971663631, 0.3391945935447096]]
dict_keys(['mass', 'alpha'])
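The printed bounds are consistent with a window of half-width `dynamic_multiplier / sqrt(num_events)` in log10 space, centred on `default_value`. This relation is inferred from the numbers shown in this tutorial rather than taken from the GammaBayes source, so treat the sketch below as an assumption. The function name `event_dynamic_bounds` and the clipping to `absolute_bounds` are hypothetical.

```python
import math


def event_dynamic_bounds(default_value, num_events, dynamic_multiplier,
                         absolute_bounds):
    """Assumed rule: log10 window around the default, shrinking as 1/sqrt(N)."""
    half_width = dynamic_multiplier / math.sqrt(num_events)
    centre = math.log10(default_value)
    lower = 10 ** (centre - half_width)
    upper = 10 ** (centre + half_width)
    # Assumption: never step outside the absolute bounds.
    return [max(lower, absolute_bounds[0]), min(upper, absolute_bounds[1])]


mass_bounds = event_dynamic_bounds(1.0, 1e2, 3, [1e-1, 1e2])
alpha_bounds = event_dynamic_bounds(0.17, 1e2, 3, [1e-2, 1e1])
```

Under this assumed rule, more events give a narrower window around the default, which matches the description of `event_dynamic` in the nested dictionary example.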


### Dict of Parameters

In [22]:
dict_of_Parameters_input = {
        'mass': Parameter({'discrete': True,
                    'scaling': 'log10',
                    'bounds': 'event_dynamic',
                    'absolute_bounds': [1e-1, 1e2],
                    'num_events': 1e2,
                    'dynamic_multiplier': 3,
                    'default_value': 1.0,
                    'bins': 31}),
        'alpha': Parameter({
                    'discrete': True,
                    'scaling': 'log10',
                    'bounds': 'event_dynamic', 
                    'num_events': 1e2,
                    'dynamic_multiplier': 3,
                    'absolute_bounds': [1e-2, 1e1],
                    'bins': 31,
                    'default_value': 0.17},)
                    
    }

In [23]:
dict_of_Parameters_input_parameter_set = ParameterSet(dict_of_Parameters_input)
print(dict_of_Parameters_input_parameter_set.bounds)
print(dict_of_Parameters_input_parameter_set.keys())

[[0.5011872336272722, 1.9952623149688795], [0.08520182971663631, 0.3391945935447096]]
dict_keys(['mass', 'alpha'])


## Methods/Attributes

### `dict_of_parameters_by_name` and `axes_by_type`

The analysis classes within this package take one of two routes: a 'scan' or a 'sample'. Each of these requires different information about the parameters.

The 'scan' requires the inputs in a format similar to that of the nested dictionaries above.

The 'sample' needs the parameter information keyed simply by the name of the parameter.

To help with this, the `ParameterSet` class has properties/attributes that output these required formats.
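As a rough picture of the difference between the two layouts, the sketch below builds both from a flat list of parameter dictionaries. The variable names are hypothetical, and `default_value` stands in for the axis arrays that the real scan format holds.

```python
# Hypothetical flat parameter specifications, as in the list-of-dicts input.
parameter_specs = [
    {'name': 'mass', 'parameter_type': 'spectral', 'default_value': 1.0},
    {'name': 'alpha', 'parameter_type': 'spatial', 'default_value': 0.17},
]

# 'Sample'-style layout: one entry per parameter, keyed by name.
by_name = {spec['name']: spec for spec in parameter_specs}

# 'Scan'-style layout: parameters grouped under '<type>_parameters' keys.
by_type = {}
for spec in parameter_specs:
    group = f"{spec['parameter_type']}_parameters"
    by_type.setdefault(group, {})[spec['name']] = spec['default_value']
```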

#### Scan format

In [24]:
nested_input_dict_parameter_set.axes_by_type

{'spectral_parameters': {'mass': array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
         0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
         0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
         1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
         1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
         1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
         1.99526231])},
 'spatial_parameters': {'alpha': array([0.08520183, 0.08921727, 0.09342195, 0.09782479, 0.10243513,
         0.10726275, 0.11231789, 0.11761127, 0.12315411, 0.12895819,
         0.1350358 , 0.14139984, 0.14806381, 0.15504184, 0.16234874,
         0.17      , 0.17801185, 0.18640129, 0.19518612, 0.20438495,
         0.21401732, 0.22410365, 0.23466532, 0.24572476, 0.25730541,
         0.26943184, 0.28212977, 0.29542614, 0.30934915, 0.32392832,
         0.33919459])}}

Or you can more straightforwardly call

In [25]:
nested_input_dict_parameter_set.scan_format

{'spectral_parameters': {'mass': array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
         0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
         0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
         1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
         1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
         1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
         1.99526231])},
 'spatial_parameters': {'alpha': array([0.08520183, 0.08921727, 0.09342195, 0.09782479, 0.10243513,
         0.10726275, 0.11231789, 0.11761127, 0.12315411, 0.12895819,
         0.1350358 , 0.14139984, 0.14806381, 0.15504184, 0.16234874,
         0.17      , 0.17801185, 0.18640129, 0.19518612, 0.20438495,
         0.21401732, 0.22410365, 0.23466532, 0.24572476, 0.25730541,
         0.26943184, 0.28212977, 0.29542614, 0.30934915, 0.32392832,
         0.33919459])}}

#### Sampling format

In [26]:
nested_input_dict_parameter_set.dict_of_parameters_by_name

{'mass': {'discrete': True,
  'scaling': 'log10',
  'bounds': [0.5011872336272722, 1.9952623149688795],
  'absolute_bounds': [0.1, 100.0],
  'num_events': 100.0,
  'dynamic_multiplier': 3.0,
  'default_value': 1.0,
  'bins': 31,
  'prob_model': <scipy.stats._distn_infrastructure.rv_continuous_frozen at 0x11175b410>,
  'axis': array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
         0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
         0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
         1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
         1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
         1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
         1.99526231]),
  'transform_scale': 31,
  'transform': <bound method Parameter.discrete_parameter_transform of {'discrete': True, 'scaling': 'log10', 'bounds': [0.5011872336272722, 1.9952623149688795], 'absolute_bounds': [0.1, 100.0], 'num_events': 100

Or more straightforwardly

In [27]:
nested_input_dict_parameter_set.sampling_format

{'mass': {'discrete': True,
  'scaling': 'log10',
  'bounds': [0.5011872336272722, 1.9952623149688795],
  'absolute_bounds': [0.1, 100.0],
  'num_events': 100.0,
  'dynamic_multiplier': 3.0,
  'default_value': 1.0,
  'bins': 31,
  'prob_model': <scipy.stats._distn_infrastructure.rv_continuous_frozen at 0x11175b410>,
  'axis': array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
         0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
         0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
         1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
         1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
         1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
         1.99526231]),
  'transform_scale': 31,
  'transform': <bound method Parameter.discrete_parameter_transform of {'discrete': True, 'scaling': 'log10', 'bounds': [0.5011872336272722, 1.9952623149688795], 'absolute_bounds': [0.1, 100.0], 'num_events': 100

The general behaviour of the class as a Python object mimics that of the `dict_of_parameters_by_name`/`sampling_format` (equivalent) attributes, but more on that later.

### `append`

Pretty self-explanatory: if you need to add a parameter after instantiation, you can append it with this method.

In [28]:
dict_of_Parameters_input_parameter_set.append(Parameter({
    'name':'lahS',
    'discrete':True,
    'bounds':[1e-3,1e1],
    'bins':41,
    'parameter_type':'spectral',
    'default_value':0.1,
}))

In [29]:
dict_of_Parameters_input_parameter_set.keys()

dict_keys(['mass', 'alpha', 'lahS'])

### `items`, `keys`, `values`

As previously stated, the general behaviour of the `ParameterSet` class mimics that of the dictionary output of the `dict_of_parameters_by_name` attribute. So calling `items`, `keys`, or `values` on a `ParameterSet` is the same as calling them on this dictionary.

In [30]:
dict_of_Parameters_input_parameter_set.keys() # keys=names

dict_keys(['mass', 'alpha', 'lahS'])

In [31]:
for param in dict_of_Parameters_input_parameter_set.values():
    print(param)


{'discrete': True, 'scaling': 'log10', 'bounds': [0.5011872336272722, 1.9952623149688795], 'absolute_bounds': [0.1, 100.0], 'num_events': 100.0, 'dynamic_multiplier': 3.0, 'default_value': 1.0, 'bins': 31, 'prob_model': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x199284740>, 'axis': array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
       0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
       0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
       1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
       1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
       1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
       1.99526231]), 'transform_scale': 31, 'transform': <bound method Parameter.discrete_parameter_transform of {...}>, 'prior_id': nan, 'likelihood_id': nan, 'parameter_type': 'None', 'name': 'mass'}
{'discrete': True, 'scaling': 'log10', 'bounds': [0.08520182971663631, 0.339194593544709

### `axes`

If all the parameters within the set are discrete, you can also call this attribute, which returns the discrete value axes of all the parameters.

In [32]:
dict_of_Parameters_input_parameter_set.axes

[array([0.50118723, 0.52480746, 0.54954087, 0.57543994, 0.60255959,
        0.63095734, 0.66069345, 0.69183097, 0.72443596, 0.75857758,
        0.79432823, 0.83176377, 0.87096359, 0.91201084, 0.95499259,
        1.        , 1.04712855, 1.0964782 , 1.14815362, 1.20226443,
        1.25892541, 1.31825674, 1.38038426, 1.44543977, 1.51356125,
        1.58489319, 1.65958691, 1.73780083, 1.81970086, 1.90546072,
        1.99526231]),
 array([0.08520183, 0.08921727, 0.09342195, 0.09782479, 0.10243513,
        0.10726275, 0.11231789, 0.11761127, 0.12315411, 0.12895819,
        0.1350358 , 0.14139984, 0.14806381, 0.15504184, 0.16234874,
        0.17      , 0.17801185, 0.18640129, 0.19518612, 0.20438495,
        0.21401732, 0.22410365, 0.23466532, 0.24572476, 0.25730541,
        0.26943184, 0.28212977, 0.29542614, 0.30934915, 0.32392832,
        0.33919459]),
 array([1.000000e-03, 2.509750e-01, 5.009500e-01, 7.509250e-01,
        1.000900e+00, 1.250875e+00, 1.500850e+00, 1.750825e+00,
        2.00

### `defaults`

This attribute returns the default values of all the parameters within the set.

In [33]:
dict_of_Parameters_input_parameter_set.defaults

[1.0, 0.17, 0.1]

### `save` and `load`

Just as it should be for every class defined within this package, there are defined save and load methods for `ParameterSet` into a `.h5` file.

In [34]:
dict_of_Parameters_input_parameter_set.save('ex_parameter_set.h5')
loaded_dict_of_Parameters_input_parameter_set = ParameterSet.load(file_name='ex_parameter_set.h5')
os.system('rm -rf ex_parameter_set.h5')

0

In [35]:
loaded_dict_of_Parameters_input_parameter_set.keys()

dict_keys(['mass', 'alpha', 'lahS'])

And again, if you have custom functions saved, there exist equivalent pickle methods.

In [36]:
dict_of_Parameters_input_parameter_set.save_to_pickle('ex_parameter_set.pkl')
pickle_loaded_dict_of_Parameters_input_parameter_set = ParameterSet.load_from_pickle(file_name='ex_parameter_set.pkl')
os.system('rm -rf ex_parameter_set.pkl')

0

In [37]:
pickle_loaded_dict_of_Parameters_input_parameter_set.keys()

dict_keys(['mass', 'alpha', 'lahS'])