# Bayesian modelling

This is a template notebook for designing experiments for Bayesian modelling.

Author: {{ cookiecutter.author_name }}
Created: {{ cookiecutter.timestamp }}

In [0]:
# Link to project experiments folder hypothesis_experiment_learnings.board (refresh and hit enter on this line to see the link)

## How to use the notebook

The following cells:
- specify the model and its adjustable parameters, observable quantities, and targets,
- specify the possible designs,
- compute the expected information gain (EIG) of the suggested designs,
- help choose a design among the suggested designs,
- fit the model to the results from running the experiment with that design.

By default, the notebook is set up to run with an example. To see how it works, run the notebook (multiple times) without changing the code.

For your project, adapt the code in the linked cells to your objectives, variables, dataset, etc., and then execute all cells in order.

Please refer to bayesian_modelling.board for detailed instructions.

In [0]:
# <halerium id="2599e85f-7e0f-40aa-a4d3-87daf21aab7e">
# Link to bayesian_modelling.board
# </halerium id="2599e85f-7e0f-40aa-a4d3-87daf21aab7e">


### 1. Imports and general setup

In [0]:
import os

from datetime import datetime

import numpy as np
import pandas as pd

import halerium.core as hal
from halerium import InformationGainEstimator, show
from halerium.core import Graph, Entity, Variable, StaticVariable, connect_via_regression, get_posterior_model

### 2. Specify the data path

In [0]:
experiment_name = 'bayesian_modelling' # please provide a name for the experiment
# <halerium id="d28427bc-d519-4b0e-b492-d2c25622a19f">
data_dir = "./"           # please provide the directory in which the trial data for the experiment is stored
# </halerium id="d28427bc-d519-4b0e-b492-d2c25622a19f">

data_file_name = os.path.join(data_dir, f"data_{experiment_name}_running_trials.csv")
print(f"the trial data will be read from/stored in: {data_file_name}")

### 3. Specify the graph

A Halerium graph is the way to describe a statistical model in Halerium Inference (https://hal.erium.io/).

Please specify the Halerium graph of your model:

In [0]:
# <halerium id="b0cd4fab-5843-4eac-b530-6277efaf73b9">
with Graph("process_graph") as process_graph:
    
    with Entity("input_parameters") as input_parameters:
        x1 = Variable("x1", shape=(), mean=0, variance=1)
        x2 = Variable("x2", shape=(), mean=0, variance=1)
        x3 = Variable("x3", shape=(), mean=0, variance=1)
        
    with Entity("outcome") as outcome:
        cost    = Variable("cost"   , shape=(), mean=0, variance=0.1**2)
        quality = Variable("quality", shape=(), mean=0, variance=0.1**2)
        
    connect_via_regression(name_prefix="regression_parameter", 
                           inputs=[x1, x2, x3],
                           outputs=[cost, quality],
                           order=2,
                           include_cross_terms=True)
    
# </halerium id="b0cd4fab-5843-4eac-b530-6277efaf73b9">
show(process_graph)        
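`connect_via_regression` with `order=2` and `include_cross_terms=True` is understood here to relate each output to an intercept plus linear, squared, and pairwise interaction terms of the inputs. As a rough illustration of how many regression parameters that implies, the sketch below builds those regressors with NumPy. `quadratic_features` is a hypothetical helper, not part of Halerium, and Halerium's internal parameterization may differ.

```python
import itertools

import numpy as np


def quadratic_features(X, include_cross_terms=True):
    """Hypothetical helper: regressors of a full second-order polynomial model.

    Columns are ordered as [1, x_i..., x_i**2..., x_i*x_j for i < j...].
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n_samples, n_inputs = X.shape
    columns = [np.ones(n_samples)]                         # intercept
    columns += [X[:, i] for i in range(n_inputs)]          # linear terms
    columns += [X[:, i] ** 2 for i in range(n_inputs)]     # squared terms
    if include_cross_terms:
        for i, j in itertools.combinations(range(n_inputs), 2):
            columns.append(X[:, i] * X[:, j])              # pairwise interactions
    return np.column_stack(columns)


# Two example trials with three inputs (x1, x2, x3), as in the example graph
features = quadratic_features(np.array([[0.5, 5.0, 0.0], [1.0, 10.0, -5.0]]))
print(features.shape)  # (2, 10): 1 + 3 + 3 + 3 regressors
```

With the three inputs of the example graph this gives 10 regressors, so each of `cost` and `quality` would carry roughly 10 regression coefficients under this reading.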
        

### 4. Specify parameters, observables, and targets

In [0]:
# <halerium id="b50d2dc3-e657-4eb2-8a48-b22cf8e7b7bc">
parameters = [
    # please insert the names and bounds/values of the parameters to try:
    {
        "name": "x1",           # the name of the parameter
        "type": "range",        # the type of parameter: "range" is for continuous parameters
        "bounds": [0., 1.],     # the lower and upper bound of a range parameter, as a two-element list
        "n_values": 3,          # the number of different values to try for the range parameter
        "variable": x1          # the variable in the Halerium graph representing the parameter
    },
    {
        "name": "x2",
        "type": "range",
        "bounds": [0., 10.],
        "n_values": 3,
        "variable": x2
    },  
    {
        "name": "x3",
        "type": "range",
        "bounds": [-5., 5.],
        "n_values": 3,
        "variable": x3
    },
]
# </halerium id="b50d2dc3-e657-4eb2-8a48-b22cf8e7b7bc">

design_variables = [parameter["variable"] for parameter in parameters]
# <halerium id="b50d2dc3-e657-4eb2-8a48-b22cf8e7b7bc">
observables = [cost, quality]
# </halerium id="b50d2dc3-e657-4eb2-8a48-b22cf8e7b7bc">
targets = process_graph.get_all_variables(included_types=StaticVariable)
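The `n_values` entry suggests that each range parameter is discretized into a few candidate values within its `bounds`. Assuming evenly spaced levels (an assumption; `get_design_data` may choose its levels differently), the candidates could be computed as below. `parameter_levels` is a hypothetical helper, and the Halerium variables are left out so the snippet runs on its own.

```python
import numpy as np

# Same structure as the `parameters` list above, without the Halerium variables
example_parameters = [
    {"name": "x1", "type": "range", "bounds": [0., 1.], "n_values": 3},
    {"name": "x2", "type": "range", "bounds": [0., 10.], "n_values": 3},
    {"name": "x3", "type": "range", "bounds": [-5., 5.], "n_values": 3},
]


def parameter_levels(parameters):
    """Hypothetical helper: evenly spaced candidate values for each range parameter."""
    levels = {}
    for parameter in parameters:
        if parameter["type"] != "range":
            raise ValueError(f"unsupported parameter type: {parameter['type']}")
        lower, upper = parameter["bounds"]
        if not lower < upper:
            raise ValueError(f"invalid bounds for {parameter['name']}: {parameter['bounds']}")
        # n_values points including both bounds
        levels[parameter["name"]] = np.linspace(lower, upper, parameter["n_values"])
    return levels


levels = parameter_levels(example_parameters)
print(levels["x2"].tolist())  # [0.0, 5.0, 10.0]
```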


### 5. Specify the designs

In the example code in the cell below, the designs are various "classical" designs.

You can also specify your own designs. The format for each design is:

`{
    first_design_variable: list_of_its_values_for_each_trial,
    second_design_variable: list_of_its_values_for_each_trial,
    ...
 }`

Finally, all designs are collected into a list, which is passed to the EIG estimation algorithm.
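A minimal sketch of a custom design in the format above, assuming a full-factorial layout with three levels per parameter. Plain strings stand in for the Halerium variables `x1`, `x2`, `x3` so the snippet runs on its own; in the notebook the dictionary keys would be the variable objects themselves. `full_factorial_design` is a hypothetical helper.

```python
import itertools


def full_factorial_design(levels):
    """Hypothetical helper: map each design variable to its list of per-trial values.

    `levels` maps a key (a Halerium variable in the notebook) to the values to try.
    Every combination of values becomes one trial.
    """
    keys = list(levels)
    trials = list(itertools.product(*(levels[key] for key in keys)))
    return {key: [trial[i] for trial in trials] for i, key in enumerate(keys)}


# Strings stand in for the Halerium variables x1, x2, x3
design = full_factorial_design({
    "x1": [0.0, 0.5, 1.0],
    "x2": [0.0, 5.0, 10.0],
    "x3": [-5.0, 0.0, 5.0],
})
print(len(design["x1"]))  # 27 trials

# All designs are then collected in a list for the EIG estimation
custom_designs = [design]
```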

In [0]:
from functions.bayesian_modelling import get_design_data

# <halerium id="97fdffa5-e4dd-4368-bd6e-db4af17b96ad">
design_names = [
    "full_factorial",
    "simple_central",
    "mixed_central" ,
    "full_central"  ,    
]
# </halerium id="97fdffa5-e4dd-4368-bd6e-db4af17b96ad">

observable_names, design_datas, n_trials, designs = get_design_data(observables, parameters, design_names)

### 6. Compute the expected information gains

In [0]:
from functions.bayesian_modelling import show_eig

estimator = InformationGainEstimator(
    graph=process_graph,
    designs=designs,
    observables=observables,
    targets=targets,
)
eigs = estimator()

# <halerium id="857e04db-6059-4af9-a8c9-038fed6177e8">
show_eig(design_names, n_trials, eigs)
# </halerium id="857e04db-6059-4af9-a8c9-038fed6177e8">
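To make the EIG concrete: for a linear-Gaussian model $y = X\theta + \varepsilon$ with prior $\theta \sim \mathcal{N}(0, \Sigma)$ and noise $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, the EIG about $\theta$ has the closed form $\tfrac{1}{2}\log\det(I + X\Sigma X^\top/\sigma^2)$. `InformationGainEstimator` handles general graphs and presumably does not rely on this formula; the sketch below, with the hypothetical helper `linear_gaussian_eig`, only illustrates the quantity being estimated.

```python
import numpy as np


def linear_gaussian_eig(X, prior_cov, noise_var):
    """EIG (in nats) about theta for y = X @ theta + eps.

    Assumes theta ~ N(0, prior_cov) and eps ~ N(0, noise_var * I); the EIG is then
    the mutual information 0.5 * log det(I + X prior_cov X^T / noise_var).
    """
    X = np.asarray(X, dtype=float)
    n_trials = X.shape[0]
    m = np.eye(n_trials) + X @ prior_cov @ X.T / noise_var
    _, logdet = np.linalg.slogdet(m)  # m is symmetric positive definite
    return 0.5 * logdet


# Straight line y = theta0 + theta1 * x, two trials, unit prior and unit noise
prior_cov = np.eye(2)
X_spread = np.array([[1.0, -1.0], [1.0, 1.0]])    # trials at x = -1 and x = +1
X_clustered = np.array([[1.0, 0.0], [1.0, 0.0]])  # both trials at x = 0
eig_spread = linear_gaussian_eig(X_spread, prior_cov, noise_var=1.0)
eig_clustered = linear_gaussian_eig(X_clustered, prior_cov, noise_var=1.0)
print(eig_spread / 2, eig_clustered / 2)  # EIG per trial
```

Spreading the two trials to the ends of the range yields $\log 3$ nats, versus $\tfrac{1}{2}\log 3$ for placing both at $x = 0$. Because designs differ in their number of trials, dividing the EIG by the trial count, as in the last line, is one way to compare them.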


### 7. Choose the design

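Choose a design by setting the index into `design_datas` in the cell below. The entries presumably follow the order of `design_names`, so `0` selects `full_factorial` in the example.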

In [0]:
from functions.bayesian_modelling import show_design

# <halerium id="f9654476-f3d1-41e7-aece-32a56797dcf8">
design_data = design_datas[0]
# </halerium id="f9654476-f3d1-41e7-aece-32a56797dcf8">

show_design(data_dir, design_data, experiment_name)
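One possible selection rule, sketched below, is to pick the design with the highest EIG per trial, optionally restricted to designs that fit a trial budget. `choose_design` is a hypothetical helper; applying it to the notebook's `design_names`, `n_trials`, and `eigs` assumes that `eigs` is a flat sequence of numbers in the same order as `design_names`, and that `design_datas` follows that order too. The numbers in the example are made up for illustration.

```python
def choose_design(design_names, n_trials, eigs, max_trials=None):
    """Hypothetical helper: index and name of the design with the highest EIG per trial.

    Designs with more than `max_trials` trials are skipped when a budget is given.
    """
    candidates = [
        (float(eig) / trials, name, index)
        for index, (name, trials, eig) in enumerate(zip(design_names, n_trials, eigs))
        if max_trials is None or trials <= max_trials
    ]
    if not candidates:
        raise ValueError("no design fits within max_trials")
    _, name, index = max(candidates)  # largest EIG per trial wins
    return index, name


# Illustrative, made-up values (not results of this notebook)
index, name = choose_design(["design_A", "design_B"], [27, 15], [30.0, 20.0])
print(index, name)  # 1 design_B
```

In the notebook this would read `index, _ = choose_design(design_names, n_trials, eigs)` followed by `design_data = design_datas[index]`, under the ordering assumptions above.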

### 8. Run the trials

Run the experiment with the chosen design.

Note that the following cell generates synthetic trial results for demonstration purposes.

For real applications, either
 - replace the cell with appropriate code for retrieving the actual trial results, or 
 - remove the cell entirely, if you intend to add the trial results to the data files in a different way.


In [0]:
# <halerium id="1a9df2d3-b096-46a9-afcc-b80816a9171b">
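# read the trial table (presumably written by show_design above)
data = pd.read_csv(data_file_name, index_col="index")

# synthetic cost: a quadratic bowl with its minimum at x1 = 0.6, x2 = 7, x3 = 2
data["cost"] = (data["x1"] - 0.6)**2 + 0.1 * (data["x2"] - 7.)**2 + 0.3 * (data["x3"] - 2.)**2
# synthetic quality: a sigmoid in x2 rising from 0 to 2, centred at x2 = 2
data["quality"] = 2. / (1 + np.exp(-data["x2"] + 2))

# display(data)

# write the synthetic results back to the trial data file
data.to_csv(data_file_name)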
# </halerium id="1a9df2d3-b096-46a9-afcc-b80816a9171b">
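For real trials, the measured observables have to end up in the same CSV file. A minimal sketch, assuming the measurements arrive as a table indexed by the same trial `index` as the trial file: `merge_measurements` is a hypothetical helper that copies the observable columns across and rejects measurements for unknown trials. Trials without a measurement are left as missing values.

```python
import pandas as pd


def merge_measurements(trials, measurements, observable_names):
    """Hypothetical helper: copy measured observables into the trial table by index."""
    missing = set(measurements.index) - set(trials.index)
    if missing:
        raise KeyError(f"measurements for unknown trials: {sorted(missing)}")
    merged = trials.copy()
    for name in observable_names:
        if name not in merged.columns:
            merged[name] = float("nan")  # unmeasured trials stay missing
        # assignment aligns on the trial index
        merged.loc[measurements.index, name] = measurements[name]
    return merged


trials = pd.DataFrame(
    {"x1": [0.0, 0.5], "x2": [0.0, 5.0], "x3": [-5.0, 0.0]},
    index=pd.Index([0, 1], name="index"),
)
measured = pd.DataFrame({"cost": [1.2], "quality": [0.4]}, index=pd.Index([1], name="index"))
merged = merge_measurements(trials, measured, ["cost", "quality"])
print(merged)
```

In practice, `trials` would come from `pd.read_csv(data_file_name, index_col="index")`, `measured` from wherever the results are recorded, and the merged table would be written back with `merged.to_csv(data_file_name)`.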


Read the experiment's results and prepare them for fitting:

In [0]:
from functions.bayesian_modelling import show_experiment_results

# <halerium id="1a9df2d3-b096-46a9-afcc-b80816a9171b">
data_for_fit = show_experiment_results(data_file_name, design_variables, observables, parameters, observable_names)
# </halerium id="1a9df2d3-b096-46a9-afcc-b80816a9171b">


### 9. Fit the model

In [0]:
# <halerium id="4e65ea3a-27fe-45fa-aa19-32a3eb814c54">
method = "MAPFisher"  # inference method passed to get_posterior_model
# </halerium id="4e65ea3a-27fe-45fa-aa19-32a3eb814c54">

model = get_posterior_model(
    graph=process_graph,
    data=data_for_fit,
    method=method,
)

means = model.get_means({target.global_name: target for target in targets})
stds = model.get_standard_deviations({target.global_name: target for target in targets})
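`method="MAPFisher"` presumably combines a maximum a posteriori (MAP) point estimate with uncertainties derived from the curvature (Fisher information) of the log-posterior at that point, i.e. a Laplace-type approximation; this reading is an assumption. For a linear-Gaussian regression such an approximation is exact, which makes it easy to illustrate with the hypothetical helper below.

```python
import numpy as np


def linear_gaussian_map_fisher(X, y, prior_cov, noise_var):
    """MAP estimate and curvature-based standard deviations for y = X @ theta + eps.

    Assumes theta ~ N(0, prior_cov), eps ~ N(0, noise_var * I). The negative
    log-posterior is quadratic, so its Hessian (the precision below) is constant
    and the Laplace approximation equals the exact Gaussian posterior.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    precision = X.T @ X / noise_var + np.linalg.inv(prior_cov)
    cov = np.linalg.inv(precision)
    theta_map = cov @ (X.T @ y / noise_var)
    return theta_map, np.sqrt(np.diag(cov))


# Straight line through three points with a broad prior on (intercept, slope)
x = np.array([-1.0, 0.0, 1.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 3.0])
theta_mean, theta_std = linear_gaussian_map_fisher(X, y, prior_cov=100.0 * np.eye(2), noise_var=1.0)
print(theta_mean, theta_std)  # mean close to (2, 1), slightly shrunk toward 0 by the prior
```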

In [0]:
from functions.bayesian_modelling import show_model_results

# <halerium id="4e65ea3a-27fe-45fa-aa19-32a3eb814c54">
show_model_results(means, stds)
# </halerium id="4e65ea3a-27fe-45fa-aa19-32a3eb814c54">
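To see which targets are well determined by the data, the posterior means and standard deviations can be tabulated side by side. The sketch assumes `means` and `stds` are dictionaries mapping the same names to array-like values, which matches how they are requested above but is not confirmed. `summarize_posterior` is hypothetical, and the `|mean| > 2 std` flag is only a rough rule of thumb, not a formal test.

```python
import numpy as np
import pandas as pd


def summarize_posterior(means, stds, z=2.0):
    """Hypothetical helper: one row per scalar element of each target."""
    rows = []
    for name, mean in means.items():
        mean = np.ravel(np.asarray(mean, dtype=float))
        std = np.ravel(np.asarray(stds[name], dtype=float))
        for i, (m, s) in enumerate(zip(mean, std)):
            rows.append({
                "target": name,
                "element": i,
                "mean": m,
                "std": s,
                "abs_mean_over_std": abs(m) / s if s > 0 else np.inf,
            })
    table = pd.DataFrame(rows)
    # rough rule of thumb: posterior mass is clearly away from zero
    table["clearly_nonzero"] = table["abs_mean_over_std"] > z
    return table


# Illustrative, made-up values keyed by a hypothetical target name
example_means = {"regression_parameter_example": np.array([1.0, 0.1])}
example_stds = {"regression_parameter_example": np.array([0.2, 0.2])}
print(summarize_posterior(example_means, example_stds))
```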
