# Bayesian modelling

This is a template notebook for the design of experiments with Bayesian modelling.

Author: {{ cookiecutter.author_name }}
Created: {{ cookiecutter.timestamp }}

## How to use the notebook

The following cells:
- specify the model and the adjustable parameters, observable quantities, and targets,
- specify the possible designs,
- compute the expected information gain (EIG) for the suggested designs,
- help choose a design among the suggested designs,
- fit the model to the results from running the experiment with that design.

By default, the notebook is set up to run with an example. To see how it works, run the notebook (multiple times) without changing the code.

For your project, adjust the code in the linked cells to your objectives, variables, dataset, etc., and then execute all cells in order.

Please refer to bayesian_modelling.board for detailed instructions.

In [0]:
# Link to project experiments folder hypothesis_experiment_learnings.board (refresh and hit enter on this line to see the link)

## Imports and general setup

In [0]:
import os

from datetime import datetime

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.cm as cm

import halerium.core as hal
from halerium import InformationGainEstimator, show
from halerium.core import Graph, Entity, Variable, StaticVariable, connect_via_regression, get_posterior_model

import classical_designs

plt.style.use("dark_background")


## Project

In [0]:
experiment_name = '{{cookiecutter.use_case_name}}' # please provide a name for the optimization experiment
data_dir = "./"           # please provide the directory for saving the trial data of the experiment

data_file_name = os.path.join(data_dir,  f"data_{experiment_name}_running_trials.csv")
print(f"the trial data will be read from/stored in: {data_file_name}")


## The graph

A Halerium graph describes a statistical model in Halerium Inference (https://hal.erium.io/).
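
The example graph in this notebook connects three inputs to two outputs through a second-order regression with cross terms. To get a feeling for how many regression coefficients such a model has per output (and therefore roughly how many trials a design needs), the sketch below enumerates the monomials of a full polynomial. It is an assumption that the regression uses an intercept and these terms; the helper `polynomial_terms` is hypothetical and is not part of Halerium, whose internal parametrization may differ.

```python
from itertools import combinations_with_replacement


def polynomial_terms(input_names, order=2, include_cross_terms=True):
    """List the monomials of a polynomial regression (hypothetical helper)."""
    terms = ["1"]  # intercept; assumed, Halerium may handle it differently
    for degree in range(1, order + 1):
        for combo in combinations_with_replacement(input_names, degree):
            is_pure_power = len(set(combo)) == 1
            # keep mixed products such as x1*x2 only if cross terms are requested
            if include_cross_terms or is_pure_power:
                terms.append("*".join(combo))
    return terms


terms = polynomial_terms(["x1", "x2", "x3"], order=2, include_cross_terms=True)
terms_without_cross = polynomial_terms(["x1", "x2", "x3"], order=2, include_cross_terms=False)
print(len(terms), terms)
print(len(terms_without_cross), terms_without_cross)
```

Under these assumptions, a design with fewer distinct trials than terms cannot pin down all coefficients from the data alone, so the priors carry the remaining information.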

Please specify the Halerium graph of your model:

In [0]:
with Graph("process_graph") as process_graph:
    
    with Entity("input_parameters") as input_parameters:
        x1 = Variable("x1", shape=(), mean=0, variance=1)
        x2 = Variable("x2", shape=(), mean=0, variance=1)
        x3 = Variable("x3", shape=(), mean=0, variance=1)
        
    with Entity("outcome") as outcome:
        cost    = Variable("cost"   , shape=(), mean=0, variance=0.1**2)
        quality = Variable("quality", shape=(), mean=0, variance=0.1**2)
        
    connect_via_regression(name_prefix="regression_parameter", 
                           inputs=[x1, x2, x3],
                           outputs=[cost, quality],
                           order=2,
                           include_cross_terms=True)
    
show(process_graph)        
        

## Parameters, observables, targets

In [0]:
parameters = [
    # please insert the names and bounds/values of the parameters to try:
    {
        "name": "x1",           # the name of the parameter
        "type": "range",        # the type of parameter: "range" is for continuous parameters
        "bounds": [0., 1.],     # the lower and upper bounds of a range parameter, as a two-element list
        "n_values": 3,          # the number of different values for the range parameter to try
        "variable": x1          # the variable in the Halerium graph representing the parameter
    },
    {
        "name": "x2",
        "type": "range",
        "bounds": [0., 10.],
        "n_values": 3,
        "variable": x2
    },  
    {
        "name": "x3",
        "type": "range",
        "bounds": [-5., 5.],
        "n_values": 3,
        "variable": x3
    },
]

design_variables = [parameter["variable"] for parameter in parameters]
observables = [cost, quality]
targets = process_graph.get_all_variables(included_types=StaticVariable)


## Designs

In the example code in the cell below, the designs are various "classical" designs.

You can also specify your own designs. The format for each design is:

`{
    first_design_variable: list_of_its_values_for_each_trial,
    second_design_variable: list_of_its_values_for_each_trial,
    ...
 }`

Finally, all designs are put into a list, which is passed to the EIG estimation algorithm.
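
As a minimal sketch of a custom design in this format, the following builds a plain grid over evenly spaced parameter values. To stay self-contained it uses parameter names as dictionary keys; in the notebook the keys would be the Halerium variables (`parameter["variable"]`). The helpers `linspace` and `grid_design` are hypothetical and are not taken from the `classical_designs` module.

```python
from itertools import product


def linspace(lower, upper, n):
    """Return n evenly spaced values between lower and upper (inclusive)."""
    if n == 1:
        return [(lower + upper) / 2.0]
    step = (upper - lower) / (n - 1)
    return [lower + i * step for i in range(n)]


def grid_design(parameters):
    """Build one design: every combination of the levels of all parameters."""
    names = [p["name"] for p in parameters]
    levels = [linspace(p["bounds"][0], p["bounds"][1], p["n_values"]) for p in parameters]
    trials = list(product(*levels))  # one tuple of parameter values per trial
    return {name: [trial[i] for trial in trials] for i, name in enumerate(names)}


example_parameters = [
    {"name": "x1", "bounds": [0.0, 1.0], "n_values": 3},
    {"name": "x2", "bounds": [0.0, 10.0], "n_values": 2},
]
custom_design = grid_design(example_parameters)
print(custom_design)
```

Every list in the dictionary has one entry per trial, so all lists of one design must have the same length.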

In [0]:
design_names = [
    "full_factorial",
    "simple_central",
    "mixed_central" ,
    "full_central"  ,    
]

observable_names = [variable.name for variable in observables]

design_datas = [classical_designs.get_design(parameters, design_name, metrics=observable_names)
                 for design_name in design_names]
n_trials = [len(table) for table in design_datas]

designs = [{parameter["variable"]: design_data[parameter["name"]].values for parameter in parameters}
           for design_data in design_datas]


## Expected information gains
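
For intuition about what the EIG measures, consider a linear-Gaussian regression, where it has a closed form: with independent zero-mean Gaussian priors of variance `prior_variance` on the coefficients and Gaussian noise of variance `noise_variance`, the EIG of a design matrix `X` is `0.5 * log det(I + prior_variance / noise_variance * X @ X.T)` in nats. The sketch below evaluates this formula for two one-dimensional designs. It only illustrates the concept under these assumptions; it is not how `InformationGainEstimator` computes its result, and the function names and default variances are made up for the example.

```python
import numpy as np


def quadratic_features(x):
    """Map trial values x to the columns [1, x, x^2] of a design matrix."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])


def linear_gaussian_eig(design_matrix, prior_variance=1.0, noise_variance=0.1**2):
    """Closed-form EIG (in nats) about the coefficients of a linear-Gaussian model."""
    X = np.asarray(design_matrix, dtype=float)
    n_trials = X.shape[0]
    scaled_gram = (prior_variance / noise_variance) * (X @ X.T)
    _, log_det = np.linalg.slogdet(np.eye(n_trials) + scaled_gram)
    return 0.5 * log_det


# three trials clustered around 0.5 versus three trials spread over [0, 1]
eig_clustered = linear_gaussian_eig(quadratic_features([0.4, 0.5, 0.6]))
eig_spread = linear_gaussian_eig(quadratic_features([0.0, 0.5, 1.0]))
print(f"clustered: {eig_clustered:.3f} nats, spread: {eig_spread:.3f} nats")
```

The spread design yields the larger EIG: trials that cover the parameter range are expected to constrain the coefficients more than trials that nearly repeat each other.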

In [0]:
estimator = InformationGainEstimator(
    graph=process_graph,
    designs=designs,
    observables=observables,
    targets=targets,
)
eigs = estimator()


In [0]:
design_stats = pd.DataFrame(index=pd.Index((), name="index"))
design_stats["name"] = design_names
design_stats["n_trials"] = n_trials
design_stats["EIG"] = eigs
design_stats["EIG_per_trial"] = design_stats["EIG"] / design_stats["n_trials"]

display(design_stats)

i_largest_eig = design_stats["EIG"].argmax()
i_largest_eig_per_trial = design_stats["EIG_per_trial"].argmax()

# pd.Series.append was removed in pandas 2.0; pd.concat works in old and new versions
largest_eig_stats = pd.concat([pd.Series((i_largest_eig, ), index=("index",)), design_stats.loc[i_largest_eig]])
largest_eig_per_trial_stats = pd.concat([pd.Series((i_largest_eig_per_trial, ), index=("index",)), design_stats.loc[i_largest_eig_per_trial]])

print("\ndesign with largest EIG:")
display(largest_eig_stats)
 
print("\ndesign with largest EIG per trial:")
display(largest_eig_per_trial_stats)
                                                                                          

## Choose a design

Please choose a design.
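
One possible selection rule, sketched below, is to take the design with the largest EIG among those that fit into a trial budget. The helper `choose_design` and the `max_trials` budget are hypothetical, and the numbers are placeholders for illustration only, not results produced by this notebook.

```python
def choose_design(n_trials, eigs, max_trials=None):
    """Return the index of the design with the largest EIG within the trial budget."""
    candidates = [
        i for i in range(len(eigs))
        if max_trials is None or n_trials[i] <= max_trials
    ]
    if not candidates:
        raise ValueError("no design fits into the trial budget")
    return max(candidates, key=lambda i: eigs[i])


# placeholder values, not actual EIG estimates
example_names = ["design_a", "design_b", "design_c"]
example_n_trials = [27, 7, 15]
example_eigs = [30.0, 12.0, 24.0]

best_overall = choose_design(example_n_trials, example_eigs)
best_in_budget = choose_design(example_n_trials, example_eigs, max_trials=20)
print(example_names[best_overall], example_names[best_in_budget])
```

In the notebook, the returned index could replace the hard-coded `0` in `design_datas[0]`, assuming `eigs` can be indexed like a sequence.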

In [0]:
design_data = design_datas[0]
display(design_data)


data_file_name = os.path.join(data_dir,  f"data_{experiment_name}_running_trials.csv")
print(f"the data will be stored in: {data_file_name}")

if os.path.exists(data_file_name):
    dt = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    os.rename(data_file_name, os.path.join(data_dir,  f"data_{experiment_name}_running_trials_{dt}.csv"))

design_data.to_csv(data_file_name)


## Run the experiment

Run the experiment with the chosen design.

Note that the following cell contains code to invent trial results for demonstration purposes. 

For real applications, either
 - replace the cell with appropriate code for retrieving the actual trial results, or 
 - remove the cell entirely, if you intend to add the trial results to the data files in a different way.
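
If you replace the demonstration cell, the replacement essentially has to fill the observable columns of the trial table with measured values. A minimal sketch of that step, assuming the measurements are available as a dictionary keyed by trial index, could look as follows. The helper `merge_trial_results` is hypothetical and the numbers are placeholders, not real measurements.

```python
import pandas as pd


def merge_trial_results(design_table, measurements, observable_names):
    """Return a copy of the design table with the measured observables filled in."""
    results = pd.DataFrame.from_dict(measurements, orient="index")
    missing_trials = design_table.index.difference(results.index)
    if len(missing_trials) > 0:
        raise ValueError(f"no measurements for trials: {list(missing_trials)}")
    merged = design_table.copy()
    for name in observable_names:
        # align the measured values with the trial index of the design table
        merged[name] = results.loc[merged.index, name]
    return merged


design_table = pd.DataFrame(
    {"x1": [0.0, 1.0], "x2": [0.0, 10.0], "x3": [-5.0, 5.0]},
    index=pd.Index([0, 1], name="index"),
)
# placeholder measurements keyed by trial index
measurements = {
    0: {"cost": 1.2, "quality": 0.3},
    1: {"cost": 0.7, "quality": 1.9},
}
filled = merge_trial_results(design_table, measurements, ["cost", "quality"])
print(filled)
```

Writing the result with `to_csv` keeps the `index` column, so it can be read back with `index_col="index"` as in the cells of this notebook.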


In [0]:
data = pd.read_csv(data_file_name, index_col="index")

data["cost"] = (data["x1"] - 0.6)**2 + 0.1 * (data["x2"] - 7.)**2  + 0.3 *(data["x3"] - 2.)**2
data["quality"] = 2./(1 + np.exp(-data["x2"] + 2))

# display(data)

data.to_csv(data_file_name)


## Read the experiment's results

In [0]:
data = pd.read_csv(data_file_name, index_col="index")

display(data)

variables = design_variables + observables
data_for_variables = [data[parameter["name"]] for parameter in parameters] + [data[name] for name in observable_names]
data_for_fit = {variable: data_for_variable.values for variable, data_for_variable in zip(variables, data_for_variables)}


## Fit model

In [0]:
method="MAPFisher"

model = get_posterior_model(
    graph=process_graph,
    data=data_for_fit,
    method=method,
)

means = model.get_means({target.global_name: target for target in targets})
stds = model.get_standard_deviations({target.global_name: target for target in targets})


In [0]:
print("target means:")
for name, value in means.items():
    print(f"\n{name}:\n{value}")
    
print("\ntarget standard deviations:\n(note that some models don't compute standard deviations)")
for name, value in stds.items():
    print(f"\n{name}:\n{value}")    
