## How to statistically evaluate your data from ensemble runs

Assuming you went through 02_tutorial, there should already be some results to work with.

In [36]:
import os
SAVE_PATH_RAW = "./dummy/pymofatutorial"
print(os.listdir(SAVE_PATH_RAW)[:5])

['0o11_0o1_0o005_2o0_1000_s4.pkl', '0o1_0o1_0o005_2o0_1000_s4.pkl', '0o11_0o1_0o01_1o0_1000_s2.pkl', '0o09_0o1_0o1_2o0_1000_s0.pkl', '0o1_0o1_0o01_2o0_1000_s4.pkl']
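The raw file names appear to encode one parameter combination plus a sample index: each value is written with `.` replaced by `o`, the values are joined by `_`, and the name ends in `_s<k>`. This scheme is inferred from the listing above, not taken from pymofa's documentation. The hypothetical helper `decode_filename` is a minimal sketch that decodes such a name back into a parameter tuple and a sample number:

```python
import os


def decode_filename(fname):
    """Hypothetical decoder for raw-file names such as '0o11_0o1_0o005_2o0_1000_s4.pkl'.

    Assumes (inferred from the listing, not from pymofa's documentation)
    that '.' in each parameter value was written as 'o', that values are
    joined by '_', and that the last field is 's' plus the sample index.
    """
    stem = os.path.splitext(fname)[0]          # drop the '.pkl' extension
    fields = stem.split("_")
    params, sample = fields[:-1], fields[-1]   # last field is the sample tag
    values = tuple(float(p.replace("o", ".")) for p in params)
    return values, int(sample[1:])             # strip the leading 's'


print(decode_filename("0o11_0o1_0o005_2o0_1000_s4.pkl"))
# ((0.11, 0.1, 0.005, 2.0, 1000.0), 4)
```

All values come back as floats, so `time_length` reads as `1000.0`; it still compares equal to the integer `1000` used in the parameter lists.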


Now we want to run an MPI-accelerated statistical analysis of this data, using the `resave` routine of pymofa's `experiment_handling` class.
To do so, we again need to start an *ipcontroller* and some *engines* for ipyparallel:

$ ipcontroller

$ mpirun -n [number of engines] ipengine --mpi=mpi4py

and connect to them from the IPython kernel:

In [37]:
from ipyparallel import Client
c = Client()

Next, we need to set up the experiment handle from the previous experiment again. This time, however, we also pass the handle the optional path of a folder for the results of the statistical evaluation (separate from the folder holding the raw data):

In [38]:
%%px
# imports
from pymofa.experiment_handling import experiment_handling as eh
import numpy as np
import itertools as it
import pandas as pd
# import cPickle


# Define the experiment execution function.
# It takes the parameters you want to investigate, plus `filename` as the last parameter.
def RUN_FUNC(prey_birth_rate, coupling, predator_death_rate, initial_pop, time_length,
             filename):
    """Dummy RUN_FUNC just to make the INDEX (below) work."""
    
    exit_status = 42
    randomvarname = 'bla'

    return exit_status 

# Path where the simulated data are stored
SAVE_PATH_RAW = "./dummy/pymofatutorial/"

#path to folder for results of statistical evaluation
SAVE_PATH_RES = "./dummy/stateval_results/"

# Parameter combinations to investigate
prey_birth_rate = [0.09, 0.1, 0.11]
coupling = [0.1]
predator_death_rate = [0.005, 0.01, 0.05, 0.1]
initial_pop = [1.0, 2.0]
time_length = [1000]

PARAM_COMBS = list(it.product(prey_birth_rate, coupling, predator_death_rate, initial_pop, time_length))

# Sample Size
SAMPLE_SIZE = 5

# INDEX: position -> name of each run parameter (all arguments except `filename`)
INDEX = {i: RUN_FUNC.__code__.co_varnames[i] for i in range(RUN_FUNC.__code__.co_argcount - 1)}
print(INDEX)

# initiate handle instance with experiment variables
handle = eh(SAMPLE_SIZE, PARAM_COMBS, INDEX, SAVE_PATH_RAW, SAVE_PATH_RES)

[stdout:0] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
0
[stdout:1] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
0
[stdout:2] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
0
[stdout:3] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
0
[stdout:4] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
3
[stdout:5] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
2
[stdout:6] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
0
[stdout:7] 
{0: 'prey_birth_rate', 1: 'coupling', 2: 'predator_death_rate', 3: 'initial_pop', 4: 'time_length'}
1
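`INDEX` is derived purely from the signature of `RUN_FUNC`: `__code__.co_varnames` lists the positional arguments first (followed by local variables such as `exit_status`), and `co_argcount - 1` drops the trailing `filename`. The standalone sketch below reproduces that construction, cross-checks it against `inspect.signature` as an alternative, and counts how many parameter combinations and raw files the setup implies, assuming one raw file per combination and sample, as the listing of the raw folder suggests:

```python
import inspect
import itertools as it


def RUN_FUNC(prey_birth_rate, coupling, predator_death_rate, initial_pop,
             time_length, filename):
    """Dummy run function with the same signature as in the tutorial."""
    exit_status = 42  # local variable: appears in co_varnames after the arguments
    return exit_status


# Same construction as in the tutorial: argument names, minus `filename`.
code = RUN_FUNC.__code__
INDEX = {i: code.co_varnames[i] for i in range(code.co_argcount - 1)}

# Alternative: read the parameter names from inspect.signature.
names = list(inspect.signature(RUN_FUNC).parameters)[:-1]
assert INDEX == dict(enumerate(names))

PARAM_COMBS = list(it.product([0.09, 0.1, 0.11], [0.1],
                              [0.005, 0.01, 0.05, 0.1], [1.0, 2.0], [1000]))
SAMPLE_SIZE = 5

print(INDEX)
print(len(PARAM_COMBS), len(PARAM_COMBS) * SAMPLE_SIZE)  # 24 120
```

`inspect.signature` requires Python 3.3 or later; the `__code__` route used in the tutorial also works on Python 2.6+.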


The `resave` routine of the experiment handle takes two inputs: a dictionary of callables and a filename for the results. Internally, the experiment handle keeps track of the simulation results as a list of filenames, and this list is what each callable passed to `resave` receives as its input.
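To make that calling convention concrete, here is a minimal, hypothetical stand-in written with the standard library only. It is not pymofa's implementation: the name `apply_operators` and the grouping of filenames by parameter combination are assumptions made for illustration. The only point it shows is that every callable in the dictionary receives a plain list of filenames and returns one result per list:

```python
def apply_operators(operators, fname_lists):
    """Hypothetical stand-in for the calling convention described above.

    operators:   dict mapping a name to a callable taking a list of filenames
    fname_lists: dict mapping a parameter combination to its list of
                 filenames (an assumed grouping, for illustration only)
    Returns a dict {(operator name, combination): result}.
    """
    results = {}
    for name, op in operators.items():
        for comb, fnames in fname_lists.items():
            results[(name, comb)] = op(fnames)
    return results


# Toy operators and filenames, purely illustrative.
ops = {"count": len, "first": lambda fnames: sorted(fnames)[0]}
files = {(0.1, 2.0): ["a_s1.pkl", "a_s0.pkl"]}
print(apply_operators(ops, files))
```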

Note that the callables below are designed to handle pandas DataFrames. More precisely, for each list of filenames they load the corresponding DataFrames into a list and concatenate them into a single DataFrame. The DataFrame's `groupby` method then groups all rows by their value on the first index level (the timestep in our case) and applies either a mean or a standard-error-of-the-mean estimator to each group.
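The concatenate-then-group step can be tried on in-memory data. In the sketch below, three small DataFrames with made-up numbers stand in for loaded sample runs, indexed by timestep; the column names are illustrative only. After `pd.concat` each timestep label appears once per run, so `groupby(level=0)` collects the samples belonging to the same timestep:

```python
import pandas as pd

# Three hypothetical sample runs, indexed by timestep (made-up numbers).
runs = [
    pd.DataFrame({"prey": [1.0, 2.0], "predator": [0.5, 0.7]}, index=[0, 1]),
    pd.DataFrame({"prey": [3.0, 4.0], "predator": [0.5, 0.9]}, index=[0, 1]),
    pd.DataFrame({"prey": [5.0, 6.0], "predator": [0.5, 1.1]}, index=[0, 1]),
]

stacked = pd.concat(runs)            # 6 rows; timestep labels 0 and 1 repeat
grouped = stacked.groupby(level=0)   # one group per timestep

mean = grouped.mean()                # ensemble mean per timestep
sem = grouped.sem()                  # sample std (ddof=1) / sqrt(n) per timestep

print(mean)
print(sem)
```

For `prey` at timestep 0 the samples are 1, 3 and 5, so the mean is 3 and the standard error is 2/√3 ≈ 1.155.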

In [39]:
%%px
filename = "stateval_results"

def sem(fnames):
    """Calculate the standard error of the mean for the data in the files
    listed in fnames.

    Parameters
    ----------
    fnames: list of str
        Filenames of pickled DataFrames containing simulation results.

    Returns
    -------
    sem: pandas.DataFrame
        Standard error of the mean of the data in the given files,
        computed per value of the first index level (the timestep).
    """
    import pandas as pd

    return pd.concat([pd.read_pickle(f) for f in fnames]).groupby(level=0).sem()
    

# Callables can be functions, lambda expressions, etc.
EVA = {"sem": sem,
       "mean": lambda fnames: pd.concat([pd.read_pickle(f) for f in fnames]).groupby(level=0).mean()}

handle.resave(EVA, filename)

[stdout:0] 
processing  stateval_results
under operators  ['sem', 'mean']
Post-processing done
[stdout:1] 
processing  stateval_results
under operators  ['sem', 'mean']
Post-processing done
[stdout:2] 
processing  stateval_results
under operators  ['sem', 'mean']
Post-processing done
[stdout:3] 
processing  stateval_results
under operators  ['sem', 'mean']
Post-processing done
[stdout:6] 
processing  stateval_results
under operators  ['sem', 'mean']
Post-processing 3 ... [2%] Post-processing 1 ... [4%] Post-processing 2 ... [6%] Post-processing 3 ... [8%] Post-processing 3 ... [10%] Post-processing 1 ... [12%] Post-processing 2 ... [15%] Post-processing 3 ... [17%] Post-processing 2 ... [19%] Post-processing 1 ... [21%] Post-processing 3 ... [23%] Post-processing 3 ... [25%] Post-processing 1 ... [27%] Post-processing 2 ... [28%] Post-processing 3 ... [31%] Post-processing 3 ... [33%] Post-processing 1 ... [35%] Post-processing 2 ... [38%] Post-processing 3 ... [40%]
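Because the callables only ever see a list of filenames, they can be checked locally, without pymofa or an MPI cluster, before being handed to `resave`. The sketch below assumes, as described above, that the raw files are pickled pandas DataFrames indexed by timestep. It writes a few such files to a temporary directory and applies an `EVA`-style dictionary to them; the file names, the `prey` column and the helper `load_all` are hypothetical:

```python
import os
import tempfile

import pandas as pd


def load_all(fnames):
    """Hypothetical helper: load pickled DataFrames and stack them."""
    return pd.concat([pd.read_pickle(f) for f in fnames])


# Same structure as the tutorial's EVA dictionary.
EVA = {"sem": lambda fnames: load_all(fnames).groupby(level=0).sem(),
       "mean": lambda fnames: load_all(fnames).groupby(level=0).mean()}

with tempfile.TemporaryDirectory() as tmp:
    fnames = []
    for k, offset in enumerate([0.0, 1.0, 2.0]):
        # One dummy "sample run" per file, indexed by timestep 0 and 1.
        df = pd.DataFrame({"prey": [1.0 + offset, 2.0 + offset]}, index=[0, 1])
        path = os.path.join(tmp, "dummy_s%d.pkl" % k)
        df.to_pickle(path)
        fnames.append(path)
    # Evaluate while the temporary files still exist.
    results = {name: op(fnames) for name, op in EVA.items()}

print(results["mean"])
print(results["sem"])
```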