# Experiments 🧪

A key part of any computer simulation study is **experimentation**.  A set of experiments is conducted to understand the system under study and to find improvements to it.  In essence, each experiment varies the model's inputs or its process logic.

We can do this manually, but as we develop a model the number of input parameters will increase. 

💡 There are several data structures you might employ to organise parameters.

* a python dictionary
* a custom parameter class
* a dataclass

All of these approaches work well and it really is a matter of judgement on what you prefer. One downside of a python `dict` and a custom class is that they are both mutable (although a class can have custom properties where users can only access 'viewable' attributes).  A dataclass can easily be made immutable and requires less code than a custom class, but has the downside that its syntax is a little less pythonic. Here we will build a parameter class called `Experiment`.  
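
For comparison, here is a minimal sketch of the dataclass option. The class name `Parameters` and its two fields are hypothetical and chosen only for illustration; the rest of this notebook uses the custom `Experiment` class instead.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Parameters:
    """Hypothetical immutable parameter container (illustration only)."""

    n_operators: int = 13
    mean_iat: float = 0.6


base = Parameters()

# replace() builds a modified copy; the original object is left untouched
extra = replace(base, n_operators=14)

# attempting to change a frozen dataclass raises an exception
try:
    base.n_operators = 20
except FrozenInstanceError:
    print("Parameters are read-only once created")
```

The trade-off is that every variation needs a new object, which is usually what we want when comparing scenarios.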

> ☺️ We will also use this re-organisation of code to eliminate our global variables!

## 1. Imports

In [None]:
import numpy as np
import pandas as pd
import simpy
import itertools

## 2. Notebook level variables, constants, and default values

A useful first step when setting up a simulation model is to define the base case or as-is parameters.  Here we will create a set of constant/default values for our `Experiment` class, but you could also consider reading these in from a file.
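
If you prefer to read the defaults from a file, one possible approach is a small JSON document. The sketch below is an assumption rather than part of the model: the file name `defaults.json` and its keys are hypothetical, and a temporary directory is used so that the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path

# hypothetical file contents; the keys mirror the default values used here
defaults = {
    "n_operators": 13,
    "mean_iat": 0.6,
    "call_low": 5.0,
    "call_mode": 7.0,
    "call_high": 10.0,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "defaults.json"

    # write the file, then read it back as a dictionary
    path.write_text(json.dumps(defaults, indent=2))
    loaded = json.loads(path.read_text())

print(loaded["n_operators"], loaded["mean_iat"])
```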

In [None]:
# default resources
N_OPERATORS = 13

# default mean inter-arrival time (exp)
MEAN_IAT = 60 / 100

# default service time parameters (triangular)
CALL_LOW = 5.0
CALL_MODE = 7.0
CALL_HIGH = 10.0

# sampling settings
N_STREAMS = 2
DEFAULT_RND_SET = 0

# Boolean switch to print simulation events as the model runs
TRACE = False

# run variables
RESULTS_COLLECTION_PERIOD = 1000

## 3. Distribution classes

We will define two distribution classes (`Triangular` and `Exponential`) to encapsulate the random number generation, parameters and random seeds used in the sampling.  This simplifies what we will need to include in the `Experiment` class and as we will see later makes it easier to vary distributions as well as parameters.

In [None]:
class Triangular:
    """
    Convenience class for the triangular distribution.
    packages up distribution parameters, seed and random generator.
    """

    def __init__(self, low, mode, high, random_seed=None):
        """
        Constructor. Accepts and stores parameters of the triangular dist
        and a random seed.

        Params:
        ------
        low: float
            The smallest value that can be sampled

        mode: float
            The most frequently sampled value

        high: float
            The highest value that can be sampled

        random_seed: int | SeedSequence, optional (default=None)
            Used with params to create a series of repeatable samples.
        """
        self.rand = np.random.default_rng(seed=random_seed)
        self.low = low
        self.high = high
        self.mode = mode

    def sample(self, size=None):
        """
        Generate one or more samples from the triangular distribution

        Params:
        --------
        size: int
            the number of samples to return.  If size=None then a single
            sample is returned.

        Returns:
        -------
        float or np.ndarray (if size >=1)
        """
        return self.rand.triangular(self.low, self.mode, self.high, size=size)

In [None]:
class Exponential:
    """
    Convenience class for the exponential distribution.
    packages up distribution parameters, seed and random generator.
    """

    def __init__(self, mean, random_seed=None):
        """
        Constructor

        Params:
        ------
        mean: float
            The mean of the exponential distribution

        random_seed: int| SeedSequence, optional (default=None)
            A random seed to reproduce samples.  If set to None then samples
            will differ each time the object is created.
        """
        self.rand = np.random.default_rng(seed=random_seed)
        self.mean = mean

    def sample(self, size=None):
        """
        Generate a sample from the exponential distribution

        Params:
        -------
        size: int, optional (default=None)
            the number of samples to return.  If size=None then a single
            sample is returned.

        Returns:
        -------
        float or np.ndarray (if size >=1)
        """
        return self.rand.exponential(self.mean, size=size)
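
The point of storing a seeded generator in each class is repeatability. The sketch below checks this with NumPy directly, so that it runs on its own; the seed value 42 is arbitrary and the distribution parameters are the defaults from this notebook.

```python
import numpy as np

# two generators created with the same seed
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# the same seed and parameters give identical samples
samples_a = rng_a.exponential(0.6, size=3)
samples_b = rng_b.exponential(0.6, size=3)
print(np.array_equal(samples_a, samples_b))

# triangular samples always fall between low and high
calls = np.random.default_rng(seed=42).triangular(5.0, 7.0, 10.0, size=1000)
print(calls.min() >= 5.0, calls.max() <= 10.0)
```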

## 3. Experiment class

An experiment class is useful because it allows us to easily configure and schedule a large number of experiments to occur in a loop.  We set the class up so that it uses the default variables we defined above, i.e. by default the model reflects the as-is process.  To run a new experiment we simply override the default values.

In [None]:
class Experiment:
    """
    Encapsulates the concept of an experiment 🧪 with the urgent care
    call centre simulation model.

    An Experiment:
    1. Contains a list of parameters that can be left as defaults or varied
    2. Provides a place for the experimenter to record results of a run 
    3. Controls the set & streams of pseudo random numbers used in a run.
    
    """

    def __init__(
        self,
        random_number_set=DEFAULT_RND_SET,
        n_operators=N_OPERATORS,
        mean_iat=MEAN_IAT,
        call_low=CALL_LOW,
        call_mode=CALL_MODE,
        call_high=CALL_HIGH,
        n_streams=N_STREAMS,
    ):
        """
        The init method sets up our defaults.
        """
        # sampling
        self.random_number_set = random_number_set
        self.n_streams = n_streams
        
        # store parameters for the run of the model
        self.n_operators = n_operators
        self.mean_iat = mean_iat
        self.call_low = call_low
        self.call_mode = call_mode
        self.call_high = call_high
        
        # resources: we must init resources after an Environment is created.
        # But we will store a placeholder for transparency
        self.operators = None

        # initialise results to zero
        self.init_results_variables()

        # initialise sampling objects
        self.init_sampling()

    def set_random_no_set(self, random_number_set):
        """
        Controls the random sampling
        Parameters:
        ----------
        random_number_set: int
            Used to control the set of pseudo random numbers used by 
            the distributions in the simulation.
        """
        self.random_number_set = random_number_set
        self.init_sampling()

    def init_sampling(self):
        """
        Create the distributions used by the model and initialise
        the random seeds of each.
        """
        # produce n non-overlapping streams
        seed_sequence = np.random.SeedSequence(self.random_number_set)
        self.seeds = seed_sequence.spawn(self.n_streams)

        # create distributions

        # call inter-arrival times
        self.arrival_dist = Exponential(
            self.mean_iat, random_seed=self.seeds[0]
        )

        # duration of call triage
        self.call_dist = Triangular(
            self.call_low,
            self.call_mode,
            self.call_high,
            random_seed=self.seeds[1],
        )

    def init_results_variables(self):
        """
        Initialise all of the experiment variables used in results
        collection.  This method is called at the start of each run
        of the model
        """
        # variable used to store results of experiment
        self.results = {}
        self.results["waiting_times"] = []

        # total operator usage time for utilisation calculation.
        self.results["total_call_duration"] = 0.0
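
To see what `init_sampling` does with the random number set, the sketch below repeats the seeding steps outside the class. `make_streams` is a hypothetical helper written for this illustration only; it is not part of the model.

```python
import numpy as np


def make_streams(random_number_set, n_streams=2):
    """Hypothetical helper: one generator per stream for a given set."""
    seeds = np.random.SeedSequence(random_number_set).spawn(n_streams)
    return [np.random.default_rng(seed) for seed in seeds]


arrivals_0, calls_0 = make_streams(0)
arrivals_0_again, _ = make_streams(0)
arrivals_1, _ = make_streams(1)

first = arrivals_0.exponential(0.6, size=5)

# the same random number set reproduces the arrival stream
same_set = np.array_equal(first, arrivals_0_again.exponential(0.6, size=5))

# a different random number set gives a different arrival stream
other_set = np.array_equal(first, arrivals_1.exponential(0.6, size=5))

print(same_set, other_set)
```

Giving arrivals and call durations their own stream means that changing one distribution does not disturb the numbers drawn by the other.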

### 3.1. Creating a default experiment

Using `Experiment` is simple.  For example, to create a default experiment (one that uses all the default parameter values) we would use the following code:

In [None]:
env = simpy.Environment()
default_experiment = Experiment()

Because the parameters and distributions are stored as attributes, we can access all of the experiment variables from the `default_experiment` object. For example, the following code will generate an inter-arrival time:

In [None]:
default_experiment.arrival_dist.sample()

In [None]:
default_experiment.mean_iat

### 3.2 Creating an experiment with more call operators

To change parameters in an experiment we just need to include a new value when we create the `Experiment`.  For example, to increase the number of operators to 14 we use the following code:

In [None]:
env = simpy.Environment()
extra_server = Experiment(n_operators=14)

In [None]:
extra_server.n_operators

## 4. Modified model code

We will modify the model code and logic that we have already developed.  The functions for service and arrivals will now accept an `Experiment` argument.

> Note that at this point you could put all of the code into a python module and import the functions and classes you need into an experiment workbook.

In [None]:
def trace(msg):
    """
    Turn printing of events on and off.

    Params:
    -------
    msg: str
        string to print to screen.
    """
    if TRACE:
        print(msg)

In [None]:
def service(identifier, env, args):
    """
    simulates the service process for a call operator

    1. request and wait for a call operator
    2. phone triage (triangular)
    3. exit system

    Params:
    ------

    identifier: int
        A unique identifier for this caller

    env: simpy.Environment
        The current environment the simulation is running in
        We use this to pause and restart the process after a delay.

    args: Experiment
        The settings and input parameters for the current experiment

    """

    # record the time that call entered the queue
    start_wait = env.now

    # MODIFICATION: request an operator - stored in the Experiment
    with args.operators.request() as req:
        yield req

        # record the waiting time for call to be answered
        waiting_time = env.now - start_wait

        # ######################################################################
        # MODIFICATION: store the results for an experiment
        args.results["waiting_times"].append(waiting_time)
        # ######################################################################

        trace(f"operator answered call {identifier} at " + f"{env.now:.3f}")

        # ######################################################################
        # MODIFICATION: the sample distribution is defined by the experiment.
        call_duration = args.call_dist.sample()
        # ######################################################################

        # schedule process to begin again after call_duration
        yield env.timeout(call_duration)

        # update the total call_duration
        args.results["total_call_duration"] += call_duration

        # print out information for the call.
        trace(
            f"call {identifier} ended {env.now:.3f}; "
            + f"waiting time was {waiting_time:.3f}"
        )

In [None]:
def arrivals_generator(env, args):
    """
    Generates call arrivals.  The inter-arrival time (IAT) is
    exponentially distributed.

    Parameters:
    ------
    env: simpy.Environment
        The simpy environment for the simulation

    args: Experiment
        The settings and input parameters for the simulation.
    """
    # use itertools as it provides an infinite loop
    # with a counter variable that we can use for unique Ids
    for caller_count in itertools.count(start=1):

        # ######################################################################
        # MODIFICATION:the sample distribution is defined by the experiment.
        inter_arrival_time = args.arrival_dist.sample()
        ########################################################################

        yield env.timeout(inter_arrival_time)

        trace(f"call arrives at: {env.now:.3f}")

        # ######################################################################
        # MODIFICATION: we pass the experiment to the service function
        env.process(service(caller_count, env, args))
        # ######################################################################

## 5. A single run wrapper function



In [None]:
def single_run(experiment, rep=0, rc_period=RESULTS_COLLECTION_PERIOD):
    """
    Perform a single run of the model and return the results

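    Parameters:
    -----------
    experiment: Experiment
        The experiment/parameters to use with model

    rep: int, optional (default=0)
        The replication number. Used as the random number set for the run.

    rc_period: float, optional (default=RESULTS_COLLECTION_PERIOD)
        The results collection period, i.e. the run length in minutes.
    """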

    # results dictionary.  Each KPI is a new entry.
    run_results = {}

    # reset all result collection variables
    experiment.init_results_variables()

    # set random number set to the replication no.
    # this controls sampling for the run.
    experiment.set_random_no_set(rep)

    # environment is (re)created inside single run
    env = simpy.Environment()

    # we create simpy resource here - this has to be after we
    # create the environment object.
    experiment.operators = simpy.Resource(env, capacity=experiment.n_operators)

    # we pass the experiment to the arrivals generator
    env.process(arrivals_generator(env, experiment))
    env.run(until=rc_period)

    # end of run results: calculate mean waiting time
    run_results["01_mean_waiting_time"] = np.mean(
        experiment.results["waiting_times"]
    )

    # end of run results: calculate mean operator utilisation
    run_results["02_operator_util"] = (
        experiment.results["total_call_duration"]
        / (rc_period * experiment.n_operators)
    ) * 100.0

    # return the results from the run of the model
    return run_results

In [None]:
TRACE = False
default_scenario = Experiment()
results = single_run(default_scenario)
print(
    f"Mean waiting time: {results['01_mean_waiting_time']:.2f} mins \n"
    + f"Operator Utilisation {results['02_operator_util']:.2f}%"
)
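
As a rough sanity check on the simulated utilisation, we can compare it with a back-of-envelope figure computed from the default parameters. This is a long-run approximation that assumes a stable queue. The simulated value will differ a little because the model starts empty and calls still in progress at the end of the run are not counted.

```python
# default parameters from this notebook
n_operators = 13
mean_iat = 60 / 100  # minutes between calls
call_low, call_mode, call_high = 5.0, 7.0, 10.0

# the mean of a triangular distribution is (low + mode + high) / 3
mean_call = (call_low + call_mode + call_high) / 3

# offered load: the average number of operators busy at any moment
offered_load = mean_call / mean_iat

# long-run utilisation as a percentage of available operator time
expected_util = offered_load / n_operators * 100.0
print(f"Expected utilisation is roughly {expected_util:.1f}%")
```

With the defaults this gives about 94%, so a simulated utilisation far from that figure would suggest a mistake in the model or its inputs.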

## Multiple Replications

In [None]:
def multiple_replications(
    experiment, rc_period=RESULTS_COLLECTION_PERIOD, n_reps=5
):
    """
    Perform multiple replications of the model.

    Params:
    ------
    experiment: Experiment
        The experiment/parameters to use with model

    rc_period: float, optional (default=RESULTS_COLLECTION_PERIOD)
        results collection period.
        the number of minutes to run the model to collect results

    n_reps: int, optional (default=5)
        Number of independent replications to run.

    Returns:
    --------
    pandas.DataFrame
    """

    # loop over single run to generate results dicts in a python list.
    results = [single_run(experiment, rep, rc_period) for rep in range(n_reps)]

    # format and return results in a dataframe
    df_results = pd.DataFrame(results)
    df_results.index = np.arange(1, len(df_results) + 1)
    df_results.index.name = "rep"
    return df_results

In [None]:
TRACE = False
default_scenario = Experiment()
results = multiple_replications(default_scenario)
results

## 6. Multiple experiments 🧪🧪🧪

The `single_run` wrapper function for the model and the `Experiment` class mean that it is very simple to run multiple experiments.  We will define three new functions for running multiple experiments and summarising their results:

* `get_experiments()` - this will return a python dictionary containing a unique name for an experiment paired with an `Experiment` object
* `run_all_experiments()` - this will loop through the dictionary, run all experiments and return combined results.
* `experiment_summary_frame()` - take the results from each scenario and format into a simple table.

In [None]:
def get_experiments():
    """
    Creates a dictionary object containing
    objects of type `Experiment` 🧪 to run.

    Returns:
    --------
    dict
        Contains the experiments for the model
    """
    experiments = {}

    # base (default) case
    experiments["base"] = Experiment()

    # +1 extra capacity
    experiments["operators+1"] = Experiment(
        n_operators=N_OPERATORS + 1,
    )

    return experiments
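
When the number of scenarios grows, typing each one by hand becomes tedious. The sketch below shows one possible way to generate every combination of a few parameter levels with `itertools.product`. The levels and the naming scheme are hypothetical, and the code builds dictionaries of keyword arguments rather than `Experiment` objects so that it runs on its own.

```python
import itertools

# hypothetical levels for two Experiment arguments
levels = {
    "n_operators": [13, 14, 15],
    "mean_iat": [0.6, 0.55],
}

scenario_kwargs = {}
for combo in itertools.product(*levels.values()):
    # pair each argument name with its value for this combination
    params = dict(zip(levels.keys(), combo))

    # build a readable, unique name such as "n_operators=14_mean_iat=0.55"
    name = "_".join(f"{key}={value}" for key, value in params.items())
    scenario_kwargs[name] = params

print(len(scenario_kwargs))
```

Inside `get_experiments()` each entry could then be turned into an experiment with `Experiment(**params)`.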

In [None]:
def run_all_experiments(experiments, rc_period=RESULTS_COLLECTION_PERIOD):
    """
    Run each of the scenarios for a specified results
    collection period and replications.

    Params:
    ------
    experiments: dict
        dictionary of Experiment objects

    rc_period: float
        model run length

    """
    print("Model experiments:")
    print(f"No. experiments to execute = {len(experiments)}\n")

    experiment_results = {}
    for exp_name, experiment in experiments.items():

        print(f"Running {exp_name}", end=" => ")
        results = multiple_replications(experiment, rc_period)
        print("done.\n")

        # save the results
        experiment_results[exp_name] = results

    print("All experiments are complete.")

    # format the results
    return experiment_results

In [None]:
# get the experiments
experiments = get_experiments()

# run the scenario analysis
experiment_results = run_all_experiments(experiments)

In [None]:
experiment_results["operators+1"]

In [None]:
def experiment_summary_frame(experiment_results):
    """
    Mean results for each performance measure by experiment

    Parameters:
    ----------
    experiment_results: dict
        dictionary of replications.
        Key identifies the experiment; each value is a DataFrame
        of replication results.

    Returns:
    -------
    pd.DataFrame
    """
    columns = []
    summary = pd.DataFrame()
    for sc_name, replications in experiment_results.items():
        summary = pd.concat([summary, replications.mean()], axis=1)
        columns.append(sc_name)

    summary.columns = columns
    return summary

In [None]:
# as well as rounding you may want to rename the cols/rows to
# more readable alternatives.
summary_frame = experiment_summary_frame(experiment_results)
summary_frame.round(2)

## Load and run multiple experiments from a CSV

The `Experiment` class provides a simple way to run multiple experiments in a batch. To do so we can create multiple instances of Experiment, each with a different set of inputs for the model. These are then executed in a loop.

### Formatting experiment files

In the format used here each row represents an experiment. The first column is a unique numeric identifier, the second column is a name given to the experiment, and the following $n$ columns represent the optional input variables that can be passed to an `Experiment`.

Note that the method described here relies on the names of these columns matching the input parameters to `Experiment`.

   >  But note that columns do not need to be in the same order as Experiment arguments and they do not need to be exhaustive. A selection works fine.
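
The reason column order and completeness do not matter is that the values are passed as keyword arguments. A minimal sketch, using a hypothetical stand-in function `make_experiment` in place of `Experiment`:

```python
def make_experiment(n_operators=13, mean_iat=0.6, call_low=5.0):
    """Hypothetical stand-in for Experiment.__init__ (illustration only)."""
    return {
        "n_operators": n_operators,
        "mean_iat": mean_iat,
        "call_low": call_low,
    }


# columns in a different order, with call_low left out
row = {"mean_iat": 0.55, "n_operators": 14}
settings = make_experiment(**row)
print(settings)

# a column name that does not match an argument is rejected
try:
    make_experiment(**{"n_operator": 14})
except TypeError as error:
    print(f"Rejected: {error}")
```

Omitted arguments fall back to their defaults, while a misspelt column name raises a `TypeError` instead of being silently ignored.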

For example, in the urgent care call centre we will include two columns with the names:

* n_operators
* mean_iat

The function `create_example_csv()` creates such a file containing four experiments that vary these parameters.

In [None]:
def create_example_csv(filename="example_experiments.csv"):
    """
    Create an example CSV file to use in tutorial.
    This creates 4 experiments that vary
    n_operators and mean_iat.

    Params:
    ------
    filename: str, optional (default='example_experiments.csv')
        The name and path to the CSV file.
    """
    # each column is defined as a separate list
    names = ["base", "op+1", "high_demand", "combination"]
    operators = [13, 14, 13, 14]
    mean_iat = [0.6, 0.6, 0.55, 0.55]

    # empty dataframe
    df_experiments = pd.DataFrame()

    # create new columns from lists
    df_experiments["experiment"] = names
    df_experiments["n_operators"] = operators
    df_experiments["mean_iat"] = mean_iat

    df_experiments.to_csv(filename, index_label="id")

In [None]:
create_example_csv()

# load and display the experiments
pd.read_csv("example_experiments.csv", index_col="id")

### Converting the CSV to instances of `Experiment`

The code above loads the experiments into a `pd.DataFrame` and displays them. Converting the rows to `Experiment` objects is a two-step process.

* We cast the `DataFrame` to a nested python dictionary. Each key in the dictionary is the id of an experiment (the `DataFrame` index). The value is another dictionary where the key/value pairs are the parameter columns and their values.

* We loop through the dictionary entries and pass the parameters to a new instance of the `Experiment` class.

The function `create_experiments` implements both of these steps. The function returns a new dictionary where the key/value pairs are the experiment name string and an instance of `Experiment`.
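
To make the intermediate nested dictionary concrete, here is a small sketch that performs the first step on an inline CSV. The two rows are copied from the example file. `to_dict(orient="index")` is used here as one way to get a dictionary per row while keeping integer columns as integers.

```python
import io

import pandas as pd

# two rows in the same layout as the example file
csv_text = """id,experiment,n_operators,mean_iat
0,base,13,0.6
1,op+1,14,0.6
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="id")

# experiment names, in row order
names = df["experiment"].to_list()

# one dictionary of parameters per row, keyed by the id index
params_by_id = df.drop(columns="experiment").to_dict(orient="index")

for name, params in zip(names, params_by_id.values()):
    print(name, params)
```

Each inner dictionary is in the form needed for the second step, where it is unpacked into the `Experiment` constructor.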

In [None]:
def create_experiments(df_experiments):
    """
    Returns dictionary of Experiment objects based on contents of a dataframe

    Params:
    ------
    df_experiments: pandas.DataFrame
        Dataframe of experiments, indexed by id. The first column is the
        experiment name, followed by the variable names.  No fixed width

    Returns:
    --------
    dict
    """
    experiments = {}

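    # experiment input parameter dictionary; orient="index" keeps each
    # column's own type (e.g. n_operators stays an int, not a float)
    exp_dict = df_experiments[df_experiments.columns[1:]].to_dict(
        orient="index"
    )
    # names of experiments
    exp_names = df_experiments[df_experiments.columns[0]].to_list()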

    # loop through params and create Experiment objects.
    for name, params in zip(exp_names, exp_dict.values()):
        experiments[name] = Experiment(**params)

    return experiments

In [None]:
# test of the function

# assume code is run in same directory as example csv file
df_experiment = pd.read_csv("example_experiments.csv", index_col="id")

# convert to dict containing separate Experiment objects
experiments_to_run = create_experiments(df_experiment)

print(type(experiments_to_run))
print(experiments_to_run["op+1"].n_operators)

### Run all experiments and show results in a table.

We can now make use of the `run_all_experiments` and `experiment_summary_frame` functions to run and display these experiments.

In [None]:
results = run_all_experiments(experiments_to_run)

# illustrate results dataframe.
results["base"].head(2)

In [None]:
results["high_demand"].head(2)

In [None]:
# show results
# further adaptions might include adding units for figures.
experiment_summary_frame(results).round(2)

In [None]:
experiment_summary_frame(results).round(2).T