# Analysis

This notebook runs the model and presents results from:

* Base case analysis
* Scenario analysis
* Sensitivity analysis

The run time is provided at the end of the notebook.

Credit:

* Analysis of the spread of replication results was adapted from Tom Monks (2024) HPDM097 - Making a difference with health data (https://github.com/health-data-science-OR/stochastic_systems) (MIT License).

## Set-up

Load notebook linters.

In [1]:
%load_ext pycodestyle_magic

In [2]:
%pycodestyle_on

Load required packages.

In [3]:
# To ensure any updates to `simulation/` are fetched without needing to restart
# the notebook environment, reload `simulation/` before execution of each cell
%load_ext autoreload
%autoreload 1
%aimport simulation

In [4]:
from great_tables import GT
import itertools
import os
import plotly.express as px
import polars as pl
import time

from simulation.logging import SimLogger
from simulation.model import Defaults, Model, Trial, summary_stats

import scipy.stats as st
import numpy as np

Start timer.

In [5]:
start_time = time.time()

Define path to outputs folder.

In [6]:
output_dir = '../outputs/'

## Default run

Run with default parameters.

In [7]:
param = Defaults()
trial = Trial(param)
trial.run_trial()

Preview results and save to `.csv` files.

The patient-level results file is large, so it could be compressed and saved as `.csv.gz`:

```
# (At start of file) import required library
import gzip

# Set path to file
file_path = os.path.join(output_dir, 'example_patient.csv.gz')

# Save file
with gzip.open(file_path, 'wb') as f:
    trial.patient_results_df.write_csv(f)

# Load file
with gzip.open(file_path, 'rb') as f:
    patient_results = pl.read_csv(f)
```

However, in many cases, it may not be necessary to save this file, and it may be more appropriate to save only the later results.

In [8]:
display(trial.trial_results_df.head())
trial.trial_results_df.write_csv(
    os.path.join(output_dir, 'example_trial.csv'))

run_number,scenario,arrivals,mean_q_time_nurse,mean_time_with_nurse,mean_nurse_utilisation
i64,i64,i64,f64,f64,f64
0,0,10972,0.504541,9.842268,0.499639
1,0,10784,0.514151,10.060481,0.501991
2,0,10854,0.523235,9.925025,0.49813
3,0,10831,0.479149,9.937057,0.49822
4,0,10720,0.461457,10.015904,0.49687


In [9]:
display(trial.interval_audit_df.head())
trial.interval_audit_df.write_csv(
    os.path.join(output_dir, 'example_interval_audit.csv'))

resource_name,simulation_time,utilisation,queue_length,running_mean_wait_time,run
str,i64,f64,i64,f64,i32
"nurse",18720,0.6,0,0.427625,0
"nurse",18840,0.4,0,0.425181,0
"nurse",18960,0.6,0,0.423041,0
"nurse",19080,1.0,3,0.427408,0
"nurse",19200,0.6,0,0.440854,0


In [10]:
display(trial.overall_results_df.head())
trial.overall_results_df.write_csv(
    os.path.join(output_dir, 'example_overall.csv'))

metric,arrivals,mean_q_time_nurse,mean_time_with_nurse,mean_nurse_utilisation
str,f64,f64,f64,f64
"mean",10776.741935,0.499037,9.978457,0.49767
"std_dev",115.803272,0.067393,0.115138,0.007524
"lower_95_ci",10734.264952,0.474317,9.936224,0.49491
"upper_95_ci",10819.218919,0.523757,10.02069,0.50043


## View spread of results across replications

In [11]:
def plot_results_spread(column, x_label, file):
    """
    Plot spread of results from across replications, for chosen column.
    Show figure and save under specified file name.

    Arguments:
        column (str):
            Name of column to plot.
        x_label (str):
            X axis label.
        file (str):
            Filename to save figure to.
    """
    fig = px.histogram(trial.trial_results_df, x=column)
    fig.update_layout(
        xaxis_title=x_label,
        yaxis_title='Frequency'
    )

    # Show figure
    fig.show()

    # Save figure
    fig.write_image(os.path.join(output_dir, file))

In [12]:
plot_results_spread(
    column='arrivals',
    x_label='Arrivals',
    file='spread_arrivals.png')

plot_results_spread(
    column='mean_q_time_nurse',
    x_label='Mean wait time for nurse',
    file='spread_nurse_wait.png')

plot_results_spread(
    column='mean_time_with_nurse',
    x_label='Mean length of nurse consultation',
    file='spread_nurse_time.png')

plot_results_spread(
    column='mean_nurse_utilisation',
    x_label='Mean nurse utilisation',
    file='spread_nurse_util.png')

## Scenario analysis

In [13]:
def run_scenarios(scenarios):
    """
    Run a set of scenarios and return the scenario-level results.

    Arguments:
        scenarios (dict):
            Dictionary where key is name of parameter and value is a list
            with different values to run in scenarios.
    """
    # Find every possible combination of the scenario values
    all_scenarios_tuples = list(itertools.product(*scenarios.values()))
    # Convert back into dictionaries
    all_scenarios_dicts = [
        dict(zip(scenarios.keys(), p)) for p in all_scenarios_tuples]
    # Report the number of scenarios
    print(f'There are {len(all_scenarios_dicts)} scenarios. Running:')

    # Run the scenarios...
    results = []
    for index, scenario_to_run in enumerate(all_scenarios_dicts):
        print(scenario_to_run)

        # Overwrite defaults from the passed dictionary
        param = Defaults()
        param.scenario_name = index
        for key in scenario_to_run:
            setattr(param, key, scenario_to_run[key])

        # Run trial and keep trial-level results, adding the scenario values to
        # the results dataframe
        scenario_trial = Trial(param)
        scenario_trial.run_trial()
        for key in scenario_to_run:
            scenario_trial.trial_results_df = (
                scenario_trial.trial_results_df.with_columns(
                    pl.lit(scenario_to_run[key]).alias(key)))

        results.append(scenario_trial.trial_results_df)
    return pl.concat(results)

In [14]:
# Run scenarios
scenario_results = run_scenarios({
    'patient_inter': [3, 4, 5, 6, 7],
    'number_of_nurses': [5, 6, 7, 8]
    })

There are 20 scenarios. Running:
{'patient_inter': 3, 'number_of_nurses': 5}
{'patient_inter': 3, 'number_of_nurses': 6}
{'patient_inter': 3, 'number_of_nurses': 7}
{'patient_inter': 3, 'number_of_nurses': 8}
{'patient_inter': 4, 'number_of_nurses': 5}
{'patient_inter': 4, 'number_of_nurses': 6}
{'patient_inter': 4, 'number_of_nurses': 7}
{'patient_inter': 4, 'number_of_nurses': 8}
{'patient_inter': 5, 'number_of_nurses': 5}
{'patient_inter': 5, 'number_of_nurses': 6}
{'patient_inter': 5, 'number_of_nurses': 7}
{'patient_inter': 5, 'number_of_nurses': 8}
{'patient_inter': 6, 'number_of_nurses': 5}
{'patient_inter': 6, 'number_of_nurses': 6}
{'patient_inter': 6, 'number_of_nurses': 7}
{'patient_inter': 6, 'number_of_nurses': 8}
{'patient_inter': 7, 'number_of_nurses': 5}
{'patient_inter': 7, 'number_of_nurses': 6}
{'patient_inter': 7, 'number_of_nurses': 7}
{'patient_inter': 7, 'number_of_nurses': 8}
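
The 20 scenarios listed above are the Cartesian product of the two value lists (5 inter-arrival times × 4 nurse counts). As a minimal standalone sketch of that step, using only the standard library and the same parameter names as the cell above:

```python
import itertools

scenarios = {
    'patient_inter': [3, 4, 5, 6, 7],
    'number_of_nurses': [5, 6, 7, 8]
}

# Cartesian product of the value lists: one tuple per scenario
combinations = list(itertools.product(*scenarios.values()))

# Pair each tuple back up with the parameter names
scenario_dicts = [dict(zip(scenarios.keys(), c)) for c in combinations]

print(len(scenario_dicts))
print(scenario_dicts[0])
print(scenario_dicts[-1])
```

The last key in the dictionary varies fastest, which is why `number_of_nurses` cycles through its values before `patient_inter` changes.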


In [15]:
scenario_results.head()

run_number,scenario,arrivals,mean_q_time_nurse,mean_time_with_nurse,mean_nurse_utilisation,patient_inter,number_of_nurses
i64,i64,i64,f64,f64,f64,i32,i32
0,0,14491,1.906132,9.949058,0.667461,3,5
1,0,14406,1.918952,10.148115,0.676693,3,5
2,0,14465,1.976377,9.931685,0.665045,3,5
3,0,14424,1.959961,10.003235,0.667725,3,5
4,0,14387,1.780232,9.980463,0.664761,3,5


Example plots...

In [16]:
def plot_scenario(results, x_var, result_var, colour_var, xaxis_title,
                  yaxis_title, legend_title):
    """
    Plot results from different model scenarios.

    Arguments:
        results (pl.DataFrame):
            Contains results to plot.
        x_var (str):
            Name of variable to plot on X axis.
        result_var (str):
            Name of variable with results, to plot on Y axis.
        colour_var (str|None):
            Name of variable to colour lines with (or set to None).
        xaxis_title (str):
            Title for X axis.
        yaxis_title (str):
            Title for Y axis.
        legend_title (str):
            Title for figure legend.
    """
    # If x_var and colour_var are provided, combine both in a list to use
    # as grouping variables when calculating average results
    if colour_var is not None:
        group_vars = [x_var, colour_var]
    else:
        group_vars = [x_var]

    # Calculate average results from each scenario
    df = results.group_by(group_vars).agg([
        # Mean
        pl.mean(result_var).alias('mean'),
        # Standard deviation
        pl.std(result_var).alias('std_dev'),

        # TODO: Use more official method for calculation?
        # Lower 95% confidence interval
        (pl.mean(result_var) - 1.96 * (
            pl.std(result_var) /
            pl.count(result_var).sqrt())).alias('ci_lower'),
        # Upper 95% confidence interval
        (pl.mean(result_var) + 1.96 * (
            pl.std(result_var) /
            pl.count(result_var).sqrt())).alias('ci_upper')])

    # Sort dataframe
    df = df.sort(group_vars)

    # Plot mean line
    fig = px.line(df, x=x_var, y='mean', color=colour_var)
    fig.update_layout(
        xaxis_title=xaxis_title,
        yaxis_title=yaxis_title,
        legend_title_text=legend_title
    )

    # Plot confidence interval lines
    for ci in ['ci_upper', 'ci_lower']:
        trace = (px.line(df, x=x_var, y=ci, color=colour_var)
                 .update_traces(opacity=0.5, showlegend=False)
                 .select_traces())
        # Add to figure
        fig.add_traces(list(trace))

    return df, fig
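
The 1.96 multiplier in `plot_scenario()` is the normal-distribution critical value for a two-sided 95% interval. With a small number of replications, a t-distribution critical value gives a slightly wider interval. The sketch below compares the two for the default-run `arrivals` figures. It assumes 31 replications (run indices up to 30 appear in this notebook), and the helper `confidence_interval` and the hard-coded t value are illustrative assumptions rather than code from `simulation/`. With SciPy, the t value could instead be obtained from `scipy.stats.t.ppf(0.975, df=30)`.

```python
import math
from statistics import NormalDist

# Normal critical value for a two-sided 95% interval (2.5% in each tail)
z_crit = NormalDist().inv_cdf(0.975)

# Assumed t critical value for 30 degrees of freedom (31 replications)
t_crit = 2.042272


def confidence_interval(mean, std_dev, n, crit):
    """Return (lower, upper) limits for a sample mean."""
    half_width = crit * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Mean and standard deviation of arrivals from the default run
mean, std_dev, n = 10776.741935, 115.803272, 31
z_lower, z_upper = confidence_interval(mean, std_dev, n, z_crit)
t_lower, t_upper = confidence_interval(mean, std_dev, n, t_crit)
print(f'Normal: ({z_lower:.2f}, {z_upper:.2f})')
print(f't:      ({t_lower:.2f}, {t_upper:.2f})')
```

Under these assumptions the t-based limits come out at roughly 10734.26 and 10819.22, in line with the `lower_95_ci` and `upper_95_ci` values reported for the default run, while the normal-based interval is about 4% narrower.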

Mean wait time for nurse from scenarios with varying patient inter-arrival times and number of nurses.

In [17]:
result, fig = plot_scenario(
    results=scenario_results,
    x_var='patient_inter',
    result_var='mean_q_time_nurse',
    colour_var='number_of_nurses',
    xaxis_title='Patient inter-arrival time',
    yaxis_title='Mean wait time for nurse (minutes)',
    legend_title='Nurses')

fig.show()

fig.write_image(os.path.join(output_dir, 'scenario_nurse_wait.png'))

Mean nurse utilisation across the same scenarios.

In [18]:
result, fig = plot_scenario(
    results=scenario_results,
    x_var='patient_inter',
    result_var='mean_nurse_utilisation',
    colour_var='number_of_nurses',
    xaxis_title='Patient inter-arrival time',
    yaxis_title='Mean nurse utilisation',
    legend_title='Nurses')

fig.show()

fig.write_image(os.path.join(output_dir, 'scenario_nurse_util.png'))

Example table...

In [19]:
# Combine mean and CI into one rounded column, and label the nurse column
table = result.with_columns(
    pl.format('{} ({}, {})',
              pl.col('mean').round(2),
              pl.col('ci_lower').round(2),
              pl.col('ci_upper').round(2)).alias('mean_ci'),
    pl.format('{} nurses', pl.col('number_of_nurses')).alias('nurse_str'))

# Convert from long to wide format
table = table.pivot('nurse_str', index='patient_inter', values='mean_ci')

# Convert to latex, display and save
table_latex = GT(table).as_latex()
print(table_latex)
with open(os.path.join(output_dir, 'scenario_nurse_util.tex'), 'w') as f:
    f.write(table_latex)

\begin{table}[!t]


\fontsize{12.0pt}{14.4pt}\selectfont

\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}rllll}
\toprule
patient\_inter & 5 nurses & 6 nurses & 7 nurses & 8 nurses \\ 
\midrule\addlinespace[2.5pt]
3 & 0.66 (0.66, 0.67) & 0.55 (0.55, 0.56) & 0.47 (0.47, 0.48) & 0.41 (0.41, 0.42) \\
4 & 0.5 (0.5, 0.5) & 0.41 (0.41, 0.42) & 0.36 (0.35, 0.36) & 0.31 (0.31, 0.31) \\
5 & 0.4 (0.4, 0.4) & 0.33 (0.33, 0.33) & 0.29 (0.28, 0.29) & 0.25 (0.25, 0.25) \\
6 & 0.33 (0.33, 0.33) & 0.28 (0.28, 0.28) & 0.24 (0.24, 0.24) & 0.21 (0.21, 0.21) \\
7 & 0.29 (0.28, 0.29) & 0.24 (0.24, 0.24) & 0.2 (0.2, 0.21) & 0.18 (0.18, 0.18) \\
\bottomrule
\end{tabular*}

\end{table}
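
As a rough cross-check on the utilisation table, the long-run utilisation of a multi-server queue is the arrival rate multiplied by the mean service time, divided by the number of servers. The sketch below applies that formula, assuming the mean consultation time stays at 10 minutes (the value shown in the logged parameters later in this notebook) and ignoring warm-up and end-of-run effects. `expected_utilisation` is a hypothetical helper, not part of `simulation/`.

```python
def expected_utilisation(patient_inter, mean_n_consult_time,
                         number_of_nurses):
    """Long-run fraction of time each nurse is busy."""
    # Patients arriving per minute
    arrival_rate = 1 / patient_inter
    # Nurse-minutes of work arriving per minute, shared between nurses
    return arrival_rate * mean_n_consult_time / number_of_nurses


for patient_inter in [3, 7]:
    for number_of_nurses in [5, 8]:
        value = expected_utilisation(patient_inter, 10, number_of_nurses)
        print(f'{patient_inter=}, {number_of_nurses=}: {value:.3f}')
```

The simulated means in the table should sit close to these values; larger gaps would suggest a parameter or a data collection setting differs from what is assumed here.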



## Sensitivity analysis

Similar code can be used to perform sensitivity analyses.

**How does sensitivity analysis differ from scenario analysis?**

* Scenario analysis focuses on a set of predefined situations which are plausible or relevant to the problem being studied. It often involves varying multiple parameters simultaneously. The purpose is to understand how the system operates under different hypothetical scenarios.
* Sensitivity analysis varies one parameter (or a small group of parameters) and assesses the impact of small changes in it on outcomes. The purpose is to understand how uncertainty in the inputs affects the model, and how robust results are to variation in those inputs.
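
To make the distinction concrete, the sketch below builds "one-at-a-time" parameter sets, where each set changes a single parameter from a baseline and everything else is held fixed. The baseline values are those shown in the logged parameters later in this notebook. `one_at_a_time` is a hypothetical helper for illustration and is not used by `run_scenarios()`.

```python
baseline = {
    'patient_inter': 4,
    'mean_n_consult_time': 10,
    'number_of_nurses': 5
}


def one_at_a_time(baseline, variations):
    """Build parameter sets that each change one parameter from baseline."""
    parameter_sets = []
    for name, values in variations.items():
        for value in values:
            # Copy the baseline and overwrite only the chosen parameter
            parameter_sets.append({**baseline, name: value})
    return parameter_sets


sensitivity_sets = one_at_a_time(baseline, {
    'mean_n_consult_time': [8, 12],
    'patient_inter': [3, 5]
})
for parameter_set in sensitivity_sets:
    print(parameter_set)
```

A scenario analysis over the same two lists would instead cross them, giving every pairing of consultation time and inter-arrival time.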

In [20]:
# Run sensitivity analysis
sensitivity_consult = run_scenarios({
    'mean_n_consult_time': [8, 9, 10, 11, 12, 13, 14, 15]
})

There are 8 scenarios. Running:
{'mean_n_consult_time': 8}
{'mean_n_consult_time': 9}
{'mean_n_consult_time': 10}
{'mean_n_consult_time': 11}
{'mean_n_consult_time': 12}
{'mean_n_consult_time': 13}
{'mean_n_consult_time': 14}
{'mean_n_consult_time': 15}


In [21]:
result, fig = plot_scenario(
    results=sensitivity_consult,
    x_var='mean_n_consult_time',
    result_var='mean_q_time_nurse',
    colour_var=None,
    xaxis_title='Mean nurse consultation time (minutes)',
    yaxis_title='Mean wait time for nurse (minutes)',
    legend_title='Nurses'
)

fig.show()

fig.write_image(os.path.join(output_dir, 'sensitivity_consult_time.png'))

In [22]:
# Combine mean and CI into single column
table = result.with_columns(
    pl.format('{} ({}, {})',
              pl.col('mean').round(2),
              pl.col('ci_lower').round(2),
              pl.col('ci_upper').round(2)).alias('mean_ci'))

# Filter and rename columns
cols = {
    'mean_n_consult_time': 'Mean nurse consultation time',
    'mean_ci': 'Mean wait time for nurse (95 percent confidence interval)'
}
table = table.select(cols.keys()).rename(cols)

# Convert to latex, display and save
table_latex = GT(table).as_latex()
print(table_latex)
with open(os.path.join(output_dir, 'sensitivity_consult_time.tex'), 'w') as f:
    f.write(table_latex)

\begin{table}[!t]


\fontsize{12.0pt}{14.4pt}\selectfont

\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}rl}
\toprule
Mean nurse consultation time & Mean wait time for nurse (95 percent confidence interval) \\ 
\midrule\addlinespace[2.5pt]
8 & 0.15 (0.14, 0.16) \\
9 & 0.28 (0.27, 0.3) \\
10 & 0.5 (0.48, 0.52) \\
11 & 0.84 (0.8, 0.88) \\
12 & 1.36 (1.3, 1.42) \\
13 & 2.15 (2.06, 2.24) \\
14 & 3.37 (3.23, 3.51) \\
15 & 5.3 (5.07, 5.53) \\
\bottomrule
\end{tabular*}

\end{table}
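
The steep rise in waiting time can be compared against queueing theory. The sketch below computes the steady-state mean wait for an M/M/c queue using the Erlang C formula. It assumes exponentially distributed inter-arrival and consultation times (as the `Exponential` distribution objects in the model log suggest), 5 nurses and a mean inter-arrival time of 4 minutes. `mmc_mean_wait` is a hypothetical helper; simulated values will differ somewhat because each run is finite.

```python
import math


def mmc_mean_wait(patient_inter, mean_n_consult_time, number_of_nurses):
    """Steady-state mean queueing time for an M/M/c queue (Erlang C)."""
    lam = 1 / patient_inter        # arrival rate
    mu = 1 / mean_n_consult_time   # service rate per nurse
    c = number_of_nurses
    offered_load = lam / mu
    rho = offered_load / c         # utilisation
    if rho >= 1:
        raise ValueError('Queue is unstable: utilisation must be below 1')

    # Erlang C: probability that an arriving patient has to wait
    top = offered_load ** c / math.factorial(c) / (1 - rho)
    bottom = sum(offered_load ** k / math.factorial(k)
                 for k in range(c)) + top
    p_wait = top / bottom

    # Mean wait = P(wait) / (spare service capacity)
    return p_wait / (c * mu - lam)


for consult_time in range(8, 16):
    print(consult_time, round(mmc_mean_wait(4, consult_time, 5), 2))
```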



## NaN results

Note: In this model, if patients are still waiting to be seen at the end of the simulation, they will have NaN results.

In [23]:
param = Defaults()
param.patient_inter = 2
trial = Trial(param)
trial.run_trial()
trial.patient_results_df.tail()

patient_id,arrival_time,q_time_nurse,time_with_nurse,run
i64,f64,f64,f64,i32
21586,61913.030043,,,30
21587,61915.384561,,,30
21588,61915.421934,,,30
21589,61917.81791,,,30
21590,61919.845349,,,30
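
These missing values need care when summarising results. The sketch below uses a small hypothetical list of waiting times (not output from the model) to show that a naive mean becomes NaN, and how to average only the patients who were seen while counting those who were not.

```python
import math

# Hypothetical waiting times (minutes); NaN marks patients still queueing
# when the simulation ended
q_time_nurse = [0.0, 1.5, 3.0, float('nan'), float('nan')]

# A naive mean is contaminated by the NaN values
naive_mean = sum(q_time_nurse) / len(q_time_nurse)

# Keep only patients who reached a nurse, and count the rest
seen = [t for t in q_time_nurse if not math.isnan(t)]
n_unseen = len(q_time_nurse) - len(seen)
mean_wait = sum(seen) / len(seen)

print(f'Naive mean: {naive_mean}')
print(f'Mean wait of seen patients: {mean_wait}')
print(f'Patients not seen: {n_unseen}')
```

Excluding unseen patients tends to understate waiting times when the queue is growing, since the excluded patients are the ones who have waited without being served, so the count of unseen patients is worth reporting alongside the mean.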


## Example run with logs

The `SimLogger` class is used to log events during the simulation. These can be printed to the console (`log_to_console`) or saved to a file (`log_to_file`).

This outputs a lot of information to the screen. It is currently set to report each patient as they arrive and then see the nurse, so it is best used only when running the simulation for a short time with few patients.

The logs in `model.py` can be altered to print your desired information during the simulation run, which can be helpful during development.
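
For readers unfamiliar with this pattern, the sketch below shows one way a logger with console and file switches could be built on the standard `logging` module. It is not the project's `SimLogger`: the function `build_logger`, the logger name, the default file path and the format string are all assumptions, with the format chosen to resemble the log lines shown in this notebook.

```python
import logging
import sys


def build_logger(log_to_console=True, log_to_file=False,
                 file_path='sim.log'):
    """Create a logger that writes to the console and/or a file."""
    logger = logging.getLogger('sim_sketch')
    logger.setLevel(logging.INFO)
    # Remove handlers from earlier calls so messages are not duplicated
    logger.handlers.clear()
    logger.propagate = False

    formatter = logging.Formatter(
        '%(asctime)s - %(levelname)s - '
        '%(filename)s:%(funcName)s():%(lineno)d - %(message)s')

    if log_to_console:
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    if log_to_file:
        file_handler = logging.FileHandler(file_path)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


logger = build_logger(log_to_console=True, log_to_file=False)
logger.info('Patient 1 arrives at: 51.904')
```

Clearing existing handlers matters in notebooks, where re-running a cell would otherwise attach a second handler and print every message twice.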

In [24]:
# Mini run of simulation with logger enabled
param = Defaults()
param.warm_up_period = 50
param.data_collection_period = 100
param.number_of_runs = 1
param.cores = 0
param.logger = SimLogger(log_to_console=True, log_to_file=False)

model = Model(param, run_number=0)
model.run()

2025-01-23 13:46:24,431 - INFO - logging.py:log():128 - Initialised model: {'param': <simulation.model.Defaults object at 0x77de0a279b20>, 'run_number': 0, 'env': <simpy.core.Environment object at 0x77de28160050>, 'nurse': <simpy.resources.resource.Resource object at 0x77de0a9f3b60>, 'patients': [], 'nurse_time_used': 0, 'nurse_consult_count': 0, 'running_mean_nurse_wait': 0, 'audit_list': [], 'results_list': [], 'patient_inter_arrival_dist': <simulation.model.Exponential object at 0x77de0a9f3cb0>, 'nurse_consult_time_dist': <simulation.model.Exponential object at 0x77de0a36f110>}
2025-01-23 13:46:24,433 - INFO - logging.py:log():128 - Parameters: {'_initialising': False, 'patient_inter': 4, 'mean_n_consult_time': 10, 'number_of_nurses': 5, 'warm_up_period': 50, 'data_collection_period': 100, 'number_of_runs': 1, 'audit_interval': 120, 'scenario_name': 0, 'cores': 0, 'logger': <simulation.logging.SimLogger object at 0x77de0a3c50f0>}
2025-01-23 13:46:24,433 - INFO - logging.py:log():128

These logs align with the recorded results for each patient (though results are only saved for patients who arrive after the warm-up period).

In [25]:
# Compare to patient-level results
model.results_list

[{'patient_id': 1,
  'arrival_time': 51.90400587259546,
  'q_time_nurse': 0.0,
  'time_with_nurse': 18.07891954142075},
 {'patient_id': 2,
  'arrival_time': 51.963434706622714,
  'q_time_nurse': 0.0,
  'time_with_nurse': 3.1020092355006064},
 {'patient_id': 3,
  'arrival_time': 74.3494580155259,
  'q_time_nurse': 0.0,
  'time_with_nurse': 26.744513862017026},
 {'patient_id': 4,
  'arrival_time': 77.53382703300574,
  'q_time_nurse': 0.0,
  'time_with_nurse': 0.7481033661053572},
 {'patient_id': 5,
  'arrival_time': 78.93233230430721,
  'q_time_nurse': 0.0,
  'time_with_nurse': 0.5277574384602378},
 {'patient_id': 6,
  'arrival_time': 86.81473043550623,
  'q_time_nurse': 0.0,
  'time_with_nurse': 2.4349563515001904},
 {'patient_id': 7,
  'arrival_time': 89.78326290873765,
  'q_time_nurse': 0.0,
  'time_with_nurse': 9.665598479334754},
 {'patient_id': 8,
  'arrival_time': 89.80720556833339,
  'q_time_nurse': 0.0,
  'time_with_nurse': 7.004542644523265},
 {'patient_id': 9,
  'arrival_time'

## Run time

In [26]:
# Get run time in seconds
end_time = time.time()
runtime = round(end_time - start_time)

# Display converted to minutes and seconds
print(f'Notebook run time: {runtime // 60}m {runtime % 60}s')

Notebook run time: 0m 22s
