# Stroke+Rehab models

> This is a work in progress.

In [1]:
import pandas as pd
import numpy as np

###  Model outputs

The results of the two generated simulation models were identical to 2 decimal places. The results for the stage 1 and stage 2 models are reported and compared graphically below in {numref}`asu_comparison_fig` and {numref}`rehab_comparison_fig`. The figures show that the **probability of delay** and **ward occupancy** match across the acute and rehabilitation wards in the 2 models.

The outputs of the generated models replicated the results reported in the original article {cite:p}`Monks2016`, although we note that we did not run all of the experiments reported in the article.
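One way to check that two result tables are "identical to 2 decimal places" is to round both and compare them element by element. The sketch below is illustrative only: the function name `match_to_decimals`, the column names, and the numbers are hypothetical placeholders, not outputs of the stroke models.

```python
import numpy as np
import pandas as pd


def match_to_decimals(a: pd.DataFrame, b: pd.DataFrame, decimals: int = 2) -> bool:
    """Return True if two result tables agree after rounding."""
    # The tables are only comparable if they share row and column labels.
    if not (a.index.equals(b.index) and a.columns.equals(b.columns)):
        return False
    return bool(np.array_equal(a.round(decimals).to_numpy(),
                               b.round(decimals).to_numpy()))


# Placeholder values for illustration only (NOT results from the study).
stage1 = pd.DataFrame(
    {"prob_delay": [0.1234, 0.0512], "occupancy": [0.8111, 0.7449]},
    index=[10, 11],
)
stage2 = pd.DataFrame(
    {"prob_delay": [0.1201, 0.0549], "occupancy": [0.8149, 0.7412]},
    index=[10, 11],
)
print(match_to_decimals(stage1, stage2))
```

Rounding before comparing is one reading of "identical to 2 decimal places"; a tolerance-based check such as `np.allclose` with an absolute tolerance would be a slightly different criterion.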

```{figure} ../../03_stroke/asu_comparison.png
---
height: 500px
name: asu_comparison_fig
---
Acute stroke unit outputs: comparison of stage 1 and stage 2 models
```

```{figure} ../../03_stroke/rehab_comparison.png
---
height: 500px
name: rehab_comparison_fig
---
Rehabilitation unit outputs: comparison of stage 1 and stage 2 models
```

### Model code

The final code files from stage 1 and stage 2 (our internal replication) for the stroke capacity planning model have some substantial differences.  **To describe**

Disregarding comments and documentation, stage 1 generated a SimPy model consisting of 436 lines of code and stage 2 generated 531 lines of code.  Both models passed the same batch of 34 verification tests.

**To-do**: discuss the LLM's design of `Experiment` for stage 2 versus stage 1. Quite different and less fun to set up!  Show code snippets.

**To-do**: discuss interface code - this is currently not included in code totals.

**To-do**: add a table describing differences in classes and functions.
<!-- ```{list-table} Description of model code components Stage 1 versus Stage 2. (stage 2 inside of brackets)
:header-rows: 1
:name: ccu_component_comparison

* - Component
  - Attributes (stage 2)
  - Methods/Functions (stage 2)
* - **Experiment class**
  - 13 (27)
  - 3 (2)
* - **CCU model logic class**
  - 4 (9)
  - 10 (12)
* - **Functions**
  - N/A
  - 6 (6)
``` -->


#### Complexity of setup and use of models

Although the stage 1 and stage 2 models produced identical outputs, the internal implementation of the `Experiment` class varied substantially. The code generated by the LLM led to quite different interfaces for setting up and creating an instance of `Experiment`, and for accessing its internal parameters.

For example, in the stage 1 model the code to set up an experiment that simulated a 5% increase in stroke patients, and then check the parameter value, was as follows:

```python
# setup experiment
default_experiment = Experiment(stroke_mean=1.2*1.05)

# access and check parameter value
print(default_experiment.stroke_mean)
```

The equivalent code in stage 2 involved an additional line of code to create an experiment parameter dictionary, and a *collection data-structure* approach to access the internal parameters.

```python
# set up parameter dictionary
experiment_params = {"patient_types": {"Stroke": {"interarrival_time": 1.2 * 1.05}}}

# pass to Experiment. LLM provided code that updates internal parameter dictionaries
future_demand_experiment = Experiment(experiment_params)

# access and check parameter value
print(future_demand_experiment.params['patient_types']['Stroke']['interarrival_time'])
```

We do not argue that either of the approaches generated by the LLM is optimal; rather, there are pros and cons to each implementation. Stage 1 code offers a simple interface, but does not choose a clear naming convention (`stroke_mean` is not specific to inter-arrival time). Stage 1 also does not clearly separate model parameters from the outputs of the experiment. Stage 2 requires more code and requires a user to understand Python dictionaries. Stage 2's hierarchy for accessing parameters is more complex than stage 1's (including the internal workings of `Experiment`), but it uses clear, specific naming conventions for patient types and their different parameter configurations.
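To make the trade-off concrete, the sketch below contrasts the two interface styles in miniature. It is not the LLM-generated code: `FlatExperiment`, `NestedExperiment`, `deep_update` and `DEFAULT_PARAMS` are hypothetical names, and the recursive merge is only one plausible way of updating internal parameter dictionaries from a user-supplied dictionary. The baseline value of 1.2 is assumed from the snippets above.

```python
import copy
from typing import Optional

# Hypothetical defaults; 1.2 is assumed from the examples above.
DEFAULT_PARAMS = {
    "patient_types": {
        "Stroke": {"interarrival_time": 1.2},
    }
}


def deep_update(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested dictionaries instead of replacing them.
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


class FlatExperiment:
    """Stage 1 style: one keyword argument and attribute per parameter."""

    def __init__(self, stroke_mean: float = 1.2):
        self.stroke_mean = stroke_mean


class NestedExperiment:
    """Stage 2 style: nested overrides merged into default dictionaries."""

    def __init__(self, params: Optional[dict] = None):
        self.params = deep_update(DEFAULT_PARAMS, params or {})


flat = FlatExperiment(stroke_mean=1.2 * 1.05)
nested = NestedExperiment(
    {"patient_types": {"Stroke": {"interarrival_time": 1.2 * 1.05}}}
)
print(flat.stroke_mean)
print(nested.params["patient_types"]["Stroke"]["interarrival_time"])
```

In this sketch the flat style keeps construction and access to a single attribute, while the nested style needs the merge helper but groups every parameter under an explicit patient type and parameter name.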

#### Lines of code data

In [None]:
!pygount --suffix=py --format=summary ../../03_stroke/stroke_rehab_model.py

In [None]:
# code for stroke
# !pygount --suffix=py --format=summary ../../03_stroke/stroke_rehab_model.py
# !pygount --suffix=py --format=summary ../../03_stroke/stroke_rehab_interface.py

In [None]:
!pygount --suffix=py --format=summary ../../03_stroke/s2_stroke_rehab_model.py
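As a rough cross-check on what "disregarding comments and documentation" means, the following standard-library sketch counts lines that contain code tokens and excludes comments, blank lines and docstrings. The function name `count_code_lines` is hypothetical, and its counting rules are an assumption: the totals it gives will not necessarily match those reported by `pygount`.

```python
import ast
import io
import tokenize

# Token types that do not make a line count as code.
_NON_CODE = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.ENCODING,
}
_DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef,
                     ast.AsyncFunctionDef)


def count_code_lines(source: str) -> int:
    """Count source lines holding code, ignoring comments and docstrings."""
    # Lines occupied by module, class and function docstrings.
    docstring_lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DOCSTRING_OWNERS) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                docstring_lines.update(
                    range(first.lineno, first.end_lineno + 1))

    # Lines touched by at least one token that is not a comment or layout.
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _NON_CODE:
            code_lines.update(range(tok.start[0], tok.end[0] + 1))

    return len(code_lines - docstring_lines)


sample = '''"""Module docstring."""
import math  # trailing comment

# full-line comment
def area(r):
    """Return the area
    of a circle."""
    return math.pi * r ** 2
'''
print(count_code_lines(sample))
```

To apply it to a model file, the source text could be read with `pathlib.Path(...).read_text()` and passed to the function.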

### Prompts

In total, 31 iterations were used to build the model and interface. In stage 1 this consisted of 41 prompts passed to the LLM. The number of prompts increased to 57 in stage 2. In total **n** (**TO-DO**) additional prompts were needed in stage 2 to fix a variable type bug introduced by the LLM for representing "patient type" across the acute and rehab sections of the model. Stage 2 also required 4 additional prompts to introduce common random number streams, because the LLM struggled to assign streams across model activities (**To-do**: check with Alison).

The table below provides a summary of the differences at each iteration.


In [None]:
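def highlight_last_row(col: pd.Series) -> list[str]:
    '''
    Highlight the last row (for Totals) of a styled DataFrame in bold.

    `Styler.apply` calls this function once per column, so `col` is a
    Series. The last row is selected by position so that earlier cells
    that happen to equal the total are not highlighted as well.

    Source:
    -------

    Adapted from stackoverflow
    https://stackoverflow.com/questions/51938245/display-dataframe
    -values-in-bold-font-in-one-row-only#59493062
    '''
    last = len(col) - 1
    return ['font-weight: bold' if i == last else '' for i in range(len(col))]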

In [None]:
# read in prompt results table
prompt_results = (
    pd.read_csv("data/stroke_prompt_table.csv",
                index_col=['Iteration'])
)

In [None]:
prompt_results.loc[len(prompt_results) + 1] = ["Totals"] + prompt_results.sum().tolist()[-3:]
prompt_results.style.apply(highlight_last_row)


## LaTeX for manuscript

In [None]:
caption = "The number of prompts given to the LLM " \
          + "at each iteration of the stroke capacity planning model."
                
print(prompt_results.style.to_latex(caption=caption))

## References

```{bibliography}
:style: plain
:filter: docname in docnames
```