# Simulating data at scale

The `gxr.envir` package provides a convenient simulation runner for generating
larger amounts of data for different configurations of parameters.

A convenience script for running simulations can be found at `scripts/simulation.py`.

Here we focus on presenting a simple example and describing the structure of the
generated data.

## Control parameters

There are three control parameters of interest that GuestXR may modify
to achieve the desired result: sustainable and relatively equal profit extraction.

* Time horizon in which agents try to predict consequences of their actions,
  i.e. the effects of how much resource they extract from the environment.
  This is controlled by the `horizon` parameter, which defines (approximately) the
  length of the time horizon of foresight expressed in terms of the characteristic
  timescale of the environment (the amount of time it needs to regenerate from 5%
  to 95% of its carrying capacity).
  * `horizon` $\in (0, \infty)$ with the default/initial value `horizon = 0.1`
* Agents' bias towards believing that the state of the environment is better 
  (closer to the maximum carrying capacity) than it really is.
  * `bias` $\in [0, 1]$ with the default/initial value `bias = 0`
* Agents' belief that other agents will behave similarly to themselves,
  controlled by the `alpha` parameter.
  * `alpha` $\in [0, 1]$ with the default/initial value `alpha = 0`

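To make the units of `horizon` concrete, here is a minimal sketch that converts it into model time. It assumes logistic regeneration, $dE/dt = rE(1 - E/K)$, which the text above does not state; the rate `r` and both function names are hypothetical and are not part of `gxr.envir`. Under that assumption the 5%-to-95% regeneration time is $2\ln(19)/r$.

```python
import math


def logistic_timescale(r, lo=0.05, hi=0.95):
    """Time for logistic growth with rate r to rise from lo*K to hi*K.

    ASSUMPTION: logistic regeneration; the actual environment model may differ.
    For logistic growth, logit(E/K) increases linearly in time with slope r.
    """
    def logit(x):
        return math.log(x / (1.0 - x))

    return (logit(hi) - logit(lo)) / r


def foresight_time(horizon, timescale):
    """Approximate foresight length in model time units."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return horizon * timescale


tau = logistic_timescale(r=0.1)  # 2 * ln(19) / 0.1, hypothetical rate
print(f"characteristic timescale: {tau:.2f}")
print(f"foresight for horizon=0.1: {foresight_time(0.1, tau):.2f}")
```

With the default `horizon = 0.1`, agents in this sketch look ahead over a tenth of the regeneration time, which is why small values correspond to short-sighted behavior.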
## Computation parameters

* `n_epochs` - number of epochs to run simulations for. For now let us stick to the
               default value of `10`.
* `n_reps` - number of independent runs for each unique combination of parameters.
* `seed` - top-level seed for generating pseudo-random numbers. It is used to generate
           seed sequences that work correctly also when running parallel computations.
* `n_jobs` - number of processes used for parallel computations.
* `split_epochs` - when `True`, rows in the output data frame correspond not to full
                   simulation runs but to individual epochs. This representation
                   gives a more fine-grained view of the data and can be easily
                   converted to a full-history representation with the
                   `.group_epochs()` method defined on the output data frame.

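The role of `seed` in parallel runs can be illustrated with NumPy's `SeedSequence`. This is only a sketch of the general technique, under the assumption that the runner does something similar internally; `spawn_generators` is a hypothetical helper, not part of `gxr.envir`.

```python
import numpy as np


def spawn_generators(seed, n_streams):
    """Derive statistically independent generators from one top-level seed."""
    root = np.random.SeedSequence(seed)
    # spawn() returns child seed sequences designed not to overlap,
    # so each parallel task can own one generator safely.
    return [np.random.default_rng(child) for child in root.spawn(n_streams)]


rngs = spawn_generators(3034283429, n_streams=4)
draws = np.array([rng.random(3) for rng in rngs])
print(draws.shape)
```

Because the children are derived deterministically from the top-level seed, rerunning with the same seed reproduces every stream regardless of which process consumes it.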
In [1]:
## THIS IS JUST AN EXAMPLE
## SERIOUS SIMULATIONS SHOULD SAMPLE PARAMETER RANGES MORE DENSELY

import numpy as np  # noqa

from gxr.envir.simulation import Simulation

PARAMS = {
    "params.n_agents": [4],
    "params.horizon": [.5, 2],
    "params.alpha": [0],
    "params.bias": [0]
}


simulation = Simulation(
    n_epochs=10,
    n_reps=10,
    params=PARAMS,
    seed=3034283429,
    n_jobs=10,
    split_epochs=True,
)

data = simulation.run()

## PASS A FILE PATH TO SAVE DATA TO DISK INSTEAD OF RETURNING IT
## THIS IS DONE IN A MEMORY-EFFICIENT WAY
# simulation.run("path-to-file.parquet")


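Before launching a larger run it helps to estimate its size. The sketch below assumes the runner takes the Cartesian product of the value lists in `PARAMS` (suggested by "each unique combination of parameters", but not confirmed here); `expand_grid` and the `DENSE` grid are hypothetical illustrations.

```python
from itertools import product

import numpy as np

PARAMS = {
    "params.n_agents": [4],
    "params.horizon": [.5, 2],
    "params.alpha": [0],
    "params.bias": [0],
}


def expand_grid(params):
    """List every combination of the supplied parameter values."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


n_reps, n_epochs = 10, 10
grid = expand_grid(PARAMS)
n_runs = len(grid) * n_reps   # independent simulation runs
n_rows = n_runs * n_epochs    # rows when split_epochs=True
print(len(grid), n_runs, n_rows)

# A denser, purely illustrative grid over the documented parameter ranges
DENSE = {
    "params.n_agents": [4],
    "params.horizon": np.linspace(0.1, 2, 20).tolist(),
    "params.alpha": np.linspace(0, 1, 5).tolist(),
    "params.bias": np.linspace(0, 1, 5).tolist(),
}
print(len(expand_grid(DENSE)))
```

For the example configuration this gives 2 combinations, 20 runs and 200 epoch-level rows; the denser grid already yields 500 combinations, which is where writing directly to a file becomes useful.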
## Structure of the simulation data

When a file path is passed to `.run()`, the simulation runner stores the generated data
on disk in the Apache Parquet (`.parquet`) format, which is closely integrated with
Apache Arrow. This retains rich type information and produces files readable by
standard Python data frame libraries such as `pandas` or `polars`.

* `epoch` column specifies which epoch of a longer run a given record
  corresponds to. It is used only when a simulation was run with `split_epochs=True`,
  as in this case records correspond not to full simulation runs but to individual
  epochs. A simulation data frame grouped to full histories can be generated with
  the `.group_epochs()` method defined on the `SimulationFrame`.
* Leading columns store values of parameters that have been set by the runner.
* `epochs` column stores 1D arrays with the time points at which ODE computations were
  made, expressed in terms of epochs (time in normalized units).
* `T` column stores 1D arrays with the same time points but expressed in generic units.
* `E` column stores 1D arrays with the state of the environment at the specified time points.
* `H` stores 1D arrays with flattened agents' harvesting rates at specified time points.
  The arrays have to be reshaped, i.e. `H.reshape(n_agents, -1)`, to recover the
  original structure with rows representing unique agents.
* `P` stores 1D arrays with flattened agents' profits at specified time points.
  Must be reshaped like `H`.
* `U` stores 1D arrays with flattened agents' utilities at specified time points.
  Must be reshaped like `H`.
* `R` stores 1D arrays with rewards at specified time points.
  Rewards are computed from the (reshaped) utilities as `U.mean(0)`, so they are simply
  utility averages taken over agents. Note, however, that the default utility is linear
  for losses and concave for gains, so the reward favors games in which no agent
  went below zero profit. Other rewards can be defined and computed quite
  easily based on the utilities stored in `U`.

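The reshaping and reward computation described above can be sketched with synthetic data. The values below are random stand-ins for one cell of the `U` column, and the flattening is assumed to be row-major, as implied by `H.reshape(n_agents, -1)`; the minimum-based reward is a hypothetical alternative, not something the package defines.

```python
import numpy as np

n_agents, n_times = 4, 6
rng = np.random.default_rng(0)

# Synthetic stand-in for one flattened `U` cell (NOT real simulation output)
U_flat = rng.normal(size=n_agents * n_times).astype(np.float32)

# Recover the agent-by-time structure: one row per agent
U = U_flat.reshape(n_agents, -1)

# Default reward: utility averaged over agents at each time point
R = U.mean(0)

# Hypothetical alternative reward: utility of the worst-off agent
R_min = U.min(0)

print(U.shape, R.shape, R_min.shape)
```

The same `reshape(n_agents, -1)` call applies to cells of `H` and `P`, since they share the flattened layout.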
### Parameter columns

The leading columns before the `param_id` column store the values of control parameters
used for running simulations that were **explicitly set** (default values are not stored).

### Auxiliary columns

* `param_id` column stores stable (MD5) hashes of the set parameters.
* `sim_id` is a unique hash-based identifier of a specific simulation run.
* `rep_id` keeps track of subsequent repetitions for a specific configuration of 
  parameters, so `rep_id` values are unique for a specific value of `param_id`.

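One common way to obtain a stable hash of a parameter set is to serialize it canonically and hash the result. The sketch below illustrates that idea only; `param_hash` is a hypothetical helper and is not expected to reproduce the `param_id` values generated by the package.

```python
import hashlib
import json


def param_hash(params):
    """MD5 hex digest of a parameter dict, independent of key order."""
    # Sorting keys and fixing separators makes the serialization canonical
    payload = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


a = param_hash({"n_agents": 4, "horizon": 0.5, "alpha": 0, "bias": 0})
b = param_hash({"bias": 0, "alpha": 0, "horizon": 0.5, "n_agents": 4})
c = param_hash({"n_agents": 4, "horizon": 2, "alpha": 0, "bias": 0})
print(a == b, a == c)
```

Because the identifier depends only on the parameter values, rows from separate runs or files that share a configuration can be matched on it.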

### Full data split by epochs

In [2]:
# FULL DATA SPLIT BY EPOCHS
data

Unnamed: 0,n_agents,horizon,alpha,bias,param_id,sim_id,rep_id,epoch,epochs,T,E,H,P,U,R
0,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,40e417b277d91e0a24d4a4e624aa83b6,1,1,"[0.0, 0.0007078499, 0.0068895505, 0.022243792,...","[0.0, 0.035392497, 0.34447753, 1.1121897, 1.87...","[400.0, 399.9916, 399.2731, 393.4432, 382.6599...","[0.0, 0.00031575086, 0.0026607192, 0.007789004...","[0.0, -0.06282094, -0.44865048, -0.29232442, 1...","[0.0, -0.06282094, -0.44865048, -0.29232442, 0...","[0.0, -0.06295223, -0.4473367, -0.2867787, 0.7..."
1,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,40e417b277d91e0a24d4a4e624aa83b6,1,2,"[1.0, 1.0032817, 1.0287545, 1.1198219, 1.33751...","[50.0, 50.164085, 51.43772, 55.991096, 66.8757...","[0.0016555123, 0.0015754324, 0.0010665682, 0.0...","[0.119293645, 0.11956429, 0.12068954, 0.122224...","[74.67433, 74.37277, 72.03197, 63.663013, 43.6...","[8.155885, 8.138448, 8.00188, 7.4945617, 6.126...","[8.449884, 8.433009, 8.300919, 7.811453, 6.503..."
2,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,40e417b277d91e0a24d4a4e624aa83b6,1,3,"[2.0, 2.0019104, 2.0177915, 2.0912004, 2.25843...","[100.0, 100.09552, 100.88957, 104.56002, 112.9...","[0.0018712843, 0.0018048775, 0.0013355821, 0.0...","[0.13656142, 0.13650298, 0.13776214, 0.1373862...","[-17.228163, -17.403713, -18.863039, -25.60908...","[-17.228163, -17.403713, -18.863039, -25.60908...","[-11.995599, -12.171148, -13.630468, -20.37650..."
3,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,40e417b277d91e0a24d4a4e624aa83b6,1,4,"[3.0, 3.0034597, 3.034489, 3.175553, 3.3263538...","[150.0, 150.17299, 151.72444, 158.77765, 166.3...","[-0.005793028, -0.005385677, -0.0027632688, 0....","[0.168246, 0.16813108, 0.1675305, 0.16809236, ...","[-109.12731, -109.44544, -112.29813, -125.2654...","[-109.12731, -109.44544, -112.29813, -125.2654...","[-103.895134, -104.21324, -107.06583, -120.032..."
4,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,40e417b277d91e0a24d4a4e624aa83b6,1,5,"[4.0, 4.00391, 4.031944, 4.1418037, 4.3654685,...","[200.0, 200.1955, 201.59718, 207.0902, 218.273...","[0.004915013, 0.0045311335, 0.002510613, -0.00...","[0.18564674, 0.18547833, 0.18672788, 0.1857306...","[-201.0344, -201.39354, -203.96902, -214.06358...","[-201.0344, -201.39354, -203.96902, -214.06358...","[-195.8012, -196.16039, -198.73604, -208.83098..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,4,2.0,0,0,c1d3b8ee21807a70584bdb64d4c282b3,d198a9658814638e241b4abbbd0b32c2,10,6,"[5.0, 5.004655, 5.040201, 5.1573577, 5.3987303...","[250.0, 250.23274, 252.01007, 257.8679, 269.93...","[0.0028318735, 0.002719473, 0.0019934585, 0.00...","[0.10807926, 0.10809522, 0.10546426, 0.1089803...","[-251.01248, -251.44022, -254.70659, -265.4727...","[-251.01248, -251.44022, -254.70659, -265.4727...","[-255.18729, -255.61502, -258.88147, -269.6477..."
196,4,2.0,0,0,c1d3b8ee21807a70584bdb64d4c282b3,d198a9658814638e241b4abbbd0b32c2,10,7,"[6.0, 6.005827, 6.0396667, 6.153654, 6.472492,...","[300.0, 300.29135, 301.98334, 307.6827, 323.62...","[8.7203516e-05, 8.370923e-05, 6.586803e-05, 2....","[0.10488193, 0.10492291, 0.105566554, 0.104082...","[-342.91315, -343.44864, -346.5586, -357.03427...","[-342.91315, -343.44864, -346.5586, -357.03427...","[-347.08826, -347.62378, -350.7337, -361.2094,..."
197,4,2.0,0,0,c1d3b8ee21807a70584bdb64d4c282b3,d198a9658814638e241b4abbbd0b32c2,10,8,"[7.0, 7.0066285, 7.0495467, 7.1647444, 7.43287...","[350.0, 350.33142, 352.47733, 358.2372, 371.64...","[2.0538755e-05, 1.9593095e-05, 1.4489699e-05, ...","[0.097747326, 0.09760044, 0.099094614, 0.09689...","[-434.8155, -435.42468, -439.3689, -449.95584,...","[-434.8155, -435.42468, -439.3689, -449.95584,...","[-438.99063, -439.59982, -443.54407, -454.1309..."
198,4,2.0,0,0,c1d3b8ee21807a70584bdb64d4c282b3,d198a9658814638e241b4abbbd0b32c2,10,9,"[8.0, 8.007108, 8.073669, 8.2569, 8.573489, 9.0]","[400.0, 400.3554, 403.68347, 412.84497, 428.67...","[8.371711e-07, 7.833862e-07, 4.1895103e-07, -5...","[0.10726638, 0.1072913, 0.1068498, 0.1174764, ...","[-526.71783, -527.37115, -533.4883, -550.3276,...","[-526.71783, -527.37115, -533.4883, -550.3276,...","[-530.893, -531.54626, -537.66345, -554.5027, ..."


### Full data with grouped epochs

In [3]:
# DATA GROUPED BY EPOCHS
data.group_epochs()

Unnamed: 0,n_agents,horizon,alpha,bias,param_id,sim_id,rep_id,epochs,T,E,H,P,U,R
0,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,40e417b277d91e0a24d4a4e624aa83b6,1,"[0.0, 0.0007078499, 0.0068895505, 0.022243792,...","[0.0, 0.035392497, 0.34447753, 1.1121897, 1.87...","[400.0, 399.9916, 399.2731, 393.4432, 382.6599...","[0.0, 0.00031575086, 0.0026607192, 0.007789004...","[0.0, -0.06282094, -0.44865048, -0.29232442, 1...","[0.0, -0.06282094, -0.44865048, -0.29232442, 0...","[0.0, -0.06295223, -0.4473367, -0.2867787, 0.7..."
1,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,ae55dc0c17d48ad774fbf2200771bc69,2,"[0.0, 0.0007523473, 0.0065774047, 0.01858248, ...","[0.0, 0.037617363, 0.32887024, 0.929124, 1.578...","[400.0, 399.9909, 399.33038, 395.00156, 386.35...","[0.0, 0.00028619388, 0.002802817, 0.0075467764...","[0.0, -0.067079395, -0.42266563, -0.35407272, ...","[0.0, -0.067079395, -0.42266563, -0.35407272, ...","[0.0, -0.06686473, -0.43359014, -0.38396996, 0..."
2,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,6e90128885e89aa61171d05107851c55,3,"[0.0, 0.00071409217, 0.005339897, 0.014852422,...","[0.0, 0.03570461, 0.26699483, 0.7426211, 1.373...","[400.0, 399.9921, 399.5641, 396.74896, 389.734...","[0.0, 0.00023980996, 0.0017239787, 0.005182165...","[0.0, -0.06404447, -0.39447346, -0.57054627, 0...","[0.0, -0.06404447, -0.39447346, -0.57054627, 0...","[0.0, -0.06364659, -0.37992305, -0.51502407, 0..."
3,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,d089df11d67d89e7cc43bef958be8e37,4,"[0.0, 0.0007462389, 0.0056186155, 0.018579582,...","[0.0, 0.037311945, 0.2809308, 0.9289791, 1.586...","[400.0, 399.9915, 399.50043, 395.07788, 386.71...","[0.0, 0.00032345604, 0.0022450075, 0.007242337...","[0.0, -0.066253446, -0.3845026, -0.42283317, 0...","[0.0, -0.066253446, -0.3845026, -0.42283317, 0...","[0.0, -0.06644422, -0.38926658, -0.40290785, 0..."
4,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,7471000ac4e87204b8a025a361fc3751,5,"[0.0, 0.00072349166, 0.0057536038, 0.01878774,...","[0.0, 0.036174584, 0.28768018, 0.93938696, 1.7...","[400.0, 399.99158, 399.4756, 394.73608, 383.34...","[0.0, 0.00028030833, 0.002262238, 0.007729138,...","[0.0, -0.06444023, -0.3950219, -0.34371188, 1....","[0.0, -0.06444023, -0.3950219, -0.34371188, 0....","[0.0, -0.064376846, -0.39538378, -0.3316431, 0..."
5,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,d259eee613631a03c214b9e8f854d3d3,6,"[0.0, 0.00072437624, 0.0058908053, 0.019249763...","[0.0, 0.03621881, 0.29454026, 0.9624882, 1.740...","[400.0, 399.99207, 399.4648, 394.8403, 384.451...","[0.0, 0.0002731402, 0.0022280333, 0.007101194,...","[0.0, -0.06457733, -0.4109922, -0.408904, 1.03...","[0.0, -0.06457733, -0.4109922, -0.408904, 0.63...","[0.0, -0.06458045, -0.40507624, -0.3986968, 0...."
6,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,6abf1ecdefb71fdb337693a0f1097d85,7,"[0.0, 0.0007164137, 0.0060460283, 0.019011851,...","[0.0, 0.035820685, 0.3023014, 0.9505926, 1.604...","[400.0, 399.9916, 399.4372, 394.91406, 386.627...","[0.0, 0.0002583616, 0.001989831, 0.006508659, ...","[0.0, -0.06398371, -0.42950025, -0.49506593, 0...","[0.0, -0.06398371, -0.42950025, -0.49506593, 0...","[0.0, -0.06373763, -0.4123739, -0.3982252, 0.4..."
7,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,e6cd97fee0de2a74f0f3fdc1f6529538,8,"[0.0, 0.0007444818, 0.0063196486, 0.019618832,...","[0.0, 0.03722409, 0.31598243, 0.9809416, 1.920...","[400.0, 399.99124, 399.38525, 394.4758, 381.12...","[0.0, 0.00031934713, 0.0023090222, 0.007610185...","[0.0, -0.06591934, -0.42386338, -0.31755027, 1...","[0.0, -0.06591934, -0.42386338, -0.31755027, 0...","[0.0, -0.0662241, -0.42402938, -0.3367301, 0.9..."
8,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,0cec74a4af8db7ad9bc100bdffa1b61f,9,"[0.0, 0.0007589888, 0.0062086033, 0.0189772, 0...","[0.0, 0.03794944, 0.31043017, 0.94886005, 1.55...","[400.0, 399.99133, 399.4252, 395.04248, 387.17...","[0.0, 0.00027475166, 0.0024587319, 0.006971136...","[0.0, -0.06766265, -0.4274958, -0.46270975, 0....","[0.0, -0.06766265, -0.4274958, -0.46270975, 0....","[0.0, -0.067579806, -0.4241124, -0.43160748, 0..."
9,4,0.5,0,0,2624af53c5630476f6dbae022b406afc,583e6412f3ad74ce044b50a24bb2e36b,10,"[0.0, 0.0007065817, 0.005157686, 0.01597752, 0...","[0.0, 0.035329085, 0.2578843, 0.79887605, 1.80...","[400.0, 399.99158, 399.59842, 396.25, 382.5313...","[0.0, 0.0002972614, 0.0019836791, 0.0064578974...","[0.0, -0.06267703, -0.36695585, -0.4673112, 1....","[0.0, -0.06267703, -0.36695585, -0.4673112, 0....","[0.0, -0.06282671, -0.3719935, -0.48463964, 0...."

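Conceptually, grouping epochs amounts to concatenating the per-epoch arrays of each run in epoch order. The following sketch shows that idea with `pandas` on synthetic data; it is an assumption about what the conversion involves, not the actual implementation of `.group_epochs()`, and `group_epochs_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Synthetic epoch-split frame: 2 runs x 3 epochs (NOT real simulation output)
rows = []
for sim_id in ["run-a", "run-b"]:
    for epoch in (3, 1, 2):  # deliberately unordered
        t = np.linspace(epoch - 1, epoch, 4)
        rows.append({"sim_id": sim_id, "epoch": epoch, "epochs": t, "E": np.exp(-t)})
split = pd.DataFrame(rows)


def group_epochs_sketch(df, array_cols=("epochs", "E")):
    """Concatenate per-epoch arrays into one full history per run."""
    records = []
    ordered = df.sort_values("epoch")
    for sim_id, group in ordered.groupby("sim_id", sort=False):
        record = {"sim_id": sim_id}
        for col in array_cols:
            record[col] = np.concatenate(group[col].to_list())
        records.append(record)
    return pd.DataFrame(records)


full = group_epochs_sketch(split)
print(len(full), len(full.loc[0, "epochs"]))
```

Note that this sketch keeps the boundary time point shared by consecutive epochs twice; whether the package deduplicates such points is not specified here.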