# Basic Disease Model

Here we'll produce a data-free disease model focusing on core `vivarium` concepts. More complicated versions of 
the components built here can be found in the 
[`vivarium_public_health`](https://github.com/ihmeuw/vivarium_public_health) library. 

Those components must additionally deal with manipulating complex data, which makes understanding 
what's going on more complicated. After this tutorial, you should be well poised to begin working with
and examining those components.

### Some Terminology

There's a lot going on in `vivarium`, and so it's useful to define
a few terms up front:

- **simulant**: An individual or agent. One member of the population
  being simulated.
- **attribute**: A variable associated with each simulant. For example,
  each simulant may have an attribute to describe their age or position.
- **component**: Any self-contained piece of code that can be plugged
  into the simulation to add some functionality. In `vivarium`
  we typically think of components as encapsulating and managing
  some behavior or attributes of the simulants.

### Building a population


In many ways, this is a bad place to start. The population component
is one of the more complicated components in the simulation as it
typically is responsible for bootstrapping some of the more interesting
features in `vivarium`.  

We need a population though. So we'll start with one here and defer
explanation of some of the more complex pieces/systems until later.

```python
import numpy as np
import pandas as pd

from vivarium.framework.engine import Builder
from vivarium.framework.population import SimulantData
from vivarium.framework.event import Event


class BasePopulation:
    """Generates a base population with a uniform distribution of age and sex.

    Attributes
    ----------
    configuration_defaults :
        A set of default configuration values for this component. These can be
        overwritten in the simulation model specification or by providing
        override values when constructing an interactive simulation.
    """

    configuration_defaults = {
        'population': {
            # The range of ages to be generated in the initial population
            'age_start': 0,
            'age_end': 100,
            # Note: There is also a 'population_size' key.
        },
    }

    def setup(self, builder: Builder):
        """Performs this component's simulation setup.

        The ``setup`` method is automatically called by the simulation
        framework. The framework passes in a ``builder`` object which
        provides access to a variety of framework subsystems and metadata.

        Parameters
        ----------
        builder :
            Access to simulation tools and subsystems.
        """
        self.config = builder.configuration

        self.with_common_random_numbers = bool(self.config.randomness.key_columns)
        if (self.with_common_random_numbers
                and not ['entrance_time', 'age'] == self.config.randomness.key_columns):
            raise ValueError("If running with CRN, you must specify ['entrance_time', 'age'] as "
                             "the randomness key columns.")

        self.age_randomness = builder.randomness.get_stream('age_initialization',
                                                            for_initialization=self.with_common_random_numbers)
        self.sex_randomness = builder.randomness.get_stream('sex_initialization')
        self.register = builder.randomness.register_simulants

        columns_created = ['age', 'sex', 'alive', 'entrance_time']
        builder.population.initializes_simulants(self.initialize_population,
                                                 creates_columns=columns_created)

        self.population_view = builder.population.get_view(columns_created)

        builder.event.register_listener('time_step', self.age_simulants)

    def initialize_population(self, pop_data: SimulantData):
        """Called by the simulation whenever new simulants are added.

        This component is responsible for creating and filling four columns
        in the population state table:

        'age' : The age of the simulant in fractional years.
        'sex' : The sex of the simulant. One of {'Male', 'Female'}
        'alive' : Whether or not the simulant is alive.
                  One of {'alive', 'dead'}
        'entrance_time' : The time that the simulant entered the simulation.
                          The 'birthday' for simulants that enter as
                          newborns. A `pandas.Timestamp`.

        Parameters
        ----------
        pop_data :
            A record containing the index of the new simulants, the
            start of the time step the simulants are added on, the width
            of the time step, and the age boundaries for the simulants to
            generate.

        """

        age_start = pop_data.user_data.get('age_start', self.config.population.age_start)
        age_end = pop_data.user_data.get('age_end', self.config.population.age_end)
        if age_start == age_end:
            age_window = pop_data.creation_window / pd.Timedelta(days=365)
        else:
            age_window = age_end - age_start

        age_draw = self.age_randomness.get_draw(pop_data.index)
        age = age_start + age_draw * age_window

        if self.with_common_random_numbers:
            population = pd.DataFrame({'entrance_time': pop_data.creation_time,
                                       'age': age.values}, index=pop_data.index)
            self.register(population)
            population['sex'] = self.sex_randomness.choice(pop_data.index, ['Male', 'Female'])
            population['alive'] = 'alive'
        else:
            population = pd.DataFrame(
                {'age': age.values,
                 'sex': self.sex_randomness.choice(pop_data.index, ['Male', 'Female']),
                 'alive': pd.Series('alive', index=pop_data.index),
                 'entrance_time': pop_data.creation_time},
                index=pop_data.index)

        self.population_view.update(population)

    def age_simulants(self, event: Event):
        """Updates simulant age on every time step.

        Parameters
        ----------
        event :
            An event object emitted by the simulation containing an index
            representing the simulants affected by the event and timing
            information.
        """
        population = self.population_view.get(event.index, query="alive == 'alive'")
        population['age'] += event.step_size / pd.Timedelta(days=365)
        self.population_view.update(population)
```

### There are a lot of things going on here.  Let's take things piece by piece.

#### Configuration

You'll see this sort of pattern repeated in many, many `vivarium` components.

```python
class BasePopulation:

    configuration_defaults = {
        'population': {
            # The range of ages to be generated in the initial population
            'age_start': 0,
            'age_end': 100,
            # Note: There is also a 'population_size' key.
        },
    }
```

We declare a configuration block as a class attribute for components.  `vivarium` has a 
cascading configuration system. We pull in configuration information from user-level configurations
stored in the user's home directory, from command line arguments, from components, 
and from model specification files. None of that is especially important to understand at this point.

The configuration is essentially a declaration of the parameter space for the simulation.
What is important for now is that components provide default values for several
configuration values, and that those defaults can be overridden
by a higher-level system later.  

This component in particular declares defaults for the age range of the initial population
of simulants. It also notes that there is a `population_size` key. This key has a default
value (100) set by `vivarium`'s population management system.
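
The layering described above can be sketched in plain Python. This is only an illustration of the idea, not `vivarium`'s configuration implementation: the `merge_layers` helper and the `user_overrides` values are hypothetical, and the priority order shown (framework, then component, then user) is an assumption made for the example.

```python
from copy import deepcopy


def merge_layers(*layers: dict) -> dict:
    """Merge nested dictionaries; later layers take priority over earlier ones."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are nested blocks, so merge them key by key.
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Default the tutorial attributes to the population management system.
framework_defaults = {'population': {'population_size': 100}}
# Defaults declared by the component (as in ``BasePopulation``).
component_defaults = {'population': {'age_start': 0, 'age_end': 100}}
# Hypothetical user override, e.g. from a model specification file.
user_overrides = {'population': {'age_end': 65}}

config = merge_layers(framework_defaults, component_defaults, user_overrides)
print(config)
```

The component's `age_start` survives untouched, while `age_end` takes the overriding value from the higher layer.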

#### `__init__()`

Though this component is represented by a Python class, you'll notice it does not contain the normal `__init__` method. This is also relatively common. Due to the way the simulation bootstraps itself, the `__init__` method is usually only used to assign names to generic components and muck with the `configuration_defaults` a bit.  We'll see more of this later.

#### `setup(self, builder)`

This function is what replaces normal object initialization. 
<div class="alert alert-info">

### The Builder
    
The `builder` object is essentially the simulation 
toolbox. It provides access to several simulation subsystems:

- `builder.configuration` : A dictionary-like representation of all of the parameters in the simulation.
- `builder.lookup` : A service for generating interpolated lookup tables.  We won't use these in this tutorial.
- `builder.value` : The value pipeline system. In many ways this is the heart of any `vivarium` simulation. We'll discuss this in great detail as we go.
- `builder.event` : Access to `vivarium`'s event system. The primary use is to register listeners for `time_step` events.
- `builder.population` : The population management system. Registers population initializers (functions that fill in initial state information about simulants), gives access to views of the simulation state, and mediates updates to the simulation state.  It also provides access to functionality for generating new simulants (e.g. via birth or migration), though we won't use that feature in this tutorial.
- `builder.randomness` : `vivarium` uses a variance reduction technique called Common Random Numbers to perform counterfactual analysis.  In order for this to work, the simulation provides a centralized source of randomness.
- `builder.time` : The simulation clock.  
- `builder.components` : The component management system. Primarily used for registering subcomponents for setup.  

   
</div>


If we look line by line:

```python
def setup(self, builder: Builder):
```

This is the same signature for every `setup` method.  The simulation will call
this method on each component and provide it with a reference to the `builder`.

```python
    self.config = builder.configuration
```

We then grab a reference to the simulation configuration.  This is like a dictionary
that also supports `.` notation access.
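
To make "a dictionary that also supports `.` notation" concrete, here is a minimal sketch. The `DotDict` class is hypothetical and is not the class `vivarium` uses for its configuration; it only mimics the access pattern seen in `self.config.population.age_start`.

```python
class DotDict(dict):
    """A dictionary whose keys can also be read as attributes (hypothetical)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dictionaries so chained access keeps working.
        return DotDict(value) if isinstance(value, dict) else value


config = DotDict({'population': {'age_start': 0, 'age_end': 100}})
# Both access styles reach the same value.
assert config.population.age_end == config['population']['age_end']
```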


_Note_: The explanation below is concerned with bootstrapping the Common Random Number system.
This is relatively complicated, and probably best approached once you have more
familiarity with the less esoteric usage of the randomness system.

```python

    self.with_common_random_numbers = bool(self.config.randomness.key_columns)
    if (self.with_common_random_numbers 
            and not ['entrance_time', 'age'] == self.config.randomness.key_columns):
        raise ValueError("If running with CRN, you must specify ['entrance_time', 'age'] as "
                         "the randomness key columns.")
```

The Common Random Number (CRN) system requires a way to uniquely identify simulants.
This requires some bootstrapping. We need to randomly generate some simulant characteristics
in a repeatable fashion and then use those characteristics to identify the simulants
in the randomness system later.  This is typically handled **only** by the population
component. It's vitally important to get right when doing counterfactual analysis, 
but not especially important to understand the mechanics of.

We're using some information about the configuration of the randomness system
to let us know whether or not we care about using CRN. We'll explore this
much later when we're looking at running simulations with interventions.

```python
    self.age_randomness = builder.randomness.get_stream('age_initialization',
                                                        for_initialization=self.with_common_random_numbers)
    self.sex_randomness = builder.randomness.get_stream('sex_initialization')
    self.register = builder.randomness.register_simulants
```


`get_stream` is the only call most components make to the randomness system. The best way
to think about randomness streams is as decision points in your simulation. Any time
you need to answer a question that requires a random number, you should be using 
a randomness stream linked to that question. Here we have the questions "What age
are my simulants when they enter the simulation?" and "What sex are my simulants?"
and streams to go along with them.
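
One way to picture a stream is as a function that turns "which question" plus "which simulant" into a repeatable random number. The sketch below is an assumption-laden illustration of that idea only: the `draw` function, its hashing scheme, and the example key are hypothetical and do not reflect how `vivarium`'s randomness system is implemented.

```python
import hashlib


def draw(stream_name: str, simulant_key: tuple, seed: int = 0) -> float:
    """Return a reproducible uniform draw in [0, 1) for one simulant and one question."""
    message = f'{stream_name}|{simulant_key}|{seed}'.encode()
    digest = hashlib.sha256(message).digest()
    # Keep the top 53 bits so the result is an exact float strictly below 1.
    return (int.from_bytes(digest[:8], 'big') >> 11) / 2**53


key = ('2005-07-02', 34.2)  # hypothetical (entrance_time, age) pair identifying a simulant

# The same question about the same simulant gets the same answer in every run,
# which is what lets two scenarios be compared simulant by simulant.
baseline = draw('sex_initialization', key)
counterfactual = draw('sex_initialization', key)
assert baseline == counterfactual
```

Because the stream name is part of the input, a different question about the same simulant (for example `'age_initialization'`) draws from a separate sequence.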

The `for_initialization` argument tells the stream that the simulants you're asking
this question about won't already be registered with the randomness system. This is the 
bootstrapping part. Here we're using the `entrance_time` and `age` to identify a simulant
and so we need a stream to initialize ages with. There should really only be one of 
these in a simulation.  

The `sex_randomness` is a much more typical example of how to interact with 
the `randomness` system.

Finally, we grab a handle to the function that registers new simulants with the
randomness system. This is called once on every simulant after their identifying characteristics
have been generated but before any other columns are generated with the randomness system.

```python
    columns_created = ['age', 'sex', 'alive', 'entrance_time']
    builder.population.initializes_simulants(self.initialize_population, 
                                             creates_columns=columns_created)
```

Next we register the `initialize_population` method of our `BasePopulation` object as a population 
initializer and let the population management system know that it is responsible for 
generating the 'age', 'sex', 'alive', and 'entrance_time' columns in the population state table.

<div class="alert alert-info">

**The Population Table**

   When we talk about columns in the context of `vivarium`, we are typically
   talking about the **attributes** we defined at the top of this tutorial.

   `vivarium` represents the population of simulants as a single `pandas.DataFrame`.
   We think of each simulant as a row in this table and each column as an attribute
   of the simulants.
   
</div>

```python
    self.population_view = builder.population.get_view(columns_created)
```

We then get a view into the same state table containing just the columns we create. 
This view is how we'll interact with the population table in the methods we'll talk about 
shortly.

```python
    builder.event.register_listener('time_step', self.age_simulants)
```

Finally, we register the `age_simulants` method as a listener to the `'time_step'`
event. Any time this event is emitted, the `age_simulants` method will be called
as well.
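
The listener pattern itself is small enough to sketch without the framework. In the example below, `EventManager`, its `emit` method, and the dictionary used as an event are all hypothetical stand-ins; only the `register_listener` call shape and the aging arithmetic mirror the tutorial code.

```python
from collections import defaultdict
from datetime import timedelta


class EventManager:
    """Hypothetical stand-in for an event system with named events."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register_listener(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def emit(self, event_name, event):
        # Call every listener registered for this event, in registration order.
        for listener in self._listeners[event_name]:
            listener(event)


ages = {0: 10.0, 1: 42.5}  # simulant index -> age in years


def age_simulants(event):
    # Same arithmetic as the tutorial: convert the step size to fractional years.
    for index in event['index']:
        ages[index] += event['step_size'] / timedelta(days=365)


events = EventManager()
events.register_listener('time_step', age_simulants)
events.emit('time_step', {'index': [0, 1], 'step_size': timedelta(days=73)})
print(ages)  # each simulant is 73 / 365 = 0.2 years older
```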

### That was a lot of stuff.

As I mentioned at the top, the population component is one of the more complicated pieces of any simulation.
It's not important to grasp everything right now. We'll see many of the same patterns repeated in the `setup`
method of other components later. The unique things here are worth coming back to later.

#### `initialize_population(self, pop_data)`

During setup, we registered this function with the population system as a simulant initializer.

```python
def initialize_population(self, pop_data: SimulantData):
```

Every initializer is called by the population management system whenever simulants are 
created. For our purposes, this happens only once at the very beginning of the 
simulation. Typically, we'd task another component with responsibility for managing
other ways simulants might enter (we might, for instance, have a `Migration` component
that knows about how and when people enter and exit our location of interest).

The population management system uses information about what columns are created by 
which components in order to determine what order to call initializers defined in separate
classes. We'll see what this means in practice later.
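
As a rough picture of how column information can determine call order, the sketch below sorts initializers so that each one runs after the initializers that create the columns it needs. Everything here is an assumption for illustration: the component names, the idea that each initializer also declares the columns it requires, and the use of the standard library's `graphlib` (Python 3.9+) are not taken from `vivarium` itself.

```python
from graphlib import TopologicalSorter

# Hypothetical initializers: name -> (columns created, columns required).
initializers = {
    'disease': (['disease_state'], ['age', 'sex']),
    'base_population': (['age', 'sex', 'alive', 'entrance_time'], []),
    'observer': (['years_lived'], ['alive', 'disease_state']),
}

# Map every column to the initializer that creates it.
creator = {
    column: name
    for name, (created, _) in initializers.items()
    for column in created
}

# An initializer depends on the creators of the columns it requires.
dependencies = {
    name: {creator[column] for column in required}
    for name, (_, required) in initializers.items()
}

call_order = list(TopologicalSorter(dependencies).static_order())
print(call_order)  # → ['base_population', 'disease', 'observer']
```

The population component comes first even though it was listed second, because the other initializers need columns that only it creates.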

Every initializer is called with `pop_data`, which is an instance of the `SimulantData`
[named tuple](link to named tuple docs) carrying some information that might be
useful during simulant initialization.

<div class="alert alert-info">

**SimulantData**

   This simple structure only has four attributes (used here in the generic Python sense
   of the word).  
   
   - `index` : The population table index of the simulants being initialized.
   - `user_data` : A (potentially empty) dictionary generated by the user in components
     that directly create simulants.
   - `creation_time` : The current simulation time. A `pandas.Timestamp`.
   - `creation_window` : The size of the time step over which the simulants are created. A `pandas.Timedelta`.
   
</div>

<div class="alert alert-info">

**The Population Index**

The population table we described before has an index that identifies
each simulant. This index is used in several places in the simulation to look
up information, calculate simulant-specific values, and update information
about the simulants' state.
   
</div>

```python
    age_start = pop_data.user_data.get('age_start', self.config.population.age_start)
    age_end = pop_data.user_data.get('age_end', self.config.population.age_end)
    if age_start == age_end:
        age_window = pop_data.creation_window / pd.Timedelta(days=365)
    else:
        age_window = age_end - age_start

    age_draw = self.age_randomness.get_draw(pop_data.index)
    age = age_start + age_draw * age_window
```

Next, we go about generating the initial age of our simulants when they enter the simulation.
This breaks into two cases.  

`age_start == age_end` represents the generation of a cohort
where everyone is the same age. We smear these out within the creation window (the window 
of time between the current simulation time and the beginning of the next time step).

Otherwise we're generating a distribution of people. Here we assume our population
is uniformly spread out within the specified age window.
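
The two cases are easier to compare side by side with concrete numbers. The sketch below repeats the same arithmetic using only the standard library: `initial_ages` is a hypothetical helper, the draws are fixed by hand rather than taken from a randomness stream, and `datetime.timedelta` stands in for `pandas.Timedelta`.

```python
from datetime import timedelta


def initial_ages(draws, age_start, age_end, creation_window):
    """Map uniform draws in [0, 1) to ages in fractional years."""
    if age_start == age_end:
        # Cohort case: spread ages across the creation window.
        age_window = creation_window / timedelta(days=365)
    else:
        # Distribution case: spread ages uniformly across the age range.
        age_window = age_end - age_start
    return [age_start + draw * age_window for draw in draws]


draws = [0.0, 0.25, 0.5]

# Distribution case: the creation window is ignored.
print(initial_ages(draws, 0, 100, timedelta(days=1)))   # → [0.0, 25.0, 50.0]

# Cohort case: everyone is 30, smeared over a 73-day (0.2 year) window.
print(initial_ages(draws, 30, 30, timedelta(days=73)))
```

In the cohort case every age lands between 30 and 30.2, so simulants created in the same step do not all share exactly the same age.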

```python
    if self.with_common_random_numbers:
        population = pd.DataFrame({'entrance_time': pop_data.creation_time,
                                   'age': age.values}, index=pop_data.index)
        self.register(population)
        population['sex'] = self.sex_randomness.choice(pop_data.index, ['Male', 'Female'])
        population['alive'] = 'alive'
```

If we're working with common random numbers, we need to generate our two 
identifying columns, `entrance_time` and `age` first.  We then register
our simulants with the randomness system. We can then generate other columns
using the randomness system with guarantees around reproducibility.

```python
    else:
        population = pd.DataFrame(
            {'age': age.values,
             'sex': self.sex_randomness.choice(pop_data.index, ['Male', 'Female']),
             'alive': pd.Series('alive', index=pop_data.index),
             'entrance_time': pop_data.creation_time},
            index=pop_data.index)
```
Otherwise, we can just make a normal `pandas.DataFrame` containing our new simulant
information.

```python
    self.population_view.update(population)
```

Finally, whichever route we took to generate our new simulant data, we pass it as 
an argument to our `population_view.update` function which records the information
in the state table.

**Warning** - The data generated must have the same index that was passed
in with the `pop_data`. You can potentially cause yourself a great deal 
of headache otherwise.