# States


The `State` class and related architecture in _Starsim_ supports three key features:

- The ability to add and remove agents during a simulation, to enable simulation of births, and increase simulation performance by removing unused agents (e.g., due to death or emigration)
- The ability to access the states of agents by their UID rather than array index (because the array indices may change as agents are removed from simulations)
- The ability to distribute states across different modules and components of the model. For example, interventions and networks can define states for attributes such as vaccination status or age of debut, and have them dynamically grow and shrink the same as states stored inside a `People` object
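
To make the second point concrete, the following is a minimal sketch using plain Python lists as stand-ins for the underlying arrays. It is not how _Starsim_ stores states; the names `uids`, `ages`, and `index_of` are purely illustrative. It shows that once agents are removed, an agent's array position no longer matches its UID, so lookups need to go through a UID-to-index map.

```python
# Toy illustration only: plain lists stand in for the simulation's arrays.
uids = [0, 1, 2, 3, 4]            # one UID per agent, fixed for the agent's lifetime
ages = [30, 41, 25, 67, 52]       # a state, stored in the same order as the UIDs

# Before any removal, UID and array index coincide.
assert ages[2] == 25

# Remove agents 0 and 1 (e.g., due to death) by dropping their entries.
removed = {0, 1}
keep = [i for i, uid in enumerate(uids) if uid not in removed]
uids = [uids[i] for i in keep]
ages = [ages[i] for i in keep]

# The agent with UID 2 now sits at array index 0, so a positional lookup
# with the UID silently returns another agent's value.
index_of = {uid: i for i, uid in enumerate(uids)}   # UID -> current array index
age_by_position = ages[2]            # actually the age of the agent with UID 4
age_by_uid = ages[index_of[2]]       # the age of the agent with UID 2

print(age_by_position, age_by_uid)
```

The positional lookup does not raise an error here, which is what makes index-based access fragile once agents can be removed.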

Compared to previous models such as _Covasim_ and _HPVsim_, the distribution of agent states throughout a simulation means that the `People` object should no longer be thought of as a central 'repository' of all of the states associated with agents. Rather, it is a container for demographic states and other attributes that are common to agents across all modules. That is, the `People` object provides primary storage for attributes such as age and sex, whereas module-specific attributes (such as susceptible, infectious, vaccinated) are stored within their respective modules. It is of course possible to store references to states across a simulation, and indeed for convenience, references to the states associated with disease modules are automatically added to the `People` object. However, it is important to keep in mind that there are still other states outside of the `People` instance that are accessed via their respective modules.

<div class="alert alert-info">
A common use pattern is to store references to modules to facilitate access to specific states. For example, an intervention could store a reference to a disease module, so that vaccine eligibility at each timestep can be checked against infection history. In fact, a module can directly store references to other `State` instances. In the same way that the `People` object contains references to disease states, an `Intervention` could directly reference `State` instances within a disease. Therefore, an intervention can be linked to a network, disease, or other module by passing that module to the intervention's constructor as part of creation/initialization. This allows binding the intervention to specific states without having to access them via a `People` instance.
</div>
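
The pattern described in the note above can be sketched with two toy classes. This is an illustration under stated assumptions, not the _Starsim_ API: `ToyDisease`, `ToyVaccination`, `ever_infected`, and `eligible` are hypothetical names, and a dictionary keyed by UID stands in for a `State`.

```python
class ToyDisease:
    """Hypothetical disease module that owns its own per-agent state."""
    def __init__(self, n_agents):
        # Keyed by UID (a stand-in for a State); all agents start uninfected
        self.ever_infected = {uid: False for uid in range(n_agents)}


class ToyVaccination:
    """Hypothetical intervention bound to a disease when it is constructed."""
    def __init__(self, disease):
        self.disease = disease                  # reference to the whole module
        self.history = disease.ever_infected    # direct reference to one of its states

    def eligible(self, uids):
        # Only agents with no infection history are eligible for vaccination
        return [uid for uid in uids if not self.history[uid]]


disease = ToyDisease(n_agents=5)
vaccine = ToyVaccination(disease)    # link made by passing the module in

disease.ever_infected[2] = True      # the disease updates its own state...
eligible = vaccine.eligible(range(5))
print(eligible)                      # ...and the intervention sees the change
```

Because the intervention holds a reference rather than a copy, it always reads the disease's current state, and it never needs to go through a `People` object to find it.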

![State primary storage locations](states_1.svg "State primary storage locations")

## Implementation

The implementation of the states in _Starsim_ is based around three key classes:

![states_2](states_2.svg "states_2")

- The `ArrayView` class is an array-like object that can efficiently grow and shrink. This class is indexed directly, and is used to store the UIDs and UID map in the `People` class.
- The `UIDArray` class is an array-like object that supports fast indexing by UID. Indexing a `UIDArray` in such a way that multiple items are retrieved, returns another `UIDArray` instance.
- The `State` class is a sub-class of `UIDArray`, in which the array being indexed by UID is itself an `ArrayView`. The `State` class also contains additional functionality around setting default values for new agents.

Users are expected to interact with the `State` and `UIDArray` classes in normal usage, whereas `ArrayView` is mainly used internally.
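
The division of labour between these classes can be sketched with two toy stand-ins. This is a simplified illustration, not the actual implementation: `ToyGrowable` and `ToyUIDArray` are hypothetical names, the capacity-doubling strategy is an assumption about what "efficiently grow" could look like, and plain lists replace the real array storage.

```python
class ToyGrowable:
    """Stand-in for an ArrayView-like buffer: spare capacity plus a visible length."""
    def __init__(self):
        self._data = [None] * 4   # allocated capacity
        self.n = 0                # number of entries currently in use

    def append(self, value):
        if self.n == len(self._data):
            self._data.extend([None] * len(self._data))  # double the capacity
        self._data[self.n] = value
        self.n += 1

    def keep(self, positions):
        # Shrink by compacting the retained entries to the front
        kept = [self._data[i] for i in positions]
        self._data[:len(kept)] = kept
        self.n = len(kept)

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self._data[i]


class ToyUIDArray:
    """Stand-in for a UIDArray-like object: values plus a UID -> position map."""
    def __init__(self, values, uids):
        self.values = values
        self.uid = list(uids)
        self._pos = {u: i for i, u in enumerate(self.uid)}

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.values[self._pos[key]]       # scalar UID -> single value
        vals = [self.values[self._pos[u]] for u in key]
        return ToyUIDArray(vals, key)                # several UIDs -> new UID-indexed object


buf = ToyGrowable()
for age in [30, 41, 25, 67, 52]:
    buf.append(age)
buf.keep([2, 3, 4])                       # drop the first two agents

ages = ToyUIDArray(buf, uids=[2, 3, 4])   # UID-indexed wrapper around the growable buffer
single = ages[3]
subset = ages[[4, 2]]
print(single, subset.uid, subset.values)
```

Here `ages` plays the role of a `State` (UID indexing layered over a resizable buffer), while `subset` plays the role of the plain `UIDArray` returned when several UIDs are requested.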

## User interface

The user interface for the `State` class (and related architecture) is intended to be as seamless as possible, in that these objects should generally behave like 'arrays that can be indexed by UID'. The main exception to this is that slicing is not supported, because it can behave in ambiguous or confusing ways if UIDs are non-contiguous. Suppose we have a simulation with a state:

In [None]:
import starsim as ss
import numpy as np
sim = ss.Sim(verbose=0)
sim.run();
state = sim.people.age
state

We can see that a state is printed in a similar way to a Pandas series. In this example simulation, no agents have been removed, and therefore the UIDs are contiguous integers, the same as an array index. To highlight the impact of UID indexing, we can go ahead and remove some of the agents:

In [None]:
sim.people.remove(range(0,1000))
state

Now the first 1000 agents have been removed. We can index the state using the UID to retrieve a single value:

In [None]:
state[1000]

Or to retrieve multiple values:

In [None]:
state[[1000,1001]]

In [None]:
state[[1000]]

Notice that when multiple items were retrieved, the result was a `UIDArray` that preserves both the UIDs and the values. Indexing a `State` or `UIDArray` with a scalar returns a single value, whereas indexing with an iterable (a list or an array, including a single-element list such as `[1000]`) returns a `UIDArray`. In general:

- `State[uids]` → `UIDArray`
- `UIDArray[uids]` → `UIDArray`

We can also index the state using a logical array:

In [None]:
state[state<75]

Notice in the examples above that when multiple items were retrieved, the result of the indexing operation retained the UIDs. This is because indexing a `UIDArray` returns another `UIDArray` instance, which facilitates chained filtering. In particular, it means that UIDs can be incrementally filtered without ever having to track the underlying indices.

In [None]:
older_than_75 = state[state>75]
older_than_75

In [None]:
age_75_to_80 = older_than_75[older_than_75<80]
age_75_to_80

In [None]:
sim.people.female[age_75_to_80.uid]

`UIDArray` objects are also able to store the UIDs in arbitrary order. For example, we can shuffle the requested UIDs in the example above, and construct a `UIDArray` that has items appearing in a random order.

In [None]:
sim.people.female[np.random.permutation(age_75_to_80.uid)]

This functionality can be quite useful when operating on sampled UIDs, where the samples could be drawn in any order. It is not necessary to sort the UIDs in this instance. 

<div class="alert alert-danger">
Elementwise operations act directly on the values stored by `UIDArray` instances, without performing any realignment based on UIDs. When performing vector operations on `UIDArray` objects, make sure that they all contain the same UIDs.
</div>
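
The pitfall described in the warning above can be illustrated without _Starsim_ at all. In this sketch, each "array" is just a pair of plain lists (UIDs and values in matching order), and all variable names are illustrative. It shows how a position-wise operation goes wrong when two objects hold the same UIDs in a different order, and how aligning on UID first avoids the problem.

```python
# Toy stand-ins: each "array" is a pair of (uids, values) in matching order.
uids_a, vals_a = [10, 11, 12], [1.0, 2.0, 3.0]
uids_b, vals_b = [12, 10, 11], [300.0, 100.0, 200.0]   # same agents, different order

# Position-wise addition pairs up values that belong to different agents.
by_position = [a + b for a, b in zip(vals_a, vals_b)]

# Aligning on UID first gives the intended per-agent sums.
lookup_b = dict(zip(uids_b, vals_b))
by_uid = [a + lookup_b[u] for u, a in zip(uids_a, vals_a)]

print(by_position)  # [301.0, 102.0, 203.0]
print(by_uid)       # [101.0, 202.0, 303.0]
```

Neither computation raises an error, so a mismatch in UID order would go unnoticed unless the UIDs are checked explicitly before combining the values.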


## Initialization

There are three key steps in state initialization:

1. States might contain default values that are distributions. These distributions are passed in as `scipy` distribution objects (or similar) and need to be converted into `ss.ScipyDistribution` objects, which are linked to RNGs in a specific `Sim`. Therefore, this initialization step requires a `Sim` instance to be specified
2. States must be connected to a `People` instance so that they can be dynamically resized when agents are added or removed. This linkage is bidirectional, with the `State` containing references to `People.uid` and with the `People` containing a reference to the `State` so that it can trigger resizing. This initialization step requires a `People` instance to be specified
3. States must have their initial values populated

The first two steps must be completed prior to the last step, because the number of default values required depends on the number of agents (and therefore the length of the `People` object), and because populating the default values requires that the default value distribution be already linked to an RNG prior to sampling. 

Because the initialization of the states depends on the number of agents, it cannot take place prior to initialization of the `People` object. Since the initialization of the states within the `People` object itself can depend on distributions, a `Sim` is required prior to initializing the `People` object. Therefore, initialization proceeds as follows:


```
Sim.initialize()
    ↳ People.initialize()
        ↳ state.initialize() - for states contained in the `People` instance
    ↳ Module.initialize()    
        ↳ state.initialize() - for states contained in the `Module` instance
```
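
The ordering above, together with the three initialization steps, can be sketched with toy classes. This is a simplified illustration rather than the real implementation: `ToySim`, `ToyPeople`, `ToyModule`, and `ToyState` are hypothetical names, Python's `random.Random` stands in for the simulation's RNGs, and a callable taking an RNG stands in for a distribution.

```python
import random


class ToyState:
    def __init__(self, name, default):
        self.name = name
        self.default = default      # a constant, or a callable that takes an RNG
        self.values = None
        self.rng = None
        self.people = None

    def initialize(self, sim, people):
        self.rng = sim.rng                   # step 1: link the default to the sim's RNG
        self.people = people                 # step 2: link the state to People...
        people.states.append(self)           #         ...and People back to the state
        n = len(people.uid)                  # step 3: populate one value per agent
        if callable(self.default):
            self.values = [self.default(self.rng) for _ in range(n)]
        else:
            self.values = [self.default] * n


class ToyPeople:
    def __init__(self, n):
        self.uid = list(range(n))
        self.states = []                     # every linked state, for later resizing
        self.age = ToyState('age', lambda rng: rng.uniform(0, 100))

    def initialize(self, sim):
        self.age.initialize(sim, self)


class ToyModule:
    def __init__(self):
        self.vaccinated = ToyState('vaccinated', False)

    def initialize(self, sim):
        self.vaccinated.initialize(sim, sim.people)


class ToySim:
    def __init__(self, n, seed=1):
        self.rng = random.Random(seed)
        self.people = ToyPeople(n)
        self.modules = [ToyModule()]

    def initialize(self):
        self.people.initialize(self)         # People and its states come first
        for module in self.modules:          # then the states owned by each module
            module.initialize(self)


toy_sim = ToySim(n=5)
toy_sim.initialize()
print([s.name for s in toy_sim.people.states])
```

After initialization, the toy `People` object holds a reference to every state, including the one that lives in the module, which is what would allow it to trigger resizing when agents are added or removed.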