# The Vaccine Allocation Problem
## Description

TODO

## 4 Groups with Finite Action and Finite State Spaces

The file `vacc_alloc_4groups.py` contains an implementation of an RL environment for the vaccine allocation problem with 4 possible risk groups, a finite action space, and a finite state space.

### Reinforcement Learning Problem Setup (dynamics)

**Model Assumptions** 
> The total population is kept constant throughout the entire time horizon. 
There are precisely 4 risk classes, with the first class corresponding to medical workers.
Vaccines are administered by priority group. The first priority group is vaccinated until there are either no more eligible vaccination candidates or there are no more vaccines. If vaccines remain, the next priority group is vaccinated and so on. Asymptomatic individuals are never vaccinated. 

For details about the assumptions made in implementing the SIR disease dynamics model, please refer to the section titled **Disease Dynamics** below.
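
The priority-based vaccination rule in the model assumptions can be sketched as follows. This is a minimal illustration, not the code in `vacc_alloc_4groups.py`: the function name `allocate_vaccines`, its arguments, and the example numbers are all hypothetical, and the "no priority order" (random vaccination) case is not covered.

```python
def allocate_vaccines(susceptible, priority_order, num_vaccines):
    """Vaccinate susceptible people group by group, highest priority first.

    susceptible:    dict mapping a group name (e.g. "c1") to its susceptible count
    priority_order: list of group names, highest priority first
    num_vaccines:   number of vaccines available in this step
    Returns (vaccinated count per group, vaccines left over).
    Hypothetical helper for illustration only.
    """
    vaccinated = {group: 0 for group in susceptible}
    remaining = num_vaccines
    for group in priority_order:
        if remaining == 0:
            break  # no vaccines left for lower-priority groups
        doses = min(susceptible[group], remaining)
        vaccinated[group] = doses
        remaining -= doses
    return vaccinated, remaining


# Made-up example: 100 vaccines, priority order c1 > c3 > c2 > c4.
susceptible = {"c1": 50, "c2": 200, "c3": 120, "c4": 600}
vaccinated, leftover = allocate_vaccines(susceptible, ["c1", "c3", "c2", "c4"], 100)
print(vaccinated, leftover)
```

In this example group c1 is fully vaccinated first, and the remaining doses go to c3 until the supply runs out.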

**State Space**
> The state space for this environment is discrete, consisting of tuples of integer values representing population statistics. 
In this case, an element of the state space looks like $$(S_1, S_2, S_3, S_4, A_1, A_2, A_3, A_4, I, H, N) \in \{0,1,2,...,\text{total\_population}\}^{11}$$ where 
+ $S_1,S_2,S_3,S_4 =$ the total number of susceptible people in groups 1, 2, 3 and 4, respectively
+ $A_1,A_2,A_3,A_4 =$ the total number of asymptomatically infected people in groups 1, 2, 3 and 4, respectively
+ $I =$ the total number of mild symptomatic infections
+ $H =$ the total number of hospitalized infections and 
+ $N =$ the total number of NEW infections (both asymptomatic and symptomatic) that occurred.
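
To make the layout of a state concrete, here is a small sketch that gives names to the 11 entries. The class name `VaccState`, the value of `total_population`, and the numbers in the example are assumptions made for illustration; the environment itself returns states as arrays of integers, not as this type.

```python
from typing import NamedTuple


class VaccState(NamedTuple):
    """Hypothetical named view of the 11-entry state tuple."""
    S1: int
    S2: int
    S3: int
    S4: int
    A1: int
    A2: int
    A3: int
    A4: int
    I: int
    H: int
    N: int


total_population = 10_000  # assumed value, for illustration only

# Made-up state: mostly susceptible people, a few asymptomatic infections.
raw_state = (990, 1990, 1490, 5480, 10, 10, 10, 20, 0, 0, 0)
state = VaccState(*raw_state)

# Every entry must lie in {0, 1, ..., total_population}.
assert all(0 <= x <= total_population for x in state)

total_susceptible = state.S1 + state.S2 + state.S3 + state.S4
total_asymptomatic = state.A1 + state.A2 + state.A3 + state.A4
print(total_susceptible, total_asymptomatic)
```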

**Action Space**
> The action space for this environment is finite, consisting of 25 possible actions. This is represented in `vacc_alloc_4groups.py` as $A =\{0, 1, 2, ..., 24\}$, where each number maps to a particular action. An action in this space is a choice of priority order. A valid priority order is either (1) no priority order, meaning susceptible people in the population are vaccinated at random, or (2) a permutation of $\{1, 2, 3, 4\}$. Since there are $4! = 24$ such permutations, this gives $24 + 1 = 25$ actions. In the file, an action is represented by a number $i \in A$ where the corresponding action is at index $i$ of the list `all_priority_orders`. 0 corresponds to \[ \], which means no priority order. The dictionary in the following cell represents the mapping:

all_priority_orders_dict = {'0': [], '1':["c1","c2","c3","c4"], '2': ["c2","c1","c3","c4"], 
                            '3': ["c3","c1","c2","c4"], '4': ["c1","c3","c2","c4"], '5': ["c2","c3","c1","c4"],
                            '6': ["c3","c2","c1","c4"], '7': ["c4","c2","c1","c3"], '8': ["c2","c4","c1","c3"],
                            '9': ["c1","c4","c2","c3"], '10': ["c4","c1","c2","c3"], '11': ["c2","c1","c4","c3"],
                            '12': ["c1","c2","c4","c3"], '13': ["c1","c3","c4","c2"], '14': ["c3","c1","c4","c2"], 
                            '15': ["c4","c1","c3","c2"], '16': ["c1","c4","c3","c2"], '17': ["c3","c4","c1","c2"], 
                            '18': ["c4","c3","c1","c2"], '19': ["c4","c3","c2","c1"], '20': ["c3","c4","c2","c1"],
                            '21': ["c2","c4","c3","c1"], '22': ["c4","c2","c3","c1"], '23': ["c3","c2","c4","c1"],
                            '24': ["c2","c3","c4","c1"]}
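
As a quick check on the size of the action space, the sketch below builds the empty priority order plus every permutation of the four groups. It only confirms the count of 25; `itertools.permutations` yields orders lexicographically, so apart from indices 0 and 1 the positions here do not match the index mapping used in the file. The variable names are illustrative.

```python
from itertools import permutations

groups = ["c1", "c2", "c3", "c4"]

# Index 0 is "no priority order"; the rest are the 4! = 24 permutations.
# NOTE: this ordering is lexicographic and differs from the file's mapping.
priority_orders = [[]] + [list(p) for p in permutations(groups)]

print(len(priority_orders))
print(priority_orders[0], priority_orders[1])
```

Note that the dictionary above uses string keys, so an integer action `i` would be looked up there as `str(i)`.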

**Reward**
> The reward is the negative of the number of *new* infections that occurred in this step. Maximizing the reward therefore minimizes the number of new infections.

**Transitions**
> Transitions between states are governed by the disease dynamics implemented in the `dynamics_model_4groups.py` file and described below under the section **Disease Dynamics**. In short, the current state and action are passed to the function `dynamics_model` imported from that file. This function then returns the new state as well as some additional information that can be used to plot the population dynamics.

**Environment**
> `reset` returns the environment to its original state
> 
> `step(action)` takes an action from the agent and returns the state of the system after this action is taken
+ action: an integer between 0 and 24 giving the index in the `all_priority_orders` list of the chosen priority order
+ example: "no priority order" would be $0$, since `all_priority_orders[0] = []`
> 
> This function returns
+ `newState`: np.array of integers representing the new state (see state space description above)
+ `reward`: integer value equal to $-1$ times the number of new infections that occurred in this step
+ `done`: either 0 or 1; flag for the end of the episode
+ `info`: dictionary of information we can use to plot the disease dynamics for this step
>
> `render` is currently unimplemented
>
> `close` is currently unimplemented
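
The interaction loop implied by `reset` and `step` can be sketched as below. Because the real environment class is not shown here, the example uses `StubVaccEnv`, a hypothetical stand-in that mimics only the interface described above; its dynamics, initial state, and horizon are made up and have nothing to do with `dynamics_model_4groups.py`.

```python
import random


class StubVaccEnv:
    """Hypothetical stand-in exposing the reset/step interface described above.

    The dynamics are placeholders for illustration only.
    """

    def __init__(self, horizon=4, seed=0):
        self.horizon = horizon          # assumed episode length
        self.rng = random.Random(seed)
        self.t = 0
        self.state = None

    def reset(self):
        self.t = 0
        # Made-up initial state (S1..S4, A1..A4, I, H, N).
        self.state = [990, 1990, 1490, 5480, 10, 10, 10, 20, 0, 0, 0]
        return self.state

    def step(self, action):
        assert 0 <= action <= 24, "action must index one of the 25 priority orders"
        new_infections = self.rng.randint(0, 50)  # placeholder, not real dynamics
        self.state = self.state[:-1] + [new_infections]
        self.t += 1
        reward = -new_infections
        done = int(self.t >= self.horizon)
        info = {"new_infections": new_infections}
        return self.state, reward, done, info


# A random agent: pick one of the 25 actions uniformly at each step.
env = StubVaccEnv()
state = env.reset()
done = 0
total_reward = 0
while not done:
    action = random.randrange(25)
    state, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```

The accumulated `total_reward` is the negative of the total number of new infections over the episode, which is the quantity an agent tries to maximize.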

### Disease Dynamics
Consider a population of $N$ individuals split into four risk classes: medical workers (c1), non-medical essential workers (c2), high-risk individuals (c3) and everyone else (c4). Each risk class is further subdivided into five groups: 
1. susceptible to the disease (but not infected)
2. asymptomatically infected
3. infected with mild symptoms
4. infected with severe symptoms (assumed to require hospitalization)
5. recovered or vaccinated or otherwise immune
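
The split into risk classes and groups can be pictured with a small data structure. This is a sketch under assumptions: the class name `RiskClass`, its field names, and the population numbers are invented for illustration and are not taken from `dynamics_model_4groups.py`.

```python
from dataclasses import dataclass


@dataclass
class RiskClass:
    """Hypothetical container for the five groups within one risk class."""
    susceptible: int
    asymptomatic: int
    mild: int
    hospitalized: int
    immune: int  # recovered, vaccinated, or otherwise immune

    def total(self):
        return (self.susceptible + self.asymptomatic + self.mild
                + self.hospitalized + self.immune)


# Made-up populations for the four risk classes.
population = {
    "c1": RiskClass(990, 10, 0, 0, 0),    # medical workers
    "c2": RiskClass(1990, 10, 0, 0, 0),   # non-medical essential workers
    "c3": RiskClass(1490, 10, 0, 0, 0),   # high-risk individuals
    "c4": RiskClass(5480, 20, 0, 0, 0),   # everyone else
}

# The total population is the sum over all classes and groups.
N = sum(rc.total() for rc in population.values())
print(N)
```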

**Assumptions**
> TODO

**Model Details/Approach**
> TODO

**How to use the `dynamics_model_4groups.py` file**
> TODO

### Heuristic Agents

TODO

## 4 Groups with Continuous Action and Continuous State Space

TODO: currently unimplemented

## Further extensions...

TODO: currently unimplemented