In [1]:
import os, sys
import inspect
#os.chdir(os.path.dirname(os.path.abspath("__file__")))

repo_path = os.path.abspath(os.path.join(os.getcwd(), '../../..'))
code_path = os.path.abspath(os.path.join(repo_path, "code/src"))

os.chdir(os.path.join(repo_path))

if code_path not in sys.path:
    sys.path.append(code_path)

# Table of contents

1. [Background](#background)
    1. [Transmission model](#transmission)
    2. [Demography](#demography)

2. [Model structure](#s_structure)
    1. [Run Scenarios](#s_run_scenarios)
    2. [Model](#s_model)
        1. [Disease](#s_disease)
        2. [Population](#s_population)
        3. [Observers](#s_observers)
    3. [Utils](#utils)
    4. [Code guide](#s_code_guide)
3. [Installing the module](#installation)
4. [How to run simulations](#run_simulation)



# 1. Background  <a name="background"></a>
The transmission model is an individual-based model that incorporates demographic and transmission dynamics. Each serotype is represented individually. We assume a susceptible-infectious-susceptible (SIS) model for S. pneumoniae infection, where “infection” represents carriage of S. pneumoniae, and disease outcomes are considered for the infected individuals. Similar to previous modelling studies, the infectious duration for an individual is sampled from an exponential distribution with an age-specific mean infectious duration. Upon recovery, infectious individuals re-enter the susceptible class, with protection derived from antibodies according to the time since their last vaccination. 
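The duration sampling described above can be sketched as follows. This is a minimal illustration using only the standard library; the age-group labels, the mean durations, and the function name `sample_infectious_duration` are hypothetical and are not taken from the model code.

```python
import random

# Hypothetical age-specific mean carriage durations in days (illustrative values only).
MEAN_DURATION_DAYS = {"0-4": 60.0, "5-17": 30.0, "18+": 20.0}


def sample_infectious_duration(age_group, rng):
    """Draw one carriage duration (days) from an exponential distribution
    whose mean depends on the individual's age group."""
    mean = MEAN_DURATION_DAYS[age_group]
    # expovariate takes the rate (1 / mean), not the mean itself.
    return rng.expovariate(1.0 / mean)


rng = random.Random(1)
durations = [sample_infectious_duration("0-4", rng) for _ in range(10_000)]
sample_mean = sum(durations) / len(durations)
print(f"sample mean duration for 0-4 year olds: {sample_mean:.1f} days")
```

With enough draws the sample mean approaches the chosen age-specific mean, which is a quick sanity check on any parameterisation.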


### 1.1. Transmission model <a name="transmission"></a>
### TODO: Update the equations and symbols throughout whole file

##### Mixing assumptions
Transmission within and between age groups is driven by contact rates derived from synthetic contact matrices. In this work, we assume that contacts in our population follow the contact matrices provided by Prem et al.

#####  Carriage assumptions
We assume that individuals are exposed to a single serotype at every time step. The serotype to which an individual is exposed is randomly selected from the weighted serotype distribution in the population:

$$\Pr(\text{exposure to serotype } a) = \frac{\tau_a I_a}{\sum_s \tau_s I_s}$$

where $\tau_a$ is the transmission coefficient multiplier of serotype $a$, $I_a$ is the number of individuals infected by serotype $a$, and the denominator sums the same weighted quantity over all circulating serotypes $s$. We then calculate the probability of transmission of the exposed S. pneumoniae serotype for each individual. Individual infection risk for a particular serotype is determined by serotype prevalence, age, contact rates, current infection status and current antibody levels:

$$\Pr(\text{infection by serotype } a \mid \text{exposure to serotype } a) = \left(1 - e^{-\mathrm{foi}_i}\right) V_a \, C$$

where $\mathrm{foi}_i$ represents the force of infection on an individual in age class $i$, $V_a$ represents the vaccine-induced multiplier on the probability of acquiring serotype $a$ (see Section 2.1), and $C$ represents the reduction in susceptibility given existing infections. We allow co-infections in the model: individuals may be infected with multiple serotypes concurrently, with a reduced probability of acquiring another serotype if already infected, representing within-host competition. In this work, we assume that $C$ takes the values 1 (fully susceptible), 0.8 and 0 for 0, 1 and 2 existing infections, respectively, which means that an individual can carry at most two serotypes concurrently.
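As a rough illustration of the two probabilities above, the sketch below computes the exposure distribution, draws one exposed serotype, and evaluates the infection probability. The serotype labels, input values and function names are hypothetical; only the values of C (1, 0.8, 0) come from the text. It assumes at least one serotype is circulating, so the denominator is non-zero.

```python
import math
import random

# Susceptibility reduction C by number of current infections (values from the text).
COMPETITION = {0: 1.0, 1: 0.8, 2: 0.0}


def exposure_probabilities(tau, infected):
    """Pr(exposure to serotype a) = tau_a * I_a / sum_s(tau_s * I_s)."""
    weights = {s: tau[s] * infected[s] for s in tau}
    total = sum(weights.values())  # assumed > 0 (some serotype is circulating)
    return {s: w / total for s, w in weights.items()}


def infection_probability(foi_i, v_a, n_current):
    """Pr(infection | exposure) = (1 - exp(-foi_i)) * V_a * C."""
    return (1.0 - math.exp(-foi_i)) * v_a * COMPETITION[n_current]


# Hypothetical inputs, for illustration only.
tau = {"19A": 1.0, "3": 0.5}     # transmission coefficient multipliers
infected = {"19A": 30, "3": 40}  # current carriers per serotype

probs = exposure_probabilities(tau, infected)
rng = random.Random(0)
exposed = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
p_inf = infection_probability(foi_i=0.02, v_a=0.7, n_current=1)
print(probs, exposed, p_inf)
```

An individual already carrying two serotypes has C = 0, so `infection_probability` returns zero regardless of the force of infection.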
The force of infection comprises age-specific community transmission:

$$\mathrm{foi}_i = q \sum_s \tau_s \left( \sum_j \eta_{ij} \frac{I_{js}}{N_j} \right)$$

where:

* $q$ = transmission coefficient
* $\tau_s$ = transmission coefficient multiplier of serotype $s$
* $\eta_{ij}$ = number of daily community contacts between an individual in age group $i$ and individuals in age group $j$
* $I_{js}$ = number of individuals in age group $j$ infected with serotype $s$
* $N_j$ = number of individuals in age group $j$
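A small worked example may help in checking the force-of-infection formula. The sketch below evaluates it for two age groups and two serotypes in plain Python; all numbers and the function name `force_of_infection` are hypothetical and chosen so the result is easy to verify by hand.

```python
def force_of_infection(q, tau, contacts, infected, pop):
    """foi_i = q * sum_s tau_s * sum_j eta_ij * I_js / N_j, for every age group i."""
    n_groups = len(pop)
    foi = []
    for i in range(n_groups):
        total = 0.0
        for s, tau_s in tau.items():
            # Contact-weighted prevalence of serotype s seen by age group i.
            prevalence = sum(
                contacts[i][j] * infected[j][s] / pop[j] for j in range(n_groups)
            )
            total += tau_s * prevalence
        foi.append(q * total)
    return foi


# Hypothetical inputs, for illustration only.
q = 0.05                                            # transmission coefficient
tau = {"A": 1.0, "B": 0.5}                          # serotype multipliers
contacts = [[4.0, 2.0], [2.0, 6.0]]                 # eta_ij, daily contacts
infected = [{"A": 10, "B": 20}, {"A": 5, "B": 0}]   # I_js per age group
pop = [100, 50]                                     # N_j

foi = force_of_infection(q, tau, contacts, infected, pop)
print(foi)
```

For age group 0, serotype A contributes 1.0 × (4 × 10/100 + 2 × 5/50) = 0.6 and serotype B contributes 0.5 × (4 × 20/100) = 0.4, so the force of infection is 0.05 × 1.0 = 0.05.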


```python
def check_exposure(): ...  # body not shown here

def check_recoveries(): ...  # body not shown here
```

In addition to community-level circulation, we introduce serotypes externally via two processes. First, an external exposure rate simulates the ongoing introduction of serotypes through population migration. Migrants are assumed to have the same age-specific and serotype-specific incidence of S. pneumoniae carriage as the local population, which maintains the same level of serotype- and age-specific carriage while the population size increases. Second, a serotype sampled uniformly from the set of all possible serotypes is introduced. This serotype may not be currently circulating within the local population, which allows for serotype replacement under immune selection over time.

##### Clinical model

The clinical model takes the age and vaccine history of newly infected individuals from the transmission model, and the antibody levels at the time of infection (derived from these values) from the immunity model. It then determines whether an infected individual develops a disease outcome (IPD or CAP), according to the relative probability of each occurring, and records the disease outcomes when they occur. In this work, we assume that 23vPPV is not effective against acquisition or against developing CAP.

```python
def check_disease(): ...  # body not shown here
```

##### Vaccine rollout
The model simulates the implementation of historical vaccine schedules, each defined by its rollout year, eligible age range, annual on-time and late coverage fractions for the eligible age group, number of doses, and time interval between doses. We assume that individuals who receive the first dose on time also receive the following doses on time. We also assume that a vaccine provides its maximum protection immediately.

```python
def check_vaccines(): ...  # body not shown here
```

### 1.2. Demography <a name="demography"></a> 
The model includes births, ageing, age-specific mortality, and age-specific migration to capture the age distribution of a population over time. It takes as inputs the initial population size, initial age distribution, age-specific death rates, annual net birth rates, annual migration rates, and the age distribution of the emigrated population to simulate demographic dynamics. We sourced the inputs from the Australian Bureau of Statistics and Macro Trends Global to align with Australian population dynamics.

# 2. Model structure <a name="s_structure"></a> 

The model is structured as follows:

1. [Run Scenarios](#s_run_scenarios)
2. [Model](#s_model)
    1. [Disease](#s_disease)
    2. [Population](#s_population)
    3. [Observers](#s_observers)
3. [Utils](#utils)
4. [Code guide](#s_code_guide)

The scripts in the [Run Scenarios](#s_run_scenarios) folder take a model from the [Model](#s_model) folder and run single or batch simulations (with helper functions taken from the [utils](#utils) folder). The model is structured in three subfolders:

* [disease](#s_disease) folder stores disease model and disease simulation classes,
* [population](#s_population) folder stores population classes,  
* [observers](#s_observers) folder stores scripts to collect summary statistics.  


### 2.1. Run Scenarios <a name="s_run_scenarios"></a>

The `run_....py` scripts take the `new_go_single()` function from the `at_risk_varying_disease_model.py` script. `new_go_single()` calls the `go_single()` function from the corresponding `src/model/disease/at_risk_varying_run.py` script, which runs a single simulation from a single parameter set. The docstring of `go_single()` is provided below.

In [7]:
from model.disease.run import go_single

func_name = go_single.__name__
func_docstring = inspect.getdoc(go_single)

print(f"def {func_name}():")
print(f"Docstring: {func_docstring}")

def go_single():
Docstring: Run a single simulation (or load if previously run).

It takes a parameter dictionary, disease class, contact matrix, a random
seed, a simulation type and then runs a single simulation.

:param p: The simulation parameters
:type p: dict
:param disease_type: The disease model to simulate.
:type disease_type: :class:`DiseaseBase`
:param cmatrix: The population contact matrix.
:type cmatrix: :class:`ContactMatrix`
:param cur_seed: Random seed for current experiment.
:type cur_seed: int
:param sim_type: Simulation class to use (current options are SimEpi or SimBDI).
:type sim_type: :class:`SimEpi`
:param verbose: Flag to indicate whether to write output to terminal.
:type verbose: bool


### 2.2. Model <a name="s_model"></a>

The model is structured in three subfolders: 

* [disease](#s_disease) folder stores disease model and disease simulation classes,
* [population](#s_population) folder stores population classes,  
* [observers](#s_observers) folder stores scripts to collect summary statistics. 


The `go_single()` function in `at_risk_varying_run.py` in the [disease](#s_disease) folder takes as input either the `DisSimulation` class (from [disease_simulation.py](../model/disease/disease_simulation.py)) or the `AtRiskDisSimulation` class (from [at_risk_varying_disease_simulation.py](../model/disease/at_risk_varying_disease_simulation.py)). Both classes simulate the demographic dynamics and disease transmission (based on the given disease class) in the population. `AtRiskDisSimulation` is a child class of `DisSimulation`. The main difference is that `AtRiskDisSimulation` assigns at-risk flags to individuals.

### 2.2.1. Disease <a name="s_disease"></a>

#TODO

In [8]:
from model.disease.disease_simulation import DisSimulation
from model.disease.at_risk_varying_disease_simulation import AtRiskDisSimulation

class_name = DisSimulation.__name__
class_docstring = inspect.getdoc(DisSimulation)

print(f"Class Name: {class_name}")
print(f"Docstring: {class_docstring}")

Class Name: DisSimulation
Docstring: Simulation class that simulates disease and demographical dynamics 
at every step.

The demographical updates include births, deaths, aging, immigration.
It makes the demographical and disease related updates in 
DisPopulation class object.

In _main_loop(), it updated demography (if True) and disease transmission
and save the population (if True) when the simulation ends. 

Currently population is saved in three different csv files due to the 
nested struct columns of list of strains (_strain_list.csv) and 
corresponding time until clearance (_endList.csv) and the rest of the 
columns. This saving section can be later updated by using write_parquet
function of polars library which can write nested columns into the same
file. The only downside would be that the saved parquet files cannot be 
opened by excel for an easy review of the dataset.


:param p: dictionary of simulation parameters.
:type p: dict
:param create_pop: If `True` (default), create

In [4]:
class_name = AtRiskDisSimulation.__name__
class_docstring = inspect.getdoc(AtRiskDisSimulation)

print(f"Class Name: {class_name}")
print(f"Docstring: {class_docstring}")

Class Name: AtRiskDisSimulation
Docstring: The same as the parent DisSimulation class. The only difference
is that it takes AtRiskDisPopulation as an input population class rather 
than DisPopulation population class which includes atrisk column for 
individuals.

:param p: dictionary of simulation parameters.
:type p: dict
:param create_pop: If `True` (default), create a random population; 
    otherwise, this will need to be done later.
:type create_pop: bool



`DiseaseModel` is the other main input of the `go_single()` function.

The `DiseaseModel` class takes observers from the [observers](../model/observers) folder to collect summary statistics. To add or remove observers, changes must be made to the `DiseaseModel` class in the `...disease_model.py` script. To create new observers or edit existing ones, changes must be made in the [observers](../model/observers) folder.

Parameter sets are taken from `..._params.py` in the same folder and can be further edited in the `run_....py` script.

In [5]:
from model.disease.at_risk_varying_disease import AtRiskDisease

class_name = AtRiskDisease.__name__
class_docstring = inspect.getdoc(AtRiskDisease)

print(f"Class Name: {class_name}")
print(f"Docstring: {class_docstring}")


Class Name: AtRiskDisease
Docstring: Disease class that updates disease state of a population.

It simulates a multi-strain pathogen transmission in a population with age
structure.

:param:
    p: parameter dictionary
    cmatrix: Contact Matrix
    rng: RandomNumber Generator
    fname: FileName to save the collected statistics.
    mode: "w" is used as the base mode which writes parameters to the 
    collected statistics hd5 file.


### 2.2.2. Population <a name="s_population"></a>

#TODO


[class AtRiskDisPopulation(DisPopulation)](../model/population/at_risk_varying_disease_population.py) is used as the population class for these simulations. It adds an atrisk column to the population table using `add_at_risk_column()` from [disease_utils.py](../model/disease/disease_utils.py). The base population class is located in [population.py](../model/population/population.py) and does not include any disease- or vaccine-related columns.

`class AtRiskDisPopulation(DisPopulation)` and `class DisPopulation(Population)` can load population files stored in the [saved_checkpoints](../../../data/saved_checkpoints) folder.



In [6]:
from model.population.at_risk_varying_disease_population import AtRiskDisPopulation

class_name = AtRiskDisPopulation.__name__
class_docstring = inspect.getdoc(AtRiskDisPopulation)

print(f"Class Name: {class_name}")
print(f"Docstring: {class_docstring}")


Class Name: AtRiskDisPopulation
Docstring: The population class for a population containing pl.dataFrame where
each row represent an individual.

AtRiskDisPopulation is a child class of DisPopulation class. 
It extend the population structure and includes an at_risk column to
assign at_risk flag to individuals as 0 (not-at-risk), 1 (Tier 1 at-risk),
and 2 (Tier 2 at-risk).

:class:`.population.Population` adds disease and vaccination rows 
to individuals pl.dataFrame.

:param disease: takes disease class as an input


### 2.2.3. Observers <a name="s_observers"></a>

TODO


# 3. Installing the module <a name="installation"></a>
The module installation is described in the [README.md](../../../README.md) file. The module must be installed from a terminal:
```shell
python3 -m venv venv
source venv/bin/activate
# set working directory as code repository then
pip install -e .
```

# 4. How to run simulations <a name="run_simulation"></a>

Given a parameter set, the `run_....py` scripts in the `run_...` folders take the code in the [model](../model) folder and simulate transmission. The [run_scenarios](../run_scenarios) folder includes scripts that simulate transmission in the non-Indigenous population of Australia, where the population is divided into _not-at-risk_, _Tier 1 at-risk_ and _Tier 2 at-risk_ individuals.

#### Running single simulation
To run a single simulation, make sure that the `run_....py` script has the following lines at the end:

```python
#run a single simulation among combinations
new_go_single(job_inputs[0][0])
#OR run multiple simulations
#results = parq.run(new_go_single, job_inputs, n_proc=32, results=False)
```
#### Running batch simulations
To run a batch of simulations using multiprocessing (for instance, on 32 cores), make sure that the `run_....py` script has the following lines at the end:

```python
#run a single simulation among combinations
#new_go_single(job_inputs[0][0])
#run multiple simulations
results = parq.run(new_go_single, job_inputs, n_proc=32, results=False)
```