Ed Baskerville
Frédéric Labbé
Var gene evolution model(s), implemented in Julia. Based on previous C++ implementations (varmodel and varmodel2) by Ed Baskerville and Qixin He. This code implements a model of malaria var gene evolution within an individual-based disease transmission model. Malaria strains are represented as unordered sets of var genes, which are in turn composed of abstract loci. A number of alleles can appear at each locus, and the allelic composition of a gene across loci governs immune dynamics in the host. Individual hosts are infected by strains, and infections can be transmitted between hosts. Each infection expresses a single var gene at a time, and the sequence of expressions is explicitly represented in the simulation. The simulation also includes immigration of new strains into the population, recombination during transmission and during an infection, and mutation. The simulation is modeled as a sequence of discrete events (state changes) that happen in continuous time. Model details are described inline in comments in the code; see Code Organization below to get oriented.
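As a rough illustration of what "discrete events in continuous time" means, here is a minimal sketch of a Gillespie-style event loop. The event names, the constant rates, and the function `simulate_events` are hypothetical placeholders, not the model's actual events or code; in the real model the rates depend on the simulation state.

```julia
using Random

# Minimal sketch of a Gillespie-style (direct method) event loop.
# `rates` maps hypothetical event names to constant rates; a real model
# would recompute the rates from the current state after each event.
function simulate_events(rng::AbstractRNG, rates::Dict{Symbol,Float64}, t_end::Float64)
    event_names = sort!(collect(keys(rates)))   # fixed order for reproducibility
    total = sum(rates[n] for n in event_names)
    counts = Dict(n => 0 for n in event_names)
    t = 0.0
    while total > 0
        # The waiting time to the next event is exponential with rate `total`.
        t += randexp(rng) / total
        t > t_end && break
        # Choose which event fires, with probability proportional to its rate.
        u = rand(rng) * total
        acc = 0.0
        chosen = event_names[end]
        for n in event_names
            acc += rates[n]
            if u < acc
                chosen = n
                break
            end
        end
        counts[chosen] += 1   # a real model would apply the state change here
    end
    return counts
end

rng = MersenneTwister(1)
counts = simulate_events(rng, Dict(:biting => 2.0, :immigration => 0.1), 100.0)
```

Each pass through the loop advances the clock by a random waiting time and then applies exactly one event, so time is continuous while state changes are discrete.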
Before doing anything, run the following command to install the packages required by this code: `./install-packages.jl`
A single ad-hoc run of the model, directly in Julia and without an external parameters file, is convenient for testing. Instead of using the standard `run.jl`, which loads parameters from JSON, you generate parameters inside Julia and run the model code directly with them. To do a single ad-hoc run of the model, do the following:

- Copy the script `examples/julia/run.jl` into a new directory,
- Modify the relative paths to `preamble.jl` and `model.jl`,
- Modify the parameter values,
- Run the script: `julia run.jl`, or directly as a shell script, `./run.jl`.

To perform a run with an existing parameters file in JSON format, copy the parameters file into a new experiment directory, and use the script `varmodel3/run.jl` as described in the comment string. To see how to generate parameters from JSON, see the code embedded in the example for parameter sweeps (`examples/sweep/generate-sweep.jl`).
To do a parameter sweep, copy the `examples/sweep` directory, and modify/run as described in the comment string in `generate-sweep.jl`. This script loops through parameter combinations and replicates with different random seeds, and generates the files necessary to perform runs on a local machine or on a SLURM cluster. It also divides runs into jobs suitable for execution on a single cluster node or local machine. The runs are specified as lines in the job's `runs.txt` file, and the job is specified in a `job.sbatch` file, which can be run directly as a shell script or submitted to a SLURM cluster. Each job uses the script `varmodel3/runmany.jl` to run a single-node, multi-core queue of runs, with one run running on each core at any time. The sweep script also generates a script `submit_jobs.sh`, which submits every job to SLURM at once. Runs are divided into at most `N_JOBS_MAX` jobs, each of which uses at most `N_CORES_PER_JOB_MAX` cores for the cluster node's local queue. This allows you to work within limits set by your cluster administrator. If you have no limits, you should set `N_JOBS_MAX` to a very large number, and set `N_CORES_PER_JOB_MAX = 1`, so that the cluster can dynamically balance runs across cluster nodes as the experiment runs. To modify configuration settings for SLURM jobs, edit the template string in the `generate_jobs()` function.
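To make the roles of `N_JOBS_MAX` and `N_CORES_PER_JOB_MAX` concrete, here is a minimal sketch of one way runs could be divided among jobs. It is not the logic in `generate-sweep.jl`; the helper `divide_runs` and the round-robin assignment are assumptions made for illustration.

```julia
# Hypothetical helper: divide `n_runs` runs into at most `n_jobs_max` jobs
# (round-robin), and cap the cores each job requests at `n_cores_per_job_max`.
function divide_runs(n_runs::Int, n_jobs_max::Int, n_cores_per_job_max::Int)
    n_jobs = min(n_runs, n_jobs_max)
    jobs = [Int[] for _ in 1:n_jobs]
    for run_id in 1:n_runs
        push!(jobs[mod1(run_id, n_jobs)], run_id)
    end
    # A job never needs more cores than it has runs.
    n_cores = [min(length(job), n_cores_per_job_max) for job in jobs]
    return jobs, n_cores
end

# 10 runs, at most 4 jobs, at most 2 cores per job.
jobs, n_cores = divide_runs(10, 4, 2)
```

With a very large job limit and one core per job, as in `divide_runs(3, 1000, 1)`, every run becomes its own single-core job, which leaves the balancing to the cluster scheduler.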
In this model, hosts carry infections of different strains of the malaria parasite Plasmodium falciparum. Each parasite genome consists of a specific combination (i.e., repertoire) of `n_genes_per_strain` var genes. Strain identity is defined by this repertoire independent of order. Although unlikely, the same var gene may occur multiple times in a strain. Each var gene itself is represented as a linear combination of `n_loci` epitopes, i.e., parts of the molecule that act as antigens and are targeted by the immune system. At the outset, each locus i has one of `n_alleles_per_locus_initial[i]` possible values, indexed from 0 to `n_alleles_per_locus_initial[i] - 1`. Mutation events create new alleles, so the number of distinct alleles at each locus can increase over time. At any time, hosts may be infected multiple times by the same or different strains. The var genes in a repertoire are expressed sequentially and the infection ends when the whole repertoire is depleted. The order of expression is randomized distinctly for each infection. The duration of the active period of a var gene, and thus of the infection, is determined by the number of unseen epitopes. When a var gene is deactivated, the host adds the deactivated var gene epitopes to its immunity memory. Specific immunity toward a given epitope experiences a loss rate from host immunity memory, and re-exposure is therefore required to maintain it. The local population is open to immigration from the regional pool.
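The relationship between genes, epitopes, and host immunity can be sketched as follows. This is a simplified illustration, not the model's data structures: representing a gene as a vector of allele IDs and immunity as one set per locus are assumptions, and the sketch ignores immunity levels and immunity loss.

```julia
# Assumed representation: a gene is a vector of allele IDs, one per locus.
# Host immunity memory: for each locus, the set of alleles already seen.
new_immunity(n_loci::Int) = [Set{Int}() for _ in 1:n_loci]

# Number of epitopes (locus/allele pairs) of `gene` the host has not seen yet.
count_unseen(immunity, gene::Vector{Int}) =
    count(i -> !(gene[i] in immunity[i]), eachindex(gene))

# When a gene is deactivated, its epitopes are added to the immunity memory.
function deactivate!(immunity, gene::Vector{Int})
    for (i, allele) in enumerate(gene)
        push!(immunity[i], allele)
    end
    return immunity
end

# Strain identity ignores gene order: compare the sorted lists of genes.
# Sorting (rather than using a set) keeps repeated genes distinguishable.
same_strain(a::Vector{Vector{Int}}, b::Vector{Vector{Int}}) = sort(a) == sort(b)

immunity = new_immunity(2)
gene_a, gene_b = [0, 3], [0, 5]
unseen_before = count_unseen(immunity, gene_a)   # host has seen nothing yet
deactivate!(immunity, gene_a)
unseen_after = count_unseen(immunity, gene_b)    # allele 0 at locus 1 is now known
```

In this sketch, a gene that shares epitopes with previously expressed genes has fewer unseen epitopes, which is the quantity the text above ties to the duration of its active period.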
This code is a new implementation in Julia of the malaria var gene evolution model which is based on previous C++ implementations (i.e., varmodel and varmodel2). The main changes from the previous implementation are as follows:
- While the previous implementations of the stochastic agent-based model (ABM) were adapted from the next-reaction method which optimizes the Gillespie first-reaction method, this implementation uses a simpler Gillespie algorithm.
- Our model extension allows us to keep track of the neutral part of each migrant parasite genome assembled by sampling one of the two possible alleles at each of a defined number of neutral bi-allelic SNPs.
- While the extended model can generate homogeneous initial SNP allele frequencies by sampling the migrant alleles with an identical probability from the regional pool (i.e., 0.5), it can also generate distinct initial SNP allele frequencies by sampling the migrant alleles from the regional pool with distinct probabilities that sum up to one (e.g., 0.2 and 0.8) and are randomly picked from a defined range (e.g., [0.1-0.9]).
- Moreover, to generate the neutral part of a recombinant parasite and mimic meiotic recombination, which happens within the mosquito during the sexual reproduction stage of the parasite, a random allele is sampled for each bi-allelic SNP.
- Finally, to allow for linkage disequilibrium (LD) across the neutral part of the genome, neutral bi-allelic SNPs can be non-randomly associated and co-segregate as defined in a matrix of LD coefficients indicating the probability that pairs of linked SNPs will co-segregate during the meiotic recombination.
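The neutral SNP sampling described in the list above can be sketched roughly as follows. The function names are hypothetical, the sketch assumes that a recombinant takes each SNP allele from one of two parental genomes chosen at random, and it does not implement linkage disequilibrium.

```julia
using Random

# Frequency of allele 1 at each SNP in the regional pool; allele 2 has the
# complementary frequency. With `distinct = false`, every SNP uses 0.5;
# otherwise each frequency is drawn uniformly from `bounds`.
function initial_frequencies(rng, n_snps, distinct; bounds = (0.1, 0.9))
    lo, hi = bounds
    return distinct ? [lo + (hi - lo) * rand(rng) for _ in 1:n_snps] : fill(0.5, n_snps)
end

# Neutral genome of a migrant: allele 1 with probability freq[i], else allele 2.
sample_migrant(rng, freq) = [rand(rng) < f ? 1 : 2 for f in freq]

# Neutral genome of a recombinant (assumption: no LD): at each SNP, take the
# allele carried by a randomly chosen parent.
recombine(rng, parent1, parent2) =
    [rand(rng, Bool) ? a : b for (a, b) in zip(parent1, parent2)]

rng = MersenneTwister(42)
freq = initial_frequencies(rng, 5, true)
migrant1 = sample_migrant(rng, freq)
migrant2 = sample_migrant(rng, freq)
child = recombine(rng, migrant1, migrant2)
```

Under linkage disequilibrium, the independent per-SNP choice in `recombine` would be replaced by one in which linked SNPs tend to be inherited from the same parent.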
The parameter names should match the variables defined in `src/parameters.jl`, and the values should match the appropriate type.
| Name | Type | Description |
|---|---|---|
| `biting_rate` | `Array{Float64}` | Transmission rate for each day of the year |
| `coinfection_reduces_transmission` | `Bool` | Whether or not transmissibility is reduced with coinfection |
| `distinct_initial_snp_allele_frequencies` | `Bool` | Whether the initial allele frequencies of the SNPs are distinct |
| `ectopic_recombination_generates_new_alleles` | `Bool` | Whether or not ectopic recombination generates new alleles |
| `ectopic_recombination_rate` | `Float64` | Ectopic recombination rate parameter |
| `gene_strain_count_period` | `Int` | How often to output the number of circulating genes and strains |
| `host_sample_size` | `Int` | Number of hosts to sample at each sampling period |
| `host_sampling_period` | `Int` | How often to sample host output |
| `immigration_rate_fraction` | `Float64` | Immigration rate, as a fraction of the non-immigration biting rate |
| `immunity_level_max` | `Int16` | Maximum immunity level |
| `immunity_loss_rate` | `Float64` | Rate at which immunity is lost, per host, per gene |
| `initial_snp_allele_frequency` | `Array{Float32}` | Range of the possible initial frequencies for one of the two SNP alleles |
| `max_host_lifetime` | `Float32` | Maximum host lifetime |
| `mean_host_lifetime` | `Float32` | Mean of exponential distribution used to draw host lifetime |
| `mean_n_mutations_per_epitope` | `Float64` | Mean number of mutations per epitope for similarity calculation |
| `migrants_match_local_prevalence` | `Bool` | Whether the immigration rate is multiplied by the local infection rate |
| `migration_rate_update_period` | `Int` | How often to update migration rate based on local prevalence |
| `mutation_rate` | `Float64` | Rate of mutation, per active infection |
| `n_alleles_per_locus_initial` | `Int` | Initial number of alleles for each epitope locus |
| `n_genes_initial` | `Int` | Number of genes in the initial gene pool |
| `n_genes_per_strain` | `Int` | Number of genes in strain |
| `n_hosts` | `Int` | Number of hosts |
| `n_infections_active_max` | `Int` | Maximum number of simultaneous active infections |
| `n_infections_liver_max` | `Int` | Maximum number of simultaneous infections in the liver stage |
| `n_initial_infections` | `Int` | Number of initial infections |
| `n_loci` | `Int` | Number of epitope loci in each gene |
| `n_snps_per_strain` | `Int` | Number of biallelic neutral single nucleotide polymorphisms (SNPs) in strain |
| `p_ectopic_recombination_is_conversion` | `Float64` | Probability that an ectopic recombination is a conversion |
| `rho_recombination_tolerance` | `Float64` | Recombination tolerance, rho (Drummond et al.) |
| `rng_seed` | `Int` | Seed for random number generator |
| `sample_infection_duration_every` | `Int` | Sample an infection duration every `sample_infection_duration_every` clearances |
| `snp_linkage_disequilibrium` | `Bool` | Whether the SNPs (or some SNPs) are in linkage disequilibrium (LD) |
| `snp_pairwise_ld` | `Array{Float32, 2}` | Pairwise linkage disequilibrium (LD) matrix |
| `summary_period` | `Int` | How often to write summary output |
| `switching_rate` | `Float64` | Switching rate for genes the host is not immune to |
| `transmissibility` | `Float64` | Baseline transmissibility of infections |
| `t_liver_stage` | `Float64` | Duration of the liver stage |
| `t_burnin` | `Int` | Burn-in time |
| `t_end` | `Int` | Simulation end time |
| `t_year` | `Int` | Number of time units in a year |
| `upper_bound_recomputation_period` | `Int` | How often to recompute upper bounds for rejection sampling |
| `verification_period` | `Int` | How often to verify consistency of simulation state |
| `whole_gene_immune` | `Bool` | Whether a host gains immunity towards a gene if the host has seen all the alleles |
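
As an example of building one of the array-valued parameters, here is a minimal sketch of a seasonal `biting_rate` vector with one entry per day of the year. The helper name, the sinusoidal shape, and the default mean and amplitude are hypothetical, and the sketch assumes that the vector length should equal `t_year`.

```julia
# Hypothetical seasonal biting-rate profile: one value per day of the year.
# `mean_rate` and `amplitude` are illustrative values, not model defaults.
function seasonal_biting_rate(t_year::Int; mean_rate = 0.0005, amplitude = 0.5)
    return [mean_rate * (1 + amplitude * sin(2π * (day - 1) / t_year)) for day in 1:t_year]
end

t_year = 360
biting_rate = seasonal_biting_rate(t_year)
```

A constant transmission rate would instead be `fill(0.0005, t_year)`, again with an illustrative value.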
The output database is in SQLite3 format, which can be easily accessed from R using the RSQLite library, from Python using the built-in sqlite3 library, or from Matlab using the mksqlite package. We also recommend using VisiData or the graphical SQLite browser, especially while testing.
| Name | Description |
|---|---|
| `gene_strain_counts` | Number of circulating strains and var genes in all sampled hosts at different sampling times |
| `initial_snp_allele_frequencies` | Initial allele frequencies of each SNP locus |
| `meta` | Information related to the run (e.g., elapsed time) |
| `sampled_duration` | Information related to the infection durations in all sampled hosts (e.g., infection time) |
| `sampled_host` | Information related to the sampled hosts at different sampling times (e.g., birth and death times) |
| `sampled_immunity` | Immunity level of all sampled hosts at different sampling times |
| `sampled_infection_genes` | Information related to the var genes involved in the infections of the sampled hosts (e.g., allele ID) |
| `sampled_infection_snps` | Information related to the SNP loci involved in the infections of the sampled hosts (e.g., allele ID) |
| `sampled_infections` | Information related to the infections of the sampled hosts at different sampling times (e.g., infection ID) |
| `summary` | Summary at different sampling times (e.g., number of infections, bites, and infected hosts) |
Please cite this when using the model: Labbé, F., He, Q., Zhan, Q., Tiedje, K.E., Argyropoulos, D.C., Tan, M.H., Ghansah, A., Day, K.P., Pascual, M., 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLOS Computational Biology 19, e1010816. https://doi.org/10.1371/journal.pcbi.1010816