Shai Pilosof edited this page May 3, 2019 · 16 revisions

Welcome!

This GitHub repository accompanies the publication: Pilosof S, He Q, Tiedje KE, Ruybal-Pesántez S, Day KP, Pascual M. Competition for hosts modulates vast antigenic diversity to generate persistent strain structure in Plasmodium falciparum. PLOS Biology, doi:

We assume that you have read the paper, so we use its terminology throughout.

Analysis

The analysis is fully reproducible but complex, with multiple steps that depend on each other. Because it is computationally intensive, most of it is performed on a High Performance Computing (HPC) grid; we ran ours on the University of Chicago's RCC Midway. The full analysis cannot be run on a local machine, and you will need to adapt the code to your own local machine and HPC system.

In every file that you use, change the working folders and the folders that hold the result files. You must also change the folders in the function get_data in the file functions.R.
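One way to keep these folder changes manageable is to define the folders once and build every file path from them. The sketch below is illustrative only: the list paths, the folder locations and the helpers result_path and check_paths are assumptions and do not exist in the repository, and the folders inside get_data in functions.R still have to be edited by hand.

```r
# Hypothetical folder configuration. All names and locations here are
# assumptions for illustration; replace them with your own folders.
paths <- list(
  work    = path.expand(file.path("~", "strain_structure")),            # working folder
  results = path.expand(file.path("~", "strain_structure", "results"))  # result files
)

# Build the full path of a result file from the configured results folder.
result_path <- function(file_name, cfg = paths) {
  file.path(cfg$results, file_name)
}

# Stop early with a readable message if any configured folder is missing.
check_paths <- function(cfg = paths) {
  dirs <- unlist(cfg)
  missing_dirs <- dirs[!dir.exists(dirs)]
  if (length(missing_dirs) > 0) {
    stop("Missing folders: ", paste(missing_dirs, collapse = ", "))
  }
  invisible(TRUE)
}
```

Calling check_paths() at the top of a script makes a wrong folder fail immediately, rather than partway through a long HPC job.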

All the necessary files are in this repository (see Files in this repository).

Start by reading the General workflow pipeline. Then you can read about each of the phases of the analysis, which are performed in the following order:

Benchmark scenarios

Low, medium and high diversity, 3 scenarios each.

  1. Run ABM (50 runs per scenario per diversity regime)
  2. Select a threshold for edge weights
  3. Obtain results for benchmark scenarios (using the selected cutoff values for each regime).
  4. Sensitivity analysis
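To give a sense of the size of the first step, the sketch below enumerates one row per ABM run for the benchmark phase: 50 runs for each of 3 scenarios in each of 3 diversity regimes. The scenario labels and the job_id column are placeholders chosen for illustration, not identifiers used in the repository or in its HPC scripts.

```r
# One row per ABM run in the benchmark phase.
# Scenario labels are placeholders (assumptions), not the repository's names.
runs <- expand.grid(
  run       = 1:50,
  scenario  = c("scenario_A", "scenario_B", "scenario_C"),
  diversity = c("low", "medium", "high"),
  stringsAsFactors = FALSE
)

# A running index, e.g. for mapping rows to HPC array tasks (hypothetical use).
runs$job_id <- seq_len(nrow(runs))

nrow(runs)  # 50 * 3 * 3 = 450 runs in total
```

A table like this could be used to map each job on the HPC system to one combination of run, scenario and diversity regime; how the jobs are actually submitted depends on your scheduler.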

Seasonality

High diversity, selection scenario only

  1. Run ABM (50 runs)
  2. Select a threshold for edge weights
  3. Obtain results for benchmark scenarios
  4. Sensitivity analysis

Empirical data

In this analysis we use a bioinformatic pipeline to cluster sequences into alleles. See the paper for details.

Data

Simulated data

  • All the parameter files used in the ABM are in the file ABM_parameter_files in FIGSHARE.
  • All the data underlying the figures published in the paper, including the SI, are in a dedicated FIGSHARE repository. These data were produced using the workflow described above. Each zip file in the repository is named after the corresponding figure. The figures can be reproduced from these data with the code in the file Results_PLOS_biol.R.

Empirical data

The empirical data are also in the repository, in the file empirical_data.zip. Follow the code in the file empirical.R to analyze the data and produce Figure 3 of the main text.