mutational antigenic profiling of Perth/2009 H3 HA against serum

Mapping of anti-flu serum against the Perth/2009 H3 HA

Mutational antigenic profiling of Perth/2009 H3 HA codon mutant libraries against ferret and human sera.

This is the computer code and raw data for the study Mapping person-to-person variation in viral mutations that escape polyclonal serum targeting influenza hemagglutinin, eLife, 2019.

Study led by Juhye Lee and Jesse Bloom.

Quick summary

Running the analysis

Automated steps

The main analysis is performed primarily by a series of Jupyter notebooks and Python scripts:

  1. analyze_map.ipynb: analyzes mutational antigenic profiling

  2. analyze_neut.ipynb: analyzes neutralization assays

  3. analyze_natseqs.ipynb: analyzes changes in amino-acid frequencies among natural sequences

  4. parameterize_map_on_struct.py: parameterizes the template Jupyter notebook map_on_struct_template.ipynb to show structures for each type of sera.

To run the automated steps above, execute the bash script run.bash with:

./run.bash

On the Hutch cluster, you can also submit this script using slurm with:

sbatch -p largenode -c 16 --mem=100000 run.bash
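
If you prefer to launch the pipeline from Python (for example, from another script), the two commands above can be wrapped as in the minimal sketch below. The function names `build_run_command` and `launch` are hypothetical and are not part of this repo; the slurm defaults simply mirror the sbatch line above, and the `mem_mb` name assumes slurm's default memory unit of megabytes.

```python
import shlex
import shutil
import subprocess


def build_run_command(use_slurm=False, partition="largenode", cpus=16, mem_mb=100000):
    """Return the command (as a list of arguments) that launches run.bash.

    The slurm defaults mirror the sbatch line shown in this README.
    """
    if use_slurm:
        return ["sbatch", "-p", partition, "-c", str(cpus), f"--mem={mem_mb}", "run.bash"]
    return ["./run.bash"]


def launch(use_slurm=False, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = build_run_command(use_slurm=use_slurm)
    print(shlex.join(cmd))
    if dry_run:
        return None
    if use_slurm and shutil.which("sbatch") is None:
        raise RuntimeError("sbatch not found; are you on a slurm cluster?")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    launch(use_slurm=True)  # dry run: prints the command without running it
```

The dry-run default is deliberate: the full analysis is resource-intensive, so the sketch only prints the command unless you explicitly pass `dry_run=False`.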

Manual steps

The following steps must be performed manually to finalize the paper figures:

  1. The automated steps above create Jupyter notebooks that map the immune selection onto the structure using dms_struct (which is a wrapper around nglview). These notebooks are in results/notebooks with names matching map_on_struct_*.ipynb. To open them interactively with mybinder, click here. You can also directly open each notebook as an interactive app in appmode by clicking on the links in the Quick summary section at the top of this README. To generate static protein structure images for the final figures, you also need to run each notebook locally and interactively cell-by-cell (giving time for each structure to render).

  2. The Jupyter notebook make_final_figs.ipynb generates the final figures for the paper, which are placed in ./results/figures/final. You need to run this notebook to generate the figures.
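
Before starting the manual steps, it can help to check which structure notebooks the automated steps actually produced. The sketch below lists files in results/notebooks that match map_on_struct_*.ipynb, as described in step 1. The helper names `find_struct_notebooks` and `notebook_suffix` are hypothetical and are not part of this repo.

```python
from pathlib import Path

PREFIX = "map_on_struct_"


def find_struct_notebooks(notebook_dir="results/notebooks"):
    """Return sorted paths of notebooks named map_on_struct_*.ipynb."""
    return sorted(Path(notebook_dir).glob(f"{PREFIX}*.ipynb"))


def notebook_suffix(notebook_path):
    """Return the part of the file name between the prefix and .ipynb."""
    return Path(notebook_path).stem[len(PREFIX):]


if __name__ == "__main__":
    notebooks = find_struct_notebooks()
    if not notebooks:
        print("No map_on_struct_*.ipynb notebooks found; run the automated steps first.")
    for nb in notebooks:
        print(f"{notebook_suffix(nb)}: {nb}")
```

Each notebook listed this way still has to be opened and run cell-by-cell to render the structures; the sketch only tells you which ones exist.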

Configuring the analysis

The configuration for the analysis is in a separate file, config.yaml. This file defines key variables for the analysis, and should be self-explanatory. The config.yaml file points to several files in the ./data/ subdirectory that specify essential data for the analysis:

  • data/serum_info.yaml: YAML file that gives information on all of the serum samples used for selections. For each serum there is an entry with the label used in the experiments, then:

    • name: a more informative name used when displaying results
    • description: description of the serum
    • group: group of samples to which the serum belongs
    • species: species from which serum is derived (if relevant)
    • vaccination: information on vaccination status (if relevant)
  • data/sample_list.csv: CSV file giving each sample that was deep sequenced. Columns are:

    • sample: sample label used in experiments
    • serum: serum used for selection
    • library: viral library, using simple 1, 2, 3 naming rather than the more confusing library codes used to label experiments
    • date: day when sequencing was done
    • serum_dilution: dilution of serum used; this includes the 1:4 dilution used during the RDE treatment of the serum. For antibodies, it is the concentration in ug/ml.
    • percent_infectivity: percent of viral library retaining infectivity
    • R1: glob pattern to R1 FASTQ files on Hutch server; the R2 file names are guessed from the R1 names. If config.yaml sets seq_data_source to R1 then there must be a valid R1 file glob for all samples; otherwise this column is ignored.
    • SRA_accession: the accession number on the Sequence Read Archive (SRA) for the sequencing for this sample. If config.yaml sets seq_data_source to SRA_accession then there must be a valid accession for all samples; otherwise this column is ignored.
  • data/Perth09_HA_reference.fa: FASTA file giving the sequence of the wildtype Perth/2009 HA used in the experiments.

  • data/H3renumbering_scheme.csv: A CSV file that maps sequential 1, 2, ... numbering of the Perth/2009 HA protein sequence (original column) to the standard H3 HA numbering scheme (new column).

  • data/H3_site_to_PDB_4o5n.csv: A CSV file that matches the H3 HA numbering to the site numbers and PDB chains in PDB structure 4o5n.

  • data/human_H3N2_HA_2007-2018.fasta.gz: A gzipped FASTA file that contains all human H3N2 influenza HA coding sequences collected between 2007 and 2018, as downloaded from the Influenza Virus Resource on June 2, 2019.

  • data/neut_assays: Data from neutralization assays:

  • data/figure_config.yaml: YAML file giving specifications for fine-tuned figures showing logo plot zooms and neutralization curves.
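
As an illustration of how two of the files above fit together with config.yaml, here is a minimal sketch that (1) checks that every sample has an entry in the column named by seq_data_source, and (2) loads the renumbering scheme into a dictionary. The function names are hypothetical and are not part of this repo, and the check only tests that the entry is non-empty; it does not verify that a glob matches files or that an accession exists on the SRA. The column names (sample, R1, SRA_accession, original, new) are taken from the descriptions above.

```python
import csv
import io


def check_sample_list(csv_text, seq_data_source):
    """Return labels of samples with an empty entry in the required column.

    seq_data_source is "R1" or "SRA_accession"; the other column is ignored.
    """
    if seq_data_source not in ("R1", "SRA_accession"):
        raise ValueError(f"unexpected seq_data_source: {seq_data_source!r}")
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["sample"] for row in reader
            if not (row.get(seq_data_source) or "").strip()]


def load_renumbering(csv_text):
    """Map sequential numbering (original column) to H3 numbering (new column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["original"]: row["new"] for row in reader}


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not values from the repo.
    samples = (
        "sample,serum,library,R1,SRA_accession\n"
        "s1,serumA,1,/path/to/s1_R1*.fastq.gz,\n"
        "s2,serumA,2,,\n"
    )
    print(check_sample_list(samples, "R1"))             # ['s2']
    print(check_sample_list(samples, "SRA_accession"))  # ['s1', 's2']

    renumbering = "original,new\n1,-16\n2,-15\n"
    print(load_renumbering(renumbering))
```

To apply this to the real files, read them with `open("data/sample_list.csv").read()` and pass the text in. Site labels are kept as strings because the sketch makes no assumption that every label in the new column is a plain integer.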

Results

Results are placed in the ./results/ subdirectory. Many of the results files are not tracked in this GitHub repo since they are very large. However, the following results are tracked:

Other subdirectories

Other subdirectories in the repo are:
