Analysis notebooks and scripts for the first paper about vampire
This repository makes the plots for the paper "Deep generative models for T cell receptor protein sequences" by Kristian Davidsen, Branden J Olson, William S DeWitt III, Jean Feng, Elias Harkins, Philip Bradley and Frederick A Matsen IV.
To make the figures in the paper, download the results files from https://zenodo.org/record/2619576#.XKElTrfYphE and place them in a directory named input in the root of this repository. Make a directory named output as well.
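The directory setup above can be sketched in shell as follows. This is a minimal sketch, assuming it is run from the root of this repository; `RESULTS_DIR` is a hypothetical variable pointing at wherever the Zenodo download was unpacked, not something the repository defines.

```shell
# Run from the root of this repository.
# Create the two directories the notebooks expect.
mkdir -p input output

# Assumption: RESULTS_DIR (hypothetical) points at the unpacked Zenodo results.
# If it is set and exists, copy its contents into input/; otherwise do nothing.
if [ -n "${RESULTS_DIR:-}" ] && [ -d "$RESULTS_DIR" ]; then
    cp -R "$RESULTS_DIR"/. input/
fi
```

If `RESULTS_DIR` is not set, only the two directories are created and the results files can be copied into input by hand.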
Then run these notebooks in the vampire conda environment, built as described in the main vampire repository.
You will also need to install jupyter as follows:
conda install jupyter
conda install -c r r-irkernel
Also install the R packages cowplot, latex2exp, and reshape2.
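Before opening the notebooks, it may help to confirm that the tools mentioned above are available. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the assumption is that conda, jupyter, and Rscript should all be on PATH once the vampire environment is active.

```shell
# Print the name of every command in "$@" that is not found on PATH.
# missing_tools is a hypothetical helper, not something this repository ships.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: these three commands are what the setup above relies on.
missing=$(missing_tools conda jupyter Rscript)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required tools found."
fi
```

The R packages themselves are all on CRAN, so one option is to install them from an R session with `install.packages(c("cowplot", "latex2exp", "reshape2"))`.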
The above instructions only concern making plots from output files. If you want to reproduce the analysis, you will want to modify the main vampire repository to work with your cluster scheduler and then follow the instructions below.
To build the results for plotting,
- download the relevant data from immuneACCESS
- run python util.py split-repertoires from the main repository (you can see an example call in _output_deneuter-2019-02-07/deneuter-2019-02-07.json)
- run scons --data=/path/to/the/resulting/json/file.json
This repo also includes a script, prep-cohort-frequency.sh, that prepares files for the cohort frequency analysis.
If you want to reproduce this analysis,
- download the data from Adaptive
- preprocess it using the preprocess_adaptive.py script in the vampire repository
- run the prep-cohort-frequency.sh script (editing paths)
- run the pipe_freq pipeline with scons --pipe=pipe_freq (editing the path in the SConstruct file)