This GitHub repository contains the code used to run the analyses and generate the figures in the manuscript:
Sammut SJ et al. Predictability of B cell clonal persistence and immunosurveillance in breast cancer.
Raw BCR sequencing data have been uploaded to the European Genome-Phenome Archive (EGA00002343328). If you require access to the raw sequencing data, please submit a request via the EGA. Once the Data Access Committee has approved the request, processed data can also be provided through direct communication with the corresponding authors. This includes the data in the ../data/processed/bcr-data, ../data/processed/tcr-data, ../data/processed/MrDarcy and ../data/unprocessed folders.
The metastatic breast cancer cohort was previously described in https://doi.org/10.1016/j.celrep.2019.04.098 whilst the early breast cancer cohort was previously described in https://doi.org/10.1038/s41586-021-04278-5.
This repository has seven folders:
Directory | Description |
---|---|
../BCRNetworks | Contains R scripts to generate BCR Network centrality analyses |
../data | Contains the data required to generate the analyses described in the manuscript |
../metadata | Contains sample metadata files |
../output | Stores output generated by R scripts |
../python | Contains Python code required to screen antibody sequences against a database of antibody sequences known to bind to antigen |
../R | Contains R scripts required to generate the analyses described in the manuscript |
../resources | Contains resource files (such as gene lists) used within the analyses |
The code and data reside in an encrypted compressed file while the manuscript is undergoing peer review. The password to decrypt this file is the first and second words of the introduction, concatenated without a space and with their case preserved.
The scripts included in this repository will allow you to recreate the analyses described within our manuscript. We have used R version 4.1.2 and Python version 3.10.1.
To automatically load the directory and file structure, please specify the location of the root directory in a variable called dir.base and source the loadData.R file, as shown below:
dir.base <- "~/BCR-Immunosurveillance/"
source(paste0(dir.base, "R/loadData.R"))
In the manuscript we describe a BCR Network centrality analysis pipeline (Figure 4). To run this pipeline:
1. Create and activate a conda environment (see https://docs.conda.io/projects/miniconda/en/latest/) and install networkx and numpy:
conda create -n mrdarcy python=3.9
conda activate mrdarcy
conda install networkx
conda install numpy
2. Download cdhit from https://github.com/weizhongli/cdhit/releases and install it.
3. The code to run the analysis can be found in the ../BCRNetworks/R folder. Example scripts that generate networks for the multiple sclerosis and early breast cancer cohorts have been provided in the ../BCRNetworks/examples folder.
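
Before running the pipeline, it may help to confirm that steps 1 and 2 succeeded. The sketch below is not part of this repository: the function names are hypothetical, and it assumes that the CD-HIT executable is available on the PATH under the name `cd-hit`. Run it from within the activated `mrdarcy` environment.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_executables(names):
    """Return the executable names that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # networkx and numpy are installed in step 1.
    modules = missing_modules(["networkx", "numpy"])
    # Assumption: the CD-HIT install from step 2 exposes a binary called "cd-hit".
    tools = missing_executables(["cd-hit"])
    print("Missing Python modules:", modules or "none")
    print("Missing executables:", tools or "none")
```

If either list is non-empty, repeat the corresponding installation step before running the scripts in ../BCRNetworks.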