If your data include UMIs, cell barcodes, and exogenous virus barcodes,
you can use Barlin to extract all of these tags and calculate the intergroup similarity.
Barlin runs only on typical Linux systems.
git clone https://github.com/mana-W/virus_barcode.git
starcode : https://github.com/gui11aume/starcode
umitools : https://github.com/CGATOxford/UMI-tools
R
pcks <- c("stringr","stringdist","ggplot2","jaccard","reshape2","tidyr","pheatmap","parallel","ggalluvial")
install.packages(pcks)
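Before running the pipeline, it can help to confirm that the command-line dependencies are reachable. The following is a minimal sketch, not part of Barlin itself; the helper name check_tools is hypothetical, and the executable names starcode, umi_tools, and Rscript are assumed to match what your installation puts on PATH.

```shell
# Report which of the given command-line tools are available on PATH.
# check_tools is a hypothetical helper, not a script shipped with Barlin.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status is 1 if any tool was missing, 0 otherwise.
    return "$missing"
}

# Executable names are assumptions; adjust them to your installation.
check_tools starcode umi_tools Rscript || echo "Install the missing tools before running Barlin."
```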
- Fastq file (with virus barcode):
R1.fastq.gz
R2.fastq.gz
- Fasta file of barcode template sequence:
virus.fa
- Cell annotation (tsv):
celltype.tsv
Values in the 'Cluster' column must have the form group_annotation.
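The 'Cluster' naming rule can be checked before Step1. This is a rough sketch under several assumptions: the file is tab-separated, its first row is a header containing a column named exactly Cluster, and "group_annotation" means non-empty text on both sides of an underscore. The function name check_cluster_column is hypothetical.

```shell
# Print rows whose 'Cluster' value does not look like group_annotation.
# Assumes a tab-separated file with a header row (an assumption, not
# something the Barlin scripts are known to enforce).
check_cluster_column() {
    awk -F'\t' '
        NR == 1 {
            # Locate the Cluster column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "Cluster") col = i
            if (!col) { print "no Cluster column"; bad = 1; exit }
            next
        }
        # Require non-empty text before and after an underscore.
        $col !~ /^[^_]+_.+$/ { print "line " NR ": " $col; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: check_cluster_column celltype.tsv && echo "Cluster column looks fine"
```

The function exits with status 0 when every value passes and 1 otherwise, so it can gate the rest of a wrapper script.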
Step1: Extract cell barcodes and UMIs, and prepare the input file for the next step.
sh CB_UMI.sh path/R1.fastq.gz path/R2.fastq.gz
Step2: Recover the virus barcodes of cells and the relationship between each pair of clusters.
Rscript find_virusBC.R UMI_CB_umitools/CB_UMI.tsv virus.fa celltype.tsv 0.5
The output of this step is written to the directory res.
The most important file is res/clone_final.tsv, which contains the barcode information.
Step3 (optional): Calculate the relationships between cells across multiple groups and visualize the results.
Rscript similarity.R 0.5 group1/res/clone_final.tsv group2/res/clone_final.tsv 0.6
The output of this step is written to the directory spanres.
After this step, you can also create a Sankey plot with:
Rscript alluvial_plot.R group1_celltype.tsv group2_celltype.tsv spanres/all_jac.csv spanres/all_pvalue.csv
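The file name spanres/all_jac.csv and the jaccard package in the dependency list suggest that the similarity is a Jaccard index. As an illustration of that measure only, and not a description of what similarity.R does internally, the sketch below computes the Jaccard index of two barcode sets. The function name jaccard_index and the one-barcode-per-line input files are assumptions.

```shell
# Jaccard index |A ∩ B| / |A ∪ B| for two files with one barcode per line.
# jaccard_index is a hypothetical helper, not part of Barlin.
jaccard_index() {
    a=$(mktemp)
    b=$(mktemp)
    # Deduplicate and sort each set so comm can compare them.
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    shared=$(comm -12 "$a" "$b" | wc -l)   # barcodes present in both sets
    total=$(sort -u "$a" "$b" | wc -l)     # barcodes present in either set
    rm -f "$a" "$b"
    awk -v s="$shared" -v t="$total" \
        'BEGIN { if (t == 0) print "NA"; else printf "%.3f\n", s / t }'
}

# Usage: jaccard_index clusterA_barcodes.txt clusterB_barcodes.txt
```

Two sets that share 2 of 4 distinct barcodes give 0.500; identical sets give 1.000 and disjoint sets give 0.000.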