GitHub - lanfreem/DoTA-seq-Paper: Scripts used in DoTA-seq methods paper

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
PrimerPicker		PrimerPicker
unknown-community		unknown-community
01.parse_OTU_clustering_result.py		01.parse_OTU_clustering_result.py
02.parse_blastn_result.py		02.parse_blastn_result.py
03.summarize_hits_from_each_barcode.py		03.summarize_hits_from_each_barcode.py
Analysis-Synthetic-community.ipynb		Analysis-Synthetic-community.ipynb
CPS-analysis.R		CPS-analysis.R
CPS-network-analysis.ipynb		CPS-network-analysis.ipynb
R4-parser.ipynb		R4-parser.ipynb
Readme		Readme
SAM-parser.ipynb		SAM-parser.ipynb
Unknown-sample-analysis.ipynb		Unknown-sample-analysis.ipynb
bowtie2db.zip		bowtie2db.zip
dropletPCRcountingmacro.ijm		dropletPCRcountingmacro.ijm

Repository files navigation

Steps in DoTA-seq Analysis workflow

1. Demultiplexing raw sequence files
DoTA-seq uses the Illumina I7 index read to read out the cell barcodes of the library. Thus, one needs to obtain the raw I7 read file from the sequencer in addition to the raw read 1 or read 2 files. Use R4-parser.ipynb to quality filter the barcodes and associate barcodes with their amplicon reads.

2. Mapping reads to reference
Using the output of R4-parser, we map the resulting barcode-associated reads to a reference database of expected amplicons. This database typically consists of the targeted genes (and different orientations/sequence permutations of them) that you expect to see, as well as potential taxonomic marker genes that you expect to see. For small databases, we use Bowtie2 to perform the mapping, obtaining a SAM formatted output. For large highly repetitive databases like the SILVA16S database, we must use a different method (OTU-clustering + Mapping) described later.

3. Filtering of mapped reads
Using the outputs of Bowtie2 mapping to the database of expected amplicons, we group the reads by barcode, perform quality control and filtering on the grouped reads, and generate a summary of the output. The steps are detailed in the SAM-parser.ipynb Jupyter notebook.

4. Analysis of filtered grouped read data
Analysis of the resulting data from SAM-parser depends on the application at hand. For analysis involving defined multi-species communities, the workflow is shown in Analysis-Synthetic-community.ipynb.

For analysis of communities of unknown composition, the classification of taxonomic markers is performed separately from the mapping of target amplicons by Bowtie2 as follows:

    # Create conda env for the analysis (uses python 2.7 and qiime)
    mamba create -p /slowdata/yml_environments/qiime python=2.7 qiime matplotlib mock nose seqtk -c bioconda 

    # Activate the conda env
    conda activate /slowdata/yml_environments/qiime

    # Make blastDB of the representative 16S database
    # Make blastn db
    makeblastdb -in /storage1/databases/QIIME2/qiime/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna -out silva_132_97_16S -dbtype 'nucl'

    # Within 'qiime' conda environment 
    # Convert fastq to fasta
    seqtk seq -q20 -a $FILE.fastq > $FILE.fasta

    # Chimera check
    usearch -uchime2_ref reads/$FILE.fasta -db /storage1/databases/QIIME2/qiime/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/97/silva_132_97_16S.fna -strand plus -uchimeout reads/uchimeout.txt -notmatched reads/$FILE.nochimera.fasta -mode sensitive

    # Within 'mmseqs2_v13.45111' conda environment
    # Run mmseqs to make OTUs
    mkdir $FILE;mmseqs easy-linclust reads/$FILE.nochimera.fasta 37-2-R4A/clusterRes tmp --min-seq-id 0.97 -c 0.95 --cov-mode 1 --threads 10;rm -r tmp

    #Run these python scripts to parse the results (NOTE: Edit the scripts to change the filenames and paths according to your own file names)
    # Parse OTU clustering result
    python 01.parse_OTU_clustering_result.py

    # Run blastn
    blastn -query $FILE/clusterRes_rep_seq.fasta -db silva_132_97_16S.wo_Mycobacteria -out 37-2-R4A/$FILE.nochimera_rep.blastn_result.txt -num_threads 10 -evalue 0.001 -perc_identity 90 -outfmt 6 -max_target_seqs 1

    # Parse blastn result
    python 02.parse_blastn_result.py

    # Summarize hits from each barcode
    python 03.summarize_hits_from_each_barcode.py

Following the above workflow, a table is created for the taxonomic classification of marker genes in each barcode group. We combine that we the results of other target gene mapping using Bowtie2 and then perform quality controls of the combined results and analyze the resulting data using Unknown-sample-analysis.ipynb Jupyter notebook.

For analysis of single species phase variation, the workflow is as follows: The output of SAMparser is analyzed using an Rscript (CPS-analysis.Rscript), which filters out barcodes that do not have enough reads for each of the loci and cells with conflicting reads for any loci (both ON and OFF). It outputs an excel file called X_summary.xlsx which contains the details of the analysis, including how many barcodes passed the quality check. The results of this file contains a table of each CPS promoter variant and the number of cells sequenced, which is extracted as a separate table with a .nodes extension to be used to plot the network representation of the populations using the CPS-network-analysis.ipynb Jupyter notebook.

About

Scripts used in DoTA-seq methods paper

Readme

Activity

0 stars

1 watching

2 forks

Report repository

Releases

No releases published

Packages

No packages published

Languages

Jupyter Notebook 98.6%
Other 1.4%

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PrimerPicker

PrimerPicker

unknown-community

unknown-community

01.parse_OTU_clustering_result.py

01.parse_OTU_clustering_result.py

02.parse_blastn_result.py

02.parse_blastn_result.py

03.summarize_hits_from_each_barcode.py

03.summarize_hits_from_each_barcode.py

Analysis-Synthetic-community.ipynb

Analysis-Synthetic-community.ipynb

CPS-analysis.R

CPS-analysis.R

CPS-network-analysis.ipynb

CPS-network-analysis.ipynb

R4-parser.ipynb

R4-parser.ipynb

Readme

Readme

SAM-parser.ipynb

SAM-parser.ipynb

Unknown-sample-analysis.ipynb

Unknown-sample-analysis.ipynb

bowtie2db.zip

bowtie2db.zip

dropletPCRcountingmacro.ijm

dropletPCRcountingmacro.ijm

Repository files navigation

About

Releases

Packages

Languages

lanfreem/DoTA-seq-Paper

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Languages