GitHub

Taxa4Meta pipeline

Description

Perform human gut microbiome meta-analysis, i.e. 16S rRNA gene amplicon data generated from different sequencing strategies. The core concept bases on the accurate taxamonic binning instead of OTU interpretation during gut meta-analysis.

Dependencies for this pipeline

BLCA and NCBI 16S RefSeq database for species annotation of amplicons
Samtools for fasta manipulation
biom-format and h5py for OTU matrix conversion
DECIPHIER (R package): IdTaxa and its pre-built RDP databse (v16; curated by IdTaxa) for taxonomic annotation down to genus level
VSEARCH: sequence quality control and sequence clustering

Usage

Do not run Taxa4Meta for combined datasets generated from different 16S variable regions, it has to be used for each dataset first and then merge all collpased species (L7) profiles for downstream meta-analysis.

Taxa4Meta.sh --path /path-to-project-dir/ --threads [integer] --fastq_truncee [integer] --minReadLen [integer] --maxReadLen [integer] --fastq_stripleft [integer] --fastq_stripright [integer] --BLCA-GenusConf [integer] --BLCA-SpeciesConf [integer] --ReverseComplement [string] --PairedEnd [string] --ReferenceMode [string]

(Note: all positional parameters must be provided in the same order as indicated above)

Options

--path: Absolute path to the project directory. Arrange data: /ProjectDir/RawReadProcess/SampleDir/RawSequenceFiles

--threads: Number of threads to be specified. (must be integer)

--fastq_truncee: Truncate sequences from left until the total expected error is not higher than the specified value; usually 1 for Illumina short reads; could be 2 for 454 longer reads. (must be integer)

--minReadLen: During quality filtering (after paired-end merging), discard sequenced reads if they are shorter than the specified length. (must be integer)

--maxReadLen: During quality filtering (after paired-end merging), truncate sequence reads to the specified length, and keep sequence reads shorter than the length specified. (must be integer)

--fastq_stripleft: The length of any other technical sequence plus primer sequence to re removed from left. (must be integer)

--fastq_stripright: The length of any other technical sequence plus primer sequence to be removed from right. (must be integer)

--BLCA-GenusConf: During BLCA taxonomic assignment, confidence cut-off for sequence annotation at genus level. (must be integer)

--BLCA-SpeciesConf: During BLCA taxonomic assignment, confidence cut-off for sequence annotation at species level. (must be integer)

--ReverseComplement: Perform reverse complement of filtered reads before clustering (reduce run time); must be TRUE or FALSE.

--PairedEnd: Switch to the process mode for paired-end reads, otherwise remain the process for single-end reads; must be TRUE or FALSE.

--ReferenceMode: Perform chimeras removal using reference-based mode, otherwise perform chimeras removal using both de novo mode and reference-based mode; must be TRUE or FALSE.

Notes

to run your own analysis, you could manipulate lines 89-91 of Taxa4Meta.sh to fit the format of file names of fastq files and sample names
For some cases, the lineage between genus and species could be different due to different taxonomic confidence set-up for these two ranks. To get consistent lineage for species/genus annotated by BLCA for downstream analysis, just replace the lineage for species/genus in the final taxonomy with NCBI taxonomic lineage (16SMicrobial.ACC.taxonomy_modification_records.xlsx under 'scripts' directory; text mapping based procedure) if you want to interpret results at collapsed species (L7) level

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
databases		databases
demo		demo
scripts		scripts
LICENSE		LICENSE
README.md		README.md
Taxa4Meta.sh		Taxa4Meta.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

databases

databases

demo

demo

scripts

scripts

LICENSE

LICENSE

README.md

README.md

Taxa4Meta.sh

Taxa4Meta.sh

Repository files navigation

Taxa4Meta pipeline

Description

Dependencies for this pipeline

Usage

Options

Suggested confidence and read length for key 16S variable regions

Notes

About

Releases 1

Packages

Languages

License

qinglong89/Taxa4Meta

Folders and files

Latest commit

History

Repository files navigation

Taxa4Meta pipeline

Description

Dependencies for this pipeline

Usage

Options

Suggested confidence and read length for key 16S variable regions

Notes

About

Resources

License

Stars

Watchers

Forks

Languages