b2forensics

Contacts

Installation

Software/Scripts

  • Kraken2
# clone or download files
git clone https://github.com/DerrickWood/kraken2.git
cd kraken2
# launch install script ($KRAKEN2_DIR is the directory where Kraken2 will be installed)
./install_kraken2.sh "$KRAKEN2_DIR"

See the Kraken2 manual for more details.

  • Conda
# get miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# run the installer (the downloaded file is not executable by default)
bash Miniconda3-latest-Linux-x86_64.sh

  • b2forensics environment and scripts
# clone or download files
git clone https://github.com/i2bc/b2forensics.git

Database/Data

The first part of the pipeline uses a custom or standard Kraken2 database, a file of tRNA and ribosomal RNA subunit sequences, and reference genome files for the species of interest.

  • Kraken2 database
# standard database
kraken2-build --standard --db $DBNAME

As described in the Kraken2 manual, it is also possible to create a custom database; see that manual or the How to use section.

  • BLAST database: The pipeline needs a local BLAST database. Do not forget to set the path of the BLAST database in the config file. As explained on the "Get NCBI BLAST databases" page, a preformatted NCBI BLAST database can be downloaded with the script update_blastdb.pl from the BLAST+ package.
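
A minimal sketch of that download step is given here. The database name (nt) and the target directory (blastdb) are assumptions made for illustration; this README does not say which database the pipeline expects. Because the download can be very large, the sketch only prints the command unless RUN_DOWNLOAD=1 is set.

```shell
# Assumed values, not taken from the b2forensics config: adjust to your setup.
BLASTDB_DIR="${BLASTDB_DIR:-$PWD/blastdb}"   # hypothetical target directory
DB_NAME="${DB_NAME:-nt}"                     # assumed database name

mkdir -p "$BLASTDB_DIR"

if [ "${RUN_DOWNLOAD:-0}" = "1" ] && command -v update_blastdb.pl >/dev/null 2>&1; then
    # download and unpack the preformatted database volumes
    (cd "$BLASTDB_DIR" && update_blastdb.pl --decompress "$DB_NAME")
else
    # dry run: show the command without downloading anything
    echo "cd $BLASTDB_DIR && update_blastdb.pl --decompress $DB_NAME"
fi
```

The directory held in BLASTDB_DIR is the one to report in the config file.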

  • tRNA sequences/subunit ribosomal sequences: tRNA sequences come from tRNAdb and ribosomal RNA subunit sequences come from the SILVA database. Concatenate these files and put the resulting file in the tRNA_sequences directory.
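
The concatenation can be sketched as follows. The input and output file names and the two short sequences are placeholders invented for this example; replace them with the files actually downloaded from tRNAdb and SILVA. The sketch works in a scratch directory so that it does not touch real data.

```shell
# Scratch directory for this sketch (placeholder for the repository root)
WORKDIR="$(mktemp -d)"

# Placeholder inputs standing in for the tRNAdb and SILVA downloads
# (file names and sequences are made up for illustration)
printf '>tRNA_example\nGCGGATTTAGCTCAGTTGGG\n' > "$WORKDIR/tRNAdb_example.fasta"
printf '>rRNA_example\nAGAGTTTGATCCTGGCTCAG\n' > "$WORKDIR/silva_example.fasta"

# Concatenate both files into a single FASTA file in tRNA_sequences/
mkdir -p "$WORKDIR/tRNA_sequences"
cat "$WORKDIR/tRNAdb_example.fasta" "$WORKDIR/silva_example.fasta" \
    > "$WORKDIR/tRNA_sequences/tRNA_rRNA.fasta"

# Sanity check: count the sequences in the combined file
grep -c '^>' "$WORKDIR/tRNA_sequences/tRNA_rRNA.fasta"
```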

  • reference genomes: Download the genome FASTA files of the species of interest and put them in the reference_genomes directory.

  • taxonomic IDs: For each species of interest, make a file listing all the taxonomic IDs you would like to include (one per line), and put it in the taxonomy_files directory.
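
As an illustration of the expected format, the sketch below writes a small taxonomic ID file and checks that every line is a bare integer. The file name is hypothetical, and the two IDs (1392 for Bacillus anthracis and 198094 for one of its strains) are only examples; verify any ID against the NCBI Taxonomy before using it.

```shell
# Scratch location for this sketch; in the pipeline the file goes in taxonomy_files/
TAXDIR="$(mktemp -d)/taxonomy_files"
mkdir -p "$TAXDIR"
TAXFILE="$TAXDIR/taxonomy_tree_example.txt"   # hypothetical file name

# One taxonomic ID per line: the species followed by the sub-taxa to include
printf '%s\n' 1392 198094 > "$TAXFILE"

# Check the format: report any line that is not a bare integer
if grep -vnE '^[0-9]+$' "$TAXFILE"; then
    echo "malformed lines found in $TAXFILE" >&2
else
    echo "$(wc -l < "$TAXFILE") taxonomic IDs listed in $TAXFILE"
fi
```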

How to use

The data directory contains a dataset for testing the pipeline. In this example, the pipeline searches for sequences that can be assigned to one of the taxids (Bacillus anthracis and its sub-taxids) listed in the file taxonomy_files/taxonomy_tree_anthracis.txt.

To create the Kraken2 database, download the FASTA files of the "Complete Genome" assemblies for bacteria, archaea, fungi, protozoa and viruses. The scripts from Mick Watson can be used for this:

# clone the git repo
git clone https://github.com/mw55309/Kraken_db_install_scripts.git

Then proceed as explained in the opiniomics post, with adjustments for Kraken2:

# run for each branch of life you wish to download
perl download_bacteria.pl
perl download_archaea.pl
perl download_fungi.pl
perl download_protozoa.pl
perl download_viral.pl
# build a new database 
# download taxonomy
kraken2-build --download-taxonomy --db kraken2_db
# for each branch, add all fna in the directory to the database
for dir in fungi protozoa archaea viral bacteria; do
        for fna in "$dir"/*.fna; do
                # skip the branch if the directory contains no .fna file
                [ -e "$fna" ] || continue
                kraken2-build --add-to-library "$fna" --db kraken2_db
        done
done
# build the database
kraken2-build --build --db kraken2_db

Results/output files

Structure of the output directory

b2forensics_results
├── alignment_fastq
│   ├── {sample}_{strain}_R1.fq.gz
│   └── {sample}_{strain}_R2.fq.gz
├── alignment_reads_id
│   └── {sample}_alignment_paired_reads_id_{strain}.txt
├── blast_reads_id
│   └── {sample}_{strain}_blast_output_uniq.txt
├── kraken_fasta
│   ├── {sample}_{strain}_R1.fa
│   └── {sample}_{strain}_R2.fa
├── kraken_fastq
│   ├── {sample}_{strain}_R1.fq.gz
│   └── {sample}_{strain}_R2.fq.gz
├── kraken_reads_id
│   └── {sample}_kraken_paired_reads_id_{strain}.txt
├── kraken_results
│   └── {sample}_cdb_paired.txt
├── megablast_results
│   ├── {sample}_{strain}_blast_output_R1.txt
│   ├── {sample}_{strain}_blast_output_R2.txt
│   ├── {sample}_{strain}_blast_output_R1_filtered.txt
│   └── {sample}_{strain}_blast_output_R2_filtered.txt
└── trDNA_depleted
    └── blast_alignment_{strain}
        └── {sample}_aln_paired_trDNA_depleted_{strain}.sorted.bam
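
A possible way to check a finished run against this structure is sketched below. The sample and strain names are hypothetical, only three of the files from the tree are checked, and two of them are simulated with empty files so that the sketch runs without real pipeline results.

```shell
# Hypothetical values for illustration
RESULTS="$(mktemp -d)/b2forensics_results"
SAMPLE="sampleA"
STRAIN="anthracis"

# Simulate two outputs so the sketch runs without real pipeline results
mkdir -p "$RESULTS/kraken_results" "$RESULTS/kraken_reads_id"
touch "$RESULTS/kraken_results/${SAMPLE}_cdb_paired.txt"
touch "$RESULTS/kraken_reads_id/${SAMPLE}_kraken_paired_reads_id_${STRAIN}.txt"

# Report which of the expected files are present
MISSING=0
for f in \
    "kraken_results/${SAMPLE}_cdb_paired.txt" \
    "kraken_reads_id/${SAMPLE}_kraken_paired_reads_id_${STRAIN}.txt" \
    "blast_reads_id/${SAMPLE}_${STRAIN}_blast_output_uniq.txt"
do
    if [ -e "$RESULTS/$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
        MISSING=$((MISSING + 1))
    fi
done
echo "$MISSING expected file(s) missing"
```

On a real run, point RESULTS at the actual output directory, drop the two lines that simulate outputs, and extend the list with the other files from the tree.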
