HybPiper version 2.3.0
- Add option `--compress_sample_folder` to command `hybpiper assemble`. This archives and compresses the sample folder after assembly has completed, producing `<sample_name>.tar.gz`.
  - This is useful when running HybPiper on HPC clusters with file number limits.
  - If both an uncompressed and compressed folder exist for a sample, a warning is shown and HybPiper exits.
  - All HybPiper subcommands (`stats`, `recovery_heatmap`, `retrieve_sequences`, `paralog_retriever`, `filter_by_length`) work with either compressed or uncompressed sample files/folders, or a combination of both.
  - If a `<sample_name>.tar.gz` file already exists for a sample, it will be extracted and used for the current run of `hybpiper assemble`, and the `<sample_name>.tar.gz` file will be deleted.
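
The folder/tarball rules above can be pictured with a minimal Python sketch. This is not HybPiper's own code: the helper names `sample_state` and `compress_sample_folder` are hypothetical, and removing the folder after archiving is an assumption. Only the `<sample_name>.tar.gz` naming comes from the notes above.

```python
import shutil
import tarfile
from pathlib import Path


def sample_state(parent_dir, sample_name):
    """Classify a sample as 'folder', 'tarball', 'both' or 'missing' (hypothetical helper)."""
    parent = Path(parent_dir)
    has_folder = (parent / sample_name).is_dir()
    has_tarball = (parent / f"{sample_name}.tar.gz").is_file()
    if has_folder and has_tarball:
        return "both"  # the case in which HybPiper warns and exits
    if has_folder:
        return "folder"
    if has_tarball:
        return "tarball"
    return "missing"


def compress_sample_folder(parent_dir, sample_name):
    """Write <sample_name>.tar.gz next to the folder, then remove the folder.

    Removing the folder afterwards is an assumption made for this sketch.
    """
    parent = Path(parent_dir)
    folder = parent / sample_name
    tarball = parent / f"{sample_name}.tar.gz"
    with tarfile.open(tarball, "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the sample name
        archive.add(folder, arcname=sample_name)
    shutil.rmtree(folder)
    return tarball
```

A wrapper script could call `sample_state` for each sample before launching a run and stop early on `"both"`, mirroring the warning-and-exit behaviour described above.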
- When using BWA for read mapping, the command `samtools flagstat` is now run during the `hybpiper assemble` step, rather than during `hybpiper stats`, and the results are written to the `<sample_name>_bam_flagstat.tsv` / `<sample_name>_unpaired_bam_flagstat.tsv` file(s).
  - If the `<sample_name>_bam_flagstat.tsv` / `<sample_name>_unpaired_bam_flagstat.tsv` file(s) are not present in a sample directory (i.e. the sample was assembled with HybPiper version <2.3.0), `samtools flagstat` will be run during `hybpiper stats`. If the sample is a `*.tar.gz` file, the `*.bam` file(s) will first be extracted to a temporary directory called `temp_bam_files` within your current working directory. This temporary directory will be deleted after `samtools flagstat` has been run.
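
As a rough illustration of the fallback for compressed samples, the sketch below extracts only the `*.bam` members of a tarball into `temp_bam_files`, hands each one to a caller-supplied function, and then deletes the directory. It is an assumption-laden sketch rather than HybPiper's implementation: the function name `with_extracted_bams` and the `process_bam` callback are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def with_extracted_bams(tarball_path, process_bam, workdir="."):
    """Extract *.bam members to <workdir>/temp_bam_files, apply process_bam, clean up."""
    temp_dir = Path(workdir) / "temp_bam_files"
    temp_dir.mkdir(exist_ok=True)
    results = {}
    try:
        with tarfile.open(tarball_path, "r:gz") as archive:
            for member in archive.getmembers():
                if member.isfile() and member.name.endswith(".bam"):
                    # Write under a flat file name so archive paths are not trusted
                    target = temp_dir / Path(member.name).name
                    with archive.extractfile(member) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    results[target.name] = process_bam(target)
    finally:
        # The temporary directory is removed even if processing fails
        shutil.rmtree(temp_dir, ignore_errors=True)
    return results
```

In practice `process_bam` might wrap a call such as `subprocess.run(["samtools", "flagstat", str(bam_path)], capture_output=True, text=True)`; that wiring is left out here so the sketch runs without samtools installed.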
- Add option `--not_protein_coding` to `hybpiper assemble`. When this option is provided, sequences matching your target file references will be extracted from SPAdes contigs using BLASTn, rather than Exonerate. This should improve recovery when using a target file with non-protein-coding sequences. Note that this feature is new and might have bugs; please report any issues.
  - Only nucleotide `*.FNA` sequences will be produced (i.e. no amino-acid sequences).
  - Intronerate will not be run; intron and supercontig sequences will not be produced.
  - If BLASTx or DIAMOND is selected for read mapping (i.e. protein vs translated-nucleotide searches), a warning will be displayed and read mapping will switch to BWA.
- Add the following options to control BLASTn searches of SPAdes contigs when option `--not_protein_coding` is used:
  - `--extract_contigs_blast_task`. Task to use for blastn searches (`blastn`, `blastn-short`, `megablast`, `dc-megablast`). Default is `blastn`.
  - `--extract_contigs_blast_evalue`. Expectation value (E) threshold for saving hits. Default is 10.
  - `--extract_contigs_blast_word_size`. Word size for wordfinder algorithm (length of best perfect match).
  - `--extract_contigs_blast_gapopen`. Cost to open a gap.
  - `--extract_contigs_blast_gapextend`. Cost to extend a gap.
  - `--extract_contigs_blast_penalty`. Penalty for a nucleotide mismatch.
  - `--extract_contigs_blast_reward`. Reward for a nucleotide match.
  - `--extract_contigs_blast_perc_identity`. Percent identity.
  - `--extract_contigs_blast_max_target_seqs`. Maximum number of aligned sequences to keep (value of 5 or more is recommended). Default is 500.
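
To show how these options fit together, here is a small Python sketch that assembles an argument list for a non-protein-coding run. The helper `build_assemble_command` and the file names are hypothetical. The flags `-t_dna`, `-r`, `--prefix` and `--bwa` are assumed from general HybPiper usage rather than taken from these notes, so check them against `hybpiper assemble --help` before relying on them.

```python
import shlex


def build_assemble_command(prefix, targetfile, readfiles, blast_options=None):
    """Return an argument list for a non-protein-coding `hybpiper assemble` run."""
    cmd = [
        "hybpiper", "assemble",
        "-t_dna", targetfile,      # assumed flag: nucleotide target file
        "-r", *readfiles,          # assumed flag: read files
        "--prefix", prefix,        # assumed flag: sample name
        "--bwa",                   # assumed flag: map reads with BWA
        "--not_protein_coding",    # from the notes above
    ]
    # Each key is the suffix of an --extract_contigs_blast_* option listed above
    for name, value in (blast_options or {}).items():
        cmd += [f"--extract_contigs_blast_{name}", str(value)]
    return cmd


cmd = build_assemble_command(
    "sample1",
    "targets.fasta",
    ["sample1_R1.fastq", "sample1_R2.fastq"],
    {"task": "megablast", "evalue": 1e-5, "max_target_seqs": 500},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where HybPiper is installed.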
- The final step of the `hybpiper assemble` pipeline has been renamed from `exonerate_contigs` to `extract_contigs` (as either Exonerate or BLASTn can now be used).
- Reorganised grouping of help options when running `hybpiper assemble --help` to improve clarity.
- Changed option `--timeout_assemble` for `hybpiper assemble` to `--timeout_assemble_reads` to match the step name.
- Changed option `--timeout_exonerate_contigs` for `hybpiper assemble` to `--timeout_extract_contigs` to match the step name.
- Changed option `--exonerate_hit_sliding_window_size` for `hybpiper assemble` to `--trim_hit_sliding_window_size`. This option now applies to either Exonerate hits (measured in amino-acids) or BLASTn hits (measured in nucleotides). Defaults are 5 amino-acids (Exonerate; changed from previous default of 3) or 15 nucleotides (BLASTn).
- Changed option `--exonerate_hit_sliding_window_thresh` for `hybpiper assemble` to `--trim_hit_sliding_window_thresh`. This option now applies to either Exonerate hits (measured via amino-acid similarity) or BLASTn hits (measured via nucleotide similarity). Defaults are 75 for amino-acids (Exonerate; changed from previous default of 55) or 65 for nucleotides (BLASTn).
- Fixed a bug in `fix_targetfile.py`: `MAFFT` is now called via `subprocess` rather than `Bio.Align.Applications.MafftCommandline` when checking for best match translations (see issue #156).
- Added a more informative error message if running `hybpiper retrieve_sequences` or `hybpiper paralog_retriever` from HybPiper version >=2.2.0 on sample folders from HybPiper version <2.2.0. This error occurs because the sample folders do not contain a `<prefix>_chimera_check_performed.txt` file (see issue #155).
- When extracting coding sequences from SPAdes contigs using Exonerate, changed the initial Exonerate run to not use the option `--refine full` (see Exonerate docs), unless the option `--exonerate_refine_full` is provided to `hybpiper assemble`. Although the Exonerate option `--refine full` should improve output alignments, in some cases it can result in spurious alignment regions (e.g. an intron/non-coding region being included as an "exon" alignment) that can get incorporated into the HybPiper output sequence.
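
For scripts written against earlier versions, the four option renames listed above can be applied mechanically. The following Python sketch is one possible approach; the names `RENAMED_OPTIONS` and `migrate_assemble_args` are hypothetical, and the mapping contains only the renames stated in these notes.

```python
# Old option name -> new option name, as listed in the 2.3.0 notes
RENAMED_OPTIONS = {
    "--timeout_assemble": "--timeout_assemble_reads",
    "--timeout_exonerate_contigs": "--timeout_extract_contigs",
    "--exonerate_hit_sliding_window_size": "--trim_hit_sliding_window_size",
    "--exonerate_hit_sliding_window_thresh": "--trim_hit_sliding_window_thresh",
}


def migrate_assemble_args(args):
    """Return a copy of args with pre-2.3.0 option names replaced by the 2.3.0 names."""
    migrated = []
    for arg in args:
        # partition also handles the --option=value form; other args pass through unchanged
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_OPTIONS.get(name, name) + sep + value)
    return migrated
```

Renaming alone does not restore earlier behaviour: because the Exonerate defaults changed (window size 3 to 5, threshold 55 to 75), a script that relied on the old defaults would need to pass those values explicitly.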