NeuralPolish

NeuralPolish: a novel Nanopore polishing method based on alignment matrix construction and orthogonal Bi-GRU Networks.

Overview

Here, we developed a novel polishing method, named NeuralPolish, based on alignment matrix construction and orthogonal Bi-GRU networks. In this method, we designed an alignment feature matrix for representing read-to-assembly alignment. Each row of the matrix represents a read, and each column represents the aligned bases at each position of the contig. In the network architecture, a bi-directional GRU network is used to extract the sequence information inside each read by processing the alignment matrix row by row. After that, the feature matrix is processed by another bi-directional GRU network column by column to calculate the probability distribution. Finally, a CTC decoder generates a polished sequence with a greedy algorithm.
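To make the two ends of this pipeline concrete, here is a minimal sketch in plain Python of the alignment matrix layout (rows are reads, columns are contig positions) and of greedy CTC decoding. It is an illustration, not the code in this repository. The input format for reads, the `GAP` and `EMPTY` symbols, the label order in `ALPHABET`, and both function names are assumptions made for the example. The Bi-GRU networks that sit between the matrix and the decoder are not shown.

```python
from typing import List, Sequence, Tuple

GAP = "-"    # assumed symbol: deletion in the read relative to the contig
EMPTY = "."  # assumed symbol: contig position not covered by this read


def build_alignment_matrix(
    contig_len: int, reads: Sequence[Tuple[int, str]]
) -> List[List[str]]:
    """Build a reads-by-positions matrix.

    Each read is (start, aligned), where `aligned` is already projected onto
    contig coordinates (one symbol per contig position, insertions dropped).
    """
    matrix = []
    for start, aligned in reads:
        row = [EMPTY] * contig_len
        for offset, base in enumerate(aligned):
            pos = start + offset
            if 0 <= pos < contig_len:
                row[pos] = base
        matrix.append(row)
    return matrix


BLANK = 0                             # assumed index of the CTC blank label
ALPHABET = ["", "A", "C", "G", "T"]   # assumed label order


def ctc_greedy_decode(probs: Sequence[Sequence[float]]) -> str:
    """Greedy CTC decoding: take the most probable label at each step,
    merge consecutive repeats, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    out = []
    prev = BLANK
    for label in best:
        if label != prev and label != BLANK:
            out.append(ALPHABET[label])
        prev = label
    return "".join(out)


if __name__ == "__main__":
    for row in build_alignment_matrix(5, [(0, "ACG"), (2, "G-T")]):
        print("".join(row))
```

A blank between two identical labels keeps them separate, which is how the decoder can emit homopolymer runs such as "AA".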

Installation

NeuralPolish requires the following tools to be installed.

Dependencies:

pip install pyyaml pysam python-Levenshtein numpy biopython tensorboardX

If the machine has GPUs, install a GPU build of PyTorch (>= 1.4.0):

conda install pytorch=1.4.0

If the machine only has CPUs, install a CPU build of PyTorch (>= 1.4.0):

conda install pytorch-cpu=1.4.0

Install NeuralPolish from the GitHub repository:

git clone https://github.com/huangnengCSU/NeuralPolish.git
cd NeuralPolish
cd neuralpolishextract/ && cmake . && make && cd ..
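After installation, a quick way to confirm that the Python dependencies are visible to the interpreter is to test whether each module can be found. The sketch below is not part of the repository. The helper name `missing_modules` is hypothetical, and the module names are the import names of the packages listed above (for example, `pyyaml` is imported as `yaml` and `biopython` as `Bio`).

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed above.
REQUIRED = ["yaml", "pysam", "Levenshtein", "numpy", "Bio", "tensorboardX", "torch"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch

        print("All dependencies found.")
        print("CUDA available:", torch.cuda.is_available())
```

If CUDA is reported as unavailable, running without the -g option is the likely fallback, since the GPU index defaults to None.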

Usage

bash polish.sh -h
usage: ./polish.sh -i <draft_assembly> -r <basecalled_reads> -o <output_dir> [options ...]
options:
  -i STR   input file in FASTA format, containing draft assembly which will be polished
  -r STR   input file in FASTA/FASTQ format, containing basecalled raw reads used for polishing
  -o STR   output directory of files
  -q INT   skip alignments with mapQ smaller than INT, default: 0
  -Q INT   skip bases with baseQ/BAQ smaller than INT, default: 13
  -t INT   number of threads, default: 16
  -g INT   index of gpu device, default: None
  -c INT   coverage of reads used for polishing, default: 40
  -f STR   neural network config file, default: ./brnn/config/wtdbg2+racon.3x3.yaml
  -m STR   neural network pre-trained file, default: ./brnn/ctc.ecoli_yeast_chr21.wtdbg2+racon.layer3x3.model/model.chkpt

Examples

Here, we use NeuralPolish to polish a draft assembly of E. coli assembled by Flye. The three required parameters are the draft assembly file (-i), the basecalled raw reads (-r), and the output directory (-o). The GPU device index (-g) selects which GPU runs the neural network prediction.

bash polish.sh -i ~/data/ecoli/ecoli_flye/assembly.fasta -r ~/data/ecoli/call.fastq -o polished_ecoli -g 0
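When polishing several assemblies, it can be convenient to assemble the same command from a script. The following is a small sketch, not part of the repository. The function name `build_polish_command` and its keyword arguments are hypothetical, while the flags and their defaults are taken from the usage text above.

```python
from typing import List, Optional


def build_polish_command(
    draft: str,
    reads: str,
    out_dir: str,
    gpu: Optional[int] = None,
    threads: int = 16,
    coverage: int = 40,
    script: str = "polish.sh",
) -> List[str]:
    """Return the argument list for one polish.sh run."""
    cmd = [
        "bash", script,
        "-i", draft,          # draft assembly (FASTA)
        "-r", reads,          # basecalled reads (FASTA/FASTQ)
        "-o", out_dir,        # output directory
        "-t", str(threads),   # number of threads
        "-c", str(coverage),  # read coverage used for polishing
    ]
    if gpu is not None:
        cmd += ["-g", str(gpu)]  # GPU device index; omitted when running on CPU
    return cmd


if __name__ == "__main__":
    print(" ".join(build_polish_command("assembly.fasta", "call.fastq", "polished_ecoli", gpu=0)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Note that `subprocess` does not expand `~`, so paths such as `~/data/ecoli/call.fastq` should go through `os.path.expanduser` first.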
