Create statistic summary of an Oxford Nanopore read dataset

mroosmalen/nanostat

NanoStat

Calculate various statistics from a long-read sequencing dataset in fastq, fasta, bam or albacore sequencing summary format.
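
As a rough illustration of the per-read values such statistics are built from, the Python sketch below reads an optionally gzip-compressed fastq file and derives each read's length and mean basecall quality. It is not NanoStat's own code. The function names are hypothetical. The sketch assumes unwrapped four-line fastq records with Phred+33 quality encoding. It averages quality by converting Phred scores to error probabilities first, which is one common convention and not something this README confirms for NanoStat.

```python
import gzip
import math
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (read_id, read_length, quality_string) for each fastq record.

    Assumes well-formed, unwrapped 4-line records (header, sequence, '+', quality).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. a trailing newline at end of file
        sequence = next(it).rstrip()
        next(it)  # '+' separator line, ignored
        quality = next(it).rstrip()
        read_id = header[1:].split()[0]  # drop '@' and any description after the ID
        yield read_id, len(sequence), quality


def mean_read_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred quality of one read, averaged in error-probability space.

    Each score Q becomes an error probability 10**(-Q/10); the probabilities
    are averaged and the mean is converted back to the Phred scale.
    """
    if not quality:
        raise ValueError("cannot compute the mean quality of an empty read")
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq_file(path: str) -> Iterator[Tuple[str, int, float]]:
    """Yield (read_id, read_length, mean_quality) from a plain or .gz fastq file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for read_id, length, quality in parse_fastq(handle):
            yield read_id, length, mean_read_quality(quality)
```

For example, `list(parse_fastq(["@r1\n", "ACGT\n", "+\n", "IIII\n"]))` returns `[("r1", 4, "IIII")]`. A read with one Q10 base and one Q20 base gets a mean of about 12.6 rather than the arithmetic 15, because the lower-quality base dominates the expected error rate.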

INSTALLATION

pip install nanostat
or
conda install -c bioconda nanostat

USAGE

NanoStat [-h] [-v] [-o OUTDIR] [-p PREFIX] [-n NAME] [-t N]
                [--barcoded] [--readtype {1D,2D,1D2}]
                (--fastq file [file ...] | --fasta file [file ...] | --summary file [file ...] | --bam file [file ...])

Calculate statistics of long read sequencing dataset.

General options:
  -h, --help            show the help and exit
  -v, --version         Print version and exit.
  -o, --outdir OUTDIR   Specify directory in which output has to be created.
  -p, --prefix PREFIX   Specify an optional prefix to be used for the output file.
  -n, --name NAME       Specify a filename/path for the output; stdout is the default.
  -t, --threads N       Set the allowed number of threads to be used by the script.

Input options:
  --barcoded            Use if you want to split the summary file by barcode
  --readtype {1D,2D,1D2}
                        Which read type to extract information about from summary. Options are 1D, 2D,
                        1D2

Input data sources (one of these is required):
  --fastq file [file ...]
                        Data is in one or more (compressed) fastq file(s).
  --fasta file [file ...]
                        Data is in one or more (compressed) fasta file(s).
  --summary file [file ...]
                        Data is in one or more (compressed) summary file(s) generated by albacore.
  --bam file [file ...]
                        Data is in one or more sorted bam file(s).

EXAMPLES:
  NanoStat --fastq reads.fastq.gz --outdir statreports
  NanoStat --summary sequencing_summary1.txt sequencing_summary2.txt sequencing_summary3.txt --readtype 1D2
  NanoStat --bam alignment.bam alignment2.bam
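
The same invocations can be scripted from Python. The sketch below assembles an argument list from the options documented in the usage text above and runs it with `subprocess`. The helper names `build_nanostat_cmd` and `run_nanostat` are hypothetical, and the sketch assumes the `NanoStat` executable is on `PATH`.

```python
import shutil
import subprocess
from typing import List, Optional

INPUT_SOURCES = {"fastq", "fasta", "summary", "bam"}
READ_TYPES = {"1D", "2D", "1D2"}


def build_nanostat_cmd(source: str, files: List[str], outdir: Optional[str] = None,
                       name: Optional[str] = None, threads: Optional[int] = None,
                       readtype: Optional[str] = None) -> List[str]:
    """Assemble a NanoStat command line; exactly one input source is required."""
    if source not in INPUT_SOURCES:
        raise ValueError(f"source must be one of {sorted(INPUT_SOURCES)}")
    if not files:
        raise ValueError("at least one input file is required")
    cmd = ["NanoStat", f"--{source}", *files]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if name is not None:
        cmd += ["--name", name]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if readtype is not None:
        if readtype not in READ_TYPES:
            raise ValueError(f"readtype must be one of {sorted(READ_TYPES)}")
        cmd += ["--readtype", readtype]
    return cmd


def run_nanostat(cmd: List[str]) -> str:
    """Run the command and return whatever it printed to stdout."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `build_nanostat_cmd("fastq", ["reads.fastq.gz"], outdir="statreports")` yields `["NanoStat", "--fastq", "reads.fastq.gz", "--outdir", "statreports"]`. Passing a list to `subprocess.run` avoids shell quoting problems with unusual file names.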

Example output

General summary:	 
Number of reads:	3995
Total bases:	11418359
Median read length:	1221.0
Mean read length:	2858.2
Read length N50:	8676
Active channels:	933
Mean read quality:	10.2
Median read quality:	10.6
Top 5 longest reads and their mean basecall quality score
1:	36928 (10.8, [a9dbd2b5-718c-4d0c-afa8-a12a54a5a12a])
2:	32830 (10.2, [b87fc717-1cf8-4526-9f96-3042fda5b769])
3:	30474 (12.4, [ea3e43d8-6cbf-4687-95bd-66e6123512d4])
4:	27531 (12.5, [74c0e08c-eb94-4825-b93b-21d63e05cf14])
5:	26535 (10.4, [8e6ed505-8477-4462-9f0a-3a72783cbf60])
Top 5 highest mean basecall quality scores and their read lengths
1:	14.8 (1040, [acf6f90b-ea22-4960-8049-6e6e694a3f9a])
2:	14.7 (9603, [ec796da1-5c4a-4350-974b-6dabb8deb546])
3:	14.6 (680, [792c485a-81cb-4ef7-8f23-01f10f9c7c23])
4:	14.5 (2664, [d8092ffb-9919-42fb-ad41-34b1658f1bd5])
5:	14.5 (909, [d55d3bf6-0729-4b46-82cd-0cef00bcf849])
Number and percentage of reads above quality cutoffs
>Q5:	3559 (89.1%)
>Q7:	3429 (85.8%)
>Q10:	2705 (67.7%)
>Q12:	1072 (26.8%)
>Q15:	0 (0.0%)
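
To make the fields of this report concrete, the sketch below recomputes several of them from per-read lengths and per-read mean qualities. It is an illustration under stated assumptions, not NanoStat's implementation. The function names are hypothetical. The dataset-level mean and median read quality are assumed to be plain averages of the per-read values. The cutoff counts assume a strict "greater than", matching the `>Q` labels. N50 is taken as the length of the read at which the running total of read lengths, sorted longest first, reaches half of all bases.

```python
import statistics
from typing import Dict, Sequence, Tuple


def n50(lengths: Sequence[int]) -> int:
    """Length at which the cumulative sum (longest reads first) reaches half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer comparison avoids float rounding
            return length
    raise ValueError("N50 is undefined for an empty dataset")


def reads_above_cutoffs(quals: Sequence[float],
                        cutoffs: Sequence[int] = (5, 7, 10, 12, 15)) -> Dict[int, Tuple[int, float]]:
    """Map each cutoff Q to (reads with mean quality > Q, percentage of all reads)."""
    counts = {}
    for cutoff in cutoffs:
        n = sum(1 for q in quals if q > cutoff)  # assumed strict '>', as in the '>Q' labels
        counts[cutoff] = (n, 100 * n / len(quals))
    return counts


def summarize(lengths: Sequence[int], quals: Sequence[float]) -> Dict[str, float]:
    """General summary fields from per-read lengths and per-read mean qualities."""
    return {
        "Number of reads": len(lengths),
        "Total bases": sum(lengths),
        "Median read length": statistics.median(lengths),
        "Mean read length": round(statistics.mean(lengths), 1),
        "Read length N50": n50(lengths),
        "Mean read quality": round(statistics.mean(quals), 1),  # assumed plain average
        "Median read quality": statistics.median(quals),
    }


def format_report(lengths: Sequence[int], quals: Sequence[float]) -> str:
    """Render a subset of the report as tab-separated 'label:<TAB>value' lines."""
    lines = ["General summary:"]
    lines += [f"{key}:\t{value}" for key, value in summarize(lengths, quals).items()]
    lines.append("Number and percentage of reads above quality cutoffs")
    for cutoff, (n, pct) in reads_above_cutoffs(quals).items():
        lines.append(f">Q{cutoff}:\t{n} ({pct:.1f}%)")
    return "\n".join(lines)
```

Paired with any parser that yields per-read lengths and mean qualities, `print(format_report(lengths, quals))` prints a subset of the fields in the example above in a similar tab-separated layout. It omits active channels and the top-5 lists. Using `>` instead of `>=` only matters for reads whose mean quality lands exactly on a cutoff.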

I welcome all suggestions, bug reports, feature requests and contributions. Please open an issue or a pull request. I usually respond within a day, occasionally within a few days.
