- Maximum coverage (plot_max_coverage)
- Specified Genome Locations (plot_genome_location)
- Centered on motif of interest (plot_motif_centered)
All functions available for single-sample plotting, plus:
- Maximum difference in mean signal level between two samples (plot_max_difference)
- Most significant statistical test between two samples (plot_most_significant)
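The plot_max_difference criterion can be illustrated with a short sketch. It ranks genomic positions by the absolute difference between the two samples' mean signal levels. The input format is hypothetical: per-read signal levels collected into dictionaries keyed by (chromosome, position). This is not nanoraw's implementation.

```python
from statistics import mean

def rank_by_mean_difference(group1, group2, num_regions):
    """Return up to num_regions positions with the largest |mean1 - mean2|.

    group1/group2 map a (chrom, pos) key to a list of per-read signal levels
    (a hypothetical input format, for illustration only).
    """
    shared = set(group1) & set(group2)
    diffs = [
        (abs(mean(group1[key]) - mean(group2[key])), key)
        for key in shared
        if group1[key] and group2[key]  # skip positions with no reads in a sample
    ]
    diffs.sort(reverse=True)  # largest difference first
    return [key for _, key in diffs[:num_regions]]

g1 = {("chr1", 100): [1.0, 1.5], ("chr1", 101): [0.5, 0.5], ("chr1", 102): [2.0]}
g2 = {("chr1", 100): [1.25, 1.25], ("chr1", 101): [1.5, 1.5], ("chr1", 102): [0.0]}
print(rank_by_mean_difference(g1, g2, num_regions=2))  # [('chr1', 102), ('chr1', 101)]
```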
Plot the distribution of statistics around a motif of interest across many genomic locations, along with the most significant example genomic regions, with plot_motif_with_stats.
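Motifs such as AHC contain ambiguous single-letter codes (H stands for A, C or T). One way to locate such motifs in a reference sequence is to translate each code into a regular-expression character class. The sketch below does this using the standard IUPAC nucleotide codes and searches only the + strand. It is an illustration and does not show how nanoraw itself searches for motifs. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes (the NEB single-letter codes follow the same scheme).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Compile an ambiguous DNA motif (e.g. 'AHC') into a regex; unknown codes raise KeyError."""
    return re.compile("".join("[" + IUPAC[base] + "]" for base in motif.upper()))

def find_motif(seq, motif):
    """Return 0-based start positions of (possibly overlapping) motif hits on the + strand."""
    pattern = motif_to_regex(motif)
    # A zero-width lookahead lets overlapping hits be reported.
    return [m.start() for m in re.finditer("(?=" + pattern.pattern + ")", seq.upper())]

print(find_motif("GAACATCAGC", "AHC"))  # [1, 4]
```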
- --fast5-basedirs: One or more directories containing FAST5 files that have been "re-squiggled" (nanoraw genome_resquiggle). This option is required for all genomic plotting commands.
- --fast5-basedirs2: One or more directories containing FAST5 files that have been "re-squiggled" (nanoraw genome_resquiggle). Reads from these directories form group 2. This option is required for plot_max_difference, plot_most_significant and plot_motif_with_stats, and optional for the other commands (to plot a second sample).
- --pdf-filename: Filename to store plots from this command. (Default depends on the command)
- --num-regions: Number of different regions to plot. Each region will be on a separate page of the output PDF, ordered by the selection criterion (if applicable). This option is not valid for plot_genome_location.
- --num-bases: Number of genomic bases to include in a plot. Selection criterion will apply to the central base of a plotted region.
- --obs-per-base-filter: Filter reads for plotting based on thresholds on percentiles (over all bases in a read) of the number of observations assigned to each base. Format thresholds as "percentile:thresh [pctl2:thresh2 ...]". E.g. reads with 99th percentile <200 obs and max <5k obs would be "99:200 100:5000". Default: no filter.
- --overplot-threshold: Number of reads that triggers an alternative plot type instead of raw signal, to avoid overplotting. Default depends on the command.
- --overplot-type: Plot type for regions with higher coverage. Choices: Downsample (default), Boxplot, Quantile, Violin. Examples below.
- --corrected-group: FAST5 group created by the genome_resquiggle script to access/plot. Default: RawGenomeCorrected_000. This is the default slot used by the genome_resquiggle command, so this option will not need to be set unless you would like to access an alternatively re-squiggled slot (e.g. including multiple signal normalizations within the same file).
- --basecall-subgroups: FAST5 subgroup (under Analyses/[corrected-group]) where individual template and/or complement reads are stored. Default: BaseCalled_template. This is the default supplied by ONT and should work for most cases.
- --2d: Input contains 2D reads and both template and complement reads should be plotted. Equivalent to --basecall-subgroups BaseCalled_template BaseCalled_complement.
- --genome-locations: Plot signal at specified genomic locations. Regions will be centered on the specified genomic position. Format locations as "chrm:position [chrm2:position2 ...]". E.g. "chr1:1000 chr21:40000 chrY:5000"
- --motif: DNA motif of interest. Can be composed of any single-letter DNA codes (NEB Single Letter Codes).
- --genome-fasta: FASTA file used to map reads with the genome_resquiggle command. If chromosomes are missing, regions from those chromosomes (or organisms, if multi-species) will not be considered for plotting.
- --test-type: Type of significance test to apply. Choices are: mw_utest (default; Mann-Whitney U-test), ttest.
- --fishers-method-offset: Offset up- and downstream over which to compute combined p-values using Fisher's method. For example, 2 would compute the Fisher's method p-value over a moving window of 5 bases. Default: do not compute Fisher's method p-values (report raw, base-by-base p-values).
- --statistics-filename: Filename to save/load base-by-base signal difference statistics. If the file exists it will be loaded; if it does not exist it will be created to save statistics. Default: don't save/load. Note that --test-type and --fishers-method-offset will be ignored if --statistics-filename is provided and the file exists.
- --minimum-test-reads: Number of reads required from both samples to test for a significant difference in signal level. Note that regions with lower coverage will not have p-values computed. Default: 5
- --q-value-threshold: Determine the number of regions to plot by thresholding FDR-corrected p-values (q-values) instead of using a fixed count. Note that --num-regions will be ignored if this option is set.
- --sequences-filename: Filename to store genomic sequences at selected regions (e.g. for PWM search). Sequences will be stored in FASTA format. Default: None.
- --num-context: Number of bases to plot surrounding motif of interest. Default: 2
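--q-value-threshold selects regions using FDR-corrected p-values. The source does not say which correction nanoraw applies. The sketch below assumes the common Benjamini-Hochberg procedure and counts how many sites pass a given threshold. The helper names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as pvalues."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone and capped at 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues

def num_regions_passing(pvalues, q_threshold):
    """Count sites whose q-value is at or below q_threshold (hypothetical helper)."""
    return sum(q <= q_threshold for q in benjamini_hochberg(pvalues))

pvals = [0.01, 0.04, 0.03, 0.5]
print(num_regions_passing(pvals, 0.05))  # 1
```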
Single sample genome-anchored plotting functions:
nanoraw plot_max_coverage --fast5-basedirs $g1Dir --2d \
    --num-bases 21 --overplot-threshold 1000

nanoraw plot_max_coverage --fast5-basedirs $g1Dir --2d \
    --num-bases 21 --overplot-threshold 1000 \
    --obs-per-base-filter 99:200 100:5000

nanoraw plot_genome_location --fast5-basedirs $g1Dir \
    --genome-locations "S_aureus:2064835" "S_aureus:2064935" \
    --2d --num-bases 21 --overplot-threshold 1000

nanoraw plot_motif_centered --fast5-basedirs $g1Dir --motif AHC \
    --genome-fasta $genomeFn --2d \
    --num-bases 21 --overplot-threshold 1000

nanoraw plot_motif_centered --fast5-basedirs $g1Dir --motif AHC \
    --genome-fasta $genomeFn --2d \
    --num-bases 21 --overplot-threshold 1000 --deepest-coverage
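The second command above passes --obs-per-base-filter 99:200 100:5000. The sketch below shows how such a filter could be evaluated for one read, given that read's number of observations per base. The percentile convention is an assumption: it uses linear interpolation, matching NumPy's default. A read is kept only when every listed percentile is strictly below its threshold. The function names are hypothetical, and an empty read is not handled.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (same convention as numpy's default)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

def parse_filter(spec):
    """Parse e.g. ['99:200', '100:5000'] into [(99.0, 200.0), (100.0, 5000.0)]."""
    return [tuple(float(x) for x in item.split(":")) for item in spec]

def passes_filter(obs_per_base, thresholds):
    """Keep a read only if every requested percentile is below its threshold."""
    return all(percentile(obs_per_base, pct) < thresh for pct, thresh in thresholds)

thresholds = parse_filter(["99:200", "100:5000"])
read_a = [10] * 99 + [150]       # passes both thresholds
read_b = [10] * 98 + [300, 300]  # 99th percentile too high
read_c = [10] * 99 + [6000]      # maximum too high
print([passes_filter(r, thresholds) for r in (read_a, read_b, read_c)])  # [True, False, False]
```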
Multiple-sample genome-anchored plotting functions:
nanoraw plot_max_coverage --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 1000

nanoraw plot_max_coverage --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 1000 \
    --obs-per-base-filter 99:200 100:5000

nanoraw plot_genome_location --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir \
    --genome-locations "S_aureus:2064835" "S_aureus:2064935" \
    --2d --num-bases 21 --overplot-threshold 1000

nanoraw plot_motif_centered --fast5-basedirs $g1Dir --motif AHC \
    --genome-fasta $genomeFn \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 1000 --deepest-coverage
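In the plot_genome_location example, each "chrm:position" location marks the center of a plotted region and --num-bases 21 sets its width. The sketch below parses location strings and computes an inclusive, centered window. Several details are assumptions: the coordinate base (0- vs 1-based), placing the extra base downstream for even widths, and not clamping at chromosome ends. The helper names are hypothetical.

```python
def parse_location(loc):
    """Split 'chrm:position' into (chrm, position); rsplit keeps any ':' in the chromosome name."""
    chrm, pos = loc.rsplit(":", 1)
    return chrm, int(pos)

def centered_window(position, num_bases):
    """Inclusive (start, end) of a num_bases-wide window centered on position.

    For even num_bases the extra base goes downstream (an arbitrary choice here);
    the window is not clamped at chromosome boundaries.
    """
    start = position - (num_bases - 1) // 2
    end = start + num_bases - 1
    return start, end

for loc in ["S_aureus:2064835", "S_aureus:2064935"]:
    chrm, pos = parse_location(loc)
    print(chrm, centered_window(pos, 21))
# S_aureus (2064825, 2064845)
# S_aureus (2064925, 2064945)
```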
Multiple-sample statistical testing genome-anchored plotting functions:
nanoraw plot_max_difference --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 1000

nanoraw plot_most_significant --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 1000

nanoraw plot_motif_with_stats --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --motif AHC --2d \
    --overplot-threshold 1000 --test-type mw_utest \
    --genome-fasta $genomeFn
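The statistical commands can combine per-base p-values with --fishers-method-offset; an offset of 2 gives a 5-base moving window. Fisher's method sets X = -2 * sum(ln p_i). Under the null hypothesis, X follows a chi-square distribution with 2k degrees of freedom for k independent p-values. For even degrees of freedom, its tail probability has a closed form, so the standard library is enough. The sketch below truncates windows at the sequence ends, which is an assumption, and requires p-values in (0, 1]. It is not nanoraw's code.

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: combine k independent p-values (each in (0, 1]) into one p-value.

    X = -2 * sum(ln p) is chi-square with 2k degrees of freedom under the null;
    for even degrees of freedom, P(chi2 > X) = exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

def windowed_fisher(pvalues, offset):
    """Combine p-values over [i - offset, i + offset] at each position.

    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    n = len(pvalues)
    return [
        fisher_combine(pvalues[max(0, i - offset): min(n, i + offset + 1)])
        for i in range(n)
    ]

print(windowed_fisher([0.5, 0.01, 0.02, 0.01, 0.5], offset=2))
```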
Overplotting options:
nanoraw plot_max_coverage --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 20 --overplot-type Downsample \
    --pdf-filename Nanopore_read_coverage.max_coverage.Downsample.pdf

nanoraw plot_max_coverage --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 20 --overplot-type Boxplot \
    --pdf-filename Nanopore_read_coverage.max_coverage.Boxplot.pdf

nanoraw plot_max_coverage --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 20 --overplot-type Quantile \
    --pdf-filename Nanopore_read_coverage.max_coverage.Quantile.pdf

nanoraw plot_max_coverage --fast5-basedirs $g1Dir \
    --fast5-basedirs2 $g2Dir --2d \
    --num-bases 21 --overplot-threshold 20 --overplot-type Violin \
    --pdf-filename Nanopore_read_coverage.max_coverage.Violin.pdf
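The overplotting examples switch plot type once a region's coverage exceeds --overplot-threshold. The sketch below models that decision. Three details are assumptions, not documented nanoraw behavior:

- coverage strictly above the threshold triggers the alternative type;
- Downsample keeps a random subset of exactly overplot_threshold reads;
- Boxplot, Quantile and Violin summarize all reads.

The function name and the "Signal" label are hypothetical.

```python
import random

OVERPLOT_TYPES = {"Downsample", "Boxplot", "Quantile", "Violin"}

def choose_reads_to_plot(read_ids, overplot_threshold, overplot_type="Downsample", seed=None):
    """Return (plot_type, reads) describing how a region would be drawn.

    Coverage at or below the threshold: every read drawn as raw signal ("Signal").
    Above it: Downsample keeps a random subset of overplot_threshold reads (an
    assumed subset size); the summary types use all reads in this sketch.
    """
    if overplot_type not in OVERPLOT_TYPES:
        raise ValueError("unknown overplot type: " + overplot_type)
    reads = list(read_ids)
    if len(reads) <= overplot_threshold:
        return "Signal", reads
    if overplot_type == "Downsample":
        rng = random.Random(seed)  # seedable for reproducible plots
        return "Downsample", rng.sample(reads, overplot_threshold)
    return overplot_type, reads

reads = ["read%d" % i for i in range(50)]
plot_type, chosen = choose_reads_to_plot(reads, 20, seed=1)
print(plot_type, len(chosen))  # Downsample 20
```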