the majority reads were uncorrected after running canu -correct #2109

Closed
eviewan opened this issue Apr 6, 2022 · 7 comments

Comments

@eviewan

eviewan commented Apr 6, 2022

I am trying to assemble human mitochondrial reads (genome length ~16 kb). After running 'canu -correct' on 7519 mitochondrial reads, 7425 reads were uncorrected and only 44 were corrected. This led to 'canu -assemble' failing because too few reads were passed on. I'm surprised that so many reads were filtered out or left uncorrected. What is the reason for this?
Thank you!

The parameters are as follows:

    canu -correct \
         -p chrM -d canu_correct_output \
         genomeSize=16k \
         corMaxEvidenceErate=0.15 \
         correctedErrorRate=0.045 \
         -ccs \
         file.bam 

Please see the log below:

--
--
-- BEGIN CORRECTION
----------------------------------------
-- Starting command on Wed Apr  6 15:28:31 2022 with 147.528 GB free disk space

    cd .
    ./AoU_chrM.seqStore.sh \
    > ./AoU_chrM.seqStore.err 2>&1

-- Finished on Wed Apr  6 15:28:33 2022 (2 seconds) with 147.508 GB free disk space
----------------------------------------
--
-- In sequence store './AoU_chrM.seqStore':
--   Found 298 reads.
--   Found 3203048 bases (200.19 times coverage).
--    Histogram of raw reads:
--    
--    G=3203048                          sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        14512        20       331911  ||       7941-8114            3|---------
--    00020        12124        44       642512  ||       8115-8288            5|---------------
--    00030        11426        72       971178  ||       8289-8462            1|---
--    00040        10958       100      1284885  ||       8463-8636            3|---------
--    00050        10483       130      1605572  ||       8637-8810            9|--------------------------
--    00060        10137       161      1924627  ||       8811-8984            6|------------------
--    00070         9818       193      2243384  ||       8985-9158            6|------------------
--    00080         9489       226      2562497  ||       9159-9332           19|-------------------------------------------------------
--    00090         9217       261      2889608  ||       9333-9506           21|-------------------------------------------------------------
--    00100         7941       297      3203048  ||       9507-9680           16|----------------------------------------------
--    001.000x                 298      3203048  ||       9681-9854           22|---------------------------------------------------------------
--                                               ||       9855-10028          14|-----------------------------------------
--                                               ||      10029-10202          21|-------------------------------------------------------------
--                                               ||      10203-10376           9|--------------------------
--                                               ||      10377-10550          17|-------------------------------------------------
--                                               ||      10551-10724          13|--------------------------------------
--                                               ||      10725-10898          11|--------------------------------
--                                               ||      10899-11072           7|---------------------
--                                               ||      11073-11246          10|-----------------------------
--                                               ||      11247-11420          12|-----------------------------------
--                                               ||      11421-11594          10|-----------------------------
--                                               ||      11595-11768           5|---------------
--                                               ||      11769-11942           5|---------------
--                                               ||      11943-12116           7|---------------------
--                                               ||      12117-12290           5|---------------
--                                               ||      12291-12464           3|---------
--                                               ||      12465-12638           1|---
--                                               ||      12639-12812           3|---------
--                                               ||      12813-12986           2|------
--                                               ||      12987-13160           4|------------
--                                               ||      13161-13334           1|---
--                                               ||      13335-13508           3|---------
--                                               ||      13509-13682           0|
--                                               ||      13683-13856           0|
--                                               ||      13857-14030           1|---
--                                               ||      14031-14204           0|
--                                               ||      14205-14378           1|---
--                                               ||      14379-14552           2|------
--                                               ||      14553-14726           2|------
--                                               ||      14727-14900           0|
--                                               ||      14901-15074           1|---
--                                               ||      15075-15248           1|---
--                                               ||      15249-15422           1|---
--                                               ||      15423-15596           1|---
--                                               ||      15597-15770           2|------
--                                               ||      15771-15944           2|------
--                                               ||      15945-16118           1|---
--                                               ||      16119-16292           0|
--                                               ||      16293-16466           5|---------------
--                                               ||      16467-16640           4|------------
--    
----------------------------------------
-- Starting command on Wed Apr  6 15:28:33 2022 with 147.507 GB free disk space

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Wed Apr  6 15:28:33 2022 (like a bat out of hell) with 147.507 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--        01  0.01 GB       2
--
--  For 298 reads with 3203048 bases, limit to 1 batch.
--  Will count kmers using 01 jobs, each using 2 GB and 4 threads.
--
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Wed Apr  6 15:28:33 2022 with 147.507 GB free disk space (1 processes; 7 concurrently)

    cd correction/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

-- Finished on Wed Apr  6 15:28:34 2022 (one second) with 147.507 GB free disk space
----------------------------------------
-- Found 1 Kmer counting (meryl) outputs.
-- Finished stage 'cor-merylCountCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Wed Apr  6 15:28:34 2022 with 147.507 GB free disk space (1 processes; 7 concurrently)

    cd correction/0-mercounts
    ./meryl-process.sh 1 > ./meryl-process.000001.out 2>&1

-- Finished on Wed Apr  6 15:28:35 2022 (one second) with 147.507 GB free disk space
----------------------------------------
-- Meryl finished successfully.  Kmer frequency histogram:
--
-- WARNING: gnuplot failed.
--
----------------------------------------
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2      6452 ***********************************************************            0.2369 0.0041
--       3-     4      2951 ***************************                                            0.3126 0.0060
--       5-     7       840 *******                                                                0.3601 0.0078
--       8-    11       245 **                                                                     0.3813 0.0090
--      12-    16       114 *                                                                      0.3864 0.0095
--      17-    22        25                                                                        0.3897 0.0099
--      23-    29        12                                                                        0.3904 0.0100
--      30-    37        10                                                                        0.3908 0.0101
--      38-    46         4                                                                        0.3912 0.0103
--      47-    56         8                                                                        0.3912 0.0103
--      57-    67         0                                                                        0.0000 0.0000
--      68-    79         3                                                                        0.3915 0.0104
--      80-    92         2                                                                        0.3916 0.0105
--      93-   106         0                                                                        0.0000 0.0000
--     107-   121         3                                                                        0.3917 0.0106
--     122-   137         3                                                                        0.3918 0.0107
--     138-   154        23                                                                        0.3920 0.0109
--     155-   172      2190 ********************                                                   0.3928 0.0119
--     173-   191      7625 ********************************************************************** 0.4905 0.1549
--     192-   211      5269 ************************************************                       0.7748 0.6042
--     212-   232      1452 *************                                                          0.9544 0.9152
--
--           0 (max occurrences)
--     3159060 (total mers, non-unique)
--       27231 (distinct mers, non-unique)
--           0 (unique mers)
-- Finished stage 'meryl-process', reset canuIteration.
--
-- Removing meryl database 'correction/0-mercounts/AoU_chrM.ms16'.
--
-- OVERLAPPER (mhap) (correction)
--
-- Set corMhapSensitivity=low based on read coverage of 200.19.
--
-- PARAMETERS: hashes=256, minMatches=3, threshold=0.8
--
-- Given 5.4 GB, can fit 16200 reads per block.
-- For 2 blocks, set stride to 2 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
-- Configured 1 mhap precompute jobs.
-- Configured 1 mhap overlap jobs.
-- Finished stage 'cor-mhapConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Wed Apr  6 15:28:35 2022 with 147.507 GB free disk space (1 processes; 2 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1

-- Finished on Wed Apr  6 15:28:39 2022 (4 seconds) with 147.502 GB free disk space
----------------------------------------
-- All 1 mhap precompute jobs finished successfully.
-- Finished stage 'cor-mhapPrecomputeCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Wed Apr  6 15:28:39 2022 with 147.502 GB free disk space (1 processes; 2 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 1 > ./mhap.000001.out 2>&1

-- Finished on Wed Apr  6 15:28:41 2022 (2 seconds) with 147.502 GB free disk space
----------------------------------------
-- Found 1 mhap overlap output files.
-- Finished stage 'cor-mhapCheck', reset canuIteration.
----------------------------------------
-- Starting command on Wed Apr  6 15:28:41 2022 with 147.502 GB free disk space

    cd correction
    /canu-2.2/bin/ovStoreConfig \
     -S ../AoU_chrM.seqStore \
     -M 4-8 \
     -L ./1-overlapper/ovljob.files \
     -create ./AoU_chrM.ovlStore.config \
     > ./AoU_chrM.ovlStore.config.txt \
    2> ./AoU_chrM.ovlStore.config.err

-- Finished on Wed Apr  6 15:28:41 2022 (in the blink of an eye) with 147.502 GB free disk space
----------------------------------------
--
-- Creating overlap store correction/AoU_chrM.ovlStore using:
--      1 bucket
--      2 slices
--        using at most 1 GB memory each
-- Finished stage 'cor-overlapStoreConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Wed Apr  6 15:28:41 2022 with 147.502 GB free disk space (1 processes; 7 concurrently)

    cd correction/AoU_chrM.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Wed Apr  6 15:28:42 2022 (one second) with 147.5 GB free disk space
----------------------------------------
-- Overlap store bucketizer finished.
-- Finished stage 'cor-overlapStoreBucketizerCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovS' concurrent execution on Wed Apr  6 15:28:42 2022 with 147.5 GB free disk space (2 processes; 3 concurrently)

    cd correction/AoU_chrM.ovlStore.BUILDING
    ./scripts/2-sort.sh 1 > ./logs/2-sort.000001.out 2>&1
    ./scripts/2-sort.sh 2 > ./logs/2-sort.000002.out 2>&1

-- Finished on Wed Apr  6 15:28:42 2022 (lickety-split) with 147.499 GB free disk space
----------------------------------------
-- Overlap store sorter finished.
-- Finished stage 'cor-overlapStoreSorterCheck', reset canuIteration.
----------------------------------------
-- Starting command on Wed Apr  6 15:28:42 2022 with 147.499 GB free disk space

    cd correction
    /canu-2.2/bin/ovStoreIndexer \
      -O  ./AoU_chrM.ovlStore.BUILDING \
      -S ../AoU_chrM.seqStore \
      -C  ./AoU_chrM.ovlStore.config \
      -delete \
    > ./AoU_chrM.ovlStore.BUILDING.index.err 2>&1

-- Finished on Wed Apr  6 15:28:42 2022 (fast as lightning) with 147.499 GB free disk space
----------------------------------------
-- Overlap store indexer finished.
-- Checking store.
----------------------------------------
-- Starting command on Wed Apr  6 15:28:42 2022 with 147.499 GB free disk space

    cd correction
    /canu-2.2/bin/ovStoreDump \
     -S ../AoU_chrM.seqStore \
     -O  ./AoU_chrM.ovlStore \
     -counts \
     > ./AoU_chrM.ovlStore/counts.dat 2> ./AoU_chrM.ovlStore/counts.err

-- Finished on Wed Apr  6 15:28:42 2022 (furiously fast) with 147.499 GB free disk space
----------------------------------------
--
-- Overlap store 'correction/AoU_chrM.ovlStore' successfully constructed.
-- Found 88472 overlaps for 298 reads; 7173 reads have no overlaps.
--
--
-- Purged 0.005 GB in 3 overlap output files.
-- Finished stage 'cor-createOverlapStore', reset canuIteration.
-- Set corMinCoverage=4 based on read coverage of 200.19.
-- Computing correction layouts.
--   Local  filter coverage   80
--   Global filter coverage   40
----------------------------------------
-- Starting command on Wed Apr  6 15:28:42 2022 with 147.505 GB free disk space

    cd correction
    /canu-2.2/bin/generateCorrectionLayouts \
      -S ../AoU_chrM.seqStore \
      -O  ./AoU_chrM.ovlStore \
      -C  ./AoU_chrM.corStore.WORKING \
      -eE 0.15 \
      -eC 80 \
      -xC 40 \
    > ./AoU_chrM.corStore.err 2>&1

-- Finished on Wed Apr  6 15:28:42 2022 (in the blink of an eye) with 147.504 GB free disk space
----------------------------------------
-- Finished stage 'cor-buildCorrectionLayoutsConfigure', reset canuIteration.
-- Computing correction layouts.
----------------------------------------
-- Starting command on Wed Apr  6 15:28:42 2022 with 147.504 GB free disk space

    cd correction/2-correction
    /canu-2.2/bin/filterCorrectionLayouts \
      -S  ../../AoU_chrM.seqStore \
      -C     ../AoU_chrM.corStore \
      -R      ./AoU_chrM.readsToCorrect.WORKING \
      -cc 4 \
      -cl 1000 \
      -g  16000 \
      -c  40 \
    > ./AoU_chrM.readsToCorrect.err 2>&1

-- Finished on Wed Apr  6 15:28:42 2022 (like a bat out of hell) with 147.504 GB free disk space
----------------------------------------
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads                282          7189
--   Number of Bases            3066432        136616
--   Coverage                   191.652         8.539
--   Median                       10382             0
--   Mean                         10873            19
--   N50                          10577          8563
--   Minimum                       8277             0
--   Maximum                      16615          9539
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads                298             46            46              0             0
--   Number of Bases            3203048         650094        649514              0             0
--   Coverage                   200.190         40.631        40.595          0.000         0.000
--   Median                       10250          13866         13838              0             0
--   Mean                         10748          14132         14119              0             0
--   N50                          10510          14512         14498              0             0
--   Minimum                       7941          12078         12077              0             0
--   Maximum                      16615          16556         16551              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads               7425          7425
--   Number of Bases            2552954       2255633
--   Coverage                   159.560       140.977
--   Median                           0             0
--   Mean                           343           303
--   N50                          10137         10240
--   Minimum                          0             0
--   Maximum                      16615         16606
--   
@skoren
Member

skoren commented Apr 6, 2022

The default is to correct only the longest 40x of data and, in your case, 44 reads are > 40x so no more are corrected. If you want to correct more, you can use corOutCoverage. However, the assembly shouldn't complain about too few reads if you have 40x so can you post the output from your assembly run?

Last, I wanted to check what your data input is. The command doesn't look correct, as Canu supports neither a -ccs parameter nor BAM files as input.
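
To make the 40x default concrete: the selection can be pictured as taking reads from longest to shortest until the running base total reaches the target coverage times the genome size (40 × 16,000 = 640,000 bases here). The sketch below is a simplified illustration of that idea, not Canu's actual implementation (Canu works with expected corrected lengths), and the function name `select_longest` is made up for this example.

```shell
# Simplified illustration only: pick reads longest-first until the
# running total reaches coverage * genome size. Not Canu's real code.
# Usage: select_longest GENOME_SIZE COVERAGE < one_read_length_per_line
select_longest() {
    target=$(( $1 * $2 ))
    sort -rn | awk -v target="$target" '
        total < target { total += $1; n++ }   # keep taking reads until the target is met
        END { printf "%d reads, %d bases\n", n, total }'
}

# Toy input: five read lengths, genome size 10, target coverage 2x (20 bases).
printf '%s\n' 7 10 6 9 8 | select_longest 10 2
```

On the toy input this prints `3 reads, 27 bases`. With the numbers from the log, a 640,000-base target is consistent with the 46 reads (650,094 raw bases, about 40.6x) reported in the "corrected" column.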

@eviewan
Author

eviewan commented Apr 7, 2022

Thank you for the clarifications! I have confirmed that my input is a fastq file and that the pipeline was run using -pacbio (I didn't copy-paste correctly, my apologies). The assembly run generated no output and failed with the error "Abort: partitioning failed; increase redMemory".
Could you please clarify what exactly "original raw reads w/overlaps" and "original raw reads w/o overlaps" mean? In the "original raw reads w/o overlaps" column, I can't make sense of the values (coverage, median, mean, etc.). They don't seem to match the distribution shown in the length range histogram either. Please see below the exact values from the log:

--   Found 298 reads.
--   Found 3203048 bases (200.19 times coverage).
--    Histogram of raw reads:
--    
--    G=3203048                          sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        14512        20       331911  ||       7941-8114            3|---------
--    00020        12124        44       642512  ||       8115-8288            5|---------------
--    00030        11426        72       971178  ||       8289-8462            1|---
--    00040        10958       100      1284885  ||       8463-8636            3|---------
--    00050        10483       130      1605572  ||       8637-8810            9|--------------------------
--    00060        10137       161      1924627  ||       8811-8984            6|------------------
--    00070         9818       193      2243384  ||       8985-9158            6|------------------
--    00080         9489       226      2562497  ||       9159-9332           19|-------------------------------------------------------
--    00090         9217       261      2889608  ||       9333-9506           21|-------------------------------------------------------------
--    00100         7941       297      3203048  ||       9507-9680           16|----------------------------------------------
--    001.000x                 298      3203048  ||       9681-9854           22|---------------------------------------------------------------
--                                               ||       9855-10028          14|-----------------------------------------
--                                               ||      10029-10202          21|-------------------------------------------------------------
--                                               ||      10203-10376           9|--------------------------
--                                               ||      10377-10550          17|-------------------------------------------------
--                                               ||      10551-10724          13|--------------------------------------
--                                               ||      10725-10898          11|--------------------------------
--                                               ||      10899-11072           7|---------------------
--                                               ||      11073-11246          10|-----------------------------
--                                               ||      11247-11420          12|-----------------------------------
--                                               ||      11421-11594          10|-----------------------------
--                                               ||      11595-11768           5|---------------
--                                               ||      11769-11942           5|---------------
--                                               ||      11943-12116           7|---------------------
--                                               ||      12117-12290           5|---------------
--                                               ||      12291-12464           3|---------
--                                               ||      12465-12638           1|---
--                                               ||      12639-12812           3|---------
--                                               ||      12813-12986           2|------
--                                               ||      12987-13160           4|------------
--                                               ||      13161-13334           1|---
--                                               ||      13335-13508           3|---------
--                                               ||      13509-13682           0|
--                                               ||      13683-13856           0|
--                                               ||      13857-14030           1|---
--                                               ||      14031-14204           0|
--                                               ||      14205-14378           1|---
--                                               ||      14379-14552           2|------
--                                               ||      14553-14726           2|------
--                                               ||      14727-14900           0|
--                                               ||      14901-15074           1|---
--                                               ||      15075-15248           1|---
--                                               ||      15249-15422           1|---
--                                               ||      15423-15596           1|---
--                                               ||      15597-15770           2|------
--                                               ||      15771-15944           2|------
--                                               ||      15945-16118           1|---
--                                               ||      16119-16292           0|
--                                               ||      16293-16466           5|---------------
--                                               ||      16467-16640           4|------------
--    


-- Finished on Wed Apr  6 15:28:42 2022 (like a bat out of hell) with 147.504 GB free disk space
----------------------------------------
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads                282          7189
--   Number of Bases            3066432        136616
--   Coverage                   191.652         8.539
--   Median                       10382             0
--   Mean                         10873            19
--   N50                          10577          8563
--   Minimum                       8277             0
--   Maximum                      16615          9539
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads                298             46            46              0             0
--   Number of Bases            3203048         650094        649514              0             0
--   Coverage                   200.190         40.631        40.595          0.000         0.000
--   Median                       10250          13866         13838              0             0
--   Mean                         10748          14132         14119              0             0
--   N50                          10510          14512         14498              0             0
--   Minimum                       7941          12078         12077              0             0
--   Maximum                      16615          16556         16551              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads               7425          7425
--   Number of Bases            2552954       2255633
--   Coverage                   159.560       140.977
--   Median                           0             0
--   Mean                           343           303
--   N50                          10137         10240
--   Minimum                          0             0
--   Maximum                      16615         16606
--   
--   Maximum Memory           697499276

@skoren
Member

skoren commented Apr 8, 2022

That error message, "Abort: partitioning failed; increase redMemory", is not what you originally reported and is not due to having too few reads. You're hitting #2035, which occurs with fewer than 100 reads. You can edit the code as suggested there or install from conda, which has a bug-fix patch. Alternatively, you could add corOutCoverage=200, which should give you more than 100 reads to assemble and would avoid the bug.

The categories mean what the names imply: reads with overlaps are those that have overlaps to other reads, and reads w/o overlaps are those that don't. They shouldn't match the input read distribution exactly, as the reads w/o overlaps are typically short or noisy reads and are a small subset of your total data, which is why they have lower coverage and shorter length.
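
As a sketch of the corOutCoverage=200 suggestion, the snippet below assembles the correction command from the original post, with -pacbio as later clarified in this thread, and prints it for review instead of running it. The input name `reads.fastq` is a placeholder, not the actual file from this issue.

```shell
# Hypothetical input file name; replace with your own FASTQ.
reads="reads.fastq"

# Same options as the original post, with -pacbio (as clarified) and
# corOutCoverage=200 added so that more than the longest 40x is corrected.
cmd="canu -correct -p chrM -d canu_correct_output genomeSize=16k corMaxEvidenceErate=0.15 correctedErrorRate=0.045 corOutCoverage=200 -pacbio $reads"

# Print the command for review; run it with: eval "$cmd"
echo "$cmd"
```

For a 16 kb genome, 200x is 3.2 Mb, which is roughly the whole dataset in the log (3,203,048 bases), so this setting effectively asks Canu to correct every read that can be corrected.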

@eviewan
Author

eviewan commented Apr 12, 2022

Thank you for your comment.
canu -correct worked after adding the corOutCoverage parameter. However, I am still confused by the stats for "original raw reads w/o overlaps". Below are those stats copied from the canu -correct log, along with a histogram of the read-length distribution from the fastq file of original reads fed into canu -correct. As the histogram shows, we filtered out reads shorter than 7000. I'm not sure why, in the canu -correct stats, the median and mean are 0 and 19. Given the distribution of the original reads, how should I interpret the canu -correct stats?

----------------------------------------
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads                282          7189
--   Number of Bases            3066432        136616
--   Coverage                   191.652         8.539
--   Median                       10382             0
--   Mean                         10873            19
--   N50                          10577          8563
--   Minimum                       8277             0
--   Maximum                      16615          9539



ORIGINAL READS FED INTO CANU
# TOTAL: 7471 - MEAN: 10806.790
# MIN: 7065.0 - MAX: 17348.0
# SD: 1950.5463357009942 - Variance 3804631.007716575
 7065.0000 -  7270.6600 [ 12]: #
 7270.6600 -  7476.3200 [ 10]: 
 7476.3200 -  7681.9800 [ 14]: #
 7681.9800 -  7887.6400 [ 24]: ##
 7887.6400 -  8093.3000 [ 46]: ####
 8093.3000 -  8298.9600 [ 63]: #####
 8298.9600 -  8504.6200 [106]: #########
 8504.6200 -  8710.2800 [135]: ############
 8710.2800 -  8915.9400 [200]: ##################
 8915.9400 -  9121.6000 [297]: ##########################
 9121.6000 -  9327.2600 [380]: ##################################
 9327.2600 -  9532.9200 [462]: #########################################
 9532.9200 -  9738.5800 [543]: #################################################
 9738.5800 -  9944.2400 [526]: ###############################################
 9944.2400 - 10149.9000 [522]: ###############################################
10149.9000 - 10355.5600 [470]: ##########################################
10355.5600 - 10561.2200 [452]: ########################################
10561.2200 - 10766.8800 [391]: ###################################
10766.8800 - 10972.5400 [415]: #####################################
10972.5400 - 11178.2000 [304]: ###########################
11178.2000 - 11383.8600 [283]: #########################
11383.8600 - 11589.5200 [237]: #####################
11589.5200 - 11795.1800 [197]: #################
11795.1800 - 12000.8400 [127]: ###########
12000.8400 - 12206.5000 [141]: ############
12206.5000 - 12412.1600 [ 75]: ######
12412.1600 - 12617.8200 [ 81]: #######
12617.8200 - 12823.4800 [ 55]: ####
12823.4800 - 13029.1400 [ 57]: #####
13029.1400 - 13234.8000 [ 48]: ####
13234.8000 - 13440.4600 [ 37]: ###
13440.4600 - 13646.1200 [ 38]: ###
13646.1200 - 13851.7800 [ 31]: ##
13851.7800 - 14057.4400 [ 33]: ##
14057.4400 - 14263.1000 [ 36]: ###
14263.1000 - 14468.7600 [ 37]: ###
14468.7600 - 14674.4200 [ 25]: ##
14674.4200 - 14880.0800 [ 33]: ##
14880.0800 - 15085.7400 [ 28]: ##
15085.7400 - 15291.4000 [ 28]: ##
15291.4000 - 15497.0600 [ 27]: ##
15497.0600 - 15702.7200 [ 29]: ##
15702.7200 - 15908.3800 [ 11]: 
15908.3800 - 16114.0400 [ 34]: ###
16114.0400 - 16319.7000 [ 43]: ###
16319.7000 - 16525.3600 [ 92]: ########
16525.3600 - 16731.0200 [192]: #################
16731.0200 - 16936.6800 [ 35]: ###
16936.6800 - 17142.3400 [  8]: 
17142.3400 - 17348.0000 [  1]: 

@skoren
Member

skoren commented Apr 12, 2022

Those correction stats use the estimated corrected read length. The corrected read length is the number of bases that have sufficient support from other reads. Since those reads have no overlaps (or, more accurately, have insufficient overlaps to be corrected), they have very short corrected read estimates.
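
One way to picture an "estimated corrected read length" is with a toy model: treat it as the longest stretch of a read whose per-base evidence depth meets a minimum (the log shows corMinCoverage=4). This is an assumption made purely for illustration and not Canu's algorithm; the function name `expected_length` is hypothetical.

```shell
# Toy model (an assumption for illustration, not Canu's algorithm):
# the "expected corrected length" of a read is taken to be the longest
# run of bases whose evidence depth is at least MIN_COV.
# Usage: expected_length MIN_COV < one_depth_value_per_base
expected_length() {
    awk -v min="$1" '
        $1 >= min { run++; if (run > best) best = run; next }
        { run = 0 }
        END { print best + 0 }'
}

# A 10-base read supported by 4+ reads only in the middle six bases.
printf '%s\n' 0 1 5 6 7 8 5 4 2 0 | expected_length 4
# A read with no overlaps at all has depth 0 everywhere.
printf '%s\n' 0 0 0 0 0 | expected_length 4
```

The first call prints 6 and the second prints 0. Under this model a read with no supporting overlaps gets an estimate of 0 however long the raw read is, which is the pattern in the "w/o overlaps" column.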

@SHuang-Broad
Contributor

Hi Sergey,

I'm not fully understanding the answer regarding the histogram.

So there are > 7k reads that passed the initial filters (one of them is >=7kbp, a semi-arbitrary number).
Given that the mitochondrial DNA is only 16kbp long, we'd expect most of the 7k "long-enough" reads to significantly overlap with each other, and the attached IGV screenshot indicates that's the case.

Are we misunderstanding something?

Thanks!
Steve
[Image: igv_snapshot]

@skoren
Member

skoren commented Apr 12, 2022

They should, but any dataset has some noisy reads and sequencing artifacts. You had 298 reads as input; of these, 282 look like the above and are estimated to be at least 8 kb after correction. The other 16 reads are split up into >7k pieces, of which the shortest is 0 bp and the mean is 19 bp. Essentially, there are 16 bad reads in the input. The corrected reads keep their IDs, so you can pull up the input reads (in the seqStore) that were not corrected and try to align them to a mitochondrial genome. I expect they will not align well, or at all.
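
The figures in this explanation can be re-derived from the table in the log. The snippet below is only a worked check of that arithmetic; the variable names are chosen for this example, and the values are copied from the log above.

```shell
# Values copied from the filterCorrectionLayouts table in the log.
genome=16000
evidence_reads=298
with_reads=282;   with_bases=3066432
wo_pieces=7189;   wo_bases=136616

# Reads that did not end up in the "w/overlaps" column.
bad_reads=$(( evidence_reads - with_reads ))
# Mean length of the pieces in the "w/o overlaps" column.
wo_mean=$(awk -v b="$wo_bases" -v n="$wo_pieces" 'BEGIN { printf "%.1f", b / n }')
# Coverage of the "w/overlaps" column relative to genomeSize=16k.
with_cov=$(awk -v b="$with_bases" -v g="$genome" 'BEGIN { printf "%.3f", b / g }')

echo "reads without usable overlaps: $bad_reads"
echo "mean piece length (w/o overlaps): $wo_mean bp"
echo "coverage (w/ overlaps): ${with_cov}x"
```

This gives 16 reads, a mean of about 19 bp per piece, and 191.652x, matching the table: the 7189 entries are short fragments of a few reads, not 7189 full-length reads.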

@skoren skoren closed this as completed Apr 15, 2022