Does it work on single-end reads? #4

naikai · 2016-03-14T22:27:15Z

Hi,

I was trying to run this tool on bam files generated from both SE and PE reads.
PE reads seem to work fine, but I am having troubles with SE reads:

2016-03-14 18:24:08     START   test
2016-03-14 18:24:08     Collating 34 alignment_summary_metrics files
2016-03-14 18:24:09     Collating 34 quality_distribution_metrics files
2016-03-14 18:24:10     Collating 34 rnaseq_metrics files (summary)
2016-03-14 18:24:11     Collating 34 rnaseq_metrics files (coverage)
2016-03-14 18:24:11     Collating 34 gc_bias_metrics files
2016-03-14 18:24:12     Collating 34 gc_bias_histogram files
2016-03-14 18:24:13     Collating 34 duplicate_metrics files
2016-03-14 18:24:14     Collating 34 base_distribution_by_cycle files
2016-03-14 18:24:15     Collating 34 library_complexity files
2016-03-14 18:24:15     Collating 34 library_complexity files (histogram)
2016-03-14 18:24:16     Collating 34 mapq_stats files
2016-03-14 18:24:17     Joining all files into 'test-all-metrics.tsv'
Warning messages:
1: In read_tsv(sprintf("%s-insert-size-metrics.tsv", prefix)) :
  File does not exist: test-insert-size-metrics.tsv
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'test-library-complexity.tsv'
2016-03-14 18:24:17     DONE    test

I guess it has to do with collecting information about the insert-size from PE.
Is it possible to add an option to disable this part but still include all the other metrics in the pipeline?
Thanks for your help.

The text was updated successfully, but these errors were encountered:

slowkow · 2016-03-15T00:51:03Z

Thanks for opening this issue and providing your output. I believe picardmetrics will work with SE data.

Could I ask you to please check if picardmetrics created the test-all-metrics.tsv file? Does it look ok? I am guessing that it completed successfully, because you have no errors. The 2 warnings can probably be safely ignored.

Regarding the warnings...

Indeed, SE reads don't have insert size metrics, so there is a warning printed about the missing file.
The warning incomplete final line found indicates that the last line of test-library-complexity.tsv did not end with a newline character \n. This is probably ok.

naikai · 2016-03-15T03:13:14Z

Thanks for the quick reply.

After running picardmetrics, it created the following six files

test-all-metrics.tsv
test-base-distribution-by-cycle-histogram.tsv
test-gc-bias-histogram.tsv
test-library-complexity-histogram.tv
test-quality-histogram.tv
test-rnaseq-coverage.tsv

However, there's nothing in the test-library-complexity-histogram.tsv file besides the header:

SAMPLE

Same thing in test-all-metrics.tsv:

SAMPLE  CATEGORY        TOTAL_READS     PF_READS        PCT_PF_READS    PF_NOISE_READS  PF_READS_ALIGNED        PCT_PF_READS_ALIGNED    PF_ALIGNED_BASES        PF_HQ_ALIGNED_READS     PF_HQ_ALIGNED_BASES     PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE        PF_HQ_ERROR_RATE        PF_INDEL_RATE   MEAN_READ_LENGTH
        READS_ALIGNED_IN_PAIRS  PCT_READS_ALIGNED_IN_PAIRS      BAD_CYCLES      STRAND_BALANCE  PCT_CHIMERAS    PCT_ADAPTER     PF_BASES        PF_ALIGNED_BASES_rnaseq_metrics RIBOSOMAL_BASES CODING_BASES    UTR_BASES       INTRONIC_BASES  INTERGENIC_BASES        IGNORED_READS   CORRECT_STRAND_READS    INCORRECT_STRAND_READS  PCT_RIBOSOMAL_BASES     PCT_CODING_BASES        PCT_UTR_BASES   PCT_INTRONIC_BASES      PCT_INTERGENIC_BASES    PCT_MRNA_BASES  PCT_USABLE_BASES        PCT_CORRECT_STRAND_READS        MEDIAN_CV_COVERAGE      MEDIAN_5PRIME_BIAS      MEDIAN_3PRIME_BIAS      MEDIAN_5PRIME_TO_3PRIME_BIAS    LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED     UNMAPPED_READS  UNPAIRED_READ_DUPLICATES        READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION     ESTIMATED_LIBRARY_SIZE  WINDOW_SIZE     TOTAL_CLUSTERS
  ALIGNED_READS   AT_DROPOUT      GC_DROPOUT      X_library_complexity    MAPQ_Mean       MAPQ_Median     MAPQ_SD MAPQ_Mode       MAPQ_ModePercent

HTH

slowkow · 2016-03-15T11:00:36Z

Please share more information. What's in the other files? What's in the log (example)? What version of Picard are you using?

Could I ask you to please share all of your output files? See here for an example of all the files I'm requesting from you.

Alternatively, please share some of your input files so I can run it myself and see what is happening. You can email me if you don't want to post the files publically.

aldojongejan · 2017-02-16T10:47:20Z

Hi,

I ran into the same problem and solved it by checking in the code whether '.insert-size-metrics.tsv' is actually present (as you do for the rna_metrics): so around line 895 (in the 'read_metrics' function:

  is_insert_size <- file.exists(sprintf("%s-insert-size-metrics.tsv", prefix))
  if (is_insert_size) {
    dat_insert_size_metrics = read_tsv(
      sprintf("%s-insert-size-metrics.tsv", prefix)
    )
  }

And added extra clauses further on:

  # Check whether there is any information on insert size
  if (is_insert_size && !is.null(dat_insert_size_metrics)) {
    stopifnot( all(dat_align_metrics$SAMPLE == dat_insert_size_metrics$SAMPLE) )
    dat_align_metrics <- merge(dat_align_metrics, dat_insert_size_metrics, by = "SAMPLE")
  }

  # Check whether dat_library_complexity contains any data...
  if (!is.null(dat_library_complexity) && dim(dat_library_complexity)[1] > 0) {
    # duplicate_metrics and library_complexity share columns
    cnames <- colnames(dat_library_complexity)
    cnames[2:length(cnames)] <- paste(
      cnames[2:length(cnames)], "_library_complexity",
      sep = ""
    )

Hope this helps,

Aldo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does it work on single-end reads? #4

Does it work on single-end reads? #4

naikai commented Mar 14, 2016

slowkow commented Mar 15, 2016

naikai commented Mar 15, 2016

slowkow commented Mar 15, 2016

aldojongejan commented Feb 16, 2017 •

edited by slowkow

Loading

Does it work on single-end reads? #4

Does it work on single-end reads? #4

Comments

naikai commented Mar 14, 2016

slowkow commented Mar 15, 2016

naikai commented Mar 15, 2016

slowkow commented Mar 15, 2016

aldojongejan commented Feb 16, 2017 • edited by slowkow Loading

aldojongejan commented Feb 16, 2017 •

edited by slowkow

Loading