Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does it work on single-end reads? #4

Open
naikai opened this issue Mar 14, 2016 · 4 comments
Open

Does it work on single-end reads? #4

naikai opened this issue Mar 14, 2016 · 4 comments

Comments

@naikai
Copy link

naikai commented Mar 14, 2016

Hi,

I was trying to run this tool on bam files generated from both SE and PE reads.
PE reads seem to work fine, but I am having troubles with SE reads:

2016-03-14 18:24:08     START   test
2016-03-14 18:24:08     Collating 34 alignment_summary_metrics files
2016-03-14 18:24:09     Collating 34 quality_distribution_metrics files
2016-03-14 18:24:10     Collating 34 rnaseq_metrics files (summary)
2016-03-14 18:24:11     Collating 34 rnaseq_metrics files (coverage)
2016-03-14 18:24:11     Collating 34 gc_bias_metrics files
2016-03-14 18:24:12     Collating 34 gc_bias_histogram files
2016-03-14 18:24:13     Collating 34 duplicate_metrics files
2016-03-14 18:24:14     Collating 34 base_distribution_by_cycle files
2016-03-14 18:24:15     Collating 34 library_complexity files
2016-03-14 18:24:15     Collating 34 library_complexity files (histogram)
2016-03-14 18:24:16     Collating 34 mapq_stats files
2016-03-14 18:24:17     Joining all files into 'test-all-metrics.tsv'
Warning messages:
1: In read_tsv(sprintf("%s-insert-size-metrics.tsv", prefix)) :
  File does not exist: test-insert-size-metrics.tsv
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'test-library-complexity.tsv'
2016-03-14 18:24:17     DONE    test

I guess it has to do with collecting information about the insert-size from PE.
Is it possible to add an option to disable this part but still include all the other metrics in the pipeline?
Thanks for your help.

@slowkow
Copy link
Owner

slowkow commented Mar 15, 2016

Thanks for opening this issue and providing your output. I believe picardmetrics will work with SE data.

Could I ask you to please check if picardmetrics created the test-all-metrics.tsv file? Does it look ok? I am guessing that it completed successfully, because you have no errors. The 2 warnings can probably be safely ignored.

Regarding the warnings...

  1. Indeed, SE reads don't have insert size metrics, so there is a warning printed about the missing file.
  2. The warning incomplete final line found indicates that the last line of test-library-complexity.tsv did not end with a newline character \n. This is probably ok.

@naikai
Copy link
Author

naikai commented Mar 15, 2016

Thanks for the quick reply.

After running picardmetrics, it created the following six files

test-all-metrics.tsv
test-base-distribution-by-cycle-histogram.tsv
test-gc-bias-histogram.tsv
test-library-complexity-histogram.tv
test-quality-histogram.tv
test-rnaseq-coverage.tsv

However, there's nothing in the test-library-complexity-histogram.tsv file besides the header:

SAMPLE

Same thing in test-all-metrics.tsv:

SAMPLE  CATEGORY        TOTAL_READS     PF_READS        PCT_PF_READS    PF_NOISE_READS  PF_READS_ALIGNED        PCT_PF_READS_ALIGNED    PF_ALIGNED_BASES        PF_HQ_ALIGNED_READS     PF_HQ_ALIGNED_BASES     PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE        PF_HQ_ERROR_RATE        PF_INDEL_RATE   MEAN_READ_LENGTH
        READS_ALIGNED_IN_PAIRS  PCT_READS_ALIGNED_IN_PAIRS      BAD_CYCLES      STRAND_BALANCE  PCT_CHIMERAS    PCT_ADAPTER     PF_BASES        PF_ALIGNED_BASES_rnaseq_metrics RIBOSOMAL_BASES CODING_BASES    UTR_BASES       INTRONIC_BASES  INTERGENIC_BASES        IGNORED_READS   CORRECT_STRAND_READS    INCORRECT_STRAND_READS  PCT_RIBOSOMAL_BASES     PCT_CODING_BASES        PCT_UTR_BASES   PCT_INTRONIC_BASES      PCT_INTERGENIC_BASES    PCT_MRNA_BASES  PCT_USABLE_BASES        PCT_CORRECT_STRAND_READS        MEDIAN_CV_COVERAGE      MEDIAN_5PRIME_BIAS      MEDIAN_3PRIME_BIAS      MEDIAN_5PRIME_TO_3PRIME_BIAS    LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED     UNMAPPED_READS  UNPAIRED_READ_DUPLICATES        READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION     ESTIMATED_LIBRARY_SIZE  WINDOW_SIZE     TOTAL_CLUSTERS
  ALIGNED_READS   AT_DROPOUT      GC_DROPOUT      X_library_complexity    MAPQ_Mean       MAPQ_Median     MAPQ_SD MAPQ_Mode       MAPQ_ModePercent

HTH

@slowkow
Copy link
Owner

slowkow commented Mar 15, 2016

Please share more information. What's in the other files? What's in the log (example)? What version of Picard are you using?

Could I ask you to please share all of your output files? See here for an example of all the files I'm requesting from you.

Alternatively, please share some of your input files so I can run it myself and see what is happening. You can email me if you don't want to post the files publically.

@aldojongejan
Copy link

aldojongejan commented Feb 16, 2017

Hi,

I ran into the same problem and solved it by checking in the code whether '.insert-size-metrics.tsv' is actually present (as you do for the rna_metrics): so around line 895 (in the 'read_metrics' function:

  is_insert_size <- file.exists(sprintf("%s-insert-size-metrics.tsv", prefix))
  if (is_insert_size) {
    dat_insert_size_metrics = read_tsv(
      sprintf("%s-insert-size-metrics.tsv", prefix)
    )
  }

And added extra clauses further on:

  # Check whether there is any information on insert size
  if (is_insert_size && !is.null(dat_insert_size_metrics)) {
    stopifnot( all(dat_align_metrics$SAMPLE == dat_insert_size_metrics$SAMPLE) )
    dat_align_metrics <- merge(dat_align_metrics, dat_insert_size_metrics, by = "SAMPLE")
  }

  # Check whether dat_library_complexity contains any data...
  if (!is.null(dat_library_complexity) && dim(dat_library_complexity)[1] > 0) {
    # duplicate_metrics and library_complexity share columns
    cnames <- colnames(dat_library_complexity)
    cnames[2:length(cnames)] <- paste(
      cnames[2:length(cnames)], "_library_complexity",
      sep = ""
    )

Hope this helps,

Aldo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants