Qualimap: @RG flag missing SM line #238

Closed
olgabot opened this issue Jun 21, 2019 · 5 comments

@olgabot
Contributor

olgabot commented Jun 21, 2019

Hello, I'm running into an issue with Qualimap requiring the SM field in the @RG header line. From here it looks like one can add VALIDATION_STRINGENCY=LENIENT to the command line to ignore this error, though I'm not sure that's what this pipeline would want.

ERROR ~ Error executing process > 'qualimap (F18_scSLAM_S138_R1_001Aligned.sortedByCoord.out)'

Caused by:
  Process `qualimap (F18_scSLAM_S138_R1_001Aligned.sortedByCoord.out)` terminated with an error exit status (255)

Command executed:

  unset DISPLAY
  qualimap --java-mem-size=16G rnaseq non-strand-specific -pe -s -bam F18_scSLAM_S138_R1_001Aligned.sortedByCoord.out.bam -gtf gencode.v30.annotation.ERCC92.gtf -outdir F18_scSLAM_S138_R1_001Aligned.sortedByCoord.out

Command exit status:
  255

Command output:
  Java memory size is set to 16G
  Launching application...

  QualiMap v.2.2.2-dev
  Built on 2018-12-03 16:04

  Selected tool: rnaseq
  Initializing regions from gencode.v30.annotation.ERCC92.gtf...

  Initialized 100000 regions...
  Initialized 200000 regions...
  Initialized 300000 regions...
  Initialized 400000 regions...
  Initialized 500000 regions...
  Initialized 600000 regions...
  Initialized 700000 regions...
  Initialized 800000 regions...
  Initialized 900000 regions...
  Initialized 1000000 regions...
  Initialized 1100000 regions...
  Initialized 1200000 regions...
  Initialized 1300000 regions...
  Initialized 1400000 regions...
  Initialized 1500000 regions...
  Initialized 1600000 regions...
  Initialized 1700000 regions...
  Initialized 1800000 regions...
  Initialized 1900000 regions...
  Initialized 2000000 regions...
  Initialized 2100000 regions...
  Initialized 2200000 regions...
  Initialized 2300000 regions...
  Initialized 2400000 regions...
  Initialized 2500000 regions...
  Initialized 2600000 regions...
  Initialized 2700000 regions...

  Initialized 2774391 regions it total

  Starting constructing transcripts for RNA-seq stats...
  Finished constructing transcripts

  Starting BAM file analysis

  Thu Jun 20 21:29:45 UTC 2019          WARNING Cleanup output dir


Command error:
  Failed to run rnaseq
  net.sf.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
  @RG   ID:F18_scSLAM_S138_R1_001       CN:czbiohub; File /tmp/nxf.MgEXew9lNI/F18_scSLAM_S138_R1_001Aligned.sortedByCoord.out.bam; Line number 688
        at net.sf.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:230)
        at net.sf.samtools.SAMTextHeaderCodec.access$100(SAMTextHeaderCodec.java:39)
        at net.sf.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:306)
        at net.sf.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:160)
        at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:93)
        at net.sf.samtools.BAMFileReader.readHeader(BAMFileReader.java:393)
        at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:146)
        at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:114)
        at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:514)
        at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:167)
        at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:122)
        at org.bioinfo.ngs.qc.qualimap.process.ComputeCountsTask.run(ComputeCountsTask.java:483)
        at org.bioinfo.ngs.qc.qualimap.process.RNASeqQCAnalysis.run(RNASeqQCAnalysis.java:68)
        at org.bioinfo.ngs.qc.qualimap.main.RnaSeqQcTool.execute(RnaSeqQcTool.java:188)
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartTool.run(NgsSmartTool.java:190)
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartMain.main(NgsSmartMain.java:113)

Work dir:
  s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/nextflow-intermediates/8c/a6b6b7ec23bfc66653513c5ec5f069

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
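
For anyone hitting the same error, a quick way to confirm the cause before rerunning the process is to look at the header text (for example the output of `samtools view -H`) and list the @RG lines that have no SM tag. The following is a minimal sketch, not part of the pipeline; the function name `rg_lines_missing_sm` is hypothetical and the sample header is modelled on the failing line in the log above.

```python
def rg_lines_missing_sm(header_text):
    """Return the @RG header lines that lack an SM (sample) tag."""
    missing = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are tab-separated TAG:VALUE pairs.
        tags = {field.split(":", 1)[0] for field in line.split("\t")[1:]}
        if "SM" not in tags:
            missing.append(line)
    return missing


# Hypothetical header modelled on the line reported in the error message.
header = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:F18_scSLAM_S138_R1_001\tCN:czbiohub",
])

for line in rg_lines_missing_sm(header):
    print("missing SM:", line)
```

An empty result would suggest the read groups are fine and the failure lies elsewhere.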
@olgabot
Contributor Author

olgabot commented Jun 21, 2019

I should mention this was run with the most recent Salmon branch: #221

@olgabot olgabot mentioned this issue Jun 21, 2019
8 tasks
@lpantano
Contributor

It seems this is related to the --seq_center argument. Did you use it? Maybe you can try adding the SM tag.

I haven't had this issue, but I don't use that parameter.

Something related: https://sourceforge.net/p/samtools/mailman/message/27690955/
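
As a rough illustration of "adding the SM tag" to an existing file, the header text can be patched so that every @RG line without SM gets one. This is only a sketch under assumptions: `add_sm_tag` is a hypothetical helper, and using the read group ID as the sample name is an assumption, not something the pipeline does.

```python
def add_sm_tag(header_text, sample):
    """Append SM:<sample> to every @RG line that does not already carry SM."""
    patched = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            # Collect the tag names already present on this read group line.
            tags = {field.split(":", 1)[0] for field in line.split("\t")[1:]}
            if "SM" not in tags:
                line = f"{line}\tSM:{sample}"
        patched.append(line)
    return "\n".join(patched) + "\n"


# Hypothetical header modelled on the failing @RG line from the log.
old_header = "@HD\tVN:1.4\tSO:coordinate\n@RG\tID:F18_scSLAM_S138_R1_001\tCN:czbiohub\n"
new_header = add_sm_tag(old_header, "F18_scSLAM_S138_R1_001")
print(new_header, end="")
```

The patched text could then be written to a file and applied with something like `samtools reheader new_header.sam in.bam > out.bam`, which replaces the header without realigning.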

@olgabot
Contributor Author

olgabot commented Jun 21, 2019

Hmm I did set seq_center = "czbiohub" for this... maybe that's what's happening?

@apeltzer
Member

Read groups can be a pain. Seq center should, however, only set CN:czbiohub in the RG string, which it correctly did according to the error message above. I don't think we currently set the SM tag correctly in the rnaseq pipeline, so we could add a piece of code setting it to the input filename, for example.

If at some point multiple samples are merged together, they should then get the same SM entry, but at the moment we don't really do that.
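
To make the idea concrete, here is a minimal sketch of deriving the read group fields from the input filename, with an optional shared sample name for the merging case. It is not the pipeline's actual code: `read_group_fields` and its parameters are hypothetical, and taking everything before the first dot as the prefix is an assumption.

```python
from pathlib import PurePath


def read_group_fields(input_name, seq_center=None, sample=None):
    """Build read group fields: ID from the filename, optional CN, and SM."""
    # Assumption: the prefix is the file name up to the first dot.
    prefix = PurePath(input_name).name.split(".", 1)[0]
    fields = [f"ID:{prefix}"]
    if seq_center:
        fields.append(f"CN:{seq_center}")
    # SM defaults to the filename prefix unless a shared sample name is given.
    fields.append(f"SM:{sample or prefix}")
    return fields


single = read_group_fields("F18_scSLAM_S138_R1_001.fastq.gz", seq_center="czbiohub")
print(" ".join(single))

# Two hypothetical lanes of one sample: distinct IDs, same SM.
lane1 = read_group_fields("F18_L001.fastq.gz", sample="F18")
lane2 = read_group_fields("F18_L002.fastq.gz", sample="F18")
print(" ".join(lane1))
print(" ".join(lane2))
```

The space-joined form is the shape that an aligner option such as STAR's `--outSAMattrRGline` takes; whether the pipeline passes the fields this way is an assumption here.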

@drpatelh
Member

drpatelh commented Jul 8, 2019

This should be fixed now with this PR to dev:
#221

@drpatelh drpatelh closed this as completed Jul 8, 2019