ValueError #7

Closed
rojinsafavi opened this issue Dec 16, 2017 · 14 comments

Comments

@rojinsafavi

rojinsafavi commented Dec 16, 2017

Hello,
I want to use AlignQC to analyze some nanopore RNA data, but I keep getting this allocation error:

alignqc analyze aln.bam -g ../Mus_musculus.GRCm38.cdna.all.fa -t ../UCSC_Main_on_Mouse__all_mrna.gtf.gz -o report.xhtml --portable_output report.portable.xhtml --threads 10

Exception in thread Thread-4:
Traceback (most recent call last):
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
pool._maintain_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
self._repopulate_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
w.start()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Exception in thread Thread-1:
Traceback (most recent call last):
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
pool._maintain_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
self._repopulate_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
w.start()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Killedignments, 3213 min context coverage

I would really appreciate it if you could help me with this.

@rojinsafavi changed the title from "erro" to "allocation error" on Dec 18, 2017
@rojinsafavi
Author

Okay, I increased the threads and now I'm no longer getting the allocation error, but I get this:

-bash-4.2$ alignqc analyze aln.bam -g ../Mus_musculus.GRCm38.cdna.all.fa -t ../UCSC_Main_on_Mouse__all_mrna.gtf.gz -o report.xhtml --portable_output report.portable.xhtml --threads 10
Using Rscript version:
R scripting front-end version 3.3.2 (2016-10-31)
Creating initial alignment mapping data
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_preprocess.py aln.bam --minimum_intron_size 68 -o /tmp/weirathe.WRVHE9/temp/alndata.txt.gz --threads 10 --specific_tempdir /tmp/weirathe.WRVHE9/temp/
read basics

check for best set
0/215
combining results
215
Traverse bam for alignment analysis
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/traverse_preprocessed.py /tmp/weirathe.WRVHE9/temp/alndata.txt.gz -o /tmp/weirathe.WRVHE9/data/ --specific_tempdir /tmp/weirathe.WRVHE9/temp/ --threads 10 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2
215 alignments 100 reads
Writing chromosome lengths from header
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_chr_lengths.py aln.bam -o /tmp/weirathe.WRVHE9/data/chrlens.txt
Can we find any known read types
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/get_platform_report.py /tmp/weirathe.WRVHE9/data/lengths.txt.gz /tmp/weirathe.WRVHE9/data/special_report
Go through genepred best alignments and make a bed depth file
Generate the depth bed for the mapped reads
gpd_to_bed_depth.py /tmp/weirathe.WRVHE9/data/best.sorted.gpd.gz -o /tmp/weirathe.WRVHE9/data/depth.sorted.bed.gz --threads 10
Stratify the depth to make it plot quicker and cleaner

Get ready for alignment plot
Make alignment plots
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/make_alignment_plot.py /tmp/weirathe.WRVHE9/data/lengths.txt.gz --rscript_path Rscript --output_stats /tmp/weirathe.WRVHE9/data/alignment_stats.txt --output /tmp/weirathe.WRVHE9/plots/alignments.png /tmp/weirathe.WRVHE9/plots/alignments.pdf
making plot
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.WRVHE9/data/lengths.txt.gz /tmp/weirathe.WRVHE9/plots/alignments.png
null device
1
Warning messages:
1: In png(infile, bg = "#FFFFFF") :
unable to load shared object '/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/R/library/grDevices/libs//cairo.so':
libjpeg.so.8: cannot open shared object file: No such file or directory
2: In png(infile, bg = "#FFFFFF") : failed to load cairo DLL
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.WRVHE9/data/lengths.txt.gz /tmp/weirathe.WRVHE9/plots/alignments.pdf
null device
1
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Finished.
Making depth reports
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/depth_to_coverage_report.py /tmp/weirathe.WRVHE9/data/depth.sorted.bed.gz /tmp/weirathe.WRVHE9/data/chrlens.txt -o /tmp/weirathe.WRVHE9/data
203852380
87887
Making coverage plots
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.WRVHE9/data/line_plot_table.txt.gz /tmp/weirathe.WRVHE9/data/total_distro_table.txt.gz /tmp/weirathe.WRVHE9/data/chr_distro_table.txt.gz /tmp/weirathe.WRVHE9/plots/covgraph.png
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.WRVHE9/data/line_plot_table.txt.gz /tmp/weirathe.WRVHE9/data/total_distro_table.txt.gz /tmp/weirathe.WRVHE9/data/chr_distro_table.txt.gz /tmp/weirathe.WRVHE9/plots/covgraph.pdf
Making chr depth plots
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.WRVHE9/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.WRVHE9/data/chrlens.txt /tmp/weirathe.WRVHE9/temp/coverage-strata.key /tmp/weirathe.WRVHE9/plots/perchrdepth.png
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.WRVHE9/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.WRVHE9/data/chrlens.txt /tmp/weirathe.WRVHE9/temp/coverage-strata.key /tmp/weirathe.WRVHE9/plots/perchrdepth.pdf
Get the exon distributions
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/gpd_to_exon_distro.py /tmp/weirathe.WRVHE9/data/best.sorted.gpd.gz -o /tmp/weirathe.WRVHE9/data/exon_size_distro.txt.gz --threads 10
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.WRVHE9/data/exon_size_distro.txt.gz /tmp/weirathe.WRVHE9/plots/exon_size_distro.png
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.WRVHE9/data/exon_size_distro.txt.gz /tmp/weirathe.WRVHE9/plots/exon_size_distro.pdf
Make a UCSC genome browser compatible bed file
gpd_to_UCSC_bed12.py --headername aln.bam:best /tmp/weirathe.WRVHE9/data/best.sorted.gpd.gz -o /tmp/weirathe.WRVHE9/data/best.sorted.bed.gz --color red
gpd_to_UCSC_bed12.py --headername aln.bam:trans-chimera /tmp/weirathe.WRVHE9/data/chimera.gpd.gz -o /tmp/weirathe.WRVHE9/data/chimera.bed.gz --color blue
gpd_to_UCSC_bed12.py --headername aln.bam:gapped /tmp/weirathe.WRVHE9/data/gapped.gpd.gz -o /tmp/weirathe.WRVHE9/data/gapped.bed.gz --color orange
gpd_to_UCSC_bed12.py --headername aln.bam:self-chimera /tmp/weirathe.WRVHE9/data/technical_chimeras.gpd.gz -o /tmp/weirathe.WRVHE9/data/technical_chimeras.bed.gz --color green
gpd_to_UCSC_bed12.py --headername aln.bam:self-atypical /tmp/weirathe.WRVHE9/data/technical_atypical_chimeras.gpd.gz -o /tmp/weirathe.WRVHE9/data/technical_atypical_chimeras.bed.gz --color purple
Making context plot
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_context_error_plot.py aln.bam -r ../Mus_musculus.GRCm38.cdna.all.fa --target --output_raw /tmp/weirathe.WRVHE9/data/context_error_data.txt -o /tmp/weirathe.WRVHE9/plots/context_plot.png /tmp/weirathe.WRVHE9/plots/context_plot.pdf --rscript_path Rscript --random --specific_tempdir /tmp/weirathe.WRVHE9/temp --stopping_point 5000 --input_index /tmp/weirathe.WRVHE9/temp/myindex.bgi
Reading reference fasta
Reading index
407 alignments, 2187 min context coverage
476 alignments, 2187 min context coverage

Killed
-bash-4.2$

The only output that I see in my directory is Rplots.pdf.

@jason-weirather
Owner

Hi @rojinsafavi
Sorry for the delay in my response. The multithreading in AlignQC is, all in all, not fantastic. You get reasonable speedups for a few segments of the pipeline, but the memory requirements skyrocket because I'm not using any shared-memory optimizations. I recommend either running on a single thread or running on a very high-memory computer with more threads; going up to more and more threads will only make the memory issues worse. When you run a single thread, I try to avoid multiprocessing calls, so the error logs are a little more meaningful too. With multiprocessing, a bug or input error can make it hard to get the actual error message. Can you try running on a single thread and see if you still have a problem?
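
To illustrate the single-thread advice, here is a minimal Python 3 sketch of a dispatcher that only creates a worker pool when more than one thread is requested. This is not AlignQC's actual code; the names `run_tasks` and `square` are hypothetical.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical stand-in for a per-record task.
    return x * x


def run_tasks(func, items, threads=1):
    """Apply func to every item, using multiprocessing only when threads > 1."""
    if threads <= 1:
        # Plain loop: no worker processes are created, and any exception
        # surfaces with a direct traceback.
        return [func(item) for item in items]
    # Each worker is a separate process, so total memory use can grow
    # with the size of the pool.
    with Pool(processes=threads) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(run_tasks(square, [1, 2, 3], threads=1))
```

With `threads=1` the work runs in the calling process, which is why errors tend to be easier to read in that mode.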

@rojinsafavi
Author

Thanks Jason! I will run it on a single thread and report back if I still get an error.

@rojinsafavi
Author

Hi Jason,

I ran it again with 1 thread and got the same error (OSError: [Errno 12] Cannot allocate memory). I should mention that I'm using only 15 fast5 files here (just for testing purposes).

-bash-4.2$ alignqc analyze trial-fast5/aln.bam -g Mus_musculus.GRCm38.cdna.all.fa -t UCSC_Main_on_Mouse__all_mrna.gtf.gz -o report.xhtml --portable_output report.portable.xhtml --output_folder alignqc-output --threads 1

Using Rscript version:
R scripting front-end version 3.3.2 (2016-10-31)
Creating initial alignment mapping data
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_preprocess.py trial-fast5/aln.bam --minimum_intron_size 68 -o /tmp/weirathe.3ytJfG/temp/alndata.txt.gz --threads 1 --specific_tempdir /tmp/weirathe.3ytJfG/temp/
read basics

check for best set
0/33
combining results
33
Traverse bam for alignment analysis
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/traverse_preprocessed.py /tmp/weirathe.3ytJfG/temp/alndata.txt.gz -o /tmp/weirathe.3ytJfG/data/ --specific_tempdir /tmp/weirathe.3ytJfG/temp/ --threads 1 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2
33 alignments 15 reads
Writing chromosome lengths from header
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_chr_lengths.py trial-fast5/aln.bam -o /tmp/weirathe.3ytJfG/data/chrlens.txt
Can we find any known read types
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/get_platform_report.py /tmp/weirathe.3ytJfG/data/lengths.txt.gz /tmp/weirathe.3ytJfG/data/special_report
Go through genepred best alignments and make a bed depth file
Generate the depth bed for the mapped reads
gpd_to_bed_depth.py /tmp/weirathe.3ytJfG/data/best.sorted.gpd.gz -o /tmp/weirathe.3ytJfG/data/depth.sorted.bed.gz --threads 1
Stratify the depth to make it plot quicker and cleaner

Get ready for alignment plot
Make alignment plots
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/make_alignment_plot.py /tmp/weirathe.3ytJfG/data/lengths.txt.gz --rscript_path Rscript --output_stats /tmp/weirathe.3ytJfG/data/alignment_stats.txt --output /tmp/weirathe.3ytJfG/plots/alignments.png /tmp/weirathe.3ytJfG/plots/alignments.pdf
making plot
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.3ytJfG/data/lengths.txt.gz /tmp/weirathe.3ytJfG/plots/alignments.png
null device
1
Warning messages:
1: In png(infile, bg = "#FFFFFF") :
unable to load shared object '/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/R/library/grDevices/libs//cairo.so':
libjpeg.so.8: cannot open shared object file: No such file or directory
2: In png(infile, bg = "#FFFFFF") : failed to load cairo DLL
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
7: In min(x) : no non-missing arguments to min; returning Inf
8: In max(x) : no non-missing arguments to max; returning -Inf
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.3ytJfG/data/lengths.txt.gz /tmp/weirathe.3ytJfG/plots/alignments.pdf
null device
1
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
Finished.
Making depth reports
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/depth_to_coverage_report.py /tmp/weirathe.3ytJfG/data/depth.sorted.bed.gz /tmp/weirathe.3ytJfG/data/chrlens.txt -o /tmp/weirathe.3ytJfG/data
203852380
13150
Making coverage plots
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.3ytJfG/data/line_plot_table.txt.gz /tmp/weirathe.3ytJfG/data/total_distro_table.txt.gz /tmp/weirathe.3ytJfG/data/chr_distro_table.txt.gz /tmp/weirathe.3ytJfG/plots/covgraph.png
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.3ytJfG/data/line_plot_table.txt.gz /tmp/weirathe.3ytJfG/data/total_distro_table.txt.gz /tmp/weirathe.3ytJfG/data/chr_distro_table.txt.gz /tmp/weirathe.3ytJfG/plots/covgraph.pdf
Making chr depth plots
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.3ytJfG/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.3ytJfG/data/chrlens.txt /tmp/weirathe.3ytJfG/temp/coverage-strata.key /tmp/weirathe.3ytJfG/plots/perchrdepth.png
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.3ytJfG/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.3ytJfG/data/chrlens.txt /tmp/weirathe.3ytJfG/temp/coverage-strata.key /tmp/weirathe.3ytJfG/plots/perchrdepth.pdf
Get the exon distributions
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/gpd_to_exon_distro.py /tmp/weirathe.3ytJfG/data/best.sorted.gpd.gz -o /tmp/weirathe.3ytJfG/data/exon_size_distro.txt.gz --threads 1
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.3ytJfG/data/exon_size_distro.txt.gz /tmp/weirathe.3ytJfG/plots/exon_size_distro.png
Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.3ytJfG/data/exon_size_distro.txt.gz /tmp/weirathe.3ytJfG/plots/exon_size_distro.pdf
Make a UCSC genome browser compatible bed file
gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:best /tmp/weirathe.3ytJfG/data/best.sorted.gpd.gz -o /tmp/weirathe.3ytJfG/data/best.sorted.bed.gz --color red
gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:trans-chimera /tmp/weirathe.3ytJfG/data/chimera.gpd.gz -o /tmp/weirathe.3ytJfG/data/chimera.bed.gz --color blue
gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:gapped /tmp/weirathe.3ytJfG/data/gapped.gpd.gz -o /tmp/weirathe.3ytJfG/data/gapped.bed.gz --color orange
gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:self-chimera /tmp/weirathe.3ytJfG/data/technical_chimeras.gpd.gz -o /tmp/weirathe.3ytJfG/data/technical_chimeras.bed.gz --color green
gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:self-atypical /tmp/weirathe.3ytJfG/data/technical_atypical_chimeras.gpd.gz -o /tmp/weirathe.3ytJfG/data/technical_atypical_chimeras.bed.gz --color purple
Making context plot
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_context_error_plot.py trial-fast5/aln.bam -r Mus_musculus.GRCm38.cdna.all.fa --target --output_raw /tmp/weirathe.3ytJfG/data/context_error_data.txt -o /tmp/weirathe.3ytJfG/plots/context_plot.png /tmp/weirathe.3ytJfG/plots/context_plot.pdf --rscript_path Rscript --random --specific_tempdir /tmp/weirathe.3ytJfG/temp --stopping_point 5000 --input_index /tmp/weirathe.3ytJfG/temp/myindex.bgi
Reading reference fasta
Reading index
467 alignments, 1972 min context coverage

536 alignments, 2499 min context coverage

546 alignments, 2499 min context coverage

Exception in thread Thread-4:
Traceback (most recent call last):
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
pool._maintain_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
self._repopulate_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
w.start()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Exception in thread Thread-1:
Traceback (most recent call last):
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
pool._maintain_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
self._repopulate_pool()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
w.start()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Killedignments, 3072 min context coverage
-bash-4.2$
-bash-4.2$
-bash-4.2$

@jason-weirather
Owner

Thanks for the error details. Sampling the context errors can also be a little unfriendly when it comes to memory, and that looks like where you are running into trouble. The easiest way to fix this is to sample less.

--context_error_stopping_point 1000

will reduce the depth of sampling that is done (I think default is 2500), but the plot it generates should be pretty representative (unless your error rates are super-low).
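
As a rough sketch of how that option could be combined with the single-thread run shown earlier in this thread, the Python 3 snippet below assembles the command line. The helper `build_alignqc_command` is hypothetical. The file names and the other flags are copied from the commands above. The run logs in this thread show `--stopping_point 5000` being passed to bam_to_context_error_plot.py; whether the default differs between versions is not something this sketch assumes.

```python
import shlex


def build_alignqc_command(bam, genome, annotation, report,
                          threads=1, stopping_point=1000):
    """Assemble an `alignqc analyze` argument list (hypothetical helper)."""
    return [
        "alignqc", "analyze", bam,
        "-g", genome,
        "-t", annotation,
        "-o", report,
        "--threads", str(threads),
        # Flag name taken from the comment above; a lower value samples less.
        "--context_error_stopping_point", str(stopping_point),
    ]


cmd = build_alignqc_command(
    "trial-fast5/aln.bam",
    "Mus_musculus.GRCm38.cdna.all.fa",
    "UCSC_Main_on_Mouse__all_mrna.gtf.gz",
    "report.xhtml",
)
# Print a shell-ready version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```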

@rojinsafavi
Author

Thanks Jason, I think the main reason I was getting that error was that I was using a toplevel reference genome. I was able to overcome that issue by using gencode.vM16.primary_assembly.annotation.gtf.gz and GRCm38.primary_assembly.genome.fa. But now I'm getting the same error as in issue #6:

Sorting in reference genePred
133000
reading read genepred
stream loci
Traceback (most recent call last):
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/bin/alignqc", line 11, in <module>
load_entry_point('AlignQC==2.0.5', 'console_scripts', 'alignqc')()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/alignqc.py", line 47, in entry_point
main(args,operable_argv)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/alignqc.py", line 17, in main
analyze.external_cmd(operable_argv,version=version)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/analyze.py", line 88, in external_cmd
main(args)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/analyze.py", line 54, in main
prepare_all_data.external(args)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/prepare_all_data.py", line 844, in external
main(args)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/prepare_all_data.py", line 69, in main
make_data_bam_annotation(args)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/prepare_all_data.py", line 725, in make_data_bam_annotation
annotated_read_bias_analysis.external_cmd(cmd)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/annotated_read_bias_analysis.py", line 341, in external_cmd
main(args)
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/annotated_read_bias_analysis.py", line 41, in main
for l in mls:
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 240, in next
r = self.read_entry()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 263, in read_entry
try: self._buffers[i] = self._streams[i].next()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 178, in next
r = self.read_entry()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 195, in read_entry
e = self._stream.next()
File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 147, in next
raise ValueError('Expected lines to be ordered but they appear not to be ordered on line '+str(self._ln))
ValueError: Expected lines to be ordered but they appear not to be ordered on line 133849

Since you asked for the GTF file, I'm going to attach it here:

ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16/gencode.vM16.primary_assembly.annotation.gtf.gz

@rojinsafavi changed the title from "allocation error" to "ValueError" on Dec 19, 2017
@rojinsafavi
Author

Hi Jason,
Any updates on this issue?
Kind regards,
Rojin

@jason-weirather
Owner

Hi @rojinsafavi Sorry for the delay. I'm busy these days, so if I lose track of these I appreciate getting the reminder :) It looks like a problem streaming the data. Is your alignment file sorted by genomic position? If it is ... I have a second, more complicated problem that this may be due to. If it is sorted by position, do you know whether your chromosomes are indexed in alphabetical order? I've noticed that different aligners behave differently when it comes to sorting: sometimes they sort chromosomes alphabetically ... sometimes they do other things. And I may be making the alphabetical assumption in the ordering check. Something you can try is my sort tool that's in seqtools:

seq-tools sort --bam yourbam -o newbam

If this is the cause I may rethink my order check because I don't want to require another sort before running. Thanks for your help in figuring this out.
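
To show why an alphabetical assumption can trip an ordering check, here is a minimal Python 3 sketch. It is not the seqtools implementation; `first_unordered_line` and the sample records are hypothetical. It compares `(chromosome, start)` pairs as plain tuples, so chromosome names are compared as strings.

```python
def first_unordered_line(records):
    """Return the 1-based line where (chrom, start) order breaks, else None."""
    previous = None
    for line_number, record in enumerate(records, start=1):
        # Tuples compare element by element, so chromosome names are
        # compared alphabetically before start positions are considered.
        if previous is not None and record < previous:
            return line_number
        previous = record
    return None


# Records in karyotypic order (chr1, chr2, chr10), as some tools emit them.
karyotypic = [("chr1", 100), ("chr2", 50), ("chr10", 10)]

bad_line = first_unordered_line(karyotypic)
if bad_line is not None:
    print("Expected lines to be ordered but they appear not to be "
          "ordered on line " + str(bad_line))

# The same records sorted alphabetically pass the check.
print(first_unordered_line(sorted(karyotypic)))
```

As strings, "chr10" sorts before "chr2", so a file that is correctly sorted in karyotypic order still fails an alphabetical check at the third record.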

@rojinsafavi
Author

Thanks Jason!
So I'm using minimap2, which outputs a sam file, and then I use samtools to convert the sam to bam. I attached both sam and bam here.
aln.zip
I looked at the SAM file, and it seems that it is not sorted. I tried sorting it with samtools and ran AlignQC again, but I got the same error. I also used the command that you gave me, but the new BAM file gave the same error.

@jason-weirather
Owner

Thanks for posting the file. The problem with samtools sort (well, some would probably consider it a feature, not a problem) is that the order of chromosomes in the file is defined by the SAM header; it isn't sorted by any criterion other than the header order. So my assumption of alphabetical order doesn't hold if the SAM header is ordered differently. Since samtools is the gold standard for SAM management, I'd be better off changing my sorting behavior to fit its convention. In the meantime, I'd suggest you use the sorting function I mentioned above. It uses samtools sort to do the sort, but first it sorts the header. You can see the difference if you run samtools sort and seq-tools sort --bam and then inspect the headers of the outputs with samtools view -H. I'll open an issue to change this sorting behavior, but I'm not 100% sure this is causing your problem, so for now I recommend you re-sort your file with seq-tools sort and give it a try. Thanks!
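
A small Python 3 sketch of that header comparison, assuming the text printed by samtools view -H has been captured into a string. The function `reference_order` and the sample header (including its sequence lengths) are made up for illustration.

```python
def reference_order(header_text):
    """Return reference names in the order of the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


# Illustrative header only; the LN values are placeholders, not real lengths.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:900\n"
    "@SQ\tSN:chr10\tLN:800\n"
)

names = reference_order(header)
print(names)                   # order a coordinate sort would follow
print(names == sorted(names))  # whether that order is also alphabetical

# Rank records by header order instead of by name.
rank = {name: index for index, name in enumerate(names)}
records = [("chr10", 5), ("chr1", 7), ("chr2", 1)]
print(sorted(records, key=lambda r: (rank[r[0]], r[1])))
```

Ranking by header position rather than by name is one way a check could follow the samtools convention described above.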

@jason-weirather
Owner

Hi @rojinsafavi I did some testing this morning to try to narrow down the problem. What I expected to be an issue turned out not to be. I guess my sorts at the beginning of the run mitigate that problem, so I closed the issue I had opened on that. I used the test files you sent me (the mouse transcriptome and the aln.bam), and I was able to run it without error both on my macOS computer and from Docker.

With those files in a Test subdirectory the command can be run like so:

docker run -v $(pwd)/Test:/Test -t vacation/alignqc alignqc analyze --no_genome -t /Test/gencode.vM16.primary_assembly.annotation.gtf.gz /Test/aln.bam --specific_tempdir /Test/mytemp3 -o /Test/mytest3.xhtml

Test Data Output

As a next step, I would suggest that you also try running from Docker and see whether you still get the error you reported before with the test data you sent me. If not, I will probably need some test data that I can use to replicate the error, and then I can track down what's going on. I'm also open to suggestions if you have any idea what the cause is; it can be hard to track down without being able to replicate it. Thanks!

@jason-weirather
Owner

Hi @rojinsafavi Just following up: I was looking through my issues for anything else I can help with, and I saw another conversation I had where a streaming error occurred with genePred (.gpd) format files as transcriptome references. These happened because my parser does not handle period (.) symbols in the CDS starts and stops. If this could be causing your problems, I recommend you supply your transcriptome reference in GTF format. My treatment of GTF format should not have this problem. I'm going to update the documentation to recommend GTF format to hopefully help anyone else who encounters this.
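
For anyone who wants to pre-check a genePred file before switching to GTF, here is a minimal Python 3 sketch of tolerant coordinate parsing. It is only an illustration, not the AlignQC or seqtools parser; `parse_coordinate` is a hypothetical helper.

```python
def parse_coordinate(field):
    """Convert a coordinate field to int, treating '.' or empty as missing."""
    field = field.strip()
    if field in (".", ""):
        # A period stands in for a missing value; int(".") would raise.
        return None
    return int(field)


print(parse_coordinate("133849"))
print(parse_coordinate("."))
```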

@rojinsafavi
Author

Hi Jason,
I installed AlignQC on my Mac by:

  1. cloning the git repository
  2. and then running python setup.py install

and I was able to reproduce your result.

For some reason the same thing doesn't work on our server, and it gave me the error that I sent you. But I was able to save the plots by providing a temp directory (only for a small subsample).

Our server does not let us use the Docker installation; I have to check with the admin to see if they can install it on our server. I also did a conda installation and again got the same error (but was able to save the plots in the temp directory for that small subsample).

However, I was not able to save the plots for all my data (I have about 400,000 fast5 files). How many files do you usually use to get a good approximation? I think I might be using too many files!

Kind regards,
Rojin

@rojinsafavi
Author

Hi Jason!

So I was not able to run AlignQC on our Linux servers, but I managed to run it on my Mac (I installed it through pip). It took a while, but it worked with no errors. Thanks a lot for the help, and I hope you have a great holiday!
Best,
Rojin
