Large memory request #13

Closed
holtjma opened this issue Jul 28, 2021 · 8 comments
Comments

@holtjma

holtjma commented Jul 28, 2021

Hello,

I ran into an issue while running a basic test with dysgu: it seems to be allocating an extremely large array for some reason. The error is below (some file paths redacted):

2021-07-28 08:59:57,769 [INFO   ]  [dysgu-run] Version: 1.2.7
2021-07-28 08:59:57,770 [INFO   ]  run -o output.vcf --mode pe --pl pe /cluster/home/jholt/reference/hg38_asm5_alt/hg38.fa ./working_dir <redacted>/pipeline/merged_alignments/hg38_asm5_alt/sentieon-202010.02/HALB3002753.bam
2021-07-28 08:59:57,770 [INFO   ]  Destination: ./working_dir
2021-07-28 09:43:45,827 [INFO   ]  dysgu fetch <redacted>/pipeline/merged_alignments/hg38_asm5_alt/sentieon-202010.02/HALB3002753.bam written to ./working_dir/HALB3002753.dysgu_reads.bam, n=65472206, time=0:43:48 h:m:s
2021-07-28 09:43:45,827 [INFO   ]  Input file is: ./working_dir/HALB3002753.dysgu_reads.bam
2021-07-28 09:43:48,444 [INFO   ]  Sample name: HALB3002753
2021-07-28 09:43:48,444 [INFO   ]  Writing SVs to output.vcf
2021-07-28 09:43:48,446 [INFO   ]  Running pipeline
2021-07-28 09:43:49,103 [INFO   ]  Removed 34 outliers with insert size >= 903.0
2021-07-28 09:43:49,124 [INFO   ]  Inferred read length 151.0, insert median 391, insert stdev 99
2021-07-28 09:43:49,125 [INFO   ]  Max clustering dist 886
2021-07-28 09:43:49,126 [INFO   ]  Minimum support 3
2021-07-28 09:43:49,126 [INFO   ]  Building graph with clustering distance 886 bp, scope length 886 bp
2021-07-28 10:01:48,141 [INFO   ]  Total input reads 63689262
2021-07-28 10:02:50,425 [INFO   ]  Graph constructed
(315,)
(132,)
(array([[1.320000e+02, 1.870000e+02],
       [1.230000e+02, 1.980000e+02],
       [1.400000e+02, 2.810000e+02],
       ...,
       [1.496422e+06, 1.496339e+06],
       [1.496425e+06, 1.496419e+06],
       [1.496474e+06, 1.496389e+06]]),)
Traceback (most recent call last):
  File "dysgu/call_component.pyx", line 663, in dysgu.call_component.partition_single
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1064, in linkage
    if not np.all(np.isfinite(y)):
numpy.core._exceptions.MemoryError: Unable to allocate 41.6 GiB for an array with shape (44717395096,) and data type bool

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<redacted>/miniconda3/envs/dysgu_test/bin/dysgu", line 8, in <module>
    sys.exit(cli())
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/dysgu/main.py", line 252, in run_pipeline
    cluster.cluster_reads(ctx.obj)
  File "dysgu/cluster.pyx", line 1110, in dysgu.cluster.cluster_reads
  File "dysgu/cluster.pyx", line 901, in dysgu.cluster.pipe1
  File "dysgu/cluster.pyx", line 653, in dysgu.cluster.component_job
  File "dysgu/call_component.pyx", line 1747, in dysgu.call_component.call_from_block_model
  File "dysgu/call_component.pyx", line 1755, in dysgu.call_component.call_from_block_model
  File "dysgu/call_component.pyx", line 1736, in dysgu.call_component.multi
  File "dysgu/call_component.pyx", line 874, in dysgu.call_component.single
  File "dysgu/call_component.pyx", line 668, in dysgu.call_component.partition_single
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1060, in linkage
    y = distance.pdist(y, metric)
  File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2250, in pdist
    return pdist_fn(X, out=out, **kwargs)
numpy.core._exceptions.MemoryError: Unable to allocate 333. GiB for an array with shape (44717395096,) and data type float64

It seems like the memory requirements should be much lower according to the docs. Any suggestions?
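
For readers wondering where the 333 GiB figure comes from: the traceback ends in `scipy.spatial.distance.pdist`, which returns a condensed distance matrix of length n(n-1)/2 for n input points. The sketch below inverts that formula for the shape reported in the error. The point count it prints is inferred from the error message, not reported anywhere in the log, and the helper names are made up for illustration.

```python
import math

def condensed_length(n_points):
    """Number of pairwise distances pdist stores for n_points observations."""
    return n_points * (n_points - 1) // 2

def points_from_condensed(length):
    """Invert n*(n-1)/2 = length; raise if length is not a valid condensed size."""
    root = math.isqrt(8 * length + 1)
    n_points = (1 + root) // 2
    if condensed_length(n_points) != length:
        raise ValueError("not a valid condensed distance matrix length")
    return n_points

def size_gib(n_elements, bytes_per_element):
    """Size of an array in GiB (2**30 bytes)."""
    return n_elements * bytes_per_element / 2**30

# Shape taken from the MemoryError messages above.
length = 44717395096
n_points = points_from_condensed(length)

print(n_points)                        # 299057
print(round(size_gib(length, 8), 1))   # float64 distances: 333.2
print(round(size_gib(length, 1), 1))   # bool mask from np.isfinite: 41.6
```

So a single component with roughly 299,000 items was passed to `linkage`, and memory grows quadratically with that count regardless of how small the input BAM is.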

@kcleal
Owner

kcleal commented Aug 16, 2021

Hello Matt,

Thanks for reporting this. It looks as though a non-canonical chromosome is causing the problem: the target ID (tid) of the problem chromosome appears to be 315, and reads on that chromosome seem to have formed one large, dispersed cluster. I will put together a fix to stop this from happening.

@holtjma
Author

holtjma commented Aug 16, 2021

Ah, that would make sense. Is there a way for me to limit the chromosomes that are actually tested? I could use that as a workaround, and to verify that it's the issue as well.

@kcleal
Owner

kcleal commented Aug 17, 2021

Hi Matt,

I have added a fix that I hope resolves the issue. You can test it by building from source: git clone --recursive https://github.com/kcleal/dysgu.git; cd dysgu; bash INSTALL.sh
As of v1.2.8 there is also better support for specifying regions of interest and regions to exclude. For example, you can pass a BED file of target chromosomes with --search target_chroms.bed.
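
One possible way to produce such a BED file is to filter the reference's `.fai` index (as written by `samtools faidx`) down to the canonical chromosomes. This is only a sketch: the regular expression, the function name, and the choice to leave out chrM are assumptions made for illustration, and it assumes a plain three-column BED (name, 0-based start, end) is what `--search` accepts.

```python
import re

# Assumed naming scheme: optional "chr" prefix, then 1-2 digits, X, or Y.
# Alt contigs, decoys, and chrM do not match and are left out.
CANONICAL = re.compile(r"^(chr)?([0-9]{1,2}|X|Y)$")

def fai_to_bed(fai_lines, pattern=CANONICAL):
    """Turn .fai index lines into whole-chromosome BED lines for matching names."""
    bed = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if pattern.match(name):
            bed.append(f"{name}\t0\t{length}")
    return bed

def write_bed(fai_path, bed_path):
    """Read a .fai file and write the filtered BED file."""
    with open(fai_path) as fai, open(bed_path, "w") as out:
        for row in fai_to_bed(fai):
            out.write(row + "\n")

# Example (paths are placeholders):
# write_bed("hg38.fa.fai", "target_chroms.bed")
```

The resulting file could then be passed as `--search target_chroms.bed`, as described in the comment above.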

@holtjma
Author

holtjma commented Aug 17, 2021

Hmm, I got an error installing this way (previously, I believe I just used conda and pip). It seemed to make it through all the htslib stuff and finish the dysgu dependencies before throwing the error. Relevant snippet is below:

Using /cluster/home/jholt/githubDL/miniconda3/envs/dysgu/lib/python3.9/site-packages
Finished processing dependencies for dysgu==1.2.8
Traceback (most recent call last):
  File "/cluster/home/jholt/githubDL/miniconda3/envs/dysgu/bin/dysgu", line 33, in <module>
    sys.exit(load_entry_point('dysgu==1.2.8', 'console_scripts', 'dysgu')())
  File "/cluster/home/jholt/githubDL/miniconda3/envs/dysgu/bin/dysgu", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/cluster/home/jholt/githubDL/miniconda3/envs/dysgu/lib/python3.9/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/cluster/home/jholt/githubDL/miniconda3/envs/dysgu/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/cluster/home/jholt/githubDL/dysgu/dysgu/main.py", line 11, in <module>
    from dysgu import cluster, view, sv2bam
  File "/cluster/home/jholt/githubDL/dysgu/dysgu/view.py", line 11, in <module>
    from dysgu import io_funcs, cluster
ImportError: cannot import name 'io_funcs' from 'dysgu' (/cluster/home/jholt/githubDL/dysgu/dysgu/__init__.py)

I see the io_funcs.cpp and io_funcs.pyx in the dysgu/dysgu folder, so I'm not sure what the issue is. Suggestions on resolving that?

@kcleal
Owner

kcleal commented Aug 17, 2021

Import errors can be a pain to debug, in my experience. Are you using an env? The paths /cluster/home/jholt/githubDL/miniconda3/envs/dysgu/bin/dysgu and /cluster/home/jholt/githubDL/dysgu/dysgu/__init__.py look a bit different. I will upload the patch to PyPI so it can be installed with pip; it should be available tomorrow.
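
A quick way to check for this kind of mismatch is to ask Python which directory a package would be imported from and whether a compiled extension sits next to the sources. The sketch below uses only the standard library; `describe_package` is a hypothetical helper, and the idea that the source checkout is being picked up without a built `io_funcs` extension is a guess based on the traceback, not something confirmed in this thread.

```python
import importlib.util
from pathlib import Path

def describe_package(name, submodule):
    """Return (package directory, files named submodule.*) without importing the package."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None, []
    pkg_dir = Path(spec.origin).parent
    candidates = sorted(p.name for p in pkg_dir.glob(submodule + ".*"))
    return str(pkg_dir), candidates

if __name__ == "__main__":
    pkg_dir, files = describe_package("dysgu", "io_funcs")
    if pkg_dir is None:
        print("dysgu is not importable from this interpreter")
    else:
        print("dysgu resolves to:", pkg_dir)
        # Only .pyx/.cpp here and no .so/.pyd would mean the extension was not built in place.
        print("io_funcs files:", files)
```

Running it from inside the activated env, and from a directory other than the git checkout, shows whether the installed copy or the source tree is the one being found.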

@holtjma
Author

holtjma commented Aug 17, 2021

Yea, I'm using a conda env and following your install from GitHub instructions. I wouldn't be surprised if there's a mismatch there. I'll just wait until it's on pypi and try it that way.

@kcleal
Owner

kcleal commented Aug 17, 2021

v1.2.8 should be on pypi now

@kcleal kcleal closed this as completed Aug 19, 2021
@holtjma
Author

holtjma commented Aug 19, 2021

Just following up, it did finish this time and without the major memory issue, thanks!
