MemoryError: Unable to allocate 192. GiB for an array with shape (160646, 160646) and data type float64 #48

Open
mutantjoo0 opened this issue Apr 14, 2020 · 5 comments


@mutantjoo0

Hello BinSanity developers and community,

I am trying to bin contigs with BinSanity using anvi'o v6.2 and got an error that looks like a memory issue (shown below). I ran the command anvi-cluster-contigs -p PROFILE.db -c ../../GR_contigs.db -C binsanity --driver binsanity --just-do-it. Is there a way to avoid this issue? Thank you for your time and consideration.

Stay healthy and safe,
Joo-Young

(anvio-6.2) bash-4.2$ pwd && ls -lh
/mnt/gs18/scratch/users/leejooy5/anvio_2020Feb/GR_profile_DB/GR-merged/binsanity_tmp
total 1.8G
-rw-r----- 1 leejooy5 mmg 1.1K Apr 14 13:37 binsanity-logfile.txt
-rw-r----- 1 leejooy5 mmg  36M Apr 14 13:37 contig_coverages_log_norm.txt
-rw-r----- 1 leejooy5 mmg  31M Apr 14 13:37 contig_coverages.txt
-rw-r----- 1 leejooy5 mmg 2.6K Apr 14 13:37 logs.txt
-rw-r----- 1 leejooy5 mmg 836M Apr 14 13:37 sequence_contigs.fa
-rw-r----- 1 leejooy5 mmg 847M Apr 14 13:37 sequence_splits.fa
-rw-r----- 1 leejooy5 mmg  38M Apr 14 13:37 split_coverages_log_norm.txt
-rw-r----- 1 leejooy5 mmg  33M Apr 14 13:37 split_coverages.txt

(anvio-6.2) bash-4.2$ cat logs.txt
# DATE: 14 Apr 20 13:19:36
# CMD LINE: Binsanity -c /tmp/local/60268981/tmp1u3por0d/contig_coverages_log_norm.txt -f /tmp/local/60268981/tmp1u3por0d -l sequence_contigs.fa -o /tmp/local/60268981/tmp1u3por0d
Traceback (most recent call last):
  File "/mnt/home/leejooy5/miniconda3/bin/Binsanity", line 219, in <module>
    args.preference, args.inputContigFiles, args.outputdir, args.outname)
  File "/mnt/home/leejooy5/miniconda3/bin/Binsanity", line 63, in affinity_propagation
    convergence), copy=True, preference=int(preference), affinity='euclidean', verbose=False).fit_predict(array)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/cluster/_affinity_propagation.py", line 446, in fit_predict
    return super().fit_predict(X, y)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/base.py", line 462, in fit_predict
    self.fit(X)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/cluster/_affinity_propagation.py", line 381, in fit
    self.affinity_matrix_ = -euclidean_distances(X, squared=True)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/metrics/pairwise.py", line 303, in euclidean_distances
    distances = - 2 * safe_sparse_dot(X, Y.T, dense_output=True)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/utils/extmath.py", line 151, in safe_sparse_dot
    ret = a @ b
MemoryError: Unable to allocate 192. GiB for an array with shape (160646, 160646) and data type float64

        ******************************************************
        **********************BinSanity***********************
        |____________________________________________________|
        |                                                    |
        |             Computing Coverage Array               |
        |____________________________________________________|

          Preference: -3
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 1000
          Damping Factor: 0.95
          Coverage File: /tmp/local/60268981/tmp1u3por0d/contig_coverages_log_norm.txt
          Fasta File: sequence_contigs.fa
          Output directory: /tmp/local/60268981/tmp1u3por0d
          logfile: binsanity-logfile.txt
          (160646, 21)

         ______________________________________________________
        |                                                      |
        |                 Clustering Contigs                   |
        |______________________________________________________|
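For scale, the 192 GiB in the error is exactly the size of one dense 160646 x 160646 float64 matrix (8 bytes per entry), which is what euclidean_distances in the traceback tries to build from the (160646, 21) coverage array. The sketch below only does that arithmetic; the helper name dense_array_gib is hypothetical. Note that sklearn's affinity propagation also keeps several other n x n arrays during its iterations, so real peak usage is a multiple of this figure.

```python
# Memory needed for one dense n x n float64 array, such as the pairwise
# distance matrix sklearn tries to allocate in the traceback above.
# dense_array_gib is a hypothetical helper, not part of BinSanity or sklearn.
FLOAT64_BYTES = 8  # numpy float64 itemsize in bytes


def dense_array_gib(n_rows: int, n_cols: int, itemsize: int = FLOAT64_BYTES) -> float:
    """Size of a dense n_rows x n_cols array in GiB (2**30 bytes)."""
    return n_rows * n_cols * itemsize / 2**30


n_contigs, n_cov_columns = 160646, 21  # shape printed in the BinSanity log
print(f"distance matrix: {dense_array_gib(n_contigs, n_contigs):.1f} GiB")  # → distance matrix: 192.3 GiB
print(f"coverage array: {dense_array_gib(n_contigs, n_cov_columns) * 1024:.1f} MiB")  # → coverage array: 25.7 MiB
```

The quadratic term is the whole problem: the input is only tens of MiB, but halving the number of contigs clustered together cuts the distance matrix to a quarter of its size.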

@edgraham
Owner

Hello Joo-Young,

This is indeed a memory error from BinSanity, which happens frequently above 100,000 contigs (for example, when I recently ran datasets with 400,000 contigs >2 kbp, I used ~600 GB of RAM). I think your best bet would be to run 'Binsanity-lc', which I believe you'll have to do outside anvi'o and then import the results into anvi'o manually. You can install BinSanity via conda. Running 'Binsanity-lc' should reduce memory usage, although depending on your system it may or may not be enough to eliminate the issue completely. If that is the case, there are some other workarounds we can try, but they may sacrifice some of the method's accuracy.

-Elaina
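Why a partitioned approach helps can be sketched directly in scikit-learn: split the contigs into coarse groups with k-means, then run affinity propagation within each group, so the largest similarity matrix scales with the biggest group rather than with all 160646 contigs. This illustrates the general idea under my own assumptions and is not BinSanity-lc's actual implementation; the function name partitioned_affinity_propagation, running KMeans on the coverage array alone, and the n_partitions parameter are hypothetical, while the preference, damping and iteration defaults are copied from the BinSanity log above.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans


def partitioned_affinity_propagation(X, n_partitions, preference=-3, damping=0.95,
                                     max_iter=4000, convergence_iter=400):
    """Hypothetical two-stage clustering: k-means into n_partitions coarse
    groups, then affinity propagation separately inside each group.

    Peak memory is driven by the largest group (its n_i x n_i similarity
    matrix) instead of by all N rows at once. Returns one integer label per
    row; -1 marks rows whose group produced no exemplars.
    """
    X = np.asarray(X, dtype=float)
    coarse = KMeans(n_clusters=n_partitions, n_init=10, random_state=0).fit_predict(X)
    labels = np.full(len(X), -1, dtype=int)
    next_label = 0  # offset so labels from different groups never collide
    for part in range(n_partitions):
        idx = np.flatnonzero(coarse == part)
        if len(idx) == 0:
            continue
        if len(idx) == 1:  # a lone row is its own cluster
            labels[idx] = next_label
            next_label += 1
            continue
        ap = AffinityPropagation(damping=damping, max_iter=max_iter,
                                 convergence_iter=convergence_iter,
                                 preference=preference, affinity="euclidean",
                                 random_state=0)
        sub = ap.fit_predict(X[idx])
        ok = sub >= 0  # sklearn labels every sample -1 when it finds no exemplars
        labels[idx[ok]] = sub[ok] + next_label
        if ok.any():
            next_label += int(sub[ok].max()) + 1
    return labels
```

This is also where accuracy can suffer: contigs that land in different k-means groups can never end up in the same final cluster, whatever their coverage similarity.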

@mutantjoo0
Author

Thank you for the answer, Elaina,

I will try it and post here how it goes.

Stay safe and healthy,
Joo-Young

@rebelwebster

Hi

I am having the same error as above. However, I don't want to run Binsanity-lc because I would like to work with the results of binning by coverage alone. Could you suggest possible workarounds? Thanks!

@edgraham
Owner

Hello,

As soon as I get some time, I will add a flag to Binsanity-lc to make this possible. In the meantime, the quickest workaround (unless you want to do a little coding) is to run 'Binsanity-lc' and cancel it after it finishes the K-means clustering step or the Stage 1 clustering. This will produce a directory in the output folder named '[Prefix]-KMEAN-BINS'. Take these subsetted clusters and run the solo 'Binsanity' script on each individual '.fna' file in '[Prefix]-KMEAN-BINS', then collate the results.

Let me know if you have further questions, and I'll work on getting the additional flag added to Binsanity-lc as soon as possible.

-Elaina
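The manual workaround above can be scripted. The sketch assumes the per-subset FASTA files in '[Prefix]-KMEAN-BINS' end in .fna (as described above) and reuses only the four Binsanity flags visible in the logged command line (-c, -f, -l, -o). It also assumes Binsanity only uses the coverage rows for contigs present in the FASTA file it is given. The function names, the one-output-folder-per-subset layout, and the collation step (which assumes bins are written as .fna files and reads contig names from their FASTA headers) are hypothetical; check them against what your BinSanity version actually writes before relying on the output.

```python
import subprocess
from pathlib import Path


def binsanity_command(coverage_file, fasta_dir, fasta_name, out_dir):
    # Uses only the flags visible in the logged Binsanity command line:
    # -c coverage file, -f FASTA directory, -l FASTA file name, -o output directory.
    return ["Binsanity", "-c", str(coverage_file), "-f", str(fasta_dir),
            "-l", fasta_name, "-o", str(out_dir)]


def run_per_kmean_bin(kmean_dir, coverage_file, out_root, dry_run=False):
    """Run the solo Binsanity script once per .fna file in kmean_dir.

    Each subset gets its own output folder (out_root/<file stem>) and runs
    with that folder as its working directory, so per-run log files do not
    overwrite each other. Returns the commands; dry_run=True executes nothing.
    """
    kmean_dir = Path(kmean_dir).resolve()
    coverage_file = Path(coverage_file).resolve()
    out_root = Path(out_root).resolve()
    commands = []
    for fna in sorted(kmean_dir.glob("*.fna")):
        out_dir = out_root / fna.stem
        cmd = binsanity_command(coverage_file, kmean_dir, fna.name, out_dir)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True, cwd=out_dir)
    return commands


def collate_bins(out_root, pattern="*.fna"):
    """Collate per-subset results into one {contig name: bin name} table.

    ASSUMPTION: bins are FASTA files matching `pattern` under each subset's
    output folder. Bin names are prefixed with the subset folder name so bins
    from different subsets cannot collide.
    """
    out_root = Path(out_root)
    assignments = {}
    for bin_file in sorted(out_root.rglob(pattern)):
        subset = bin_file.relative_to(out_root).parts[0]
        bin_name = f"{subset}/{bin_file.stem}"
        with open(bin_file) as handle:
            for line in handle:
                if line.startswith(">"):
                    # contig name = first whitespace-delimited word of the header
                    assignments[line[1:].split()[0]] = bin_name
    return assignments
```

For example, run_per_kmean_bin("myprefix-KMEAN-BINS", "contig_coverages_log_norm.txt", "per-subset-runs", dry_run=True) returns the commands without running anything (myprefix and per-subset-runs are placeholders), which is a cheap way to check paths before the real runs; collate_bins("per-subset-runs") then builds the contig-to-bin table.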

@rebelwebster

Thanks for the rapid response, Elaina. I'll give it a go, but it's a little hard to cancel it at the right moment. I don't think my Python is good enough to code it myself, but it would be lovely if you could add a flag (and let me know when you do!). Thanks!


3 participants