clustermap appears to cast arrays to numpy dtype float64 by default #3596

Closed
matt-sd-watson opened this issue Dec 18, 2023 · 2 comments

@matt-sd-watson

I am attempting to run sns.clustermap on an array of size (182440, 34) and continue to get the following error:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 124. GiB for an array with shape (16642085580,) and data type float64

However, the error persists even after I cast the expression array to a lower-precision dtype:

low_mem = anndata.AnnData(np.array(adata.X, dtype=np.float32))
sns.clustermap(low_mem.X)

Is the matrix passed to clustermap always cast into float64, or is this logic occurring in the scipy pdist_fn step?

@mwaskom
Owner

mwaskom commented Dec 19, 2023

It would not surprise me to learn that there was a cast to float somewhere. It's really hard to say without seeing a reproducible example / full traceback, but the shape in your error message doesn't correspond to the shape of your data, so that makes me think it's part of the scipy algorithm.
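
A quick arithmetic check supports this reading. The sketch below assumes the failing allocation is a condensed pairwise-distance vector, which holds one entry per pair of rows, i.e. n * (n - 1) / 2 entries for n rows. The variable names are illustrative only.

```python
# Assumption: the failing allocation is a condensed pairwise-distance vector,
# which stores one value per unordered pair of rows: n * (n - 1) / 2 entries.
n_rows = 182_440                      # rows in the reported (182440, 34) array
n_pairs = n_rows * (n_rows - 1) // 2  # number of row pairs

bytes_float64 = n_pairs * 8           # 8 bytes per float64 entry
bytes_float32 = n_pairs * 4           # 4 bytes per entry if float32 were kept

gib_float64 = bytes_float64 / 2**30
gib_float32 = bytes_float32 / 2**30

print(n_pairs)             # 16642085580, the shape in the error message
print(round(gib_float64))  # 124
print(round(gib_float32))  # 62
```

The pair count matches the shape in the error message exactly, and the float64 size matches the reported 124 GiB. Even if float32 were preserved, the vector would still need roughly 62 GiB, so the dtype is not the only obstacle at this row count.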

I'll also note that a clustermap with 100k rows is going to be misleading in the sense that your monitor will only have 1k or so pixels to show it with, so it's going to get very aggressively downsampled.
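
One possible workaround that follows from this point is to cluster a random subset of rows instead of all of them. The sketch below is only an illustration: `subsample_rows` is a hypothetical helper, not part of seaborn or scipy, and the default of 1000 rows is an arbitrary choice based on the pixel estimate above.

```python
import numpy as np


def subsample_rows(data, max_rows=1000, seed=0):
    """Return at most max_rows randomly chosen rows, kept in original order.

    Hypothetical helper for illustration; not a seaborn or scipy function.
    """
    data = np.asarray(data)
    n_rows = data.shape[0]
    if n_rows <= max_rows:
        return data
    rng = np.random.default_rng(seed)
    # Sample row indices without replacement, then sort to preserve row order.
    keep = np.sort(rng.choice(n_rows, size=max_rows, replace=False))
    return data[keep]


# Small stand-in for the real (182440, 34) expression matrix.
demo = np.zeros((5000, 34), dtype=np.float32)
small = subsample_rows(demo)
print(small.shape, small.dtype)
```

The result could then be passed on, for example `sns.clustermap(subsample_rows(low_mem.X))`. With 1000 rows the pairwise-distance vector has 499,500 entries, which is a few megabytes even in float64. Whether a random subset is representative enough depends on the data.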

@matt-sd-watson
Author

This has been addressed in the scipy repo where the casting occurs:
scipy/scipy#19707
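
For anyone who wants to check the behaviour on their own installation, here is a minimal sketch. It calls `scipy.spatial.distance.pdist` directly on a small float32 array and prints the dtype that comes back. The array size and seed are arbitrary, and the result may depend on the scipy version and the metric, so no particular output dtype is asserted here.

```python
import numpy as np
from scipy.spatial.distance import pdist

n_rows = 200  # arbitrary small size so the check runs instantly
rng = np.random.default_rng(0)
x32 = rng.random((n_rows, 34), dtype=np.float32)

distances = pdist(x32)  # default metric is Euclidean

print(x32.dtype)        # input dtype
print(distances.dtype)  # dtype returned by your installed scipy
print(distances.shape)  # condensed vector: n_rows * (n_rows - 1) / 2 entries
```

If the printed output dtype is float64 while the input is float32, the cast is happening inside scipy rather than in seaborn.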
