
Linkage 'Z' contains negative distances. #2804

Closed
2 of 3 tasks
metoru opened this issue Jan 11, 2024 · 5 comments · Fixed by #2928
metoru commented Jan 11, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

What happened?

I'm encountering an error when running the sc.pl.rank_genes_groups_heatmap function in the scanpy package. The error message is "Linkage 'Z' contains negative distances." What could be causing this error and how can I fix it?

Minimal code sample

import scanpy as sc
sc.pl.rank_genes_groups_heatmap(adata, n_genes=10, groupby='clusters', show_gene_labels=True, save='cluster.markers.heatmap.svg')

Error output

sc.pl.rank_genes_groups_heatmap(adata,  n_genes=10, groupby=cluster,show_gene_labels=True,save=(id+'_processed.top10.cluster.markers.heatmap.svg'))
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 673, in rank_genes_groups_heatmap
    return _rank_genes_groups_plot(
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 592, in _rank_genes_groups_plot
    return heatmap(
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scanpy/plotting/_anndata.py", line 1087, in heatmap
    dendro_data = _reorder_categories_after_dendrogram(
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scanpy/plotting/_anndata.py", line 2134, in _reorder_categories_after_dendrogram
    key = _get_dendrogram_key(adata, dendrogram, groupby)
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scanpy/plotting/_anndata.py", line 2236, in _get_dendrogram_key
    dendrogram(adata, groupby, key_added=dendrogram_key)
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scanpy/tools/_dendrogram.py", line 143, in dendrogram
    dendro_info = sch.dendrogram(z_var, labels=list(categories), no_plot=True)
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3301, in dendrogram
    is_valid_linkage(Z, throw=True, name='Z')
  File "/opt/conda/envs/st/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 2280, in is_valid_linkage
    raise ValueError('Linkage %scontains negative distances.' %
ValueError: Linkage 'Z' contains negative distances.

Versions

-----
anndata     0.8.0
scanpy      1.9.3
-----
PIL                 9.4.0
asciitree           NA
beta_ufunc          NA
binom_ufunc         NA
cairocffi           1.6.1
cffi                1.15.1
cloudpickle         2.2.1
colorama            0.4.6
cycler              0.10.0
cython_runtime      NA
cytoolz             0.12.0
dask                2022.11.1
dateutil            2.8.2
defusedxml          0.7.1
entrypoints         0.4
fasteners           0.17.3
fsspec              2023.6.0
google              NA
h5py                3.7.0
igraph              0.9.11
jinja2              3.0.3
joblib              1.2.0
kiwisolver          1.4.4
leidenalg           0.8.10
llvmlite            0.39.1
louvain             0.7.1
lz4                 4.3.2
markupsafe          2.1.3
matplotlib          3.5.2
mpl_toolkits        NA
msgpack             1.0.5
natsort             8.2.0
nbinom_ufunc        NA
numba               0.56.4
numcodecs           0.11.0
numexpr             2.8.4
numpy               1.21.6
packaging           23.1
pandas              1.5.3
pkg_resources       NA
psutil              5.9.5
pyarrow             8.0.0
pycparser           2.21
pyparsing           3.1.0
pytz                2023.3
scipy               1.7.3
session_info        1.0.0
setuptools          68.0.0
setuptools_scm      NA
six                 1.16.0
sklearn             1.0.1
snappy              NA
sphinxcontrib       NA
tblib               1.7.0
texttable           1.6.7
threadpoolctl       3.2.0
tlz                 0.12.0
toolz               0.12.0
typing_extensions   NA
wcwidth             0.2.6
yaml                6.0
zarr                2.15.0
zipp                NA
-----
Python 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:46:39) [GCC 10.4.0]
Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.10
-----

THZ34 commented Mar 19, 2024

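This problem is caused by entries of corr_matrix that should be exactly 1 (for example, between groups with identical mean profiles). Because of limited floating-point precision, such a correlation can be computed as a value slightly greater than 1, such as 1.0000000000000002. The distance derived from it (1 minus the correlation) is then slightly negative, which scipy rejects when it validates the linkage for the dendrogram.
In scanpy/tools/_dendrogram.py, clamp the correlation matrix right after it is computed:

corr_matrix = mean_df.T.corr(method=cor_method)
# round-off can push correlations slightly outside [-1, 1]; clip them back (DataFrame.clip keeps the row/column labels)
corr_matrix = corr_matrix.clip(-1, 1)

This solves the problem.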
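A minimal NumPy-only sketch of the round-off effect described above. It assumes the textbook form r = cov / sqrt(var_x * var_y), which is not necessarily the exact code path pandas uses. For a vector correlated with itself, the rounded square root of s * s can differ from s by one unit in the last place, so r can land just above 1. `pearson_self` is a hypothetical helper for illustration, and the overshoot count depends on the platform, so no expected value is given.

```python
import numpy as np


def pearson_self(x: np.ndarray) -> float:
    """Pearson r of x with itself (hypothetical helper, textbook formula)."""
    d = x - x.mean()
    # For identical vectors, the covariance and both variances are the same sum s.
    s = float(np.dot(d, d))
    # r = cov / sqrt(var_x * var_y): s * s and the sqrt each round, so r need not be exactly 1.0
    return s / np.sqrt(s * s)


rng = np.random.default_rng(0)
rs = np.array([pearson_self(rng.normal(size=50)) for _ in range(10_000)])

print("max r:", rs.max())
print("r > 1 in", int((rs > 1).sum()), "of", rs.size, "trials")

# Distances derived as 1 - r go negative exactly where r overshoots 1;
# clamping r to [-1, 1] first keeps them non-negative.
dist_raw = 1.0 - rs
dist_clamped = 1.0 - np.clip(rs, -1.0, 1.0)
print("negative raw distances:", int((dist_raw < 0).sum()))
print("negative clamped distances:", int((dist_clamped < 0).sum()))
```

Clamping with np.clip (or DataFrame.clip on a pandas matrix) only moves values that overshoot by round-off, so genuine correlations are left untouched.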

flying-sheep (Member) commented

Thanks for the investigation! @metoru can you try with #2928?

pip install git+https://github.com/scverse/scanpy.git@fix-dendro-corr

flying-sheep (Member) commented

@THZ34 can you create a reproducer where this happens, so I can add a test?

THZ34 commented Mar 22, 2024

> @THZ34 can you create a reproducer where this happens, so I can add a test?

OK, I've uploaded the h5ad file to OneDrive: https://bioplot-my.sharepoint.com/:u:/g/personal/tanghongzhen_bioplot_onmicrosoft_com/EUbNHPuin5pGuMPrmch6rsQBjHojfikr38EYgZEL4KAZ2A?e=T2YfkO
The error can be reproduced with this code:
import anndata as ad
import scanpy as sc

adata = ad.read_h5ad('debug.h5ad')
sc.tl.dendrogram(adata, groupby='leiden')

flying-sheep (Member) commented Mar 22, 2024

OK, smallest reproducer I could come up with:

import scipy.sparse as sp
import numpy as np
import pandas as pd

import scanpy as sc

# Row 0 is the single cell of group "372"; rows 1-6 (groups "366" and "357") share one identical 50-dim float32 PCA vector.
row_372 = b'\xf0\x08\xc1?\xe6E+\xbe\xcaI6\xbf\xf3\xecT\xbeM"\xb6\xbe\xbb\xcee\xbe\x8e\xb1p\xbcU\x95\x8f\xbdn\x9f\x06\xbe\x00\xe9\x19\xbd\xd7s0\xbc\x0593\xbd9$/<U\xc6\x90:;;\\\xbd\x0c\xee\xa1\xbb,{7<\xab\xe8\x92\xbb[r\xd2\xbc\xb4\xaa\x0f\xbd\xddg\x01\xbd\x8a\xfe\xe8<\x00\xd3\x17\xbdE\xa4\xc3\xbc(4;\xbd\x015\xf5<\xe4\xb0\x1c<\xdaQ/\xbd\x92\xe0\xaf;\xc6\xad\xfc\xbc\xcd;?\xbcv\x93\xc0\xbcT\xc1\x969:\xb3\x8b\xbc\xf90\x93\xbc0x\xfd\xbc\xd8\xe02\xbc\x89\xb0\x83\xbcz\xa7+\xbc\x14\xa5_\xbc$ \xe0\xbbI\x9d@\xbc\x07\x9b>\xbd\xed\xf2\x9c<\xbf+b\xbc\x8d\xcc\x9b;\xb0\x84g\xbc\xc6c)\xbc?\xa9\x89\xbc\x8a\x00\xaf:'
row_shared = b'2\x9f\x8a\xbe\xb1\x1d\xb1\xbe\x1d\x95\x80=J\xce\xaa\xbdP\xc7\xdd\xbd\xd5p\r\xbe\xe5\xa7\xd2\xbc\x08\xbf\xdf\xbc\xcc\x82\x14\xbe\x07\x14\x8d\xbds\x82/\xbd\xe0\xc5\x9a\xbc\xc6\xe7\xd1\xbc\xca\x1b\xfb\xbb1\x07a\xbd="k\xbcm\xe0\xc6<\x13\x07\xf7\xbc\x98#\n\xbd)2\x1c\xbdf*`\xbb,[\x8e\xbb_$*\xbdGF\xda\xbc\xcd\xb1\xe0\xbcg*r<\x85.C<\xa5e\x89\xbd\x81.);*\x94\x86\xbc\x18\x040\xbcFN\xf9\xbc\x81\xd6z\xbb\xdc\xbc\xf1\xbc\x1ba\xfe\xbc4\xaa\x0c\xbd E\x93\xbb\xc9\xde\xb0\xbc\xd1ZJ\xbc\xa6\xf6\xeb\xbc0\xdc\xef\xbbb\x96\xee\xbb\xbd\xda\x9b\xbd\xa9\t\xc5<\x0c\xeeE\xbc%\xce\xa5;\xf7\x97\xcf\xbc\xc3,\x96\xbc\xb2}\x94\xbc0\x9e[\xbb'
rep_pca = np.frombuffer(row_372 + row_shared * 6, dtype=np.float32).reshape((7, 50))

rep = sc.AnnData(
    sp.csr_matrix(
        (
            np.array([1.2762934659055623, 1.6916760106710726, 1.6916760106710726]),
            np.array([12, 5, 44]),
            np.array([0, 0, 0, 0, 0, 1, 2, 3]),
        ),
        shape=(7, 51),
    ),
    dict(leiden=pd.Categorical(["372", "366", "357", "357", "357", "357", "357"])),
    obsm=dict(X_pca=np.array(rep_pca, dtype=np.float32)),
)
sc.tl.dendrogram(rep, groupby="leiden")
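The same ValueError can also be triggered with scipy alone. This sketch assumes, as an illustration rather than scanpy's actual code, that dendrogram distances are computed as 1 minus the group correlation and merged with complete linkage. It builds a 3 × 3 correlation matrix by hand in which two groups (labelled like the reproducer's "366" and "357") correlate at the next double above 1. The hypothetical helper `linkage_from_corr` then produces a merge at a negative height, which `scipy.cluster.hierarchy.dendrogram` rejects; clamping to [-1, 1] first avoids the error.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial import distance


def linkage_from_corr(corr: np.ndarray, clamp: bool) -> np.ndarray:
    """Complete-linkage tree on distances 1 - r (hypothetical helper, not scanpy's code)."""
    if clamp:
        corr = np.clip(corr, -1.0, 1.0)
    dist = 1.0 - corr  # new array, so the caller's matrix is not modified
    np.fill_diagonal(dist, 0.0)
    # checks=False: convert to condensed form without validating the square matrix
    return sch.linkage(distance.squareform(dist, checks=False), method="complete")


r = np.nextafter(1.0, 2.0)  # 1.0000000000000002, the next double above 1
corr = np.array([
    [1.0, 0.2, 0.2],
    [0.2, 1.0, r],
    [0.2, r, 1.0],
])
labels = ["372", "366", "357"]

try:
    sch.dendrogram(linkage_from_corr(corr, clamp=False), labels=labels, no_plot=True)
except ValueError as err:
    print(err)  # Linkage 'Z' contains negative distances.

# With clamping, the two groups merge at height 0 and the dendrogram is built.
info = sch.dendrogram(linkage_from_corr(corr, clamp=True), labels=labels, no_plot=True)
print(info["ivl"])
```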


3 participants