ValueError: The condensed distance matrix must contain only finite values #3

Open
nitishnih opened this issue Jul 16, 2020 · 13 comments

Comments

@nitishnih

Hello,

This error is related to #1. Once that issue was solved and @fbeghini closed it, I reinstalled hclust2 in a conda environment as follows:

conda create -n hclust2
conda activate hclust2
conda config --env --add channels bioconda
conda install --yes hclust2

Using the same merged abundance file mentioned in #1 (created using metaphlan3), I ran the following command:

$ hclust2.py --in merged_abundance_table.txt -l --out heatmap.png

And got the following error:

Traceback (most recent call last):
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 805, in hclust2_main
    cl.fhcluster()
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 386, in fhcluster
    self.fhclusters = sph.linkage(self.f_dm, method=self.args.flinkage)
  File "/opt/conda/envs/hclust2/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 1057, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Since this is a different error than before, I assume that hclust2 on the bioconda channel has been updated to fix issue #1. In case I was wrong, I followed the advice @fbeghini posted in #1 and manually removed the first line (containing the string #mpa_v30_CHOCOPhlAn_201901) and the NCBI_tax_id column, but got the same error.

Looking over the matrix in the merged abundance file, it is not immediately clear why the matrix would contain non-finite values.

@fbeghini
Member

There are a couple of combinations of rows that contain a lot of 0s. When the correlation is calculated for them, both the numerator and the denominator are 0, and 0/0 results in NaN.
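
The effect described above can be reproduced on a toy table. This is a minimal sketch, not hclust2's own code: the array values are invented, and it assumes the feature distances are computed with SciPy's `pdist` using the `correlation` metric before being passed to `linkage`, as the traceback suggests.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Toy feature-by-sample table; the values are invented for illustration.
table = np.array([
    [0.0, 0.0, 0.0, 0.0],  # all-zero feature, so its variance is 0
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
])

# Condensed distances for the row pairs (0, 1), (0, 2), (1, 2).
# Any pair involving the all-zero row evaluates 0/0 and becomes NaN.
dm = pdist(table, metric="correlation")
print(dm)

# linkage() rejects a condensed matrix that has non-finite entries.
try:
    linkage(dm, method="average")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Checking `np.isnan(dm).any()` on your own table before clustering is a quick way to confirm whether this is the cause.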

@nitishnih
Author

Thanks for the explanation @fbeghini. Does this mean hclust2 cannot process this merged abundance file created by metaphlan? Can this be handled in a different way by the program or do users need a manual workaround?

@fbeghini
Member

I'd first filter the table down to only the species-level entries, and then maybe try a different method for the species/feature distance calculation.
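
A rough sketch of the species-level filtering suggested above, in plain Python. It assumes the usual MetaPhlAn layout in which the first column is a `|`-separated lineage and species rows end in an `s__` level; the `clade_name` header, the shortened lineages, and all values in the example are invented for illustration, and `keep_species_rows` is a hypothetical helper, not part of hclust2 or MetaPhlAn.

```python
def keep_species_rows(lines):
    """Keep the header plus rows whose last taxonomic level is a species."""
    kept = []
    header_seen = False
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines such as the "#mpa_v30_..." line
        if not header_seen:
            kept.append(line)  # first non-comment line is treated as the header
            header_seen = True
            continue
        clade = line.split("\t", 1)[0]
        if clade.split("|")[-1].startswith("s__"):
            kept.append(line)
    return kept


# Invented example table (lineages shortened for readability).
example = [
    "#mpa_v30_CHOCOPhlAn_201901",
    "clade_name\tsampleA\tsampleB",
    "k__Bacteria\t100.0\t100.0",
    "k__Bacteria|p__Firmicutes\t60.0\t40.0",
    "k__Bacteria|p__Firmicutes|s__Species_one\t60.0\t40.0",
    "k__Bacteria|p__Bacteroidetes|s__Species_two\t40.0\t60.0",
]
filtered = keep_species_rows(example)
print("\n".join(filtered))
```

Rows at higher taxonomic levels are dropped, so only the header and the two species rows remain.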

@BSteel93

Hi there,

I'm having the exact same issue as nitishnih when trying to generate a heat map (using hclust2) from a merged abundance table generated by metaphlan3. I also followed the step for altering the abundance table by removing the header and the NCBI_tax_id column. I was just wondering if this issue had been fixed or resolved? Or is the recommended advice to use an alternative species/feature distance calculation method?

@saras224

saras224 commented Mar 1, 2021

Kindly respond to the question on the bioBakery help forum about this same issue, or please explain here how to resolve the error when rows contain only 0 values. How can they be removed from merged_abundance_table_species_table.txt?
@fbeghini
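
No supported way of removing such rows is confirmed in this thread. One possible approach, sketched here purely as an assumption, is to drop every feature whose values are identical across all samples (which includes all-zero rows), since those are the rows with zero variance. `drop_constant_rows` is a hypothetical helper, and the labels and values are invented.

```python
def drop_constant_rows(rows):
    """Drop (label, values) rows whose values are all identical, e.g. all 0."""
    return [(label, values) for label, values in rows if len(set(values)) > 1]


# Invented rows: one all-zero feature, one varying, one constant but non-zero.
rows = [
    ("s__Species_one", [0.0, 0.0, 0.0]),
    ("s__Species_two", [1.5, 0.0, 3.0]),
    ("s__Species_three", [2.0, 2.0, 2.0]),
]
kept = drop_constant_rows(rows)
print([label for label, _ in kept])
```

Whether discarding these features is acceptable depends on the analysis, so treat this as a workaround rather than a fix.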

@ReneKat

ReneKat commented Sep 22, 2021

I'm having this issue too. What's the fix?
I am unable to recreate the heatmap in the example, even when adding --no_fclustering and --no_sclustering.
Thank you,
Rene

@scleractinia

I am also having this issue

@baishengjun

This has confused me for many days.
Does someone have any solutions?
Thanks,
Bai

@kazumaxneo

I also encountered this error.
I was able to run successfully when I turned off clustering (--no_fclustering and --no_sclustering).
This error may occur if the samples contain mostly 0s.
You can avoid it by adding a very small value (e.g. 0.01, small compared to the data) to every value in all samples.
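
A small sketch of the pseudocount workaround described above, on invented values. It assumes the non-finite distances come from zero-only samples under Bray-Curtis, one of the sample metrics used later in this thread; it does not reproduce hclust2's internals.

```python
import numpy as np
from scipy.spatial.distance import pdist

PSEUDOCOUNT = 0.01  # small relative to the data, as suggested above

# Toy feature-by-sample table; the values are invented for illustration.
table = np.array([
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
])

# Bray-Curtis between samples (columns): two all-zero samples give 0/0 = NaN.
bc_raw = pdist(table.T, metric="braycurtis")
bc_shifted = pdist(table.T + PSEUDOCOUNT, metric="braycurtis")
print(bc_raw, bc_shifted)

# A constant shift keeps an all-zero feature constant (max - min is still 0).
still_constant = np.ptp(table + PSEUDOCOUNT, axis=1) == 0
print(still_constant)
```

The last check is a caveat: shifting every value by the same amount does not give a constant feature any variance, so on its own it would not remove a zero-variance row under the correlation metric.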

@EricDeveaud

Maybe you want to check: https://forum.biobakery.org/t/hclust2-py-error-distance-matrix-finite-values/1732/2

I would appreciate it if somebody could tell me whether the change is valid or not.
Eric

@cjfields

cjfields commented Oct 3, 2022

Just a note that I also see this, including with the example data that comes with this repo using the run.sh script. @EricDeveaud's changes (linked above) do seem to progress past the issue, but I run into another downstream problem in matplotlib:

% ./hclust2.py \
    -i examples/HMP-MetaPhlAn/HMP.species.txt \
    -o HMP.sqrt_scale.png \
    --skip_rows 1 \
    --ftop 50 \
    --f_dist_f correlation \
    --s_dist_f braycurtis \
    --cell_aspect_ratio 9 \
    -s --fperc 99 \
    --flabel_size 4 \
    --metadata_rows 2,3,4 \
    --legend_file HMP.sqrt_scale.legend.png \
    --max_flabel_len 100 \
    --metadata_height 0.075 \
    --minv 0.01 \
    --no_slabels \
    --dpi 300 \
    --slinkage complete
Traceback (most recent call last):
  File "/Users/cjfields/research/biotech/swanson/2022-August-metagenome/src/hclust2/./hclust2.py", line 1244, in <module>
    hclust2_main()
  File "/Users/cjfields/research/biotech/swanson/2022-August-metagenome/src/hclust2/./hclust2.py", line 1240, in hclust2_main
    hm.draw()
  File "/Users/cjfields/research/biotech/swanson/2022-August-metagenome/src/hclust2/./hclust2.py", line 1028, in draw
    im = ax_hm.imshow(
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
    return func(*args, **kwargs)
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/__init__.py", line 1423, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 5577, in imshow
    im._scale_norm(norm, vmin, vmax)
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/cm.py", line 405, in _scale_norm
    raise ValueError(
ValueError: Passing a Normalize instance simultaneously with vmin/vmax is not supported.  Please pass vmin/vmax directly to the norm when creating it.

Using:

python=3.9.10
matplotlib==3.6.0
numpy==1.23.1
pandas==1.5.0
scipy==1.9.1
setuptools==60.9.3
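
The matplotlib error above can be illustrated in isolation. This is only a sketch of the pattern the message asks for, not a patch to hclust2: `LogNorm` and the data values are assumptions chosen for illustration, and the norm hclust2 actually constructs may differ.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

data = np.array([[0.01, 0.1], [1.0, 10.0]])  # invented values
fig, ax = plt.subplots()

# Rejected by the matplotlib version in the traceback: a norm plus vmin/vmax.
try:
    ax.imshow(data, norm=LogNorm(), vmin=0.01, vmax=10.0)
    rejected = False
except ValueError:
    rejected = True

# Accepted: give the limits to the norm itself.
im = ax.imshow(data, norm=LogNorm(vmin=0.01, vmax=10.0))
print(rejected, im.norm.vmin, im.norm.vmax)
plt.close(fig)
```

The equivalent change in hclust2 would presumably be in the `imshow` call inside `draw`, but that has not been verified here.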

@pollicipes

Hi,
This worked for me:
https://forum.biobakery.org/t/hclust2-py-error-distance-matrix-finite-values/1732
Just modifying the script in the __init__ function at line 370 should do it.

Regarding your specific error, I think that if you discard the --minv parameter it should work.

Cheers,
J

@Jesuk555

Jesuk555 commented Mar 6, 2024

I have the same problem. I am using it on a cluster whose system is similar to Linux. Do you know how to solve it?
