
Unable to install hdbscan on colab. #600

Open
Raingel opened this issue Jul 17, 2023 · 69 comments

Comments

@Raingel

Raingel commented Jul 17, 2023

Today I found the following error message when trying to install hdbscan on colab.

 error: subprocess-exited-with-error
  
  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for hdbscan (pyproject.toml) ... error
  ERROR: Failed building wheel for hdbscan
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects

It worked fine when I installed it last week.

I also tried to install the previous version of hdbscan (0.8.29), but it still failed.


@mikeldking

Seeing this on our CI builds now as well

error: subprocess-exited-with-error
  
  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [168 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-38
      creating build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/validity.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/plots.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/prediction.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/hdbscan_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/robust_single_linkage_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      creating build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_rsl.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_prediction_utils.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_hdbscan.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      running build_ext
      Compiling hdbscan/_hdbscan_tree.pyx because it changed.
      [1/1] Cythonizing hdbscan/_hdbscan_tree.pyx
      building 'hdbscan._hdbscan_tree' extension
      creating build/temp.linux-x86_64-cpython-38
      creating build/temp.linux-x86_64-cpython-38/hdbscan
      gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/include -I/opt/hostedtoolcache/Python/3.8.17/x64/include/python3.8 -I/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include -c hdbscan/_hdbscan_tree.c -o build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o
      In file included from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1830,
                       from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                       from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                       from hdbscan/_hdbscan_tree.c:1097:
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
         17 | #warning "Using deprecated NumPy API, disable it with " \
            |  ^~~~~~~
      gcc -shared -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o -L/opt/hostedtoolcache/Python/3.8.17/x64/lib -o build/lib.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.cpython-38-x86_64-linux-gnu.so
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_tree.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_linkage.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      import numpy as np
      cimport numpy as np
      
      from libc.float cimport DBL_MAX
      
      from dist_metrics cimport DistanceMetric
      ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics.pxd' not found
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      import numpy as np
      cimport numpy as np
      
      from libc.float cimport DBL_MAX
      
      from dist_metrics cimport DistanceMetric
      ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics/DistanceMetric.pxd' not found
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      
      
      cpdef np.ndarray[np.double_t, ndim=2] mst_linkage_core_vector(
              np.ndarray[np.double_t, ndim=2, mode='c'] raw_data,
              np.ndarray[np.double_t, ndim=1, mode='c'] core_distances,
              DistanceMetric dist_metric,
              ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:58:8: 'DistanceMetric' is not a type identifier
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                      continue
      
                  right_value = current_distances[j]
                  right_source = current_sources[j]
      
                  left_value = dist_metric.dist(&raw_data_ptr[num_features *
                                                ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:129:42: Cannot convert 'double_t *' to Python object
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                  right_value = current_distances[j]
                  right_source = current_sources[j]
      
                  left_value = dist_metric.dist(&raw_data_ptr[num_features *
                                                              current_node],
                                                &raw_data_ptr[num_features * j],
                                                ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:131:42: Cannot convert 'double_t *' to Python object
      Compiling hdbscan/_hdbscan_linkage.pyx because it changed.
      [1/1] Cythonizing hdbscan/_hdbscan_linkage.pyx
      Traceback (most recent call last):
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 416, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 96, in <module>
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 26, in run
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 122, in build_extension
          new_ext = cythonize(
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
          cythonize_one(*args)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1301, in cythonize_one
          raise CompileError(None, pyx_file)
      Cython.Compiler.Errors.CompileError: hdbscan/_hdbscan_linkage.pyx
      [end of output]

@fvdnabee

fvdnabee commented Jul 17, 2023

We're seeing the same issue since today, on Linux x86-64, Python 3.10. I noticed there weren't any wheels before, so I'm assuming we were already building hdbscan from source. Not sure what change now causes the build failure.

@MrBeeMovie

Having this problem as well. Installing using poetry. No changes to lock file. Was working last week.

@Rhaedonius
Contributor

Rhaedonius commented Jul 17, 2023

This is also creating issues in Databricks. Cython released a new major version (3.0.0) a few hours ago, so there might be an issue with that on these managed environments. https://pypi.org/project/Cython/#history
I tried installing the package from master on WSL and it worked with all Python versions > 3.8 using the newest Cython.
Anyway, it might be worth pinning all requirements to less than the next major version, just to be on the safe side.

EDIT: Databricks runtime 10.4 LTS has issues; 11.3 LTS and 12.2 LTS work fine.

@kikefdezl

kikefdezl commented Jul 17, 2023

Downgrading Cython to the previous release is not working for me. Still the same error.

@mikeldking

Same, Colab doesn't have Cython 3 for me anyway.

@Rhaedonius
Contributor

I suggested Cython only because the timing of its new major release coincided with the errors popping up. It might not be related.

@argonaut76

Downgrading Cython to 0.29.36 is also not working for me.

@dafajon

dafajon commented Jul 17, 2023

Having the same issue on Kaggle notebooks.

@lmcinnes
Collaborator

There was a recent sklearn release that changed some internals that hdbscan relied on (which resulted in the 0.8.30 release attempting to fix those). It's possible that this is the issue; can you check what sklearn version you have?

@argonaut76

scikit-learn==1.2.2

@kenho211

kenho211 commented Jul 17, 2023

Same issue on ubuntu 18.04 using docker image python:3.8.12

@lmcinnes
Collaborator

I'm at a bit of a loss, especially since 0.8.29 is also not building anymore. I can at least reproduce this locally, but it is unclear how to fix things: nothing that is currently breaking has changed in quite some time, so it isn't clear why it is breaking at all.

@lmcinnes
Collaborator

lmcinnes commented Jul 17, 2023

Okay, I poked the obvious things in terms of module name resolution issues and it seems to have fixed the problem locally. I don't understand what changed, or, indeed, why this particular change is now required, but given the scale of issues people are having I'm going to push those changes out as a 0.8.31 release and hopefully that solves the problems for some people.

@nchepanov

I have an idea. This might be caused by isolated builds. When I install the package it pulls down the most recent version of Cython (regardless of what's installed in my environment).

cython>=0.27 should be updated to cython>=0.27,<3 to prevent pulling in the latest major version of Cython

(comment is being updated as I'm testing my hypothesis...)
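As a side note on the specifier syntax: PEP 440 version ranges need a comma, so the pin would be written `cython>=0.27,<3`. A quick, illustrative check with the `packaging` library (not part of hdbscan itself) shows the intended effect:

```python
# Illustrative check of the proposed build-requirement pin.
# "cython>=0.27<3" is not valid PEP 440 syntax; the comma-separated
# form ">=0.27,<3" is, and it excludes the new Cython 3.x releases.
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=0.27,<3")
print("0.29.36" in spec)  # last pre-3.0 Cython release is allowed
print("3.0.0" in spec)    # Cython 3 is excluded
```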

jkgoodrich added a commit to broadinstitute/gnomad_methods that referenced this issue Jul 17, 2023
@thomasjv799

The new patch kind of solved the issue for me. https://github.com/scikit-learn-contrib/hdbscan/releases/tag/0.8.31

@mikeldking

Confirming 0.8.31 is working for me too, which makes me believe @nchepanov's comment makes sense (i.e. the general release of Cython 3.0 caused the break). It aligns timing-wise too.

@lmcinnes
Collaborator

@nchepanov I believe you are correct; while the changes made allowed Cython 3 to build hdbscan, there seem to be further issues at runtime. Until I have time to figure out and work through all the changes that Cython 3 requires I have added a "<3" requirement for Cython. That seems to resolve all the issues as far as I can tell. I've pushed that out as 0.8.32 and hopefully that can keep things afloat for a while.

Thanks to everyone for flagging the issue and the help tracking down the source of the problem.

@Rhaedonius
Contributor

Rhaedonius commented Jul 17, 2023

> I have an idea. This might be caused by isolated builds. When I install the package it pulls down the most recent version of Cython (regardless of what's installed in my environment).
>
> cython>=0.27 should be updated to cython>=0.27,<3 to prevent pulling in the latest major version of Cython
>
> (comment is being updated as I'm testing my hypothesis...)

This is more what I was thinking. I did remember something about isolated builds but could not locate it in the Python docs.
I changed the build requirement to "cython<3" in pyproject.toml and managed to build hdbscan 0.8.30 under Databricks 10.4 LTS and Colab.
The cython entry in requirements might not be needed, as it's not a runtime requirement (still testing this).

@aaron-skydio

> I did remember something about isolated builds but could not locate it in the Python docs.

pip does not respect installed versions of packages listed in build-system.requires for PEP 517 packages. You can also work around this by installing a working (pre-3.0) version of Cython and passing --no-build-isolation to pip install, which stops pip from installing a newer Cython (3.x) just for the wheel build.
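A minimal sketch of that workaround (the hdbscan version shown is just an example from this thread):

```shell
# Workaround sketch: pin Cython below 3.0 in the active environment,
# then build hdbscan without an isolated build environment so the
# pinned Cython is the one actually used for the wheel build.
pip install "cython<3"
pip install --no-build-isolation hdbscan==0.8.30
```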

@argonaut76

argonaut76 commented Jul 17, 2023

0.8.31 is not working for me. I'm running hdbscan inside a dockerized application and getting the following error:

```
Traceback (most recent call last):
  File "/usr/src/app/modules/cluster.py", line 26, in fit
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, min_samples=self.min_samples, cluster_selection_method=self.cluster_selection_method).fit(vectors)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 78, in _tree_to_labels
    condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
  File "hdbscan/_hdbscan_tree.pyx", line 43, in hdbscan._hdbscan_tree.condense_tree
  File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer
```

I'm using scikit-learn==1.2.2.

@mikeldking

> 0.8.31 is not working for me. I'm running hdbscan inside a dockerized application and getting the following error: [...] TypeError: 'numpy.float64' object cannot be interpreted as an integer
>
> I'm using scikit-learn==1.2.2.

This is also, unfortunately, the same runtime exception I'm hitting with 0.8.32.

@lmcinnes
Collaborator

So I definitely saw that runtime error with 0.8.31; in my testing it disappeared with 0.8.32. If it is still an issue in 0.8.32 then that's not so good. I was getting all green on the test suite: https://dev.azure.com/lelandmcinnes/HDBSCAN%20builds/_build/results?buildId=901&view=results so I'm not sure what the lingering issue is. Perhaps try a clean re-install of 0.8.32?

Sieboldianus added a commit to Sieboldianus/TagMaps that referenced this issue Jul 21, 2023
Sieboldianus added a commit to Sieboldianus/TagMaps that referenced this issue Jul 21, 2023
@lmcinnes
Collaborator

There should be no difference between the 0.8.33 on PyPI and the current master on github right now, so having it work with one but not the other seems ... odd. The error is a little different as well. I'll see if there's anything I can do.

@Sieboldianus

I am sorry - but if there is anything I can do to further identify the cause, please let me know.

@kingdsl

kingdsl commented Jul 22, 2023

Hey there, I'm facing a similar error. I was following the whole thread but it was impossible for me to find a solution, so I will post the whole error stack trace. I've downloaded hdbscan 0.8.33 and this error is raised when applying fit_transform of BERTopic.

I hope someone can help me to find a solution.

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/bertopic/_bertopic.py:3218, in BERTopic._cluster_embeddings(self, umap_embeddings, documents, partial_fit, y)
3217 try:
-> 3218 self.hdbscan_model.fit(umap_embeddings, y=y)
3219 except TypeError:

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/hdbscan/hdbscan_.py:1205, in HDBSCAN.fit(self, X, y)
1196 kwargs.update(self.metric_kwargs)
1198 (
1199 self.labels_,
1200 self.probabilities_,
1201 self.cluster_persistence_,
1202 self._condensed_tree,
1203 self._single_linkage_tree,
1204 self._min_spanning_tree,
-> 1205 ) = hdbscan(clean_data, **kwargs)
1207 if self.metric != "precomputed" and not self._all_finite:
1208 # remap indices to align with original data in the case of non-finite entries.

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/hdbscan/hdbscan_.py:884, in hdbscan(X, min_cluster_size, min_samples, alpha, cluster_selection_epsilon, max_cluster_size, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
868 (single_linkage_tree, result_min_span_tree) = memory.cache(
869 _hdbscan_boruvka_balltree
870 )(
(...)
880 **kwargs
881 )
883 return (
--> 884 _tree_to_labels(
885 X,
886 single_linkage_tree,
887 min_cluster_size,
888 cluster_selection_method,
889 allow_single_cluster,
890 match_reference_implementation,
891 cluster_selection_epsilon,
892 max_cluster_size,
893 )
894 + (result_min_span_tree,)
895 )

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/hdbscan/hdbscan_.py:78, in _tree_to_labels(X, single_linkage_tree, min_cluster_size, cluster_selection_method, allow_single_cluster, match_reference_implementation, cluster_selection_epsilon, max_cluster_size)
75 """Converts a pretrained tree and cluster size into a
76 set of labels and probabilities.
77 """
---> 78 condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
79 stability_dict = compute_stability(condensed_tree)

File hdbscan/_hdbscan_tree.pyx:43, in hdbscan._hdbscan_tree.condense_tree()

File hdbscan/_hdbscan_tree.pyx:114, in hdbscan._hdbscan_tree.condense_tree()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
Cell In[15], line 1
----> 1 topics, probs = topic_model.fit_transform(abstracts)

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/bertopic/_bertopic.py:389, in BERTopic.fit_transform(self, documents, embeddings, images, y)
386 umap_embeddings = self._reduce_dimensionality(embeddings, y)
388 # Cluster reduced embeddings
--> 389 documents, probabilities = self._cluster_embeddings(umap_embeddings, documents, y=y)
391 # Sort and Map Topic IDs by their frequency
392 if not self.nr_topics:

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/bertopic/_bertopic.py:3220, in BERTopic._cluster_embeddings(self, umap_embeddings, documents, partial_fit, y)
3218 self.hdbscan_model.fit(umap_embeddings, y=y)
3219 except TypeError:
-> 3220 self.hdbscan_model.fit(umap_embeddings)
3222 try:
3223 labels = self.hdbscan_model.labels_

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/hdbscan/hdbscan_.py:1205, in HDBSCAN.fit(self, X, y)
1195 kwargs.pop("prediction_data", None)
1196 kwargs.update(self.metric_kwargs)
1198 (
1199 self.labels_,
1200 self.probabilities_,
1201 self.cluster_persistence_,
1202 self._condensed_tree,
1203 self._single_linkage_tree,
1204 self._min_spanning_tree,
-> 1205 ) = hdbscan(clean_data, **kwargs)
1207 if self.metric != "precomputed" and not self._all_finite:
1208 # remap indices to align with original data in the case of non-finite entries.
1209 self._condensed_tree = remap_condensed_tree(
1210 self._condensed_tree, internal_to_raw, outliers
1211 )

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/hdbscan/hdbscan_.py:884, in hdbscan(X, min_cluster_size, min_samples, alpha, cluster_selection_epsilon, max_cluster_size, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
867 else:
868 (single_linkage_tree, result_min_span_tree) = memory.cache(
869 _hdbscan_boruvka_balltree
870 )(
(...)
880 **kwargs
881 )
883 return (
--> 884 _tree_to_labels(
885 X,
886 single_linkage_tree,
887 min_cluster_size,
888 cluster_selection_method,
889 allow_single_cluster,
890 match_reference_implementation,
891 cluster_selection_epsilon,
892 max_cluster_size,
893 )
894 + (result_min_span_tree,)
895 )

File ~/anaconda3/envs/PYTRC_1/lib/python3.10/site-packages/hdbscan/hdbscan_.py:78, in _tree_to_labels(X, single_linkage_tree, min_cluster_size, cluster_selection_method, allow_single_cluster, match_reference_implementation, cluster_selection_epsilon, max_cluster_size)
65 def _tree_to_labels(
66 X,
67 single_linkage_tree,
(...)
73 max_cluster_size=0,
74 ):
75 """Converts a pretrained tree and cluster size into a
76 set of labels and probabilities.
77 """
---> 78 condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
79 stability_dict = compute_stability(condensed_tree)
80 labels, probabilities, stabilities = get_clusters(
81 condensed_tree,
82 stability_dict,
(...)
87 max_cluster_size,
88 )

File hdbscan/_hdbscan_tree.pyx:43, in hdbscan._hdbscan_tree.condense_tree()

File hdbscan/_hdbscan_tree.pyx:114, in hdbscan._hdbscan_tree.condense_tree()

TypeError: 'numpy.float64' object cannot be interpreted as an integer
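For what it's worth, the message itself is the generic CPython error raised whenever a float-typed NumPy scalar is passed where an integer index is required. A tiny, hypothetical repro (not hdbscan's actual code path) shows the mechanism and the usual fix:

```python
# Minimal illustration of the exception seen in the tracebacks above
# (not hdbscan's actual internals): numpy.float64 does not implement
# __index__, so passing it where CPython expects an integer raises
# this exact TypeError.
import numpy as np

size = np.float64(5.0)
try:
    range(size)
except TypeError as exc:
    print(exc)  # 'numpy.float64' object cannot be interpreted as an integer

# An explicit cast avoids it:
assert list(range(int(size))) == [0, 1, 2, 3, 4]
```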

bamdadsabbagh added a commit to sound-scape-explorer/sound-scape-explorer that referenced this issue Jul 22, 2023
uellue added a commit to LiberTEM/LiberTEM that referenced this issue Jul 24, 2023
uellue added a commit to LiberTEM/LiberTEM that referenced this issue Jul 24, 2023
sk1p pushed a commit to LiberTEM/LiberTEM that referenced this issue Jul 24, 2023
shaneknapp added a commit to shaneknapp/datahub that referenced this issue Jul 24, 2023
@ZacharieBuisson1

Hello everyone,

I'm encountering the ongoing issue of "'numpy.float64' object cannot be interpreted as an integer" persistently with the function get_clusters() on my local setup. I'm utilizing the following package versions:

Python 3.11.X
HDBScan 0.8.33
BERTopic 0.15.0

Unfortunately, the solutions proposed earlier have not yielded positive results. Are there any recent developments or updates within these packages that could potentially address this problem? Alternatively, could someone suggest a combination of package versions that might prove effective? I've noticed potential compatibility concerns between HDBScan, BERTopic, Cython, Python versions, and more. Your insights would be greatly appreciated. Thank you.

@stianlagstad

Thank you for this informative issue. I understand from reading this that changing cython>=0.27 to cython>=0.27,<3 in the requirements.txt file of hdbscan makes it possible to build it again. For the immediate future we're forced to use an old version of hdbscan (v0.8.26). Could I contribute a PR which updates https://github.com/scikit-learn-contrib/hdbscan/blob/0.8.26/requirements.txt#L1 (note the tag there) with the same fix, so that a new version 0.8.26.1 could be released?

@ruli41

ruli41 commented Aug 14, 2023

> I'm encountering the ongoing issue of "'numpy.float64' object cannot be interpreted as an integer" persistently with the function get_clusters() on my local setup. [...]

Hi, I have a similar issue: the same TypeError: 'numpy.float64' object cannot be interpreted as an integer. My packages: Python 3.10.11, hdbscan 0.8.33, BERTopic 0.15.0. I installed the packages in a fresh conda environment and could run from bertopic import BERTopic, but every time I run topic_model = BERTopic() I get that TypeError. Does someone have an idea how to solve this issue?

@JanElbertMDavid

@ruli41 I just fixed the same error by referring to issue #607 and checking @jkmackie's solution!

@renswilderom

@nchepanov Could I ask: do you first install hdbscan, and then cython>=0.27,<3? I keep getting this error when running the BERTopic package and can't get rid of it.

@lucetka

lucetka commented Aug 24, 2023

I was experiencing a similar issue with a Streamlit app deployed on the Streamlit community server, where I had previously specified hdbscan == 0.8.28 in the requirements file; with 0.8.33 it is working again.

tewha pushed a commit to healthtechconnex1/hdbscan that referenced this issue Aug 31, 2023
tewha pushed a commit to healthtechconnex1/hdbscan that referenced this issue Aug 31, 2023
@yudhiesh

yudhiesh commented Feb 1, 2024

Still experiencing this with hdbscan==0.8.33 when used in a Streamlit app.
