generated from openproblems-bio/task_template
Open
Labels: bug (Something isn't working)
Description
It looks like my custom fast CCA implementation fails in this particular case and introduces NaNs into the data. I'm not entirely sure why this happens, and it's hard to debug without the full dataset, so I'll find a simpler solution.
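Without the full dataset, one way to narrow this down is to check the representation right after the CCA step and record which cells and components are non-finite. The following is only a minimal sketch: `summarize_nonfinite` is a hypothetical helper (not part of cellmapper), and the toy array stands in for the real CCA output.

```python
import numpy as np


def summarize_nonfinite(x, name="X"):
    """Report how many NaN/inf entries a 2-D array has and where they are."""
    x = np.asarray(x, dtype=float)
    bad = ~np.isfinite(x)  # True for NaN, +inf and -inf
    return {
        "name": name,
        "n_nan": int(np.isnan(x).sum()),
        "n_inf": int(np.isinf(x).sum()),
        "bad_rows": np.flatnonzero(bad.any(axis=1)),  # affected cells
        "bad_cols": np.flatnonzero(bad.any(axis=0)),  # affected components
    }


# Toy stand-in for the CCA representation (assumption: cells x components).
rep = np.array([[0.1, 0.2], [np.nan, 0.4], [0.5, np.inf]])
report = summarize_nonfinite(rep, name="cca_rep")
print(report)
```

If only a few rows are affected, the problem is probably tied to specific cells; if whole columns are affected, it more likely comes from the decomposition itself.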
WARNING Using sklearn for neighbor search with large dataset (92324 cells).
Consider using approximate k-NN search (e.g. pynndescent) or GPU
acceleration (e.g. faiss or rapids)
INFO Using sklearn to compute 30 neighbors.
Traceback (most recent call last):
  File "/tmp/nxf.1gyxC2spNC/.viash_script.py", line 58, in <module>
    cmap.compute_neighbors(
  File "/usr/local/lib/python3.11/site-packages/cellmapper/model/cellmapper.py", line 217, in compute_neighbors
    self.knn.compute_neighbors(
  File "/usr/local/lib/python3.11/site-packages/cellmapper/model/kernel.py", line 183, in compute_neighbors
    backend_x.fit(self.xrep)
  File "/usr/local/lib/python3.11/site-packages/cellmapper/model/_knn_backend.py", line 46, in fit
    self._nn.fit(data)
  File "/usr/local/lib/python3.11/site-packages/sklearn/base.py", line 1365, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/neighbors/_unsupervised.py", line 179, in fit
    return self._fit(X)
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/neighbors/_base.py", line 526, in _fit
    X = validate_data(
        ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 2954, in validate_data
    out = check_array(X, input_name="X", **check_params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 1105, in check_array
    _assert_all_finite(
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 120, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 169, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input X contains NaN.
NearestNeighbors does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-value
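As the error message suggests, one stopgap is to drop samples with missing values before the neighbor search. The sketch below assumes that dropping cells is acceptable for the task, which may not hold here. `drop_nonfinite_rows` is a hypothetical helper, not a cellmapper or scikit-learn function, and it does not fix the underlying CCA problem.

```python
import numpy as np


def drop_nonfinite_rows(x):
    """Return the rows of x that contain only finite values, plus their original indices."""
    x = np.asarray(x, dtype=float)
    keep = np.isfinite(x).all(axis=1)  # True where the whole row is finite
    return x[keep], np.flatnonzero(keep)


# Toy stand-in for the representation that reaches the k-NN backend.
rep = np.array([[0.0, 1.0], [np.nan, 2.0], [3.0, 4.0]])
clean, kept_idx = drop_nonfinite_rows(rep)
print(clean.shape, kept_idx)
```

The filtered array could then be passed to `sklearn.neighbors.NearestNeighbors(...).fit(clean)`. Neighbor indices returned on the filtered data would need `kept_idx` to map back to the original cells.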