CellMapper linear fails Open Problems NeurIPS 2022 ADT2GEX #13

@Marius1311

It looks like my custom fast CCA implementation fails in this particular case and introduces NaNs into the data. I'm not entirely sure why this happens, and it's hard to debug without the full dataset, so I'll look for a simpler solution.

WARNING Using sklearn for neighbor search with large dataset (92324 cells). Consider using approximate k-NN search (e.g. pynndescent) or GPU acceleration (e.g. faiss or rapids)
INFO    Using sklearn to compute 30 neighbors.
Traceback (most recent call last):
  File "/tmp/nxf.1gyxC2spNC/.viash_script.py", line 58, in <module>
    cmap.compute_neighbors(
  File "/usr/local/lib/python3.11/site-packages/cellmapper/model/cellmapper.py", line 217, in compute_neighbors
    self.knn.compute_neighbors(
  File "/usr/local/lib/python3.11/site-packages/cellmapper/model/kernel.py", line 183, in compute_neighbors
    backend_x.fit(self.xrep)
  File "/usr/local/lib/python3.11/site-packages/cellmapper/model/_knn_backend.py", line 46, in fit
    self._nn.fit(data)
  File "/usr/local/lib/python3.11/site-packages/sklearn/base.py", line 1365, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/neighbors/_unsupervised.py", line 179, in fit
    return self._fit(X)
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/neighbors/_base.py", line 526, in _fit
    X = validate_data(
        ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 2954, in validate_data
    out = check_array(X, input_name="X", **check_params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 1105, in check_array
    _assert_all_finite(
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 120, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py", line 169, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input X contains NaN.
NearestNeighbors does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-value
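
One way to narrow this down without the full dataset might be to inspect the representation right before it reaches `NearestNeighbors.fit`. The following is only a minimal sketch, assuming the representation is available as a dense NumPy array; `report_nonfinite` is a hypothetical helper and not part of cellmapper, and the toy array merely stands in for the real data.

```python
import numpy as np


def report_nonfinite(X, name="X"):
    """Summarize NaN/inf entries in a dense 2D array (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    bad = ~np.isfinite(X)  # True wherever an entry is NaN or +/-inf
    return {
        "name": name,
        "n_bad": int(bad.sum()),
        "bad_rows": np.flatnonzero(bad.any(axis=1)),  # affected cells
        "bad_cols": np.flatnonzero(bad.any(axis=0)),  # affected components
    }


# Toy stand-in for the joint representation; the real data is not available here.
rng = np.random.default_rng(0)
xrep = rng.normal(size=(5, 3))
xrep[2, 1] = np.nan  # inject a single NaN for illustration

summary = report_nonfinite(xrep, name="xrep")
print(summary)
```

If whole columns turn out to be affected, that could hint at a degenerate component; if only some rows are affected, the problem may be tied to specific cells. Either reading is a guess until it is checked against the actual dataset.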

Labels: bug
