ValueError: negative column index found #11

Closed
ensonario opened this issue Nov 13, 2017 · 7 comments


ensonario commented Nov 13, 2017

error.zip

A strange issue happens when you try to compute an embedding for these 500 simple rows (file attached).

The code:

import umap
import pandas as pd

df = pd.read_csv('error.csv', header=None)
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1,
                      metric='correlation').fit_transform(df.values)

As a result, we get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-a14777825bbd> in <module>()
      1 embedding = umap.UMAP(n_neighbors=15, min_dist=0.1,
----> 2                       metric='correlation').fit_transform(df.values)

~/venv3/lib/python3.6/site-packages/umap_learn-0.1.3-py3.6.egg/umap/umap_.py in fit_transform(self, X, y)
    790             Embedding of the training data in low-dimensional space.
    791         """
--> 792         self.fit(X)
    793         return self.embedding_

~/venv3/lib/python3.6/site-packages/umap_learn-0.1.3-py3.6.egg/umap/umap_.py in fit(self, X, y)
    757 
    758         graph = fuzzy_simplicial_set(X, self.n_neighbors,
--> 759                                      self._metric, self.metric_kwds)
    760 
    761         if self.n_edge_samples is None:

~/venv3/lib/python3.6/site-packages/scipy/sparse/coo.py in __init__(self, arg1, shape, dtype, copy)
    189             self.data = self.data.astype(dtype, copy=False)
    190 
--> 191         self._check()
    192 
    193     def getnnz(self, axis=None):

~/venv3/lib/python3.6/site-packages/scipy/sparse/coo.py in _check(self)
    241                 raise ValueError('negative row index found')
    242             if self.col.min() < 0:
--> 243                 raise ValueError('negative column index found')
    244 
    245     def transpose(self, axes=None, copy=False):

ValueError: negative column index found

Any help is appreciated.

@lmcinnes
Owner

Hi, thanks for trying out the software. I had thought I had caught most of the cases where this could occur, but apparently not. What has happened, I believe, is that the algorithm has failed to find 15 nearest neighbors for at least one point; this breaks a lot of things. This may be data related, or possibly parameter related. I'll grab the data and see what is going wrong in this case. Thanks for the detailed report and the reproducer data. It will make this process go much faster.
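To illustrate the failure mode described above, here is a minimal sketch. It assumes that an unfilled neighbor slot is marked with a -1 sentinel in the k-nearest-neighbor index array; that is an assumption for illustration, since the thread does not show UMAP's internals. The helper `build_knn_graph` is hypothetical. Building a SciPy COO matrix from such indices fails in the constructor's validation step, which is consistent with the traceback in the report.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_knn_graph(knn_indices, knn_weights):
    """Build an (n, n) sparse graph from per-point neighbor indices.

    Hypothetical helper for illustration only; not UMAP's actual code.
    """
    n_samples, n_neighbors = knn_indices.shape
    # Row i is repeated once for each of its neighbor slots.
    rows = np.repeat(np.arange(n_samples), n_neighbors)
    cols = knn_indices.ravel()
    vals = knn_weights.ravel()
    return coo_matrix((vals, (rows, cols)), shape=(n_samples, n_samples))


# Three points, two neighbor slots each.
# Assumption: -1 marks a slot the neighbor search never filled (point 2 here).
knn_indices = np.array([[1, 2], [0, 2], [0, -1]])
knn_weights = np.ones(knn_indices.shape)

try:
    build_knn_graph(knn_indices, knn_weights)
    failed = False
except ValueError as exc:
    # SciPy rejects negative indices when validating the COO matrix.
    failed = True
    print("graph construction failed:", exc)
```

The exact wording of the SciPy error message varies between versions, so the sketch only relies on a `ValueError` being raised.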

@ensonario
Author

Thanks for the response.

If I increase the number of nearest neighbors:

embedding = umap.UMAP(n_neighbors=25, min_dist=0.1,
                      metric='correlation').fit_transform(df.values)

I'm getting a new error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-8b76b22db9a0> in <module>()
      1 embedding = umap.UMAP(n_neighbors=25, min_dist=0.1,
----> 2                       metric='correlation').fit_transform(df.values)

~/venv3/lib/python3.6/site-packages/umap_learn-0.1.3-py3.6.egg/umap/umap_.py in fit_transform(self, X, y)
    790             Embedding of the training data in low-dimensional space.
    791         """
--> 792         self.fit(X)
    793         return self.embedding_

~/venv3/lib/python3.6/site-packages/umap_learn-0.1.3-py3.6.egg/umap/umap_.py in fit(self, X, y)
    757 
    758         graph = fuzzy_simplicial_set(X, self.n_neighbors,
--> 759                                      self._metric, self.metric_kwds)
    760 
    761         if self.n_edge_samples is None:

~/venv3/lib/python3.6/site-packages/scipy/sparse/base.py in multiply(self, other)
    297         """Point-wise multiplication by another matrix
    298         """
--> 299         return self.tocsr().multiply(other)
    300 
    301     def maximum(self, other):

~/venv3/lib/python3.6/site-packages/scipy/sparse/compressed.py in multiply(self, other)
    388                 return copy._mul_sparse_matrix(other)
    389             else:
--> 390                 raise ValueError("inconsistent shapes")
    391 
    392         # Assume other is a dense matrix/array, which produces a single-item

ValueError: inconsistent shapes

@lmcinnes
Owner

I believe that is likely the same underlying error presenting in a different way because it surfaces further downstream in the code. I'll have to look at the data and see what I've missed (and hopefully add a few checks that give more meaningful error messages when things like this do go wrong).
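A check of the kind mentioned above might look like the following sketch. It is not the check that was added to the library; the function name `check_knn_indices` is hypothetical, and it assumes that a -1 entry marks a neighbor slot the search failed to fill.

```python
import numpy as np


def check_knn_indices(knn_indices, n_neighbors):
    """Raise a descriptive error if any point lacks a full neighbor list.

    Hypothetical check; assumes -1 marks an unfilled neighbor slot.
    """
    knn_indices = np.asarray(knn_indices)
    # Count unfilled slots per point, then collect the affected points.
    missing_per_point = (knn_indices < 0).sum(axis=1)
    bad_points = np.flatnonzero(missing_per_point)
    if bad_points.size > 0:
        raise ValueError(
            f"Failed to find {n_neighbors} nearest neighbors for "
            f"{bad_points.size} point(s), e.g. point {bad_points[0]}; "
            "the neighbor search may have isolated some outliers."
        )


# A complete neighbor array passes silently.
check_knn_indices([[1, 2], [0, 2], [0, 1]], n_neighbors=2)
```

Running a check like this before the sparse graph is built would report the real cause instead of a SciPy index or shape error.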

@lmcinnes
Owner

It looks like the RP tree initialisation for NN-descent is carving off some outliers as singletons; this, in turn, makes NN-descent not work as well as it should. The end result is ... less than ideal. I should be able to fix this by randomly initialising any bad points. Hopefully I can get that done later today.
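The idea of randomly initialising bad points can be sketched as follows. This is only an illustration of the general technique, not the fix that was committed: the function `fill_missing_neighbors` is hypothetical, and it assumes that unfilled neighbor slots are marked with -1 and that there are more points than neighbor slots.

```python
import numpy as np


def fill_missing_neighbors(knn_indices, rng):
    """Replace unfilled (-1) neighbor slots with random other points.

    Hypothetical helper; assumes n_samples > n_neighbors so that a
    candidate always exists for every empty slot.
    """
    filled = np.array(knn_indices, copy=True)
    n_samples = filled.shape[0]
    for i, j in zip(*np.nonzero(filled < 0)):
        # Exclude the point itself and the neighbors it already has,
        # including slots filled earlier in this loop.
        taken = set(filled[i][filled[i] >= 0].tolist()) | {int(i)}
        candidates = [c for c in range(n_samples) if c not in taken]
        filled[i, j] = rng.choice(candidates)
    return filled


rng = np.random.default_rng(0)
# Point 2 was carved off as a singleton; point 3 has one empty slot.
knn_indices = np.array([[1, 2], [0, 2], [-1, -1], [0, -1]])
filled = fill_missing_neighbors(knn_indices, rng)
print(filled)
```

The random entries are poor neighbors at first, but they give an iterative refinement step such as NN-descent a valid starting point for every point.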

@lmcinnes
Owner

This fixed the issue on your data for me locally. If you could rebuild and re-install from master and let me know whether it resolves the issue for you as well, that would be greatly appreciated.

@ensonario
Author

Thanks a lot! Let me test it ;)

@ensonario
Author

It seems the problem is solved! Thanks a lot for the quick response and the fix! And of course, thanks for this amazing software.
