ValueError: cannot assign slice from input of different size #1008
Comments
I believe this is an issue relating to compilation caching in pynndescent. If you reinstall pynndescent, preferably directly from GitHub, it should resolve the issue. To be clear, however, if pynndescent is running then it is computing nearest neighbors of vectors, so it is treating your distance matrix as a large set of sparse vectors. That probably isn't what you want, so I would double-check that this is actually what you intend to do.
Thank you for your quick response! You are right, I should use metric="precomputed" to fit the distance matrix. A related question about the matrix input: my original similarity matrix is sparse, but when I convert it to distance (1 − similarity), most elements become 1. This causes memory issues because the matrix is no longer sparse. Is it possible to fit the model with the similarity matrix, or is there another way to overcome this?
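A minimal sketch of the memory problem described above, using scipy.sparse (the matrix size and density here are illustrative assumptions, not the poster's actual data):

```python
import numpy as np
from scipy import sparse

# Hypothetical sparse similarity matrix: most pairs have no stored similarity.
sim = sparse.random(1000, 1000, density=0.01, format="csr", random_state=0)
print(sim.nnz)  # 10000 stored values out of 1,000,000 cells

# Converting to distance as 1 - similarity turns every implicit zero into a 1,
# so the result is effectively dense and memory grows with n^2.
dist = 1.0 - sim.toarray()
print(np.count_nonzero(dist))  # ~1,000,000 nonzero entries: fully dense
```

For a 369911×369911 matrix this dense conversion would need roughly a terabyte at float64, which is why keeping the representation sparse matters.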
Hello - I have a similar issue to the one above. I also reinstalled pynndescent directly from the GitHub master. Note: the script works for smaller files; right now I am running a relatively simple workflow, which breaks on larger files (~44,000 × 4,000):

```
fit = umap.UMAP(n_neighbors=20, n_components=2, min_dist=0.1)
```

```
ValueError                                Traceback (most recent call last)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\umap\umap_.py:2772, in UMAP.fit_transform(self, X, y)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\umap\umap_.py:2516, in UMAP.fit(self, X, y)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\umap\umap_.py:328, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\pynndescent\pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\pynndescent\rp_trees.py:1097, in rptree_leaf_array(rp_forest)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\pynndescent\rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:1098, in Parallel.__call__(self, iterable)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:975, in Parallel.retrieve(self)
File c:\ProgramData\Anaconda\envs\seis\Lib\multiprocessing\pool.py:774, in ApplyResult.get(self, timeout)
File c:\ProgramData\Anaconda\envs\seis\Lib\multiprocessing\pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:288, in BatchedCalls.__call__(self)
File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:288, in (.0)

ValueError: cannot assign slice from input of different size
```
I also have the same ValueError problem as mentioned above.
Hi, this problem is happening very frequently for me. I have a dataset on which UMAP works well; however, when I tried to build the UMAP embedding under 10-fold cross-validation, the error appeared in some folds. Please advise.
I solved the problem by rolling back to the older version 0.5.0.
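For reference, one way to try the two workarounds mentioned in this thread: pin umap-learn to 0.5.0, or reinstall pynndescent from the GitHub master. The exact version and the need for `--force-reinstall` depend on your environment; doing this inside a virtual environment is safest.

```shell
# Workaround 1: roll back umap-learn to 0.5.0, as reported above
pip install "umap-learn==0.5.0"

# Workaround 2: reinstall pynndescent directly from GitHub master,
# as the maintainer suggested earlier in the thread
pip install --force-reinstall git+https://github.com/lmcinnes/pynndescent
```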
Hi, I want to use UMAP on a large distance matrix (369911×369911). I followed the first example of "UMAP on sparse data" from the tutorial (I tried both LIL and CSR sparse formats). The code worked well with a smaller sample dataset but failed on my large matrix. My sparse matrix is ~9 GB, and I was running it on an HPC node with 10 CPUs (~30 GB RAM). The low_memory option was set to True.