ValueError: cannot assign slice from input of different size #1008

Open
Jiawei-Xing opened this issue May 13, 2023 · 8 comments

@Jiawei-Xing commented May 13, 2023

Hi, I want to use UMAP on a large distance matrix (369911 × 369911). I followed the first example of "UMAP on sparse data" from the tutorial (I've tried both LIL and CSR sparse matrices). The code worked well on a smaller sample dataset but failed on my large matrix. The sparse matrix is ~9 GB, and I was running it on an HPC node with 10 CPUs (~30 GB of memory). The low_memory option was set to True.
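
Roughly what I am running (simplified; the loading step and file name are illustrative, the rest follows the tutorial example):

```python
import scipy.sparse
import umap

# my 369911 x 369911 pairwise distance matrix, ~9 GB in CSR format
matrix = scipy.sparse.load_npz("distances.npz")  # illustrative file name

reducer = umap.UMAP(low_memory=True)
mapper = reducer.fit(matrix)
```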

Traceback (most recent call last):
  File "umap.py", line 51, in <module>
    mapper = reducer.fit(matrix)
  File "/home/xing.232/.local/lib/python3.7/site-packages/umap/umap_.py", line 2526, in fit
    verbose=self.verbose,
  File "/home/xing.232/.local/lib/python3.7/site-packages/umap/umap_.py", line 340, in nearest_neighbors
    compressed=False,
  File "/home/xing.232/.local/lib/python3.7/site-packages/pynndescent/pynndescent_.py", line 804, in __init__
    leaf_array = rptree_leaf_array(self._rp_forest)
  File "/home/xing.232/.local/lib/python3.7/site-packages/pynndescent/rp_trees.py", line 1097, in rptree_leaf_array
    return np.vstack(rptree_leaf_array_parallel(rp_forest))
  File "/home/xing.232/.local/lib/python3.7/site-packages/pynndescent/rp_trees.py", line 1090, in rptree_leaf_array_parallel
    joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
  File "/home/xing.232/.local/lib/python3.7/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/home/xing.232/.local/lib/python3.7/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/anaconda3-2020.02/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/anaconda3-2020.02/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/xing.232/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/home/xing.232/.local/lib/python3.7/site-packages/joblib/parallel.py", line 289, in __call__
    for func, args, kwargs in self.items]
  File "/home/xing.232/.local/lib/python3.7/site-packages/joblib/parallel.py", line 289, in <listcomp>
    for func, args, kwargs in self.items]
ValueError: cannot assign slice from input of different size
@lmcinnes (Owner)

I believe this is an issue related to compilation caching in pynndescent. If you reinstall pynndescent, preferably directly from GitHub, it should resolve the issue.
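
For example, something like `pip install --force-reinstall git+https://github.com/lmcinnes/pynndescent.git` should pull the current master.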

To be clear, however, if pynndescent is running then it is computing nearest neighbors of vectors, so it is treating your distance matrix as a large set of sparse vectors. That probably isn't what you want, so I would check that this is actually what you intended.

@Jiawei-Xing (Author)

Thank you for your quick response! You are right, I should use metric="precomputed" to fit the distance matrix.
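
For reference, something like this (a minimal sketch; `distance_matrix` stands in for my actual matrix):

```python
import umap

# interpret the input as pairwise distances rather than feature vectors
reducer = umap.UMAP(metric="precomputed")
mapper = reducer.fit(distance_matrix)  # distance_matrix: my precomputed distances
```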

A related question about the matrix input: my original similarity matrix is sparse, but when I convert it to distances (1 - similarity), most elements become 1. This causes memory issues because the matrix is no longer sparse. Is it possible to fit the model with a similarity matrix, or is there another way to overcome this?
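
To illustrate the problem with toy numbers (sizes and density are illustrative):

```python
import numpy as np
import scipy.sparse

# a toy sparse similarity matrix with ~1% of entries stored
sim = scipy.sparse.random(1000, 1000, density=0.01, format="csr")

# converting to distance turns every implicit zero into a 1,
# so the result is effectively dense
dist = 1.0 - sim.toarray()

print(sim.nnz)                 # roughly 10,000 stored entries
print(np.count_nonzero(dist))  # roughly 990,000 nonzero entries
```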

@ReaganGen

Hello. I also have the same error. Strangely, UMAP works well on some of my datasets but returns this error message on other datasets with the same format. I also tried reinstalling pynndescent directly from GitHub, but the same error still exists. Could anyone help?
[Screenshot, 2023-05-21: the same ValueError traceback]

@ntmaier commented May 23, 2023

Hello,

I have a similar issue to the one above. I also reinstalled pynndescent directly from the GitHub master. Note: the script works on smaller files. Right now I am running a relatively simple workflow, which breaks on larger files (~44,000 × 4,000):

import umap

fit = umap.UMAP(n_neighbors=20, n_components=2, min_dist=0.1)
umap_spectrogram = fit.fit_transform(spectrograms_)

ValueError Traceback (most recent call last)
Cell In[6], line 7
4 # these settings seem to work pretty good n_neighbors = 30, n_components=3, min_dist=0.5
5 fit = umap.UMAP(n_neighbors=20,n_components=2,min_dist=0.1)
----> 7 umap_spectrogram = fit.fit_transform(spectrograms_)

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\umap\umap_.py:2772, in UMAP.fit_transform(self, X, y)
2742 def fit_transform(self, X, y=None):
2743 """Fit X into an embedded space and return that transformed
2744 output.
2745
(...)
2770 Local radii of data points in the embedding (log-transformed).
2771 """
-> 2772 self.fit(X, y)
2773 if self.transform_mode == "embedding":
2774 if self.output_dens:

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\umap\umap_.py:2516, in UMAP.fit(self, X, y)
2510 nn_metric = self._input_distance_func
2511 if self.knn_dists is None:
2512 (
2513 self._knn_indices,
2514 self._knn_dists,
2515 self._knn_search_index,
-> 2516 ) = nearest_neighbors(
2517 X[index],
2518 self._n_neighbors,
2519 nn_metric,
2520 self._metric_kwds,
2521 self.angular_rp_forest,
2522 random_state,
2523 self.low_memory,
2524 use_pynndescent=True,
2525 n_jobs=self.n_jobs,
2526 verbose=self.verbose,
2527 )
2528 else:
2529 self._knn_indices = self.knn_indices

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\umap\umap_.py:328, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
325 n_trees = min(64, 5 + int(round((X.shape[0]) ** 0.5 / 20.0)))
326 n_iters = max(5, int(round(np.log2(X.shape[0]))))
--> 328 knn_search_index = NNDescent(
329 X,
330 n_neighbors=n_neighbors,
331 metric=metric,
332 metric_kwds=metric_kwds,
333 random_state=random_state,
334 n_trees=n_trees,
335 n_iters=n_iters,
336 max_candidates=60,
337 low_memory=low_memory,
338 n_jobs=n_jobs,
339 verbose=verbose,
340 compressed=False,
341 )
342 knn_indices, knn_dists = knn_search_index.neighbor_graph
344 if verbose:

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\pynndescent\pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
793 print(ts(), "Building RP forest with", str(n_trees), "trees")
794 self._rp_forest = make_forest(
795 data,
796 n_neighbors,
(...)
802 self._angular_trees,
803 )
--> 804 leaf_array = rptree_leaf_array(self._rp_forest)
805 else:
806 self._rp_forest = None

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\pynndescent\rp_trees.py:1097, in rptree_leaf_array(rp_forest)
1095 def rptree_leaf_array(rp_forest):
1096 if len(rp_forest) > 0:
-> 1097 return np.vstack(rptree_leaf_array_parallel(rp_forest))
1098 else:
1099 return np.array([[-1]])

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\pynndescent\rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
1088 def rptree_leaf_array_parallel(rp_forest):
-> 1089 result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
1090 joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
1091 )
1092 return result

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:1098, in Parallel.__call__(self, iterable)
1095 self._iterating = False
1097 with self._backend.retrieval_context():
-> 1098 self.retrieve()
1099 # Make sure that we get a last message telling us we are done
1100 elapsed_time = time.time() - self._start_time

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:975, in Parallel.retrieve(self)
973 try:
974 if getattr(self._backend, 'supports_timeout', False):
--> 975 self._output.extend(job.get(timeout=self.timeout))
976 else:
977 self._output.extend(job.get())

File c:\ProgramData\Anaconda\envs\seis\Lib\multiprocessing\pool.py:774, in ApplyResult.get(self, timeout)
772 return self._value
773 else:
--> 774 raise self._value

File c:\ProgramData\Anaconda\envs\seis\Lib\multiprocessing\pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
123 job, i, func, args, kwds = task
124 try:
--> 125 result = (True, func(*args, **kwds))
126 except Exception as e:
127 if wrap_exception and func is not _helper_reraises_exception:

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
618 def __call__(self, *args, **kwargs):
619 try:
--> 620 return self.func(*args, **kwargs)
621 except KeyboardInterrupt as e:
622 # We capture the KeyboardInterrupt and reraise it as
623 # something different, as multiprocessing does not
624 # interrupt processing for a KeyboardInterrupt
625 raise WorkerInterrupt() from e

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:288, in BatchedCalls.__call__(self)
284 def __call__(self):
285 # Set the default nested backend to self._backend but do not set the
286 # change the default number of processes to -1
287 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288 return [func(*args, **kwargs)
289 for func, args, kwargs in self.items]

File c:\ProgramData\Anaconda\envs\seis\Lib\site-packages\joblib\parallel.py:288, in <listcomp>(.0)
284 def __call__(self):
285 # Set the default nested backend to self._backend but do not set the
286 # change the default number of processes to -1
287 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288 return [func(*args, **kwargs)
289 for func, args, kwargs in self.items]

ValueError: cannot assign slice from input of different size

@liufeifan

I also have the same ValueError problem as mentioned above.

@ogreyesp commented Jun 25, 2023

Hi,

This problem is happening very frequently. I have a dataset on which UMAP works well. However, when I tried to build the UMAP embedding inside 10-fold cross-validation, the error appeared in some of the folds; see the sketch below.
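
A simplified sketch of the setup (data shape and parameters are illustrative):

```python
import numpy as np
import umap
from sklearn.model_selection import KFold

X = np.random.rand(500, 50)  # placeholder for the real dataset

# the error appears only on some of the ten folds
for fold, (train_idx, _) in enumerate(KFold(n_splits=10).split(X)):
    reducer = umap.UMAP(n_neighbors=15, n_components=2)
    embedding = reducer.fit_transform(X[train_idx])
    print(f"fold {fold}: embedding shape {embedding.shape}")
```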

Please advise.

@carluqcor

Hi @ogreyesp, as @lmcinnes said, it seems to be a pynndescent issue. Using pynndescent 0.5.8 works perfectly for me.
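
For example, pinning the version with `pip install pynndescent==0.5.8` was enough in my case.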

@liufeifan commented Jul 20, 2023

> I also have the same ValueError problem as mentioned above.

I solved the problem by going back to an older version, 0.5.0.
