Slice error using mac M1-max ARM #218
I found the problem: I did not pass the distance.
I just got this same error on an x86 machine. (edit: Of course, as soon as I comment it starts mysteriously working... it was failing consistently before. I wonder if I had some bad version cached or something.)
I agree this is odd, and I'll try to keep a lookout for a reproducer.
I think I have a reproducer, but I'm not sure how to share it. It seems completely data-specific: I got this error with
I have a sporadic reproducer with a fairly small array (1.8M on disk, saved as

edit: The above array seems to fail consistently only when passed through
So it was quirky. There was some code added to bail out when the tree splitting was not working well, to avoid excess depth. Unfortunately that meant that, in rare cases, the size of a leaf could exceed the leaf_size that was set. This made things not match up when building the leaf arrays at the end, because we expected everything to match the leaf size. Now we have a max_leaf_size, and expand things in those rare cases. In theory this could blow up terribly for bad data by consuming ungodly amounts of memory, but that's a very rare case indeed, and I'm not sure there is any way to fix it anyway. The best answer in that case is simply to increase the leaf size in the NNDescent params.
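The mismatch described above can be sketched in plain NumPy. This is a hypothetical illustration, not pynndescent's actual code (which lives in rp_trees.py as numba-compiled functions, where the same shape mismatch surfaces as "cannot assign slice from input of different size"):

```python
import numpy as np

leaf_size = 4  # the width every leaf was expected to respect

# Hypothetical leaves from a tree whose splitting bailed out early:
# the middle leaf holds 5 points, exceeding leaf_size.
leaves = [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9]]

# Building the combined leaf array assumes no leaf is wider than leaf_size.
leaf_array = -np.ones((len(leaves), leaf_size), dtype=np.int64)
caught = None
try:
    for i, leaf in enumerate(leaves):
        # A row of width 4 cannot hold a 5-point leaf, so NumPy raises
        # a ValueError on the oversized assignment.
        leaf_array[i, : len(leaf)] = leaf
except ValueError as exc:
    caught = exc
print(caught)
```

With a max_leaf_size, the array can instead be allocated wide enough to hold the occasional oversized leaf, at the cost of extra memory in those rare cases.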
I tried the code on a large dataset (200k x 2.5k) using the latest version, v0.5.10. With either a dense or a sparse dataset, I get an error.

My code:

```python
index = pynndescent.NNDescent(crs_test, metric='cosine')
```

It runs for 10-20 seconds, then raises this error:
```
ValueError Traceback (most recent call last)
File :1
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
    793 print(ts(), "Building RP forest with", str(n_trees), "trees")
    794 self._rp_forest = make_forest(
    795     data,
    796     n_neighbors,
   (...)
    802     self._angular_trees,
    803 )
--> 804 leaf_array = rptree_leaf_array(self._rp_forest)
    805 else:
    806 self._rp_forest = None
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/rp_trees.py:1097, in rptree_leaf_array(rp_forest)
   1095 def rptree_leaf_array(rp_forest):
   1096     if len(rp_forest) > 0:
-> 1097         return np.vstack(rptree_leaf_array_parallel(rp_forest))
   1098     else:
   1099         return np.array([[-1]])
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
   1088 def rptree_leaf_array_parallel(rp_forest):
-> 1089     result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
   1090         joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
   1091     )
   1092     return result
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
   1095 self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve()
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())
File ~/miniforge3/envs/tf/lib/python3.9/multiprocessing/pool.py:771, in ApplyResult.get(self, timeout)
    769     return self._value
    770 else:
--> 771     raise self._value
File ~/miniforge3/envs/tf/lib/python3.9/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
    618 def __call__(self, *args, **kwargs):
    619     try:
--> 620         return self.func(*args, **kwargs)
    621     except KeyboardInterrupt as e:
    622         # We capture the KeyboardInterrupt and reraise it as
    623         # something different, as multiprocessing does not
    624         # interrupt processing for a KeyboardInterrupt
    625         raise WorkerInterrupt() from e
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:288, in BatchedCalls.__call__(self)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289             for func, args, kwargs in self.items]
File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:288, in <listcomp>(.0)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289             for func, args, kwargs in self.items]
ValueError: cannot assign slice from input of different size
```