The current implementation uses managed memory to keep a single reference copy of the indices/distances arrays.
Using managed memory for large indices/distances arrays may oversubscribe GPU memory and lead to performance issues.
Explore the performance implications of large indices/distances arrays and work on optimizations if needed.
For example, we might need to explore the performance implications of a gather -> merge on GPU -> scatter approach, with the reference indices/distances kept in CPU memory.
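
The gather -> merge -> scatter idea can be sketched on the host to make the data flow concrete. This is only an illustration in plain Python, not the project's implementation: the names `merge_topk` and `update_reference` are hypothetical, the per-row lists are assumed to be sorted by ascending distance, and the merge step stands in for what would be a GPU kernel running on a staged batch.

```python
import heapq
import itertools


def merge_topk(ref_dist, ref_idx, new_dist, new_idx, k):
    """Merge two ascending (distance, index) lists and keep the k smallest.

    Hypothetical stand-in for the merge that would run on the GPU.
    """
    merged = heapq.merge(zip(ref_dist, ref_idx), zip(new_dist, new_idx))
    top = list(itertools.islice(merged, k))
    return [d for d, _ in top], [i for _, i in top]


def update_reference(host_dist, host_idx, rows, new_dist, new_idx, k):
    """Update the reference rows listed in `rows` with new candidates.

    host_dist / host_idx: reference arrays assumed to live in CPU memory.
    rows: query rows touched by this batch.
    new_dist / new_idx: new candidates, one sorted list per entry in `rows`.
    """
    # Gather: copy only the touched rows into a staging buffer
    # (this would be the host-to-device copy).
    staged_dist = [host_dist[r] for r in rows]
    staged_idx = [host_idx[r] for r in rows]

    for j, r in enumerate(rows):
        # Merge: combine staged reference rows with the new candidates.
        d, i = merge_topk(staged_dist[j], staged_idx[j],
                          new_dist[j], new_idx[j], k)
        # Scatter: write the merged row back to the host reference
        # (this would be the device-to-host copy).
        host_dist[r] = d
        host_idx[r] = i


# Usage: three query rows with k = 2; the batch touches rows 0 and 2 only.
host_dist = [[0.1, 0.5], [0.2, 0.9], [0.3, 0.4]]
host_idx = [[10, 11], [20, 21], [30, 31]]
update_reference(host_dist, host_idx, rows=[0, 2],
                 new_dist=[[0.05, 0.7], [0.35, 0.6]],
                 new_idx=[[12, 13], [32, 33]], k=2)
```

With this layout only the rows touched by a batch need to occupy device memory at any time, so the staging buffer size is set by the batch rather than by the full reference arrays. Whether the extra copies cost less than managed-memory page migration is the open question that would need measuring.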