# Overview: Why fast-vindex?

Dask arrays expose the `vindex` indexer for advanced (point-wise) indexing. However, its documentation notes that `vindex` has fewer optimizations than standard indexing and can be significantly slower (see the [docs](https://docs.dask.org/en/latest/generated/dask.array.Array.vindex.html)).
The goal of the `fast-vindex` library, as its name suggests, is to provide a more optimized way to perform advanced indexing in Dask.

```{eval-rst}
.. _scientific_challenge:
```

## Initial Scientific Challenge

In a scientific use case, the goal was to co-locate satellite data (time, longitude, latitude) with in-situ observations (time, longitude, latitude).
Concretely, this problem involves extracting minicubes from within a larger datacube.

![](/_static/pytcube_schema.png)

Technically, this amounts to calling `vindex` on a Dask array with fancy indexing, using one broadcastable index array per dimension.

In [1]:
import dask.array as da
import numpy as np

In [2]:
x = da.random.random((10, 10, 10), chunks=(5, 5, 5))
x

| | Array | Chunk |
|---|---|---|
| Bytes | 7.81 kiB | 0.98 kiB |
| Shape | (10, 10, 10) | (5, 5, 5) |
| Dask graph | 8 chunks in 1 graph layer | |
| Data type | float64 numpy.ndarray | |


In [3]:
indexes = (
    np.array([[[[3]], [[4]], [[5]]]]),
    np.array([[[[2], [3], [5]]]]),
    np.array([[[[6, 5, 7]]]]),
)
x.vindex[indexes]

| | Array | Chunk |
|---|---|---|
| Bytes | 216 B | 216 B |
| Shape | (1, 3, 3, 3) | (1, 3, 3, 3) |
| Dask graph | 1 chunks in 3 graph layers | |
| Data type | float64 numpy.ndarray | |
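
The three index arrays have shapes `(1, 3, 1, 1)`, `(1, 1, 3, 1)` and `(1, 1, 1, 3)`, so they broadcast to `(1, 3, 3, 3)`: a single minicube of 3 × 3 × 3 points. As a minimal sketch, the same selection can be checked with NumPy's own advanced indexing on a small in-memory array. The array `a` below is an illustrative stand-in and is not part of the original example.

```python
import numpy as np

# Small in-memory stand-in for the Dask array above (assumption for illustration).
a = np.arange(1000, dtype=float).reshape(10, 10, 10)

indexes = (
    np.array([[[[3]], [[4]], [[5]]]]),  # shape (1, 3, 1, 1): positions along axis 0
    np.array([[[[2], [3], [5]]]]),      # shape (1, 1, 3, 1): positions along axis 1
    np.array([[[[6, 5, 7]]]]),          # shape (1, 1, 1, 3): positions along axis 2
)

# The index arrays broadcast against each other to give the output shape.
out_shape = np.broadcast_shapes(*(idx.shape for idx in indexes))

# Advanced indexing picks a[i, j, k] for every broadcast (i, j, k) triple.
minicube = a[indexes]

print(out_shape)       # (1, 3, 3, 3)
print(minicube.shape)  # (1, 3, 3, 3)
```

Because every output element comes from its own `(i, j, k)` triple, the amount of work is naturally expressed as a number of points, which is what the timings in the next section measure.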


## The Problem with Dask's vindex

The problem with `vindex` is its lack of optimization. The current implementation relies on point-wise indexing, so both the time needed to build the task graph and the size of that graph grow in proportion to the number of points to extract.

In [4]:
from fast_vindex.testing import generate_fancy_indexes

In [5]:
shape = (5_000, 5_000, 5_000)
chunks = (500, 500, 500)
x = da.random.random(shape, chunks=chunks)
x

| | Array | Chunk |
|---|---|---|
| Bytes | 0.91 TiB | 0.93 GiB |
| Shape | (5000, 5000, 5000) | (500, 500, 500) |
| Dask graph | 1000 chunks in 1 graph layer | |
| Data type | float64 numpy.ndarray | |


In [6]:
indexes = generate_fancy_indexes(x, n=100, padding=10)

In [7]:
%%time
x.vindex[indexes]

CPU times: user 3.17 s, sys: 228 ms, total: 3.4 s
Wall time: 3.4 s


| | Array | Chunk |
|---|---|---|
| Bytes | 6.10 MiB | 6.10 MiB |
| Shape | (100, 20, 20, 20) | (100, 20, 20, 20) |
| Dask graph | 1 chunks in 3 graph layers | |
| Data type | float64 numpy.ndarray | |
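
The result shape `(100, 20, 20, 20)` corresponds to 100 minicubes of 20 points per axis. The internals of `generate_fancy_indexes` are not shown here, so the following is only a hedged sketch of how such index arrays could be built with NumPy. The helper `make_minicube_indexes`, its random centers and its `seed` parameter are hypothetical and are not part of the `fast-vindex` API.

```python
import numpy as np


def make_minicube_indexes(shape, n, padding, seed=0):
    """Hypothetical stand-in for an index generator (not the fast-vindex API).

    Returns one index array per axis; together they broadcast to
    (n, 2 * padding, ..., 2 * padding).
    """
    rng = np.random.default_rng(seed)
    offsets = np.arange(-padding, padding)  # 2 * padding positions per axis
    indexes = []
    for axis, size in enumerate(shape):
        # Random centers kept far enough from the edges to stay in bounds.
        centers = rng.integers(padding, size - padding + 1, size=n)
        window = centers[:, None] + offsets[None, :]  # shape (n, 2 * padding)
        # Put the window on its own output axis, e.g. (n, L, 1, 1) for axis 0.
        new_shape = [n] + [1] * len(shape)
        new_shape[axis + 1] = offsets.size
        indexes.append(window.reshape(new_shape))
    return tuple(indexes)


indexes = make_minicube_indexes((5_000, 5_000, 5_000), n=100, padding=10)
out_shape = np.broadcast_shapes(*(idx.shape for idx in indexes))
print(out_shape)                # (100, 20, 20, 20)
print(int(np.prod(out_shape)))  # 800000
```

Under these assumptions, the selection above already covers 800,000 individual points.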


**If the number of minicubes to extract doubles**

In [8]:
indexes = generate_fancy_indexes(x, n=200, padding=10)

In [9]:
%%time
_ = x.vindex[indexes]

CPU times: user 6.77 s, sys: 308 ms, total: 7.08 s
Wall time: 7.08 s


Doubling the number of minicubes roughly doubles the task graph creation time (from 3.4 s to 7.08 s).

**If the dimensions of the minicubes to extract double**

In [10]:
indexes = generate_fancy_indexes(x, n=100, padding=20)

In [11]:
%%time
_ = x.vindex[indexes]

CPU times: user 29.4 s, sys: 1.44 s, total: 30.9 s
Wall time: 30.9 s


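Doubling each dimension of the minicubes multiplies the number of points by 8 (2³), and the task graph creation time grows accordingly: from 3.4 s to 30.9 s, roughly a factor of 9 in this run.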
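
The link between these timings and the number of points can be made explicit with a little arithmetic. The sketch below assumes each minicube has an edge of `2 * padding` points, as the `(100, 20, 20, 20)` output suggests. The helper `n_points` is hypothetical, and the wall times are simply the values reported in the runs above.

```python
def n_points(n, padding, ndim=3):
    """Points selected by n minicubes with an assumed edge of 2 * padding."""
    return n * (2 * padding) ** ndim


# (n, padding, wall time in seconds) for the three runs above.
runs = {
    "baseline": (100, 10, 3.4),
    "twice as many minicubes": (200, 10, 7.08),
    "edges twice as long": (100, 20, 30.9),
}

base_points = n_points(100, 10)
base_time = runs["baseline"][2]
for label, (n, padding, seconds) in runs.items():
    points = n_points(n, padding)
    print(
        f"{label}: {points:,} points (x{points / base_points:.0f}), "
        f"time x{seconds / base_time:.1f}"
    )
```

The three configurations select 800,000, 1,600,000 and 6,400,000 points respectively, so the measured times follow the point count closely.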

## What `fast-vindex` Offers

`fast-vindex` aims to avoid point-wise indexing. Its approach is to split the indices into sub-indices aligned with chunk boundaries. While this introduces other challenges, it results in significant time savings.
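
To give an intuition for splitting indices along chunk boundaries, here is a minimal one-dimensional sketch in plain NumPy. It is not the library's implementation. The helper `split_by_chunk` and the toy data are hypothetical, and the real problem is N-dimensional and operates on Dask chunks.

```python
import numpy as np


def split_by_chunk(indices, chunk_sizes):
    """Group 1-D positions by the chunk that contains them.

    Returns {chunk_id: (local_positions, output_slots)}, where local_positions
    are offsets inside the chunk and output_slots are positions in the result.
    """
    indices = np.asarray(indices)
    # Start offset of every chunk, e.g. sizes (5, 5) -> starts [0, 5].
    starts = np.concatenate(([0], np.cumsum(chunk_sizes)[:-1]))
    # Chunk holding each index: the last start that is <= index.
    chunk_ids = np.searchsorted(starts, indices, side="right") - 1
    groups = {}
    for cid in np.unique(chunk_ids):
        slots = np.flatnonzero(chunk_ids == cid)
        groups[int(cid)] = (indices[slots] - starts[cid], slots)
    return groups


# One axis of length 10 stored as two chunks of 5 elements.
data = np.arange(10) * 10
chunks = [data[:5], data[5:]]
wanted = np.array([3, 4, 5, 6, 7])  # crosses the chunk boundary at 5

out = np.empty(wanted.size, dtype=data.dtype)
for cid, (local, slots) in split_by_chunk(wanted, (5, 5)).items():
    # One vectorized read per chunk, written back to the right output slots.
    out[slots] = chunks[cid][local]

assert np.array_equal(out, data[wanted])
```

In this sketch the number of reads depends on how many chunks are touched rather than on how many points are requested. Handling windows that straddle several chunks in several dimensions at once is one of the challenges mentioned above.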

![](/_static/performance_comparison_vindex_vs_fast_vindex.png)

```{note}
In this study, this corresponds to a speedup factor of 35 for `fast-vindex` over `vindex`.
```