# Example Usage of Fast-Vindex with Synthetic Data

---

This notebook demonstrates a **basic workflow** using the Fast-Vindex library with **synthetically generated data**.  

```{seealso} Learning outcomes
By the end of this notebook, you will know how to:
- Generate a synthetic Dask array (darray) and matching indexes for testing or experimentation.
- Apply the `patched_vindex` function for fast and flexible multidimensional indexing.
```

In [1]:
from fast_vindex import patched_vindex
from fast_vindex.testing import (
    generate_darray,
    generate_fancy_indexes
)

## Dask Cluster Setup

In this section, we initialize a **Dask distributed cluster** to enable parallel and scalable data processing.
Called without arguments, `Client()` starts a local cluster on the current machine, so computations on large datasets run in parallel across its cores.

In [2]:
from dask.distributed import Client

client = Client()
client

Client and cluster:

| Property | Value |
|---|---|
| Connection method | Cluster object |
| Cluster type | distributed.LocalCluster |
| Dashboard | /proxy/8787/status |
| Workers | 4 |
| Total threads | 8 |
| Total memory | 10.00 GiB |
| Status | running |
| Using processes | True |

Scheduler:

| Property | Value |
|---|---|
| Comm | tcp://127.0.0.1:53806 |
| Dashboard | /proxy/8787/status |
| Started | Just now |
| Workers | 0 |
| Total threads | 0 |
| Total memory | 0 B |

Workers:

| Comm | Total threads | Memory | Dashboard | Nanny | Local directory |
|---|---|---|---|---|---|
| tcp://127.0.0.1:51298 | 2 | 2.50 GiB | /proxy/42418/status | tcp://127.0.0.1:59856 | /dev/shm/pbs.4176367.datarmor0/dask-scratch-space/worker-k5dzd3dz |
| tcp://127.0.0.1:54631 | 2 | 2.50 GiB | /proxy/53715/status | tcp://127.0.0.1:56122 | /dev/shm/pbs.4176367.datarmor0/dask-scratch-space/worker-fj7xo0x0 |
| tcp://127.0.0.1:58514 | 2 | 2.50 GiB | /proxy/55390/status | tcp://127.0.0.1:54509 | /dev/shm/pbs.4176367.datarmor0/dask-scratch-space/worker-aviwmxba |
| tcp://127.0.0.1:56848 | 2 | 2.50 GiB | /proxy/45366/status | tcp://127.0.0.1:33099 | /dev/shm/pbs.4176367.datarmor0/dask-scratch-space/worker-ccbs7r0j |
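The defaults picked by `Client()` depend on the machine. As a minimal sketch, the settings below spell out the configuration reported in the cluster summary (4 worker processes, 2 threads and 2.5 GiB each). The names `CLUSTER_KWARGS` and `make_client` are hypothetical helpers introduced for illustration; the keyword arguments are standard `dask.distributed.Client` options for a local cluster.

```python
# Settings read from the cluster summary in this notebook.
CLUSTER_KWARGS = {
    "n_workers": 4,            # worker processes
    "threads_per_worker": 2,   # 4 x 2 = 8 threads in total
    "memory_limit": "2.5GiB",  # per worker, 10 GiB in total
    "processes": True,         # one process per worker
}


def make_client():
    """Start a local cluster with the explicit settings above (hypothetical helper)."""
    # Imported lazily so that defining the settings does not start anything.
    from dask.distributed import Client

    return Client(**CLUSTER_KWARGS)


total_threads = CLUSTER_KWARGS["n_workers"] * CLUSTER_KWARGS["threads_per_worker"]
print(total_threads)  # 8
```

Calling `make_client()` in place of `Client()` should make the run reproducible across machines with different core counts, assuming enough memory is available.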


## Generate Synthetic Data

In this section, we will create a **synthetic multidimensional Dask array** (darray) along with a set of **indexes** that can be used for advanced indexing operations.

First, we generate a **Dask array** using the `generate_darray()` function, which creates a multidimensional array with the specified shape and chunking. Here, we simulate a large dataset of shape **1,000 × 1,000 × 1,000**, split into chunks of 100 × 100 × 100. Because Dask arrays are chunked and evaluated lazily, a dataset of this size (about 7.45 GiB as `float64`) can be handled without loading it all into memory at once.

In [3]:
darray = generate_darray(shape=(1_000, 1_000, 1_000), chunks=(100, 100, 100), fmt="drandom")
darray

| | Array | Chunk |
|---|---|---|
| Bytes | 7.45 GiB | 7.63 MiB |
| Shape | (1000, 1000, 1000) | (100, 100, 100) |
| Dask graph | 1000 chunks in 1 graph layer | |
| Data type | float64 numpy.ndarray | |
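The sizes reported for the array can be checked by hand. The short calculation below is only an arithmetic sketch: it uses the shape and chunking passed to `generate_darray()` and assumes 8 bytes per element, as the `float64` data type implies.

```python
import math

shape = (1_000, 1_000, 1_000)
chunks = (100, 100, 100)
itemsize = 8  # bytes per float64 element

array_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize
# Number of chunks along each axis, multiplied together.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(f"array:  {array_bytes / 2**30:.2f} GiB")  # array:  7.45 GiB
print(f"chunk:  {chunk_bytes / 2**20:.2f} MiB")  # chunk:  7.63 MiB
print(f"chunks: {n_chunks}")                     # chunks: 1000
```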


Next, we generate **indexes corresponding to the darray** using `generate_fancy_indexes()`. The `padding` parameter ensures that the points are placed safely inside the array boundaries.

In [4]:
indexes = generate_fancy_indexes(darray, n=1_000, padding=5)
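The structure returned by `generate_fancy_indexes()` is not shown in this notebook. The NumPy sketch below is therefore an assumption, not the library's implementation: it guesses that each of the `n` points is a centre kept at least `padding` cells from every edge, and that a window of `2 * padding` cells is read around it along each axis. That guess would be consistent with the `(1000, 10, 10, 10)` result shape reported in the next section. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

data = rng.random((50, 50, 50))  # small stand-in for the 1,000^3 darray
n, padding = 4, 5                # 4 points instead of 1,000

# Assumed behaviour: centres stay `padding` cells away from each edge,
# so a window of +/- padding never leaves the array.
centres = np.stack(
    [rng.integers(padding, size - padding, size=n) for size in data.shape],
    axis=1,
)  # shape (n, 3)

offsets = np.arange(-padding, padding)  # 10 offsets: -5 .. 4

# One broadcastable index array per axis.
ix = (centres[:, 0, None] + offsets)[:, :, None, None]  # (n, 10, 1, 1)
iy = (centres[:, 1, None] + offsets)[:, None, :, None]  # (n, 1, 10, 1)
iz = (centres[:, 2, None] + offsets)[:, None, None, :]  # (n, 1, 1, 10)

windows = data[ix, iy, iz]  # the three index arrays broadcast together
print(windows.shape)  # (4, 10, 10, 10)
```

Each `windows[i]` equals the slice `data[cx-5:cx+5, cy-5:cy+5, cz-5:cz+5]` around centre `i`. Without the padding, a centre near an edge would produce negative or out-of-range indices.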

## Patched Vindex

We then use **`patched_vindex()`** to perform fast and flexible indexing on our synthetic dataset. Inside the `with` block, this context manager temporarily **overrides Dask’s built-in `vindex`** with the optimized version provided by Fast-Vindex.

In [5]:
with patched_vindex():
    result = darray.vindex[indexes]
result

| | Array | Chunk |
|---|---|---|
| Bytes | 7.63 MiB | 7.63 MiB |
| Shape | (1000, 10, 10, 10) | (1000, 10, 10, 10) |
| Dask graph | 1 chunks in 2 graph layers | |
| Data type | float64 numpy.ndarray | |
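The temporary override performed by `patched_vindex()` follows the general context-manager patching pattern. The sketch below illustrates that pattern on a toy class. It is not Fast-Vindex's implementation, and `ToyArray`, `fast_lookup`, and `patched_lookup` are hypothetical names used only for this example.

```python
from contextlib import contextmanager


class ToyArray:
    """Toy stand-in for a class whose method gets patched."""

    def lookup(self, key):
        return f"default lookup of {key}"


def fast_lookup(self, key):
    """Replacement installed while the context manager is active."""
    return f"fast lookup of {key}"


@contextmanager
def patched_lookup():
    """Temporarily replace ToyArray.lookup, restoring it on exit."""
    original = ToyArray.lookup
    ToyArray.lookup = fast_lookup
    try:
        yield
    finally:
        # Runs even if the body raises, so the patch never leaks out.
        ToyArray.lookup = original


arr = ToyArray()
with patched_lookup():
    inside = arr.lookup("x")
outside = arr.lookup("x")

print(inside)   # fast lookup of x
print(outside)  # default lookup of x
```

Scoping the override to a `with` block keeps the rest of the program on the default behaviour, which makes it easy to compare the two code paths side by side.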


At this stage, the indexing is performed **lazily**, meaning that Dask only builds the computation graph without actually loading data into memory.

To obtain the actual values, we trigger the computation with **`.compute()`**, which runs the graph on the Dask cluster started earlier:

In [6]:
values = result.compute()

After this step, `values` is an in-memory NumPy array of shape `(1000, 10, 10, 10)` containing the data extracted from the darray at the locations defined by `indexes`. This is the kind of complex, high-dimensional indexing that **Fast-Vindex** is designed to speed up.
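The lazy-then-compute behaviour can be seen on a small array using Dask's built-in `vindex`, without Fast-Vindex. This is a minimal sketch with made-up data: the index lists `rows` and `cols` select the points (0, 1), (1, 2), and (3, 5).

```python
import numpy as np
import dask.array as da

source = np.arange(24).reshape(4, 6)
x = da.from_array(source, chunks=(2, 3))

rows = [0, 1, 3]
cols = [1, 2, 5]

# Pointwise indexing: builds a task graph, reads no data yet.
lazy = x.vindex[rows, cols]
print(isinstance(lazy, da.Array), lazy.shape)  # True (3,)

# compute() executes the graph and returns a NumPy array.
computed = lazy.compute()
print(computed.tolist())  # [1, 8, 23]
```

The result matches NumPy's pointwise fancy indexing `source[rows, cols]`, which gives a simple reference for checking any alternative `vindex` implementation on small inputs.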