# Extraction

## Overview

The `pytcube.extraction` function extracts localized data from a spatiotemporal `DataCube` around the observations defined in a `Dataset`. Users define a buffer around each observation, and each extraction yields a smaller, more manageable data subset known as a Minicube.
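To make the idea of a buffer concrete, the sketch below cuts a small window out of a single coordinate axis around an observation. It is illustrative only: `window_around` is a hypothetical helper, not part of `pytcube`, and it assumes the buffer is a symmetric half-width expressed in the same units as the axis. How `pytcube` itself interprets the buffer is not specified in this section.

```python
from bisect import bisect_left, bisect_right


def window_around(axis, value, half_width):
    """Return the slice of a sorted axis lying within +/- half_width of value.

    Hypothetical helper for illustration; not part of pytcube.
    """
    start = bisect_left(axis, value - half_width)
    stop = bisect_right(axis, value + half_width)
    return slice(start, stop)


# A toy longitude axis and one observation located at 1.25
lon_axis = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
window = window_around(lon_axis, value=1.25, half_width=0.5)

# The grid points that would end up in this observation's subset
minicube_lon = lon_axis[window]
```

Applying the same windowing along each of the time, longitude, and latitude axes gives a small cube centred on the observation, which is the intuition behind a Minicube.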

## Function Signature

```python
pytcube.extraction(
    datacube: xr.Dataset,
    dataset: xr.Dataset,
    buffer: dict[str, int],
    n_obs_per_batch: int,
    optimize: bool = False,
    path: str | Path | None = None,
    chunk: dict[str, Any] = {'time': 1, 'lon': 'auto', 'lat': 'auto'}
) -> list
```

## Parameters

- `datacube` `(xr.Dataset)`: The input DataCube containing spatiotemporal data structured with the dimensions `time`, `lon`, and `lat`. This dataset should include the relevant variables for extraction.
- `dataset` `(xr.Dataset)`: The input Dataset consisting of observations. Each observation should be associated with specific spatiotemporal coordinates.
- `buffer` `(dict[str, int])`: A dictionary that defines the size of the buffer around each observation for the extraction. It should include the following keys:
    - `time`: The buffer size in the time dimension (minutes in the example below).
    - `lon`: The buffer size in the longitude dimension (kilometers in the example below).
    - `lat`: The buffer size in the latitude dimension (kilometers in the example below).
- `n_obs_per_batch` `(int)`: The number of observations to process in each extraction batch.
- `optimize` `(bool, optional)`: A flag to indicate whether to optimize the extraction process. The default value is `False`.
- `path` `(str | Path | None, optional)`: The path where the extracted data should be saved, if necessary. The default is `None`.
- `chunk` `(dict[str, Any], optional)`: A dictionary that specifies the chunking strategy for the output data. The default is `{'time': 1, 'lon': 'auto', 'lat': 'auto'}`.
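As a rough illustration of what `n_obs_per_batch` implies, the sketch below splits a number of observations into consecutive index ranges of at most that size. The helper `split_into_batches` is hypothetical, and grouping observations consecutively is an assumption: the way `pytcube` actually assigns observations to batches is not described here.

```python
def split_into_batches(n_obs, n_obs_per_batch):
    """Return (start, stop) index ranges covering n_obs observations.

    Hypothetical helper for illustration; not part of pytcube.
    """
    if n_obs_per_batch <= 0:
        raise ValueError("n_obs_per_batch must be a positive integer")
    return [
        (start, min(start + n_obs_per_batch, n_obs))
        for start in range(0, n_obs, n_obs_per_batch)
    ]


# 120 observations with 50 per batch: two full batches and one partial batch
ranges = split_into_batches(n_obs=120, n_obs_per_batch=50)
```

Under this assumption, a smaller `n_obs_per_batch` produces more batches that each hold fewer observations, and a larger value produces fewer, bigger batches.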

## Returns

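- `(list)`: A list of batches, where each batch contains the extracted Minicubes corresponding to a group of observations from the input Dataset. In the example below, each batch is displayed as a lazy `Delayed` object rather than as computed data.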

## Example Usage

```python
import pytcube
from pytcube.utils import datacube, dataset

# Define buffer sizes
buffer = {
    "time": 360,  # in minutes
    "lon": 100,  # in kilometers
    "lat": 100,  # in kilometers
}

# Specify number of observations per batch
n_obs_per_batch = 50

# Perform extraction
batches = pytcube.extraction(
    datacube=datacube,
    dataset=dataset,
    buffer=buffer,
    n_obs_per_batch=n_obs_per_batch,
)

# Display the results
batches
```

```text
2024-10-31 15:03:43,529|INFO    |Formatting of the datacube                          |datacube.py:28@_format()
2024-10-31 15:03:43,530|INFO    |Formatting of the dataset                           |dataset.py:34@_format()
2024-10-31 15:03:43,531|INFO    |Calculation of buffer indices                       |datacube.py:42@get_ibuffer()
2024-10-31 15:03:43,533|INFO    |Setup of the DataCube and the Dataset               |processing.py:44@setup()
2024-10-31 15:03:43,547|INFO    |Processing of observations and creation of batches  |processing.py:190@processing()

[Delayed('batch-8ee283b5-4259-4af2-a433-1a3e1e1304e2'),
 Delayed('batch-13f1e61c-d65c-4f14-a742-d31cd5e3000c')]
```
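The `Delayed(...)` entries in the output above look like Dask's lazy task objects, which hold a computation that has not run yet. Assuming that is what `pytcube.extraction` returns, the batches would need to be computed explicitly before their contents can be inspected. The sketch below shows the general pattern with stand-in tasks: `fake_batch` is a hypothetical placeholder for a real batch, and the strings it returns merely stand in for Minicubes.

```python
import dask
from dask import delayed


@delayed
def fake_batch(batch_id):
    """Hypothetical stand-in for one extraction batch; not part of pytcube."""
    return [f"minicube-{batch_id}-{i}" for i in range(3)]


# Building the list runs nothing yet: each element is a lazy Delayed object
lazy_batches = [fake_batch(0), fake_batch(1)]

# dask.compute evaluates all tasks and returns a tuple of concrete results
results = dask.compute(*lazy_batches)
```

With real output from `pytcube.extraction`, the same call would take the form `dask.compute(*batches)`, again assuming the returned objects are Dask `Delayed` instances.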