# Preliminaries

## Imports

In [None]:
import tempfile
from pathlib import Path

import holoviews as hv
import hvplot.pandas  # noqa
import panel as pn

hv.extension("bokeh")
pn.extension()
# If running in Google Colab, apply a workaround that lets HoloViews render properly
try:
    import google.colab  # noqa

    def _render(self, **kwargs):
        hv.extension("bokeh")
        return hv.Store.render(self)

    hv.core.Dimensioned._repr_mimebundle_ = _render
except ModuleNotFoundError:
    pass

TMP_NOTEBOOK_ROOT = Path(tempfile.mkdtemp()) / "processing_data" / "sample_transform"

## Loading the dataset

In [None]:
from bridge.providers.vision import Coco2017Detection

root_dir = TMP_NOTEBOOK_ROOT / "coco"

provider = Coco2017Detection(root_dir, split="val", img_source="stream")
ds = provider.build_dataset()
ds

# SampleTransforms

Data in Bridge is manipulated through SampleTransforms. Recall that data is stored in Bridge Elements rather than in Samples; in many cases, however, we want to transform all Elements in a Sample together (for example, crop an image and remove all bboxes that fall outside the crop).

Bridge uses SampleTransforms in two contexts:

- `new_sample = sample.transform(sample_transform)` - transform a single sample.
- `new_ds = ds.transform_samples(sample_transform)` - iterate over all samples and transform each one.
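
To make the idea of transforming all Elements of a Sample together more concrete, here is a minimal sketch in plain Python. It does not use Bridge at all: `ToySample`, `crop_sample` and the free function `transform_samples` are hypothetical stand-ins invented for this illustration, and the real Bridge classes may work quite differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for illustration only; these are NOT Bridge classes.
BBox = Tuple[int, int, int, int]  # (x, y, width, height), COCO-style


@dataclass(frozen=True)
class ToySample:
    image: List[List[int]]  # rows of pixel values
    bboxes: List[BBox]


def crop_sample(sample: ToySample, x0: int, y0: int, w: int, h: int) -> ToySample:
    """Crop the image and keep only the boxes that lie fully inside the crop."""
    cropped = [row[x0:x0 + w] for row in sample.image[y0:y0 + h]]
    kept = [
        (bx - x0, by - y0, bw, bh)  # shift the box into crop coordinates
        for (bx, by, bw, bh) in sample.bboxes
        if bx >= x0 and by >= y0 and bx + bw <= x0 + w and by + bh <= y0 + h
    ]
    return ToySample(image=cropped, bboxes=kept)


def transform_samples(
    samples: List[ToySample], fn: Callable[[ToySample], ToySample]
) -> List[ToySample]:
    """Dataset-level counterpart: apply the same transform to every sample."""
    return [fn(s) for s in samples]


# A 4x6 "image" with one box inside the left 4x4 region and one outside it.
sample = ToySample(
    image=[[c + 10 * r for c in range(6)] for r in range(4)],
    bboxes=[(1, 1, 2, 2), (4, 0, 2, 2)],
)
cropped = crop_sample(sample, x0=0, y0=0, w=4, h=4)
print(cropped.bboxes)
```

The point of the sketch is that the image and its boxes must change in the same step. Cropping only the image would leave a box that refers to pixels which no longer exist.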

## An Example

We will use `AlbumentationsCompose`, a subclass of `SampleTransform`, to take a sample from our dataset and flip it:

In [None]:
import albumentations as A
import holoviews as hv
import panel as pn

from bridge.primitives.sample.transform.vision import AlbumentationsCompose

hv.extension("bokeh")
pn.extension()


def flip_sample(sample):
    # p=1.0 applies the flip on every call; `always_apply` is deprecated in recent Albumentations releases
    transform = AlbumentationsCompose(albm_transforms=[A.HorizontalFlip(p=1.0)], bbox_format="coco")
    flipped_sample = sample.transform(transform)
    return flipped_sample


sample = ds.iget(2)
flipped = flip_sample(sample)

opts = dict(frame_width=300)
hv.Layout([sample.show(sample_plot_kwargs=opts), flipped.show(sample_plot_kwargs=opts)])

We would like to apply this transformation to the entire dataset; how can we do this?

At first glance, this may seem rather straightforward: perform `sample.transform()` iteratively (i.e. call `ds.transform_samples()`) over the entire dataset, and we have successfully transformed our dataset.

Well... not exactly. Observe the following:

In [None]:
print(sample._element._load_mechanism.url_or_data)
print(flipped._element._load_mechanism.url_or_data)

See how `url_or_data` has changed for the flipped sample? The LoadMechanism of the original sample is configured simply to load an image from a URL; once we apply an augmentation, the new image is no longer the same as the source. To keep the new image, we need to store it somewhere, detached from the original source.

By default, `sample.transform()` saves the new data to RAM, but keeping an entire transformed dataset in RAM does not scale to large datasets.
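
As a rough mental model of what was printed above, the following toy sketch shows a loader whose `url_or_data` field holds a URL before the transform and the pixel data itself afterwards. `ToyLoadMechanism`, `fake_fetch` and `flip_horizontally` are hypothetical names invented for this illustration; this is an assumption about the general idea, not Bridge's actual LoadMechanism implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Image = List[List[int]]  # rows of pixel values


# Hypothetical illustration only; this is NOT Bridge's LoadMechanism.
@dataclass(frozen=True)
class ToyLoadMechanism:
    url_or_data: Union[str, Image]

    def load(self, fetch: Callable[[str], Image]) -> Image:
        # A string is treated as a URL to fetch lazily; anything else is
        # assumed to be the image data itself, already held in memory.
        if isinstance(self.url_or_data, str):
            return fetch(self.url_or_data)
        return self.url_or_data


def fake_fetch(url: str) -> Image:
    """Stand-in for a network download; always returns the same tiny image."""
    return [[1, 2, 3], [4, 5, 6]]


def flip_horizontally(
    mech: ToyLoadMechanism, fetch: Callable[[str], Image]
) -> ToyLoadMechanism:
    """Load the pixels, flip each row, and keep the result in memory."""
    flipped_pixels = [row[::-1] for row in mech.load(fetch)]
    # The flipped image exists at no URL, so the new mechanism must hold the data.
    return ToyLoadMechanism(url_or_data=flipped_pixels)


original = ToyLoadMechanism("https://example.com/image.jpg")
flipped = flip_horizontally(original, fake_fetch)

print(type(original.url_or_data).__name__)
print(type(flipped.url_or_data).__name__)
```

In this toy version every transformed image stays in a Python object in memory, which is exactly the behaviour that stops scaling once the dataset grows.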

To understand how we can solve this issue, and why the `url_or_data` property changed in the first place, proceed to the next tutorial, where we talk about **CacheMechanisms**.