# Preliminaries
## Installation
To run this tutorial, install the following libraries:

In [None]:
!pip install bridge-ds
!pip install pycocotools

## Imports

In [None]:
import tempfile
from pathlib import Path

import holoviews as hv
import panel as pn

hv.extension("bokeh")
pn.extension()
# If running in Google Colab, patch HoloViews so that its objects render correctly in the notebook
try:
    import google.colab  # noqa

    def _render(self, **kwargs):
        hv.extension("bokeh")
        return hv.Store.render(self)

    hv.core.Dimensioned._repr_mimebundle_ = _render
except ModuleNotFoundError:
    pass

TMP_NOTEBOOK_ROOT = Path(tempfile.mkdtemp()) / "basics" / "table_api"

## Loading a dataset

The recommended way to create BridgeDS Dataset objects is through a **DatasetProvider**. Here we use the `Coco2017Detection` provider:

In [None]:
from bridge.display.vision import Holoviews
from bridge.providers.vision import Coco2017Detection

root_dir = TMP_NOTEBOOK_ROOT / "coco"

provider = Coco2017Detection(root_dir, split="train", img_source="stream")
ds = provider.build_dataset(display_engine=Holoviews(bbox_format="xywh"))
ds

# Table API

BridgeDS offers two complementary ways to view a dataset: the **Sample API** and the **Table API**. This tutorial covers the latter.

The Table API can be loosely described as follows:
> A dataset can be viewed as a table in which every row represents a single element. Each element has a unique id, and all elements belonging to the same Sample share its `sample_id`.
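To make the table view concrete, here is a plain-pandas sketch of two such tables built from hypothetical toy data. It is not BridgeDS output: the `element_id` level name, the ids and the columns are assumptions for illustration; only the `sample_id` index level matches a name used later in this tutorial.

```python
import pandas as pd

# Hypothetical toy tables (not produced by BridgeDS): two images and three bboxes.
# Every row is one element; "element_id" is an assumed name for the unique id,
# while "sample_id" is shared by all elements of the same sample.
toy_samples = pd.DataFrame(
    {"file_name": ["a.jpg", "b.jpg"]},
    index=pd.MultiIndex.from_tuples(
        [(1, 100), (2, 200)], names=["sample_id", "element_id"]
    ),
)
toy_anns = pd.DataFrame(
    {"category": ["cat", "dog", "cat"]},
    index=pd.MultiIndex.from_tuples(
        [(1, 101), (1, 102), (2, 201)], names=["sample_id", "element_id"]
    ),
)

# Element ids are unique across both tables...
element_ids = toy_samples.index.get_level_values("element_id").append(
    toy_anns.index.get_level_values("element_id")
)
print(element_ids.is_unique)

# ...while sample_id ties each bbox to its image (2 bboxes in sample 1, 1 in sample 2).
bboxes_per_sample = toy_anns.groupby("sample_id").size()
print(bboxes_per_sample)
```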

## Tables
As in the previous tutorial, we split the elements semantically into two groups: **ds.samples**, containing the images, and **ds.annotations**, containing the bboxes:

In [None]:
ds.samples.head()

In [None]:
ds.annotations.head()

## Filter
Tables make it easy to filter images or bboxes with familiar Pandas syntax. Note that when filtering samples, BridgeDS automatically filters out the corresponding annotations:

In [None]:
print("Original dataset:")
print(ds, "\n")
print("Keep only images (and their bboxes) with license < 3:")
print(ds.select_samples(lambda samples, anns: samples.license < 3), "\n")
print("Filter out all bboxes with iscrowd == 0. This leaves some images without any bboxes:")
print(ds.select_annotations(lambda samples, anns: anns.iscrowd != 0), "\n")
print("Chain both selectors to filter out these bboxes and then drop the images left empty:")
print(
    ds.select_annotations(lambda samples, anns: anns.iscrowd != 0).select_samples(
        # Keep only samples whose sample_id still appears in the annotations table
        lambda samples, anns: samples.index.get_level_values("sample_id").isin(
            anns.index.get_level_values("sample_id")
        )
    )
)
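The cascading behavior described above can be mimicked with plain pandas. This is a sketch of the idea on hypothetical toy tables, not BridgeDS's actual implementation; the `element_id` level name and the `ids_of` helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy tables; "element_id" is an assumed level name.
toy_samples = pd.DataFrame(
    {"license": [1, 4, 2]},
    index=pd.MultiIndex.from_tuples(
        [(1, 100), (2, 200), (3, 300)], names=["sample_id", "element_id"]
    ),
)
toy_anns = pd.DataFrame(
    {"iscrowd": [0, 1, 0, 0]},
    index=pd.MultiIndex.from_tuples(
        [(1, 101), (1, 102), (2, 201), (3, 301)], names=["sample_id", "element_id"]
    ),
)


def ids_of(table: pd.DataFrame) -> pd.Index:
    """Return the sample_id level of a table's index (hypothetical helper)."""
    return table.index.get_level_values("sample_id")


# Selecting samples: keep the rows where the mask is True, then keep only the
# annotations whose sample_id survived.
kept_samples = toy_samples[toy_samples.license < 3]
kept_anns = toy_anns[ids_of(toy_anns).isin(ids_of(kept_samples))]

# Selecting annotations first, then dropping images left without any bbox.
crowd_anns = toy_anns[toy_anns.iscrowd != 0]
non_empty_samples = toy_samples[ids_of(toy_samples).isin(ids_of(crowd_anns))]
```

Expressing both steps through the shared `sample_id` level is what keeps the two tables consistent after either kind of filter.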

## Assign
We can add new columns to either `ds.samples` or `ds.annotations` using familiar syntax. Let's add an `n_bboxes` column holding the number of bboxes in each sample:

In [None]:
ds = ds.assign_samples(
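    # Count bboxes per sample_id and align the counts with the samples table;
    # images without any bbox get a count of 0 instead of NaN
    n_bboxes=lambda samples, anns: anns.groupby("sample_id")
    .size()
    .reindex(samples.index.get_level_values("sample_id"), fill_value=0)
    .values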
)
ds.samples.head()
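The counting pattern used for `n_bboxes` is plain pandas, so it can be checked on toy data. This sketch uses hypothetical tables in which one image has no bboxes. It shows that `Series.reindex` leaves `NaN` for sample ids missing from the counts, and that passing `fill_value=0` records those images as having zero bboxes.

```python
import pandas as pd

# Hypothetical toy tables: sample 2 has no bboxes at all.
toy_samples = pd.DataFrame(
    {"file_name": ["a.jpg", "b.jpg", "c.jpg"]},
    index=pd.MultiIndex.from_tuples(
        [(1, 100), (2, 200), (3, 300)], names=["sample_id", "element_id"]
    ),
)
toy_anns = pd.DataFrame(
    {"category": ["cat", "dog", "cat"]},
    index=pd.MultiIndex.from_tuples(
        [(1, 101), (1, 102), (3, 301)], names=["sample_id", "element_id"]
    ),
)

sample_ids = toy_samples.index.get_level_values("sample_id")
per_sample = toy_anns.groupby("sample_id").size()

# Plain reindex leaves NaN for sample ids that never appear in the annotations.
with_nan = per_sample.reindex(sample_ids)
# fill_value=0 counts those samples as having zero bboxes and keeps an integer dtype.
n_bboxes = per_sample.reindex(sample_ids, fill_value=0)

toy_samples = toy_samples.assign(n_bboxes=n_bboxes.values)
print(toy_samples)
```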

## Sorting
We can sort the tables using familiar Pandas syntax:

In [None]:
sorted_ds = ds.sort_samples("n_bboxes", ascending=False)
sorted_ds.samples.head()

Note that sorting the samples table also changes the positional index used by the Sample API (`ds.iget`). The next cell therefore shows the dataset ordered from the most bboxes per image to the fewest:

In [None]:
sorted_ds.show()
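The effect of sorting on positional access resembles the difference between row labels and `DataFrame.iloc` positions in pandas. The following sketch on hypothetical toy data only illustrates that idea and does not call BridgeDS. It assumes, as the note above suggests, that position-based access such as `ds.iget` follows the row order of the samples table.

```python
import pandas as pd

# Hypothetical toy samples table with a precomputed n_bboxes column.
toy_samples = pd.DataFrame(
    {"n_bboxes": [2, 0, 5]},
    index=pd.MultiIndex.from_tuples(
        [(1, 100), (2, 200), (3, 300)], names=["sample_id", "element_id"]
    ),
)
toy_sorted = toy_samples.sort_values("n_bboxes", ascending=False)

# Sorting keeps every row's labels but changes which row sits at each position:
# position 0 moves from sample 1 to sample 3, the one with the most bboxes.
first_before = toy_samples.iloc[0].name
first_after = toy_sorted.iloc[0].name
print(first_before, first_after)
```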