# Preliminaries
## Installation
To run this tutorial, install the following libraries:

```
!pip install bridge-ds
!pip install pycocotools
```

## Imports

```python
import tempfile
from pathlib import Path

import holoviews as hv
import panel as pn

hv.extension("bokeh")
pn.extension()

# In Google Colab, patch HoloViews' rich display so that plots render correctly
try:
    import google.colab  # noqa

    def _render(self, **kwargs):
        hv.extension("bokeh")
        return hv.Store.render(self)

    hv.core.Dimensioned._repr_mimebundle_ = _render
except ModuleNotFoundError:
    pass

# Temporary working directory for this notebook's data
TMP_NOTEBOOK_ROOT = Path(tempfile.mkdtemp()) / "basics" / "sample_api"
```

## Loading a dataset

The recommended way to create Dataset objects is through a **DatasetProvider**. Here we use the `Coco2017Detection` provider:

```python
from bridge.display.vision import Holoviews
from bridge.providers.vision import Coco2017Detection

root_dir = TMP_NOTEBOOK_ROOT / "coco"

provider = Coco2017Detection(root_dir)
# COCO bounding boxes use the [x, y, width, height] layout
ds = provider.build_dataset(display_engine=Holoviews(bbox_format="xywh"))
ds
```
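The `bbox_format="xywh"` argument appears to tell the Holoviews display engine how to read box coordinates. COCO stores each box as `[x, y, width, height]`, where `(x, y)` is the top-left corner. For comparison, the minimal plain-Python sketch below converts that layout to corner coordinates `[x_min, y_min, x_max, y_max]`; the function `xywh_to_xyxy` is a hypothetical helper, not part of BridgeDS.

```python
def xywh_to_xyxy(box):
    """Hypothetical helper: [x, y, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    # the bottom-right corner is the top-left corner shifted by width and height
    return [x, y, x + w, y + h]


print(xywh_to_xyxy([10.0, 20.0, 30.0, 40.0]))  # [10.0, 20.0, 40.0, 60.0]
```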


# Sample API

In BridgeDS, we use two complementary approaches to view datasets, which we call the **Sample API** and the **Table API**. This tutorial covers the former.

The Sample API can be loosely described as follows:
> A dataset can be viewed as a collection of samples, where samples are Python objects (`Sample`) that each contain a collection of elements.
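To picture this model before using the real API, here is a minimal sketch in plain Python. It mirrors the attribute names used later in this tutorial (`id`, `etype`, `data`, `elements`) and assumes that a sample's elements are grouped by element type into lists. `ToyElement`, `ToySample`, and `make_toy_sample` are hypothetical stand-ins, not BridgeDS's implementation, and the IDs and payloads are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyElement:
    """Hypothetical container for one piece of data (an image, a bbox, a label, ...)."""
    id: str
    etype: str  # element type, e.g. "image" or "bbox"
    data: Any   # the actual payload


@dataclass
class ToySample:
    """Hypothetical sample: elements grouped by element type."""
    id: int
    elements: Dict[str, List[ToyElement]] = field(default_factory=dict)

    def __len__(self) -> int:
        # total number of elements across all element types
        return sum(len(elist) for elist in self.elements.values())


def make_toy_sample(sample_id: int, elements: List[ToyElement]) -> ToySample:
    # group the flat list of elements by their etype, preserving order
    grouped: Dict[str, List[ToyElement]] = defaultdict(list)
    for el in elements:
        grouped[el.etype].append(el)
    return ToySample(id=sample_id, elements=dict(grouped))


toy_sample = make_toy_sample(
    34,
    [
        ToyElement("img-0", "image", "<pixels>"),
        ToyElement("bb-0", "bbox", [10, 20, 30, 40]),
        ToyElement("bb-1", "bbox", [50, 60, 15, 25]),
    ],
)
print(len(toy_sample))  # 3
print({etype: len(elist) for etype, elist in toy_sample.elements.items()})  # {'image': 1, 'bbox': 2}
```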

Let's demonstrate how to use it:

## Indexing

`ds.get` and `ds.iget` are our equivalents of pandas' `df.loc` and `df.iloc`, respectively. Both fetch individual samples from the dataset: `get` looks a sample up by its index, and `iget` by its integer position:

```python
sample = ds.get(34)  # get the sample whose index is 34
print("Sample ID:", sample.id)
sample = ds.iget(1)  # get the sample at positional index 1
print("Sample ID:", sample.id)
```
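To make the index-versus-position distinction concrete, the sketch below uses a hypothetical `ToyDataset` whose samples are keyed by index in an arbitrary order. It only illustrates the `loc`/`iloc`-style semantics described above; it is not how BridgeDS stores samples, and the sample names are made up.

```python
class ToyDataset:
    """Hypothetical stand-in illustrating index-based vs positional lookup."""

    def __init__(self, samples_by_id):
        # the dict's insertion order defines the positional order
        self._samples = dict(samples_by_id)

    def get(self, sample_id):
        # index-based lookup, analogous to df.loc
        return self._samples[sample_id]

    def iget(self, position):
        # positional lookup, analogous to df.iloc
        return list(self._samples.values())[position]


toy_ds = ToyDataset({34: "sample-34", 7: "sample-7", 12: "sample-12"})
print(toy_ds.get(34))                   # sample-34
print(toy_ds.iget(1))                   # sample-7
print(toy_ds.get(7) == toy_ds.iget(1))  # True
```

Because index and position are independent, `get(n)` and `iget(n)` can return different samples for the same `n`.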

## Properties
The sample object itself is rather lean: it exposes only its _id_, its _elements_, and its _display_engine_.

As a reminder, _elements_ in BridgeDS can be anything, from raw data such as images or text to annotations such as bounding boxes, segmentation maps, or class labels.

Let's see what elements our current sample has:


```python
print("Sample ID:", sample.id)
print("Total num elements in sample:", len(sample), "\n")
# sample.elements maps each element type to the list of elements of that type
for etype, elist in sample.elements.items():
    print(f"Num elements with etype={etype}:", len(elist))
```

We see one image element and two bboxes. A single "raw data" element (the image) accompanied by multiple "annotation" elements is a common pattern. For this reason, COCO is implemented using **SingularSample**, a subclass of Sample with a more convenient API: the main element is available as `sample.element`, and the remaining elements are available under `sample.annotations`:

```python
print("The 'sample element' (the image):")
print(f"class: {type(sample).__name__}")
print(f"etype: {sample.element.etype}")
print(f"image shape: {sample.element.data.shape}")
print(f"element_id: {sample.element.id}\n")

print("The annotation elements:")
print(f"n_bboxes: {len(sample.annotations['bbox'])}")
# print the raw data of each bbox element
for bb_element in sample.annotations["bbox"]:
    print(bb_element.data)
```

As you can see, **elements** are containers for the actual data, which they expose through the `.data` property. A **sample** is simply a collection of **elements** and represents an individual example from the dataset.
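The split between `sample.element` and `sample.annotations` can be pictured with the minimal sketch below. It assumes that the main element plus the annotations together make up the sample's full element collection; `ToySingularSample` and its merged `elements` property are hypothetical illustrations, not BridgeDS's `SingularSample` implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyElement:
    id: str
    etype: str
    data: Any


@dataclass
class ToySingularSample:
    """Hypothetical sketch: one main element plus annotations grouped by etype."""
    id: int
    element: ToyElement
    annotations: Dict[str, List[ToyElement]] = field(default_factory=dict)

    @property
    def elements(self) -> Dict[str, List[ToyElement]]:
        # unified view: the main element first, then every annotation etype
        merged = {self.element.etype: [self.element]}
        for etype, elist in self.annotations.items():
            merged.setdefault(etype, []).extend(elist)
        return merged


toy_singular = ToySingularSample(
    id=1,
    element=ToyElement("img-1", "image", "<pixels>"),
    annotations={
        "bbox": [
            ToyElement("bb-0", "bbox", [1, 2, 3, 4]),
            ToyElement("bb-1", "bbox", [5, 6, 7, 8]),
        ]
    },
)
print(toy_singular.element.etype)             # image
print(len(toy_singular.annotations["bbox"]))  # 2
for bb in toy_singular.annotations["bbox"]:
    print(bb.data)
print({k: len(v) for k, v in toy_singular.elements.items()})  # {'image': 1, 'bbox': 2}
```

Keeping the main element in its own attribute saves callers from writing `sample.elements["image"][0]` for the most common access.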

## The DisplayEngine
The DisplayEngine is covered in a separate tutorial. For now, it is enough to know that both Dataset and Sample objects expose a `.show()` method, which uses the DisplayEngine to render them:

```python
sample.show()
```
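Since each sample exposes its _display_engine_, one plausible way to picture `.show()` is as delegation: the object hands itself to its engine, which does the rendering. The sketch below illustrates only that general pattern. `ToyDisplayEngine`, `show_sample`, and `ToyShowableSample` are hypothetical names; the actual DisplayEngine interface is described in its own tutorial.

```python
class ToyDisplayEngine:
    """Hypothetical engine that knows how to render a sample."""

    def show_sample(self, sample):
        # a real engine would build a plot; here we just return a string
        return f"<rendered sample {sample.id}>"


class ToyShowableSample:
    def __init__(self, sample_id, display_engine):
        self.id = sample_id
        self.display_engine = display_engine

    def show(self):
        # the sample does no rendering itself; it delegates to its engine
        return self.display_engine.show_sample(self)


print(ToyShowableSample(34, ToyDisplayEngine()).show())  # <rendered sample 34>
```

With this design, swapping the engine (as done with `display_engine=Holoviews(...)` when building the dataset) changes how objects are rendered without changing the sample classes.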

```python
# display the entire dataset in an interactive interface
ds.show()
```

As you can see, our class labels are integers rather than strings, because that is how they are stored in the raw COCO annotations. To learn how to change this, or more generally how to perform dataset-wide operations, proceed to the next tutorial about the **Table API**.
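In the raw COCO annotation files, each integer label refers to an entry in the top-level `categories` list, which pairs an `id` with a `name`. Assuming the labels shown above are these COCO category IDs, the mapping itself is a plain lookup. The sketch below uses a hand-copied subset of those entries and a made-up `labels` list; it does not show how BridgeDS applies such a mapping across a dataset.

```python
# A subset of the COCO 2017 "categories" entries
categories = [
    {"id": 1, "name": "person"},
    {"id": 17, "name": "cat"},
    {"id": 18, "name": "dog"},
]
# build an id -> name lookup table
id_to_name = {cat["id"]: cat["name"] for cat in categories}

labels = [18, 1, 1]  # integer class labels, as stored in the annotations
print([id_to_name[label] for label in labels])  # ['dog', 'person', 'person']
```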