[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/merantix-momentum/squirrel-datasets-core/blob/main/examples/01.Getting_Started.ipynb)

In [None]:
try:
    import matplotlib
    import squirrel
    import squirrel_datasets_core

    restart_colab = False
except ImportError:
    # packages missing: install them with IPython shell magic (Jupyter/Colab only)
    !pip install -q --ignore-requires-python --upgrade squirrel-datasets-core matplotlib # noqa
    import matplotlib
    import squirrel
    import squirrel_datasets_core

    restart_colab = True

print(squirrel.__version__)
print(squirrel_datasets_core.__version__)


If you run this tutorial in Google Colab, a few tweaks are unfortunately needed to make it work. This step is skipped automatically if Colab is not detected:

In [None]:
try:
    import google.colab

    if restart_colab:
        !pip install -q --upgrade --force-reinstall pyyaml==5.4.1

        # need to restart kernel
        import os

        os.kill(os.getpid(), 9)
except ImportError:
    # not in colab
    pass

# Introduction

Squirrel lets you efficiently load and share existing datasets using `Catalog` and `Driver`,
apply transformations in a performant and scalable way using `Iterstream`,
and store datasets in the layout most convenient for deep learning applications with `SquirrelStore`.
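To make the roles of `Catalog` and `Driver` concrete before the real example, here is a toy, pure-Python sketch that mirrors the `cat["name"].get_driver().get_iter()` call chain used below. `ToyCatalog`, `ToyEntry` and `ToyDriver` are hypothetical names invented for illustration; this is not Squirrel's implementation, whose drivers read from real storage rather than an in-memory list.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Sample = Tuple[str, int]  # (image placeholder, label)


class ToyDriver:
    """Hypothetical driver: knows how to read one dataset (here, an in-memory list)."""

    def __init__(self, samples: List[Sample]) -> None:
        self._samples = samples

    def get_iter(self) -> Iterator[Sample]:
        # Hand out an iterator so samples are consumed one at a time.
        return iter(self._samples)


class ToyEntry:
    """Hypothetical catalog entry: builds its driver only when asked."""

    def __init__(self, make_driver: Callable[[], ToyDriver]) -> None:
        self._make_driver = make_driver

    def get_driver(self) -> ToyDriver:
        return self._make_driver()


class ToyCatalog:
    """Hypothetical catalog: maps dataset names to entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, ToyEntry] = {}

    def __setitem__(self, name: str, entry: ToyEntry) -> None:
        self._entries[name] = entry

    def __getitem__(self, name: str) -> ToyEntry:
        return self._entries[name]


toy_cat = ToyCatalog()
toy_cat["toy"] = ToyEntry(lambda: ToyDriver([("img0", 3), ("img1", 7)]))
first = next(toy_cat["toy"].get_driver().get_iter())
print(first)  # ('img0', 3)
```

In this toy, separating the catalog (which datasets exist) from the driver (how to read one) means the same name lookup could hand back drivers for very different storage backends.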

Let's see an example.

In [None]:
from squirrel.catalog import Catalog

# init catalog with in-built datasets
cat = Catalog.from_plugins()

# access training images from the CIFAR-10 dataset
driver = cat["cifar10"].get_driver()
it = driver.get_iter()

# retrieve a single sample from CIFAR-10: an (image, label) pair
sample = it.take(1).collect()[0]

In [None]:
import matplotlib.pyplot as plt

# plot sample and label
plt.title(f"Class: {sample[1]}")
plt.imshow(sample[0])

In this example, we use `Catalog` to load the CIFAR-10 dataset.
`driver.get_iter()` returns a `Composable` object from the `squirrel.iterstream` package (`Iterstream`), which lets us apply transformations through convenient methods such as `map` and `filter`.
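The idea behind chaining `map`, `filter` and `take` on a `Composable` can be sketched with plain generators. This is a minimal sketch assuming lazy evaluation, which the `take(1).collect()` pattern above suggests; `ToyComposable` is a hypothetical, single-pass stand-in, and Squirrel's actual `Composable` has many more methods and may work differently internally.

```python
import itertools
from typing import Any, Callable, Iterable, Iterator, List


class ToyComposable:
    """Hypothetical, simplified stand-in for a lazily chained iterator (single pass)."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = source

    def __iter__(self) -> Iterator[Any]:
        return iter(self._source)

    def map(self, fn: Callable[[Any], Any]) -> "ToyComposable":
        # Wrap the source in a generator; fn is not called until iteration.
        return ToyComposable(fn(x) for x in self._source)

    def filter(self, pred: Callable[[Any], bool]) -> "ToyComposable":
        return ToyComposable(x for x in self._source if pred(x))

    def take(self, n: int) -> "ToyComposable":
        # islice stops pulling from upstream once n items have been produced.
        return ToyComposable(itertools.islice(self._source, n))

    def collect(self) -> List[Any]:
        return list(self._source)


calls: List[int] = []


def record(x: int) -> int:
    calls.append(x)  # remember which inputs were actually processed
    return x * 10


pipeline = ToyComposable(range(100)).map(record).filter(lambda x: x % 20 == 0).take(2)
print(calls)  # [] : building the chain ran nothing
result = pipeline.collect()
print(result)  # [0, 20]
print(calls)  # [0, 1, 2] : only as many inputs as needed were pulled
```

Because each step only wraps the previous iterator, work happens on demand, which is what allows taking one sample from a large dataset without reading the rest.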

Let's see an example of these transformations.

In [None]:
from squirrel.iterstream import IterableSource

it = IterableSource([1, 2, 3]).map(lambda x: x + 1).async_map(lambda x: x**2).filter(lambda x: x % 2 == 0)
for i in it:
    print(i)
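To check what the pipeline above computes, here is the same logic written with built-in generator expressions. `async_map` is replaced by an ordinary synchronous step, so only the values are comparable, not the execution model; assuming `async_map` keeps input order, the cell above should likewise print 4 and 16 (1 → 2 → 4 and 3 → 4 → 16 are even, while 2 → 3 → 9 is filtered out).

```python
source = [1, 2, 3]
incremented = (x + 1 for x in source)  # map(lambda x: x + 1)
squared = (x**2 for x in incremented)  # async_map(lambda x: x**2), run synchronously here
evens = [x for x in squared if x % 2 == 0]  # filter(lambda x: x % 2 == 0)
print(evens)  # [4, 16]
```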

`Iterstream` can scale your data loading up (on one machine) or out (across machines) through asynchronous execution on a local executor or a Dask cluster, and through just-in-time compilation with Numba.
Be sure to check out the documentation.
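As a rough illustration of asynchronous execution on a local executor, the sketch below applies a function concurrently with a standard-library thread pool while yielding results in input order and keeping at most `buffer_size` calls in flight. It assumes an order-preserving, bounded-buffer design; `bounded_async_map` and `buffer_size` are hypothetical names, and this is not Squirrel's `async_map` implementation.

```python
import concurrent.futures
import time
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bounded_async_map(
    fn: Callable[[T], R],
    items: Iterable[T],
    executor: concurrent.futures.Executor,
    buffer_size: int = 8,
) -> Iterator[R]:
    """Hypothetical helper: run fn concurrently, yield results in input order."""
    pending: Deque[concurrent.futures.Future] = deque()
    for item in items:
        pending.append(executor.submit(fn, item))
        if len(pending) >= buffer_size:
            # Wait for the oldest call first, so output order matches input order
            # and memory stays bounded even for very long input streams.
            yield pending.popleft().result()
    # Input exhausted: drain the calls that are still in flight.
    while pending:
        yield pending.popleft().result()


def slow_square(x: int) -> int:
    time.sleep(0.01)  # stand-in for I/O such as fetching a sample
    return x * x


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(bounded_async_map(slow_square, range(10), pool, buffer_size=4))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Threads suit I/O-bound steps such as reading from storage; for CPU-bound work, a `concurrent.futures.ProcessPoolExecutor` could be passed instead, provided `fn` and its arguments are picklable.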