In [None]:
try:
    import matplotlib
    import squirrel
    import squirrel_datasets_core
except ImportError:
    !pip install -q --ignore-requires-python --upgrade squirrel-datasets-core matplotlib # noqa
    import matplotlib
    import squirrel
    import squirrel_datasets_core

print(squirrel.__version__)
print(squirrel_datasets_core.__version__)

# Introduction

Squirrel lets you efficiently load and share datasets with `Catalog` and `Driver`,
apply transformations in a performant and scalable way with `Iterstream`,
and store datasets in the format most convenient for deep learning applications with `SquirrelStore`.

Let's see an example.

In [None]:
from squirrel.catalog import Catalog

# initialize the catalog with the built-in datasets registered by installed plugins
cat = Catalog.from_plugins()

# get a driver for the cifar10 dataset and an iterator over its samples
driver = cat["cifar10"].get_driver()
it = driver.get_iter()

# retrieve a single sample from cifar10
sample = it.take(1).collect()[0]

In [None]:
import matplotlib.pyplot as plt

# plot sample and label
plt.title(f"Class: {sample[1]}")
plt.imshow(sample[0])

In this example, we use `Catalog` to load the `cifar10` dataset.
`driver.get_iter()` returns a `Composable` object from the `iterstream` module, which lets us apply transformations with convenient methods such as `map` and `filter`.

Let's see an example of these transformations.

In [None]:
from squirrel.iterstream import IterableSource

it = (
    IterableSource([1, 2, 3])
    .map(lambda x: x + 1)  # 2, 3, 4
    .async_map(lambda x: x ** 2)  # 4, 9, 16, computed asynchronously
    .filter(lambda x: x % 2 == 0)  # keep only even values: 4, 16
)
for i in it:
    print(i)
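To make the lazy chaining above more concrete, here is a minimal pure-Python sketch of the same pattern. `MiniComposable` is a hypothetical class written only for illustration; it is not Squirrel's `Composable`, it supports just `map`, `filter`, `take` and `collect`, and it has no asynchronous execution. Each method wraps the previous stage in a generator, so no element is computed until the chain is iterated.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


class MiniComposable:
    """Hypothetical, simplified stand-in for a lazy, chainable iterator.

    NOT squirrel's implementation; it only illustrates the chaining pattern.
    """

    def __init__(self, source: Iterable) -> None:
        self._source = source

    def __iter__(self) -> Iterator:
        return iter(self._source)

    def map(self, fn: Callable) -> "MiniComposable":
        # Wrap in a generator: fn runs only when an element is requested.
        return MiniComposable(fn(x) for x in self)

    def filter(self, pred: Callable) -> "MiniComposable":
        return MiniComposable(x for x in self if pred(x))

    def take(self, n: int) -> "MiniComposable":
        # islice stops after n elements without pulling further items upstream.
        return MiniComposable(islice(self, n))

    def collect(self) -> List:
        # Materialize the whole (remaining) stream into a list.
        return list(self)


# Same pipeline as the cell above, but synchronous.
result = (
    MiniComposable([1, 2, 3])
    .map(lambda x: x + 1)
    .map(lambda x: x ** 2)
    .filter(lambda x: x % 2 == 0)
    .collect()
)
print(result)  # [4, 16]

# Laziness: take(1).collect() only evaluates the first element.
calls = []


def record(x):
    calls.append(x)
    return x


first = MiniComposable(range(1000)).map(record).take(1).collect()
print(first, calls)  # [0] [0]
```

The `take(1).collect()[0]` idiom used to fetch a single CIFAR-10 sample follows the same idea: only as many samples as requested are pulled through the pipeline. Note that in this sketch each stage after the source is single-pass, because it wraps a generator.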

`Iterstream` can scale your data loading up (on one machine) or out (across machines) through asynchronous execution, using either a local executor or a Dask cluster, and through just-in-time compilation with Numba.
Be sure to check out the documentation.
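The idea behind asynchronous mapping on a local executor can be sketched with the standard library alone. `async_map_sketch` below is a hypothetical helper, not Squirrel's `async_map`, and its parameters (`buffer`, `executor`) are assumptions chosen for the illustration. It submits each item to a `concurrent.futures` executor (a fresh thread pool by default), keeps at most `buffer` tasks in flight, and yields results in input order by always waiting on the oldest pending future. Scaling out to a Dask cluster and Numba compilation are outside the scope of this sketch.

```python
import time
from collections import deque
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Optional


def async_map_sketch(
    fn: Callable,
    items: Iterable,
    buffer: int = 4,
    executor: Optional[Executor] = None,
) -> Iterator:
    """Hypothetical illustration: apply fn concurrently, yield results in input order."""
    own_executor = executor is None
    pool = ThreadPoolExecutor() if own_executor else executor
    pending: deque = deque()
    try:
        for item in items:
            pending.append(pool.submit(fn, item))
            if len(pending) >= buffer:
                # Block on the oldest future so output order matches input order.
                yield pending.popleft().result()
        # Drain the remaining in-flight tasks, still in submission order.
        while pending:
            yield pending.popleft().result()
    finally:
        # Only shut down an executor this function created itself.
        if own_executor:
            pool.shutdown(wait=True)


def slow_square(x):
    time.sleep(0.01)  # simulate I/O-bound work, e.g. reading a file
    return x ** 2


squares = list(async_map_sketch(slow_square, [2, 3, 4]))
print(squares)  # [4, 9, 16]

# A caller-provided executor can be reused across pipelines.
with ThreadPoolExecutor(max_workers=2) as ex:
    doubled = list(async_map_sketch(lambda x: 2 * x, range(5), executor=ex))
print(doubled)  # [0, 2, 4, 6, 8]
```

Bounding the number of in-flight futures keeps memory use flat on long streams, while waiting on the oldest future trades a little parallelism for deterministic ordering.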