# PyTorch: Using `AISShardReader` to Read WebDataset-Formatted Shards

The `AISShardReader` class reads WebDataset-formatted shards from AIStore, either from object URLs or from buckets passed in directly. Iterating over an `AISShardReader` instance yields tuples of the form `(basename, content_dict)`. `basename` is the sample's basename, and `content_dict` maps each file extension (e.g. `"png"`) to the raw bytes of the associated file. For example, if a shard contains a sample made up of a `cls` file and a `png` file, each iteration gives you that sample's basename along with a dictionary whose keys are `"cls"` and `"png"`.
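To make the `(basename, content_dict)` structure concrete, here is a minimal pure-Python sketch that groups the members of a tar shard by basename and yields tuples of the same shape. The helpers `split_key`, `iter_samples`, and `make_shard` are hypothetical and are not part of the AIStore SDK. The key-splitting rule (everything before the first `.` in the file name is the basename, the rest is the extension) is an assumption based on the common WebDataset convention.

```python
import io
import tarfile


def split_key(member_name):
    """Split 'dir/sample0.seg.png' into ('dir/sample0', 'seg.png').

    Assumed WebDataset convention: the basename is the path up to the first
    '.' in the final path component; everything after it is the extension.
    """
    directory, _, filename = member_name.rpartition("/")
    stem, _, ext = filename.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(shard_bytes):
    """Yield (basename, {extension: bytes}) tuples from an in-memory tar shard.

    Hypothetical helper, not the AISShardReader implementation. It assumes
    the files of each sample are stored next to each other in the archive.
    """
    current_key, current = None, {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            # A new basename means the previous sample is complete.
            if key != current_key and current:
                yield current_key, current
                current = {}
            current_key = key
            current[ext] = tar.extractfile(member).read()
    if current:
        yield current_key, current


def make_shard(files):
    """Build an uncompressed tar archive in memory from (name, bytes) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


shard = make_shard([
    ("sample0.cls", b"3"),
    ("sample0.png", b"\x89PNG..."),  # placeholder bytes, not a real image
    ("sample1.cls", b"7"),
    ("sample1.png", b"\x89PNG..."),
])

for basename, content_dict in iter_samples(shard):
    print(basename, sorted(content_dict))
# sample0 ['cls', 'png']
# sample1 ['cls', 'png']
```

The values stay as raw bytes, mirroring what the reader hands you; decoding them is left to your own code.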

### Import the necessary packages

In [None]:
from aistore.sdk import Client
from aistore.pytorch.shard_reader import AISShardReader

### Run an AIStore cluster, either locally or elsewhere, and configure the endpoint and bucket you want to use

In [None]:
AIS_ENDPOINT = "http://localhost:8080"
AIS_PROVIDER = "ais"
BCK_NAME = "test-data"

client = Client(endpoint=AIS_ENDPOINT)
bucket = client.bucket(BCK_NAME, AIS_PROVIDER).create(exist_ok=True)

### Populate the bucket with WebDataset formatted shards using the AIS CLI

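To download the entire test set (ten shards, `publaynet-train-000000.tar` through `publaynet-train-000009.tar`) into the `ais://test-data` bucket, run: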

```console
ais start download "https://storage.googleapis.com/webdataset/testdata/publaynet-train-{000000..000009}.tar" ais://test-data
```
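The `{000000..000009}` part of the URL is a range template that the downloader expands into one URL per shard. As a quick sanity check of what should end up in the bucket, the hypothetical helper below expands the same template in pure Python. It only handles a single numeric `{start..end}` range and keeps the zero padding of the start value; it also assumes, without confirming it here, that the downloader names each object after the last segment of its URL.

```python
import re


def expand_range_template(template):
    """Expand the first '{start..end}' numeric range in a URL template.

    Hypothetical helper; zero padding follows the width of the start value.
    Templates without a range are returned unchanged as a one-item list.
    """
    match = re.search(r"\{(\d+)\.\.(\d+)\}", template)
    if match is None:
        return [template]
    start, end = match.group(1), match.group(2)
    width = len(start)
    prefix, suffix = template[: match.start()], template[match.end():]
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(start), int(end) + 1)]


urls = expand_range_template(
    "https://storage.googleapis.com/webdataset/testdata/publaynet-train-{000000..000009}.tar"
)
print(len(urls))                     # 10
print(urls[0].rsplit("/", 1)[-1])    # publaynet-train-000000.tar
print(urls[-1].rsplit("/", 1)[-1])   # publaynet-train-000009.tar
```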

You can use your own data here as well; just make sure the bucket contains shards in WebDataset format.
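If you are preparing your own data, a WebDataset shard is an ordinary tar archive in which each sample's files share a basename and sit next to each other. The sketch below writes such a shard with Python's standard `tarfile` module. The helper `write_shard`, the sample names, and the output path are hypothetical; this is one simple way to produce the layout, not an AIStore tool.

```python
import io
import os
import tarfile
import tempfile


def write_shard(samples, path):
    """Write samples to a WebDataset-style tar shard (hypothetical helper).

    `samples` maps basename -> {extension: bytes}. Each sample's files are
    written back to back as '<basename>.<extension>', so every sample stays
    contiguous inside the archive.
    """
    with tarfile.open(path, mode="w") as tar:
        for basename, files in samples.items():
            for ext, data in files.items():
                info = tarfile.TarInfo(f"{basename}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


# Hypothetical output location; any writable path works.
shard_path = os.path.join(tempfile.gettempdir(), "my-shard-000000.tar")
write_shard(
    {
        "sample0": {"cls": b"3", "txt": b"a cat"},
        "sample1": {"cls": b"7", "txt": b"a dog"},
    },
    shard_path,
)

with tarfile.open(shard_path) as tar:
    print(tar.getnames())
# ['sample0.cls', 'sample0.txt', 'sample1.cls', 'sample1.txt']
```

You could then upload the file to the bucket, for example with the AIS CLI's `ais put` command (e.g. `ais put my-shard-000000.tar ais://test-data`, assuming your CLI version accepts that form).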

### Create an `AISShardReader` and use it to read your bucket

In [None]:
shard_reader = AISShardReader(bucket_list=bucket)

# The WebDataset format groups several files into one sample, keyed by their shared basename
for basename, content_dict in shard_reader:
    print(basename, list(content_dict.keys()))
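Because every value in `content_dict` is raw bytes, you typically decode each entry according to its extension before using it. The sketch below shows one way to do that. The helper `decode_sample` and the per-extension rules (`cls` holds an integer label, `json` and `txt` hold UTF-8 text) are assumptions about typical WebDataset conventions; anything else, such as `png`, is passed through unchanged so you can hand it to an image library of your choice.

```python
import json


def decode_sample(content_dict):
    """Decode raw shard bytes by file extension (hypothetical helper).

    Assumed conventions: 'cls' is an ASCII integer label, 'json' is a UTF-8
    JSON document, 'txt' is UTF-8 text. Other extensions stay as bytes.
    """
    decoded = {}
    for ext, data in content_dict.items():
        if ext == "cls":
            decoded[ext] = int(data.decode("utf-8").strip())
        elif ext == "json":
            decoded[ext] = json.loads(data)
        elif ext == "txt":
            decoded[ext] = data.decode("utf-8")
        else:
            decoded[ext] = data
    return decoded


sample = {"cls": b"3\n", "json": b'{"width": 612}', "png": b"\x89PNG"}
print(decode_sample(sample))
# {'cls': 3, 'json': {'width': 612}, 'png': b'\x89PNG'}
```

Inside the reader loop you would call `decode_sample(content_dict)` on each sample before passing it on.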

### You can also wrap the reader in a PyTorch `DataLoader`

In [None]:
from torch.utils.data import DataLoader

loader = DataLoader(shard_reader, batch_size=60, num_workers=4)

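# With the default collate function, basenames is a list of up to batch_size names,
# and content_dicts is one dict mapping each extension to a list of that many byte strings
for basenames, content_dicts in loader:
    print(len(basenames), list(content_dicts.keys()))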
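With PyTorch's default collate function, each batch of `(basename, content_dict)` tuples is transposed: the dicts are merged by key into a single dict, using the first sample's keys, so a sample that lacks one of those extensions raises a `KeyError`. If your shards mix samples with different extensions, a custom collate function that keeps the dicts separate avoids this. The function `collate_samples` below is a hypothetical sketch, not part of the AIStore SDK.

```python
def collate_samples(batch):
    """Keep a batch as (list of basenames, list of content dicts).

    Hypothetical collate function for DataLoader(collate_fn=...). Unlike the
    default collation, it does not merge the dicts by key, so samples with
    different extensions can share a batch.
    """
    basenames = [basename for basename, _ in batch]
    content_dicts = [content_dict for _, content_dict in batch]
    return basenames, content_dicts


batch = [
    ("sample0", {"cls": b"3", "png": b"..."}),
    ("sample1", {"cls": b"7"}),  # a sample without a "png" file
]
names, dicts = collate_samples(batch)
print(names)                        # ['sample0', 'sample1']
print([sorted(d) for d in dicts])   # [['cls', 'png'], ['cls']]
```

To use it, pass it to the loader, e.g. `DataLoader(shard_reader, batch_size=60, num_workers=4, collate_fn=collate_samples)`; each iteration then yields a list of basenames and a list of per-sample dicts.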