### How to use
The `datasets` library lets you load and pre-process your dataset in pure Python, at scale. The `load_dataset` function downloads the dataset to your local drive and prepares it in a single call.

For example, to download the Hindi config, specify the corresponding language config name (i.e., "hi" for Hindi):

```python
from datasets import load_dataset

cv_13 = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train")
```

The `datasets` library can also stream the dataset on the fly if you pass `streaming=True` to `load_dataset`. In streaming mode, samples are loaded one at a time as you iterate over the dataset, so the entire dataset is never downloaded to disk.

```python
from datasets import load_dataset

cv_13 = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train", streaming=True)

print(next(iter(cv_13)))
```
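To illustrate why `next(iter(cv_13))` returns after fetching a single sample in streaming mode, here is a minimal sketch of lazy iteration with a plain Python generator. `fake_remote_stream` is a hypothetical stand-in for a remote dataset, not part of `datasets`. It records which samples were actually produced, so you can check that pulling one sample produces only one.

```python
from itertools import islice
from typing import Dict, Iterator, List


def fake_remote_stream(num_samples: int, fetched: List[int]) -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for a streamed dataset (not the datasets API).

    Each sample is produced only when the consumer asks for it; the index of
    every sample actually produced is appended to `fetched`.
    """
    for i in range(num_samples):
        fetched.append(i)
        yield {"id": i, "sentence": f"sample {i}"}


fetched: List[int] = []
stream = fake_remote_stream(1_000_000, fetched)

# Only sample 0 is produced here, even though the "dataset" has a million rows.
first = next(iter(stream))
print(first, fetched)

# Taking three more samples advances the same generator lazily.
more = list(islice(stream, 3))
print([s["id"] for s in more], fetched)
```

With `datasets` itself, a streamed dataset offers `cv_13.take(3)` to read only the first few samples.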

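### Local

To batch the downloaded dataset for training, pass it to a PyTorch `DataLoader`. Here a `BatchSampler` wrapped around a `RandomSampler` yields shuffled batches of 32 indices, and `drop_last=False` keeps the final, smaller batch.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from torch.utils.data.sampler import BatchSampler, RandomSampler

cv_13 = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train")
# Draw shuffled batches of 32 indices; keep the last, possibly smaller, batch.
batch_sampler = BatchSampler(RandomSampler(cv_13), batch_size=32, drop_last=False)
dataloader = DataLoader(cv_13, batch_sampler=batch_sampler)
```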
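Common Voice clips have different durations, so PyTorch's default collation will typically fail to stack their waveforms into one tensor when a batch holds more than one sample. Below is a minimal sketch of a padding `collate_fn`. It assumes each sample's `audio` field decodes to a dict with a 1-D `array` and a `sampling_rate` (as in `datasets` versions that return audio as dicts), and that all clips in a batch share one sampling rate. `pad_collate` and the output keys `input_values` and `lengths` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List

import numpy as np


def pad_collate(batch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical collate_fn: right-pad variable-length waveforms with zeros.

    Assumes each sample looks like
    {"audio": {"array": <1-D floats>, "sampling_rate": int}, "sentence": str}.
    """
    arrays = [np.asarray(sample["audio"]["array"], dtype=np.float32) for sample in batch]
    lengths = np.array([len(a) for a in arrays], dtype=np.int64)

    # Allocate a zero-filled (batch, max_len) matrix and copy each clip into it.
    padded = np.zeros((len(arrays), int(lengths.max())), dtype=np.float32)
    for i, a in enumerate(arrays):
        padded[i, : len(a)] = a

    return {
        "input_values": padded,
        "lengths": lengths,  # original lengths, e.g. for building attention masks
        "sampling_rate": batch[0]["audio"]["sampling_rate"],  # assumed shared
        "sentence": [sample["sentence"] for sample in batch],
    }
```

You would then pass it as `DataLoader(cv_13, batch_sampler=batch_sampler, collate_fn=pad_collate)`. Returning the original lengths alongside the padded matrix lets a model ignore the padding; resampling and feature extraction are deliberately left out of this sketch.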

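### Streaming

To batch a streamed dataset, pass `streaming=True` to `load_dataset` and hand the resulting iterable dataset to a `DataLoader`. A stream cannot be indexed, so use `batch_size` rather than a `RandomSampler`.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

cv_13 = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train", streaming=True)
# Samples are fetched lazily as the DataLoader iterates over the stream.
dataloader = DataLoader(cv_13, batch_size=32)
```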
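Because a stream cannot be indexed, exact shuffling with a `RandomSampler` is not available. The `datasets` documentation describes `IterableDataset.shuffle(seed=..., buffer_size=...)` as approximate shuffling: it fills a fixed-size buffer and samples from it at random, replacing each emitted example with the next one from the stream (it also shuffles the order of data shards). The sketch below illustrates only that buffer technique in plain Python; `buffer_shuffle` is a hypothetical helper, not the library's implementation.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer (hypothetical helper)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the incoming item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # The stream is exhausted: emit the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffer_shuffle(range(20), buffer_size=5, seed=0)))
```

With `datasets`, you would call something like `cv_13.shuffle(seed=42, buffer_size=1_000)` before wrapping the stream in the `DataLoader`. A larger buffer approximates a full shuffle more closely, at the cost of holding more samples in memory.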