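SpeechBrain is an open-source, PyTorch-based toolkit for speech processing that covers automatic speech recognition (ASR) among other tasks. Models are assembled from modules declared in a YAML file, so it is straightforward to swap in a smaller encoder when you need a lightweight model. Here’s how you can train a small ASR model using SpeechBrain with a Common Voice-style dataset: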

### Step 1: Install SpeechBrain and Dependencies
First, install SpeechBrain and other necessary packages:

In [None]:
%pip install speechbrain torchaudio datasets

### Step 2: Load and Preprocess the Common Voice Dataset
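Use the datasets library to load the data. The code below pulls JaepaX/corean_dataset from the Hugging Face Hub and assumes it follows the Common Voice layout, with a path column for the audio file and a sentence column for the transcript: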

In [None]:
import torchaudio
from datasets import load_dataset
import speechbrain as sb
from speechbrain.dataio.dataio import read_audio
from speechbrain.dataio.batch import PaddedBatch

# Load the train and test splits (assumes the dataset defines both)
common_voice_train = load_dataset("JaepaX/corean_dataset", split="train")
common_voice_test = load_dataset("JaepaX/corean_dataset", split="test")

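# Audio decoding and target creation are deferred to the data pipeline in Step 4.
# Decoding every clip here with map() would store each waveform in the dataset,
# and the new "speech"/"target" columns would duplicate the items that the
# Step 4 pipeline provides under the same names.
# Keep only the columns the pipeline reads (assumes "path" and "sentence" exist).
common_voice_train = common_voice_train.select_columns(["path", "sentence"])
common_voice_test = common_voice_test.select_columns(["path", "sentence"])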

### Step 3: Define the ASR Model
Define the model architecture using SpeechBrain’s Brain class:

In [None]:
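import torch
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Define the model
class ASR(sb.Brain):
    def compute_forward(self, batch, stage):
        # Move the padded batch to the training device
        batch = batch.to(self.device)
        wavs, wav_lens = batch.speech
        # Encode the waveforms, then project each frame to token scores
        feats = self.modules.wav2vec2(wavs)
        logits = self.modules.output(feats)
        # A CTC loss expects log-probabilities, not raw logits
        log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
        return log_probs, wav_lens

    def compute_objectives(self, predictions, batch, stage):
        log_probs, wav_lens = predictions
        # target_encoded (integer token IDs) must be provided by the data pipeline
        targets, target_lens = batch.target_encoded
        # Assumes compute_cost is a CTC loss whose blank index is set in hyperparams.yaml
        loss = self.hparams.compute_cost(log_probs, targets, wav_lens, target_lens)
        return loss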

    def fit_batch(self, batch):
        predictions = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(predictions, batch, sb.Stage.TRAIN)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.detach()

    def evaluate_batch(self, batch, stage):
        predictions = self.compute_forward(batch, stage)
        loss = self.compute_objectives(predictions, batch, stage)
        return loss.detach()

# Load the hyperparameters file
with open('hyperparams.yaml') as fin:
    hparams = load_hyperpyyaml(fin)
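
In compute_forward, batch.speech unpacks into a padded tensor and a vector of lengths. SpeechBrain reports those lengths relative to the longest item in the batch (values between 0 and 1), and the loss uses them to ignore padding. The plain-Python sketch below mimics that convention on lists; pad_with_relative_lengths is a hypothetical helper written for illustration, not a SpeechBrain function.

```python
def pad_with_relative_lengths(signals, pad_value=0.0):
    """Right-pad variable-length sequences and return relative lengths.

    Hypothetical helper: each returned length is the sequence length
    divided by the length of the longest sequence in the batch.
    """
    if not signals:
        raise ValueError("signals must contain at least one sequence")
    max_len = max(len(s) for s in signals)
    if max_len == 0:
        raise ValueError("at least one sequence must be non-empty")
    # Pad every sequence on the right up to the longest one
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in signals]
    # Express each original length as a fraction of the longest
    rel_lens = [len(s) / max_len for s in signals]
    return padded, rel_lens


padded, rel_lens = pad_with_relative_lengths([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6]])
print(padded)    # second sequence is padded with two zeros
print(rel_lens)  # [1.0, 0.5]
```

Multiplying a relative length by the padded length recovers the number of valid samples, which is how a length-aware loss can mask the padded tail.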

### Step 4: Define Data Pipelines
Set up data pipelines for loading and tokenizing the dataset:

In [None]:
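# Define the data pipeline
def dataio_prep(hparams):
    # Wrap the Hugging Face datasets so SpeechBrain can attach dynamic items
    datasets = {
        "train": sb.dataio.dataset.DynamicItemDataset.from_arrow_dataset(common_voice_train),
        "test": sb.dataio.dataset.DynamicItemDataset.from_arrow_dataset(common_voice_test),
    }
    all_sets = list(datasets.values())

    # Audio pipeline: load the waveform from its file path.
    # read_audio does not resample; wav2vec2 models typically expect 16 kHz input.
    @sb.utils.data_pipeline.takes("path")
    @sb.utils.data_pipeline.provides("speech")
    def audio_pipeline(path):
        return read_audio(path)

    # Text pipeline: keep the transcript, split it into characters,
    # and encode the characters as integer token IDs for the loss
    label_encoder = sb.dataio.encoder.CTCTextEncoder()

    @sb.utils.data_pipeline.takes("sentence")
    @sb.utils.data_pipeline.provides("target", "char_list", "target_encoded")
    def text_pipeline(sentence):
        yield sentence
        char_list = list(sentence)
        yield char_list
        yield label_encoder.encode_sequence_torch(char_list)

    # add_dynamic_item registers one pipeline function per call
    sb.dataio.dataset.add_dynamic_item(all_sets, audio_pipeline)
    sb.dataio.dataset.add_dynamic_item(all_sets, text_pipeline)

    # Build the character vocabulary from the training transcripts.
    # Blank at index 0 is an assumption; it must match the loss configuration.
    label_encoder.insert_blank(index=0)
    label_encoder.add_unk()
    label_encoder.update_from_didataset(
        datasets["train"], output_key="char_list", sequence_input=True
    )

    sb.dataio.dataset.set_output_keys(
        all_sets, ["id", "speech", "target", "target_encoded"]
    )

    return datasets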
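
compute_objectives in Step 3 reads batch.target_encoded, so the text side of the pipeline has to turn each transcript into integer token IDs. The sketch below shows the idea at character level in plain Python, reserving index 0 for the CTC blank symbol. build_char_vocab and encode are hypothetical names, and the blank index is an assumption that must match the loss configuration; in practice SpeechBrain's own encoders (for example CTCTextEncoder) would do this job.

```python
BLANK = "<blank>"  # assumed blank symbol, placed at index 0


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer ID."""
    chars = sorted({ch for text in transcripts for ch in text})
    vocab = {BLANK: 0}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert a transcript into a list of token IDs (KeyError on unseen characters)."""
    return [vocab[ch] for ch in text]


vocab = build_char_vocab(["ab", "ba c"])
print(vocab)                 # {'<blank>': 0, ' ': 1, 'a': 2, 'b': 3, 'c': 4}
print(encode("cab", vocab))  # [4, 2, 3]
```

The size of this vocabulary, blank included, is also the number of outputs the final layer of the model needs.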

### Step 5: Training Configuration
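Configure the training process in a hyperparams.yaml file, which defines the modules, optimizer, loss, and training settings that the other steps read from hparams.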
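
The contents of hyperparams.yaml are not reproduced here, but the code in this guide reads a fixed set of entries from it: modules (with wav2vec2 and output), optimizer, compute_cost, output_folder, epochs, and batch_size. A small check like the one below can catch a missing entry before training starts. missing_hparams is a hypothetical helper, and it assumes hparams behaves like the dictionary returned by load_hyperpyyaml.

```python
# Entries that the code in this guide looks up in hparams
REQUIRED_KEYS = (
    "modules",
    "optimizer",
    "compute_cost",
    "output_folder",
    "epochs",
    "batch_size",
)
# Sub-entries that the ASR class accesses through self.modules
REQUIRED_MODULES = ("wav2vec2", "output")


def missing_hparams(hparams):
    """Return the names of required entries that are absent from hparams."""
    missing = [key for key in REQUIRED_KEYS if key not in hparams]
    modules = hparams.get("modules") or {}
    missing += [f"modules.{name}" for name in REQUIRED_MODULES if name not in modules]
    return missing


example = {
    "modules": {"wav2vec2": object()},
    "optimizer": None,
    "epochs": 10,
    "batch_size": 8,
}
print(missing_hparams(example))
# ['compute_cost', 'output_folder', 'modules.output']
```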

### Step 6: Train the Model
Finally, create a training script and start training:

In [None]:
# Import the necessary modules
import os
import torch

# Load the hyperparameters
with open("hyperparams.yaml") as fin:
    hparams = load_hyperpyyaml(fin)

# Create the datasets
datasets = dataio_prep(hparams)

# Initialize the Brain object
asr_brain = ASR(
    modules=hparams["modules"],
    opt_class=hparams["optimizer"],
    hparams=hparams,
    run_opts={"device": "cuda" if torch.cuda.is_available() else "cpu"},
    checkpointer=sb.utils.checkpoints.Checkpointer(hparams["output_folder"]),
)

# Train the model
asr_brain.fit(
    epoch_counter=sb.utils.epoch_loop.EpochCounter(limit=hparams["epochs"]),
    train_set=datasets["train"],
    valid_set=datasets["test"],
    train_loader_kwargs={"batch_size": hparams["batch_size"]},
    valid_loader_kwargs={"batch_size": hparams["batch_size"]},
)
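
The training loop above reports only the loss on the held-out split. ASR systems are usually compared by word error rate (WER): the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal plain-Python sketch follows; word_error_rate is a hypothetical helper for illustration, and an established WER implementation would normally be used instead.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a c d"))  # 0.25 (one deletion out of four words)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.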

This guide provides a basic setup for training a smaller ASR model using SpeechBrain and the Common Voice dataset. Adjust parameters, paths, and other configurations as needed for your specific use case.