#### Reference

First, please upvote and refer to [@snnclsr](https://www.kaggle.com/snnclsr)'s discussions and inference [notebook](https://www.kaggle.com/code/snnclsr/0-444-optimize-decoding-parameters-with-optuna).


Second, please upvote this one :)

## What this notebook features

- I wanted to showcase the impact of finetuning the models on the competition dataset.
- The current version uses a model finetuned on only 10% of the competition training data.
- I will publish the training code in the upcoming days. You can refer to this [dataset]()

Public models from Hugging Face:
* `https://huggingface.co/ai4bharat/indicwav2vec_v1_bengali` for the Wav2Vec2 CTC model only
* `https://huggingface.co/arijitx/wav2vec2-xls-r-300m-bengali` for the language model

I didn't train these models on the competition data at all. I just wanted to know the public models' score as a baseline.

So we may get a better score by fine-tuning on the competition data.

**Note: I only finetuned indicwav2vec_v1_bengali, which is a CTC model. I am still using the public LM mentioned above.**


### Everything above PLUS

- How to find the best decoding params using Optuna on the validation set.

The [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode/tree/main) developers suggest performing a parameter search, because it can improve results on specific tasks, especially for languages other than English such as ours.

> (Note: pyctcdecode contains several free hyperparameters that can strongly influence error rate and wall time. Default values for these parameters were (merely) chosen in order to yield good performance for one particular use case. For best results, especially when working with languages other than English, users are encouraged to perform a hyperparameter optimization study on their own data.)

So we will try to find the best parameters on the validation split of the train dataset (because of time constraints we will only use 5k). Here is the list of decoding params for easy access:

```python
# from: https://github.com/kensho-technologies/pyctcdecode/blob/main/pyctcdecode/constants.py
# default parameters for decoding (can be modified)
DEFAULT_ALPHA = 0.5
DEFAULT_BETA = 1.5
DEFAULT_UNK_LOGP_OFFSET = -10.0
DEFAULT_BEAM_WIDTH = 100
DEFAULT_HOTWORD_WEIGHT = 10.0
DEFAULT_PRUNE_LOGP = -10.0
DEFAULT_PRUNE_BEAMS = False
DEFAULT_MIN_TOKEN_LOGP = -5.0
DEFAULT_SCORE_LM_BOUNDARY = True

# other constants for decoding
AVG_TOKEN_LEN = 6  # average number of characters expected per token (used for UNK scoring)
MIN_TOKEN_CLIP_P = 1e-15  # clipping to avoid underflow in case of malformed logit input
LOG_BASE_CHANGE_FACTOR = 1.0 / math.log10(math.e)  # kenlm returns base10 but we like natural
```
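
To make the role of `alpha` and `beta` concrete, here is a minimal sketch of how an LM score could be folded into a beam's score during shallow fusion. It is a simplified illustration, not pyctcdecode's actual implementation: the function name `fused_word_score` and the example numbers are assumptions, and real decoding also handles partial words, unknown tokens, and hotwords.

```python
import math

# Same conversion as in the constants above: KenLM reports log10, we want natural log.
LOG_BASE_CHANGE_FACTOR = 1.0 / math.log10(math.e)


def fused_word_score(acoustic_logp, lm_log10_p, alpha, beta):
    """Hypothetical helper: score of one finished word under shallow fusion.

    acoustic_logp: natural-log score of the word from the CTC model
    lm_log10_p:    log10 probability of the word from the n-gram LM
    alpha:         weight of the LM term
    beta:          per-word bonus that offsets the LM's bias toward short outputs
    """
    lm_logp = lm_log10_p * LOG_BASE_CHANGE_FACTOR
    return acoustic_logp + alpha * lm_logp + beta


# Made-up numbers, only to show the direction of each knob.
no_lm = fused_word_score(-2.0, -1.0, alpha=0.0, beta=0.0)
with_lm = fused_word_score(-2.0, -1.0, alpha=0.4, beta=0.5)
print(no_lm, with_lm)
```

With `alpha=0` and `beta=0` the LM is ignored and only the acoustic score remains; raising `alpha` makes the LM's opinion count more, and raising `beta` rewards hypotheses with more words.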

## Import

In [67]:
# !cp -r ../input/python-packages2 ./

# !tar xvfz ./python-packages2/jiwer.tgz
# !pip install ./jiwer/jiwer-2.3.0-py3-none-any.whl -f ./ --no-index
# !tar xvfz ./python-packages2/normalizer.tgz
# !pip install ./normalizer/bnunicodenormalizer-0.0.24.tar.gz -f ./ --no-index
# !tar xvfz ./python-packages2/pyctcdecode.tgz
# !pip install ./pyctcdecode/attrs-22.1.0-py2.py3-none-any.whl -f ./ --no-index --no-deps
# !pip install ./pyctcdecode/exceptiongroup-1.0.0rc9-py3-none-any.whl -f ./ --no-index --no-deps
# !pip install ./pyctcdecode/hypothesis-6.54.4-py3-none-any.whl -f ./ --no-index --no-deps
# !pip install ./pyctcdecode/numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl -f ./ --no-index --no-deps
# !pip install ./pyctcdecode/pygtrie-2.5.0.tar.gz -f ./ --no-index --no-deps
# !pip install ./pyctcdecode/sortedcontainers-2.4.0-py2.py3-none-any.whl -f ./ --no-index --no-deps
# !pip install ./pyctcdecode/pyctcdecode-0.4.0-py2.py3-none-any.whl -f ./ --no-index --no-deps

# !tar xvfz ./python-packages2/pypikenlm.tgz
# !pip install ./pypikenlm/pypi-kenlm-0.1.20220713.tar.gz -f ./ --no-index --no-deps

In [68]:
# !pip install ../input/jiwer-3-0-3/jiwer-3.0.3-py3-none-any.whl

In [69]:
# rm -r python-packages2 jiwer normalizer pyctcdecode pypikenlm

In [70]:
import typing as tp
from pathlib import Path
from functools import partial
from dataclasses import dataclass, field

import pandas as pd
import numpy as np
from tqdm.notebook import tqdm

import librosa

import pyctcdecode
import kenlm
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC
from bnunicodenormalizer import Normalizer

import cloudpickle as cpkl

In [71]:
FIND_PARAMS = True

ROOT = Path.cwd().parent
INPUT = ROOT / "input"
DATA = INPUT / "bengaliai-speech"
TRAIN = DATA / "train_mp3s"
TEST = DATA / "test_mp3s"

SAMPLING_RATE = 16_000
MODEL_PATH = INPUT / "saved_model-finetune-from-beggining-small-fold/ensemble/"
LM_PATH = INPUT / "arijitx-full-model/wav2vec2-xls-r-300m-bengali/language_model/"

### Load model, processor, decoder

In [72]:
model = Wav2Vec2ForCTC.from_pretrained(MODEL_PATH)
processor = Wav2Vec2Processor.from_pretrained(MODEL_PATH)

In [73]:
vocab_dict = processor.tokenizer.get_vocab()
sorted_vocab_dict = {k: v for k, v in sorted(vocab_dict.items(), key=lambda item: item[1])}

decoder = pyctcdecode.build_ctcdecoder(
    list(sorted_vocab_dict.keys()),
    str(LM_PATH / "5gram.bin"),
)

Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?
No known unigrams provided, decoding results might be a lot worse.
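
The warnings above come from building the decoder without a unigram list. `build_ctcdecoder` accepts an optional `unigrams` argument, so one possible fix is to pass the LM's vocabulary explicitly. The sketch below only shows how such a list could be loaded; the file name `unigrams.txt` and the helper `load_unigrams` are assumptions, and whether the language model folder used here ships such a file has not been checked.

```python
def load_unigrams(path):
    """Hypothetical helper: read one word per line, skipping blanks and duplicates."""
    seen = set()
    unigrams = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word and word not in seen:
                seen.add(word)
                unigrams.append(word)
    return unigrams


# Assumed usage (not run here), if LM_PATH / "unigrams.txt" exists:
# unigrams = load_unigrams(LM_PATH / "unigrams.txt")
# decoder = pyctcdecode.build_ctcdecoder(
#     list(sorted_vocab_dict.keys()),
#     str(LM_PATH / "5gram.bin"),
#     unigrams=unigrams,
# )
```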


In [74]:
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder
)
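
For contrast with the LM-backed decoder built above, this is a minimal sketch of plain greedy CTC decoding: take the best token per frame, collapse repeats, and drop blanks. The toy vocabulary, the blank index, and the function name `greedy_ctc_decode` are assumptions for illustration; the beam search in pyctcdecode replaces this step with a search over many candidate paths scored together with the LM.

```python
def greedy_ctc_decode(frame_scores, labels, blank_id=0):
    """Collapse the per-frame argmax path into a string (standard CTC rule)."""
    # Best label index for every time step.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded = []
    previous = None
    for idx in best_path:
        # Keep a token only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            decoded.append(labels[idx])
        previous = idx
    return "".join(decoded)


# Toy example: 3 labels (blank, "a", "b") and 6 frames of scores.
toy_labels = ["", "a", "b"]
toy_scores = [
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.8, 0.1],  # a (repeat, collapsed)
    [0.9, 0.0, 0.1],  # blank
    [0.1, 0.8, 0.1],  # a (new token after a blank)
    [0.2, 0.1, 0.7],  # b
    [0.9, 0.0, 0.1],  # blank
]
print(greedy_ctc_decode(toy_scores, toy_labels))
```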

## Prepare dataloader

In [75]:
class BengaliSRTestDataset(torch.utils.data.Dataset):
    
    def __init__(
        self,
        audio_paths: list[str],
        sampling_rate: int
    ):
        self.audio_paths = audio_paths
        self.sampling_rate = sampling_rate
        
    def __len__(self,):
        return len(self.audio_paths)
    
    def __getitem__(self, index: int):
        audio_path = self.audio_paths[index]
        sr = self.sampling_rate
        w = librosa.load(audio_path, sr=sr, mono=False)[0]
        
        return w

In [76]:
if not torch.cuda.is_available():
    device = torch.device("cpu")
else:
    device = torch.device("cuda")

model = model.to(device)
model = model.eval()
model = model.half()

# Finding the best decoding params

In [77]:
import jiwer

bnorm = Normalizer()

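def postprocess(sentence):
    """Normalize each word and make sure the sentence ends with punctuation."""
    period_set = {".", "?", "!", "।"}
    _words = [bnorm(word)["normalized"] for word in sentence.split()]
    sentence = " ".join([word for word in _words if word is not None])
    # An empty prediction becomes a lone danda instead of raising IndexError.
    if not sentence:
        return "।"
    if sentence[-1] not in period_set:
        sentence += "।"
    return sentence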


def score(gts, preds):
    return jiwer.wer(gts, preds)


def inference(m, data_loader):
    logits = []
    with torch.no_grad():
        for batch in tqdm(data_loader):
            x = batch["input_values"]
            x = x.to(device, non_blocking=True)
            with torch.cuda.amp.autocast(True):
                y = m(x).logits
            y = y.detach().cpu().numpy()
            logits.extend(y)
    return logits


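def decode(logits, params=None, pp=True):
    # Avoid a mutable default argument; fall back to the public notebook's beam width.
    params = {"beam_width": 1024} if params is None else params
    pred_sentence_list = [processor_with_lm.decode(sentence, **params).text for sentence in tqdm(logits)]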
    if pp:
        pred_sentence_list = [postprocess(s) for s in pred_sentence_list]
    return pred_sentence_list
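
As a reference for what the `score` function measures, here is a small pure-Python sketch of word error rate: the word-level edit distance summed over all sentence pairs, divided by the total number of reference words. It is an illustration under that assumption rather than jiwer's implementation, and the names `word_edit_distance` and `simple_wer` are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (substitutions, insertions, deletions)."""
    prev_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # match or substitution
        prev_row = row
    return prev_row[-1]


def simple_wer(references, hypotheses):
    """Corpus-level WER: total edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_edits / total_words


# One substitution (b -> x) and one deletion (d) over four reference words.
print(simple_wer(["a b c d"], ["a x c"]))
```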

In [78]:
constants = """
# from: https://github.com/kensho-technologies/pyctcdecode/blob/main/pyctcdecode/constants.py
# default parameters for decoding (can be modified)
DEFAULT_ALPHA = 0.5
DEFAULT_BETA = 1.5
DEFAULT_UNK_LOGP_OFFSET = -10.0
DEFAULT_BEAM_WIDTH = 100
DEFAULT_HOTWORD_WEIGHT = 10.0
DEFAULT_PRUNE_LOGP = -10.0
DEFAULT_PRUNE_BEAMS = False
DEFAULT_MIN_TOKEN_LOGP = -5.0
DEFAULT_SCORE_LM_BOUNDARY = True

# other constants for decoding
AVG_TOKEN_LEN = 6  # average number of characters expected per token (used for UNK scoring)
MIN_TOKEN_CLIP_P = 1e-15  # clipping to avoid underflow in case of malformed logit input
LOG_BASE_CHANGE_FACTOR = 1.0 / math.log10(math.e)  # kenlm returns base10 but we like natural
"""

In [79]:
def objective(trial):
    """
    alpha: weight for language model during shallow fusion
    beta: weight for length score adjustment during scoring
    unk_score_offset: amount of log score offset for unknown tokens
    lm_score_boundary: whether to have kenlm respect boundaries when scoring
    """
    alpha = trial.suggest_float("alpha", 0.0, 2.15)
    beta = trial.suggest_float("beta", 0.0, 2.05)
    beam_width = trial.suggest_categorical("beam_width", [256, 512, 768])
    gts = valid["sentence"].values.tolist()
    decode_params = {
        "alpha": alpha,
        "beta": beta,
        "beam_width": beam_width
    }
    preds = decode(logits, params=decode_params, pp=True)
    wer_score = score(gts, preds)
    return wer_score
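
The objective above is handed to Optuna, which proposes `alpha`, `beta`, and `beam_width` and keeps the trial with the lowest WER. As a library-free illustration of the same loop, the sketch below runs a plain random search (a swapped-in, simpler technique, not Optuna's sampler) over a made-up objective. The function `toy_objective`, its bowl shape, and the search ranges are all assumptions; in the notebook the objective would call `decode` and `score` instead.

```python
import random


def toy_objective(alpha, beta):
    """Made-up stand-in for WER: smallest near alpha=0.45, beta=0.5."""
    return 0.245 + 0.1 * (alpha - 0.45) ** 2 + 0.01 * (beta - 0.5) ** 2


def random_search(objective, n_trials, seed=0):
    """Sample parameters uniformly and remember the best (lowest) value."""
    rng = random.Random(seed)
    best_value, best_found = float("inf"), None
    for _ in range(n_trials):
        params = {"alpha": rng.uniform(0.0, 2.0), "beta": rng.uniform(0.0, 2.0)}
        value = objective(**params)
        if value < best_value:
            best_value, best_found = value, params
    return best_value, best_found


toy_best_value, toy_best_params = random_search(toy_objective, n_trials=50)
print(toy_best_value, toy_best_params)
```

Each real trial decodes the whole validation set, which is why a sampler that concentrates on promising regions is attractive compared with uniform sampling.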

In [80]:
# Default decoding configuration in the public notebook.
best_params = {"beam_width": 1024}

if FIND_PARAMS:
    import optuna
    from optuna.trial import TrialState
    
    valid = pd.read_csv(DATA / "excluded_valid.csv") # dtype={"id": str}
    valid_audio_paths = [str(TRAIN / f"{aid}.mp3") for aid in valid["id"].values]

    valid_dataset = BengaliSRTestDataset(
        valid_audio_paths, SAMPLING_RATE
    )

    collate_func = partial(
        processor_with_lm.feature_extractor,
        return_tensors="pt", sampling_rate=SAMPLING_RATE,
        padding=True,
    )

    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=16, shuffle=False,
        num_workers=2, collate_fn=collate_func, drop_last=False,
        pin_memory=True,
    )
    # Calculating the base score
    print(constants)
    logits = inference(model, valid_loader)
    base_preds = decode(logits)
    gts = valid["sentence"].values.tolist()
    base_wer_score = score(gts, base_preds)
    print(f"Base wer score: {base_wer_score}")

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=75)

    pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
    complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

    print("Study statistics: ")
    print("  Number of finished trials: ", len(study.trials))
    print("  Number of pruned trials: ", len(pruned_trials))
    print("  Number of complete trials: ", len(complete_trials))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: ", trial.value)

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
    
    if study.best_value < base_wer_score:
        print(f"Base score improved to {study.best_value} from {base_wer_score}. Assigning {study.best_params} to best_params")
        best_params = study.best_params




  0%|          | 0/249 [00:00<?, ?it/s]

  0%|          | 0/3977 [00:00<?, ?it/s]

[I 2023-10-02 02:20:07,662] A new study created in memory with name: no-name-8364b4cf-9503-4149-9076-bb51bd0839a9


Base wer score: 0.2479709234006504


[I 2023-10-02 02:24:41,481] Trial 0 finished with value: 0.24646790369742846 and parameters: {'alpha': 0.5738285534290504, 'beta': 0.9177384660097494, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 02:30:08,269] Trial 1 finished with value: 0.29379936053343536 and parameters: {'alpha': 1.1200655851193853, 'beta': 1.429717513957985, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 02:34:13,308] Trial 2 finished with value: 0.2515508430574154 and parameters: {'alpha': 0.22371921701895284, 'beta': 0.15727010372316572, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 02:43:48,564] Trial 3 finished with value: 0.36922362200420844 and parameters: {'alpha': 1.3924503317055872, 'beta': 0.42620040140092447, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 02:47:12,812] Trial 4 finished with value: 0.2716093241876862 and parameters: {'alpha': 0.8741599257550358, 'beta': 0.38581815006798875, 'beam_width': 256}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 02:51:03,602] Trial 5 finished with value: 0.24715109447162026 and parameters: {'alpha': 0.5703507197089507, 'beta': 0.687753586132387, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 02:59:03,820] Trial 6 finished with value: 0.4330609679446889 and parameters: {'alpha': 1.6708579583052319, 'beta': 0.30533388436741316, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:04:47,247] Trial 7 finished with value: 0.26021370207416716 and parameters: {'alpha': 0.11662858121076977, 'beta': 0.04596704065419328, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:15:17,173] Trial 8 finished with value: 0.41059765528926295 and parameters: {'alpha': 1.6253061139521403, 'beta': 0.9881966676736271, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:20:22,762] Trial 9 finished with value: 0.44959418468013007 and parameters: {'alpha': 1.9196399456107398, 'beta': 1.7200286357896009, 'beam_width': 256}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:25:05,877] Trial 10 finished with value: 0.24930997731806628 and parameters: {'alpha': 0.6501151357308682, 'beta': 1.1633378698707748, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:28:48,342] Trial 11 finished with value: 0.24676850763807284 and parameters: {'alpha': 0.5344072533831687, 'beta': 0.8656319610417009, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:33:07,397] Trial 12 finished with value: 0.24654988659033147 and parameters: {'alpha': 0.43211094148076584, 'beta': 0.8178130178406104, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:39:48,511] Trial 13 finished with value: 0.28166589238378925 and parameters: {'alpha': 0.032256625612143486, 'beta': 0.6675953835368515, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:43:58,497] Trial 14 finished with value: 0.24821687207935944 and parameters: {'alpha': 0.4214196267480822, 'beta': 1.2353508557009707, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:50:01,706] Trial 15 finished with value: 0.2679474216380182 and parameters: {'alpha': 0.8828282001510945, 'beta': 0.7356702235000755, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:52:43,428] Trial 16 finished with value: 0.25529472849998636 and parameters: {'alpha': 0.4113083982961295, 'beta': 2.03077567177948, 'beam_width': 256}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 03:57:04,305] Trial 17 finished with value: 0.25349110485612003 and parameters: {'alpha': 0.28580952632804424, 'beta': 1.0253452504696459, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:02:40,163] Trial 18 finished with value: 0.2587653376328806 and parameters: {'alpha': 0.7792083454976303, 'beta': 0.5661234783186773, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:09:59,717] Trial 19 finished with value: 0.2956029841773017 and parameters: {'alpha': 1.0942128140257594, 'beta': 0.8995112335192871, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:14:02,610] Trial 20 finished with value: 0.29355341185472633 and parameters: {'alpha': 0.0278675027613412, 'beta': 1.3877324831930513, 'beam_width': 256}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:17:44,510] Trial 21 finished with value: 0.24649523132839615 and parameters: {'alpha': 0.5103592105081431, 'beta': 0.8307183120391806, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:21:27,444] Trial 22 finished with value: 0.25111360096193264 and parameters: {'alpha': 0.29445429021097913, 'beta': 0.8144400474173673, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:25:30,705] Trial 23 finished with value: 0.2506217036045145 and parameters: {'alpha': 0.6543961051532622, 'beta': 0.5690794665603989, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:29:46,411] Trial 24 finished with value: 0.24717842210258792 and parameters: {'alpha': 0.4251012005637086, 'beta': 0.9507657432762526, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:33:55,964] Trial 25 finished with value: 0.2554586942857924 and parameters: {'alpha': 0.7615810354922494, 'beta': 1.0723660794116567, 'beam_width': 512}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:38:33,091] Trial 26 finished with value: 0.2555406771786954 and parameters: {'alpha': 0.22074717414895362, 'beta': 0.7940651455018298, 'beam_width': 768}. Best is trial 0 with value: 0.24646790369742846.


[I 2023-10-02 04:42:58,256] Trial 27 finished with value: 0.2451835050419479 and parameters: {'alpha': 0.43951872071726716, 'beta': 0.5661632069353444, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 04:46:29,565] Trial 28 finished with value: 0.28040882135927636 and parameters: {'alpha': 0.9564632437035115, 'beta': 0.6149188787431143, 'beam_width': 256}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 04:51:52,852] Trial 29 finished with value: 0.28959090536441395 and parameters: {'alpha': 1.0218348120312564, 'beta': 0.49402103864991215, 'beam_width': 512}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 04:56:13,446] Trial 30 finished with value: 0.257153007405788 and parameters: {'alpha': 0.7247558393614073, 'beta': 0.2752311953765172, 'beam_width': 512}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:00:31,670] Trial 31 finished with value: 0.2476976470909737 and parameters: {'alpha': 0.3906546846024348, 'beta': 0.7472852846929698, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:04:57,292] Trial 32 finished with value: 0.2458666958161397 and parameters: {'alpha': 0.5140710891272441, 'beta': 0.898132632703925, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:09:37,294] Trial 33 finished with value: 0.24573005766130135 and parameters: {'alpha': 0.5442898570409174, 'beta': 0.5106089055734173, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:14:32,576] Trial 34 finished with value: 0.2482715273412948 and parameters: {'alpha': 0.6104740158152636, 'beta': 0.45843117985032605, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:22:47,201] Trial 35 finished with value: 0.32049845598885035 and parameters: {'alpha': 1.1775662198465486, 'beta': 0.3503393101450814, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:27:37,628] Trial 36 finished with value: 0.2558412811193398 and parameters: {'alpha': 0.1986988715082868, 'beta': 0.5342637019792257, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:33:29,055] Trial 37 finished with value: 0.26283715464706364 and parameters: {'alpha': 0.7909284528252879, 'beta': 0.24010210212366734, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:38:12,754] Trial 38 finished with value: 0.24687781816194354 and parameters: {'alpha': 0.5859562937031144, 'beta': 0.6683655993824971, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:43:36,806] Trial 39 finished with value: 0.26130680731287403 and parameters: {'alpha': 0.13169085477188452, 'beta': 0.4249156477748313, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:48:10,678] Trial 40 finished with value: 0.24671385237613752 and parameters: {'alpha': 0.3374564298119956, 'beta': 0.16842140433350206, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:51:50,512] Trial 41 finished with value: 0.24630393791162244 and parameters: {'alpha': 0.5071431486830862, 'beta': 0.9133952726705908, 'beam_width': 512}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 05:56:15,594] Trial 42 finished with value: 0.24592135107807503 and parameters: {'alpha': 0.5155749734861691, 'beta': 0.9373848456131084, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:00:35,012] Trial 43 finished with value: 0.24627661028065476 and parameters: {'alpha': 0.48581861183266817, 'beta': 0.9893413019252522, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:05:20,884] Trial 44 finished with value: 0.24950127073484 and parameters: {'alpha': 0.650707574408002, 'beta': 1.0964360308066223, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:09:44,220] Trial 45 finished with value: 0.2460579892329134 and parameters: {'alpha': 0.5069271071792483, 'beta': 0.9467467578566073, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:14:06,427] Trial 46 finished with value: 0.2483535102341978 and parameters: {'alpha': 0.3408223683755263, 'beta': 0.6618749090183088, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:18:42,026] Trial 47 finished with value: 0.2460579892329134 and parameters: {'alpha': 0.549154510977835, 'beta': 0.7561515855939427, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:23:52,290] Trial 48 finished with value: 0.2508403246522559 and parameters: {'alpha': 0.6622524925318758, 'beta': 0.38287773145223103, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:26:50,801] Trial 49 finished with value: 0.25461153772579453 and parameters: {'alpha': 0.2547564785129721, 'beta': 0.9109424358968454, 'beam_width': 256}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:31:50,677] Trial 50 finished with value: 0.2672642308638264 and parameters: {'alpha': 0.14369704975538405, 'beta': 1.165311734289087, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:36:23,011] Trial 51 finished with value: 0.24570273003033366 and parameters: {'alpha': 0.5219476528950173, 'beta': 0.6954437758967771, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:40:56,704] Trial 52 finished with value: 0.245593419506463 and parameters: {'alpha': 0.5156202065055946, 'beta': 0.6040234049779031, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:45:17,695] Trial 53 finished with value: 0.24728773262645862 and parameters: {'alpha': 0.3628337221241783, 'beta': 0.6096870380242427, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:50:03,681] Trial 54 finished with value: 0.24704178394774956 and parameters: {'alpha': 0.5788813386513447, 'beta': 0.5108222243092719, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:54:25,597] Trial 55 finished with value: 0.24592135107807503 and parameters: {'alpha': 0.4419950410982237, 'beta': 0.7194268091078734, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 06:59:33,019] Trial 56 finished with value: 0.2519061022599951 and parameters: {'alpha': 0.7096119653797257, 'beta': 0.8371827165787251, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 07:03:59,487] Trial 57 finished with value: 0.245675402399366 and parameters: {'alpha': 0.4615251304248385, 'beta': 0.6154323059462967, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 07:06:57,281] Trial 58 finished with value: 0.24851747602000382 and parameters: {'alpha': 0.31242871534168537, 'beta': 0.4385091292107306, 'beam_width': 256}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 07:12:51,488] Trial 59 finished with value: 0.26428551908835024 and parameters: {'alpha': 0.8412654448902865, 'beta': 0.6043283767372827, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 07:17:20,313] Trial 60 finished with value: 0.24540212608968928 and parameters: {'alpha': 0.4638010074004926, 'beta': 0.5231867425699952, 'beam_width': 768}. Best is trial 27 with value: 0.2451835050419479.


[I 2023-10-02 07:21:46,760] Trial 61 finished with value: 0.24496488399420654 and parameters: {'alpha': 0.43093363578737115, 'beta': 0.5214128255295939, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:26:14,038] Trial 62 finished with value: 0.24512884978001256 and parameters: {'alpha': 0.446657001429826, 'beta': 0.5116669870183392, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:30:44,660] Trial 63 finished with value: 0.24496488399420654 and parameters: {'alpha': 0.4302192863195373, 'beta': 0.34409084184847893, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:35:12,913] Trial 64 finished with value: 0.2451835050419479 and parameters: {'alpha': 0.41367476066255954, 'beta': 0.372093963628152, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:39:39,734] Trial 65 finished with value: 0.245757385292269 and parameters: {'alpha': 0.39068339278189135, 'beta': 0.37024730851918264, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:44:28,096] Trial 66 finished with value: 0.2527532588199929 and parameters: {'alpha': 0.22318231104569383, 'beta': 0.34293893065116715, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:47:28,636] Trial 67 finished with value: 0.2497745470445167 and parameters: {'alpha': 0.2758822009064077, 'beta': 0.44763442651381513, 'beam_width': 256}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:51:59,360] Trial 68 finished with value: 0.2455660918754953 and parameters: {'alpha': 0.41196272288213936, 'beta': 0.2874771676215683, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 07:58:10,695] Trial 69 finished with value: 0.2687399229360807 and parameters: {'alpha': 0.07026126901599888, 'beta': 0.29216993966643595, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 08:02:45,631] Trial 70 finished with value: 0.2455660918754953 and parameters: {'alpha': 0.3913302237506136, 'beta': 0.09633305429206468, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 08:07:22,461] Trial 71 finished with value: 0.24542945372065694 and parameters: {'alpha': 0.38612844157453013, 'beta': 0.011171843837351134, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 08:11:59,768] Trial 72 finished with value: 0.24641324843549312 and parameters: {'alpha': 0.33273587923113035, 'beta': 0.000811388140910552, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 08:16:33,984] Trial 73 finished with value: 0.24553876424452764 and parameters: {'alpha': 0.4450566182451988, 'beta': 0.2108083106309436, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


[I 2023-10-02 08:21:35,740] Trial 74 finished with value: 0.25510343508321265 and parameters: {'alpha': 0.18901837917869097, 'beta': 0.21744349580145397, 'beam_width': 768}. Best is trial 61 with value: 0.24496488399420654.


Study statistics: 
  Number of finished trials:  75
  Number of pruned trials:  0
  Number of complete trials:  75
Best trial:
  Value:  0.24496488399420654
  Params: 
    alpha: 0.43093363578737115
    beta: 0.5214128255295939
    beam_width: 768
Base score improved to 0.24496488399420654 from 0.2479709234006504. Assigning {'alpha': 0.43093363578737115, 'beta': 0.5214128255295939, 'beam_width': 768} to best_params
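
To put the result in perspective, the relative WER reduction can be computed from the two numbers printed above. This is just arithmetic on the logged values; the variable names are illustrative.

```python
base_wer = 0.2479709234006504    # "Base wer score" from the log
tuned_wer = 0.24496488399420654  # best trial value from the log

absolute_gain = base_wer - tuned_wer
relative_gain = absolute_gain / base_wer

print(f"absolute: {absolute_gain:.4f}, relative: {relative_gain:.2%}")
```

The gain is small (about 0.003 absolute, roughly 1.2% relative), and it is measured on the same validation set that was used for the search, so it may shrink on unseen data.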


# Inference with the best params

In [81]:
# Please see the Version 3. of this notebook to see the results.
# best_params = {'alpha': 0.345, 'beta': 0.06, 'beam_width': 768}

In [82]:
print(f"Running the inference with params: {best_params}")

Running the inference with params: {'alpha': 0.43093363578737115, 'beta': 0.5214128255295939, 'beam_width': 768}
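
Since the search takes hours, it can be convenient to store the chosen parameters and reload them in a later inference-only run instead of pasting them by hand. A minimal sketch using the standard `json` module follows; the file name `best_decoding_params.json` and the variable names are assumptions.

```python
import json
from pathlib import Path

# Values copied from the run above; in the notebook this would be `best_params`.
found_params = {"alpha": 0.43093363578737115, "beta": 0.5214128255295939, "beam_width": 768}

params_file = Path("best_decoding_params.json")  # hypothetical output path
params_file.write_text(json.dumps(found_params, indent=2))

# Later, e.g. in a run with FIND_PARAMS = False:
loaded_params = json.loads(params_file.read_text())
print(loaded_params)
```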


In [83]:
# test = pd.read_csv(DATA / "sample_submission.csv", dtype={"id": str})
# test_audio_paths = [str(TEST / f"{aid}.mp3") for aid in test["id"].values]

# test_dataset = BengaliSRTestDataset(
#     test_audio_paths, SAMPLING_RATE
# )
# collate_func = partial(
#     processor_with_lm.feature_extractor,
#     return_tensors="pt", sampling_rate=SAMPLING_RATE,
#     padding=True,
# )
# test_loader = torch.utils.data.DataLoader(
#     test_dataset, batch_size=8, shuffle=False,
#     num_workers=2, collate_fn=collate_func, drop_last=False,
#     pin_memory=True,
# )

# pred_sentence_list = []

# with torch.no_grad():
#     for batch in tqdm(test_loader):
#         x = batch["input_values"]
#         x = x.to(device, non_blocking=True)
#         with torch.cuda.amp.autocast(True):
#             y = model(x).logits
#         y = y.detach().cpu().numpy()
        
#         for l in y:  
#             sentence = processor_with_lm.decode(l, **best_params).text
#             pred_sentence_list.append(sentence)


# pp_pred_sentence_list = [postprocess(s) for s in tqdm(pred_sentence_list)]

  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

## Make Submission

In [84]:
# test["sentence"] = pp_pred_sentence_list
# test.to_csv("submission.csv", index=False)
# print(test.head())

             id                                           sentence
0  0f3dac00655e                          একটু বয়স হলে একটি বিদেশি।
1  a9395e01ad21  কী কারণে তুমি এতাবৎ কাল পর্যন্ত এ দারুল দৈব দু...
2  bf36ea8b718d  এ কারণে সরকার নির্ধারিত হারে পরিবহনজনিত ক্ষতি ...


## EOF