# Infer transcription results without punctuation

This notebook augments a run with transcriptions that have punctuation and capitalization removed. By doing so, I get another virtual test run for LibriSpeech without punctuation. The difference between the two error rates can be interpreted as the share of instances where the system predicted the correct word, but either the case or the punctuation did not match. These instances have a lower impact on overall text understanding.
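To make the idea concrete, here is a minimal sketch of word error rate as word-level edit distance divided by the number of reference words. It is a pure-Python stand-in for illustration only; the notebook itself uses `jiwer`. The function names and the sample sentence are made up, and `toy_normalize` only handles commas and periods.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance / number of reference words.

    Assumes the reference contains at least one word.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1] / len(ref)


def toy_normalize(s: str) -> str:
    """Hypothetical, reduced normalization: lowercase, drop commas and periods."""
    return s.lower().replace(",", "").replace(".", "")


# Made-up example: every word is right, only case and punctuation differ.
reference = "Fast as his legs could carry him, he ran."
hypothesis = "fast as his legs could carry him he ran"

raw = word_error_rate(reference, hypothesis)
clean = word_error_rate(toy_normalize(reference), toy_normalize(hypothesis))
print("raw WER:", raw)
print("normalized WER:", clean)
```

In this toy case the raw score counts three of nine words as wrong ("Fast", "him," and "ran."), while the normalized score is zero. The gap between the two is the part of the error that comes from case and punctuation alone.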

In [1]:
import pandas as pd

In [2]:
run = "../out/baseline_large-v3-turbo.csv"
df = pd.read_csv(run)
df

Unnamed: 0,id,wer,pred_transcription,time
0,2961-961-0000,0.220503,socrates begins the timaeus with a summary of...,4.024658
1,4970-29093-0000,0.111667,you'll never dig it out of the astor library ...,4.773546
2,6930-76324-0001,0.180556,they were certainly no near the solution of t...,3.610766
3,7729-102255-0000,0.241182,the bogus legislature numbered thirty-six mem...,9.543177
4,5105-28240-0000,0.119658,"Fast as his legs could carry him, Servadak ha...",3.614152
...,...,...,...,...
78,1995-1837-0000,0.138614,he knew the silver fleece his and zora's must...,3.913258
79,237-126133-0000,0.240449,here she would stay comforted and soothed amo...,3.372552
80,6829-68771-0000,0.236994,so to the surprise of the democratic committe...,4.585420
81,5683-32879-0000,0.230769,it was not very much past eleven that morning...,3.784462


In [3]:
print("Average WER:", df["wer"].mean())
print("Standard deviation WER:", df["wer"].std())

Average WER: 0.1759498098019635
Standard deviation WER: 0.07930608679759424


In [4]:
PUNCTUATION = ".,'\"`()[]{};:!?"

def remove_pc(s: str) -> str:
    """Remove punctuation and capitalization (PC) from the string."""
    return s.lower().translate(str.maketrans("", "", PUNCTUATION))
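A quick check of what this normalization does and does not touch may be useful. The sketch below copies the character set stripped by `remove_pc` into a separately named helper (`normalize_sketch` is a hypothetical name) and runs it on made-up strings, not on dataset content.

```python
# Same characters that remove_pc strips, copied from the cell above.
REMOVED_CHARS = ".,'\"`()[]{};:!?"


def normalize_sketch(s: str) -> str:
    """Lowercase the string and delete the listed punctuation characters."""
    return s.lower().translate(str.maketrans("", "", REMOVED_CHARS))


# Made-up sample strings.
print(normalize_sketch("Hello, World!"))        # punctuation and case removed
print(normalize_sketch("Thirty-six members"))   # the hyphen is not in the set and is kept
print(normalize_sketch("you'll"))               # the apostrophe is removed, giving "youll"
```

Because hyphens and typographic quotes are not in the set, a prediction such as "thirty-six" still differs from a reference written as "thirty six" after normalization. Whether that matters for this run depends on how the ground truth is written.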

In [8]:
def load_ground_truth(utterance_id: str) -> str:
    """Load the ground truth for the given utterance id."""
    with open(f"../data/librispeech-pc-test-clean/{utterance_id}/{utterance_id}.txt", "r") as f:
        return f.read().strip()

In [9]:
df["true_transcription"] = df["id"].apply(load_ground_truth)
df["true_transcription_clean"] = df["true_transcription"].apply(remove_pc)
df["pred_transcription_clean"] = df["pred_transcription"].apply(remove_pc)

In [10]:
import jiwer

df["wer_clean"] = df.apply(
    lambda row: jiwer.wer(row["true_transcription_clean"], row["pred_transcription_clean"]),
    axis=1,
)

In [11]:
librispeech_total_seconds = 17943.12
total_time = df["time"].sum()

In [12]:
print("Realtime factor", librispeech_total_seconds / total_time, "\n")
print("Mean WER:", df["wer"].mean())
print("Standard deviation WER:", df["wer"].std(), "\n")
print("Mean WER (no punctuation):", df["wer_clean"].mean())
print("Standard deviation WER (no punctuation):", df["wer_clean"].std())

Realtime factor 42.56759713089274 

Mean WER: 0.1759498098019635
Standard deviation WER: 0.07930608679759424 

Mean WER (no punctuation): 0.04101236386149508
Standard deviation WER (no punctuation): 0.054833869140468
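The printed numbers can be turned into the quantities the introduction talks about. The sketch below only re-uses values copied from the output above; the variable names `gap`, `share`, and `implied_total_time` are introduced here for illustration.

```python
# Values copied from the printed output above.
mean_wer = 0.1759498098019635
mean_wer_clean = 0.04101236386149508

# Absolute gap: error that disappears once case and punctuation are ignored.
gap = mean_wer - mean_wer_clean
# Relative share of the original mean WER explained by that gap.
share = gap / mean_wer

# The reported factor is audio seconds divided by processing seconds,
# so the total processing time can be recovered from it.
librispeech_total_seconds = 17943.12
realtime_factor = 42.56759713089274
implied_total_time = librispeech_total_seconds / realtime_factor

print(f"gap: {gap:.4f}")
print(f"share of mean WER: {share:.1%}")
print(f"implied processing time: {implied_total_time:.1f} s")
```

The gap is about 0.135 absolute, roughly three quarters of the original mean WER. Note that these are means of per-utterance scores, so each utterance counts equally regardless of its length.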


In [13]:
# Drop the ground-truth columns, then write the augmented run
# (now including pred_transcription_clean and wer_clean) back to the same CSV
df = df.drop(columns=["true_transcription", "true_transcription_clean"])
df.to_csv(run, index=False)