Word Error Rate Estimation Without ASR Output: e-WER2

This is the second version of e-WER (e-WER2).

New Features!

  • An end-to-end multistream architecture to predict the WER per sentence using language-independent phonotactic features.
  • Our novel system is able to learn acoustic-lexical embeddings.
  • We estimate the error rate directly, without access to either the ASR results or the ASR system (no-box WER estimation).

System              Pearson   RMSE   e-WER (ref WER = 28.5%)
e-WER Glass Box     0.82      0.17   27.3%
e-WER Black Box     0.68      0.19   35.8%
e-WER2 Glass Box    0.74      0.19   27.9%
e-WER2 Black Box    0.66      0.21   30.9%
e-WER No Box        0.56      0.24   30.9%
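
The Pearson and RMSE columns compare per-sentence estimated WER against the reference WER, and the last column is an aggregate WER over the test set. The following is a minimal sketch of how such numbers could be computed. It is not the repository's evaluation script: the function names, the length-weighted aggregation in `corpus_wer`, and the sample values are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def corpus_wer(sentence_wers, word_counts):
    """Aggregate WER in percent, weighting each sentence by its word count.

    Assumption: the aggregation is errors summed over sentences divided by
    words summed over sentences; the README does not spell this out.
    """
    errors = sum(w * n for w, n in zip(sentence_wers, word_counts))
    return 100.0 * errors / sum(word_counts)

# Made-up per-sentence values (fractions, not percent), for illustration only.
reference_wer = [0.10, 0.25, 0.40, 0.05]
estimated_wer = [0.12, 0.20, 0.45, 0.10]
word_counts = [20, 8, 10, 40]

print("Pearson:", pearson(reference_wer, estimated_wer))
print("RMSE:", rmse(reference_wer, estimated_wer))
print("Aggregate estimated WER (%):", corpus_wer(estimated_wer, word_counts))
```

Weighting by word count keeps long sentences from being under-represented, which a plain average of per-sentence WER would not do.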

Model definition

An end-to-end multistream-based regression model to predict the WER per sentence.

We combine four streams (lexical, phonotactic, acoustic, and numerical features) into a single end-to-end network that estimates the word error rate directly. The multistream network is trained jointly to obtain a joint feature space, on top of which another fully connected layer estimates the WER.
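
The structure described above (one encoder per stream, a joint feature space, and a fully connected layer on top) can be sketched as a forward pass in plain Python. This is an illustrative toy, not the model in this repository: the stream dimensions, the hidden size, the single-layer encoders, the random initialisation, and the sigmoid on the output are all assumptions, and training is omitted.

```python
import math
import random

# Hypothetical feature sizes per stream; the real dimensions are not given here.
STREAM_DIMS = {"lexical": 8, "phonotactic": 8, "acoustic": 6, "numerical": 4}
HIDDEN = 5  # hypothetical size of each stream's encoded representation

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_dense(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation):
    """Apply a fully connected layer followed by an activation."""
    weights, bias = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def build_model(seed=0):
    """One encoder per stream plus a regression head over the joint space."""
    rng = random.Random(seed)
    encoders = {name: init_dense(dim, HIDDEN, rng) for name, dim in STREAM_DIMS.items()}
    head = init_dense(HIDDEN * len(STREAM_DIMS), 1, rng)
    return encoders, head

def predict_wer(model, features):
    """Encode each stream, concatenate into a joint vector, regress one value."""
    encoders, head = model
    joint = []
    for name in STREAM_DIMS:
        joint.extend(dense(features[name], encoders[name], relu))
    # Assumption: a sigmoid keeps the estimate in (0, 1) for this toy example.
    return dense(joint, head, sigmoid)[0]

model = build_model()
example = {name: [0.5] * dim for name, dim in STREAM_DIMS.items()}
print("Estimated WER for one sentence:", predict_wer(model, example))
```

In a jointly trained version, the regression loss would be backpropagated through the head and all four encoders at once, so the joint space is shaped by the WER target rather than by each stream separately.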

Results

Cumulative WER on the test set over all sentences. The X-axis is duration in hours and the Y-axis is WER in %.
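
A cumulative WER curve of this kind can be sketched as follows. The input layout (duration in seconds, error count, word count per sentence) and the function name are assumptions for illustration, and the sentences are accumulated in the order given because the ordering used for the plot is not stated here.

```python
def cumulative_wer_curve(sentences):
    """Return (cumulative hours, cumulative WER in %) after each sentence.

    `sentences` is an iterable of (duration_seconds, error_count, word_count)
    tuples; this layout is hypothetical.
    """
    points = []
    total_seconds = total_errors = total_words = 0
    for duration, errors, words in sentences:
        total_seconds += duration
        total_errors += errors
        total_words += words
        if total_words:  # skip points until at least one word has been seen
            points.append((total_seconds / 3600.0,
                           100.0 * total_errors / total_words))
    return points

# Made-up sentences: two half-hour segments of ten words each.
for hours, wer in cumulative_wer_curve([(1800, 1, 10), (1800, 3, 10)]):
    print(f"{hours:.2f} h -> {wer:.1f}% WER")
```

The same function can be run once with reference error counts and once with estimated ones to overlay the two curves.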

Citation

More details about this work can be found in the INTERSPEECH 2020 and ACL 2018 papers:

@InProceedings{,
    author={Ali, Ahmed and Renals, Steve},
    title={Word Error Rate Estimation Without ASR Output: e-WER2},
    booktitle={INTERSPEECH},
    year={2020},
}

@InProceedings{,
    author={Ali, Ahmed and Renals, Steve},
    title={Word Error Rate Estimation for Speech Recognition: e-WER},
    booktitle={ACL},
    year={2018},
}
