Hi,
Thanks for the great toolkit. I've been using MeetEval recently while experimenting with multi-speaker ASR evaluation.
I’m an engineer/researcher from Korea, working with multilingual speech recognition systems.
While using MeetEval, I noticed that the current metrics are mainly WER-based (WER, cpWER, tcpWER). This works well for languages with clear whitespace word boundaries like English.
However, for languages such as Korean, Japanese, and Chinese, word segmentation can be ambiguous, and evaluation scores can depend on the tokenizer used. Because of this, many ASR benchmarks for these languages also report Character Error Rate (CER).
So I was wondering whether it would make sense to add character-level variants of the existing metrics, for example cpCER and tcpCER as counterparts of cpWER and tcpWER.
Conceptually this would only change the evaluation unit (characters instead of words) while keeping the existing pipeline (speaker permutation, timing constraints, etc.) the same.
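To illustrate what I mean by "only changing the evaluation unit": a common workaround today is to re-tokenize transcripts so that every character becomes a token, and then feed them through the existing word-level metrics unchanged. A minimal sketch (the helper name is mine, not part of MeetEval's API):

```python
def to_char_tokens(text: str) -> str:
    """Re-tokenize a transcript so each character becomes a 'word'.

    Whitespace is dropped first, since word boundaries should not
    count as units in a character-level metric. Hypothetical helper,
    shown only to illustrate the idea.
    """
    return " ".join(ch for ch in text if not ch.isspace())

# Applying this to both reference and hypothesis turns any WER-style
# metric (WER, cpWER, tcpWER) into its character-level counterpart,
# since only the unit of comparison changes.
print(to_char_tokens("안녕 하세요"))  # characters separated by spaces
```

A built-in option would mainly save users from this preprocessing step and avoid subtle inconsistencies (e.g. how whitespace or punctuation is handled) across papers.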
Before trying to implement this, I wanted to ask:
- Would adding character-level metrics fit within the scope of MeetEval?
- If yes, would you prefer a separate CER implementation, or a more general token unit abstraction?
If this sounds useful, I’d be happy to try preparing a PR.
Thanks!