Issue encountered
While working on several Japanese benchmark tasks, I observed that the standard BLEU, CHRF, and TER metrics are suboptimal for Asian languages, whose scripts do not use space-separated words.
To address this, I propose adding a parameter to CorpusLevelTranslationMetric that allows integration with tokenizers tailored for Asian languages.
Solution/Feature
SacreBLEU already includes tokenizers designed for Asian languages that lack space-separated words. With a small change, CorpusLevelTranslationMetric could be extended to use them and handle these languages better:
https://github.com/mjpost/sacrebleu/blob/0f351010b8b641aaa59fe75b98d7cc522bf221eb/sacrebleu/metrics/bleu.py#L110-L208