HardMTBench: Stress-Testing Chinese-English Translation on Knowledge-Intensive Domains
HardMTBench Data Format
File
HardMTBench.jsonl — One JSON object per line, 20,000 records in total (10,000 ZH→EN + 10,000 EN→ZH).
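As a minimal sketch of reading the file, the records could be loaded in Python as shown here. It assumes the file is UTF-8 encoded and sits in the working directory under the name given above; neither is stated in this document. The helper names iter_records and load_jsonl are hypothetical and not part of the benchmark.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


def load_jsonl(path: str) -> List[Dict]:
    """Read a whole JSONL file into memory (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return list(iter_records(f))


# Usage (path is an assumption):
# records = load_jsonl("HardMTBench.jsonl")
# print(len(records))
```

If the file matches the description above, the printed count should be 20,000.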
Field Description
| Field | Type | Description |
| --- | --- | --- |
| source_text | string | The original text to be translated (source side). |
| reference | string | Human-verified reference translation for the source text. |
| id | string | Unique identifier for each test item. Format: {pair_md5}_{direction}, e.g. abc123_zh2en. Items sharing the same pair_md5 prefix are bidirectional counterparts. |
| source_language | string | Language of source_text. Values: "Chinese" or "English". |
| target_language | string | Language of reference / expected translation output. Values: "Chinese" or "English". |
| domain | string | Top-level domain category (12 domains in total, in English). E.g. "Finance", "Medical", "Legal". |
| sub_domain | string | Fine-grained sub-domain label (in English). E.g. "Fiscal Policy / Local Government Debt", "Capital Markets / Financing". |
| domain_keywords | list[string] | Key domain-specific terms/phrases extracted from the source text. |
| terminology | list[object] | Bilingual terminology pairs with category. Each object contains: source (string): term in source language; target (string): term in target language; category (string): terminology category label. |
| scores | object | LLM-annotated quality and difficulty scores. Contains: domain_knowledge_density (int, 0–100): how dense the domain knowledge is; translation_correctness (int, 0–100): correctness of the reference translation; term_density (float, 0–100): density of domain terminology; hardness (float, 0–100): composite hardness score, computed as H = 0.4×D + 0.4×F + 0.2×T, where D = domain_knowledge_density, F = translation_difficulty, T = term_density. |
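The id convention and the hardness formula can be illustrated with a short Python sketch. It rests on several assumptions. The id is split on its last underscore. The only direction label shown above is zh2en, so the code does not hard-code the opposite label. The formula names F = translation_difficulty, which is not among the listed scores sub-fields, so the function takes D, F and T as explicit arguments rather than guessing a key. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_id(item_id: str) -> Tuple[str, str]:
    """Split "{pair_md5}_{direction}" on the last underscore (assumed)."""
    pair_md5, _, direction = item_id.rpartition("_")
    return pair_md5, direction


def group_counterparts(records: Iterable[Dict]) -> Dict[str, Dict[str, Dict]]:
    """Map each pair_md5 to its records, keyed by direction label."""
    pairs: Dict[str, Dict[str, Dict]] = defaultdict(dict)
    for rec in records:
        pair_md5, direction = split_id(rec["id"])
        pairs[pair_md5][direction] = rec
    return dict(pairs)


def hardness(d: float, f: float, t: float) -> float:
    """H = 0.4*D + 0.4*F + 0.2*T, with all inputs on a 0-100 scale."""
    return 0.4 * d + 0.4 * f + 0.2 * t


# Usage with illustrative values (not taken from the dataset):
# split_id("abc123_zh2en")   -> ("abc123", "zh2en")
# hardness(80, 70, 50.0)     -> approximately 70.0
```

Because the three weights sum to 1, H stays within 0–100 whenever D, F and T do.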