Resources for our AAAI 2024 paper: Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
The dataset comprises three JSON files, each containing idioms and their corresponding meanings for a specific language:
en_idiom_meaning.json: For English idioms
zh_idiom_meaning.json: For Chinese idioms
ja_idiom_meaning.json: For Japanese idioms
Each data entry in the JSON file is a JSON object containing the following fields:
id: A unique integer identifier for the idiom
idiom: The idiom in the corresponding language
en_meaning: The English meaning of the idiom
zh_meaning: The Chinese meaning of the idiom
ja_meaning: The Japanese meaning of the idiom
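The field layout above can be mirrored as a small typed record, which makes it easier to catch malformed entries early. This is only a sketch: the names `IdiomEntry`, `REQUIRED_FIELDS`, and `parse_entry` are hypothetical helpers and are not part of this repository.

```python
from dataclasses import dataclass

# Field names as listed in this README.
REQUIRED_FIELDS = ("id", "idiom", "en_meaning", "zh_meaning", "ja_meaning")


@dataclass(frozen=True)
class IdiomEntry:
    """One record from an *_idiom_meaning.json file (hypothetical helper type)."""
    id: int
    idiom: str
    en_meaning: str
    zh_meaning: str
    ja_meaning: str


def parse_entry(raw: dict) -> IdiomEntry:
    """Convert a decoded JSON object into an IdiomEntry.

    Raises ValueError if any documented field is absent; extra keys are ignored.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"entry is missing fields: {missing}")
    return IdiomEntry(**{name: raw[name] for name in REQUIRED_FIELDS})
```

Because the data was generated automatically, a check like this may help flag incomplete records before they are used downstream.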
Here's an example data entry from zh_idiom_meaning.json:
{
  "id": 862,
  "idiom": "厝火积薪",
  "en_meaning": "to accumulate anger or resentment",
  "zh_meaning": "比喻积累了许多危险因素而暗藏着许多麻烦或祸害",
  "ja_meaning": "問題が蓄積され、いつか大きなトラブルになる可能性がある"
}
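A minimal sketch of loading one of these files and looking up an idiom's meaning is shown here. It assumes that each file's top level is a JSON array of entry objects like the one above, which this README does not state explicitly; the function names `load_idioms`, `build_index`, and `lookup_meaning` are hypothetical.

```python
import json
from pathlib import Path


def load_idioms(path):
    """Load one IdiomKB file.

    Assumption: the top level of the file is a JSON array of entry objects.
    """
    with Path(path).open(encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of entries")
    return entries


def build_index(entries):
    """Map each idiom string to its entry for quick lookup."""
    return {entry["idiom"]: entry for entry in entries}


def lookup_meaning(index, idiom, lang="en"):
    """Return the meaning in the given language ('en', 'zh', or 'ja'), or None if absent."""
    entry = index.get(idiom)
    if entry is None:
        return None
    return entry.get(f"{lang}_meaning")
```

For example, after `index = build_index(load_idioms("zh_idiom_meaning.json"))`, calling `lookup_meaning(index, "厝火积薪", lang="ja")` would return the Japanese meaning of that entry if the file has the assumed layout.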
The method for constructing IdiomKB is detailed in Section 3.1 of the paper, "IdiomKB Construction: Knowledge Distillation from LLMs". The data in this repository was generated by OpenAI's gpt-3.5-turbo-0613 model, so it may contain errors.
We kindly request that you cite our paper if you use this repository. If you have any questions or need further information, please do not hesitate to contact us!
@misc{li2023translate,
  title={Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models},
  author={Shuang Li and Jiangjie Chen and Siyu Yuan and Xinyi Wu and Hao Yang and Shimin Tao and Yanghua Xiao},
  year={2023},
  eprint={2308.13961},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}