This dataset consists of sentence pairs extracted from CiNii (https://ci.nii.ac.jp/).
Each sentence pair is annotated with a similarity score from 0 (low semantic similarity) to 5 (high semantic similarity).
If you use this dataset, please cite our paper:
@article{mutinda2021semantic,
title={Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT},
author={Mutinda, Faith Wavinya and Yada, Shuntaro and Wakamiya, Shoko and Aramaki, Eiji},
journal={Methods of Information in Medicine},
year={2021},
publisher={Georg Thieme Verlag KG}
}