We manually annotate predicate–argument structures for 600 L2-L1 sentence pairs as the basis for the semantic analysis of learner Chinese. The dataset covers learners with four typologically different mother tongues: English (ENG), Japanese (JPN), Russian (RUS) and Arabic (ARA). The sub-corpus for each mother tongue consists of 150 sentence pairs.
The work was published at EMNLP 2018 under the title "Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data". This project maintains the dataset. We hope the data will be helpful for your research on semantic parsing for interlanguage. If you use the dataset, please cite the following papers:
Zi Lin, Yuguang Duan, Yuanyuan Zhao, Weiwei Sun and Xiaojun Wan. Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data. The 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).
Yuanyuan Zhao, Nan Jiang, Weiwei Sun and Xiaojun Wan. Overview of the NLPCC 2018 Shared Task: Grammatical Error Correction. Natural Language Processing and Chinese Computing (NLPCC 2018).