We present the Corpus of Linguistic Acceptability in Chinese (CoLAC), the first large-scale acceptability dataset in a non-Indo-European language, handcrafted by linguists to evaluate the grammatical proficiency of language models. The dataset consists of 7,495 sentences collected from one syntax textbook, one linguistics handbook, and 68 linguistics journal articles, all verified by native speakers of Mandarin.
Every example sentence has two labels:
- label0 (linguist label): the judgement given by the linguist who proposed the example. Because the examples are drawn from one syntax textbook, one linguistics handbook, and 68 journal articles, these labels reflect the judgements of many different theoretical syntacticians, not a single annotator.
- label1 (crowd label): a label mapped from the mean ratings given by other native speakers of Mandarin Chinese. This label is used in all our experiments.
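As a rough illustration of the crowd-label mapping above, the sketch below thresholds mean native-speaker ratings into a binary label. The rating scale and the threshold value here are assumptions for illustration only, not CoLAC's documented procedure.

```python
# Hypothetical sketch: map mean acceptability ratings to a binary
# crowd label. The 1-4 rating scale and the 2.5 threshold are
# assumptions, not the actual CoLAC annotation protocol.

def crowd_label(ratings, threshold=2.5):
    """Return 1 (acceptable) if the mean rating reaches the
    threshold, else 0 (unacceptable)."""
    mean = sum(ratings) / len(ratings)
    return 1 if mean >= threshold else 0

print(crowd_label([4, 3, 4, 3]))  # high ratings -> 1
print(crowd_label([1, 2, 1, 1]))  # low ratings -> 0
```

In practice any such mapping must also decide how to treat borderline means near the threshold; see the paper for how CoLAC actually aggregates ratings.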
Statistics of CoLAC:
We ran several baselines, including XLM-R, Chinese RoBERTa, variants of InstructGPT, ChatGPT, and mTk. Results are shown below.
For details of the experiments, see our paper. If you use CoLAC, please cite:
```bibtex
@misc{hu2023revisiting,
  title={Revisiting Acceptability Judgements},
  author={Hai Hu and Ziyin Zhang and Weifang Huang and Jackie Yan-Ki Lai and Aini Li and Yina Patterson and Jiahui Huang and Peng Zhang and Chien-Jer Charles Lin and Rui Wang},
  year={2023},
  eprint={2305.14091},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2305.14091}
}
```