pkucoli/UST

Universal Semantic Tagging (UST; Bjerva et al., 2016) aims to provide a lightweight, unified analysis for all languages at the word level. This moderate-sized corpus provides high-quality manual UST annotations for English and Chinese.

The data source consists of 1100 English–Chinese parallel sentences from the Wall Street Journal (WSJ) section of the Penn TreeBank (PTB; Marcus et al., 1993) and 1000 sentences from the Chinese TreeBank (CTB; Xue et al., 2005). The Chinese counterparts of the original English WSJ sentences were translated literally by English–Chinese bilinguals.

The first column contains the tokens; the second contains POS tags automatically predicted by the Stanford CoreNLP tool; the last column contains the manual UST annotations, whose observed inter-annotator agreement reaches 92.9% for English and 91.2% for Chinese.
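As an illustration of the three-column layout described above, here is a minimal Python sketch for reading such data. It rests on assumptions this README does not confirm: that columns are separated by whitespace, that tokens contain no internal whitespace, and that sentences are separated by blank lines. The names `Token` and `read_sentences` are hypothetical and not part of this repository.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    form: str  # column 1: the token
    pos: str   # column 2: automatically predicted POS tag
    sem: str   # column 3: manually annotated UST tag


def read_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group three-column lines into sentences.

    Assumption: a blank line marks a sentence boundary and
    columns are whitespace-separated.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 columns, got {len(fields)}"
            )
        current.append(Token(*fields))
    # Flush the final sentence when the input has no trailing blank line.
    if current:
        sentences.append(current)
    return sentences
```

To use it on a file, pass an open file handle, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` points to one of the corpus files. If the files turn out to use a different delimiter or sentence separator, the `split()` call and the blank-line check are the two places to adjust.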

Further details about this corpus can be found in the paper Universal Semantic Tagging for English and Mandarin Chinese, which is to be published in the proceedings of NAACL 2021. A link to the paper will be added once it is available.
