OpenKorPos

OpenKorPos is a Korean part-of-speech tagging corpus. It is a free, open alternative to the Sejong corpus and Modu corpus.

For background on this work, please refer to our paper.

Building

Building the corpus requires Python 3.9+, Click, and Ninja. You can install all dependencies using the provided requirements.txt.

pip install -r requirements.txt

To build the corpus, first generate the corresponding Ninja build files, then run ninja.

python openkorpos.py ningen base
ninja

You can also include all quarantined (flagged) sentences in the generated corpus by passing the --flagged option.

python openkorpos.py ningen --flagged base

Using

The build artifacts are written to the build directory. Each file is in JSON Lines format (one JSON value per line), encoded in UTF-8.
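As a rough illustration of consuming these files, here is a minimal Python sketch that walks a directory and parses each line as JSON. The helper names read_jsonl and read_build_dir, and the glob pattern argument, are assumptions made for this example and are not part of openkorpos.py. The sketch makes no assumption about the fields inside each record, so inspect a few records before relying on any particular key.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a UTF-8 JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def read_build_dir(build_dir: str = "build", pattern: str = "*") -> Iterator[tuple[str, Any]]:
    """Yield (file name, record) pairs for every matching regular file in build_dir.

    The pattern default is an assumption: narrow it if the directory
    also holds files that are not JSON Lines.
    """
    for path in sorted(Path(build_dir).glob(pattern)):
        if path.is_file():
            for record in read_jsonl(path):
                yield path.name, record
```

After a successful build, something like next(read_build_dir("build")) should return the first file name together with its first record, which is a quick way to see what a record looks like.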

Citing

If you need to cite this work before its BibTeX entry is available in the ACL Anthology, please use the following:

@inproceedings{Moon:LREC2022,
    title = "OpenKorPOS: Democratizing Korean Tokenization with Voting-Based Open Corpus Annotation",
    author = "Moon, Sangwhan and 
              Cho, Won Ik  and 
              Han, Hye Joo  and 
              Okazaki, Naoaki and 
              Kim, Nam Soo",
    booktitle = "Proceedings of the 13th Language Resources and Evaluation Conference (LREC)",
    month = jun,
    year = "2022",
    address = "Marseille",
    publisher = "European Language Resources Association",
}
