Automatic Korean Hanja tagging tool powered by Hanjaro (hanjaro.juntong.or.kr)

kaniblu/hanja-tagger

hanja-tagger

Automatic Korean Hanja tagging tool powered by Hanjaro (hanjaro.juntong.or.kr)

Getting Started

After cloning this repo, install the package by running the standard setup.py install command:

   python setup.py install

Tagging Korean with Hanja

First, initialize a Hanjaro object, which manages the relevant sessions and cookies. You can then obtain raw tagging results from Hanjaro through its query method.

>>> from hanjatagger import Hanjaro
>>> with Hanjaro() as hjr:
...     print(hjr.query("안녕하세요"))
"안녕(安寧)하세요"

This package also comes with a more programmer-friendly wrapper around the query results. Use HanjaroTagger to obtain streamlined tag results from a query.

>>> from hanjatagger import Hanjaro, HanjaroTagger
>>> with Hanjaro() as hjr:
...    tagger = HanjaroTagger(hjr)
...    print(tagger.tag("안녕하세요"))
"安寧   "

The returned string has the same length as the input query (len(ret) == len(q)). Each character that Hanjaro annotates with Hanja is replaced by the corresponding Chinese character, and every other character is replaced by a space.
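The relationship between the raw result and the aligned result can be illustrated with a short sketch. This is not the package's implementation: the helper name align_hanja is hypothetical, and it assumes the raw format shown above, where each annotated word is immediately followed by its Hanja in parentheses and the Hanja has one character per annotated syllable.

```python
import re


def align_hanja(annotated: str) -> str:
    """Turn a raw annotated string such as '안녕(安寧)하세요' into a string
    aligned with the original query: Hanja where available, spaces elsewhere.

    Hypothetical helper for illustration; assumes each '(...)' group
    annotates the characters directly before it, one Hanja per character.
    """
    out = []
    pos = 0
    for match in re.finditer(r"\(([^()]*)\)", annotated):
        # Every character before the parenthesis starts out as a space.
        out.extend(" " * (match.start() - pos))
        hanja = match.group(1)
        n = len(hanja)
        # Overwrite the last n placeholders with the Hanja characters.
        if 0 < n <= len(out):
            out[-n:] = hanja
        pos = match.end()
    # Trailing characters after the last annotation become spaces.
    out.extend(" " * (len(annotated) - pos))
    return "".join(out)


if __name__ == "__main__":
    aligned = align_hanja("안녕(安寧)하세요")
    print(repr(aligned))
```

Because the output keeps a one-to-one alignment with the query, zip(query, aligned) pairs each Hangul syllable with its Hanja or with a space.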

Several options can be configured for HanjaroTagger during initialization:

  • simplified_han: (bool) if true, Traditional Chinese characters (zh-Hant) are converted into Simplified Chinese characters (zh-Hans).
  • unified_cjk: (bool) if true, Chinese characters that may be encoded as CJK Compatibility Ideographs are converted into CJK Unified Ideographs.
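As a rough illustration of what these two conversions mean, the sketch below applies them to an already tagged string using only the standard library. It is an assumption-laden stand-in, not how HanjaroTagger works internally: the function normalize_tags and the tiny TRAD_TO_SIMP table are hypothetical, and a real Traditional-to-Simplified conversion needs a full mapping table.

```python
import unicodedata

# Hypothetical, deliberately tiny Traditional -> Simplified table.
TRAD_TO_SIMP = str.maketrans({"寧": "宁", "漢": "汉", "國": "国"})


def _unify(ch: str) -> str:
    # CJK Compatibility Ideographs live in U+F900..U+FAFF; those with a
    # canonical decomposition are mapped to their unified form by NFC.
    if "\uf900" <= ch <= "\ufaff":
        return unicodedata.normalize("NFC", ch)
    return ch


def normalize_tags(text: str, simplified_han: bool = False,
                   unified_cjk: bool = False) -> str:
    """Apply the two conversions character by character, so the result
    stays aligned with the input (illustrative sketch only)."""
    if unified_cjk:
        text = "".join(_unify(ch) for ch in text)
    if simplified_han:
        text = text.translate(TRAD_TO_SIMP)
    return text


if __name__ == "__main__":
    # U+F900 is a compatibility duplicate of U+8C48.
    print(hex(ord(normalize_tags("\uf900", unified_cjk=True))))
    print(repr(normalize_tags("安寧   ", simplified_han=True)))
```

Unifying before simplifying matters in this sketch, because a mapping table keyed on unified code points would not match a compatibility-encoded character.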

Disclaimer

This package comes with a crawler that should only be used for non-commercial and research purposes. Furthermore, it is the responsibility of the user to ensure that using this package incurs no damage to the owner of the website.
