Python Keyphrase Extraction with Korean Language support

Background

The pke module is a one-stop shop for keyphrase extraction in Python, covering the common methods except RAKE. However, because it does not support languages that lack a spaCy language model, it cannot be applied to Korean text. The only change I made here is to add that functionality.

Requirements

The requirements are the same as for pke, since I only changed the text preprocessing. In addition, Korean POS tagging requires konlpy (on Unix-based systems) or eunjeon (on Windows-based systems). Because the tagging uses the Mecab module, Mecab-ko must be installed before use.
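To check which Mecab wrapper is available before running the extractors, something like the following can help. It is a minimal sketch, not part of this repository: the helper name `find_mecab_backend` is hypothetical, and it only assumes that konlpy exposes `Mecab` in `konlpy.tag` and eunjeon exposes `Mecab` at the package top level.

```python
import importlib

# Hypothetical helper (not part of this repository): returns the first
# importable Mecab wrapper as (module_name, tagger_class), or None.
def find_mecab_backend(candidates=(("konlpy.tag", "Mecab"), ("eunjeon", "Mecab"))):
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # This wrapper is not installed; try the next candidate
            continue
        tagger = getattr(module, class_name, None)
        if tagger is not None:
            return module_name, tagger
    return None


backend = find_mecab_backend()
if backend is None:
    print("No Mecab wrapper found; install konlpy (Unix) or eunjeon (Windows).")
else:
    print(f"Mecab wrapper found in {backend[0]}")
```

Finding a wrapper does not guarantee that Mecab-ko itself is installed; creating the tagger can still fail if the underlying library or dictionary is missing.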

Usage

The syntax is the same as in the original package. See the following sample script (sample.py).

# Run from the repository root so that the unsupervised package is importable
import unsupervised

text = {'ko':'5일 블룸버그통신에 따르면 블랙록, 스테이트 스트리트, JP모간 자산운용, UBS 자산운용 등 세계 최대 규모 자산운용사들은 올해 하반기에도 주식시장이 계속해서 오를 것이란 전망을 제시하고 있다. \
MSCI 전 국가 세계 지수가 올해 12% 상승하며 역대 고점까지 올랐지만 추가 상승 가능성이 높다는 것이다.',
'en':'A mathematical model of ion exchange is considered, allowing for ion exchanger compression in the process of ion exchange. \
Two inverse problems are investigated for this model, unique solvability is proved, and numerical solution methods are proposed. \
The efficiency of the proposed methods is demonstrated by a numerical experiment.'}

# One extractor instance per method and per language
extractor = {'TextRank':{'ko':unsupervised.TextRank(),'en':unsupervised.TextRank()},
'TopicRank':{'ko':unsupervised.TopicRank(),'en':unsupervised.TopicRank()},
'YAKE':{'ko':unsupervised.YAKE(),'en':unsupervised.YAKE()}}

for ext in extractor:
    for lang in text:
        # Load and preprocess the document in the given language
        extractor[ext][lang].load_document(text[lang],language=lang)
        # Select keyphrase candidates, then score them
        extractor[ext][lang].candidate_selection()
        extractor[ext][lang].candidate_weighting()
        print(f'{ext} - {lang}')
        # Print the top-ranked keyphrases
        print(extractor[ext][lang].get_n_best())
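
The four calls in the sample script can be wrapped in a small helper when only the ranked keyphrases are needed. This is a sketch under assumptions: `extract_keyphrases` is a hypothetical name, not part of this repository, and it assumes `get_n_best` accepts an `n` argument as it does in the original pke.

```python
# Hypothetical convenience wrapper (not part of this repository).
# Assumes the extractor follows the same interface as the sample script:
# load_document, candidate_selection, candidate_weighting, get_n_best.
def extract_keyphrases(extractor, text, language, n=10):
    # Load and preprocess the text in the given language ('ko' or 'en')
    extractor.load_document(text, language=language)
    # Select candidates and score them with the extractor's own method
    extractor.candidate_selection()
    extractor.candidate_weighting()
    # Return the n highest-scoring keyphrases
    return extractor.get_n_best(n=n)


# Example usage, assuming it is run from the repository root:
#
#   import unsupervised
#   print(extract_keyphrases(unsupervised.TextRank(), text['ko'], 'ko', n=5))
```

A fresh extractor instance is passed in for each call, which mirrors the sample script where each method and language pair has its own instance.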
