KNP seems to ignore the timeout option #47

Closed
CLRafaelR opened this issue May 5, 2021 · 9 comments

Comments

@CLRafaelR

The KNP module seems to ignore the timeout option. In the following example, I set the timeout to 0 for a time-consuming process (option='-tab -anaphora -semantic-head') applied to a long sentence. Although this leaves the parser no time to process the sentence, the parse still completes, taking 2-3 seconds.

#!/usr/bin/env python3.6
# coding: utf-8

from pyknp import KNP
import time

# Default is JUMAN++. If you use JUMAN, use KNP(jumanpp=False)
knp = KNP(option='-tab -anaphora -semantic-head', timeout=0)

start = time.time()

"""
Source of the to-be-parsed sentence:
https://ja.uncyclopedia.info/wiki/%E8%AA%AD%E3%81%BF%E3%81%AB%E3%81%8F%E3%81%84%E6%96%87%E7%AB%A0/%E6%96%87%E3%81%AE%E6%A7%8B%E9%80%A0%E3%81%8C%E5%8E%9F%E5%9B%A0%E3%81%A7%E8%AA%AD%E3%81%BF%E3%81%AB%E3%81%8F%E3%81%84%E6%96%87%E7%AB%A0#.E5.8F.A5.E8.AA.AD.E7.82.B9.E3.81.8C.E5.B0.91.E3.81.AA.E3.81.99.E3.81.8E.E3.82.8B.E6.96.87.E7.AB.A0
"""

result = knp.parse(
    "このような句読点が少ない文章は文章の流れをつかみにくく読みづらいという欠点を持つが書く人はこの方が早く書けることもあるので「句読点が多い文章」よりは出現しやすくさらにパソコンやワープロなどで書き印刷する場合は紙のスペースの節約になり省資源にも多少は貢献すると思われるのみならず最近の女子高生が携帯電話からブログに書いた文章においては句読点がないのが普通だったりするんだけれども読みやすさ第一を心がける場合においては句読点をつけすぎなさすぎないことはやめましょう。")

elapsed_time = time.time() - start

print("Bunsetsu (phrases)")
for bnst in result.bnst_list():  # access each bunsetsu
    print("\tID:%d, surface:%s, dependency type:%s, parent bunsetsu ID:%d, features:%s"
          % (bnst.bnst_id, "".join(mrph.midasi for mrph in bnst.mrph_list()), bnst.dpndtype, bnst.parent_id, bnst.fstring))

print("Tags (basic phrases)")
for tag in result.tag_list():  # access each tag (basic phrase)
    print("\tID:%d, surface:%s, dependency type:%s, parent tag ID:%d, features:%s"
          % (tag.tag_id, "".join(mrph.midasi for mrph in tag.mrph_list()), tag.dpndtype, tag.parent_id, tag.fstring))

print("Morphemes")
for mrph in result.mrph_list():  # access each morpheme
    print("\tID:%d, surface:%s, reading:%s, base form:%s, POS:%s, POS subcategory:%s, conjugation type:%s, conjugation form:%s, semantic info:%s, representative form:%s"
          % (mrph.mrph_id, mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi, mrph.bunrui, mrph.katuyou1, mrph.katuyou2, mrph.imis, mrph.repname))

print(round(elapsed_time, 2))

The option appears to be only initialised in the class definition, and no function seems to take timeout as an argument. Would you mind rechecking this?
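
For reference, here is a minimal sketch of the behaviour I would expect from a timeout, written in plain Python. It does not use pyknp at all: run_with_timeout and slow_parse are hypothetical names, and slow_parse is only a stand-in for a slow parser call. With a timeout of 0, a call that has not already finished should fail rather than return a result.

```python
import concurrent.futures
import time


def run_with_timeout(func, timeout, *args):
    """Run func(*args) in a worker thread.

    Raises the built-in TimeoutError if no result is available
    within `timeout` seconds.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError("call exceeded %s s" % timeout) from None
    finally:
        # Do not block on the worker; it is abandoned, not killed.
        executor.shutdown(wait=False)


def slow_parse(text):
    """Hypothetical stand-in for a slow parser call (not pyknp)."""
    time.sleep(0.2)
    return text.upper()


if __name__ == "__main__":
    print(run_with_timeout(slow_parse, 2, "abc"))
    try:
        run_with_timeout(slow_parse, 0, "abc")
    except TimeoutError as exc:
        print("timed out:", exc)
```

Note that the worker thread is not killed when the deadline passes; it only stops being waited for. This is an illustration of the expected behaviour, not a claim about how pyknp is or should be implemented.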

@nobu-g
Member

nobu-g commented May 5, 2021

Thanks!
The timeout attribute of the KNP class is not used, and this is apparently a bug.
I will fix it.

@CLRafaelR
Author

@nobu-g

Thank you for looking into this!

@CLRafaelR
Author

CLRafaelR commented May 6, 2021

@nobu-g

I don't think the -timeout option works properly when knp is run from bash either. The example sentence I mentioned above is still parsed even though I set the timeout to 0 seconds. Would you mind also checking whether the timeout option of knp itself works?

echo "このような句読点が少ない文章は文章の流れをつかみにくく読みづらいという欠点を持つが書く人はこの方が早く書けることもあるので「句読点が多い文章」よりは出現しやすくさらにパソコンやワープロなどで書き印刷する場合は紙のスペースの節約になり省資源にも多少は貢献すると思われるのみならず最近の女子高生が携帯電話からブログに書いた文章においては句読点がないのが普通だったりするんだけれども読みやすさ第一を心がける場合においては句読点をつけすぎなさすぎないことはやめましょう。" | jumanpp | knp -tab -anaphora -semantic-head -timeout 0
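
As a possible workaround in the meantime, the deadline can also be enforced from outside the process. The sketch below uses Python's subprocess.run with its timeout argument, which kills the child process and raises subprocess.TimeoutExpired when the deadline passes. run_with_deadline and SLOW_ECHO are hypothetical names, and SLOW_ECHO is only a stand-in for a slow analyser, not knp itself.

```python
import subprocess
import sys


def run_with_deadline(cmd, text, timeout):
    """Feed `text` to `cmd` on stdin and return its stdout.

    Returns None if the command does not finish within `timeout` seconds.
    """
    try:
        done = subprocess.run(
            cmd,
            input=text,
            capture_output=True,
            encoding="utf-8",
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


# Hypothetical stand-in for a slow analyser: waits, then echoes stdin.
SLOW_ECHO = [
    sys.executable,
    "-c",
    "import sys, time; time.sleep(0.5); sys.stdout.write(sys.stdin.read())",
]

if __name__ == "__main__":
    print(run_with_deadline(SLOW_ECHO, "abc", 5))    # enough time to finish
    print(run_with_deadline(SLOW_ECHO, "abc", 0.1))  # deadline passes first
```

The command is passed as an argument list rather than through a shell, so the timeout applies directly to the process being run.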

@nobu-g
Member

nobu-g commented May 9, 2021

Could you open an issue on ku-nlp/knp?

@CLRafaelR
Author

@nobu-g

Done, I submitted the issue: ku-nlp/knp#7

nobu-g closed this as completed in 73b095b on May 11, 2021

@CLRafaelR
Author

CLRafaelR commented May 12, 2021

Thank you for the bug fix.

I wanted to reinstall pyknp with the following command, but it does not work because the GitHub repo is missing setup.py and __version__.py:

pip install git+https://github.com/ku-nlp/pyknp.git@master#egg=pyknp

As a workaround, I downloaded the latest stable release of pyknp (0.4.6), copied its setup.py and __version__.py, and placed these two files in my local git clone of pyknp. The reinstallation succeeded, and I'm now trying out the timeout option.

@CLRafaelR
Author

@nobu-g

I confirmed that the bug is fixed in my local environment too. I appreciate your help and maintenance!

@nobu-g
Member

nobu-g commented May 12, 2021

I could not reproduce your issue.
pip supports pyproject.toml and does not require setup.py or __version__.py:
https://github.com/pypa/pip/blob/main/NEWS.rst#features-18

Could you upgrade pip and try again?
Alternatively, you can install the latest pyknp with pip install pyknp.
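
To see which versions are actually installed before and after upgrading, a small check like the following may help. This is only a sketch: installed_version is a hypothetical helper name, and it assumes Python 3.8 or later, where importlib.metadata is part of the standard library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pip", "pyknp"):
        print(name, installed_version(name) or "not installed")
```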

@CLRafaelR
Author

Sorry for my late reply. After upgrading pip, I successfully installed the latest pyknp via pip without having to copy any files by hand. Thank you for your help!
