
LRNounExtractor_v2 postprocessing function #13

Open

lovit opened this issue Jun 27, 2018 · 1 comment

Owner

lovit commented Jun 27, 2018

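Examples of words filtered out during postprocessing

Clear-cut case: '이상으로' ('이상' + the particle '으로')
Context-dependent cases: '최고가', '만족도'

'최고가' and '만족도' can be either true or false nouns depending on context ('최고가' is either the noun "highest price" or '최고' + '가'; '만족도' is either the noun "satisfaction level" or '만족' + '도'), but the base postprocessor cannot tell these cases apart.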
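To make the ambiguity concrete, here is a minimal sketch (not the soynlp implementation) that lists every way a word can be read either as a stand-alone noun or as a noun followed by a josa. The `readings` helper and the tiny `NOUNS` and `JOSAS` sets are hypothetical toy dictionaries chosen only for these three examples.

```python
# Hypothetical toy dictionaries for illustration only (not soynlp's resources)
NOUNS = {"이상", "최고", "최고가", "만족", "만족도"}
JOSAS = {"으로", "가", "도"}


def readings(word, nouns=NOUNS, josas=JOSAS):
    """Return every analysis of `word` as a noun, or as noun + josa."""
    results = []
    if word in nouns:
        results.append((word,))  # the whole word is a known noun
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in nouns and right in josas:
            results.append((left, right))  # a shorter noun followed by a josa
    return results


for w in ["이상으로", "최고가", "만족도"]:
    print(w, readings(w))
```

With these dictionaries, '이상으로' has only the noun + josa reading, while '최고가' and '만족도' each get two readings. Nothing in the word itself says which one a given sentence uses, which is why a context-free postprocessor cannot decide them.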

Owner Author

lovit commented Jun 29, 2018

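When checking whether a candidate is N = N + J (a noun followed by a josa), '지금은' ("now" + topic particle) is filtered out because it is '지금 + 은'. However, '이력서' (résumé) and '고양이' (cat) are filtered out as well, as '이력 + 서' and '고양 + 이'.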
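The N = N + J check described above can be sketched as follows. This is a simplified illustration under stated assumptions, not LRNounExtractor_v2's actual code: `is_noun_plus_josa`, `postprocess`, and the toy `NOUNS`/`JOSAS` sets are hypothetical. A candidate is dropped whenever it splits into a known noun plus a known josa.

```python
# Hypothetical toy dictionaries; a real extractor learns its nouns from the corpus
NOUNS = {"지금", "이력", "고양", "이력서", "고양이"}
JOSAS = {"은", "는", "이", "가", "서", "에서"}


def is_noun_plus_josa(word, nouns=NOUNS, josas=JOSAS):
    """True if `word` can be split as N + J (a shorter noun + a josa)."""
    return any(word[:i] in nouns and word[i:] in josas
               for i in range(1, len(word)))


def postprocess(candidates):
    """Naive filter: drop every candidate that looks like N + J."""
    return [w for w in candidates if not is_noun_plus_josa(w)]


print(postprocess(["지금", "지금은", "이력서", "고양이"]))
```

Only '지금' survives. The rule removes '지금은' correctly, but it cannot distinguish that case from the real nouns '이력서' and '고양이', because all three split cleanly against the dictionaries.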

```
lrgraph_origin.get_r('점심은')

[('', 9581),
 ('?', 1714),
 ('요?', 324),
 ('요', 65),
 ('용?', 61),
 ('여?', 39),
 ('여', 15),
 ('용', 13),
 ('유?', 10),
 ('유', 5)]
```

```
lrgraph_origin.get_r('이력서')

[('', 1469),
 ('에', 88),
 ('는', 81),
 ('를', 69),
 ('도', 62),
 ('랑', 52),
 ('가', 30),
 ('만', 28),
 ('?', 23),
 ('나', 16)]
```

```
lrgraph_origin.get_r('고양이')

[('', 2814),
 ('가', 665),
 ('는', 406),
 ('랑', 210),
 ('도', 187),
 ('들', 139),
 ('를', 106),
 ('?', 93),
 ('야', 74),
 ('한테', 64)]
```
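One signal visible in these outputs: the R-side of '점심은' consists of '' and sentence-final endings such as '?' and '요?', while '이력서' and '고양이' are followed by many different josa ('에', '는', '를', '가', ...). Below is a minimal sketch of a feature that measures this, using the top-10 lists printed above as its only data. The `josa_ratio` helper and the `JOSAS` set are hypothetical; '들' (a plural suffix) and '야' (either a vocative josa or a copula ending) are deliberately left out of the set, and no threshold is proposed.

```python
# Top-10 R-side counts copied from the get_r outputs above (truncated lists)
R_SIDE = {
    "점심은": [("", 9581), ("?", 1714), ("요?", 324), ("요", 65), ("용?", 61),
              ("여?", 39), ("여", 15), ("용", 13), ("유?", 10), ("유", 5)],
    "이력서": [("", 1469), ("에", 88), ("는", 81), ("를", 69), ("도", 62),
              ("랑", 52), ("가", 30), ("만", 28), ("?", 23), ("나", 16)],
    "고양이": [("", 2814), ("가", 665), ("는", 406), ("랑", 210), ("도", 187),
              ("들", 139), ("를", 106), ("?", 93), ("야", 74), ("한테", 64)],
}

# Hypothetical, deliberately small josa list (ambiguous forms excluded)
JOSAS = {"은", "는", "이", "가", "을", "를", "에", "도", "만", "랑", "나", "한테"}


def josa_ratio(r_counts, josas=JOSAS):
    """Share of the non-empty R-side frequency that is a known josa."""
    nonempty = [(r, c) for r, c in r_counts if r]  # ignore the '' (word-final) entry
    total = sum(c for _, c in nonempty)
    if total == 0:
        return 0.0
    return sum(c for r, c in nonempty if r in josas) / total


for word, counts in R_SIDE.items():
    print(f"{word}: {josa_ratio(counts):.3f}")
```

This prints 0.000 for '점심은', 0.949 for '이력서' and 0.843 for '고양이'. If the pattern holds beyond these three words (which they alone cannot show), a candidate whose R-side rarely contains josa would be a safer N + J removal than one that is followed by many josa.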
