Add feature(#232) to split Inflect type words of mecab #341
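This PR implements issue #232, which I raised earlier.
Goal:
Currently, when you load Mecab from konlpy.tag and run morphological analysis,
the sentence '힘든 하루였다'
is tagged as
[('힘든', 'VA+ETM'), ('하루', 'NNG'), ('였', 'VCP+EP'), ('다', 'EC')]
This PR adds an option that instead returns each inflected word split into its component morphemes:
[('힘들', 'VA'), ('ㄴ', 'ETM'), ('하루', 'NNG'), ('이', 'VCP'), ('었', 'EP'), ('다', 'EC')]
The option is enabled as follows: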
mecab.pos('힘든 하루였다', split_inflect=True)
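Implementation:
mecab already breaks Inflect-type words down into their morphemes and reports that breakdown separately in its output.
This PR simply extracts the relevant part of that output and uses it.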
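
The idea can be sketched as follows. This is not the code from this PR, only a minimal illustration. It assumes that mecab's per-token feature string is comma-separated, that the fifth field holds the token type (such as Inflect), and that the eighth field holds the breakdown in the form morph/tag/*+morph/tag/*. The function name split_inflect_token and the sample feature strings are hypothetical, written to match the output shown above.

```python
def split_inflect_token(surface, feature):
    """Split one token into (morpheme, tag) pairs.

    Assumed layout of `feature` (not confirmed by this PR):
    tag,semantic,jongseong,reading,type,first_pos,last_pos,expression
    """
    fields = feature.split(',')
    tag = fields[0]
    # Only Inflect-type tokens carry a usable breakdown in the last field.
    if len(fields) >= 8 and fields[4] == 'Inflect':
        pairs = []
        for piece in fields[7].split('+'):
            parts = piece.split('/')
            if len(parts) < 2:
                # Unexpected shape: fall back to the unsplit token.
                return [(surface, tag)]
            pairs.append((parts[0], parts[1]))
        return pairs
    return [(surface, tag)]


# Hypothetical sample input mirroring the example sentence above.
sample_tokens = [
    ('힘든', 'VA+ETM,*,T,힘든,Inflect,VA,ETM,힘들/VA/*+ㄴ/ETM/*'),
    ('하루', 'NNG,*,F,하루,*,*,*,*'),
    ('였', 'VCP+EP,*,T,였,Inflect,VCP,EP,이/VCP/*+었/EP/*'),
    ('다', 'EC,*,F,다,*,*,*,*'),
]

result = []
for surface, feature in sample_tokens:
    result.extend(split_inflect_token(surface, feature))
print(result)
```

Tokens that are not of Inflect type pass through unchanged, so the result with the option off is the same as before.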