A simpler Tokenizer interface #87

Open

Wook0129 opened this issue Feb 10, 2019 · 0 comments

Wook0129 commented Feb 10, 2019

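According to the tutorial, tokenizing requires the following three preparation steps:

1. Compute general word-likelihood scores with WordExtractor.
2. Compute noun-likelihood scores with NounExtractor.
3. Initialize the Tokenizer with a score that combines the two (a word-likelihood score that gives somewhat more weight to nouns), then use it.

Currently, all three modules have to be imported in order to tokenize.
For ease of use, it would be nice to have a simplified tokenizer interface (e.g. fit / transform ...) that wraps the three steps above.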
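
To make the request concrete, here is a minimal sketch of what such a wrapper could look like. It is only an illustration under stated assumptions: `SimpleTokenizer`, its `noun_weight` parameter, and the three injected callables are hypothetical names, not part of the library, and the toy scorers and tokenizer below merely stand in for WordExtractor, NounExtractor, and the Tokenizer so that the example runs on its own. The way the two scores are combined (word score plus weighted noun score) is also an assumption, not the tutorial's formula.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional

Scores = Dict[str, float]
TokenizeFn = Callable[[str], List[str]]


class SimpleTokenizer:
    """Hypothetical facade that wraps the three preparation steps."""

    def __init__(
        self,
        word_scorer: Callable[[List[str]], Scores],
        noun_scorer: Callable[[List[str]], Scores],
        tokenizer_factory: Callable[[Scores], TokenizeFn],
        noun_weight: float = 1.0,
    ) -> None:
        self.word_scorer = word_scorer
        self.noun_scorer = noun_scorer
        self.tokenizer_factory = tokenizer_factory
        self.noun_weight = noun_weight
        self._tokenize: Optional[TokenizeFn] = None

    def fit(self, sentences: Iterable[str]) -> "SimpleTokenizer":
        sentences = list(sentences)
        # Step 1: general word-likelihood scores.
        combined = dict(self.word_scorer(sentences))
        # Step 2: noun-likelihood scores, added with extra weight (assumed rule).
        for noun, score in self.noun_scorer(sentences).items():
            combined[noun] = combined.get(noun, 0.0) + self.noun_weight * score
        # Step 3: build the tokenizer from the combined scores.
        self._tokenize = self.tokenizer_factory(combined)
        return self

    def transform(self, sentences: Iterable[str]) -> List[List[str]]:
        if self._tokenize is None:
            raise RuntimeError("call fit() before transform()")
        return [self._tokenize(sentence) for sentence in sentences]


# --- Toy stand-ins (assumptions, NOT the library's extractors) ---

def toy_word_scorer(sentences: List[str]) -> Scores:
    """Score each prefix of each whitespace-separated unit by relative frequency."""
    counts: Counter = Counter()
    total = 0
    for sentence in sentences:
        for unit in sentence.split():
            total += 1
            for i in range(1, len(unit) + 1):
                counts[unit[:i]] += 1
    return {word: count / total for word, count in counts.items()} if total else {}


def toy_noun_scorer(sentences: List[str]) -> Scores:
    """Pretend a noun extractor found one noun with a perfect score."""
    return {"데이터": 1.0}


def toy_tokenizer_factory(scores: Scores) -> TokenizeFn:
    """Split each unit at its highest-scoring prefix (ties go to the longer one)."""
    def tokenize(sentence: str) -> List[str]:
        tokens: List[str] = []
        for unit in sentence.split():
            best = max(
                range(1, len(unit) + 1),
                key=lambda i: (scores.get(unit[:i], 0.0), i),
            )
            tokens.append(unit[:best])
            if best < len(unit):
                tokens.append(unit[best:])
        return tokens
    return tokenize


corpus = ["데이터를 분석", "데이터는 중요"]
tokenizer = SimpleTokenizer(toy_word_scorer, toy_noun_scorer, toy_tokenizer_factory)
tokenizer.fit(corpus)
result = tokenizer.transform(["데이터를 분석"])
print(result)
```

With an interface along these lines, a user would only import one class and call `fit` then `transform`; the real extractors and tokenizer could be plugged in through the three callables without changing the calling code.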

lovit mentioned this issue

tokenizer refactoring #143

Open

3 tasks