How to add some new words to the training corpus? #4
Comments
"쏠편한" is a new word. However, both "s20 통장을" and "s20통장을" are correct expressions, because this is a compound word. Adding it to the training data is unlikely to solve the issue, and retraining risks degrading the model's overall performance. For patterns like these, post-processing rules applied to the sentence may be more effective. Any thoughts on this?
Thank you for your quick reply.
pykospacing can probably split the same word in several different ways, so the post-processing needs to account for these variations.
The larger the dictionary, the more likely performance becomes an issue, but the code below may serve as a starting point for post-processing.
There may be other good ideas as well.
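A minimal sketch of such a post-processing rule, assuming a small hand-maintained list of words that should never be split. The names `restore_words` and `PROTECTED_WORDS` are hypothetical and not part of pykospacing; the function only operates on a string that has already been spaced, so it does not depend on any particular pykospacing API.

```python
import re

# Hypothetical user dictionary: words that must stay unsplit in the output.
PROTECTED_WORDS = ["쏠편한", "s20통장을"]


def build_pattern(word):
    # Match the word's characters with optional whitespace between each pair,
    # so every way the spacer could have split the word is covered.
    return re.compile(r"\s*".join(re.escape(ch) for ch in word))


def restore_words(spaced_text, protected_words):
    # Process longer words first so a short entry cannot pre-empt a longer one.
    for word in sorted(protected_words, key=len, reverse=True):
        pattern = build_pattern(word)
        # A function replacement avoids backslash interpretation in `word`.
        spaced_text = pattern.sub(lambda m, w=word: w, spaced_text)
    return spaced_text


# Example using the spacing result reported in this issue.
current = "s20 통장을 쏠 편한 입출금으로 전환"
print(restore_words(current, PROTECTED_WORDS))
# prints: s20통장을 쏠편한 입출금으로 전환
```

Each dictionary entry costs one regex pass over the sentence, which is the performance concern mentioned above. This sketch also rejoins a protected word wherever its characters appear in sequence, including inside longer words, so entries should be chosen with care.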
Good idea.
Input: s20통장을쏠편한입출금으로 전환
Current result: s20 통장을 쏠 편한 입출금으로 전환
Desired result: s20통장을 쏠편한 입출금으로 전환