How to add some new words to the training corpus? #4

Closed
sunparkmos opened this issue Jun 14, 2018 · 4 comments

@sunparkmos

Input: s20통장을쏠편한입출금으로 전환 (roughly, "convert the s20 account into a 쏠편한 deposit/withdrawal account")
Current result: s20 통장을 쏠 편한 입출금으로 전환
Desired result: s20통장을 쏠편한 입출금으로 전환

@haven-jeon
Owner

haven-jeon commented Jun 14, 2018

"쏠편한" is a new word:
http://www.techholic.co.kr/news/articleView.html?idxno=173901

However, both "s20 통장을" and "s20통장을" are correct, because "s20통장" is a compound word that can be written with or without the space.

Even if you add these examples to the training data, the issue is unlikely to be solved. Worse, retraining risks lowering the overall performance of the model. For patterns like this, it may be more effective to apply post-processing rules to the output sentence.

Any thoughts on this?

@sunparkmos
Author

Thank you for your quick reply.
What do you mean by post-processing rules for the sentences? Could you explain in more detail?
"쏠편한" is the name of a new product from a bank.
My idea for correcting "쏠 편한" back to "쏠편한" after running Kospacing was:
look for the pattern "쏠 편한" in the resulting sentence, and then change it to "쏠편한".
If you know of other ways, please let me know.
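The literal replacement described above could be sketched as follows (the function name `fix_product_name` is hypothetical). It also illustrates a limitation: `str.replace` only fixes the exact split it was written for, so other spacings of the same word pass through unchanged.

```python
def fix_product_name(sentence: str) -> str:
    """Rejoin the product name "쏠편한" when the spacer splits it as "쏠 편한"."""
    # str.replace matches only this one exact spacing variant.
    return sentence.replace("쏠 편한", "쏠편한")


print(fix_product_name("s20 통장을 쏠 편한 입출금으로 전환"))
# s20 통장을 쏠편한 입출금으로 전환
print(fix_product_name("s20 통장을 쏠편 한 입출금으로 전환"))
# s20 통장을 쏠편 한 입출금으로 전환  (a different split is left as-is)
```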

@haven-jeon
Owner

haven-jeon commented Jun 14, 2018

pykospacing may split the word in several different ways, so the rule needs to cover all of these variants:

"쏠 편한", "쏠편 한", "쏠 편 한", "쏠편한"
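The number of variants follows from a simple count, assuming the spacer inserts at most one space between adjacent characters: a word of n characters has n − 1 gaps, each either empty or a space, giving 2^(n−1) spellings. For the three-character "쏠편한" that is exactly the four variants listed above. A small sketch that enumerates them (the helper name `spacing_variants` is hypothetical):

```python
from itertools import product
from typing import List


def spacing_variants(word: str) -> List[str]:
    """List every spacing of `word`, assuming at most one space per gap."""
    if not word:
        return [""]
    variants = []
    # Each of the len(word) - 1 gaps is either empty or a single space.
    for gaps in product(("", " "), repeat=len(word) - 1):
        pieces = [word[0]]
        for gap, ch in zip(gaps, word[1:]):
            pieces.append(gap + ch)
        variants.append("".join(pieces))
    return variants


print(spacing_variants("쏠편한"))
# ['쏠편한', '쏠편 한', '쏠 편한', '쏠 편 한']
```

Listing variants this way grows exponentially with word length, which is one reason a single pattern that tolerates optional whitespace is more practical than enumerating every spelling.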

A large dictionary may become a performance issue, but the code below could be a starting point for post-processing:

```python
>>> import re
>>> s = "쏠편한"  # word in the user dictionary
>>> p = re.compile(r'\s*'.join(map(re.escape, s)))  # pattern: 쏠\s*편\s*한
>>> p.sub(s, 's20 통장을 쏠 편한 입출금으로 전환')
's20 통장을 쏠편한 입출금으로 전환'
```
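One way to extend this to a whole user dictionary while limiting the performance cost is to compile every entry into a single alternation, so each sentence is scanned once rather than once per word. This is a sketch, not part of pykospacing; the names `build_rejoiner` and the example dictionary are hypothetical, and it assumes dictionary entries contain no spaces themselves.

```python
import re
from typing import Callable, Iterable


def build_rejoiner(words: Iterable[str]) -> Callable[[str], str]:
    """Return a function that removes stray spaces inside user-dictionary words."""
    # Longest first: the alternation takes the first alternative that matches,
    # so a longer entry must be tried before a shorter entry it starts with.
    ordered = sorted({w for w in words if w}, key=lambda w: (-len(w), w))
    # Each entry becomes e.g. 쏠\s*편\s*한; all entries share one compiled pattern.
    pattern = re.compile("|".join(r"\s*".join(map(re.escape, w)) for w in ordered))

    def rejoin(sentence: str) -> str:
        # Deleting the whitespace inside a match restores the dictionary spelling.
        return pattern.sub(lambda m: re.sub(r"\s+", "", m.group(0)), sentence)

    return rejoin


fix = build_rejoiner(["쏠편한", "s20통장"])  # hypothetical user dictionary
print(fix("s20 통장을 쏠 편한 입출금으로 전환"))
# s20통장을 쏠편한 입출금으로 전환
```

With these two entries the sketch reproduces the desired result from the original report. One caveat of this design: because `\s*` can also span a genuine boundary between two words, a short entry could glue together words that should stay apart, so longer and more specific entries are safer.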

Perhaps there will be other good ideas.

@sunparkmos
Author

Good idea
