Purpose of adding a dummy whitespace at the beginning of each line of sentence #15
Comments
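We would like the word "world" to be treated the same way in two sentences: one where it starts the sentence and one where it follows another word. In both cases it is converted to "_world". Putting a dummy whitespace at the beginning means "_world" tends to be extracted as one token.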
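To make the comment above concrete, here is a minimal pure-Python sketch of the idea. It is not SentencePiece's implementation: `normalize`, `word_pieces`, and the second sentence "hello world" are hypothetical, chosen only for illustration, and "_" stands in for the whitespace marker as written in the comment. The split is a toy word-level one, not a trained model.

```python
from typing import List

MARKER = "_"  # stand-in for the whitespace marker, as written in the comment


def normalize(text: str, add_dummy_prefix: bool = True) -> str:
    """Optionally prepend one dummy space, then replace every space with MARKER."""
    if add_dummy_prefix:
        text = " " + text
    return text.replace(" ", MARKER)


def word_pieces(normalized: str) -> List[str]:
    """Toy segmentation: start a new piece at every MARKER."""
    pieces: List[str] = []
    current = ""
    for ch in normalized:
        if ch == MARKER and current:
            pieces.append(current)
            current = ""
        current += ch
    if current:
        pieces.append(current)
    return pieces


sentences = ("world", "hello world")  # hypothetical example sentences
with_prefix = [word_pieces(normalize(s, True)) for s in sentences]
without_prefix = [word_pieces(normalize(s, False)) for s in sentences]

print(with_prefix)     # "world" appears as "_world" in both sentences
print(without_prefix)  # "world" and "_world" are two different pieces
```

With the dummy prefix, the same piece "_world" covers both positions, so a vocabulary needs only one entry for the word. Without it, the sentence-initial form "world" and the mid-sentence form "_world" would compete as separate candidates.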
Cool! That is actually intriguing!
Seems the issue was resolved. Closing.
I have seen the following in the help text of spm_train about this parameter:
--add_dummy_prefix (Add dummy whitespace at the beginning of text) type: bool default: true
Is there any explanation of why the default behavior is to add a prefix whitespace? I am just wondering what the intention or advantage of doing this is.