Purpose of adding a dummy whitespace at the beginning of each line of sentence #15

Closed
frankxu2004 opened this issue May 27, 2017 · 3 comments

Comments

@frankxu2004
Contributor

I have seen in the help text of spm_train the following about the parameter:

--add_dummy_prefix (Add dummy whitespace at the beginning of text) type: bool default: true

Is there any explanation of why the default behavior is to add a prefix whitespace? I am just wondering what the intention or advantage of doing this is.

@taku910
Collaborator

taku910 commented Jun 3, 2017

We would like "world" to be treated the same way in the following two sentences:

world
Hello world

Both are converted to

_world
_Hello_world
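
The conversion above can be sketched in a few lines of Python. This is a simplified illustration, not SentencePiece's actual implementation: `mark_whitespace` is a hypothetical helper, and it writes the whitespace marker as "_" to match the notation in this thread, whereas SentencePiece itself uses the character "▁" (U+2581).

```python
# Simplified illustration only: this is not SentencePiece's actual code.
def mark_whitespace(text, add_dummy_prefix=True, space_symbol="_"):
    """Hypothetical helper: optionally prepend a space, then make spaces visible."""
    if add_dummy_prefix:
        text = " " + text
    return text.replace(" ", space_symbol)


sentences = ["world", "Hello world"]

# With the dummy prefix, "world" appears as "_world" in both sentences.
with_prefix = [mark_whitespace(s) for s in sentences]

# Without it, a sentence-initial "world" has no marker, while a
# mid-sentence "world" still carries one.
without_prefix = [mark_whitespace(s, add_dummy_prefix=False) for s in sentences]

print(with_prefix)     # ['_world', '_Hello_world']
print(without_prefix)  # ['world', 'Hello_world']
```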

"_world" tends to be extracted as one token by putting a dummy whitespace.

@frankxu2004
Contributor Author

Cool! That is actually intriguing!

@taku910 taku910 closed this as completed Jun 13, 2017
@taku910
Collaborator

taku910 commented Jun 13, 2017

It seems the issue was resolved. Closing.
