Fix collisions between oov words and in-vocab words (#447) #482

Merged
merged 2 commits into from Jan 31, 2019

mtreviso (Contributor) commented Dec 6, 2018

Fixes issue #447 by checking whether <unk> is in the specials list; if it is not, a defaultdict is created without a default_factory.
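
The approach described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `build_stoi` is a hypothetical helper, and the hard-coded `UNK` constant is an assumption based on this thread. It shows why a defaultdict without a default_factory avoids silently mapping out-of-vocabulary words onto an in-vocab index.

```python
from collections import defaultdict

UNK = "<unk>"  # assumption: the unknown token is hard-coded in lowercase


def build_stoi(itos, specials):
    """Hypothetical helper: build a string-to-index mapping from a word list."""
    if UNK in specials:
        unk_index = itos.index(UNK)
        # Unknown words fall back to the index of the unk token.
        stoi = defaultdict(lambda: unk_index)
    else:
        # No default_factory: a missing key raises KeyError, like a plain dict.
        stoi = defaultdict()
    stoi.update({tok: i for i, tok in enumerate(itos)})
    return stoi


with_unk = build_stoi(["<unk>", "<pad>", "hello"], specials=["<unk>", "<pad>"])
without_unk = build_stoi(["<pad>", "hello"], specials=["<pad>"])

print(with_unk["never-seen"])  # 0, the index of <unk>

try:
    without_unk["never-seen"]
except KeyError:
    print("KeyError: no silent collision with an in-vocab word")
```

Without the unk token, a lookup of an unseen word fails loudly instead of returning the index of whichever word happens to sit at position 0.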

mttk (Contributor) commented Jan 31, 2019

Thanks, this will work for now, but needs to be handled better.

raheelqader commented Feb 25, 2019

This change broke my code because my unknown token was <UNK> (uppercase). I had to manually change <unk> to <UNK> in vocab.py to fix it.
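
The breakage reported above is consistent with a case-sensitive membership test against a hard-coded token. The sketch below is illustrative only: `has_unk` and its `unk_token` parameter are hypothetical names, not part of the library, and show one way the token could be made configurable instead of editing vocab.py.

```python
def has_unk(specials, unk_token="<unk>"):
    """Hypothetical check: is the unknown token among the specials?

    String membership is case-sensitive, so "<UNK>" does not match "<unk>".
    """
    return unk_token in specials


print(has_unk(["<unk>", "<pad>"]))                     # True
print(has_unk(["<UNK>", "<pad>"]))                     # False
print(has_unk(["<UNK>", "<pad>"], unk_token="<UNK>"))  # True
```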

cpuhrsch (Contributor) commented Aug 2, 2019

The documentation wasn't updated to say that Vocab no longer prepends the unk token by default (as this test verifies) unless it is passed explicitly. I fixed that here and included the unk token in the default specials list to reflect the default behavior.

4 participants