Fix SubwordVocab #399

Merged: 2 commits into pytorch:master on Sep 21, 2018
Conversation

nzw0301 (Contributor) commented Sep 21, 2018

Fix #398 and use max_size.

from torchtext.vocab import SubwordVocab
from collections import Counter

counter = Counter("torch torch test".split())

v1 = SubwordVocab(counter, max_size=10)
print(v1.itos, v1.stoi)

# Building a second vocab from the same counter should give the same result.
v2 = SubwordVocab(counter, max_size=10)
print(v2.itos, v2.stoi)

This version's output:

# v1
['<pad>', 't', 'c', 'e', 'h', 'o', 'r', 's', 'torch', 'test', 'est'] defaultdict(<function _default_unk_index at 0x11bc13158>,
{'<pad>': 0, 't': 1, 'c': 2, 'e': 3, 'h': 4, 'o': 5, 'r': 6, 's': 7, 'torch': 8, 'test': 9, 'est': 10})

# v2: the same as v1
['<pad>', 't', 'c', 'e', 'h', 'o', 'r', 's', 'torch', 'test', 'est'] defaultdict(<function _default_unk_index at 0x11bc13158>,
{'<pad>': 0, 't': 1, 'c': 2, 'e': 3, 'h': 4, 'o': 5, 'r': 6, 's': 7, 'torch': 8, 'test': 9, 'est': 10})
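
As a rough sketch of the general idea only (this is not torchtext's actual SubwordVocab code, and the build_itos helper below is hypothetical), making a max_size cap deterministic usually comes down to ordering candidate tokens by a stable key, e.g. frequency with an alphabetical tie-break, before truncating:

from collections import Counter

def build_itos(counter, max_size, specials=('<pad>',)):
    # Hypothetical helper: keep the special tokens, then at most max_size
    # candidates, in a deterministic order (frequency desc, then alphabetical).
    itos = list(specials)
    candidates = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    for token, _freq in candidates:
        if len(itos) >= max_size + len(specials):
            break
        itos.append(token)
    return itos

print(build_itos(Counter("torch torch test".split()), max_size=10))
# ['<pad>', 'torch', 'test']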

mttk merged commit 97db140 into pytorch:master on Sep 21, 2018
mttk (Contributor) commented Sep 21, 2018

Thanks @nzw0301
