arr = [[self.vocab.stoi[x] for x in ex] for ex in arr] KeyError: None #592
Comments
Can you post your script so I can reproduce the case?
Feel free to re-open the issue if you still have questions.
I ran into the same issue. It only happens when I define my own unk_token and set min_freq > 1 at the same time. Here is the code I use:

    from torchtext import data, datasets

    SRC = data.Field(lower=True, unk_token="my_unk_token")
    TGT = data.Field(lower=True)  # assumed definition; TGT was not shown in the snippet
    train, val, test = datasets.IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))
    SRC.build_vocab(train, min_freq=10)
    train_iter = data.BucketIterator(dataset=train, batch_size=64)
    batch = next(iter(train_iter))
I am still getting this issue. As @TinaChen95 mentioned, setting min_freq to 1 works fine. When min_freq > 2, build_vocab(..) builds the vocab according to min_freq, but a KeyError is thrown while iterating over the BucketIterator.
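To make the min_freq interaction concrete, here is a minimal sketch in plain Python. It is not torchtext's implementation: `build_stoi` is a hypothetical helper that only mimics the idea of dropping tokens seen fewer than `min_freq` times. It shows why a rare token that was present in the training data can still be absent from the lookup table afterwards.

```python
from collections import Counter

def build_stoi(counter, min_freq=1, specials=("<pad>",)):
    """Hypothetical helper: map tokens to indices, keeping only tokens
    that occur at least min_freq times (specials are always kept)."""
    itos = list(specials)
    # Sort by descending frequency, then alphabetically, for a stable order.
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in itos:
            itos.append(token)
    return {token: index for index, token in enumerate(itos)}

counts = Counter("the cat sat on the mat the end".split())
stoi_all = build_stoi(counts, min_freq=1)       # every token is kept
stoi_frequent = build_stoi(counts, min_freq=2)  # rare tokens are dropped

print(sorted(stoi_frequent))
try:
    stoi_frequent["cat"]  # "cat" occurs once, so it was filtered out
except KeyError as err:
    print("KeyError:", err)
```

With min_freq=1 every training token has an entry, so no fallback is ever needed on the training set. Once min_freq is raised, the lookup depends on a working fallback for unknown tokens.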
I think so, at least for the issue I am facing. I figured out that unk_token needs to be passed to the ReversibleField constructor even if you want to use the default unk_token. That is because ReversibleField uses ' UNK ' as its unk_token, while Vocab uses '<unk>'. Since bug #706 is already open for this, customizing the unk_token is not possible at the moment.
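The mismatch described above can be sketched without torchtext. This is an illustration under assumptions, not the library's code: `make_stoi` is a hypothetical helper, and `<unk>` is assumed to be the token name the vocabulary expects. The fallback for unknown tokens only exists when the configured unk token is actually present in the vocabulary.

```python
from collections import defaultdict

def make_stoi(itos, unk_token="<unk>"):
    """Hypothetical helper: token-to-index mapping with an unk fallback.

    If unk_token is not in itos there is nothing to fall back to, so the
    result is a plain dict that raises KeyError for unseen tokens.
    """
    base = {token: index for index, token in enumerate(itos)}
    if unk_token not in base:
        return base
    unk_index = base[unk_token]
    stoi = defaultdict(lambda: unk_index)
    stoi.update(base)
    return stoi

itos = ["<unk>", "<pad>", "the"]
matched = make_stoi(itos, unk_token="<unk>")     # names agree
mismatched = make_stoi(itos, unk_token=" UNK ")  # names disagree

print(matched["zebra"])  # falls back to the index of "<unk>"
try:
    mismatched["zebra"]
except KeyError as err:
    print("KeyError:", err)
```

Under this reading, passing the same unk_token to the field that the vocabulary uses restores the fallback, which matches the workaround reported in this comment.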
torchtext=0.4.0 BUG:
Traceback (most recent call last):
  File "/Users/weipengfei/workspaces/FastNLPProjects/research01/Intent+SlotFilling01.py", line 112, in <module>
    for i, batch in enumerate(train_iter):
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
KeyError: None
The same code runs without problems on torchtext=0.3.1. Please tell me what causes this. Thank you.
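The key reported in `KeyError: None` is the token being looked up, so at least one padded example contains `None` instead of a string. One way this could happen (an assumption, not something the traceback confirms) is padding with a `None` token when the mapping has no fallback. The sketch below uses hypothetical helpers `pad` and `missing_tokens` and a made-up vocabulary to reproduce the same nested lookup and to list the tokens that have no entry.

```python
def pad(batch, pad_token, length):
    """Hypothetical helper: right-pad every token list to the given length."""
    return [seq + [pad_token] * (length - len(seq)) for seq in batch]

def missing_tokens(batch, stoi):
    """Collect the distinct tokens in batch that have no entry in stoi."""
    return {token for seq in batch for token in seq if token not in stoi}

# Made-up vocabulary for illustration; note there is no entry for None.
stoi = {"<unk>": 0, "book": 1, "a": 2, "flight": 3}
batch = [["book", "a", "flight"], ["book"]]
padded = pad(batch, pad_token=None, length=3)  # assumed: the pad token is None

print(missing_tokens(padded, stoi))
try:
    # Same shape of lookup as the failing line in the traceback.
    arr = [[stoi[x] for x in ex] for ex in padded]
except KeyError as err:
    print("KeyError:", err)
```

Running a check like `missing_tokens` on one batch before numericalizing is a cheap way to see which token is actually failing.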