
arr = [[self.vocab.stoi[x] for x in ex] for ex in arr] KeyError: None #592

Closed
wpfnlp opened this issue Aug 21, 2019 · 5 comments

@wpfnlp

wpfnlp commented Aug 21, 2019

Bug in torchtext 0.4.0:

Traceback (most recent call last):
  File "/Users/weipengfei/workspaces/FastNLPProjects/research01/Intent+SlotFilling01.py", line 112, in <module>
    for i, batch in enumerate(train_iter):
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
KeyError: None

The same code works fine with torchtext 0.3.1. Could you tell me what causes this? Thank you.

@zhangguanheng66
Contributor

Can you post your script so I can reproduce the problem?

@zhangguanheng66
Contributor

Feel free to re-open the issue if you still have questions.

@TinaChen95

TinaChen95 commented Oct 11, 2019

I came across the same issue, and it only happens when I define my own unk_token and set min_freq > 1 at the same time.

Here's the code I use:

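from torchtext import data, datasets

# A custom unk token combined with min_freq > 1 triggers the KeyError
SRC = data.Field(lower=True, unk_token="my_unk_token")
TGT = data.Field(lower=True)

# IWSLT German-to-English; each example has the attributes .src and .trg
train, val, test = datasets.IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))

SRC.build_vocab(train, min_freq=10)
TGT.build_vocab(train)  # the target field also needs a vocab before batching

train_iter = data.BucketIterator(dataset=train, batch_size=64,
                                 sort_key=lambda x: data.interleave_keys(len(x.src), len(x.trg)))

batch = next(iter(train_iter))  # fails with KeyError on torchtext 0.4.0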
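One explanation consistent with this report is sketched below. It is an assumption based on a reading of the torchtext 0.4.0 Vocab constructor, not something confirmed in this thread: stoi only gets a fallback to the unk index when the hard-coded token '<unk>' is among the specials. With any other unk_token, stoi is a defaultdict without a default factory, so looking up an out-of-vocabulary token raises KeyError. `build_stoi` and `numericalize` here are hypothetical plain-Python stand-ins, not torchtext functions.

```python
from collections import defaultdict

VOCAB_UNK = "<unk>"  # assumption: the unk token Vocab checks for


def build_stoi(itos, unk_token):
    """Hypothetical stand-in for how Vocab builds stoi (assumption)."""
    if unk_token == VOCAB_UNK:
        unk_index = itos.index(unk_token)
        # Unknown tokens fall back to the unk index.
        stoi = defaultdict(lambda: unk_index)
    else:
        # No default factory: a missing key raises KeyError, like a plain dict.
        stoi = defaultdict()
    stoi.update({tok: i for i, tok in enumerate(itos)})
    return stoi


def numericalize(tokens, stoi):
    # Same lookup pattern as the failing line in field.py.
    return [stoi[x] for x in tokens]


default_stoi = build_stoi(["<unk>", "<pad>", "hello"], "<unk>")
print(numericalize(["hello", "rare"], default_stoi))  # [2, 0]

custom_stoi = build_stoi(["my_unk_token", "<pad>", "hello"], "my_unk_token")
try:
    numericalize(["hello", "rare"], custom_stoi)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'rare'
```

Under this assumption, the custom unk token is kept in the vocabulary but never used as a fallback, which is why only out-of-vocabulary tokens trigger the error.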

@VP-0822

VP-0822 commented Jun 2, 2020

I am still getting this issue. As @TinaChen95 mentioned, min_freq=1 works fine. When min_freq > 1, build_vocab() builds the vocab according to min_freq, but a KeyError is thrown while iterating over the BucketIterator.
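A plausible reason min_freq matters: with min_freq=1 every token that occurs in the training data ends up in the vocabulary, so iterating over the training set never looks up an unknown token and a missing unk fallback goes unnoticed. With min_freq=2, tokens seen only once are dropped from the vocabulary but still appear in the training examples, so the lookup hits them. The sketch below uses collections.Counter; `vocab_tokens` and `oov_in_training` are hypothetical helpers that only approximate the frequency filter in build_vocab.

```python
from collections import Counter


def vocab_tokens(sentences, min_freq):
    """Hypothetical approximation of build_vocab's min_freq filter."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, n in counts.items() if n >= min_freq}


def oov_in_training(sentences, min_freq):
    """Training tokens dropped by a vocab built from those same sentences."""
    kept = vocab_tokens(sentences, min_freq)
    return sorted({tok for sent in sentences for tok in sent} - kept)


train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
print(oov_in_training(train, min_freq=1))  # []
print(oov_in_training(train, min_freq=2))  # ['cat', 'dog']
```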

@VP-0822

VP-0822 commented Jun 2, 2020

At least for the issue I am facing, I figured out that unk_token needs to be passed to the ReversibleField constructor explicitly (unk_token='<unk>'), even if you want to use the default unk token. That is because ReversibleField uses ' UNK ' as its unk_token, while Vocab uses '<unk>'. Since bug #706 is still open, customizing the unk token is not possible at the moment.
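The mismatch described above can be modeled in plain Python. This sketch assumes, as the comment states, that ReversibleField fills in ' UNK ' when no unk_token is given and that Vocab only installs the unk fallback for the literal '<unk>'. `reversible_field_unk` and `vocab_has_fallback` are hypothetical helpers, not torchtext APIs.

```python
VOCAB_UNK = "<unk>"  # assumption: the token Vocab checks for


def reversible_field_unk(**kwargs):
    """Hypothetical: mimic ReversibleField defaulting unk_token to ' UNK '."""
    kwargs.setdefault("unk_token", " UNK ")
    return kwargs["unk_token"]


def vocab_has_fallback(unk_token):
    """Hypothetical: Vocab maps unknown tokens to unk only for '<unk>'."""
    return unk_token == VOCAB_UNK


print(vocab_has_fallback(reversible_field_unk()))                   # False
print(vocab_has_fallback(reversible_field_unk(unk_token="<unk>")))  # True
```

In torchtext terms, the workaround would correspond to constructing the field as data.ReversibleField(unk_token='<unk>', ...); this has not been tested here.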


4 participants