Error when using provided vocab for experimental IMDB dataset #681

bentrevett · 2020-01-21T16:55:15Z

🐛 Bug

Describe the bug
When you try to provide a vocabulary to the new experimental IMDB dataset, you get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-83dc4fb02510> in <module>
      5 vocab = train_data.get_vocab()
      6 
----> 7 train_data, test_data = IMDB(vocab = vocab)

~/.conda/envs/pytorch14/lib/python3.7/site-packages/torchtext/experimental/datasets/text_classification.py in IMDB(*args, **kwargs)
    129     """
    130 
--> 131     return _setup_datasets(*(("IMDB",) + args), **kwargs)
    132 
    133 

~/.conda/envs/pytorch14/lib/python3.7/site-packages/torchtext/experimental/datasets/text_classification.py in _setup_datasets(dataset_name, root, ngrams, vocab, removed_tokens, tokenizer, data_select)
     81         logging.info('Creating {} data'.format(item))
     82         data_iter = _create_data_from_iterator(vocab, iters_group[item], removed_tokens)
---> 83         for cls, tokens in data_iter:
     84             data[item]['data'].append((torch.tensor(cls),
     85                                        torch.tensor([token_id for token_id in tokens])))

~/.conda/envs/pytorch14/lib/python3.7/site-packages/torchtext/experimental/datasets/text_classification.py in _create_data_from_iterator(vocab, iterator, removed_tokens)
     16 
     17 def _create_data_from_iterator(vocab, iterator, removed_tokens):
---> 18     for cls, tokens in iterator:
     19         yield cls, iter(map(lambda x: vocab[x],
     20                         filter(lambda x: x not in removed_tokens, tokens)))

ValueError: too many values to unpack (expected 2)

This even happens when you try to use the vocabulary created by the dataset itself.
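For readers unfamiliar with this message, here is a minimal sketch of how it arises in general. It copies `_create_data_from_iterator` from the traceback above and feeds it toy iterators; the dict vocabulary, the iterators, and the `consume` helper are hypothetical stand-ins, not torchtext code. It only shows that the function fails this way whenever the iterator yields items with more than two elements, and makes no claim about what the IMDB iterator actually yields.

```python
def _create_data_from_iterator(vocab, iterator, removed_tokens):
    # Same logic as the function shown in the traceback above.
    for cls, tokens in iterator:
        yield cls, iter(map(lambda x: vocab[x],
                        filter(lambda x: x not in removed_tokens, tokens)))


# Hypothetical stand-in for a vocabulary: a plain token -> index mapping.
toy_vocab = {"good": 0, "bad": 1, "movie": 2}


def well_formed_iterator():
    # Each item is a (label, tokens) pair, which is what the loop expects.
    yield 1, ["good", "movie"]
    yield 0, ["bad", "movie"]


def malformed_iterator():
    # Hypothetical: three elements per item instead of two.
    yield 1, ["good", "movie"], "extra"


def consume(iterator):
    # Materialize the lazy token-id iterators so the results can be inspected.
    return [(cls, list(ids))
            for cls, ids in _create_data_from_iterator(toy_vocab, iterator, [])]


ok = consume(well_formed_iterator())

try:
    consume(malformed_iterator())
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

The well-formed iterator is converted to token ids, while the malformed one raises the same kind of `ValueError` at the `for cls, tokens in iterator` line.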

To Reproduce
Using edited vocabulary:

from torchtext.experimental.datasets import IMDB
from torchtext.vocab import Vocab

train_data, test_data = IMDB()

old_vocab = train_data.get_vocab()

new_vocab = Vocab(counter = old_vocab.freqs,
                  max_size = 25_000)

train_data, test_data = IMDB(vocab = new_vocab)

Using un-edited vocabulary:

from torchtext.experimental.datasets import IMDB

train_data, test_data = IMDB()

vocab = train_data.get_vocab()

train_data, test_data = IMDB(vocab = vocab)

Expected behavior
TorchText should create the new datasets with the provided vocabulary, as the current text_classification datasets do, e.g.

import torchtext

train_data, _ = torchtext.datasets.text_classification.AG_NEWS()

old_vocab = train_data.get_vocab()
    
new_vocab = torchtext.vocab.Vocab(counter = old_vocab.freqs, 
                                  max_size = 25_000)

train_data, test_data = torchtext.datasets.text_classification.AG_NEWS(vocab = new_vocab)

Environment
Python 3.7, PyTorch 1.4 and TorchText 0.5
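
Since the behavior here depends on the installed versions, a small sketch for collecting them when reporting may help. The `installed_versions` helper is hypothetical, and it assumes Python 3.8 or newer because it relies on `importlib.metadata`, which is not available on the Python 3.7 used above.

```python
import sys
from importlib import metadata


def installed_versions(packages=("torch", "torchtext")):
    """Return the Python version and the installed version of each package."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


print(installed_versions())
```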


Sandy4321 · 2023-06-09T18:13:06Z

I still get an error, but only in Jupyter on Windows 11. torchtext was installed today, so it is a fresh version. In plain Python there is no error.

In Jupyter I get:

Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "C:\Users\cde3\AppData\Local\Temp\ipykernel_9924\258056427.py", line 1, in <module>
from torchtext.datasets import IMDB
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\__init__.py", line 6, in <module>
from torchtext import _extension # noqa: F401
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\_extension.py", line 64, in <module>
_init_extension()
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\_extension.py", line 58, in _init_extension
_load_lib("libtorchtext")
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\_extension.py", line 50, in _load_lib
torch.ops.load_library(path)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torch\_ops.py", line 787, in load_library
File "C:\Users\cde3\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 127] The specified procedure could not be found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\interactiveshell.py", line 2102, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 1310, in structured_traceback
return FormattedTB.structured_traceback(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 1199, in structured_traceback
return VerboseTB.structured_traceback(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 1052, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 978, in format_exception_as_a_whole
frames.append(self.format_record(record))
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 878, in format_record
frame_info.lines, Colors, self.has_colors, lvals
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 712, in lines
return self._sd.lines
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\core.py", line 734, in lines
pieces = self.included_pieces
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\core.py", line 681, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\core.py", line 660, in executing_piece
return only(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\executing\executing.py", line 190, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

srujan-landeri · 2023-12-09T11:24:35Z

Hey, I just got the same issue. Have you fixed it?

zhangguanheng66 mentioned this issue Jan 21, 2020

Bug fix in experimental IMDB #683

Merged

zhangguanheng66 closed this as completed in #683 Jan 21, 2020
