Error when using provided vocab for experimental IMDB dataset #681

Closed
bentrevett opened this issue Jan 21, 2020 · 2 comments · Fixed by #683

bentrevett commented Jan 21, 2020

🐛 Bug

Describe the bug
Passing a vocabulary to the new experimental IMDB dataset raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-83dc4fb02510> in <module>
      5 vocab = train_data.get_vocab()
      6 
----> 7 train_data, test_data = IMDB(vocab = vocab)

~/.conda/envs/pytorch14/lib/python3.7/site-packages/torchtext/experimental/datasets/text_classification.py in IMDB(*args, **kwargs)
    129     """
    130 
--> 131     return _setup_datasets(*(("IMDB",) + args), **kwargs)
    132 
    133 

~/.conda/envs/pytorch14/lib/python3.7/site-packages/torchtext/experimental/datasets/text_classification.py in _setup_datasets(dataset_name, root, ngrams, vocab, removed_tokens, tokenizer, data_select)
     81         logging.info('Creating {} data'.format(item))
     82         data_iter = _create_data_from_iterator(vocab, iters_group[item], removed_tokens)
---> 83         for cls, tokens in data_iter:
     84             data[item]['data'].append((torch.tensor(cls),
     85                                        torch.tensor([token_id for token_id in tokens])))

~/.conda/envs/pytorch14/lib/python3.7/site-packages/torchtext/experimental/datasets/text_classification.py in _create_data_from_iterator(vocab, iterator, removed_tokens)
     16 
     17 def _create_data_from_iterator(vocab, iterator, removed_tokens):
---> 18     for cls, tokens in iterator:
     19         yield cls, iter(map(lambda x: vocab[x],
     20                         filter(lambda x: x not in removed_tokens, tokens)))

ValueError: too many values to unpack (expected 2)

This even happens when you try to use the vocabulary created by the dataset itself.
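
The message indicates that an item yielded by the iterator is not a two-element (label, tokens) pair. The following is a minimal sketch, not the torchtext source: `numericalize` mirrors the shape of `_create_data_from_iterator` from the traceback, and the dict-based `vocab` and both sample inputs are hypothetical stand-ins. It shows one way the same ValueError can arise, namely when the iterator yields bare token lists; whether that is the actual cause inside torchtext is not established here.

```python
def numericalize(vocab, iterator, removed_tokens):
    """Same shape as the generator in the traceback: expects (label, tokens) pairs."""
    for cls, tokens in iterator:
        yield cls, [vocab[t] for t in tokens if t not in removed_tokens]


# Hypothetical stand-in for a torchtext Vocab: a plain token -> id mapping.
vocab = {"a": 0, "great": 1, "film": 2}

# Well-formed input: each item is a (label, tokens) pair.
good_items = [(1, ["a", "great", "film"])]
good_result = list(numericalize(vocab, good_items, removed_tokens=[]))

# Malformed input: each item is only a token list, so unpacking three
# tokens into `cls, tokens` fails.
bad_items = [["a", "great", "film"]]
try:
    list(numericalize(vocab, bad_items, removed_tokens=[]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```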

To Reproduce
Using edited vocabulary:

from torchtext.experimental.datasets import IMDB
from torchtext.vocab import Vocab

train_data, test_data = IMDB()

old_vocab = train_data.get_vocab()

new_vocab = Vocab(counter = old_vocab.freqs,
                  max_size = 25_000)

train_data, test_data = IMDB(vocab = new_vocab)

Using un-edited vocabulary:

from torchtext.experimental.datasets import IMDB

train_data, test_data = IMDB()

vocab = train_data.get_vocab()

train_data, test_data = IMDB(vocab = vocab)

Expected behavior
TorchText should create the new datasets with the provided vocabulary, as the current text_classification datasets do, e.g.

import torchtext

train_data, _ = torchtext.datasets.text_classification.AG_NEWS()

old_vocab = train_data.get_vocab()
    
new_vocab = torchtext.vocab.Vocab(counter = old_vocab.freqs, 
                                  max_size = 25_000)

train_data, test_data = torchtext.datasets.text_classification.AG_NEWS(vocab = new_vocab)
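
To make the intent of `max_size = 25_000` concrete, here is a rough sketch of what a size-limited vocabulary does when it is applied to tokens. It is a simplified stand-in written with the standard library only, not torchtext's `Vocab` implementation; `build_limited_vocab`, `encode`, and the `<unk>` convention are assumptions chosen for illustration.

```python
from collections import Counter


def build_limited_vocab(freqs, max_size, unk_token="<unk>"):
    """Map the `max_size` most frequent tokens to ids; id 0 is reserved for unknowns."""
    stoi = {unk_token: 0}
    for token, _count in freqs.most_common(max_size):
        stoi[token] = len(stoi)
    return stoi


def encode(tokens, stoi, unk_token="<unk>"):
    """Replace each token with its id, falling back to the unknown id."""
    unk_id = stoi[unk_token]
    return [stoi.get(token, unk_id) for token in tokens]


# Toy frequencies: "the" appears 3 times, "film" twice, "was" once.
freqs = Counter("the the the film film was".split())
stoi = build_limited_vocab(freqs, max_size=2)

# "was" falls outside the two most frequent tokens, so it maps to the unknown id.
ids = encode(["the", "film", "was"], stoi)
print(ids)
```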

Environment
Python 3.7, PyTorch 1.4 and TorchText 0.5

@Sandy4321

I still get an error, but only in Jupyter on Windows 11.
torchtext was installed today, so it is a fresh version; in plain Python there is no error.

Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "C:\Users\cde3\AppData\Local\Temp\ipykernel_9924\258056427.py", line 1, in <module>
from torchtext.datasets import IMDB
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\__init__.py", line 6, in <module>
from torchtext import _extension # noqa: F401
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\_extension.py", line 64, in <module>
_init_extension()
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\_extension.py", line 58, in _init_extension
_load_lib("libtorchtext")
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torchtext\_extension.py", line 50, in _load_lib
torch.ops.load_library(path)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\torch\_ops.py", line 787, in load_library
File "C:\Users\cde3\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 127] The specified procedure could not be found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\interactiveshell.py", line 2102, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 1310, in structured_traceback
return FormattedTB.structured_traceback(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 1199, in structured_traceback
return VerboseTB.structured_traceback(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 1052, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 978, in format_exception_as_a_whole
frames.append(self.format_record(record))
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 878, in format_record
frame_info.lines, Colors, self.has_colors, lvals
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\IPython\core\ultratb.py", line 712, in lines
return self._sd.lines
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\core.py", line 734, in lines
pieces = self.included_pieces
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\core.py", line 681, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\stack_data\core.py", line 660, in executing_piece
return only(
File "C:\my_py_environments\py310_env_apr2023\lib\site-packages\executing\executing.py", line 190, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0
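
The `OSError: [WinError 127]` above is raised while the compiled `libtorchtext` library is being loaded, before any dataset code runs, so it looks unrelated to the vocabulary error reported in this issue. A common cause of this kind of load failure is a torchtext build that does not match the installed torch build, although the traceback alone does not confirm that. A minimal sketch for checking the installed versions from the affected kernel without importing torchtext (the helper name `installed_version` is made up for this example):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Read the versions from package metadata so the failing import is not triggered.
versions = {name: installed_version(name) for name in ("torch", "torchtext")}
print(versions)
```

Running this in both the Jupyter kernel and the plain Python interpreter would show whether the two are using the same environment and the same pair of versions.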

@srujan-landeri

Hey, I just ran into the same issue. Have you fixed it?
