
🐛 Trainer on TPU : KeyError '__getstate__' #4327

Closed
2 of 4 tasks
astariul opened this issue May 13, 2020 · 7 comments

@astariul
Contributor

astariul commented May 13, 2020

🐛 Bug

Information

Model I am using: ELECTRA base

Language I am using the model on: English

The problem arises when using:

  • the official example scripts
  • my own modified scripts

The task I am working on is:

  • an official GLUE/SQUaD task
  • my own task or dataset

To reproduce

I'm trying to fine-tune a model on Colab TPU, using the new Trainer API. But I'm struggling.

Here is a self-contained Colab notebook to reproduce the error (it's a dummy example).

When running the notebook, I get the following error:

File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 199, in __getattr__
    return self.data[item]
KeyError: '__getstate__'

Full stack trace:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/parallel_loader.py", line 172, in _worker
    batch = xm.send_cpu_data_to_device(batch, device)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 624, in send_cpu_data_to_device
    return ToXlaTensorArena(convert_fn, select_fn).transform(data)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 307, in transform
    return self._replace_tensors(inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 301, in _replace_tensors
    convert_fn)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/utils/utils.py", line 199, in for_each_instance_rewrite
    return _for_each_instance_rewrite(value, select_fn, fn, rwmap)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/utils/utils.py", line 179, in _for_each_instance_rewrite
    result.append(_for_each_instance_rewrite(x, select_fn, fn, rwmap))
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/utils/utils.py", line 187, in _for_each_instance_rewrite
    result = copy.copy(value)
  File "/usr/lib/python3.6/copy.py", line 96, in copy
    rv = reductor(4)
  File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 199, in __getattr__
    return self.data[item]
KeyError: '__getstate__'

Any hint on how to make this dummy example work is welcome.
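For context on why the copy/pickle machinery trips here: `BatchEncoding.__getattr__` forwards attribute lookups to `self.data` and raises `KeyError` instead of `AttributeError` for missing names, which breaks anything that probes attributes and only expects `AttributeError` (on Python 3.6, the copy/pickle protocol probes `__getstate__` exactly this way, producing the `KeyError: '__getstate__'` above). A minimal stand-in shows the failure mode; `DictBacked` below is a hypothetical mimic, not the real transformers class:

```python
# Hypothetical stand-in for BatchEncoding's dict-forwarding __getattr__:
# it raises KeyError for missing names, where Python's attribute protocol
# expects AttributeError.
class DictBacked:
    def __init__(self, data):
        self.data = data

    def __getattr__(self, item):
        return self.data[item]  # KeyError, not AttributeError, when missing

obj = DictBacked({"input_ids": [1, 2, 3]})

# hasattr() only swallows AttributeError, so the KeyError leaks out -- the
# same leak happens when copy.copy() probes for '__getstate__' on Python 3.6.
try:
    hasattr(obj, "some_missing_attr")
    leaked = None
except KeyError as e:
    leaked = e
print("leaked:", leaked)
```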

Environment info

  • transformers version: 2.9.0
  • Platform: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.6.0a0+cf82011 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

@jysohn23 @julien-c

@astariul astariul changed the title from 🐛 Trainer on TPU : to 🐛 Trainer on TPU : KeyError '__getstate__' May 13, 2020
@astariul
Contributor Author

As a temporary work-around, I made the BatchEncoding object picklable:

from transformers.tokenization_utils import BatchEncoding

def red(self):
    return BatchEncoding, (self.data, )

BatchEncoding.__reduce__ = red

Not closing yet, as this seems to be just a work-around and not a real solution.
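The effect of this patch can be checked in isolation. The sketch below uses `DictBacked`, a hypothetical mimic of `BatchEncoding`'s dict-forwarding `__getattr__` (not the real class), to show that defining `__reduce__` lets pickle rebuild the object without ever hitting the `__getstate__` probe:

```python
import pickle

# Hypothetical mimic of BatchEncoding's dict-forwarding __getattr__.
class DictBacked:
    def __init__(self, data):
        self.data = data

    def __getattr__(self, item):
        return self.data[item]

# The work-around pattern: __reduce__ tells pickle/copy how to rebuild the
# object from its data dict, so the '__getstate__' attribute probe that
# would trigger __getattr__ is never reached.
def red(self):
    return DictBacked, (self.data,)

DictBacked.__reduce__ = red

obj = DictBacked({"input_ids": [1, 2, 3]})
clone = pickle.loads(pickle.dumps(obj))
print(clone.data)  # {'input_ids': [1, 2, 3]}
```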

@julien-c
Member

Cc @mfuntowicz

@stale

stale bot commented Jul 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 13, 2020
@wenfeixiang1991

Hi, I got the same error with transformers 2.9.0 and 2.9.1:


(the same traceback is printed by each DataLoader worker process; shown once here)

Traceback (most recent call last):
  File "/Users/kiwi/anaconda/python.app/Contents/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/Users/kiwi/anaconda/python.app/Contents/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/Users/kiwi/anaconda/python.app/Contents/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 203, in __getattr__
    return self.data[item]
KeyError: '__getstate__'

My code pieces:

dl = DataLoader(data, batch_size=self.batch_size, shuffle=self.shuffle,
                collate_fn=partial(self.tok_collate), num_workers=2)
return dl

def tok_collate(self, batch_data):

    encoded = self.tokenizer.batch_encode_plus(
        [x[0] for x in batch_data],
        add_special_tokens=True,
        #return_tensors='pt',
        pad_to_max_length=True)

    for i in range(len(encoded['input_ids'])):
        print("tokens         : {}".format([self.tokenizer.convert_ids_to_tokens(s) for s in encoded['input_ids'][i]]))
        print("input_ids      : {}".format(encoded['input_ids'][i]))
        print("token_type_ids : {}".format(encoded['token_type_ids'][i]))
        print("attention_mask : {}".format(encoded['attention_mask'][i]))
        print('------------')

    if self.predict:
        return encoded
    else:
        labels = torch.tensor([x[1] for x in batch_data])
        # print('labels: ', labels)
        return encoded, labels
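One way to dodge the error in a `collate_fn` like the one above, without patching transformers, is to copy the `batch_encode_plus` output into a plain `dict` before returning it, since built-in dicts pickle fine across DataLoader worker processes. A sketch with a hypothetical `DictBacked` stand-in for `BatchEncoding` (which, like the real class, supports `.items()`):

```python
import pickle

# Hypothetical stand-in for BatchEncoding: dict-forwarding __getattr__ plus
# the .items() view that the real class also exposes.
class DictBacked:
    def __init__(self, data):
        self.data = data

    def __getattr__(self, item):
        return self.data[item]

    def items(self):
        return self.data.items()

encoded = DictBacked({"input_ids": [[1, 2], [3, 4]],
                      "attention_mask": [[1, 1], [1, 1]]})

# Copy into a plain dict before it crosses the worker/process boundary;
# plain dicts of lists pickle without ever touching __getattr__.
plain = {k: v for k, v in encoded.items()}
clone = pickle.loads(pickle.dumps(plain))
print(sorted(clone))  # ['attention_mask', 'input_ids']
```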

@stale stale bot removed the wontfix label Jul 16, 2020
@wenfeixiang1991

@colanim How did you solve it?

@astariul
Contributor Author

I think it was fixed in the latest version of transformers.

If you need to work with an older version of transformers, the work-around I mentioned earlier worked for me:

As a temporary work-around, I made the BatchEncoding object picklable:

from transformers.tokenization_utils import BatchEncoding

def red(self):
    return BatchEncoding, (self.data, )

BatchEncoding.__reduce__ = red

@LysandreJik
Member

Indeed, this should have been fixed in the versions v3+. Thanks for opening an issue @colanim.

Labels: None yet
Projects: None yet
Development: No branches or pull requests
4 participants