
AttributeError: Can't pickle local object 'create_collate_fn.<locals>.collate_fn' #15

Closed
CaffreyR opened this issue Jul 15, 2022 · 2 comments

Comments

@CaffreyR

When I tried to run the demo, I got the following error! @dptam @jmohta @muqeeth

Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
WARNING:datasets.builder:Reusing dataset super_glue (/Users/caffrey/Documents/research/t-few-master/cache/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Missing logger folder: exp_out/first_exp/log
WARNING:datasets.builder:Reusing dataset super_glue (/Users/caffrey/Documents/research/t-few-master/cache/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277

  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B 
-----------------------------------------------------
2.8 B     Trainable params
0         Non-trainable params
2.8 B     Total params
11,399.029  Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/caffrey/Documents/paper/t-few-master/src/pl_train.py", line 86, in <module>
    main(config)
  File "/Users/caffrey/Documents/paper/t-few-master/src/pl_train.py", line 57, in main
    trainer.fit(model, datamodule)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 199, in run
    self.on_run_start(*args, **kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 88, in on_run_start
    self._data_fetcher = iter(data_fetcher)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 178, in __iter__
    self.dataloader_iter = iter(self.dataloader)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 443, in __iter__
    return self._get_iterator()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 389, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1062, in __init__
    w.start()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'create_collate_fn.<locals>.collate_fn'
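
For context, the failure happens when the DataLoader starts its worker processes: on macOS, multiprocessing defaults to the "spawn" start method, which pickles the worker arguments, and a function defined inside another function (such as 'create_collate_fn.<locals>.collate_fn') cannot be pickled. A minimal sketch of the problem and one common workaround, using an illustrative create_collate_fn rather than the repo's actual code:

import pickle

# Minimal sketch (not the actual t-few code): a function defined inside
# another function is a "local object" and cannot be pickled, which is
# exactly what DataLoader workers require under the "spawn" start method
# that macOS uses by default.
def create_collate_fn(pad_token_id):
    def collate_fn(batch):
        # ... pad/stack the batch using pad_token_id ...
        return batch
    return collate_fn

try:
    pickle.dumps(create_collate_fn(0))
except AttributeError as err:
    print(err)  # Can't pickle local object 'create_collate_fn.<locals>.collate_fn'

# One common workaround is a top-level callable class that carries its
# state as attributes, so it pickles cleanly:
class Collator:
    def __init__(self, pad_token_id):
        self.pad_token_id = pad_token_id

    def __call__(self, batch):
        # ... pad/stack the batch using self.pad_token_id ...
        return batch

pickle.dumps(Collator(0))  # works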

@muqeeth
Collaborator

muqeeth commented Jul 15, 2022

Are you running this code on a node with multiple GPUs?
I remember we had a similar issue with pickling the scheduler when we ran on a server with multiple GPUs. Try export CUDA_VISIBLE_DEVICES=0 in the terminal so that the code uses only one GPU. That fixed the issue for us. Let us know if it works for you.
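
If it helps, the same restriction can also be applied from inside Python before CUDA is initialized (a minimal sketch of the idea, not something the repo does itself):

import os

# Equivalent to `export CUDA_VISIBLE_DEVICES=0` in the shell: mask out all
# but the first GPU for this process. This must run before CUDA is
# initialized, i.e. before the first torch.cuda call.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # 1 on a multi-GPU node, 0 if no GPU is present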

@CaffreyR
Author

It seems that if you have an Nvidia GPU, you should export the CUDA device as suggested above. If you have a non-Nvidia GPU (or no GPU at all), you should set num_workers=0 for the DataLoader instead.
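
For the non-Nvidia / CPU case, a minimal sketch of what setting num_workers=0 looks like (build_eval_loader is a hypothetical helper, not the repo's actual function):

from torch.utils.data import DataLoader

def build_eval_loader(dataset, collate_fn):
    # With num_workers=0, batches are collated in the main process, so the
    # collate function is never pickled and sent to a worker process.
    return DataLoader(
        dataset,
        batch_size=16,
        shuffle=False,
        num_workers=0,
        collate_fn=collate_fn,
    )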
