Missing config.split_option_flag? #2

Open
hunterlang opened this issue May 12, 2022 · 3 comments
@hunterlang

Hi, thanks for the code!

When I run:

CUDA_VISIBLE_DEVICES=0 python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp3

I get:

Reusing dataset super_glue (/localdata/hjl/hf/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Missing logger folder: /home/hjl/t-few/exp_out/first_exp3/log

  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B
-----------------------------------------------------
2.8 B     Trainable params
0         Non-trainable params
2.8 B     Total params
11,399.029 Total estimated model params size (MB)
Validation sanity check:   0%|          | 0/18 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hjl/t-few/src/pl_train.py", line 86, in <module>
    main(config)
  File "/home/hjl/t-few/src/pl_train.py", line 57, in main
    trainer.fit(model, datamodule)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 229, in validation_step
    batch_output = self.predict(batch)
  File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 139, in predict
    if not self.config.split_option_flag:
AttributeError: 'Config' object has no attribute 'split_option_flag'

I can't find a reference to split_option_flag in any of the config files.
Should I manually set it?
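
If manually setting it is the intended fix, I assume a stopgap like this in pl_train.py, just before trainer.fit() is called, would unblock things (the False default is only a guess at the intended behaviour):

```python
# Hypothetical stopgap, not the upstream fix: give the Config object the missing
# attribute before training starts. The attribute name comes from the traceback;
# the False default is a guess.
if not hasattr(config, "split_option_flag"):
    config.split_option_flag = False
```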

Thanks!

@dptam
Collaborator

dptam commented May 12, 2022

Hi

Yeah, we forgot to include split_option_flag in the Config. Commit e115a18 should fix that. Let me know if there are any more issues.

@hunterlang
Author

That works, thanks.

On the subject of small fixes, I had to remove the trailing slash from the origin_model in:
https://github.com/r-three/t-few/blob/master/configs/t03b.json

to stop .from_pretrained() from complaining.
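
Concretely, the workaround amounts to something like this (the "bigscience/T0_3B" id is an assumption about what origin_model contains in t03b.json; the exact error raised for the slash-terminated value depends on the transformers version):

```python
from transformers import AutoModelForSeq2SeqLM

# With the trailing slash the value no longer resolves as a Hub model id,
# so from_pretrained() complains; stripping the slash fixes it.
origin_model = "bigscience/T0_3B/"        # value with the offending trailing slash
origin_model = origin_model.rstrip("/")   # "bigscience/T0_3B"

model = AutoModelForSeq2SeqLM.from_pretrained(origin_model)
```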

@HaokunLiu
Collaborator

Well, you caught us. When we ran it on our cluster, we saved the T0(3B) model to a local path. I guess that's a trace of it left over from before we changed it back to the HF-downloadable model names.

Thanks for figuring it out and sharing it with us.
