--config_overrides doesn't appear to work in run_clm.py when trying to specify a larger GPT model #14389

Closed
Adrian-1234 opened this issue Nov 14, 2021 · 4 comments · Fixed by #14466

Adrian-1234 commented Nov 14, 2021

Environment info

  • transformers version: 4.10.2

  • Platform: Linux-5.4.0-84-generic-x86_64-with-Ubuntu-20.04-focal

  • Python version: 3.7.12

  • PyTorch version (GPU?): 1.9.0+cu111 (True)

  • Tensorflow version (GPU?): 2.7.0 (True)

  • Flax version (CPU?/GPU?/TPU?): not installed (NA)

  • Jax version: not installed

  • JaxLib version: not installed

  • Using GPU in script?:

  • Using distributed or parallel set-up in script?:

Information

Using the following to train GPT-2 from scratch:

python3.7 run_clm.py \
--model_type "gpt2" \
--tokenizer_name "gpt2" \
--train_file "train_tmp.txt" \
--validation_file "eval_tmp.txt" \
--pad_to_max_length yes \
--do_train \
--do_eval \
--max_seq_length=1024 \
--per_gpu_train_batch_size 1 \
--save_steps -1 \
--num_train_epochs 10 \
--fp16_full_eval \
--output_dir=checkpoints \
--config_overrides="n_embd=1024,n_head=16,n_layer=24,n_positions=1024,n_ctx=1024,layer_norm_epsilon=1e-5,initializer_range=0.02"

The --config_overrides doesn't appear to take effect:

Starting the training, the log outputs:

Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.10.2",
  "use_cache": true,
  "vocab_size": 50257
}

To reproduce

Steps to reproduce the behavior:

  1. Running the above training script ignores the parameters in --config_overrides

Expected behavior

I was expecting the --config_overrides string to override the default model configuration. I have also seen documentation suggesting that it is possible to specify something like --model_type="gpt2-medium", but that produces a "no such model" error.

Perhaps there is an alternative way to specify a medium or large GPT-2 model?
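
For reference, the same medium-sized architecture can also be built directly in Python. This is only a rough sketch using the same values as the --config_overrides string above and assuming the standard GPT2Config / GPT2LMHeadModel classes; it is not something run_clm.py does itself:

from transformers import GPT2Config, GPT2LMHeadModel

# Build a GPT-2 config with the same sizes as the --config_overrides string above
# (these match the published gpt2-medium dimensions).
config = GPT2Config(
    n_embd=1024,
    n_head=16,
    n_layer=24,
    n_positions=1024,
    n_ctx=1024,
    layer_norm_epsilon=1e-5,
    initializer_range=0.02,
)
model = GPT2LMHeadModel(config)  # randomly initialised, i.e. trained "from scratch"
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 355M for these sizes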

Thanks.

LysandreJik (Member) commented

Hey @Adrian-1234, we recommend using the model_name_or_path parameter to specify a particular checkpoint.
cc @sgugger
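
For illustration, --model_name_or_path "gpt2-medium" makes the script load the published checkpoint and its config rather than building a model from scratch. A rough Python sketch of what that amounts to (simplified; not the script's exact loading code):

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Loading the gpt2-medium checkpoint pulls its config
# (n_embd=1024, n_head=16, n_layer=24) together with the pretrained weights.
config = AutoConfig.from_pretrained("gpt2-medium")
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

Note that this starts from pretrained weights; for training a medium-sized model from scratch, the --model_type plus --config_overrides route from the original post is the relevant one.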

sgugger (Collaborator) commented Nov 19, 2021

--config_overrides is actually an addition by @stas00

stas00 self-assigned this Nov 19, 2021
stas00 (Contributor) commented Nov 19, 2021

I will try to reproduce the issue and will then follow up.

I edited the OP to add formatting.

stas00 (Contributor) commented Nov 20, 2021

There is no problem here, other than misleading logger output.

Please see #14466 for details.
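
For anyone hitting the same confusion, the overrides can be checked outside the script. A minimal sketch, assuming --config_overrides is applied through PretrainedConfig.update_from_string:

from transformers import CONFIG_MAPPING, GPT2LMHeadModel

# Start from the default gpt2 config (n_embd=768, n_head=12, n_layer=12) ...
config = CONFIG_MAPPING["gpt2"]()

# ... then apply the same override string as on the command line above.
# n_ctx is a valid GPT2Config key in transformers 4.10.2, the version in this report.
config.update_from_string(
    "n_embd=1024,n_head=16,n_layer=24,n_positions=1024,n_ctx=1024,"
    "layer_norm_epsilon=1e-5,initializer_range=0.02"
)
print(config.n_embd, config.n_head, config.n_layer)  # 1024 16 24

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 355M, i.e. gpt2-medium sized

In other words, the GPT2Config block quoted in the original post (n_embd=768, n_layer=12) is presumably just the stock gpt2 config being logged while the tokenizer is loaded, not the configuration of the model that is actually trained.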
