--config_overrides doesn't appear to work in run_clm.py when trying to specify a larger GPT model #14389

Closed
Adrian-1234 opened this issue Nov 14, 2021 · 4 comments · Fixed by #14466

Adrian-1234 commented Nov 14, 2021

Environment info

  • transformers version: 4.10.2

  • Platform: Linux-5.4.0-84-generic-x86_64-with-Ubuntu-20.04-focal

  • Python version: 3.7.12

  • PyTorch version (GPU?): 1.9.0+cu111 (True)

  • Tensorflow version (GPU?): 2.7.0 (True)

  • Flax version (CPU?/GPU?/TPU?): not installed (NA)

  • Jax version: not installed

  • JaxLib version: not installed

  • Using GPU in script?:

  • Using distributed or parallel set-up in script?:

Information

Using the following to train GPT-2 from scratch:

python3.7 run_clm.py \
--model_type "gpt2" \
--tokenizer_name "gpt2" \
--train_file "train_tmp.txt" \
--validation_file "eval_tmp.txt" \
--pad_to_max_length yes \
--do_train \
--do_eval \
--max_seq_length=1024 \
--per_gpu_train_batch_size 1 \
--save_steps -1 \
--num_train_epochs 10 \
--fp16_full_eval \
--output_dir=checkpoints \
--config_overrides="n_embd=1024,n_head=16,n_layer=24,n_positions=1024,n_ctx=1024,layer_norm_epsilon=1e-5,initializer_range=0.02"

The --config_overrides doesn't appear to take effect:

Starting the training, the log outputs:

Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.10.2",
  "use_cache": true,
  "vocab_size": 50257
}

To reproduce

Steps to reproduce the behavior:

  1. Running the above training script ignores the parameters in --config_overrides

Expected behavior

I was expecting the --config_overrides string to override the default model configuration. I have also seen documentation suggesting that it is possible to specify something like --model_type="gpt2-medium", but that produces a "no such model" error.

Perhaps there is an alternative way to specify a medium or large GPT-2 model?
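
For reference, the same medium-sized architecture can also be built directly in Python. This is only a rough sketch using the same values as the --config_overrides string above and assuming the standard GPT2Config / GPT2LMHeadModel classes; it is not something run_clm.py does itself:

from transformers import GPT2Config, GPT2LMHeadModel

# Build a GPT-2 config with the same sizes as the --config_overrides string above
# (these match the published gpt2-medium dimensions).
config = GPT2Config(
    n_embd=1024,
    n_head=16,
    n_layer=24,
    n_positions=1024,
    n_ctx=1024,
    layer_norm_epsilon=1e-5,
    initializer_range=0.02,
)
model = GPT2LMHeadModel(config)  # randomly initialised, i.e. trained "from scratch"
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 355M for these sizes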

Thanks.

LysandreJik (Member) commented

Hey @Adrian-1234, we recommend using the model_name_or_path parameter to specify a particular checkpoint.
cc @sgugger
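
For illustration, --model_name_or_path "gpt2-medium" makes the script load the published checkpoint and its config rather than building a model from scratch. A rough Python sketch of what that amounts to (simplified; not the script's exact loading code):

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Loading the gpt2-medium checkpoint pulls its config
# (n_embd=1024, n_head=16, n_layer=24) together with the pretrained weights.
config = AutoConfig.from_pretrained("gpt2-medium")
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

Note that this starts from pretrained weights; for training a medium-sized model from scratch, the --model_type plus --config_overrides route from the original post is the relevant one.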

sgugger (Collaborator) commented Nov 19, 2021

--config_overrides is actually an addition by @stas00

stas00 self-assigned this Nov 19, 2021
stas00 (Contributor) commented Nov 19, 2021

I will try to reproduce the issue and will then follow up.

I edited the OP to add formatting.

stas00 (Contributor) commented Nov 20, 2021

There is no problem here, other than misleading logger output.

Please see #14466 for details.
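
For anyone hitting the same confusion, the overrides can be checked outside the script. A minimal sketch, assuming --config_overrides is applied through PretrainedConfig.update_from_string:

from transformers import CONFIG_MAPPING, GPT2LMHeadModel

# Start from the default gpt2 config (n_embd=768, n_head=12, n_layer=12) ...
config = CONFIG_MAPPING["gpt2"]()

# ... then apply the same override string as on the command line above.
# n_ctx is a valid GPT2Config key in transformers 4.10.2, the version in this report.
config.update_from_string(
    "n_embd=1024,n_head=16,n_layer=24,n_positions=1024,n_ctx=1024,"
    "layer_norm_epsilon=1e-5,initializer_range=0.02"
)
print(config.n_embd, config.n_head, config.n_layer)  # 1024 16 24

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 355M, i.e. gpt2-medium sized

In other words, the GPT2Config block quoted in the original post (n_embd=768, n_layer=12) is presumably just the stock gpt2 config being logged while the tokenizer is loaded, not the configuration of the model that is actually trained.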
