ValueError: Variable model/wpe already exists, disallowed #80

Open
yissachar opened this issue Jul 1, 2019 · 7 comments

@yissachar

I'm seeing the same error as outlined in #12; however, I am on 0.5.3.

Fine-tuning the first time:

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              '../output/instructions.csv',
              model_name=model_name,
              steps=1000)

It works fine. In a new cell, I copy-paste the above to fine-tune further, but I get an error about model/wpe already existing. I tried explicitly setting restore_from='latest', even though that seems to be the default, and it didn't help.


ValueError Traceback (most recent call last)
<ipython-input> in <module>
4 model_name=model_name,
5 restore_from='latest',
----> 6 steps=1000)

/opt/conda/lib/python3.6/site-packages/gpt_2_simple/gpt_2.py in finetune(sess, dataset, steps, model_name, combine, batch_size, learning_rate, accumulate_gradients, restore_from, run_name, sample_every, sample_length, sample_num, save_every, print_every, max_checkpoints, use_memory_saving_gradients, only_train_transformer_layers, overwrite)
163
164 context = tf.placeholder(tf.int32, [batch_size, None])
--> 165 output = model.model(hparams=hparams, X=context)
166 loss = tf.reduce_mean(
167 tf.nn.sparse_softmax_cross_entropy_with_logits(

/opt/conda/lib/python3.6/site-packages/gpt_2_simple/src/model.py in model(hparams, X, past, scope, reuse)
151
152 wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],
--> 153 initializer=tf.random_normal_initializer(stddev=0.01))
154 wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],
155 initializer=tf.random_normal_initializer(stddev=0.02))

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1477 constraint=constraint,
1478 synchronization=synchronization,
-> 1479 aggregation=aggregation)
1480
1481

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1218 constraint=constraint,
1219 synchronization=synchronization,
-> 1220 aggregation=aggregation)
1221
1222 def _get_partitioned_variable(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
545 constraint=constraint,
546 synchronization=synchronization,
--> 547 aggregation=aggregation)
548
549 def _get_partitioned_variable(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
497 constraint=constraint,
498 synchronization=synchronization,
--> 499 aggregation=aggregation)
500
501 # Set trainable value based on synchronization value.

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
846 tb = [x for x in tb if "tensorflow/python" not in x[0]][:3]
847 raise ValueError("%s Originally defined at:\n\n%s" % (err_msg, "".join(
--> 848 traceback.format_list(tb))))
849 found_var = self._vars[name]
850 if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable model/wpe already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/opt/conda/lib/python3.6/site-packages/gpt_2_simple/src/model.py", line 153, in model
initializer=tf.random_normal_initializer(stddev=0.01))
File "/opt/conda/lib/python3.6/site-packages/gpt_2_simple/gpt_2.py", line 165, in finetune
output = model.model(hparams=hparams, X=context)
File "", line 5, in
steps=1000) # steps is max number of training steps

woctezuma (Contributor) commented Jul 1, 2019

In a new cell, I copy-paste the above to fine-tune further

Try to restart the Python session. #77

From the README:

NB: Restart the Python session first if you want to finetune on another dataset or load another model.

From the notebook:

IMPORTANT NOTE: If you want to rerun this cell, restart the VM first (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

@yissachar (Author)

Thanks, that is what I did to work around this, but it seems like it would be desirable to let users re-finetune without restarting. Is there some fundamental limitation that prevents this?

Also, I had read the README, but it wasn't clear to me that re-finetuning on the same model was covered by this; perhaps the wording could be tweaked to make this clearer?

woctezuma (Contributor) commented Jul 1, 2019

it wasn't clear to me that re-finetuning on the same model was covered by this - perhaps the wording can be tweaked to make this clearer?

I agree. The README makes it sound like one does not have to restart the VM if the dataset is identical.

It could be changed to:

NB: Restart the Python session first if you want to finetune further.

minimaxir (Owner) commented Jul 1, 2019

Agree that a README change would be clearer (my use case for retraining on the same dataset is through the CLI, which refreshes the session; I hadn't considered the Colab notebook use case).

I'll push a change today.

Thanks, that is what I did to work around this, but it seems like it would be desirable to let users re-finetune without restarting. Is there some fundamental limitation that prevents this?

It's more or less due to how TensorFlow works, and I'm not skilled enough with low-level TF to find a workaround.

However, I think I can add a reset function to avoid reloading the notebook, as the implementations used in the Cloud Run APIs reset correctly.
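
For reference, a minimal sketch of what such a reset helper could look like under TF 1.x graph mode (a hypothetical illustration, not the actual gpt-2-simple implementation; the helper name is made up):

import tensorflow as tf

def reset_tf_session(sess=None):
    # Hypothetical helper, for illustration only: close the old session,
    # wipe the default graph, and return a fresh session so a later
    # finetune() call can recreate model/wpe without a name collision.
    if sess is not None:
        sess.close()
    tf.reset_default_graph()
    return tf.Session()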

tkocmathla commented Jul 2, 2019

Try adding tf.reset_default_graph() before each fine-tuning session. This works for me to continue fine-tuning:

import tensorflow as tf
import gpt_2_simple as gpt2

# clear the stale graph so finetune() can recreate model/wpe from scratch
tf.reset_default_graph()
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'dataset.txt',
              model_name='345M',
              steps=10)
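
This works because finetune() creates the model variables with tf.get_variable in the default graph; resetting that graph discards the previously created model/wpe, so the rebuilt graph no longer collides with it.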

loretoparisi commented Feb 18, 2021

In my case, I put it here:

    # reset the default graph, then reuse or recreate the session before loading
    tf.reset_default_graph()
    if not sess:
        sess = gpt2.start_tf_sess()
    else:
        sess = gpt2.reset_session(sess)

    gpt2.load_gpt2(sess, run_name=run_name)

and it worked perfectly! Thanks!

danielenricocahall added a commit to danielenricocahall/gpt-2-simple that referenced this issue Jun 17, 2021
Provide `reuse` option when creating a Tensorflow session. Should address minimaxir#80 and https://stackoverflow.com/questions/50210785 in a backwards compatible way.
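
For context, the TF 1.x mechanism such a reuse option can build on is reuse=tf.AUTO_REUSE on the model's variable scope. A rough, hypothetical sketch (not the actual patch), reusing the variable names that appear in the traceback above:

import tensorflow as tf

def build_embeddings(hparams, scope='model'):
    # Hypothetical sketch: with reuse=tf.AUTO_REUSE, calling this twice
    # returns the existing model/wpe and model/wte variables instead of
    # raising "Variable model/wpe already exists, disallowed".
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],
                              initializer=tf.random_normal_initializer(stddev=0.01))
        wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],
                              initializer=tf.random_normal_initializer(stddev=0.02))
    return wpe, wte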
@Mennaruuk

For users encountering the error AttributeError: module 'tensorflow' has no attribute 'reset_default_graph', try adding the following at the top of the finetuning code:

import tensorflow as tf
tf.compat.v1.reset_default_graph()

(Source)
