ValueError: Variable model/wpe already exists, disallowed #80

Open
yissachar opened this issue Jul 1, 2019 · 7 comments

@yissachar

I'm seeing the same error as outlined in #12; however, I am on 0.5.3.

Fine-tuning the first time:

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              '../output/instructions.csv',
              model_name=model_name,
              steps=1000)

It works fine. In a new cell, I copy-paste the above to fine-tune further, but I get an error about model/wpe already existing. I tried explicitly setting restore_from='latest', even though that seems to be the default, and it didn't help.


ValueError Traceback (most recent call last)
<ipython-input> in <module>
4 model_name=model_name,
5 restore_from='latest',
----> 6 steps=1000)

/opt/conda/lib/python3.6/site-packages/gpt_2_simple/gpt_2.py in finetune(sess, dataset, steps, model_name, combine, batch_size, learning_rate, accumulate_gradients, restore_from, run_name, sample_every, sample_length, sample_num, save_every, print_every, max_checkpoints, use_memory_saving_gradients, only_train_transformer_layers, overwrite)
163
164 context = tf.placeholder(tf.int32, [batch_size, None])
--> 165 output = model.model(hparams=hparams, X=context)
166 loss = tf.reduce_mean(
167 tf.nn.sparse_softmax_cross_entropy_with_logits(

/opt/conda/lib/python3.6/site-packages/gpt_2_simple/src/model.py in model(hparams, X, past, scope, reuse)
151
152 wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],
--> 153 initializer=tf.random_normal_initializer(stddev=0.01))
154 wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],
155 initializer=tf.random_normal_initializer(stddev=0.02))

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1477 constraint=constraint,
1478 synchronization=synchronization,
-> 1479 aggregation=aggregation)
1480
1481

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1218 constraint=constraint,
1219 synchronization=synchronization,
-> 1220 aggregation=aggregation)
1221
1222 def _get_partitioned_variable(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
545 constraint=constraint,
546 synchronization=synchronization,
--> 547 aggregation=aggregation)
548
549 def _get_partitioned_variable(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
497 constraint=constraint,
498 synchronization=synchronization,
--> 499 aggregation=aggregation)
500
501 # Set trainable value based on synchronization value.

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
846 tb = [x for x in tb if "tensorflow/python" not in x[0]][:3]
847 raise ValueError("%s Originally defined at:\n\n%s" % (err_msg, "".join(
--> 848 traceback.format_list(tb))))
849 found_var = self._vars[name]
850 if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable model/wpe already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/opt/conda/lib/python3.6/site-packages/gpt_2_simple/src/model.py", line 153, in model
initializer=tf.random_normal_initializer(stddev=0.01))
File "/opt/conda/lib/python3.6/site-packages/gpt_2_simple/gpt_2.py", line 165, in finetune
output = model.model(hparams=hparams, X=context)
File "", line 5, in
steps=1000) # steps is max number of training steps

woctezuma (Contributor) commented Jul 1, 2019

In a new cell, I copy-paste the above to fine-tune further

Try to restart the Python session. #77

From the README:

NB: Restart the Python session first if you want to finetune on another dataset or load another model.

From the notebook:

IMPORTANT NOTE: If you want to rerun this cell, restart the VM first (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

@yissachar (Author)

Thanks, that is what I did to work around this, but it seems like it would be desirable to let users re-finetune without restarting. Is there some fundamental limitation that prevents this?

Also, I had read the README, but it wasn't clear to me that re-finetuning on the same model was covered by this; perhaps the wording could be tweaked to make this clearer?

woctezuma (Contributor) commented Jul 1, 2019

it wasn't clear to me that re-finetuning on the same model was covered by this - perhaps the wording can be tweaked to make this clearer?

I agree. The README makes it sound like one does not have to restart the VM if the dataset is identical.

It could be changed to:

NB: Restart the Python session first if you want to finetune further.

minimaxir (Owner) commented Jul 1, 2019

Agree that a README change would be clearer (my use case for retraining on the same dataset is through the CLI, which refreshes the session; I hadn't considered the Colab notebook use case).

I'll push a change today.

Thanks, that is what I did to work around this, but it seems like it would be desirable to let users re-finetune without restarting. Is there some fundamental limitation that prevents this?

It's more or less due to how TensorFlow works, and I'm not skilled enough with low-level TF to find a workaround.

However, I think I can add a reset function to avoid reloading the notebook, as the implementations used in the Cloud Run APIs reset correctly.
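
For reference, a minimal sketch of what such a reset helper could look like under TF 1.x graph mode (a hypothetical illustration, not the actual gpt-2-simple implementation; the helper name is made up):

import tensorflow as tf

def reset_tf_session(sess=None):
    # Hypothetical helper, for illustration only: close the old session,
    # wipe the default graph, and return a fresh session so a later
    # finetune() call can recreate model/wpe without a name collision.
    if sess is not None:
        sess.close()
    tf.reset_default_graph()
    return tf.Session()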

tkocmathla commented Jul 2, 2019

Try adding tf.reset_default_graph() before each fine-tuning session. This works for me to continue fine-tuning:

import tensorflow as tf
import gpt_2_simple as gpt2

# clear the stale graph so finetune() can recreate model/wpe from scratch
tf.reset_default_graph()
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'dataset.txt',
              model_name='345M',
              steps=10)
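
This works because finetune() creates the model variables with tf.get_variable in the default graph; resetting that graph discards the previously created model/wpe, so the rebuilt graph no longer collides with it.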

loretoparisi commented Feb 18, 2021

In my case, I put it here:

    # reset the default graph, then reuse or recreate the session before loading
    tf.reset_default_graph()
    if not sess:
        sess = gpt2.start_tf_sess()
    else:
        sess = gpt2.reset_session(sess)

    gpt2.load_gpt2(sess, run_name=run_name)

and it worked perfectly! Thanks!

danielenricocahall added a commit to danielenricocahall/gpt-2-simple that referenced this issue Jun 17, 2021
Provide `reuse` option when creating a Tensorflow session. Should address minimaxir#80 and https://stackoverflow.com/questions/50210785 in a backwards compatible way.
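
For context, the TF 1.x mechanism such a reuse option can build on is reuse=tf.AUTO_REUSE on the model's variable scope. A rough, hypothetical sketch (not the actual patch), reusing the variable names that appear in the traceback above:

import tensorflow as tf

def build_embeddings(hparams, scope='model'):
    # Hypothetical sketch: with reuse=tf.AUTO_REUSE, calling this twice
    # returns the existing model/wpe and model/wte variables instead of
    # raising "Variable model/wpe already exists, disallowed".
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],
                              initializer=tf.random_normal_initializer(stddev=0.01))
        wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],
                              initializer=tf.random_normal_initializer(stddev=0.02))
    return wpe, wte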
@Mennaruuk

For users encountering the error AttributeError: module 'tensorflow' has no attribute 'reset_default_graph', try adding the following at the top of the finetuning code:

import tensorflow as tf
tf.compat.v1.reset_default_graph()

(Source)
