Loading optimizer state without using model.compile #15917
Labels
stale
stat:awaiting response from contributor
type:support
User is asking for help / asking an implementation question. Stackoverflow would be better suited.
TF: 2.5, compiled
Environment: GCP cloud TPU V2-32
The optimizer state is saved when compile is called on the model before saving. However, when training with the optimizer's apply_gradients method instead of fit, compile is never called. In that case we save the optimizer state with np.save(PATH, optimizer.get_weights()). When resuming training under a distribution strategy, restoring that state with optimizer.set_weights isn't working. Things we have tried to resolve this:
```python
opt_weights = np.load(opt_path, allow_pickle=True)

# Apply zero gradients once so the optimizer creates its slot variables
# without changing the model weights, then restore the saved state.
grad_vars = model.trainable_weights
zero_grads = [tf.zeros_like(w) for w in grad_vars]
optimizer.apply_gradients(zip(zero_grads, grad_vars))

optimizer.set_weights(opt_weights)
```
But this results in:

```
NotImplementedError: TPUStrategy.run(fn, ...) does not support pure eager
execution. please make sure the function passed into `strategy.run` is a
`tf.function` or `strategy.run` is called inside a `tf.function` if eager
behavior is enabled
```

Which is pretty self-explanatory:
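Following what the error message suggests, one approach is to wrap the zero-gradient pass in a tf.function and invoke it through strategy.run. A minimal sketch — it uses the default strategy so it runs off-TPU (on a Cloud TPU you would substitute your TPUStrategy), and the model, optimizer, and the stand-in for the loaded weights are illustrative, not from the issue:

```python
import tensorflow as tf

# Sketch only: the default strategy stands in for TPUStrategy so this
# runs anywhere; on a Cloud TPU build a tf.distribute.TPUStrategy instead.
strategy = tf.distribute.get_strategy()

with strategy.scope():
    # Toy model and optimizer for illustration.
    model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                                 tf.keras.layers.Dense(2)])
    optimizer = tf.keras.optimizers.Adam()

@tf.function  # strategy.run requires a tf.function when eager mode is on
def init_optimizer_slots():
    # Applying zero gradients once forces the optimizer to create its
    # slot variables without changing the model weights.
    grad_vars = model.trainable_weights
    zero_grads = [tf.zeros_like(w) for w in grad_vars]
    optimizer.apply_gradients(zip(zero_grads, grad_vars))

strategy.run(init_optimizer_slots)

# With the slots created, the saved state can be restored. get_weights /
# set_weights exist on TF 2.5-era optimizers; guard for newer Keras versions.
if hasattr(optimizer, "set_weights") and hasattr(optimizer, "get_weights"):
    # opt_weights = np.load(opt_path, allow_pickle=True)  # as in the issue
    optimizer.set_weights(optimizer.get_weights())  # stand-in for loaded state
```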
Any advice on how to load optimizer weights when doing distributed training on TPUs would be greatly appreciated. Even a workaround that temporarily compiles and saves a model would be fine for now. We run many expensive experiments, and not being able to restore the optimizer state when restarting and tuning is a major challenge.
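One possible workaround, not from the issue but a commonly used pattern, is tf.train.Checkpoint: it tracks the optimizer object directly and restores slot variables lazily as they are created, so neither compile nor the zero-gradient trick is needed. A minimal sketch with a toy model (names and paths are illustrative):

```python
import os
import tempfile
import tensorflow as tf

# Toy model and optimizer for illustration.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(2)])
optimizer = tf.keras.optimizers.Adam()

# Save model and optimizer state together; no compile() required.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
ckpt_dir = tempfile.mkdtemp()  # illustrative location
path = ckpt.write(os.path.join(ckpt_dir, "opt_ckpt"))

# Later / on restart: restore lazily. Slot variables are filled in as the
# optimizer creates them, so this also works under a distribution strategy.
ckpt.restore(path)
```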
I'm not sure if this is best posted here or on the TF issues board.