How can we save model so for multiple predictions #54

Closed
jhssyb opened this issue May 30, 2020 · 9 comments

@jhssyb

jhssyb commented May 30, 2020

Hi Lu, I am wondering how we can save the network structure and weights after a lengthy training run, so that next time we can use the pretrained model directly for predictions rather than starting from scratch (training first, then predicting).

@lululxvi
Owner

lululxvi commented May 30, 2020

  • To save a model, there are two ways:
  1. Save the model only at the last step. Use model_save_path in Model.train.
  2. Save the model every certain number of steps. Use ModelCheckpoint. See Poisson_Dirichlet_1d.py.
checker = dde.callbacks.ModelCheckpoint(
    "model/model.ckpt", save_better_only=True, period=1000
)
model.train(epochs=epochs, callbacks=[checker])
# Later, load the saved weights back into a compiled model
model.restore("model/model.ckpt-?", verbose=1)  # Replace ? with the exact filename
  • To continue training from a saved model, use model_restore_path in Model.train.
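The "replace ? with the exact filename" step can be automated. The following is a minimal sketch, assuming the TensorFlow 1.x-style backend, which writes each checkpoint as files such as model.ckpt-1000.index next to the data files. The helper name latest_checkpoint_prefix is hypothetical and not part of DeepXDE.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(directory, basename="model.ckpt"):
    """Return the checkpoint prefix with the highest step number, or None.

    Assumes checkpoints are stored as "<basename>-<step>.index" files,
    as written by a TensorFlow 1.x-style saver.
    """
    pattern = re.compile(re.escape(basename) + r"-(\d+)\.index$")
    best_step, best_prefix = -1, None
    for path in Path(directory).glob(basename + "-*.index"):
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the ".index" suffix: restore() expects the prefix only.
            best_prefix = str(path.with_suffix(""))
    return best_prefix


# Hypothetical usage with an already compiled model:
# model.restore(latest_checkpoint_prefix("model"), verbose=1)
```

With save_better_only=True a checkpoint is written only when the loss improves, so the highest step number should also be the best model saved so far.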

@jhssyb
Author

jhssyb commented May 31, 2020

Thanks for your detailed answer. I can save the model now.

@tanielfranklin

Hi, what about inverse problems with trainable variables? When I restore a checkpoint, a "Key Variable/Adam not found in checkpoint" error is thrown. Any tips?

@lululxvi
Owner

You can open a new issue with more details.

@SongPapers

Hi Dr. Lu,
I want to save the model the second way, using ModelCheckpoint, but I have run into a problem. For brevity, here is my code:

data = dde.data.TimePDE(
    geomtime,
    pde,
    [IC_A, IC_u],
    num_domain=100,
    num_boundary=10,
    num_initial=10,
)

net = dde.maps.FNN(
    layer_sizes=[dim_input] + hidden_layers_num + [dim_output],
    activation=activation_select,
    kernel_initializer="Glorot normal",
)

model2 = dde.Model(data, net)
model2.compile("adam", lr=0.0005)

checkpointer = dde.callbacks.ModelCheckpoint(
    filepath="Model/DeepXDE/model2.ckpt",
    verbose=1,
    save_better_only=False,  # also fails when set to True
    period=10,
)

losshistory, train_state = model2.train(
    epochs=40,
    callbacks=[checkpointer],
)

Normally the checkpoint should load with model.restore(), but that does not work.

The reported problem is: "model2.ckpt-xxx is not in all_model_checkpoint_paths. Manually adding it."

Continue to run the code:

model22 = dde.Model(data, net)
model22.compile("adam", lr=0.0005,)
model22.restore("./Model/DeepXDE/model2.ckpt-30",verbose=1)

The error message is very long:

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1358       self._extend_graph()
-> 1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
   1360                                       target_list, run_metadata)

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1450                           run_metadata):
-> 1451     return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
   1452                                             fetch_list, target_list,

NotFoundError: Key beta1_power_4 not found in checkpoint
	 [[{{node save_5/RestoreV2}}]]

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py in restore(self, sess, save_path)
   1302       else:
-> 1303         sess.run(self.saver_def.restore_op_name,
   1304                  {self.saver_def.filename_tensor_name: save_path})

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    966     try:
--> 967       result = self._run(None, fetches, feed_dict, options_ptr,
    968                          run_metadata_ptr)

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
-> 1190       results = self._do_run(handle, final_targets, final_fetches,
   1191                              feed_dict_tensor, options, run_metadata)

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1367     if handle is None:
-> 1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
   1369                            run_metadata)

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395 

NotFoundError: Key beta1_power_4 not found in checkpoint
	 [[node save_5/RestoreV2 (defined at D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py:128) ]]

Original stack trace for 'save_5/RestoreV2':
  File "D:\python3.8.7\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\python3.8.7\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
    app.start()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
    self.io_loop.start()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "D:\python3.8.7\lib\asyncio\base_events.py", line 570, in run_forever
    self._run_once()
  File "D:\python3.8.7\lib\asyncio\base_events.py", line 1859, in _run_once
    handle._run()
  File "D:\python3.8.7\lib\asyncio\events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 457, in dispatch_queue
    await self.process_one()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 446, in process_one
    await dispatch(*args)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 353, in dispatch_shell
    await result
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 648, in execute_request
    reply_content = await reply_content
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\ipkernel.py", line 353, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2901, in run_cell
    result = self._run_cell(
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2947, in _run_cell
    return runner(coro)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3172, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3364, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\GeWen\AppData\Local\Temp/ipykernel_3456/4087475617.py", line 2, in <module>
    model22.compile("adam", lr=0.0005,)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\utils\internal.py", line 26, in wrapper
    result = f(*args, **kwargs)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py", line 111, in compile
    self._compile_tensorflow_compat_v1(lr, loss_fn, decay, loss_weights)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py", line 128, in _compile_tensorflow_compat_v1
    self.saver = tf.train.Saver(max_to_keep=None)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 836, in __init__
    self.build()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 848, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 876, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 515, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 335, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 583, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1489, in restore_v2
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3557, in _create_op_internal
    ret = Operation(
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)


During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py in get_tensor(self, tensor_str)
     68   try:
---> 69     return CheckpointReader.CheckpointReader_GetTensor(
     70         self, compat.as_bytes(tensor_str))

RuntimeError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py in restore(self, sess, save_path)
   1313       try:
-> 1314         names_to_keys = object_graph_key_mapping(save_path)
   1315       except errors.NotFoundError:

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py in object_graph_key_mapping(checkpoint_path)
   1631   reader = py_checkpoint_reader.NewCheckpointReader(checkpoint_path)
-> 1632   object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
   1633   object_graph_proto = (trackable_object_graph_pb2.TrackableObjectGraph())

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py in get_tensor(self, tensor_str)
     73   except RuntimeError as e:
---> 74     error_translator(e)
     75 

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py in error_translator(e)
     34       'matching files for') in error_message:
---> 35     raise errors_impl.NotFoundError(None, None, error_message)
     36   elif 'Sliced checkpoints are not supported' in error_message or (

NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3456/4087475617.py in <module>
      1 model22 = dde.Model(data, net)
      2 model22.compile("adam", lr=0.0005,)
----> 3 model22.restore("./Model/DeepXDE/model2.ckpt-30",verbose=1)
      4 # model22.restore("./Model/DeepXDE_Blood1.ckpt-2000",verbose=1)

D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py in restore(self, save_path, verbose)
    620         if verbose > 0:
    621             print("Restoring model from {} ...\n".format(save_path))
--> 622         self.saver.restore(self.sess, save_path)
    623 
    624     def print_model(self):

D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py in restore(self, sess, save_path)
   1317         # is a graph mismatch. Re-raise the original error with
   1318         # a helpful message (b/110263146)
-> 1319         raise _wrap_restore_error_with_msg(
   1320             err, "a Variable name or other graph key that is missing")
   1321 

NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power_4 not found in checkpoint
	 [[node save_5/RestoreV2 (defined at D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py:128) ]]

Original stack trace for 'save_5/RestoreV2':
  File "D:\python3.8.7\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\python3.8.7\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
    app.start()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
    self.io_loop.start()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "D:\python3.8.7\lib\asyncio\base_events.py", line 570, in run_forever
    self._run_once()
  File "D:\python3.8.7\lib\asyncio\base_events.py", line 1859, in _run_once
    handle._run()
  File "D:\python3.8.7\lib\asyncio\events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 457, in dispatch_queue
    await self.process_one()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 446, in process_one
    await dispatch(*args)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 353, in dispatch_shell
    await result
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\kernelbase.py", line 648, in execute_request
    reply_content = await reply_content
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\ipkernel.py", line 353, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2901, in run_cell
    result = self._run_cell(
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2947, in _run_cell
    return runner(coro)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3172, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3364, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\GeWen\AppData\Local\Temp/ipykernel_3456/4087475617.py", line 2, in <module>
    model22.compile("adam", lr=0.0005,)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\utils\internal.py", line 26, in wrapper
    result = f(*args, **kwargs)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py", line 111, in compile
    self._compile_tensorflow_compat_v1(lr, loss_fn, decay, loss_weights)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\deepxde\model.py", line 128, in _compile_tensorflow_compat_v1
    self.saver = tf.train.Saver(max_to_keep=None)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 836, in __init__
    self.build()
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 848, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 876, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 515, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 335, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 583, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1489, in restore_v2
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3557, in _create_op_internal
    ret = Operation(
  File "D:\Code\Python\VenvTensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Please help me: how can I solve this?
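One way to narrow down a "Key ... not found in checkpoint" error is to compare the variable names the current graph expects with the names stored in the checkpoint. The helper below is a minimal sketch of that comparison. The function name and the example variable names are made up for illustration and are not part of DeepXDE or TensorFlow.

```python
def compare_variable_names(graph_names, checkpoint_names):
    """Report which names exist on only one side of a restore."""
    graph, ckpt = set(graph_names), set(checkpoint_names)
    return {
        # Names the graph wants to restore but the checkpoint lacks.
        "missing_from_checkpoint": sorted(graph - ckpt),
        # Names stored in the checkpoint that the graph never asks for.
        "unused_in_checkpoint": sorted(ckpt - graph),
    }


# Illustrative (made-up) names, not taken from a real checkpoint.
report = compare_variable_names(
    graph_names=["dense/kernel", "dense/bias", "beta1_power_4"],
    checkpoint_names=["dense/kernel", "dense/bias", "beta1_power_3"],
)
print(report["missing_from_checkpoint"])  # ['beta1_power_4']
```

With TensorFlow available, the two lists could be obtained from tf.train.list_variables(save_path), which yields (name, shape) pairs, and from [v.op.name for v in tf.compat.v1.global_variables()]. A numeric suffix such as _4 usually means a same-named variable was created several times in one graph, for example by compiling models repeatedly in the same notebook session. Whether that is the cause here is not confirmed in this thread.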

@lululxvi
Owner

You used Windows. Could you try Linux?

@SongPapers

> You used Windows. Could you try Linux?

The result seems to be the same on Linux. My machine runs Ubuntu 19.10 (GNU/Linux 5.3.0-64-generic x86_64). I used the same code, and restoring still fails. Here are my code and the error message:

model2 = dde.Model(data, net)
model2.compile("adam", lr=0.0005)

checkpointer = dde.callbacks.ModelCheckpoint(
    filepath="./Model/model2.ckpt",
    verbose=1,
    save_better_only=False,
    period=5,
)

losshistory, train_state = model2.train(
    epochs=20,
    callbacks=[checkpointer],
)

Continue to run the code:

model22 = dde.Model(data, net)
model22.compile("adam", lr=0.0005,)
model22.restore("./Model/model2.ckpt-20",verbose=1)

INFO:tensorflow:Restoring parameters from ./Model/model2.ckpt-20

2021-12-01 10:19:28.836960: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at save_restore_v2_ops.cc:207 : Not found: Key beta1_power_4 not found in checkpoint

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1360                                       target_list, run_metadata)
   1361 

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1452                                             fetch_list, target_list,
-> 1453                                             run_metadata)
   1454 

NotFoundError: 2 root error(s) found.
  (0) Not found: Key beta1_power_4 not found in checkpoint
	 [[{{node save_6/RestoreV2}}]]
  (1) Not found: Key beta1_power_4 not found in checkpoint
	 [[{{node save_6/RestoreV2}}]]
	 [[save_6/RestoreV2/_369]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1303         sess.run(self.saver_def.restore_op_name,
-> 1304                  {self.saver_def.filename_tensor_name: save_path})
   1305     except errors.NotFoundError as err:

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    967       result = self._run(None, fetches, feed_dict, options_ptr,
--> 968                          run_metadata_ptr)
    969       if run_metadata:

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1190       results = self._do_run(handle, final_targets, final_fetches,
-> 1191                              feed_dict_tensor, options, run_metadata)
   1192     else:

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369                            run_metadata)
   1370     else:

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395 

NotFoundError: 2 root error(s) found.
  (0) Not found: Key beta1_power_4 not found in checkpoint
	 [[node save_6/RestoreV2 (defined at home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py:128) ]]
  (1) Not found: Key beta1_power_4 not found in checkpoint
	 [[node save_6/RestoreV2 (defined at home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py:128) ]]
	 [[save_6/RestoreV2/_369]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save_6/RestoreV2':
  File "usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/traitlets/config/application.py", line 846, in launch_instance
    app.start()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelapp.py", line 677, in start
    self.io_loop.start()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "usr/lib/python3.7/asyncio/base_events.py", line 534, in run_forever
    self._run_once()
  File "usr/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
    handle._run()
  File "usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 457, in dispatch_queue
    await self.process_one()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 446, in process_one
    await dispatch(*args)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 353, in dispatch_shell
    await result
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 648, in execute_request
    reply_content = await reply_content
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/ipkernel.py", line 353, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2902, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2947, in _run_cell
    return runner(coro)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3173, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3364, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "tmp/ipykernel_11067/3468769675.py", line 2, in <module>
    model22.compile("adam", lr=0.0005,)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/utils/internal.py", line 26, in wrapper
    result = f(*args, **kwargs)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py", line 111, in compile
    self._compile_tensorflow_compat_v1(lr, loss_fn, decay, loss_weights)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py", line 128, in _compile_tensorflow_compat_v1
    self.saver = tf.train.Saver(max_to_keep=None)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 836, in __init__
    self.build()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 848, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 886, in _build
    build_restore=build_restore)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 516, in _build_internal
    restore_sequentially, reshape)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 336, in _AddRestoreOps
    restore_sequentially)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 583, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1493, in restore_v2
    name=name)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3565, in _create_op_internal
    op_def=op_def)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)


During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/py_checkpoint_reader.py in get_tensor(self, tensor_str)
     69     return CheckpointReader.CheckpointReader_GetTensor(
---> 70         self, compat.as_bytes(tensor_str))
     71   # TODO(b/143319754): Remove the RuntimeError casting logic once we resolve the

RuntimeError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1313       try:
-> 1314         names_to_keys = object_graph_key_mapping(save_path)
   1315       except errors.NotFoundError:

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py in object_graph_key_mapping(checkpoint_path)
   1631   reader = py_checkpoint_reader.NewCheckpointReader(checkpoint_path)
-> 1632   object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
   1633   object_graph_proto = (trackable_object_graph_pb2.TrackableObjectGraph())

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/py_checkpoint_reader.py in get_tensor(self, tensor_str)
     73   except RuntimeError as e:
---> 74     error_translator(e)
     75 

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/py_checkpoint_reader.py in error_translator(e)
     34       'matching files for') in error_message:
---> 35     raise errors_impl.NotFoundError(None, None, error_message)
     36   elif 'Sliced checkpoints are not supported' in error_message or (

NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
/tmp/ipykernel_11067/3468769675.py in <module>
      2 model22.compile("adam", lr=0.0005,)
      3 # model22.restore("./Model/DeepXDE_Blood/model2.ckpt-200",verbose=1)
----> 4 model22.restore("./Model/model2.ckpt-20",verbose=1)

~/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py in restore(self, save_path, verbose)
    620         if verbose > 0:
    621             print("Restoring model from {} ...\n".format(save_path))
--> 622         self.saver.restore(self.sess, save_path)
    623 
    624     def print_model(self):

~/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1318         # a helpful message (b/110263146)
   1319         raise _wrap_restore_error_with_msg(
-> 1320             err, "a Variable name or other graph key that is missing")
   1321 
   1322       # This is an object-based checkpoint. We'll print a warning and then do

NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key beta1_power_4 not found in checkpoint
	 [[node save_6/RestoreV2 (defined at home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py:128) ]]
  (1) Not found: Key beta1_power_4 not found in checkpoint
	 [[node save_6/RestoreV2 (defined at home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py:128) ]]
	 [[save_6/RestoreV2/_369]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save_6/RestoreV2':
  File "usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/traitlets/config/application.py", line 846, in launch_instance
    app.start()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelapp.py", line 677, in start
    self.io_loop.start()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "usr/lib/python3.7/asyncio/base_events.py", line 534, in run_forever
    self._run_once()
  File "usr/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
    handle._run()
  File "usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 457, in dispatch_queue
    await self.process_one()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 446, in process_one
    await dispatch(*args)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 353, in dispatch_shell
    await result
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 648, in execute_request
    reply_content = await reply_content
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/ipkernel.py", line 353, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2902, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2947, in _run_cell
    return runner(coro)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3173, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3364, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "tmp/ipykernel_11067/3468769675.py", line 2, in <module>
    model22.compile("adam", lr=0.0005,)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/utils/internal.py", line 26, in wrapper
    result = f(*args, **kwargs)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py", line 111, in compile
    self._compile_tensorflow_compat_v1(lr, loss_fn, decay, loss_weights)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/deepxde/model.py", line 128, in _compile_tensorflow_compat_v1
    self.saver = tf.train.Saver(max_to_keep=None)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 836, in __init__
    self.build()
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 848, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 886, in _build
    build_restore=build_restore)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 516, in _build_internal
    restore_sequentially, reshape)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 336, in _AddRestoreOps
    restore_sequentially)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 583, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1493, in restore_v2
    name=name)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3565, in _create_op_internal
    op_def=op_def)
  File "home/npuheart0/.venv/venvTensorflow/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

@lululxvi
Owner

lululxvi commented Dec 1, 2021

Did you call dde.Model(data, net) twice in the same script? You can only define one model per script. Put model2 and model22 in two separate files. Also check the FAQ for other possible causes.
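
A minimal sketch of why a second model in the same script can trigger "Key beta1_power_4 not found in checkpoint". It is a pure-Python toy, not TensorFlow or DeepXDE code: `ToyGraph`, `ToySaver`, and `build_and_compile` are hypothetical names, and the sketch assumes the TF1-style behaviour where every model in one process shares a default graph, repeated variable names get a numeric suffix, and a saver expects every variable present in the graph when it was built. Under that assumption, the `_4` suffix in the traceback would suggest that Adam variables had already been created several times in the same session.

```python
class ToyGraph:
    """Toy stand-in for a shared default graph: it only hands out unique names."""

    def __init__(self):
        self._counts = {}
        self.variables = []

    def add_variable(self, name):
        # First use keeps the plain name; later uses get _1, _2, ...
        n = self._counts.get(name, 0)
        self._counts[name] = n + 1
        unique = name if n == 0 else f"{name}_{n}"
        self.variables.append(unique)
        return unique


class ToySaver:
    """Snapshots every variable name that exists in the graph when it is built."""

    def __init__(self, graph):
        self.keys = list(graph.variables)

    def save(self):
        # A "checkpoint" here is just a dict keyed by variable name.
        return {key: 0.0 for key in self.keys}

    def restore(self, checkpoint):
        missing = [key for key in self.keys if key not in checkpoint]
        if missing:
            raise KeyError(f"Key {missing[0]} not found in checkpoint")


def build_and_compile(graph):
    """Hypothetical stand-in for building a model and compiling it with Adam."""
    graph.add_variable("weights")
    graph.add_variable("beta1_power")
    return ToySaver(graph)


# Case 1: two models in one process share a graph.
shared = ToyGraph()
saver_a = build_and_compile(shared)
checkpoint = saver_a.save()          # keys: weights, beta1_power

saver_b = build_and_compile(shared)  # adds weights_1, beta1_power_1
try:
    saver_b.restore(checkpoint)
    shared_error = None
except KeyError as exc:
    shared_error = exc.args[0]

# Case 2: a separate file/process starts from a fresh graph.
fresh = ToyGraph()
saver_c = build_and_compile(fresh)
saver_c.restore(checkpoint)          # names match the checkpoint, no error
restored_in_fresh_graph = True

print(shared_error)
print(saver_c.keys)
```

In the toy, the second saver fails because it expects suffixed names that were never written to the checkpoint, while the fresh graph recreates the original names and restores cleanly. That mirrors the effect of running the restoring model from its own file.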

@SongPapers

> Did you call dde.Model(data, net) twice in the same script? You can only define one model per script. Put model2 and model22 in two separate files. Also check the FAQ for other possible causes.

Thanks a lot! With model2 and model22 in two separate files, the code runs fine. My idea was to define two models using dde.Model(data, net): one trains directly, while the other restores the first model's state ( restore("./Model/DeepXDE/model2.ckpt-30",verbose=1) ) and then continues training with some hyperparameters changed.

OK, I will remember to put the two models in different files.
Thanks again!
