
Added Mikula finetuning script #132

Closed
wants to merge 9 commits into from

oumayb
Contributor

@oumayb oumayb commented Sep 21, 2018

Notebook for mikula dataset

@mathieuboudreau
Member

@oumayb Could you maybe add the config files as well, so that we can see the differences between _from_scratch and _finetuned?
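A small helper can list those differences. This is a minimal sketch that assumes both configs are stored as JSON files. The file names and key names used below are hypothetical, for illustration only, and are not taken from the notebook:

```python
import json


def diff_configs(config_a, config_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key missing from one config is reported with None on that side.
    """
    all_keys = sorted(set(config_a) | set(config_b))
    return {
        key: (config_a.get(key), config_b.get(key))
        for key in all_keys
        if config_a.get(key) != config_b.get(key)
    }


def diff_config_files(path_a, path_b):
    """Load two JSON config files (e.g. _from_scratch vs _finetuned) and diff them."""
    with open(path_a) as file_a, open(path_b) as file_b:
        return diff_configs(json.load(file_a), json.load(file_b))


# Hypothetical file names, for illustration only:
# for key, (scratch, finetuned) in diff_config_files(
#         "config_from_scratch.json", "config_finetuned.json").items():
#     print(f"{key}: {scratch!r} -> {finetuned!r}")
```

Only keys whose values differ are reported, so identical settings do not clutter the output.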

@coveralls

coveralls commented Sep 21, 2018

Coverage remained the same at 82.738% when pulling c1f42a8 on dev_mikula into a7da6a1 on master.

@oumayb
Contributor Author

oumayb commented Sep 22, 2018

@mathieuboudreau sure!
I was thinking about where to add them. Maybe I could also add the trained model for the Mikula data, along with the config files, as is done for the existing TEM and SEM models?

@mathieuboudreau
Member

@oumayb I don't know if the trained models should all be included here – are the files large?

As for config files, it may be worth having a separate folder (e.g. axondeepseg/configs/) where we can sort/store the config files separately from the core source code (currently the SEM/TEM config files are inside the source code folder axondeepseg/AxonDeepSeg/models/). @jcohenadad thoughts?

@jcohenadad
Member

I would not copy Mikula's model into the GitHub repository, because (i) it is still preliminary and (ii) we should instead store models somewhere else (along with their config files, which I believe should sit side by side with the model). New issue here: #133.
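If each stored model keeps its config side by side, a loader can refuse to use a model without the config it was trained with. This is a minimal sketch under stated assumptions: the config file name `config_network.json` is a hypothetical choice, while `model.ckpt` is the checkpoint prefix that `train_model` restores from in the tracebacks below, and `.index` is the file TensorFlow's V2 checkpoint format writes next to the data shards.

```python
import json
from pathlib import Path

CONFIG_NAME = "config_network.json"  # hypothetical file name
CHECKPOINT_PREFIX = "model.ckpt"     # prefix passed to saver.restore in train_network.py


def load_model_folder(model_dir):
    """Return (config_dict, checkpoint_prefix) for a model stored with its config.

    Raises FileNotFoundError if the config or the checkpoint index is missing,
    so a model is never restored without its matching config.
    """
    model_dir = Path(model_dir)
    config_path = model_dir / CONFIG_NAME
    # TF V2 checkpoints write <prefix>.index alongside <prefix>.data-* shards.
    index_path = model_dir / (CHECKPOINT_PREFIX + ".index")
    missing = [p.name for p in (config_path, index_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    with open(config_path) as f:
        config = json.load(f)
    return config, str(model_dir / CHECKPOINT_PREFIX)
```

Returning the prefix rather than a file path mirrors how `saver.restore` is called with `folder + "/model.ckpt"`.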

@oumayb could you please indicate where the model currently is?

@alexfoias
Contributor

alexfoias commented Sep 25, 2018

@jcohenadad & @mathieuboudreau for the moment the trained models & config files are located in duke/projects/axondeepseg/20180921_mikula. Do you want me to create an OSF platform similar to the one for SCT? Maybe we can also migrate the current models from here to OSF.

@mathieuboudreau
Member

@oumayb I ran your Jupyter Notebook with the data located in duke/projects/axondeepseg/20180921_mikula, as suggested by @alexfoias. The "From Scratch" section trained fine, but I encountered an error in the "Finetuned" section. Since you didn't specify the exact folder of the TEM model you were training from, I tried both the one in this repo (axondeepseg/AxonDeepSeg/models/default_TEM_model_v1/) and the one on duke (duke/projects/axondeepseg/baselines/baseline_tem512-7678/).

The error was not the same each time I ran the training; it differed slightly between runs. Here are three cases:

  • NotFoundError (see above for traceback): Key cconv-d2-c2/convolution/bn/moving_mean not found in checkpoint
  • NotFoundError (see above for traceback): Key cconv-d2-c2/convolution/bn/moving_variance not found in checkpoint
  • NotFoundError (see above for traceback): Key cconv-d3-c2/convolution/bn/beta not found in checkpoint
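A likely (unconfirmed) explanation for the varying message: TensorFlow 1.x `tf.train.Saver` builds a separate restore op per variable (the `save/RestoreV2_42`, `_43` and `_55` nodes in the logs), and these appear to run concurrently, so whichever missing key fails first is the one reported. Listing every graph variable absent from the checkpoint at once gives the full picture. Note that all three reported keys sit in a `c2` scope; that may hint at an architecture mismatch between the config and the TEM model, but this is a guess to verify. A minimal sketch of such a check (the helper names are hypothetical):

```python
import re
from collections import defaultdict


def missing_checkpoint_keys(graph_var_names, checkpoint_keys):
    """Return the graph variables that have no matching key in the checkpoint.

    Graph variable names carry an output suffix such as ":0" that checkpoint
    keys do not, so it is stripped before comparing.
    """
    names = {name.rsplit(":", 1)[0] for name in graph_var_names}
    return sorted(names - set(checkpoint_keys))


def group_by_conv_scope(keys):
    """Group keys by their leading 'cconv-d<depth>-c<conv>' scope (as in the logs)."""
    groups = defaultdict(list)
    for key in keys:
        match = re.match(r"(cconv-d\d+-c\d+)/", key)
        groups[match.group(1) if match else "other"].append(key)
    return dict(groups)


# Possible usage with TensorFlow 1.x (not run here), after the graph is built:
# import tensorflow as tf
# reader = tf.train.NewCheckpointReader("TEM_model_v1/model.ckpt")
# ckpt_keys = reader.get_variable_to_shape_map().keys()
# graph_names = [v.name for v in tf.model_variables()]
# print(group_by_conv_scope(missing_checkpoint_keys(graph_names, ckpt_keys)))
```

Using `tf.model_variables()` matches the variable list that `train_network.py` passes to `tf.train.Saver`, so the comparison covers exactly what the restore attempts to load.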

And see below for the full error logs:

Run 1


('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 2, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 2, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 2, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 2, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 2, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 2, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 2, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 2, 'Features: ', [128, 128])
('Size:', 3)
Total number of parameters to train: 1953219
INFO:tensorflow:Restoring parameters from TEM_model_v1/model.ckpt
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307 

/usr/lib64/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     87             try:
---> 88                 next(self.gen)
     89             except StopIteration:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

NotFoundError: Key cconv-d2-c2/convolution/bn/moving_mean not found in checkpoint
	 [[Node: save/RestoreV2_42 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_42/tensor_names, save/RestoreV2_42/shape_and_slices)]]
	 [[Node: save/RestoreV2_141/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_330_save/RestoreV2_141", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
<ipython-input-5-2736b74cfde9> in <module>
----> 1 train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)

~/neuropoly/github/axondeepseg/AxonDeepSeg/train_network.py in train_model(path_trainingset, path_model, config, path_model_init, save_trainable, gpu, debug_mode, gpu_per)
    327         if path_model_init:
    328             folder_restored_model = path_model_init
--> 329             saver.restore(session, folder_restored_model + "/model.ckpt")
    330 
    331             if save_trainable:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1558     logging.info("Restoring parameters from %s", save_path)
   1559     sess.run(self.saver_def.restore_op_name,
-> 1560              {self.saver_def.filename_tensor_name: save_path})
   1561 
   1562   @staticmethod

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1338         except KeyError:
   1339           pass
-> 1340       raise type(e)(node_def, op, message)
   1341 
   1342   def _extend_graph(self):

NotFoundError: Key cconv-d2-c2/convolution/bn/moving_mean not found in checkpoint
	 [[Node: save/RestoreV2_42 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_42/tensor_names, save/RestoreV2_42/shape_and_slices)]]
	 [[Node: save/RestoreV2_141/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_330_save/RestoreV2_141", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'save/RestoreV2_42', defined at:
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/usr/lib64/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2819, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2845, in _run_cell
    return runner(coro)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3191, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-2736b74cfde9>", line 1, in <module>
    train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)
  File "/home/mabou_local/neuropoly/github/axondeepseg/AxonDeepSeg/train_network.py", line 283, in train_model
    saver = tf.train.Saver(tf.model_variables())
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
    dtypes=dtypes, name=name)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key cconv-d2-c2/convolution/bn/moving_mean not found in checkpoint
	 [[Node: save/RestoreV2_42 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_42/tensor_names, save/RestoreV2_42/shape_and_slices)]]
	 [[Node: save/RestoreV2_141/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_330_save/RestoreV2_141", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Run 2


('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 2, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 2, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 2, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 2, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 2, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 2, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 2, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 2, 'Features: ', [128, 128])
('Size:', 3)
Total number of parameters to train: 1953219
INFO:tensorflow:Restoring parameters from TEM_model_v1/model.ckpt
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307 

/usr/lib64/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     87             try:
---> 88                 next(self.gen)
     89             except StopIteration:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

NotFoundError: Key cconv-d2-c2/convolution/bn/moving_variance not found in checkpoint
	 [[Node: save/RestoreV2_43 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_43/tensor_names, save/RestoreV2_43/shape_and_slices)]]

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
<ipython-input-8-2736b74cfde9> in <module>
----> 1 train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)

~/neuropoly/github/axondeepseg/AxonDeepSeg/train_network.py in train_model(path_trainingset, path_model, config, path_model_init, save_trainable, gpu, debug_mode, gpu_per)
    327         if path_model_init:
    328             folder_restored_model = path_model_init
--> 329             saver.restore(session, folder_restored_model + "/model.ckpt")
    330 
    331             if save_trainable:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1558     logging.info("Restoring parameters from %s", save_path)
   1559     sess.run(self.saver_def.restore_op_name,
-> 1560              {self.saver_def.filename_tensor_name: save_path})
   1561 
   1562   @staticmethod

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1338         except KeyError:
   1339           pass
-> 1340       raise type(e)(node_def, op, message)
   1341 
   1342   def _extend_graph(self):

NotFoundError: Key cconv-d2-c2/convolution/bn/moving_variance not found in checkpoint
	 [[Node: save/RestoreV2_43 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_43/tensor_names, save/RestoreV2_43/shape_and_slices)]]

Caused by op 'save/RestoreV2_43', defined at:
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/usr/lib64/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2819, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2845, in _run_cell
    return runner(coro)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3191, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-2736b74cfde9>", line 1, in <module>
    train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)
  File "/home/mabou_local/neuropoly/github/axondeepseg/AxonDeepSeg/train_network.py", line 283, in train_model
    saver = tf.train.Saver(tf.model_variables())
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
    dtypes=dtypes, name=name)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key cconv-d2-c2/convolution/bn/moving_variance not found in checkpoint
	 [[Node: save/RestoreV2_43 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_43/tensor_names, save/RestoreV2_43/shape_and_slices)]]

Run 3


('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 2, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 2, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 2, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 2, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 2, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 2, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 2, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 2, 'Features: ', [128, 128])
('Size:', 3)
Total number of parameters to train: 1953219
INFO:tensorflow:Restoring parameters from TEM_model_v1/model.ckpt
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307 

/usr/lib64/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     87             try:
---> 88                 next(self.gen)
     89             except StopIteration:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

NotFoundError: Key cconv-d3-c2/convolution/bn/beta not found in checkpoint
	 [[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_55/tensor_names, save/RestoreV2_55/shape_and_slices)]]
	 [[Node: save/RestoreV2_39/_233 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_560_save/RestoreV2_39", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
<ipython-input-26-2736b74cfde9> in <module>
----> 1 train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)

~/neuropoly/github/axondeepseg/AxonDeepSeg/train_network.py in train_model(path_trainingset, path_model, config, path_model_init, save_trainable, gpu, debug_mode, gpu_per)
    327         if path_model_init:
    328             folder_restored_model = path_model_init
--> 329             saver.restore(session, folder_restored_model + "/model.ckpt")
    330 
    331             if save_trainable:

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1558     logging.info("Restoring parameters from %s", save_path)
   1559     sess.run(self.saver_def.restore_op_name,
-> 1560              {self.saver_def.filename_tensor_name: save_path})
   1561 
   1562   @staticmethod

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/venv_ads2/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1338         except KeyError:
   1339           pass
-> 1340       raise type(e)(node_def, op, message)
   1341 
   1342   def _extend_graph(self):

NotFoundError: Key cconv-d3-c2/convolution/bn/beta not found in checkpoint
	 [[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_55/tensor_names, save/RestoreV2_55/shape_and_slices)]]
	 [[Node: save/RestoreV2_39/_233 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_560_save/RestoreV2_39", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'save/RestoreV2_55', defined at:
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/usr/lib64/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2819, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2845, in _run_cell
    return runner(coro)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3191, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-26-2736b74cfde9>", line 1, in <module>
    train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)
  File "/home/mabou_local/neuropoly/github/axondeepseg/AxonDeepSeg/train_network.py", line 283, in train_model
    saver = tf.train.Saver(tf.model_variables())
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
    dtypes=dtypes, name=name)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/mabou_local/venv_ads2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key cconv-d3-c2/convolution/bn/beta not found in checkpoint
	 [[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_55/tensor_names, save/RestoreV2_55/shape_and_slices)]]
	 [[Node: save/RestoreV2_39/_233 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_560_save/RestoreV2_39", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

@oumayb do you know what might be causing this error? Maybe @perone too?

@oumayb
Contributor Author

oumayb commented Nov 5, 2018

@mathieuboudreau this usually happens when we're finetuning a model using another one that doesn't have the exact same architecture, so I think the problem comes from the config files that were used. I'll look more into it and get back to you!

@mathieuboudreau
Member

@oumayb Ok that makes sense – I was wondering how we could have reused the TEM model to finetune the SEM one, which has a different architecture. Thanks!

@oumayb
Contributor Author

oumayb commented Nov 5, 2018

@mathieuboudreau yes exactly, they differ in the depth and in the number of convolutions per layer, which should be taken into account in the config files.
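Because the pretrained and new networks must match exactly, one option is to compare the architecture-related entries of the two configs before building the graph, rather than discovering the mismatch through a NotFoundError during restore. The sketch below is hypothetical: only `convolution_per_layer` is confirmed in this thread, `depth` is an assumed key name, and the config values are illustrative.

```python
# Keys assumed to describe the network architecture. Only
# "convolution_per_layer" appears in this thread; "depth" is an assumed
# key name, and kernel sizes / feature counts could be added the same way.
ARCHITECTURE_KEYS = ("depth", "convolution_per_layer")


def architecture_mismatches(source_config, target_config, keys=ARCHITECTURE_KEYS):
    """Return {key: (source_value, target_value)} for each architecture key
    whose value differs between the pretrained and the new config."""
    return {
        key: (source_config.get(key), target_config.get(key))
        for key in keys
        if source_config.get(key) != target_config.get(key)
    }


# Illustrative values only: the TEM layout matches the training log further
# down this thread (4 layers, 2 convolutions each).
tem_config = {"depth": 4, "convolution_per_layer": [2, 2, 2, 2]}
new_config = {"depth": 4, "convolution_per_layer": [3, 3, 3, 3]}

mismatches = architecture_mismatches(tem_config, new_config)
if mismatches:
    print("Cannot finetune, architecture differs:", mismatches)
```

Returning the differing values (not just a boolean) makes it obvious which config entry needs to change.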

@mathieuboudreau
Member

@oumayb Ok! That wasn't clear to me when reading the Jupyter Notebook and using the files that @alexfoias suggested. If you kept the most up-to-date files somewhere, please let me know!

@oumayb
Contributor Author

oumayb commented Nov 5, 2018

@mathieuboudreau sure! I'll get back to you soon with the files

@oumayb
Contributor Author

oumayb commented Nov 11, 2018

@mathieuboudreau convolution_per_layer should be [2, 2, 2, 2] instead of [3, 3, 3, 3] when finetuning from the TEM model. I hope this works now!
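This also explains the exact key in the earlier error. Assuming (inferred only from the key `cconv-d3-c2/convolution/bn/beta`) that contracting-path scopes are named `cconv-d{layer}-c{conv}` with 0-based indices, a config with 3 convolutions per layer expects a `c2` scope in every layer, which a 2-convolution checkpoint never saved. The helpers below are hypothetical and only model that one naming pattern; in TF 1.x the names actually stored in a checkpoint can be listed with `tf.train.list_variables`.

```python
def expected_conv_scopes(convolution_per_layer):
    """Hypothetical scope names, one per convolution in the contracting path.

    Assumes the pattern 'cconv-d{layer}-c{conv}', inferred from the
    NotFoundError key 'cconv-d3-c2/convolution/bn/beta'.
    """
    return [
        f"cconv-d{layer}-c{conv}"
        for layer, n_convs in enumerate(convolution_per_layer)
        for conv in range(n_convs)
    ]


def missing_from_checkpoint(model_layout, checkpoint_layout):
    """Scopes the new graph would try to restore that the checkpoint lacks."""
    in_checkpoint = set(expected_conv_scopes(checkpoint_layout))
    return [s for s in expected_conv_scopes(model_layout) if s not in in_checkpoint]


tem_layout = [2, 2, 2, 2]  # per the fix suggested above
print(missing_from_checkpoint([3, 3, 3, 3], tem_layout))
print(missing_from_checkpoint([2, 2, 2, 2], tem_layout))
```

The first call lists one missing `c2` scope per layer, including `cconv-d3-c2`; TensorFlow only reports the first restore op that fails, so the error shows a single key.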

@mathieuboudreau
Member

Thanks @oumayb ! That helped me get a little bit further, but now I'm encountering another error:


('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
Total number of parameters to train: 1552387
INFO:tensorflow:Restoring parameters from TEM_model_v1/model.ckpt
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-2736b74cfde9> in <module>
----> 1 train_model(path_trainingset=path_trainingset, path_model=path_model, config=config, path_model_init=path_model_init)

~/neuropoly/github/axondeepseg_mikula/AxonDeepSeg/train_network.py in train_model(path_trainingset, path_model, config, path_model_init, save_trainable, gpu, debug_mode, gpu_per)
    333 
    334             file = open(folder_restored_model + '/evolution.pkl', 'r')
--> 335             evolution_restored = pickle.load(file)
    336             last_epoch = evolution_restored["steps"][-1]
    337         # Else, initializing the variables

TypeError: a bytes-like object is required, not 'str'

It looks like it has trouble opening the evolution file of the old model that I'm trying to finetune from. You didn't indicate where the model you were finetuning from was located (just its name, TEM_model_v1), so I simply tried the baseline TEM model (the one with the green label) on duke (duke/projects/axondeepseg/baselines/baseline_tem5127678). I also tried the default one in our repo (axondeepseg/AxonDeepSeg/models/default_TEM_model_v1), but I can't finetune from it either: the project's .gitignore file used to ignore evolution.pkl files, so evolution.pkl is missing from that folder.
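The TypeError above is consistent with the file being opened in text mode: in Python 3, `open(path, 'r')` returns `str` from `read()`, while `pickle.load` needs a binary stream (in Python 2, `'r'` returned byte strings, so the same line worked there). A minimal sketch, using `io.StringIO` and `io.BytesIO` as stand-ins for the two kinds of file objects and a made-up evolution dict:

```python
import io
import pickle


def try_unpickle(stream):
    """Unpickle from a stream, returning the error text instead of raising TypeError."""
    try:
        return pickle.load(stream)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


payload = pickle.dumps({"steps": [10, 20, 30]})  # illustrative data

# open(path, 'r') in Python 3 is text mode: read() returns str, which the
# unpickler rejects. io.StringIO stands in for such a file here.
print(try_unpickle(io.StringIO(payload.decode("latin1"))))

# open(path, 'rb') is binary mode: read() returns bytes, which works.
print(try_unpickle(io.BytesIO(payload)))
```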

@oumayb or @perone, have you encountered this issue? I'm not sure if this is a difference in behaviour between Python 2 and 3 (it seems like the file is not being opened in binary mode by default?). I tried opening the file with the lines below instead on the command line, which works, but I want to make sure that this is a bug before trying to fix it:

file = open(path_model_init + '/evolution.pkl', 'rb')
evolution_restored = pickle.load(file, encoding='bytes')

@oumayb which version of ADS were you using when doing your finetuning work, do you know? Were you using the Python 2 or Python 3 version?

@mathieuboudreau
Member

Hmm, loading with the lines above doesn't seem to work correctly: the "steps" field, which is accessed on the very next line, isn't loaded. There must be something else you were doing that I'm missing, @oumayb?
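The missing "steps" field can be reproduced without the real model. The sketch below uses a hand-written protocol-0 pickle meant to mimic what Python 2 writes for `{'steps': [1, 2]}` (an illustration, not bytes taken from evolution.pkl). With `encoding='bytes'`, Python 2 strings come back as `bytes`, so the key is `b'steps'` and `evolution["steps"]` raises KeyError; with `'latin1'` they come back as `str`. The second pickle holds non-ASCII bytes, a stand-in for raw numpy array buffers, which is where the default ASCII decoding fails.

```python
import pickle

# Hand-written protocol-0 pickle mimicking Python 2's output for
# {'steps': [1, 2]}; 'S' is the opcode for a Python 2 byte string.
PY2_DICT = b"(dp0\nS'steps'\np1\n(lp2\nI1\naI2\nas."
# A Python 2 byte string holding non-ASCII bytes (stand-in for array data).
PY2_BINARY = b"S'\\xff\\x00'\n."

as_bytes = pickle.loads(PY2_DICT, encoding="bytes")
as_latin1 = pickle.loads(PY2_DICT, encoding="latin1")
print(as_bytes)   # {b'steps': [1, 2]}: as_bytes["steps"] would raise KeyError
print(as_latin1)  # {'steps': [1, 2]}

try:
    pickle.loads(PY2_BINARY)  # default encoding is "ASCII"
except UnicodeDecodeError as exc:
    print("default encoding fails:", exc.reason)
print(repr(pickle.loads(PY2_BINARY, encoding="latin1")))
```

`latin1` maps every byte value to one character, so no data is lost; NumPy's documentation recommends it for Python 2 pickles containing arrays.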

@mathieuboudreau
Member

Ok, finally got it working. You must have been using the Python 2 version of ADS when originally training your finetuned models.

The old model we're finetuning from was saved using Python 2, and pickle handles pickling/unpickling (in particular of numpy arrays) differently in Python 2 and Python 3. I think I dealt with this once before while writing the unit tests, but the lines mentioned above are among the few that don't have test coverage yet, so this bug has existed since the Python 3 upgrade.

I had to change these lines:

https://github.com/neuropoly/axondeepseg/blob/432af8265acc02401d8edd1e93e6e45b19471dbe/AxonDeepSeg/train_network.py#L334-L335

to this:

with open(path_model_init + '/evolution.pkl', 'rb') as file:
    evolution_restored = pickle.load(file, encoding='latin1')

and then the training started successfully.

I'll have to make a similar change, write one or more tests, and commit it. The lines above might be enough; I just need to make sure they also work correctly with an evolution file pickled in Python 3.
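A quick way to check the Python 3 case: per the `pickle` documentation, `encoding` only controls how 8-bit strings pickled by Python 2 are decoded, while Python 3 `str` objects are stored as unicode and ignore it. The sketch below wraps the proposed lines in a hypothetical `load_evolution` helper (not an existing ADS function) and round-trips a Python 3 pickle with illustrative data.

```python
import os
import pickle
import tempfile


def load_evolution(folder):
    """Load folder/evolution.pkl whether Python 2 or Python 3 pickled it.

    Binary mode avoids the text-mode TypeError; encoding='latin1' only
    affects Python 2 byte strings (e.g. numpy array buffers) and does not
    change Python 3 str objects.
    """
    with open(os.path.join(folder, "evolution.pkl"), "rb") as f:
        return pickle.load(f, encoding="latin1")


# Round trip with a file pickled by Python 3 (illustrative data).
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "evolution.pkl"), "wb") as f:
        pickle.dump({"steps": [1, 2, 3], "loss": [0.9, 0.5, 0.3]}, f)
    evolution = load_evolution(folder)
    print(evolution["steps"][-1])  # last_epoch, as computed in train_network.py
```

A unit test could write both a Python 3 pickle like this and a stored Python 2 fixture to a temporary folder and assert that `load_evolution` returns the same dict for each.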

@jcohenadad
Member

Thank you so much for taking care of that @mathieuboudreau!!! 👍

@mathieuboudreau
Member

I think this PR has gotten too stale to merge. In particular, I think @vs74's new Keras implementation means we'll need to use a different way to finetune. While we should still keep this PR in consideration moving forward, I'm going to close it for now.

@vasudev-sharma vasudev-sharma deleted the dev_mikula branch November 4, 2020 15:04