
Sync "https://github.com/jakeret/tf_unet/pull/202" with master and resolve conflicts #276

Open
wants to merge 4 commits into master

Conversation

@ashahba commented Jul 9, 2019

When using the image gcr.io/deeplearning-platform-release/tf-cpu.1-14 and following these steps: https://github.com/IntelAI/models/blob/v1.4.0/benchmarks/image_segmentation/tensorflow/unet/README.md
I get the following error:

2019-07-09 17:03:43.718942: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2019-07-09 17:03:43.771372: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0709 17:03:43.853285 139741926975296 deprecation_wrapper.py:119] From /workspace/models/tf_unet/unet.py:301: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0709 17:03:43.874177 139741926975296 deprecation.py:323] From /root/miniconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/workspace/benchmarks/image_segmentation/tensorflow/unet/inference/fp32/unet_infer.py", line 78, in <module>
    prediction = net.predict(arg_parser.parse_args().ckpt_path, x_test)
  File "/workspace/models/tf_unet/unet.py", line 274, in predict
    self.restore(sess, model_path)
  File "/workspace/models/tf_unet/unet.py", line 302, in restore
    saver.restore(sess, model_path)
  File "/root/miniconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1278, in restore
    compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt
Ran inference with batch size 1
Log location outside container: /jenkins/workspace/Intel-Models-Benchmark-fp32-Trigger/intel-models/benchmarks/common/tensorflow/logs/benchmark_unet_inference_fp32_20190709_170331.log
nrvalgo_jenkinsadm@aipg-fm-skx-48:/jenkins/workspace/Intel-Models-Benchmark-fp32-Trigger/intel-models/benchmarks$ ls $CHECKPOINT_DIR/
checkpoint  events.out.tfevents.1548972182.4e4b03cdde24  model.ckpt.data-00000-of-00001  model.ckpt.index  model.ckpt.meta

@ashahba ashahba changed the title Ashahba/unet hotfix Fix for The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt Jul 9, 2019
@ashahba ashahba changed the title Fix for The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt Fix for "passed save_path is not a valid checkpoint: /checkpoints/model.cpkt" Jul 9, 2019
@ashahba ashahba changed the title Fix for "passed save_path is not a valid checkpoint: /checkpoints/model.cpkt" Sync "https://github.com/jakeret/tf_unet/pull/202" with master and resolve conflicts Jul 9, 2019
@ashahba (Author) commented Jul 9, 2019

@jakeret this is basically just bringing #202 up to date with master.
I also realized that the issue with https://github.com/IntelAI/models/blob/v1.4.0/benchmarks/image_segmentation/tensorflow/unet/README.md was that I was using checkpoint_name=model.cpkt, not realizing it should now be checkpoint_name=model.ckpt, and I fixed our docs.

Thanks.
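
For reference, here is a minimal TF 1.x sketch, not part of tf_unet or the IntelAI benchmark scripts and using placeholder paths, that sanity-checks a checkpoint prefix before calling Saver.restore(), which is where the "not a valid checkpoint" ValueError above originates:

import tensorflow as tf

checkpoint_dir = "/checkpoints"               # assumed CHECKPOINT_DIR
model_path = checkpoint_dir + "/model.ckpt"   # note: .ckpt, not .cpkt

# latest_checkpoint reads the 'checkpoint' file and returns the newest
# prefix (e.g. /checkpoints/model.ckpt), or None if nothing is found.
print("latest checkpoint prefix:", tf.train.latest_checkpoint(checkpoint_dir))

# checkpoint_exists verifies that the .index/.data files for the prefix
# are present; a misspelled prefix such as 'model.cpkt' fails this check,
# and Saver.restore() then raises the ValueError shown in the log above.
print("prefix valid:", tf.train.checkpoint_exists(model_path))

If both checks pass, saver.restore(sess, model_path) should pick up the model.ckpt.index and model.ckpt.data files shown in the ls $CHECKPOINT_DIR listing above.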

@ashahba (Author) commented Jul 9, 2019

@mpjlu, would you please also review and provide feedback if needed?

Thanks.

@jakeret (Owner) commented Jul 11, 2019

Hi @ashahba, thank you for your contribution.
I wasn't aware that this repo is being used in IntelAI benchmarks, nice.

I hadn't merged #202 for two reasons:

  • the thread handling should not be part of the PR, as it has nothing to do with the dropout

  • In my understanding, if we set keep_prob != 1 (e.g. 0.5), it can't be changed for validation or prediction (where we don't want any regularization), since it is a fixed part of the graph. Or am I missing something? (See the placeholder-based sketch below.)
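
To illustrate the keep_prob concern above, here is a minimal TF 1.x sketch, not the tf_unet or #202 implementation and with made-up layer sizes, of the usual placeholder-based pattern where the keep probability is fed at run time rather than baked into the graph as a constant:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 128])
# Defaults to 1.0 (no dropout), so validation/prediction need no extra feed.
keep_prob = tf.placeholder_with_default(1.0, shape=[])

hidden = tf.layers.dense(x, 64, activation=tf.nn.relu)
dropped = tf.nn.dropout(hidden, keep_prob=keep_prob)
logits = tf.layers.dense(dropped, 10)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(4, 128).astype(np.float32)
    # Training: enable dropout by feeding keep_prob < 1.
    sess.run(logits, feed_dict={x: batch, keep_prob: 0.5})
    # Prediction/validation: omit keep_prob; the default 1.0 disables dropout.
    sess.run(logits, feed_dict={x: batch})

With this pattern, keep_prob stays part of the graph as a feedable input, so training can use e.g. 0.5 while prediction and validation run with 1.0 (no regularization).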

@ashahba (Author) commented Jul 24, 2019

Thanks @jakeret.
That sounds great. In the meantime I'm unblocked, but I'll keep my eyes open for any activity on #202.
