During eval, getting "ValueError: model_fn should return an EstimatorSpec". During training, OK #9

aurotripathy · 2019-07-05T16:49:20Z

Thank you for BERT-multi-gpu.

I'm running run_pretraining_gpu_v2.py on the provided dataset sample_text.txt.
The only change I made was to the n_gpus flag (in my case, 3).

Training ran fine, but I also set --do_eval=True (command below), and evaluation fails.

CUDA_VISIBLE_DEVICES=0,1,2 python run_pretraining_gpu_v2.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=./bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5

The error below occurs on TF 1.14.0:

I0705 09:37:15.292903 140495488808704 estimator.py:1147] Done calling model_fn.
I0705 09:37:15.293055 140495488808704 coordinator.py:219] Error reported to Coordinator: model_fn should return an EstimatorSpec.
Traceback (most recent call last):
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 911, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1150, in _call_model_fn
    raise ValueError('model_fn should return an EstimatorSpec.')
ValueError: model_fn should return an EstimatorSpec.
Traceback (most recent call last):
  File "run_pretraining_gpu_v2.py", line 501, in <module>
    tf.app.run()
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_pretraining_gpu_v2.py", line 487, in main
    input_fn=eval_input_fn, steps=FLAGS.max_eval_steps)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 477, in evaluate
    name=name)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 517, in _actual_eval
    return _evaluate()
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 501, in _evaluate
    self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1498, in _evaluate_build_graph
    self._call_model_fn_eval_distributed(input_fn, self.config))
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1586, in _call_model_fn_eval_distributed
    args=(features, labels, ModeKeys.EVAL, config)))
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1555, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 693, in _call_for_each_replica
    fn, args, kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 195, in _call_for_each_replica
    coord.join(threads)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 911, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1150, in _call_model_fn
    raise ValueError('model_fn should return an EstimatorSpec.')
ValueError: model_fn should return an EstimatorSpec.


guotong1988 · 2019-07-08T03:24:35Z

Sorry, I did not try --do_eval=True.
During training, the framework will print the dev eval result and output the model file you want, without --do_eval=True.

aurotripathy · 2019-07-08T05:45:37Z

Thank you for the hint, I'll take a look.

hrdxwandg · 2021-07-21T08:35:50Z

I get the same error, and I cannot find the eval result. Can you help?

hrdxwandg · 2021-07-21T08:46:31Z

def model_fn(features, labels, mode, params): # pylint: disable=unused-argument

With --do_eval=True, this function still returns a TPUEstimatorSpec in eval mode, which causes the error.

guotong1988 · 2021-07-21T09:19:36Z

Try another tensorflow API.

BiEchi · 2022-02-07T17:56:43Z

> Sorry, I did not try --do_eval=True. During training, the framework will print the dev eval result and output the model file you want, without --do_eval=True.

Take a look at the error! Modifying model_fn in model_fn_builder suffices to deal with this error.
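One way the model_fn change suggested above might look is sketched below. It is not the repository's actual fix. It assumes the eval branch of model_fn returns a tf.contrib.tpu.TPUEstimatorSpec, as reported earlier in the thread, and relies on that class's as_estimator_spec() method to produce a plain EstimatorSpec. The helper names ensure_estimator_spec and wrap_model_fn are hypothetical, and whether the converted spec behaves correctly under MirroredStrategy has not been verified here.

```python
import functools


def ensure_estimator_spec(spec):
    """Return a plain EstimatorSpec for whatever model_fn produced.

    Assumption: a TPUEstimatorSpec exposes as_estimator_spec(); any object
    without that method is passed through unchanged.
    """
    convert = getattr(spec, "as_estimator_spec", None)
    return convert() if callable(convert) else spec


def wrap_model_fn(model_fn):
    """Wrap a model_fn so every mode (train, eval, predict) is converted."""

    @functools.wraps(model_fn)
    def wrapped(features, labels, mode, params):
        # Keep the explicit (features, labels, mode, params) signature,
        # matching the model_fn definition quoted in this thread.
        return ensure_estimator_spec(model_fn(features, labels, mode, params))

    return wrapped
```

A possible usage, assuming the script builds its function with model_fn_builder(...), would be to pass wrap_model_fn(model_fn) to the Estimator instead of model_fn. The alternative is to edit the eval branch directly so that it constructs a tf.estimator.EstimatorSpec with eval_metric_ops.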

guotong1988 closed this as completed Jul 10, 2019

guotong1988 mentioned this issue Sep 14, 2022

model_fn should return an EstimatorSpec. #34

Closed
