
During eval, getting "ValueError: model_fn should return an EstimatorSpec"; training works fine #9

Closed
aurotripathy opened this issue Jul 5, 2019 · 6 comments


@aurotripathy

aurotripathy commented Jul 5, 2019

Thank you for BERT-multi-gpu.

I'm running run_pretraining_gpu_v2.py on the provided sample dataset, sample_text.txt.
The only change I made was to the n_gpus flag (3 in my case).

Training works fine, but I also set --do_eval=True (see the command below).

CUDA_VISIBLE_DEVICES=0,1,2 python run_pretraining_gpu_v2.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=./bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5

On TF 1.14.0, evaluation then fails with the following error:


 I0705 09:37:15.292903 140495488808704 estimator.py:1147] Done calling model_fn.
I0705 09:37:15.293055 140495488808704 coordinator.py:219] Error reported to Coordinator: model_fn should return an EstimatorSpec.
Traceback (most recent call last):
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 911, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1150, in _call_model_fn
    raise ValueError('model_fn should return an EstimatorSpec.')
ValueError: model_fn should return an EstimatorSpec.
Traceback (most recent call last):
  File "run_pretraining_gpu_v2.py", line 501, in <module>
    tf.app.run()
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_pretraining_gpu_v2.py", line 487, in main
    input_fn=eval_input_fn, steps=FLAGS.max_eval_steps)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 477, in evaluate
    name=name)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 517, in _actual_eval
    return _evaluate()
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 501, in _evaluate
    self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1498, in _evaluate_build_graph
    self._call_model_fn_eval_distributed(input_fn, self.config))
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1586, in _call_model_fn_eval_distributed
    args=(features, labels, ModeKeys.EVAL, config)))
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1555, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 693, in _call_for_each_replica
    fn, args, kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 195, in _call_for_each_replica
    coord.join(threads)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 911, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/home/auro/anaconda3/envs/tf-py2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1150, in _call_model_fn
    raise ValueError('model_fn should return an EstimatorSpec.')
ValueError: model_fn should return an EstimatorSpec.

@guotong1988
Owner

guotong1988 commented Jul 8, 2019

Sorry, I have not tried --do_eval=True.
During training, the framework already prints the dev eval results and writes out the model files you want, even without --do_eval=True.

@aurotripathy
Author

Thank you for the hint, I'll take a look.

@hrdxwandg

I get the same error, and I can't find the eval results either. Can you help?

@hrdxwandg

def model_fn(features, labels, mode, params): # pylint: disable=unused-argument

With --do_eval=True, this model_fn still returns a TPUEstimatorSpec for evaluation, which causes the error.
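To see why that breaks only evaluation, here is a plain-Python sketch of the type check that raises the error in the traceback above (estimator.py, `_call_model_fn`). The two spec classes below are hypothetical stand-ins built with `collections.namedtuple`, not the real TensorFlow classes; the assumption is that in tensorflow_estimator, TPUEstimatorSpec is its own namedtuple and not a subclass of EstimatorSpec, so an `isinstance` check against EstimatorSpec rejects it. `buggy_model_fn` is likewise hypothetical and assumes the TRAIN branch was already ported to EstimatorSpec while the EVAL branch was not.

```python
import collections

# Hypothetical stand-ins for tf.estimator.EstimatorSpec and TPUEstimatorSpec.
# They are unrelated types, so isinstance(tpu_spec, EstimatorSpec) is False.
EstimatorSpec = collections.namedtuple(
    "EstimatorSpec", ["mode", "loss", "eval_metric_ops"])
TPUEstimatorSpec = collections.namedtuple(
    "TPUEstimatorSpec", ["mode", "loss", "eval_metrics"])


def call_model_fn(model_fn, mode):
    """Simplified version of the check Estimator runs on model_fn's result."""
    result = model_fn(mode)
    if not isinstance(result, EstimatorSpec):
        raise ValueError("model_fn should return an EstimatorSpec.")
    return result


def buggy_model_fn(mode):
    # Assumed situation: TRAIN returns an EstimatorSpec, EVAL does not.
    if mode == "train":
        return EstimatorSpec(mode, loss=0.5, eval_metric_ops=None)
    return TPUEstimatorSpec(mode, loss=0.5, eval_metrics=None)


if __name__ == "__main__":
    print(call_model_fn(buggy_model_fn, "train").mode)  # train
    try:
        call_model_fn(buggy_model_fn, "eval")
    except ValueError as err:
        print(err)  # model_fn should return an EstimatorSpec.
```

The same check runs for every mode, which is consistent with training succeeding while evaluation fails: only the branch that still builds a TPUEstimatorSpec trips it.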

@guotong1988
Owner

Try another TensorFlow API.

@BiEchi

BiEchi commented Feb 7, 2022

> Sorry, I did not try --do_eval=True. During training, the framework will print the dev eval result and output the model file you want, without --do_eval=True.

Read the error message carefully: modifying model_fn inside model_fn_builder is enough to fix it.
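One way that modification could look, as a minimal sketch assuming the EVAL branch still follows the TPU pattern of BERT's original run_pretraining.py: a TPUEstimatorSpec takes `eval_metrics=(metric_fn, tensors)` and lets the TPU framework call `metric_fn`, whereas a plain `tf.estimator.EstimatorSpec` expects the already-computed dict in `eval_metric_ops`. The helper name `eval_metrics_to_eval_metric_ops` and the `metric_tensors` variable are hypothetical; `metric_fn` and `total_loss` are assumed to be defined in model_fn as before. The helper itself is pure Python; the TF 1.x usage is shown as comments.

```python
def eval_metrics_to_eval_metric_ops(eval_metrics):
    """Turn a TPU-style (metric_fn, tensors) pair into an eval_metric_ops dict.

    `tensors` may be a list (passed positionally) or a dict (passed as
    keyword arguments), mirroring how TPUEstimatorSpec's eval_metrics work.
    """
    metric_fn, tensors = eval_metrics
    if isinstance(tensors, dict):
        return metric_fn(**tensors)
    return metric_fn(*tensors)


# Hypothetical EVAL branch of model_fn inside model_fn_builder (TF 1.x).
# Instead of returning tf.contrib.tpu.TPUEstimatorSpec(..., eval_metrics=...),
# return a plain EstimatorSpec, which Estimator's type check accepts:
#
#   elif mode == tf.estimator.ModeKeys.EVAL:
#       eval_metrics = (metric_fn, metric_tensors)  # same pair as before
#       output_spec = tf.estimator.EstimatorSpec(
#           mode=mode,
#           loss=total_loss,
#           eval_metric_ops=eval_metrics_to_eval_metric_ops(eval_metrics))


if __name__ == "__main__":
    # Plain-Python demo: a dummy metric_fn stands in for the TF one, which
    # would return {name: (value_tensor, update_op)} from tf.metrics calls.
    def dummy_metric_fn(loss, weights):
        return {"eval_loss": (loss, weights)}

    print(eval_metrics_to_eval_metric_ops((dummy_metric_fn, [1.0, 2.0])))
    # {'eval_loss': (1.0, 2.0)}
```

Calling `metric_fn` directly keeps the metric definitions unchanged; only the spec type and the keyword it is passed under differ between the TPU and plain Estimator paths.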
