
Program blocked #42

Open
Rejuy opened this issue Oct 9, 2022 · 3 comments

Comments


Rejuy commented Oct 9, 2022

Hi there! I ran into a problem while running the project.
I followed the README.md, and when execution reached the line below, the program blocked and never returned. How could this happen? I have no idea. Could you give me some advice? Thanks a lot!

# learner.py
loss_info = self._generic_learner.run(self._steps_per_iter,
                                      self._train_iterator)
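(Editor's note, not from the original report.) A general way to see where a blocked Python process is stuck is the standard-library faulthandler module, which can dump every thread's stack trace after a timeout. A minimal sketch:

```python
import faulthandler
import sys

# Dump all thread stack traces to stderr if the process is still
# running after 60 seconds, and repeat every 60 seconds thereafter.
faulthandler.dump_traceback_later(60, repeat=True, file=sys.stderr)

# ... run the code that blocks, e.g. the learner.run(...) call above ...

# Cancel the watchdog once the suspect code has finished.
faulthandler.cancel_dump_traceback_later()
```

The dumped traces show exactly which call every thread is parked in, which helps distinguish a deadlock from, say, an iterator that never yields.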
Rejuy commented Oct 10, 2022

I stepped into the function and added some logging. Surprisingly, I found that in the learner.py module of tf_agents, the call that never returns is actually this:

  def run(self, iterations=1, iterator=None, parallel_iterations=10):
    """ ...
    """
    ...
    with self.train_summary_writer.as_default(), \
         common.soft_device_placement(), \
         tf.compat.v2.summary.record_if(_summary_record_if), \
         self.strategy.scope():
      iterator = iterator or self._experience_iterator
      loss_info = self._train(tf.constant(iterations),
                              iterator,
                              parallel_iterations)
      logging.info("return back to run")
      train_step_val = self.train_step.numpy()
      for trigger in self.triggers:
        trigger(train_step_val)

      return loss_info

  @common.function(autograph=True)
  def _train(self, iterations, iterator, parallel_iterations):
    # ...
    logging.info("_train start")
    loss_info = self.single_train_step(iterator)
    for _ in tf.range(iterations - 1):
      tf.autograph.experimental.set_loop_options(
          parallel_iterations=parallel_iterations)
      loss_info = self.single_train_step(iterator)

    def _reduce_loss(loss):
        # ...

    # ...
    reduced_loss_info = tf.nest.map_structure(_reduce_loss, loss_info)
    logging.info("_train end")
    return reduced_loss_info
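(Editor's note, not from the thread.) Since _train is wrapped in @common.function, which compiles it into a TensorFlow graph, one way to narrow down a hang like this is to force TensorFlow to run such functions eagerly and see whether the block disappears:

```python
import tensorflow as tf

# Disable tf.function graph compilation globally, so decorated
# functions execute eagerly, step by step, like plain Python.
tf.config.run_functions_eagerly(True)

# Confirm the setting took effect.
assert tf.config.functions_run_eagerly()
```

If the program no longer hangs when running eagerly, the problem is in graph execution (for example, a dataset iterator that never yields an element inside the graph) rather than in the Python-level logic.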

All the logs in _train can be found, indicating that _train has finished. However, control never returns to the loss_info assignment:

      loss_info = self._train(tf.constant(iterations),
                              iterator,
                              parallel_iterations)
      logging.info("return back to run")

This means the log above never gets printed. It's very weird. How could this happen?
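(Editor's note, not from the thread.) One caveat worth knowing here: Python-level calls such as logging.info inside a function decorated with tf.function (which common.function wraps) execute only while the function is being *traced* into a graph, not on every call. So seeing those logs proves that tracing completed, but not that the compiled graph itself ran to completion. A minimal sketch of this behavior:

```python
import tensorflow as tf

trace_calls = []

@tf.function
def step(x):
    # Python side effect: runs only at trace time, not per call.
    trace_calls.append("traced")
    # Graph op: runs on every invocation of the compiled function.
    tf.print("executing step")
    return x + 1

step(tf.constant(1))
step(tf.constant(2))

# The function was called twice but traced only once.
assert len(trace_calls) == 1
```

To log from inside the running graph, use tf.print instead of the Python logging module.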

@ayamayaa

I've come across the same issue. Did you find a solution, by any chance? Thanks!


JIEEEN commented Nov 16, 2023

I'm getting the same issue. How could this happen?
