
Program blocked #42

Open
Rejuy opened this issue Oct 9, 2022 · 3 comments

Comments


Rejuy commented Oct 9, 2022

Hi there! I ran into a problem while running the project.
I followed the README.md, and when execution reached the line below, the program blocked and never returned. How could this happen? I have no idea. Could you give me some advice? Thanks a lot!

# learner.py
loss_info = self._generic_learner.run(self._steps_per_iter,
                                      self._train_iterator)
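(Editor's note, not from the original report.) A general way to see where a blocked Python process is stuck is the standard-library faulthandler module, which can dump every thread's stack trace after a timeout. A minimal sketch:

```python
import faulthandler
import sys

# Dump all thread stack traces to stderr if the process is still
# running after 60 seconds, and repeat every 60 seconds thereafter.
faulthandler.dump_traceback_later(60, repeat=True, file=sys.stderr)

# ... run the code that blocks, e.g. the learner.run(...) call above ...

# Cancel the watchdog once the suspect code has finished.
faulthandler.cancel_dump_traceback_later()
```

The dumped traces show exactly which call every thread is parked in, which helps distinguish a deadlock from, say, an iterator that never yields.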
Rejuy commented Oct 10, 2022

I stepped into the function and added some logging. Surprisingly, I found that in the learner.py module of tf_agents, the call that never returns is actually this:

  def run(self, iterations=1, iterator=None, parallel_iterations=10):
    """ ...
    """
    ...
    with self.train_summary_writer.as_default(), \
         common.soft_device_placement(), \
         tf.compat.v2.summary.record_if(_summary_record_if), \
         self.strategy.scope():
      iterator = iterator or self._experience_iterator
      loss_info = self._train(tf.constant(iterations),
                              iterator,
                              parallel_iterations)
      logging.info("return back to run")
      train_step_val = self.train_step.numpy()
      for trigger in self.triggers:
        trigger(train_step_val)

      return loss_info

  @common.function(autograph=True)
  def _train(self, iterations, iterator, parallel_iterations):
    # ...
    logging.info("_train start")
    loss_info = self.single_train_step(iterator)
    for _ in tf.range(iterations - 1):
      tf.autograph.experimental.set_loop_options(
          parallel_iterations=parallel_iterations)
      loss_info = self.single_train_step(iterator)

    def _reduce_loss(loss):
        # ...

    # ...
    reduced_loss_info = tf.nest.map_structure(_reduce_loss, loss_info)
    logging.info("_train end")
    return reduced_loss_info
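(Editor's note, not from the thread.) Since _train is wrapped in @common.function, which compiles it into a TensorFlow graph, one way to narrow down a hang like this is to force TensorFlow to run such functions eagerly and see whether the block disappears:

```python
import tensorflow as tf

# Disable tf.function graph compilation globally, so decorated
# functions execute eagerly, step by step, like plain Python.
tf.config.run_functions_eagerly(True)

# Confirm the setting took effect.
assert tf.config.functions_run_eagerly()
```

If the program no longer hangs when running eagerly, the problem is in graph execution (for example, a dataset iterator that never yields an element inside the graph) rather than in the Python-level logic.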

All the logs in _train can be found, indicating that _train has finished. However, control never returns to the loss_info assignment:

      loss_info = self._train(tf.constant(iterations),
                              iterator,
                              parallel_iterations)
      logging.info("return back to run")

This means the log above never gets printed. It's very weird. How could this happen?
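(Editor's note, not from the thread.) One caveat worth knowing here: Python-level calls such as logging.info inside a function decorated with tf.function (which common.function wraps) execute only while the function is being *traced* into a graph, not on every call. So seeing those logs proves that tracing completed, but not that the compiled graph itself ran to completion. A minimal sketch of this behavior:

```python
import tensorflow as tf

trace_calls = []

@tf.function
def step(x):
    # Python side effect: runs only at trace time, not per call.
    trace_calls.append("traced")
    # Graph op: runs on every invocation of the compiled function.
    tf.print("executing step")
    return x + 1

step(tf.constant(1))
step(tf.constant(2))

# The function was called twice but traced only once.
assert len(trace_calls) == 1
```

To log from inside the running graph, use tf.print instead of the Python logging module.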

@ayamayaa

I've come across the same issue. Did you find a solution, by any chance? Thanks!


JIEEEN commented Nov 16, 2023

I'm getting the same issue. How could this happen?
