
The agent hit the global step limit, how do I restore from checkpoints and resume training #62

Closed
syhdog opened this issue Jan 31, 2017 · 2 comments

Comments

@syhdog

syhdog commented Jan 31, 2017

I am working on an idea that requires a long training run, about 10 days. I forgot to modify the global step limit, so the agent stopped at the 100M step. I want to restore the model and continue training. I have been looking through the code and wondered what I should do.

Sincere thanks for opening up this project; it is very respectable work and has helped a lot with my research. Truly, we find it really enjoyable to develop agents. Thank you a lot.

@KaixiangLin

In worker.py, add these two lines to specify the variables you want to restore:

    # Collect the shared ("global"-scoped) variables and build a saver for them.
    variables_to_restore = [v for v in tf.all_variables() if v.name.startswith("global")]
    pre_train_saver = FastSaver(variables_to_restore)

Then add a restore call inside init_fn:

    def init_fn(ses):
        logger.info("Initializing all parameters.")
        ses.run(init_all_op)
        pre_train_saver.restore(ses,
                                "THE_PATH_TO_YOUR_MODEL/model.ckpt-4986751")

@syhdog
Author

syhdog commented Feb 6, 2017

Thank you very much, it has been a great help.
