[rllib] restore from checkpoint using train.py #3204

Closed

rnunziata opened this issue Nov 2, 2018 · 2 comments

@rnunziata

I am trying to run train.py against a prior checkpoint, but it does not seem to pick the checkpoint up, nor does it complain about the restore parameter that was passed. See the mean reward below. Does train.py not use this parameter? I could not find any examples showing its use, even though it is in the argument list.

pong-impala:
    env: Pong-ram-v4 
    run: IMPALA
    checkpoint_freq: 20    
    config:
        sample_batch_size: 50
        train_batch_size: 500
        num_workers: 7   

========================================================================================= 
>>   python python/ray/rllib/train.py  --config-file=my.yaml 
========================================================================================= 

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 8/8 CPUs, 1/1 GPUs
Result logdir: /home/rjn/ray_results/pong-impala
RUNNING trials:
 - IMPALA_Pong-ram-v4_0:	RUNNING [pid=18500], 3487 s, 343 iter, 11877000 ts, -16.8 rew

Result for IMPALA_Pong-ram-v4_0:
  date: 2018-11-02_12-59-44
  done: false
  episode_len_mean: 2317.63
  episode_reward_max: -9.0
  episode_reward_mean: -16.81
  episode_reward_min: -21.0
  episodes_this_iter: 15
  episodes_total: 7434
  experiment_id: 382b7dff22c94d4f9168bc03ba30e821
  hostname: rjn-Oryx-Pro
  info:
    learner:
      cur_lr: 0.0005000000237487257
      entropy: 824.3236083984375
      grad_gnorm: 40.000003814697266
      policy_loss: 16.117586135864258
      var_gnorm: 30.87358856201172
      vf_explained_var: 0.34759581089019775
      vf_loss: 35.219303131103516
    learner_queue:
      size_count: 23822
      size_mean: 0.0
      size_quantiles:
      - 0.0
      - 0.0
      - 0.0
      - 0.0
      - 0.0
      size_std: 0.0
    num_steps_replayed: 0
    num_steps_sampled: 11911200
    num_steps_trained: 11911000
    num_weight_syncs: 238224
    sample_throughput: 3360.381
    timing_breakdown:
      enqueue_time_ms: 0.026
      learner_dequeue_time_ms: 125.537
      learner_grad_time_ms: 16.766
      learner_load_time_ms: .nan
      learner_load_wait_time_ms: .nan
      put_weights_time_ms: 7.462
      sample_processing_time_ms: 17.836
      sample_time_ms: 17.855
      train_time_ms: 17.855
    train_throughput: 2800.317
  iterations_since_restore: 344
  node_ip: 192.168.1.100
  num_metric_batches_dropped: 0
  pid: 18500
  policy_reward_mean: {}
  time_since_restore: 3497.607877254486
  time_this_iter_s: 10.153211116790771
  time_total_s: 3497.607877254486
  timestamp: 1541177984
  timesteps_since_restore: 11911200
  timesteps_this_iter: 34200
  timesteps_total: 11911200
  training_iteration: 344
 
 
========================================================================================= 
  >>    python python/ray/rllib/train.py  --restore=~/ray_results/pong-impala/IMPALA_Pong-ram-v4_0_2018-11-02_12-01-21_zpUuC/checkpoint_340y58PU_   --config-file=my.yaml  
==========================================================================================
  
  


== Status ==
Using FIFO scheduling algorithm.



Result for IMPALA_Pong-ram-v4_0:
  date: 2018-11-02_13-11-54
  done: false
  episode_len_mean: 1176.6
  episode_reward_max: -19.0
  episode_reward_mean: -20.6
  episode_reward_min: -21.0
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: 3a9271cd0da94a3cac878fdd551b1a63
  hostname: rjn-Oryx-Pro
  info:
    learner:
      cur_lr: 0.0005000000237487257
      entropy: 836.6295166015625
      grad_gnorm: 40.000003814697266
      policy_loss: 181.09475708007812
      var_gnorm: 22.668785095214844
      vf_explained_var: 0.2981424331665039
      vf_loss: 42.735008239746094
    learner_queue:
      size_count: 52
      size_mean: 0.0
      size_quantiles:
      - 0.0
      - 0.0
      - 0.0
      - 0.0
      - 0.0
      size_std: 0.0
    num_steps_replayed: 0
    num_steps_sampled: 26100
    num_steps_trained: 26000
    num_weight_syncs: 522
    sample_throughput: 3736.855
    timing_breakdown:
      enqueue_time_ms: 0.022
      learner_dequeue_time_ms: 109.484
      learner_grad_time_ms: 14.186
      learner_load_time_ms: .nan
      learner_load_wait_time_ms: .nan
      put_weights_time_ms: 6.008
      sample_processing_time_ms: 17.375
      sample_time_ms: 17.394
      train_time_ms: 17.394
    train_throughput: 5749.008
  iterations_since_restore: 1
  node_ip: 192.168.1.100
  num_metric_batches_dropped: 0
  pid: 21621
  policy_reward_mean: {}
  time_since_restore: 10.150846004486084
  time_this_iter_s: 10.150846004486084
  time_total_s: 10.150846004486084
  timestamp: 1541178714
  timesteps_since_restore: 26100
  timesteps_this_iter: 26100
  timesteps_total: 26100
  training_iteration: 1

@ericl ericl added this to Needs triage in RLlib via automation Nov 2, 2018
richardliaw pushed a commit that referenced this issue Nov 5, 2018
## What do these changes do?

Clean up the checkpointing to handle the new checkpoint dirs. Add a test for rollout.py

## Related issue number

#3206
#3204
@richardliaw
Contributor

richardliaw commented Nov 6, 2018

I think the issue here is that if you specify a config file, command-line arguments such as --restore are not applied. You'll have to include restore as a parameter in your config file.

This is something discussed in #2986, but I guess we never put in a warning...

Try it out and let me know if you run into anything.
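
For reference, a minimal sketch of what that might look like, assuming restore is accepted as a top-level key of the experiment spec in the YAML file (the checkpoint path below is simply the one from the command above):

pong-impala:
    env: Pong-ram-v4
    run: IMPALA
    checkpoint_freq: 20
    # assumption: restore sits alongside env/run in the experiment spec, not under config
    restore: ~/ray_results/pong-impala/IMPALA_Pong-ram-v4_0_2018-11-02_12-01-21_zpUuC/checkpoint_340y58PU_
    config:
        sample_batch_size: 50
        train_batch_size: 500
        num_workers: 7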

@richardliaw richardliaw changed the title restore from checkpoint using train.py [rllib] restore from checkpoint using train.py Nov 6, 2018
@rnunziata
Author

rnunziata commented Nov 6, 2018

Sorry, it says that in train.py; I missed it...

RLlib automation moved this from Needs triage to Done Nov 6, 2018