
Continuing training on a previously trained model #599

Closed
venkatesh-chinni opened this issue Dec 4, 2019 · 13 comments
Labels
question (Further information is requested) · RTFM (Answer is the documentation)

Comments

@venkatesh-chinni

venkatesh-chinni commented Dec 4, 2019

Hi, I have trained an agent using PPO2 for 10000 steps and saved the model. I feel the model can be improved by letting it train for more episodes, so I want to load this model and continue training on it from the 10000 steps already completed. I have gone through the documentation but couldn't find anything related to this. Is such a feature currently available in Stable Baselines?

@araffin added the question and RTFM labels Dec 4, 2019
@araffin
Collaborator

araffin commented Dec 4, 2019

Please read the documentation more carefully ;)
You have an example of what you are looking for here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html and here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
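
The gist of the pattern there is roughly this (a minimal sketch; the environment name is illustrative):

import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])

model = PPO2('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=10000)
model.save("ppo2_agent")

del model  # e.g. a fresh Python session

# reload and keep training from the saved weights
model = PPO2.load("ppo2_agent")
model.set_env(env)  # re-attach an environment before training again
model.learn(total_timesteps=10000)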

@venkatesh-chinni
Author

I am training a model and saving it:

model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
             learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
             policy_kwargs=None, full_tensorboard_log=False)

model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)

model.save("agent_03_12_2019")

So now, if I wish to continue training on the same environment as before, I am supposed to load the model and continue training, say like this:

model = PPO2.load("agent_03_12_2019")
model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)

So will this continue the training?

@venkatesh-chinni
Author

venkatesh-chinni commented Dec 4, 2019

Okay, I need to use model.set_env(env) before model.learn. Thanks.

But one thing I found missing: when I trained my model the first time, I integrated it with TensorBoard. Now, when I do the continual training, the TensorBoard graphs are not updated for the new timesteps. Am I missing something?

@araffin
Collaborator

araffin commented Dec 4, 2019

the tensorboard graphs are not updated for the new timesteps

Again, please read the doc about the TensorBoard integration; we cover that issue there.

EDIT: you may need to set num_timesteps manually for the graphs to continue properly
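
i.e. something along these lines (a sketch; num_timesteps is a plain attribute on the model, not a learn() argument):

model = PPO2.load("agent_03_12_2019")
model.set_env(env)
# tell the model how many steps it has already trained, so the
# TensorBoard x-axis continues instead of restarting at zero
model.num_timesteps = 10000
model.learn(total_timesteps=10000, reset_num_timesteps=False)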

@matthew-hsr

matthew-hsr commented Dec 10, 2019

@araffin

Thanks a lot for your answer!

I must be missing something though. Reading the code, I don't see how set_env makes it learn continuously. It seems like set_env in base_class.py only changes self.env (and self.n_envs), and reading the code for Runner and learn (in particular ppo2.py), I can't find the code that makes it learn continuously instead of starting fresh.

Could you elaborate on which part of the code makes it learn continuously? Is it related to the variable _init_setup_model somewhere?

Thanks in advance!
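
EDIT: after more digging, I think I can answer my own question: the continuity comes from load() restoring the saved network parameters, not from set_env (which only re-attaches an environment to collect rollouts from). Simplified from base_class.py, load() does roughly:

# simplified sketch of BaseRLModel.load()
data, params = cls._load_from_file(load_path)  # hyperparameters + trained weights
model = cls(policy=data["policy"], env=None, _init_setup_model=False)
model.__dict__.update(data)    # restore hyperparameters
model.set_env(env)             # attach the environment
model.setup_model()            # rebuild the TF graph
model.load_parameters(params)  # restore the trained weights
# -> the next learn() starts its gradient updates from these weights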

@venkatesh-chinni
Author

venkatesh-chinni commented Dec 28, 2019

I have tried this:

model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
             learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
             policy_kwargs=None, full_tensorboard_log=False)

model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)

model.save("agent_03_12_2019")
model = PPO2.load("agent_03_12_2019")

n_cpu = 8
env = DummyVecEnv([lambda: AHUenv() for i in range(n_cpu)])
model.set_env(env)


model.learn(total_timesteps=2000000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)

model.save("agent_03_12_2019_continued_training_1")

With this, the continual training is happening, but the TensorBoard graphs are not being updated. I have manually changed reset_num_timesteps to False, but the TensorBoard graphs are still not updated.

@cevans3098

cevans3098 commented Dec 30, 2019

I'm working on some similar code, but I am having issues. I believe the vectorized environments are not being closed or reinitialized correctly:

  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 442, in __init__
    super().__init__(env=env, model=model, n_steps=n_steps)
  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\runners.py", line 19, in __init__
    self.obs[:] = env.reset()
  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 111, in reset
    remote.send(('reset', None))
  File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed

A sample of my code is shown below:

    env = SubprocVecEnv(env_list)

    model = PPO2(policy ='CustomPolicy', env = env, verbose = 1, 
                 vf_coef = VF_COEFF,
                 noptepochs = EPOCHS,
                 ent_coef = ENT_COEFF,
                 learning_rate = LEARNING_RATE,
                 tensorboard_log = tensorboard_log_location,
                 n_steps = NSTEPS,
                 nminibatches = MINIBATCHES)

    model.save(results_folder + run_name)

    # Training the model
    for i in range(number_training_steps):
        logname = run_name + '_' + str(i)
        model.learn(total_timesteps = int((total_timesteps/number_training_steps)),
                    reset_num_timesteps = False,
                    tb_log_name = logname)
        
        env.close()
        
        path = results_folder + logname
        model.save(path)


        if i < number_training_steps:
            env = SubprocVecEnv(env_list)
            model.load(load_path=path, env=env)

The first training will complete, but when the model attempts to execute the learn method on the second iteration, BrokenPipeError: [WinError 232] The pipe is being closed is thrown.

I'm not sure what this error means or how to resolve the problem. Pointing me towards documentation or pointing out coding mistakes would be appreciated.

Configuration:
python 3.6
stable-baselines: 2.8
tensorflow: 1.14

EDIT:
I resolved this problem by using model.set_env(env)
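
For reference, the broken pipe came from combining env.close() with model.load(load_path=path, env=env): load is a classmethod that returns a new model rather than updating the existing one in place, so the result was discarded and the old model kept a reference to the closed env. A corrected version of the loop (a sketch):

env = SubprocVecEnv(env_list)
model = PPO2(policy='CustomPolicy', env=env, verbose=1,
             tensorboard_log=tensorboard_log_location)

for i in range(number_training_steps):
    logname = run_name + '_' + str(i)
    model.learn(total_timesteps=int(total_timesteps / number_training_steps),
                reset_num_timesteps=False,
                tb_log_name=logname)
    model.save(results_folder + logname)

    if i < number_training_steps - 1:  # nothing to rebuild after the last pass
        env.close()
        env = SubprocVecEnv(env_list)
        model.set_env(env)  # re-attach in place instead of model.load(...)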

@araffin closed this as completed Jan 19, 2020
@yjc765

yjc765 commented Feb 11, 2020

> With this, the continual training is happening, but the TensorBoard graphs are not being updated. I have manually changed reset_num_timesteps to False, but the TensorBoard graphs are still not updated.

Hello, have you solved the problem of the TensorBoard not updating? Thanks!

@jtromans

jtromans commented Apr 8, 2020

@araffin I note that you mentioned:
"EDIT: you may need to set num_timesteps manually to continue properly the graphs"

Where do you set this parameter? Obviously it is not an argument to PPO's learn().

@SheilaGLZ

I was able to solve the TensorBoard problem. When loading the model, set tensorboard_log too:

model = PPO2.load(model_path, tensorboard_log="some_name")

After that I could see my logs again.

@anguyenbus

Hi,

My case is: I have a changing environment, like an RNN network, and the reinforcement learning agent is to control whether the input should be masked (input - input*mask), where the mask is discrete in {0, 1}.

So what I did is train the RNN for some epochs, and the RNN is the observation of the agent.

The agent is then trained with model.learn(5000); the RNN is in inference mode while the model is trained.

Then I go back and train the RNN, with model.predict(deterministic=True) predicting the mask.

I am not sure whether the model will sample from the updated RNN environment?
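
The loop described reads roughly like this (a sketch with hypothetical names, just to pin down the question; train_rnn, agent_env, etc. are illustrative, not real APIs):

for round_idx in range(n_rounds):
    train_rnn(rnn, data, epochs=5)         # 1. train the RNN for some epochs
    rnn.eval()                             # 2. inference mode; RNN state is the observation
    model.set_env(agent_env)               #    agent_env wraps the frozen RNN
    model.learn(total_timesteps=5000, reset_num_timesteps=False)
    mask, _ = model.predict(obs, deterministic=True)  # 3. agent predicts the mask
    masked = inputs - inputs * mask        #    mask is {0, 1} discrete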

@Damuna

Damuna commented Aug 15, 2022

To sum up (I'm stupid so it took me a while to put everything together), the working code is something like this:

# Model (if no model is saved)
model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)  # multi-layer perceptron policy

# Continue training (path to the last saved model)
model_path = f"models/1660556760/5000"
log_path = f"logs/PPO/1660556760/5000"
model = PPO.load(model_path, tensorboard_log=log_path)
model.set_env(env)

Comment out the second part the first time you run the code, and the first part on later runs, then call model.learn().
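
An equivalent way to avoid the commenting and uncommenting (paths illustrative, assuming stable-baselines3):

import os
from stable_baselines3 import PPO

model_path = "models/1660556760/5000"
log_path = "logs/PPO/1660556760/5000"

if os.path.exists(model_path + ".zip"):  # .save() appends .zip
    # later runs: continue from the last checkpoint
    model = PPO.load(model_path, tensorboard_log=log_path)
    model.set_env(env)
else:
    # first run: create a fresh model
    model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=log_path)

model.learn(total_timesteps=5000, reset_num_timesteps=False)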

