
Continuing training on a previously trained model #599

Closed
venkatesh-chinni opened this issue Dec 4, 2019 · 13 comments
Labels
question (Further information is requested) · RTFM (Answer is the documentation)

Comments

@venkatesh-chinni

venkatesh-chinni commented Dec 4, 2019

Hi, I have trained an agent using PPO2 for 10000 steps and saved the model. I feel the model can be improved by letting it train for more episodes, so I want to load this model and continue training on it from the 10000 steps already completed. I have gone through the documentation but couldn't find anything related to this. Is such a feature currently available in Stable Baselines?

@araffin added the question and RTFM labels Dec 4, 2019
@araffin
Collaborator

araffin commented Dec 4, 2019

Please read the documentation more carefully ;)
You have an example of what you are looking for here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html and here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
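
The gist of the pattern there is roughly this (a minimal sketch; the environment name is illustrative):

import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])

model = PPO2('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=10000)
model.save("ppo2_agent")

del model  # e.g. a fresh Python session

# reload and keep training from the saved weights
model = PPO2.load("ppo2_agent")
model.set_env(env)  # re-attach an environment before training again
model.learn(total_timesteps=10000)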

@venkatesh-chinni
Author

I am training a model and saving it:

model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
             learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
             policy_kwargs=None, full_tensorboard_log=False)

model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)

model.save("agent_03_12_2019")

So now, if I wish to continue training on the same environment as before, I am supposed to load the model and continue training, say like this:

model = PPO2.load("agent_03_12_2019")
model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)

So will this continue the training?

@venkatesh-chinni
Author

venkatesh-chinni commented Dec 4, 2019

Okay, I need to use model.set_env(env) before model.learn. Thanks.

But one thing I found missing: when I trained my model the first time, I integrated it with TensorBoard. Now, when I do the continual training, the TensorBoard graphs are not updated for the new timesteps. Am I missing something?

@araffin
Collaborator

araffin commented Dec 4, 2019

the tensorboard graphs are not updated for the new timesteps

Again, please read the doc about the TensorBoard integration; we cover that issue there.

EDIT: you may need to set num_timesteps manually for the graphs to continue properly
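
i.e. something along these lines (a sketch; num_timesteps is a plain attribute on the model, not a learn() argument):

model = PPO2.load("agent_03_12_2019")
model.set_env(env)
# tell the model how many steps it has already trained, so the
# TensorBoard x-axis continues instead of restarting at zero
model.num_timesteps = 10000
model.learn(total_timesteps=10000, reset_num_timesteps=False)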

@matthew-hsr

matthew-hsr commented Dec 10, 2019

@araffin

Thanks a lot for your answer!

I must be missing something though. Reading the code, I don't see how set_env makes it learn continuously. It seems like set_env in base_class.py only changes self.env (and self.n_envs), and reading the code for Runner and learn (in particular ppo2.py), I can't find the code that makes it learn continuously instead of starting fresh.

Could you elaborate on which part of the code makes it learn continuously? Is it related to the variable _init_setup_model somewhere?

Thanks in advance!
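
EDIT: after more digging, I think I can answer my own question: the continuity comes from load() restoring the saved network parameters, not from set_env (which only re-attaches an environment to collect rollouts from). Simplified from base_class.py, load() does roughly:

# simplified sketch of BaseRLModel.load()
data, params = cls._load_from_file(load_path)  # hyperparameters + trained weights
model = cls(policy=data["policy"], env=None, _init_setup_model=False)
model.__dict__.update(data)    # restore hyperparameters
model.set_env(env)             # attach the environment
model.setup_model()            # rebuild the TF graph
model.load_parameters(params)  # restore the trained weights
# -> the next learn() starts its gradient updates from these weights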

@venkatesh-chinni
Author

venkatesh-chinni commented Dec 28, 2019

I have tried this:

model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
             learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
             policy_kwargs=None, full_tensorboard_log=False)

model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)

model.save("agent_03_12_2019")
model = PPO2.load("agent_03_12_2019")

n_cpu = 8
env = DummyVecEnv([lambda: AHUenv() for i in range(n_cpu)])
model.set_env(env)


model.learn(total_timesteps=2000000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)

model.save("agent_03_12_2019_continued_training_1")

With this, the continual training is happening, but the TensorBoard graphs are not being updated. I have manually changed reset_num_timesteps to False, but the TensorBoard graphs are still not updated.

@cevans3098

cevans3098 commented Dec 30, 2019

I'm working on some similar code, but I am having issues. I believe the vectorized environments are not being closed or reinitialized correctly:

  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 442, in __init__
    super().__init__(env=env, model=model, n_steps=n_steps)
  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\runners.py", line 19, in __init__
    self.obs[:] = env.reset()
  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 111, in reset
    remote.send(('reset', None))
  File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed

A sample of my code is shown below:

    env = SubprocVecEnv(env_list)

    model = PPO2(policy ='CustomPolicy', env = env, verbose = 1, 
                 vf_coef = VF_COEFF,
                 noptepochs = EPOCHS,
                 ent_coef = ENT_COEFF,
                 learning_rate = LEARNING_RATE,
                 tensorboard_log = tensorboard_log_location,
                 n_steps = NSTEPS,
                 nminibatches = MINIBATCHES)

    model.save(results_folder + run_name)

    # Training the model
    for i in range(number_training_steps):
        logname = run_name + '_' + str(i)
        model.learn(total_timesteps = int((total_timesteps/number_training_steps)),
                    reset_num_timesteps = False,
                    tb_log_name = logname)
        
        env.close()
        
        path = results_folder + logname
        model.save(path)


        if i < number_training_steps:
            env = SubprocVecEnv(env_list)
            model.load(load_path=path, env=env)

The first training will complete, but when the model attempts to execute the learn method on the second iteration, BrokenPipeError: [WinError 232] The pipe is being closed is thrown.

I'm not sure what this error means or how to resolve the problem. Pointing me towards documentation or pointing out coding mistakes would be appreciated.

Configuration:
python 3.6
stable-baselines: 2.8
tensorflow: 1.14

EDIT:
I resolved this problem by using model.set_env(env)
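
For reference, the broken pipe came from combining env.close() with model.load(load_path=path, env=env): load is a classmethod that returns a new model rather than updating the existing one in place, so the result was discarded and the old model kept a reference to the closed env. A corrected version of the loop (a sketch):

env = SubprocVecEnv(env_list)
model = PPO2(policy='CustomPolicy', env=env, verbose=1,
             tensorboard_log=tensorboard_log_location)

for i in range(number_training_steps):
    logname = run_name + '_' + str(i)
    model.learn(total_timesteps=int(total_timesteps / number_training_steps),
                reset_num_timesteps=False,
                tb_log_name=logname)
    model.save(results_folder + logname)

    if i < number_training_steps - 1:  # nothing to rebuild after the last pass
        env.close()
        env = SubprocVecEnv(env_list)
        model.set_env(env)  # re-attach in place instead of model.load(...)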

@araffin closed this as completed Jan 19, 2020
@yjc765

yjc765 commented Feb 11, 2020

> With this, the continual training is happening, but the TensorBoard graphs are not being updated. I have manually changed reset_num_timesteps to False, but the TensorBoard graphs are still not updated.

Hello, have you solved the problem of the TensorBoard not updating? Thanks!

@jtromans

jtromans commented Apr 8, 2020

@araffin I note that you mentioned:
"EDIT: you may need to set num_timesteps manually to continue properly the graphs"

Where do you set this parameter? Obviously it is not an argument to PPO's learn().

@SheilaGLZ

I was able to solve the TensorBoard problem. When loading the model, set tensorboard_log too:

model = PPO2.load(model_path, tensorboard_log="some_name")

After that I could see my logs again.

@anguyenbus

Hi,

My case is: I have a changing environment, like an RNN network, and the reinforcement learning agent is to control whether the input should be masked (input - input*mask), where the mask is discrete in {0, 1}.

So what I did is train the RNN for some epochs, and the RNN is the observation of the agent.

The agent is then trained with model.learn(5000); the RNN is in inference mode while the model is trained.

Then I go back and train the RNN, with model.predict(deterministic=True) predicting the mask.

I am not sure whether the model will sample from the updated RNN environment?
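
The loop described reads roughly like this (a sketch with hypothetical names, just to pin down the question; train_rnn, agent_env, etc. are illustrative, not real APIs):

for round_idx in range(n_rounds):
    train_rnn(rnn, data, epochs=5)         # 1. train the RNN for some epochs
    rnn.eval()                             # 2. inference mode; RNN state is the observation
    model.set_env(agent_env)               #    agent_env wraps the frozen RNN
    model.learn(total_timesteps=5000, reset_num_timesteps=False)
    mask, _ = model.predict(obs, deterministic=True)  # 3. agent predicts the mask
    masked = inputs - inputs * mask        #    mask is {0, 1} discrete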

@Damuna

Damuna commented Aug 15, 2022

To sum up (I'm stupid so it took me a while to put everything together), the working code is something like this:

# Model (if no model is saved)
model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)  # multi-layer perceptron policy

# Continue training (path to the last saved model)
model_path = f"models/1660556760/5000"
log_path = f"logs/PPO/1660556760/5000"
model = PPO.load(model_path, tensorboard_log=log_path)
model.set_env(env)

Comment out the second part the first time you run the code, and the first part on later runs, then call model.learn().
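
An equivalent way to avoid the commenting and uncommenting (paths illustrative, assuming stable-baselines3):

import os
from stable_baselines3 import PPO

model_path = "models/1660556760/5000"
log_path = "logs/PPO/1660556760/5000"

if os.path.exists(model_path + ".zip"):  # .save() appends .zip
    # later runs: continue from the last checkpoint
    model = PPO.load(model_path, tensorboard_log=log_path)
    model.set_env(env)
else:
    # first run: create a fresh model
    model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=log_path)

model.learn(total_timesteps=5000, reset_num_timesteps=False)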

