Continuing training on a previous trained model #599
Comments
Please read the documentation more carefully ;) |
I am training a model and saving it
So now if I wish to continue with the training on the same environment as earlier, I am supposed to load the model and continue with the training. Say like
So will this continue the training? |
Okay, I need to use . One thing I found missing though: when I trained my model for the first time, I integrated it with TensorBoard; now, when I do the continued training, the TensorBoard graphs are not updated for the new timesteps. Am I missing something? |
Again, please read the doc about the TensorBoard integration; we cover that issue. EDIT: you may need to set |
Thanks a lot for your answer! I must be missing something though: reading the code, I don't see how . Could you elaborate on which part of the code makes it learn continuously? Is it related to the variable ? Thanks in advance! |
I have tried with
With this, the continual training is happening but the TensorBoard graphs are being updated. I have manually changed the |
I'm working on some similar code, but I am having issues. I believe the vectorized environments are not being closed correctly or reinitialized properly:
A sampling of my code is shown below:

```python
env = SubprocVecEnv(env_list)
model = PPO2(policy='CustomPolicy', env=env, verbose=1,
             vf_coef=VF_COEFF,
             noptepochs=EPOCHS,
             ent_coef=ENT_COEFF,
             learning_rate=LEARNING_RATE,
             tensorboard_log=tensorboard_log_location,
             n_steps=NSTEPS,
             nminibatches=MINIBATCHES)
model.save(results_folder + run_name)

# Training the model
for i in range(number_training_steps):
    logname = run_name + '_' + str(i)
    model.learn(total_timesteps=int(total_timesteps / number_training_steps),
                reset_num_timesteps=False,
                tb_log_name=logname)
    env.close()
    path = results_folder + logname
    model.save(path)
    if i < number_training_steps:
        env = SubprocVecEnv(env_list)
        model.load(load_path=path, env=env)
```

The first training run completes, but when the model attempts to execute the learn method on the second iteration, `BrokenPipeError: [WinError 232] The pipe is being closed` is thrown. I'm not sure what this error means or how to resolve the problem. Pointing me towards documentation or pointing out coding mistakes would be appreciated.

Configuration:

EDIT: |
Hello, have you solved the problem of TensorBoard not being updated? Thanks! |
@araffin I note that you mentioned: . Where do you set this parameter? Obviously it is not an argument to PPO's learn(). |
I was able to solve the TensorBoard problem. When loading the model, set |
Hi, my case is: I have a changing environment, like an RNN network, and the reinforcement learning agent is meant to control whether I should mask the input (input - input*mask), where the mask is discrete in {0, 1}. So what I did: I train the RNN for some epochs, and the RNN is the observation of the agent. The agent is then trained with model.learn(5000); the RNN is in inference mode while the agent is trained. Then I go back and train the RNN, with model.predict(deterministic=True) predicting the mask. I am not sure whether the model will sample from the updated RNN environment? |
To sum up (I'm stupid so it took me a while to put everything together), the working code is something like this:
Comment out the second part the first time you run the code, and the first part on subsequent runs, then train the model.
Hi, I have trained an agent using PPO2 for 10000 steps and saved the model. I feel that the model can be improved by letting it train for more episodes, so I want to load this model and continue training it on top of the 10000 steps already done. I have gone through the documentation but couldn't find anything related to this. Is such a feature currently available in Stable Baselines?