steps_per_epoch in DDPG. #776
Comments
Similar to #352, what is the definition of "epochs" here?
Thank you for your reply! By one episode I mean a sequence of states, actions and rewards that ends with a terminal state. I just wonder whether we can set the length of this sequence in the DDPG algorithm to something like 20, meaning the agent interacts with the environment for only 20 steps. We would then reset the environment after 20 steps, and so on.
I am still rather uncertain what it is you want to achieve, exactly. The naming of DDPG parameters can be a bit vague:
Thanks for helping me understand the parameters. It is getting closer. I attached a screenshot of the DDPG algorithm from the original paper (https://arxiv.org/pdf/1509.02971.pdf).
T usually, and in this case, signifies the end of the episode. Action selection, storing the transition, network optimisation and the target update each occur once per environment step. When the episode has finished, the noise and the environment are reset. This is done here: stable-baselines/stable_baselines/ddpg/ddpg.py Lines 831 to 847 in 950c2a5
and here: stable-baselines/stable_baselines/ddpg/ddpg.py Lines 934 to 951 in 950c2a5
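The per-step structure described above can be sketched roughly as follows. This is a toy illustration, not the code in ddpg.py: `ToyEnv`, `ToyNoise` and `run` are hypothetical names, the agent update is left as a comment, and the environment is assumed to use the old Gym-style `step` that returns `(obs, reward, done, info)`.

```python
import random


class ToyEnv:
    """Hypothetical environment: each episode ends after a random number of steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.remaining = 0

    def reset(self):
        self.remaining = self.rng.randint(3, 8)
        return 0.0

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return 0.0, 1.0, done, {}


class ToyNoise:
    """Hypothetical stand-in for the action noise process."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1

    def __call__(self):
        return 0.0


def run(env, noise, total_steps):
    replay_buffer = []
    episodes = 0
    obs = env.reset()
    noise.reset()
    for _ in range(total_steps):
        action = 0.0 + noise()  # placeholder policy output plus exploration noise
        next_obs, reward, done, _ = env.step(action)
        replay_buffer.append((obs, action, reward, next_obs, done))
        # Network optimisation and the target update would go here, once per step.
        obs = next_obs
        if done:
            # Episode finished: reset both the noise and the environment.
            episodes += 1
            noise.reset()
            obs = env.reset()
    return episodes, len(replay_buffer)
```

Note that in this sketch the episode boundary is decided entirely by the `done` flag that the environment returns; the loop itself has no notion of T.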
Thanks for the reply! That is exactly what I am asking about. If I understand correctly, DDPG in stable_baselines only ends an episode when done is True, which in some cases means the reward has reached its maximum or the policy is finely tuned. I feel this is slightly different from the original algorithm, which can terminate the episode after a fixed number of steps, T, regardless of the reward or the policy. For some complex environments in particular, it might take a really long time until "done" is True. Is there a way to predefine the length of episodes in my script (without changing stable-baselines/stable_baselines/ddpg/ddpg.py)? Looking forward to your comments!
The |
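One way to cap episode length from your own script, without touching ddpg.py, is to wrap the environment so that it reports `done=True` after a fixed number of steps. Gym provides `gym.wrappers.TimeLimit(env, max_episode_steps=...)` for this purpose. The sketch below is a dependency-free imitation of that idea, not Gym's implementation: `StepLimit`, `NeverDoneEnv` and the `step_limit_reached` info key are hypothetical names, and the old Gym-style 4-tuple `step` is assumed.

```python
class NeverDoneEnv:
    """Hypothetical environment that never signals termination on its own."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, 1.0, False, {}


class StepLimit:
    """Ends every episode after at most `max_episode_steps` steps."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            # Force the episode to end, and record why (hypothetical key).
            done = True
            info = dict(info, step_limit_reached=True)
        return obs, reward, done, info


# Usage: interact for 50 steps with episodes capped at 20 steps each.
env = StepLimit(NeverDoneEnv(), max_episode_steps=20)
obs = env.reset()
episode_lengths, length = [], 0
for _ in range(50):
    obs, reward, done, info = env.step(0.0)
    length += 1
    if done:
        episode_lengths.append(length)
        length = 0
        obs = env.reset()
```

With a wrapper like this, the training loop still only reacts to `done`, so the cap lives entirely in the environment you pass in. Many registered Gym environments already carry such a limit through `max_episode_steps` in their registration, so it is worth checking whether yours does before adding another.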
Hi, I saw in openai spinups
which specifies the number of steps in each episode/epoch. Is there a similar setting in stable_baselines?
Thanks!