Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) implementation #4
I have just gone through the paper "PowerGym: A Reinforcement Learning Environment for Volt-Var Control in Power Distribution Systems". I found it insightful, and thanks for sharing this repository. I understand that the file random_agent.py performs random actions, but I was also looking for the agents trained with Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), and for the training implementation code (as mentioned in the paper). Do I need to code those (or other reinforcement learning algorithms) myself, using the action space and observation space defined in env.py? Or have I missed those training codes?
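For context, the random agent boils down to a standard Gym rollout loop along these lines. This is a sketch, not code from the repo: the make_env import and the '13Bus' system name are assumptions based on the repo's examples, so check random_agent.py for the exact usage.

```python
# Sketch of what random_agent.py does: sample random actions and
# step through the standard Gym loop. The make_env import and the
# '13Bus' system name are assumptions; verify against random_agent.py.
from env_register import make_env

env = make_env('13Bus')          # build one of the bundled distribution systems
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # the "random agent"
    obs, reward, done, info = env.step(action)  # old-style 4-tuple Gym API
print('episode finished, last reward:', reward)
```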
Comments

Thanks a lot for your interest in this project. PowerGym is an environment, like the Atari games; it does not include any controller except for a random agent.

PowerGym adopts a standard OpenAI Gym interface. Though we did not include the PPO and SAC agents mentioned in the paper in this repo, you can try some existing, well-developed toolboxes such as OpenAI Spinning Up, Stable Baselines, etc.

Of course, if you are particularly interested in the PPO and SAC agents used in this paper, here are some more details: PPO <https://arxiv.org/pdf/2109.12073.pdf>, SAC <https://arxiv.org/pdf/2109.08512.pdf>. We haven't published this code yet. If we do, we will link it to this repo.

Will close this issue for now. Feel free to reopen it.
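As an illustration of the toolbox route, here is a minimal sketch of plugging a PowerGym environment into Stable Baselines3's PPO. This is not code from this repo or the paper: the make_env import and '13Bus' name are assumptions based on the repo's examples, and Stable Baselines3 is a separate third-party dependency.

```python
# Hedged sketch: train a Stable Baselines3 PPO agent on a PowerGym
# environment. make_env and '13Bus' are assumptions based on the
# repo's examples; stable_baselines3 is not part of PowerGym.
from stable_baselines3 import PPO
from env_register import make_env

env = make_env('13Bus')
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Evaluate the trained policy for one episode.
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```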
Thank you so much for your kind response. I have another doubt: in your research paper, equation (1) states the physical network constraints, which require the line resistance and reactance. In a real-world scenario, i.e., for real transmission lines, how do we get those values for each time interval?
@sahasubhajit Sorry for my late response; I was on vacation. In real-life scenarios, line resistance and reactance are given as system configuration parameters, and they do not change across time intervals. Technically, these parameters can change over the years; however, since the duration of the RL task is usually 24 hours, the line resistance and reactance are assumed to be constant across intervals.
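For example, since PowerGym drives OpenDSS under the hood, the line resistance and reactance live in the circuit's .dss definition files and can be inspected once, rather than measured per time interval. A sketch using OpenDSSDirect.py (the Master.dss path is illustrative):

```python
# Sketch: line R/X are static circuit parameters read once from the
# OpenDSS model, not quantities that change per time interval.
# The file path is illustrative; point it at the system's master .dss file.
import opendssdirect as dss

dss.run_command('Redirect systems/13Bus/Master.dss')

idx = dss.Lines.First()
while idx > 0:
    # Positive-sequence R and X per unit length, as defined in the DSS model
    print(dss.Lines.Name(), dss.Lines.R1(), dss.Lines.X1())
    idx = dss.Lines.Next()
```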