
Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) implementation #4

Closed
sahasubhajit opened this issue May 15, 2022 · 3 comments


@sahasubhajit

I have just gone through the paper "PowerGym: A Reinforcement Learning Environment for Volt-Var Control in Power Distribution Systems". I found it insightful; thanks for sharing this repository. I understand that the file random_agent.py performs random actions, but I was also looking for the agents trained with Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) and their training implementation (as mentioned in the paper). Do I need to implement those (or other reinforcement learning algorithms) myself using the action space and observation space defined in env.py, or have I missed the training code?

@bruinWang
Contributor

Thanks a lot for your interest in this project. PowerGym is an environment, much like the Atari games; it does not include any controller except for a random agent.
PowerGym adopts the standard OpenAI Gym interface. Although we did not include the PPO and SAC agents mentioned in the paper in this repo, you can try existing, well-developed toolboxes such as OpenAI Spinning Up, Stable Baselines, etc.
Of course, if you are particularly interested in the PPO and SAC agents used in the paper, here are some more details: PPO, SAC. We haven't published that code yet; if we do, we will link it from this repo.
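For reference, a minimal sketch of the toolbox route with Stable Baselines 3 might look like the following. This is not the training code used in the paper; `make_powergym_env` is a hypothetical placeholder for however env.py constructs the environment, and SAC only applies if the action space is continuous (Box):

```python
# Minimal sketch (not the paper's training code): train an agent with
# Stable Baselines 3 on a Gym-compatible environment such as PowerGym's.
# `make_powergym_env` is a hypothetical placeholder for however the
# environment is constructed from env.py in this repo.
from stable_baselines3 import PPO, SAC

env = make_powergym_env("13Bus")  # hypothetical constructor and feeder name

# PPO handles discrete or continuous action spaces.
ppo_model = PPO("MlpPolicy", env, verbose=1)
ppo_model.learn(total_timesteps=100_000)

# SAC requires a continuous (Box) action space; use it only if the
# environment's actions are continuous.
# sac_model = SAC("MlpPolicy", env, verbose=1)
# sac_model.learn(total_timesteps=100_000)

# Roll out the trained policy (classic Gym step/reset API).
obs = env.reset()
done = False
while not done:
    action, _ = ppo_model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```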

Will close this issue for now. Feel free to reopen it.

@sahasubhajit
Author

sahasubhajit commented Oct 11, 2022 via email

@bruinWang
Contributor

@sahasubhajit Sorry for the late response; I was on vacation. In real-life scenarios, line resistance and reactance are given as system configurations, and they do not change across time intervals. Technically, these parameters can change over the years; however, since the duration of the RL task is usually 24 hours, the line resistance and reactance are assumed to stay the same across intervals.
