This project implements Soft Actor-Critic (SAC), which can optionally run as a Spiking Actor Network Soft Actor-Critic (SANSAC) via SpikingJelly. It uses the BipedalWalker environment from Gymnasium, but it can be used with other Gymnasium environments by modifying the input dimensions.
The code also uses wandb to log data; the logging is not integral to the main algorithm.
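When switching to another Gymnasium environment, the input dimensions can be read from the environment's spaces rather than hard-coded. The following is a minimal sketch, not the project's own code: `space_dims` is a hypothetical helper, and it assumes continuous (`Box`) observation and action spaces, which is what BipedalWalker-v3 provides (24 observations, 4 actions). A stand-in object is used so the sketch runs without Box2D installed.

```python
from math import prod
from types import SimpleNamespace


def space_dims(env):
    """Return (state_dim, action_dim) for an env with Box observation and action spaces.

    Hypothetical helper: flattens each space's shape into a single dimension count.
    """
    state_dim = prod(env.observation_space.shape)
    action_dim = prod(env.action_space.shape)
    return state_dim, action_dim


# Stand-in with the same space shapes as BipedalWalker-v3.
# With Gymnasium installed, pass the result of gym.make(...) instead.
fake_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(24,)),
    action_space=SimpleNamespace(shape=(4,)),
)
print(space_dims(fake_env))  # (24, 4)
```

Discrete action spaces have an empty shape, so this sketch does not cover them; they would need `action_space.n` instead.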
pip install torch gymnasium matplotlib numpy pynvml wandb spikingjelly
GPU acceleration additionally requires CuPy and CUDA. Both can be disabled if needed.
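One way to make GPU acceleration optional is to detect what is available and fall back to CPU. This is a sketch under assumptions: `choose_backend` and `detect_backend` are hypothetical helpers, and how the project actually switches CuPy and CUDA off is not shown here. The `"torch"` and `"cupy"` strings follow the backend names used by SpikingJelly's neuron layers.

```python
import importlib.util


def choose_backend(cuda_available, cupy_installed):
    """Return (device, backend) strings, falling back to CPU and plain PyTorch."""
    if not cuda_available:
        # Without CUDA, CuPy kernels cannot be used even if the package is installed.
        return "cpu", "torch"
    return "cuda", ("cupy" if cupy_installed else "torch")


def detect_backend():
    """Probe the local machine and pick a (device, backend) pair."""
    cupy_installed = importlib.util.find_spec("cupy") is not None
    try:
        import torch
        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = False
    return choose_backend(cuda_available, cupy_installed)


print(detect_backend())
```

Keeping the decision in a pure function (`choose_backend`) makes the fallback logic easy to check without a GPU.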
import gymnasium as gym

# Agent is the class provided by this project.
env = gym.make("BipedalWalker-v3")
agent = Agent(env, hidden_dim=256, hidden_dim2=256, seed=42, SAN=True)
agent.train(9000)
agent.save_model()
avg_reward = agent.eval_agent()

In addition, the code can be run through runner.py, enabling or disabling wandb as needed.
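Because logging is separate from the algorithm, wandb can sit behind a small switch. The sketch below is illustrative only: `MetricLogger`, its `use_wandb` flag, and the project name are hypothetical and do not describe how runner.py is written. It calls `wandb.init`, `Run.log`, and `Run.finish` only when logging is enabled, and imports wandb lazily so the package is not needed otherwise.

```python
class MetricLogger:
    """Collects metrics locally and optionally mirrors them to wandb."""

    def __init__(self, use_wandb=False, project="sansac-bipedalwalker"):
        self.history = []  # list of (step, metrics) pairs kept in memory
        self.run = None
        if use_wandb:
            import wandb  # imported lazily so wandb stays optional
            self.run = wandb.init(project=project)

    def log(self, metrics, step):
        self.history.append((step, dict(metrics)))
        if self.run is not None:
            self.run.log(metrics, step=step)

    def finish(self):
        if self.run is not None:
            self.run.finish()


logger = MetricLogger(use_wandb=False)
logger.log({"avg_reward": -92.5}, step=0)  # placeholder value, not a real result
logger.finish()
print(len(logger.history))  # 1
```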
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel & Sergey Levine. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. CoRR, abs/1801.01290 (2018). ⟨http://arxiv.org/abs/1801.01290⟩
- Wei Fang, Yanqi Chen, Jianhao Ding, Zhaofei Yu, Timothée Masquelier, Ding Chen, Liwei Huang, Huihui Zhou, Guoqi Li & Yonghong Tian. SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence. Science Advances 9(40): eadi1480 (2023). doi: 10.1126/sciadv.adi1480 ⟨https://www.science.org/doi/10.1126/sciadv.adi1480⟩
- Lukas Biewald. Experiment Tracking with Weights and Biases. (2020). Software: ⟨https://www.wandb.com/⟩
- Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U. Balis, Gianluca De Cola, Tristan Deleu, Manuel Goulão, Andreas Kallinteris, Markus Krimmel, Arjun KG et al. Gymnasium: A Standard Interface for Reinforcement Learning Environments. arXiv:2407.17032 (2024). ⟨https://arxiv.org/abs/2407.17032⟩