Robust Deep RL with a Soft Actor-Critic Approach

Robust Deep RL with a Soft Actor-Critic approach against adversarial perturbations on state observations

I designed a new robust deep RL method based on Soft Actor-Critic (SAC) that is robust to adversarial perturbations on state observations. My work builds on the SA-MDP framework proposed by Zhang et al. (2020) [1]. For a more detailed explanation, please see the attached PDF file. 2022 Spring Semester, Personal Project Research, Kyungphil Park

SA-MDP (State-Adversarial MDP)

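SA-MDP models a fixed adversary that perturbs the agent's observation of the state within a bounded set B(s), while the true state, transitions, and rewards are unchanged. The worst case is the adversary whose perturbations minimize the agent's value (Q value), and Zhang et al. (2020) [1] formalize this setting as the state-adversarial MDP (SA-MDP).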
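For reference, the Bellman equations that Zhang et al. (2020) derive for SA-MDP can be written as follows. This is our transcription of the paper's notation (adversary ν, perturbation set B(s), transition probability p, reward R, discount γ), not a copy of the equations in the attached PDF. The first line evaluates a policy π under a fixed adversary ν; the second gives the value under the optimal (worst-case) adversary ν*.

```latex
\begin{aligned}
\tilde V_{\pi\circ\nu}(s) &= \sum_{a\in\mathcal A} \pi\big(a \mid \nu(s)\big) \sum_{s'\in\mathcal S} p(s' \mid s,a)\,\big[R(s,a,s') + \gamma\,\tilde V_{\pi\circ\nu}(s')\big] \\
\tilde V_{\pi\circ\nu^*}(s) &= \min_{\hat s \in B(s)} \sum_{a\in\mathcal A} \pi(a \mid \hat s) \sum_{s'\in\mathcal S} p(s' \mid s,a)\,\big[R(s,a,s') + \gamma\,\tilde V_{\pi\circ\nu^*}(s')\big]
\end{aligned}
```

Note that the policy acts on the perturbed observation ν(s) (or ŝ), while the environment still transitions from the true state s.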

SA-SAC Regularizer

SA-SAC

In our work, we solve a minimax problem: we minimize the policy loss under the worst-case perturbation of the state observation.

Objective function
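One plausible way to write this minimax objective, assuming a KL-based regularizer like the policy regularizer Zhang et al. (2020) use for SA-PPO, a hypothetical regularizer weight κ, and states s drawn from the replay buffer 𝒟 (the exact form used here is given in the attached PDF):

```latex
\min_{\theta}\; \mathcal{L}_{\mathrm{SAC}}(\theta)
  + \kappa\, \mathbb{E}_{s\sim\mathcal D}\Big[\max_{\hat s \in B(s)} D_{\mathrm{KL}}\big(\pi_\theta(\cdot \mid s)\,\big\|\,\pi_\theta(\cdot \mid \hat s)\big)\Big]
```

The inner maximization searches for the worst-case observation ŝ in B(s) (for example with PGD or with a convex-relaxation bound); the outer minimization updates the actor parameters θ.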

Code

I implemented this robust Soft Actor-Critic approach (SA-SAC) for discrete action spaces and tested it on several Atari Gym environments. The SAC code is based on bernomone's GitHub code [2].
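Because the action space is discrete, the policy outputs a categorical distribution, so a KL-based regularizer can be computed in closed form for each candidate perturbed observation. A minimal, dependency-free sketch, assuming the regularizer is the worst-case KL divergence between π(·|s) and π(·|ŝ) over a finite set of candidate observations inside B(s); `sa_regularizer` and `toy_policy` are hypothetical names, and `toy_policy` is only a stand-in for the actor network:

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: action logits -> action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def sa_regularizer(policy_logits: Callable[[Sequence[float]], Sequence[float]],
                   s: Sequence[float],
                   candidates: Sequence[Sequence[float]]) -> float:
    """Worst-case KL(pi(.|s) || pi(.|s_hat)) over candidate observations s_hat,
    which are assumed to lie inside the perturbation set B(s)."""
    p_clean = softmax(policy_logits(s))
    return max(categorical_kl(p_clean, softmax(policy_logits(s_hat)))
               for s_hat in candidates)


def toy_policy(state: Sequence[float]) -> List[float]:
    """Hypothetical linear actor over 3 discrete actions (stand-in for the real network)."""
    x, y = state
    return [x, y, -x - y]


# The policy is uniform at the origin; the larger perturbation yields the larger KL.
print(sa_regularizer(toy_policy, (0.0, 0.0), [(0.1, 0.0), (0.5, 0.0)]))
```

In a real training loop the candidates would come from an attack such as PGD rather than a fixed list, and the KL would be computed on batched tensors.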

Train SA-SAC agent

First, create three new directories: saved_models, videos, and Logs.
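If you prefer to do this from Python, a small sketch (the helper name `make_run_dirs` is hypothetical; the directory names are assumed to be exactly the three listed above):

```python
import os

# Directory names assumed by the training and video scripts.
REQUIRED_DIRS = ("saved_models", "videos", "Logs")


def make_run_dirs(root: str = ".") -> None:
    """Create the output directories under root; existing ones are left untouched."""
    for name in REQUIRED_DIRS:
        os.makedirs(os.path.join(root, name), exist_ok=True)
```

Call `make_run_dirs()` once from the repository root before training.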

Before you start training, set n_steps, memory_size, train_start, reg_train_start, … in the config01.json file.
n_steps: total number of steps to train.
memory_size: replay buffer size.
train_start: number of steps after which training begins.
reg_train_start: number of steps after which training with the SA-Regularizer begins.
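A sketch of loading and sanity-checking config01.json before a run, assuming the four documented keys sit at the top level of the JSON object (the file may contain more keys). The helper `load_config` and the ordering check `train_start <= reg_train_start <= n_steps` are our assumptions, not part of the repository:

```python
import json

# Keys documented above; config01.json may contain additional ones.
REQUIRED_KEYS = ("n_steps", "memory_size", "train_start", "reg_train_start")


def load_config(path: str = "config01.json") -> dict:
    """Load the JSON config and check the documented training-schedule keys."""
    with open(path) as f:
        config = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    # Assumed sanity check: training starts first, then the SA-Regularizer, both before the run ends.
    if not 0 <= config["train_start"] <= config["reg_train_start"] <= config["n_steps"]:
        raise ValueError("expected 0 <= train_start <= reg_train_start <= n_steps")
    return config
```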

train.py (train vanilla SAC)

train.py
	--config=config01.json (default)
	--new=1 (default)  # set to 0 to load pretrained models
	--game=BeamRider (default)  # any Atari game environment

example: python train.py, python train.py --game=Assault
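The flags above map naturally onto Python's `argparse`. A sketch of an equivalent parser, for illustration only; `build_parser` is a hypothetical name and the repository's actual parser may differ (for example in help text or types):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented train.py / robust_train.py flags."""
    parser = argparse.ArgumentParser(description="Train an SAC agent on an Atari game (sketch).")
    parser.add_argument("--config", default="config01.json", help="path to the JSON config")
    parser.add_argument("--new", type=int, default=1,
                        help="1: train from scratch, 0: load pretrained models")
    parser.add_argument("--game", default="BeamRider", help="Atari game name, e.g. Assault")
    return parser
```

Usage: `args = build_parser().parse_args()` inside the script, then read `args.config`, `args.new`, and `args.game`.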

robust_train.py (train SA-SAC)

robust_train.py
	--config=config01.json (default)
	--new=1 (default)  # set to 0 to load pretrained models
	--game=BeamRider (default)  # any Atari game environment

example: python robust_train.py, python robust_train.py --game=Assault
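robust_train.py differs from train.py in that the SA-SAC regularizer is switched on once reg_train_start steps have elapsed. A sketch of that schedule, assuming the actor loss is the SAC loss plus a weighted regularizer term; `actor_loss` and the weight `kappa` are hypothetical names, not documented config keys:

```python
def actor_loss(step: int, sac_loss: float, sa_reg: float,
               reg_train_start: int, kappa: float = 0.1) -> float:
    """Actor loss at a given environment step.

    Before reg_train_start only the plain SAC loss is used; afterwards the
    SA regularizer is added with weight kappa (a hypothetical hyperparameter).
    """
    if step < reg_train_start:
        return sac_loss
    return sac_loss + kappa * sa_reg
```

Warming up with plain SAC first lets the policy learn something useful before it is also asked to be smooth under perturbations.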

generate_match_video.py

Renders an Atari game video with your trained models.

generate_match_video.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default)  # any Atari game environment
	--random=False (default)  # set to 1 to play with random actions

example: python generate_match_video.py, python generate_match_video.py --game=Assault --random=1
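The --random flag switches between the trained policy and a random-action baseline. A sketch of that choice, assuming the trained agent acts greedily on its action probabilities during rendering (it may instead sample from them); `select_action` is a hypothetical helper:

```python
import random
from typing import Sequence


def select_action(action_probs: Sequence[float], use_random: bool = False,
                  rng: random.Random = random) -> int:
    """Pick a discrete action index.

    use_random mirrors --random=1 (uniform random action); otherwise the
    action with the highest policy probability is chosen (an assumption).
    """
    if use_random:
        return rng.randrange(len(action_probs))
    return max(range(len(action_probs)), key=action_probs.__getitem__)
```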

PGD_generate_video.py

Renders an Atari game video with your trained models, with a PGD attack (adversarial perturbation on state observations) applied.

PGD_generate_video.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default)  # any Atari game environment
	--steps=10 (default)  # number of PGD attack steps

example: python PGD_generate_video.py, python PGD_generate_video.py --game=Assault
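PGD (projected gradient descent) perturbs the observation in several small steps and projects it back into an ε-ball around the clean observation after each step; --steps sets the number of steps. A minimal, dependency-free sketch of an L∞ PGD attack, assuming the attacker maximizes some loss of the perturbed observation (for example the KL divergence between the clean and perturbed policies, or the negative Q value of the clean action; which one PGD_generate_video.py uses is not stated here). Central finite differences stand in for autograd, and the defaults for `epsilon` and `alpha` are hypothetical:

```python
from typing import Callable, List, Optional, Sequence


def _sign(v: float) -> int:
    """Sign of v as -1, 0 or +1."""
    return (v > 0) - (v < 0)


def pgd_attack(loss_fn: Callable[[List[float]], float],
               s: Sequence[float],
               epsilon: float = 0.01,
               steps: int = 10,
               alpha: Optional[float] = None,
               h: float = 1e-4) -> List[float]:
    """L-infinity PGD: search for s_hat with max|s_hat - s| <= epsilon that
    (locally) maximizes loss_fn, using finite-difference gradients."""
    if alpha is None:
        alpha = 2.5 * epsilon / steps  # common step-size heuristic
    x = list(s)
    for _ in range(steps):
        grad = []
        for i in range(len(x)):
            up, down = x.copy(), x.copy()
            up[i] += h
            down[i] -= h
            grad.append((loss_fn(up) - loss_fn(down)) / (2 * h))
        # Ascent step along the sign of the gradient ...
        x = [xi + alpha * _sign(g) for xi, g in zip(x, grad)]
        # ... then project back onto the epsilon-ball around the clean observation s.
        x = [min(max(xi, si - epsilon), si + epsilon) for xi, si in zip(x, s)]
    return x
```

In practice the loss would be computed by the trained network on the stacked Atari frames, and gradients would come from the deep-learning framework rather than finite differences.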

evaluation.py

Evaluates trained models over several episodes.

evaluation.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default)  # any Atari game environment
	--iter=10 (default)  # number of evaluation episodes

example: python evaluation.py, python evaluation.py --game=Assault --iter=30
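Evaluating over --iter episodes typically means averaging the per-episode returns. A sketch of that summary step, assuming the script reports the mean and standard deviation; `evaluate` and `run_episode` are hypothetical names, where `run_episode(i)` plays episode i and returns its total reward:

```python
import statistics
from typing import Callable, Tuple


def evaluate(run_episode: Callable[[int], float], n_episodes: int = 10) -> Tuple[float, float]:
    """Play n_episodes episodes (mirrors --iter) and return the mean and
    population standard deviation of the episode returns."""
    returns = [run_episode(i) for i in range(n_episodes)]
    return statistics.mean(returns), statistics.pstdev(returns)
```

Running the same summary with and without the PGD attack gives a direct measure of how much return the agent loses under perturbation.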

pgd_evaulation.py

Evaluates trained models over several episodes, with a PGD attack (adversarial perturbation on state observations) applied.

pgd_evaulation.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default)  # any Atari game environment
	--iter=10 (default)  # number of evaluation episodes

example: python pgd_evaulation.py, python pgd_evaulation.py --game=Assault --iter=30

Results

References

[1] Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations, Zhang et al. (2020)

[2] Discrete Soft Actor-Critic, bernomone's GitHub code

