Table 1 shows the values of the parameters involved in the learning of π. For each domain, we describe the state space S and the action space A, the algorithm we use to learn the policy π, the learning rate α, and the maximum number of steps per episode K. Additionally, we set the number of episodes H to 1000 and the discount factor γ to 0.99 in all the domains.
-
Notifications
You must be signed in to change notification settings - Fork 0
Data used in the work Clustering-based Attack Detection for Adversarial Reinforcement Learning
License
rmajadas/Adversarial-detector
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Data used in the work Clustering-based Attack Detection for Adversarial Reinforcement Learning
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published