Hello, I am testing the chain environment with 10 nodes in interactive mode. The minimum number of steps required to get the flag is 22, and the corresponding total_reward from the environment is 2154.0. This is achieved with 11 local exploits and 11 "connect and infect" actions.
What I don't understand is how the reported cumulative rewards in the baselines in notebook_benchmark-chain.ipynb reach 6000.0 or more. How is this possible?
In the DQL exploit run there is this printout for example:
Can you explain how the DQL agent gets a reward of 6099.0 in 22 steps? I would appreciate a clarification, since it is needed for a paper comparison.
Thank you!
It seems that when the Gym environment is initialized, winning_reward=5000.0 is passed as the winning reward (line 396 in cyberbattle_env.py). If the agent reaches the goal, this is added as a reward, overriding the value of the node. This does not happen when you play the interactive game, which returns the string "FLAG: flag discovered!" and just adds the node value, which is 1000.
Can someone please confirm that this explanation is correct and it is a matter of just adding 4000 to the final reward to compare the results?
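To make the arithmetic concrete, here is a minimal sketch of the adjustment I have in mind. It does not call into CyberBattleSim at all; the function name `gym_equivalent_total` is hypothetical, and the defaults are just the two numbers quoted above (flag node value 1000, winning_reward 5000), under the assumption that the winning reward replaces the node value on the final step.

```python
def gym_equivalent_total(interactive_total: float,
                         flag_node_value: float = 1000.0,
                         winning_reward: float = 5000.0) -> float:
    """Convert an interactive-mode total into the total the Gym env would report.

    Assumption: on the winning step the Gym environment grants
    `winning_reward` instead of the flag node's own value, so the
    difference between the two is the only correction needed.
    """
    return interactive_total - flag_node_value + winning_reward


# Total observed in interactive mode for the 22-step solution.
interactive_total = 2154.0
adjusted = gym_equivalent_total(interactive_total)
print(adjusted)  # 6154.0, i.e. the interactive total plus 4000
```

This only captures the fixed offset between the two modes. It is not meant to reproduce the total of any particular agent run, which may differ for other reasons.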
@MariaRigaki The explanation is correct. The attacker gets the final winning reward if one is specified when the environment is created. The reason this does not happen in the interactive game is that in notebook_benchmark-chain.ipynb the environment that gets instantiated is CyberBattleToyCtf-v0. This environment is defined in __init__.py as: