
Question about rewards in chain environment #105

Closed
MariaRigaki opened this issue Jul 24, 2023 · 2 comments
Labels
question Further information is requested

Comments


MariaRigaki commented Jul 24, 2023

Hello, I am testing the chain environment with 10 nodes in interactive mode. The minimum number of steps required to get the flag is 22, and the corresponding total_reward from the environment is 2154.0. This is achieved with 11 local exploits and 11 "connect and infect" actions.

What I don't understand is how the reported cumulative rewards in the baselines in notebook_benchmark-chain.ipynb reach 6000.0 or more. How is this possible?

In the DQL exploit run, for example, there is this printout:

Episode 2|Iteration 22|reward: 6099.0|last_reward_at:   22|Elapsed Time: 0:00:00||

  Episode 2 ended at t=22 
  Breakdown [Reward/NoReward (Success rate)]
    explore-local: 0/0 (NaN)
    explore-remote: 0/0 (NaN)
    explore-connect: 0/0 (NaN)
    exploit-local: 11/0 (1.00)
    exploit-remote: 0/0 (NaN)
    exploit-connect: 11/0 (1.00)
  exploit deflected to exploration: 0

Can you explain how the DQL agent gets a reward of 6099.0 in 22 steps? I would appreciate a clarification, since it is needed for a comparison in a paper.

Thank you!


MariaRigaki commented Jul 25, 2023

It seems that when the Gym environment is initialized, the following is passed as the winning reward:
winning_reward=5000.0 (line 396 in cyberbattle_env.py). If the agent reaches the goal, this value is added as a reward, overriding the value of the node. This does not happen when you play the interactive game, which returns the string "FLAG: flag discovered!" and just adds the node value, which is 1000.

Can someone please confirm that this explanation is correct, and that comparing the results is then just a matter of adding 4000 to the final reward?
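
If that is the case, something like the sketch below should make the two settings comparable by overriding the terminal bonus at creation time. This is only a sketch and assumes that gym.make forwards extra kwargs such as winning_reward to the environment constructor, and that 'CyberBattleChain-v0' with a size kwarg is the environment used by the benchmark notebook; I have not verified it against the code.

import gym

# Set the winning reward to the flag node's value (1000) instead of the 5000
# default, so that (if the winning reward indeed replaces the node value as
# described above) the terminal reward matches the interactive game.
gym_env = gym.make('CyberBattleChain-v0', size=10, winning_reward=1000.0)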

@blumu blumu added the question Further information is requested label Oct 13, 2023

blumu commented Oct 13, 2023

@MariaRigaki The explanation is correct. The attacker gets the final winning reward if one is specified when the environment is created. The reason why this does not happen in the interactive game is because in notebook_benchmark-chain.ipynb the environment that gets instantiated is CyberBattleToyCtf-v0. This environment is defined in __init__.py as:

register(
    id='CyberBattleToyCtf-v0',
    cyberbattle_env_identifiers=toy_ctf.ENV_IDENTIFIERS,
    entry_point='cyberbattle._env.cyberbattle_toyctf:CyberBattleToyCtf',
    kwargs={'defender_agent': None,
            'attacker_goal': AttackerGoal(own_atleast=6),
            'defender_goal': DefenderGoal(eviction=True)
            },
)

which does not have a winning_reward parameter.
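
For illustration only (this is not the actual code in __init__.py), adding a winning_reward entry to the kwargs of such a registration is what produces the extra terminal bonus; the id below is hypothetical:

register(
    id='CyberBattleToyCtfWithWinningReward-v0',  # hypothetical id, for illustration only
    cyberbattle_env_identifiers=toy_ctf.ENV_IDENTIFIERS,
    entry_point='cyberbattle._env.cyberbattle_toyctf:CyberBattleToyCtf',
    kwargs={'defender_agent': None,
            'attacker_goal': AttackerGoal(own_atleast=6),
            'defender_goal': DefenderGoal(eviction=True),
            'winning_reward': 5000.0,  # terminal bonus reported on top of the step rewards
            },
)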

@blumu blumu closed this as completed Oct 13, 2023