Question about rewards in chain environment #105

MariaRigaki · 2023-07-24T17:56:18Z

Hello, I am testing the chain environment with 10 nodes in interactive mode. The minimum number of steps required to get the flag is 22, and the corresponding total_reward from the environment is 2154.0. This is achieved with 11 local exploits and 11 "connect and infect" actions.

What I don't understand is how the reported cumulative rewards in the baselines in notebook_benchmark-chain.ipynb reach 6000.0 or more. How is this possible?

In the DQL exploit run there is this printout for example:

Episode 2|Iteration 22|reward: 6099.0|last_reward_at:   22|Elapsed Time: 0:00:00||

  Episode 2 ended at t=22 
  Breakdown [Reward/NoReward (Success rate)]
    explore-local: 0/0 (NaN)
    explore-remote: 0/0 (NaN)
    explore-connect: 0/0 (NaN)
    exploit-local: 11/0 (1.00)
    exploit-remote: 0/0 (NaN)
    exploit-connect: 11/0 (1.00)
  exploit deflected to exploration: 0

Can you explain how the DQL agent gets a reward of 6099.0 in 22 steps? I would appreciate a clarification, since it is needed for a comparison in a paper.

Thank you!

MariaRigaki · 2023-07-25T07:10:18Z

It seems that when the Gym environment is initialized, the following is passed as the winning reward:
winning_reward=5000.0 (line 396 in cyberbattle_env.py). If the agent reaches the goal, this is added as the reward, overriding the value of the node. This does not happen when you play the interactive game, which returns the string "FLAG: flag discovered!" and just adds the node value, which is 1000.

Can someone please confirm that this explanation is correct and it is a matter of just adding 4000 to the final reward to compare the results?

blumu · 2023-10-13T18:10:08Z

@MariaRigaki The explanation is correct. The attacker gets the final winning reward if one is specified when the environment is created. This does not happen in the interactive game because in notebook_benchmark-chain.ipynb the environment that gets instantiated is CyberBattleToyCtf-v0. This environment is defined in __init__.py as:

register(
    id='CyberBattleToyCtf-v0',
    cyberbattle_env_identifiers=toy_ctf.ENV_IDENTIFIERS,
    entry_point='cyberbattle._env.cyberbattle_toyctf:CyberBattleToyCtf',
    kwargs={'defender_agent': None,
            'attacker_goal': AttackerGoal(own_atleast=6),
            'defender_goal': DefenderGoal(eviction=True)
            },
)

which does not have a winning_reward parameter.
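
For anyone doing the same comparison, here is a minimal sketch of the adjustment discussed above. It assumes, as described in this thread, that the Gym run replaces the flag node's value (1000) with winning_reward (5000.0) on the final step. The helper to_gym_total is hypothetical and not part of CyberBattleSim.

```python
# Hypothetical helper, not part of CyberBattleSim: convert an interactive-mode
# total into the total a Gym run would report under the explanation above.

FLAG_NODE_VALUE = 1000.0   # value of the flag node, as stated in the thread
WINNING_REWARD = 5000.0    # winning_reward cited from cyberbattle_env.py


def to_gym_total(interactive_total: float,
                 node_value: float = FLAG_NODE_VALUE,
                 winning_reward: float = WINNING_REWARD) -> float:
    """Swap the final node value for the winning reward (a net +4000 here)."""
    return interactive_total - node_value + winning_reward


interactive_total = 2154.0   # 22-step interactive run from the question
reported_by_dql = 6099.0     # Episode 2 in the benchmark printout

adjusted = to_gym_total(interactive_total)
print(adjusted)                    # 6154.0
print(adjusted - reported_by_dql)  # 55.0
```

Under these assumptions the adjusted interactive total is 6154.0, which is 55 more than the 6099.0 in the printout. The thread does not account for that remaining difference, so it may be worth checking before relying on a flat +4000 correction.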

blumu added the "question" label (Further information is requested) Oct 13, 2023

blumu closed this as completed Oct 13, 2023
