
The repeated discovery of the same node via different methods leads to reward #18

Closed
jaromiru opened this issue Jun 22, 2021 · 9 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@jaromiru
Contributor

jaromiru commented Jun 22, 2021

While experimenting with a randomly generated CyberBattleRandom-v0 environment and an agent taking random actions, I noticed that if the same node is discovered with different methods (e.g., traceroute / shared files), each discovery yields a repeated reward.

This seems like incorrect behaviour, because the agent is incentivized to repeatedly rediscover the same nodes with different actions.

@jaromiru jaromiru changed the title The repeated discovery of the same node via different methods lead to reward The repeated discovery of the same node via different methods leads to reward Jun 22, 2021
@blumu
Contributor

blumu commented Jun 22, 2021

@jaromiru That's indeed the current behaviour. We currently only prevent repeated reward when the same attack action is used to discover the node:

return True, ActionResult(reward=0.0 if already_executed else SUCCEEDED_ATTACK_REWARD - vulnerability.cost,

We could address this by returning a reward of 0 when all the nodes revealed by the attack have already been discovered by the agent, though I am not sure what the returned reward should be when only some of the revealed nodes were already known. One option is to return a fraction of the reward, given by the number of newly discovered nodes divided by the number of nodes revealed by the attack. Alternatively, we could use the intrinsic value assigned to each node to weigh the importance of each newly discovered node when calculating the reward. What do you think?
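
The count-based fraction described above could be sketched as follows (all names here are illustrative, not the actual CyberBattleSim API; `SUCCEEDED_ATTACK_REWARD` and the cost handling mirror the quoted snippet only loosely):

```python
# Hypothetical sketch of the count-based fractional reward.
# `revealed` is the set of node ids exposed by the attack,
# `discovered` the set of nodes the agent already knows about.
SUCCEEDED_ATTACK_REWARD = 50.0  # assumed constant for illustration

def fractional_discovery_reward(revealed: set, discovered: set, cost: float) -> float:
    if not revealed:
        return 0.0
    newly = revealed - discovered
    # Scale the attack reward by the share of nodes that are actually new;
    # if every revealed node was already known, the fraction is 0.
    fraction = len(newly) / len(revealed)
    return fraction * SUCCEEDED_ATTACK_REWARD - cost
```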

@jaromiru
Contributor Author

Ok, let's take a step back. In my opinion, the reward should be based on the results of an action, not on whether an attack was successful. The network itself can change (this is allowed explicitly, e.g., in the ExternalRandomEvents DefenderAgent), so the same attack can actually produce different results over time.

In the case of discovered nodes, the reward should also be additive, i.e., the total reward for discovering several nodes should be the same regardless of the exact steps taken to discover them. Neither of the proposed solutions seems right. The first does not cope with a dynamically changing network, nor with the chance that nodes are only probabilistically discovered. The second leaks information about the nodes' values to the agent through the reward. A straightforward solution seems to be simply a fixed reward for every newly discovered node.
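
The additivity property argued for here can be sketched with a fixed per-node reward (names and the constant are illustrative only):

```python
# Sketch of a fixed per-node discovery reward: the total reward for
# discovering a given set of nodes is the same regardless of which
# attacks revealed them or in what order.
NEW_NODE_REWARD = 10.0  # assumed constant for illustration

def discovery_reward(revealed: set, discovered: set) -> float:
    newly = revealed - discovered
    discovered |= newly  # record them so they are never rewarded twice
    return NEW_NODE_REWARD * len(newly)
```

Running two overlapping attacks in either order yields the same total, which is exactly the order-independence the fractional schemes lack.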

Since this is quite a substantial design decision, though, it would be better if more competent people gave it some thought.

@blumu
Contributor

blumu commented Jun 24, 2021

@jaromiru Good points. Alternatively we could also have the environment assign a 'discovery value' to every node in addition to their intrinsic value, and use that instead of a fixed reward when the node gets discovered the first time.

@jaromiru
Contributor Author

I thought more about giving a fraction of the intrinsic value as the discovery reward. It leaks some information, true, but that is not necessarily a bad thing: it points the agent in the direction of high-value targets. And compared to the 'discovery value', it is one less parameter to worry about. So for a game-like environment, as this is, it seems all right.

@blumu blumu added bug Something isn't working enhancement New feature or request labels Jun 24, 2021
@blumu
Contributor

blumu commented Jun 24, 2021

@jaromiru Fair point, though contrary to what I proposed earlier, the fraction should not be calculated as value(nodes_newly_discovered) / value(nodes_revealed_by_attack), because that would make the total reward depend on the order in which the attacks are executed.

For example, suppose you have two attacks A1 and A2 where:

  • A1 reveals nodes 1, 2 with respective value 1, 2;
  • A2 reveals node 2, 3 with respective value 2, 3.

Then A1 followed by A2 yields a total reward of 1 + 3/5
while A2 followed by A1 yields a total reward of 1 + 1/3.
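
The example above can be checked mechanically; this small sketch (hypothetical names, exact rational arithmetic) reproduces both totals:

```python
# Verify the order-dependence of the value-weighted fraction
# value(newly_discovered) / value(nodes_revealed_by_attack).
from fractions import Fraction

value = {1: 1, 2: 2, 3: 3}
A1, A2 = {1, 2}, {2, 3}

def total_reward(order):
    discovered, total = set(), Fraction(0)
    for attack in order:
        newly = attack - discovered
        total += Fraction(sum(value[n] for n in newly),
                          sum(value[n] for n in attack))
        discovered |= attack
    return total

# total_reward([A1, A2]) -> 8/5  (i.e., 1 + 3/5)
# total_reward([A2, A1]) -> 4/3  (i.e., 1 + 1/3)
```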

@blumu
Contributor

blumu commented Jun 24, 2021

In PR #26 that I just submitted, the intrinsic value of the newly discovered nodes gets returned as the reward.
Would this be a reasonable solution?

@jaromiru
Contributor Author

jaromiru commented Jun 25, 2021

I thought you were proposing simply [c * sum(intrinsic value of newly discovered nodes)], with some c < 1. That's additive and reasonable. Is that what you implemented?

@blumu
Contributor

blumu commented Jun 25, 2021

@jaromiru Yes, that's what I implemented, except that currently c = 1. I also added a fixed reward for newly discovered credentials and node properties.
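
The scheme as described here could be sketched like this (identifiers and the two bonus constants are illustrative, not the actual code from PR #26):

```python
# Sketch of the implemented reward: the summed intrinsic value of newly
# discovered nodes (with c = 1), plus fixed bonuses for newly discovered
# credentials and node properties.
NEW_CREDENTIAL_REWARD = 3.0  # assumed constant for illustration
NEW_PROPERTY_REWARD = 2.0    # assumed constant for illustration

def outcome_reward(new_node_values: list,
                   new_credentials: int,
                   new_properties: int) -> float:
    return (sum(new_node_values)
            + NEW_CREDENTIAL_REWARD * new_credentials
            + NEW_PROPERTY_REWARD * new_properties)
```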

@jaromiru
Contributor Author

Ok, I think we can close this for now.
