
The repeated discovery of the same node via different methods leads to reward #18

Closed
jaromiru opened this issue Jun 22, 2021 · 9 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@jaromiru
Contributor

jaromiru commented Jun 22, 2021

While experimenting with a randomly generated CyberBattleRandom-v0 environment and an agent taking random actions, I noticed that if the same node is discovered with different methods (e.g., traceroute / shared files), each discovery yields a repeated reward.

This seems like incorrect behaviour, because the agent is incentivized to repeatedly rediscover the same nodes with different actions.

@jaromiru jaromiru changed the title The repeated discovery of the same node via different methods lead to reward The repeated discovery of the same node via different methods leads to reward Jun 22, 2021
@blumu
Contributor

blumu commented Jun 22, 2021

@jaromiru That's indeed the current behaviour. We currently only prevent repeated reward when the same attack action is used to discover the node:

return True, ActionResult(reward=0.0 if already_executed else SUCCEEDED_ATTACK_REWARD - vulnerability.cost,

We could address this by returning a reward of 0 when all the nodes revealed by the attack have already been discovered by the agent, though I am not sure what the returned reward should be when only some of the revealed nodes were already known. One option is to return a fraction of the reward, given by the number of newly discovered nodes divided by the number of nodes revealed by the attack. Alternatively, we could use the intrinsic value assigned to each node to weigh the importance of each newly discovered node when calculating the reward. What do you think?
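
The count-based fraction described above could be sketched as follows (all names here are illustrative, not the actual CyberBattleSim API; `SUCCEEDED_ATTACK_REWARD` and the cost handling mirror the quoted snippet only loosely):

```python
# Hypothetical sketch of the count-based fractional reward.
# `revealed` is the set of node ids exposed by the attack,
# `discovered` the set of nodes the agent already knows about.
SUCCEEDED_ATTACK_REWARD = 50.0  # assumed constant for illustration

def fractional_discovery_reward(revealed: set, discovered: set, cost: float) -> float:
    if not revealed:
        return 0.0
    newly = revealed - discovered
    # Scale the attack reward by the share of nodes that are actually new;
    # if every revealed node was already known, the fraction is 0.
    fraction = len(newly) / len(revealed)
    return fraction * SUCCEEDED_ATTACK_REWARD - cost
```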

@jaromiru
Contributor Author

Ok, let's take a step back. In my opinion, the reward should be based on the results of an action, not on whether an attack was successful. The network itself can change (this is allowed explicitly, e.g., in the ExternalRandomEvents DefenderAgent), so the same attack can actually produce different results over time.

In the case of discovered nodes, the reward should also be additive, i.e., the total reward for discovering several nodes should be the same regardless of the exact steps taken to discover them. Neither of the proposed solutions seems right. The first does not cope with a dynamically changing network, nor with the chance that nodes are only probabilistically discovered. The second leaks information about the nodes' values to the agent through the reward. A straightforward solution seems to be simply a fixed reward for every newly discovered node.
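
The additivity property argued for here can be sketched with a fixed per-node reward (names and the constant are illustrative only):

```python
# Sketch of a fixed per-node discovery reward: the total reward for
# discovering a given set of nodes is the same regardless of which
# attacks revealed them or in what order.
NEW_NODE_REWARD = 10.0  # assumed constant for illustration

def discovery_reward(revealed: set, discovered: set) -> float:
    newly = revealed - discovered
    discovered |= newly  # record them so they are never rewarded twice
    return NEW_NODE_REWARD * len(newly)
```

Running two overlapping attacks in either order yields the same total, which is exactly the order-independence the fractional schemes lack.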

Since this is quite a substantial design decision, though, it would be better if more competent people gave it some thought.

@blumu
Contributor

blumu commented Jun 24, 2021

@jaromiru Good points. Alternatively we could also have the environment assign a 'discovery value' to every node in addition to their intrinsic value, and use that instead of a fixed reward when the node gets discovered the first time.

@jaromiru
Contributor Author

I thought more about giving a fraction of the intrinsic value as the discovery reward. It leaks some information, true, but that is not necessarily a bad thing: it points the agent in the direction of high-value targets. And compared to the 'discovery value', it is one less parameter to worry about. So for a game-like environment, as this is, it seems all right.

@blumu blumu added bug Something isn't working enhancement New feature or request labels Jun 24, 2021
@blumu
Contributor

blumu commented Jun 24, 2021

@jaromiru Fair point, though contrary to what I proposed earlier, the fraction should not be calculated as value(nodes_newly_discovered) / value(nodes_revealed_by_attack), because that would make the total reward depend on the order in which the attacks are executed.

For example, suppose you have two attacks A1 and A2 where:

  • A1 reveals nodes 1, 2 with respective value 1, 2;
  • A2 reveals node 2, 3 with respective value 2, 3.

Then A1 followed by A2 yields a total reward of 1 + 3/5
while A2 followed by A1 yields a total reward of 1 + 1/3.
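
The example above can be checked mechanically; this small sketch (hypothetical names, exact rational arithmetic) reproduces both totals:

```python
# Verify the order-dependence of the value-weighted fraction
# value(newly_discovered) / value(nodes_revealed_by_attack).
from fractions import Fraction

value = {1: 1, 2: 2, 3: 3}
A1, A2 = {1, 2}, {2, 3}

def total_reward(order):
    discovered, total = set(), Fraction(0)
    for attack in order:
        newly = attack - discovered
        total += Fraction(sum(value[n] for n in newly),
                          sum(value[n] for n in attack))
        discovered |= attack
    return total

# total_reward([A1, A2]) -> 8/5  (i.e., 1 + 3/5)
# total_reward([A2, A1]) -> 4/3  (i.e., 1 + 1/3)
```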

@blumu
Contributor

blumu commented Jun 24, 2021

In PR #26 that I just submitted, the intrinsic value of the newly discovered nodes gets returned as the reward.
Would this be a reasonable solution?

@jaromiru
Contributor Author

jaromiru commented Jun 25, 2021

I thought you were proposing simply [c * sum(intrinsic value of newly discovered nodes)], with some c < 1. That's additive and reasonable. Is that what you implemented?

@blumu
Contributor

blumu commented Jun 25, 2021

@jaromiru Yes, that's what I implemented, except that currently c = 1. I also added a fixed reward for newly discovered credentials and node properties.
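
The scheme as described here could be sketched like this (identifiers and the two bonus constants are illustrative, not the actual code from PR #26):

```python
# Sketch of the implemented reward: the summed intrinsic value of newly
# discovered nodes (with c = 1), plus fixed bonuses for newly discovered
# credentials and node properties.
NEW_CREDENTIAL_REWARD = 3.0  # assumed constant for illustration
NEW_PROPERTY_REWARD = 2.0    # assumed constant for illustration

def outcome_reward(new_node_values: list,
                   new_credentials: int,
                   new_properties: int) -> float:
    return (sum(new_node_values)
            + NEW_CREDENTIAL_REWARD * new_credentials
            + NEW_PROPERTY_REWARD * new_properties)
```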

@jaromiru
Contributor Author

Ok, I think we can close this for now.
