Bug Description
In netsecgame/game/coordinator.py:563, the defender's false positive penalty has an inverted sign, causing false positives to increase the defender's reward instead of decreasing it.
The reward configuration defines false_positive as a negative value (e.g., -5 in the example config). The code then subtracts the product of that negative value:
```python
# Line 563 — BUGGY
self._agent_rewards[agent] -= self._agent_false_positives[agent] * self._rewards["false_positive"]
```

With `false_positive = -5` and 2 false positives:

```python
rewards -= 2 * (-5)
rewards -= (-10)
rewards += 10  # False positives INCREASE the reward!
```
The comment on line 561 says "dicrease the reward for false positives" — confirming the intent is to penalize, but the math does the opposite.
This silently corrupts the reward signal for every defender agent that produces false positive detections, which directly undermines RL training quality. Defenders learn that false positives are beneficial rather than costly.
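The sign inversion can be demonstrated in isolation with plain arithmetic; the variable names below are illustrative and do not come from the codebase:

```python
# Standalone illustration of the sign inversion; not the real coordinator code.
false_positive_penalty = -5   # matches the example config value
num_false_positives = 2

# Buggy behavior (line 563): subtracting a negative product ADDS to the reward.
buggy_reward = 0
buggy_reward -= num_false_positives * false_positive_penalty
print(buggy_reward)   # 10 — the defender is rewarded for false positives

# Intended behavior: adding the negative product penalizes the defender.
fixed_reward = 0
fixed_reward += num_false_positives * false_positive_penalty
print(fixed_reward)   # -10
```

The magnitude of the error per episode is `2 * |false_positive| * num_false_positives`, since the reward moves the wrong way by the full penalty amount.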
Steps to Reproduce
- Start a game with the example task configuration (`false_positive: -5` in rewards)
- Register an Attacker and a Defender agent
- Have the Defender perform `BlockIP` actions that result in false positives (blocking IPs that are not actually attackers)
- Complete the episode (e.g., the Attacker times out)
- Observe the Defender's final reward: each false positive adds +5 instead of subtracting 5
Expected Behavior
False positives should decrease the defender's reward. The fix is to use `+=` instead of `-=` (since the config value is already negative), consistent with how all other rewards are applied on lines 493, 547, 550, 556, and 559:
```python
# Option A: use += with the negative config value (consistent with rest of codebase)
self._agent_rewards[agent] += self._agent_false_positives[agent] * self._rewards["false_positive"]
```

Or alternatively, keep `-=` but take the absolute value:

```python
# Option B: subtract the absolute penalty
self._agent_rewards[agent] -= self._agent_false_positives[agent] * abs(self._rewards["false_positive"])
```

Option A is preferred as it matches how step, success, and fail rewards are all applied using `+=`.
Version
1.1
Installation / Deployment Method
Running locally from source