## Scenario Descriptions

### CAGE Challenge 2

Your firm has been contracted by Florin to trial your new autonomous defence agents. Florin have given your agent authority to defend the computer network at one of their manufacturing plants. The network, shown in Figure 1, contains a user network for staff, an enterprise network, and an operational network which contains the key manufacturing and logistics servers. The defence agent receives an alert feed from Florin’s existing monitoring systems. It can run detailed analyses on hosts, then remove malicious software if found. If an attacker is too well established on a host to be removed, the host can be restored from a clean backup. Finally, the defence agent can deceive attackers by creating decoy services.

The network owner has undertaken an evaluation of the factory systems and contracted your firm to defend these systems according to the following criteria:

- Maintain the critical operational server to ensure information about the new weapon system is not revealed to Guilder and the production and delivery of the new weapon system remains on schedule.
- Where possible, maintain enterprise servers to ensure day-to-day operations of the manufacturing plant are not disrupted or revealed.

Subnet 1 consists of user hosts that are not critical. \
Subnet 2 consists of enterprise servers designed to support the user activities on Subnet 1. \
Subnet 3 contains the critical operational server and three user hosts.

<img src="https://github.com/cage-challenge/cage-challenge-2/raw/main/images/figure1.png">
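
As a rough sketch, the three subnets can be written down as a plain mapping from subnet to hostnames. The hostnames are taken from the IP map printed later in this notebook; the subnet labels and the helper `subnet_of` are illustrative assumptions rather than CybORG identifiers, and placing `Defender` on the enterprise subnet is inferred from its 10.0.120.x address in that output.

```python
# Hostnames as printed in the IP map later in this notebook.
# Subnet labels are illustrative, not CybORG identifiers.
SUBNETS = {
    "User": ["User0", "User1", "User2", "User3", "User4"],
    "Enterprise": ["Enterprise0", "Enterprise1", "Enterprise2", "Defender"],
    "Operational": ["Op_Server0", "Op_Host0", "Op_Host1", "Op_Host2"],
}


def subnet_of(hostname):
    """Return the subnet label that contains hostname, or None if it is unknown."""
    for subnet, hosts in SUBNETS.items():
        if hostname in hosts:
            return subnet
    return None


print(subnet_of("Op_Server0"))  # Operational
```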

The effect of each action on the state of a targeted host is summarized in the diagram below.

<img src="https://github.com/cage-challenge/cage-challenge-2/raw/main/images/figure2.png" caption="State Diagram">

### Appendix - Action Sets

Blue Action Sets
    
| Action     | Purpose                                                                                                                                                                                                                                                 | Parameters                                                                    | Output                                 |
|:-----------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|:---------------------------------------|
| Monitor | Collection of information about flagged malicious activity on the system\. Corresponds to action ID 1: Scan in the OpenC2 specification[^3]\.                                                                                                           | None *\(Note: This action occurs automatically if another action is chosen\)* | Network connections and associated processes that are identified as malicious\. |
| Analyse | Collection of further information on a specific host to enable blue to better identify if red is present on the system\. Corresponds to action ID 30: Investigate in the OpenC2 specification\.                                                         | Hostname                                                                      | Information on files associated with recent alerts including signature and entropy\.  |
| DecoyApache, DecoyFemitter, DecoyHarakaSMPT, DecoySmss, DecoySSHD, DecoySvchost, DecoyTomcat | Setup of a decoy service (as specified by the action name) on a specified host\. Green agents do not access these services, so any access is a clear example of red activity\.                                                                                                            | Hostname                                                                | An alert if the red agent accesses the new service\. |
| Remove | Attempting to remove red from a host by destroying malicious processes, files and services\. This action attempts to stop all processes identified as malicious by the monitor action\. Corresponds to action ID 10: Stop in the OpenC2 specification\. | Hostname                                                                      | Success/Failure |
| Restore | Restoring a system to a known good state\. This has significant consequences for system availability\. This action punishes Blue by \-1\. Corresponds to action ID 23: Restore in the OpenC2 specification\.                                            | Hostname                                                                      | Success/Failure  |


Red Action Sets
    
| Action     | Purpose                                                                                                                                                    | Parameters       | Output                                 |
|:-----------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|:---------------------------------------|
| Discover Remote Systems | ATT&CK[^4] Technique T1018 Remote System Discovery\. Discovers new hosts/IP addresses in the network through active scanning using tools such as ping\.    | Subnet           | IP addresses in the chosen subnet from hosts that respond to ping |
| Discover Network Services | ATT&CK Technique T1046 Network Service Scanning\. Discovers responsive services on a selected host by initiating a connection with that host\.             | IP Address       | Ports and service information |
| Exploit Network Services | ATT&CK Technique T1210 Exploitation of Remote Services\. This action attempts to exploit a specified service on a remote system\.                          | IP Address, Port | Success/Failure <br /> Initial recon of host if successful. |
| Escalate | ATT&CK Tactic TA0004 Privilege Escalation\. This action escalates the agent’s privilege on the host\.                                                      | Hostname         | Success/Failure <br /> Internal information now available due to increased access to the host |
| Impact | ATT&CK Technique T1489 Service Stop\. This action disrupts the performance of the network and fulfils red’s objective of denying the operational service\. | Hostname         | Success/Failure  |

[^3]: Open Command and Control \(OpenC2\), [https://openc2\.org/](https://openc2\.org/)

[^4]: MITRE ATT&CK, [https://attack\.mitre\.org/](https://attack\.mitre\.org/)
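
One way to read the two action tables together is as transitions on a per-host compromise state. The sketch below is an assumption-laden illustration, not CybORG's implementation: the state names are invented here, every action is assumed to succeed, Remove is assumed to clear only user-level access (the scenario text says a well-established attacker needs a restore), and the state a host returns to after Remove or Restore is a guess.

```python
from enum import Enum


class HostState(Enum):
    # Illustrative state names; they are not taken from CybORG.
    UNKNOWN = 0
    KNOWN = 1
    SCANNED = 2
    USER_ACCESS = 3
    PRIVILEGED_ACCESS = 4


# (action name from the tables, current state) -> next state, assuming success.
TRANSITIONS = {
    ("Discover Remote Systems", HostState.UNKNOWN): HostState.KNOWN,
    ("Discover Network Services", HostState.KNOWN): HostState.SCANNED,
    ("Exploit Network Services", HostState.SCANNED): HostState.USER_ACCESS,
    ("Escalate", HostState.USER_ACCESS): HostState.PRIVILEGED_ACCESS,
    ("Remove", HostState.USER_ACCESS): HostState.SCANNED,
    ("Restore", HostState.USER_ACCESS): HostState.SCANNED,
    ("Restore", HostState.PRIVILEGED_ACCESS): HostState.SCANNED,
}


def apply_action(state, action):
    """Return the next state; pairs not listed leave the state unchanged."""
    return TRANSITIONS.get((action, state), state)


state = HostState.UNKNOWN
for action in ["Discover Remote Systems", "Discover Network Services",
               "Exploit Network Services", "Escalate", "Remove", "Restore"]:
    state = apply_action(state, action)
    print(f"{action:28s} -> {state.name}")
```

In this reading, Remove against a host with privileged access is a wasted turn, which is why a purely reactive Remove policy can fall behind a fast attacker.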

### Agents

- Rule-based agents
- Deep Reinforcement Learning agents

### Rule-based agents

- Blue
    - RandomAgent: Takes either a random action or a test action, depending on the epsilon value.
    - BlueReactRemoveAgent: Adds a host to its list of suspicious hosts when Monitor reports activity on it, then uses Remove to clear the malicious program (session) from that host.
    - BlueReactRestoreAgent: Follows the same steps, but restores the host from a clean backup instead of removing the session.
- Red
    - RedMeanderAgent: Explores the network one subnet at a time, seeking to gain privileged access to all hosts in a subnet before moving on to the next one, eventually arriving at the Operational Server.
    - B_lineAgent: Attempts to move straight to the Operational Server using prior knowledge of the network layout.
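
The reactive blue agents described above share one pattern: queue hosts that Monitor flags, then respond to them one at a time. The following is a simplified sketch of that pattern only. The class name, the list-of-hostnames observation format, and the tuple-shaped actions are hypothetical stand-ins and do not match CybORG's observation dictionaries or action classes.

```python
class ReactiveBlueSketch:
    """Queue flagged hosts and respond to one per turn (illustrative only)."""

    def __init__(self, response="Remove"):
        self.response = response  # "Remove" or "Restore"
        self.host_list = []       # hosts waiting for a response

    def get_action(self, flagged_hosts):
        # Queue any newly flagged host that is not already waiting.
        for host in flagged_hosts:
            if host not in self.host_list:
                self.host_list.append(host)
        # Nothing suspicious: keep monitoring.
        if not self.host_list:
            return ("Monitor", None)
        # Otherwise respond to the oldest flagged host.
        return (self.response, self.host_list.pop(0))

    def end_episode(self):
        self.host_list = []


agent = ReactiveBlueSketch(response="Remove")
print(agent.get_action([]))                        # ('Monitor', None)
print(agent.get_action(["User1", "Enterprise0"]))  # ('Remove', 'User1')
print(agent.get_action([]))                        # ('Remove', 'Enterprise0')
```

Swapping `response="Restore"` gives the restore variant; the queueing logic is unchanged.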

In [1]:
# %pip list

In [1]:
import subprocess
import inspect
import time
import os
from statistics import mean, stdev
import random
import collections
from pprint import pprint

# import matplotlib.pyplot as plt
import networkx as nx
from networkx import connected_components
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd

from CybORG import CybORG, CYBORG_VERSION

from CybORG.Agents import B_lineAgent, BlueReactRestoreAgent, BlueReactRemoveAgent, \
     RedMeanderAgent, SleepAgent
from CybORG.Agents.MainAgent import MainAgent

from CybORG.Agents.Wrappers.ChallengeWrapper import ChallengeWrapper
# from CybORG.Agents.Wrappers.ChallengeWrapper2 import ChallengeWrapper2
from CybORG.Agents.Wrappers import EnumActionWrapper
from CybORG.Agents.Wrappers.FixedFlatWrapper import FixedFlatWrapper
from CybORG.Agents.Wrappers.IntListToAction import IntListToActionWrapper
from CybORG.Agents.Wrappers.OpenAIGymWrapper import OpenAIGymWrapper
# from CybORG.Simulator.Scenarios.FileReaderScenarioGenerator import FileReaderScenarioGenerator

from CybORG.Tutorial.Visualizers import NetworkVisualizer
from CybORG.Tutorial.GameStateManager import GameStateManager
from CybORG.Tutorial.Mininet import MininetAdapter

In [2]:
# Diagrams examples: https://diagrams.mingrammer.com/docs/getting-started/examples
# from diagrams import Diagram
# from diagrams.aws.compute import EC2
# from diagrams.aws.database import RDS
# from diagrams.aws.network import ELB

# with Diagram("Web Service", show=False):
#     ELB("lb") >> EC2("web") >> RDS("userdb")

## A Runner

In [3]:
# from CybORG.Tutorial.CybORG_As_A_Service import InteractiveCybORG

In [4]:
## Test InteractiveCybORG

## Visualization

- input 
    - Observation from Red and Blue
    - Action for Blue / Red?
    - link_diagram

- Mouse-over event: observations
- Different shapes
- Different states

In [5]:
MAX_EPS = 1
agent_name = 'Blue'
random.seed(0)
cyborg_version = CYBORG_VERSION
scenario = 'Scenario2'

def wrap(env):
    # return ChallengeWrapper2(env=env, agent_name='Blue')
    return ChallengeWrapper(env=env, agent_name='Blue')

In [6]:
def main():
    cyborg_version = CYBORG_VERSION
    scenario = 'Scenario2'
    # commit_hash = get_git_revision_hash()
    commit_hash = "Not using git"
    # ask for a name
    name = "John Hannay"
    # ask for a team
    team = "CardiffUni"
    # ask for a name for the agent
    name_of_agent = "PPO + Greedy decoys"

    lines = inspect.getsource(wrap)
    wrap_line = lines.split('\n')[1].split('return ')[1]

    # Change this line to load your agent
    agent = MainAgent()
    
    print(f'Using agent {agent.__class__.__name__}, if this is incorrect please update the code to load in your agent')

    path = str(inspect.getfile(CybORG))
    # path = path[:-7] + f'/Simulator/Scenarios/scenario_files/Scenario2.yaml'
    path = path[:-10] + '/Shared/Scenarios/Scenario2.yaml'

    print(f'using CybORG v{cyborg_version}, {scenario}\n')
    
    # game manager initialization
    game_state_manager = GameStateManager()
    mininet_adapter = MininetAdapter()

    for num_steps in [3]:
        for red_agent in [B_lineAgent]:
            # red_agent = red_agent()
            cyborg = CybORG(path, 'sim', agents={'Red': red_agent})
            wrapped_cyborg = wrap(cyborg)

            observation = wrapped_cyborg.reset()
            # observation = cyborg.reset().observation

            # set up game_state_manager
            # game_state_manager.set_environment(cyborg=cyborg,
            #                                    red_agent=red_agent,
            #                                    blue_agent=agent,
            #                                    num_steps=num_steps)

            # Reset mininet adapter 
            mininet_adapter.set_environment(cyborg=cyborg)
            mininet_adapter.reset()
            
            action_space = wrapped_cyborg.get_action_space(agent_name)
            # action_space = cyborg.get_action_space(agent_name)
            total_reward = []
            actions = []
            for i in range(MAX_EPS):
                r = []
                a = []
                
                # cyborg.env.env.tracker.render()
                for j in range(num_steps):
                    action = agent.get_action(observation, action_space)
                    observation, rew, done, info = wrapped_cyborg.step(action)
                    # result = cyborg.step(agent_name, action)
                    r.append(rew)
                    # r.append(result.reward)
                    a.append((str(cyborg.get_last_action('Blue')), str(cyborg.get_last_action('Red'))))
                    pprint(a)
                    # create state for this step
                    # state_snapshot = game_state_manager.create_state_snapshot()
                    # game manager store state
                    # game_state_manager.store_state(state_snapshot, i, j)
                    mininet_adapter.perform_emulation()

                # game manager reset
                agent.end_episode()
                total_reward.append(sum(r))
                actions.append(a)
                # observation = cyborg.reset().observation
                observation = wrapped_cyborg.reset()
                # game_state_manager.reset()
                mininet_adapter.reset()
        
        mininet_adapter.clean()
    return game_state_manager.get_game_state()
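
The `path[:-10]` slice in `main()` only works if the file path ends in exactly `/CybORG.py` (ten characters). A slightly more robust sketch of the same derivation takes the parent directory instead. The helper name `scenario_path` and the example install location are hypothetical.

```python
from pathlib import PurePosixPath


def scenario_path(cyborg_file, relative="Shared/Scenarios/Scenario2.yaml"):
    """Build the scenario path next to the module file, without counting characters."""
    return str(PurePosixPath(cyborg_file).parent / relative)


example = "/opt/CybORG/CybORG/CybORG.py"  # hypothetical install location
# Same result as the slice-based version used in main().
assert scenario_path(example) == example[:-10] + "/Shared/Scenarios/Scenario2.yaml"
print(scenario_path(example))
```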

In [7]:
# %%capture
game_state_pretained = main()

Using agent MainAgent, if this is incorrect please update the code to load in your agent
using CybORG v2.1, Scenario2

Cleaned up the topology successfully
{'Defender': IPv4Address('10.0.120.156'),
 'Enterprise0': IPv4Address('10.0.120.152'),
 'Enterprise1': IPv4Address('10.0.120.158'),
 'Enterprise2': IPv4Address('10.0.120.155'),
 'Op_Host0': IPv4Address('10.0.110.38'),
 'Op_Host1': IPv4Address('10.0.110.42'),
 'Op_Host2': IPv4Address('10.0.110.46'),
 'Op_Server0': IPv4Address('10.0.110.34'),
 'User0': IPv4Address('10.0.214.186'),
 'User1': IPv4Address('10.0.214.187'),
 'User2': IPv4Address('10.0.214.182'),
 'User3': IPv4Address('10.0.214.180'),
 'User4': IPv4Address('10.0.214.189')}
An error occurred while creating the YAML file:
'Enterprise_router'


Traceback (most recent call last):
  File "/home/ubuntu/justinyeh1995/CASTLEGym/CybORG-v2/CybORG/CybORG/Tutorial/Mininet/Adapter.py", line 93, in _create_yaml
    self.topology_data['topo']['lans'] = get_lans_info(self.cyborg, self.cyborg_to_mininet_name_map)
  File "/home/ubuntu/justinyeh1995/CASTLEGym/CybORG/CybORG/Tutorial/Mininet/utils/util.py", line 34, in get_lans_info
    router_ip = str(cyborg.get_ip_map()[f'{lan_name}_router'])
KeyError: 'Enterprise_router'


Mininet Topology Created Successfully:


KeyError: '10.0.120.152'
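
The traceback above shows `get_lans_info` indexing the IP map with `f'{lan_name}_router'`, while the printed map contains no router entries at all. When debugging this kind of failure it can help to collect every missing key in one pass instead of stopping at the first `KeyError`. The helper below is a hypothetical sketch, not part of CybORG or the Mininet adapter; the LAN names other than `Enterprise` are assumptions.

```python
def find_router_ips(ip_map, lan_names):
    """Split LANs into those with a '<lan>_router' entry and those without."""
    found, missing = {}, []
    for lan in lan_names:
        key = f"{lan}_router"
        if key in ip_map:
            found[lan] = str(ip_map[key])
        else:
            missing.append(key)
    return found, missing


# A few entries copied from the printed IP map; note there are no router keys.
ip_map = {
    "Enterprise0": "10.0.120.152",
    "Op_Server0": "10.0.110.34",
    "User0": "10.0.214.186",
}
found, missing = find_router_ips(ip_map, ["User", "Enterprise", "Operational"])
print(found)    # {}
print(missing)  # ['User_router', 'Enterprise_router', 'Operational_router']
```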

In [None]:
nv = NetworkVisualizer(game_state_pretained)
# nv.set_game_state(game_state_pretained)
nv.plot(save=False)

In [None]:
def main_simple_agent():
    cyborg_version = CYBORG_VERSION
    scenario = 'Scenario2'
    # commit_hash = get_git_revision_hash()
    commit_hash = "Not using git"
    # ask for a name
    name = "John Hannay"
    # ask for a team
    team = "CardiffUni"
    # ask for a name for the agent
    name_of_agent = "PPO + Greedy decoys"

    lines = inspect.getsource(wrap)
    wrap_line = lines.split('\n')[1].split('return ')[1]

    # Change this line to load your agent
    agent = BlueReactRemoveAgent()
    
    print(f'Using agent {agent.__class__.__name__}, if this is incorrect please update the code to load in your agent')

    path = str(inspect.getfile(CybORG))
    # path = path[:-7] + f'/Simulator/Scenarios/scenario_files/Scenario2.yaml'
    path = path[:-10] + '/Shared/Scenarios/Scenario2.yaml'

    # sg = FileReaderScenarioGenerator(path)

    print(f'using CybORG v{cyborg_version}, {scenario}\n')
    
    # game manager initialization
    game_state_manager = GameStateManager()
    # mininet adapter initialization
    mininet_adapter = MininetAdapter()

    
    for num_steps in [5]:
        for red_agent in [B_lineAgent]:
            red_agent = red_agent()
            cyborg = CybORG(path, 'sim', agents={'Red': red_agent})

            observation = cyborg.reset()
            # print('observation is:',observation)
            
            # Set up and reset game_state_manager
            game_state_manager.set_environment(cyborg=cyborg,
                                               red_agent=red_agent,
                                               blue_agent=agent,
                                               num_steps=num_steps)
            game_state_manager.reset()


            # Reset mininet adapter 
            mininet_adapter.set_environment(cyborg=cyborg)
            mininet_adapter.reset()
            
            
            action_space = cyborg.get_action_space(agent_name)

            total_reward = []
            actions = []
            for i in range(MAX_EPS):
                r = []
                a = []
                
                # cyborg.env.env.tracker.render()
                for j in range(num_steps):
                    blue_action_space = cyborg.get_action_space('Blue')
                    blue_obs = cyborg.get_observation('Blue') # get the newest observation
                    blue_action = agent.get_action(blue_obs, blue_action_space)
                    # pprint(blue_action)
                        
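                    result = cyborg.step('Blue', blue_action, skip_valid_action_check=False)
                    # record this step's reward and actions so the episode totals below are not empty
                    r.append(result.reward)
                    a.append((str(cyborg.get_last_action('Blue')), str(cyborg.get_last_action('Red'))))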
                    
                    # create state for this step
                    state_snapshot = game_state_manager.create_state_snapshot()
                    # game manager store state
                    game_state_manager.store_state(state_snapshot, i, j)

                    # The adapter should pass the action as a param
                    mininet_adapter.perform_emulation()
                    # pprint(mininet_adapter)

                    
                # game manager reset
                agent.end_episode()
                total_reward.append(sum(r))
                actions.append(a)
                # observation = cyborg.reset().observation
                observation = cyborg.reset()
                # game state manager reset
                game_state_manager.reset()
                # mininet adapter reset
                mininet_adapter.reset()
        
        mininet_adapter.clean()

    return game_state_manager.get_game_state()
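
Both runner functions build a `total_reward` list with one summed reward per episode, and the notebook imports `mean` and `stdev` without using them. A minimal sketch of how those per-episode totals could be summarized follows; the helper name `summarize_rewards` is hypothetical. `stdev` needs at least two values, so the single-episode case (`MAX_EPS = 1`) is handled separately.

```python
from statistics import mean, stdev


def summarize_rewards(total_reward):
    """Return (mean, sample standard deviation) of per-episode reward totals."""
    if not total_reward:
        raise ValueError("no episodes recorded")
    # stdev raises StatisticsError for fewer than two data points.
    spread = stdev(total_reward) if len(total_reward) > 1 else 0.0
    return mean(total_reward), spread


avg, spread = summarize_rewards([-3.0])
print(f"Average reward: {avg} +/- {spread}")
```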

In [24]:
%%capture
# game_state_simple = main_simple_agent()


In [25]:
# game_state_simple

In [26]:
#nv = NetworkVisualizer(game_state_simple)
#nv.plot(save=False)

In [None]:
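def get_node_color(node, discovered_subnets=None, discovered_systems=None, exploited_hosts=None, escalated_hosts=None):
    # Treat omitted collections as empty so the membership tests below cannot fail on None
    discovered_subnets = discovered_subnets or ()
    discovered_systems = discovered_systems or ()
    exploited_hosts = exploited_hosts or ()
    escalated_hosts = escalated_hosts or ()

    color = "green"  # default colour
    
    if 'router' in node and node in discovered_subnets:
        color = 'rosybrown'
    
    if node in discovered_systems:
        color = "lightgreen"
    
    # Escalated hosts take precedence over exploited hosts
    if node in escalated_hosts:
        color = "red"
    elif node in exploited_hosts:
        color = "orange"
    
    return color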

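def get_node_border(node, target_host=None, reset_host=None):
    # Treat omitted collections as empty so the membership tests cannot fail on None
    target_host = target_host or ()
    reset_host = reset_host or ()

    if node in target_host:
        border = dict(width=2, color='#99004C')
    elif node in reset_host:
        border = dict(width=2, color='blue')
    else:
        border = dict(width=0, color='white')
    return border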
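
The precedence that `get_node_color` encodes through its order of `if` statements (escalated over exploited over discovered systems over discovered subnets) can also be written as an ordered rule table, which may be easier to extend. This is an alternative sketch, not the notebook's implementation; `COLOR_RULES` and `node_color` are hypothetical names.

```python
# Highest-priority rule first; group names mirror the arguments of get_node_color.
COLOR_RULES = [
    ("escalated_hosts", "red"),
    ("exploited_hosts", "orange"),
    ("discovered_systems", "lightgreen"),
    ("discovered_subnets", "rosybrown"),  # only applied to router nodes
]


def node_color(node, default="green", **groups):
    """Return the colour of the first matching rule, or the default colour."""
    for group, color in COLOR_RULES:
        if group == "discovered_subnets" and "router" not in node:
            continue
        if node in groups.get(group, ()):
            return color
    return default


print(node_color("User1", exploited_hosts=["User1"], discovered_systems=["User1"]))  # orange
print(node_color("User_router", discovered_subnets=["User_router"]))                 # rosybrown
print(node_color("User2"))                                                           # green
```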

In [5]:
# animation

# router blink
# surround any icon that changed with dashed lines (changes happening in this step)

### Example 1: Blue Remove Agent

1. Plot action by action, instead of step by step
2. Unify default color
3. discover remote systems: light pink -> pink (gradually turns to red)
4. black border for the latest action
5. When hovering over the node, show running processes, the operating systems, images used
6. configuration should be defined at the top / in `__init__`

7. Try scaling up the scenarios: 10,000 nodes

8. html panel / information panel / controller of the game which does animation (live/historical)
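
For item 3 in the list above, a gradual colour change can be produced by linearly interpolating between two hex colours. The sketch below is one possible approach; the function name `blend` is hypothetical, and the endpoints (CSS `lightpink`, `#ffb6c1`, and pure red) are an assumption about the intended colours.

```python
def blend(start_hex, end_hex, t):
    """Linearly interpolate between two '#rrggbb' colours; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    start = [int(start_hex[i:i + 2], 16) for i in (1, 3, 5)]
    end = [int(end_hex[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(a + (b - a) * t) for a, b in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


# Five stages from light pink to red.
for stage in range(5):
    print(stage, blend("#ffb6c1", "#ff0000", stage / 4))
```

The returned strings can be passed anywhere a plain colour string is accepted, for example as the return value of a node-colouring function.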




In [13]:
# %%capture
# agent = BlueReactRemoveAgent()
# agent_game_states = run_episode_cc2(agent)

In [1]:
# interactive_plot(agent_game_states)

### Example 2: Blue Restore Agent

In [38]:
# %%capture
# agent = BlueReactRestoreAgent()
# agent_game_states = run_episode_cc2(agent)

In [39]:
# interactive_plot(agent_game_states)

### Example 3: Blue Pre-trained Agent

In [44]:
# %%capture
# agent_game_states = run_pretrained_agent()

In [12]:
# interactive_plot(agent_game_states)

In [20]:
# from dash import Dash, dcc, html, Input, Output
# import plotly.graph_objects as go

# app = Dash(__name__)


# app.layout = html.Div([
#     html.H4('Live data control'),
#     dcc.Graph(id="graph"),
#     html.P("Change the position of the right-most data point:"),
#     html.Button("Move Up", n_clicks=0, 
#                 id='btn-up'),
#     html.Button("Move Down", n_clicks=0,
#                 id='btn-down'),
# ])

# @app.callback(
#     Output("graph", "figure"), 
#     Input("btn-up", "n_clicks"),
#     Input("btn-down", "n_clicks"))
# def make_shape_taller(n_up, n_down):
#     n = n_up-n_down
#     fig = go.Figure(go.Scatter(
#         x=[1, 0, 2, 1], y=[2, 0, n, 2], # replace with your own data source
#         fill="toself"
#     ))
#     return fig


# app.run_server(debug=True)

In [46]:
# def render_game_with_matplotlib(game_states):
#     node_shapes = {
#         'host': 'o',    # Circle
#         'router': '^',  # Triangle
#         'server': 's'   # Square
#     }
#     for i, state in game_states.items():
#         link_diagram = state['link_diagram']
#         compromised_hosts = state['compromised_hosts']
#         node_colors = state['node_colors']
#         agent_actions = state['agent_actions']
#         ip_map = state['ip_map']
#         host_map = state['host_map']
        
#         pos = nx.spring_layout(link_diagram, seed=3113794652)  # positions for all nodes
#         for node in link_diagram.nodes:
#             type = ""
#             if 'Server' in node:
#                 type = 'server'
#             elif 'router' in node:
#                 type = 'router'
#             else:
#                 type = 'host'
                
#             if node in compromised_hosts:
#                 nx.set_node_attributes(link_diagram, {node:{'compromised':True, 'name':node, 'type':node_shapes[type]}})
#             else:
#                 nx.set_node_attributes(link_diagram, {node:{'compromised':False, 'name':node, 'type':node_shapes[type]}})
            
            
#         shapes = nx.get_node_attributes(link_diagram, 'type')
        
        
#         nx.draw_networkx(link_diagram, pos, node_color=node_colors, with_labels=True, font_weight='bold', font_size=6)

#         # Increase this value to add more space between rows
#         vertical_padding = 0.1

#         # Display Agent Action Information
#         for idx, (agt, action_info) in enumerate(agent_actions.items()):
#             plt.text(1.05, 1 - vertical_padding * idx, f"{agt}: {action_info['action']} - {action_info['success']}",
#                      transform=plt.gca().transAxes, fontsize=12, verticalalignment='top', bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5))

#         ip_map_start_pos = 1 - len(agent_actions) * vertical_padding - 0.1

#         # Display IP Address Mapping to the right of the plot
#         for idx, (ip, host) in enumerate(ip_map.items()):
#             plt.text(1.05, ip_map_start_pos - idx * vertical_padding, f"{ip}: {host}",
#                      transform=plt.gca().transAxes, fontsize=12, verticalalignment='top', bbox=dict(boxstyle="round", facecolor="lightblue", alpha=0.5))
        
#         plt.title(f'Step {i+1}: Network State')
#         plt.show()

In [24]:
# render_game_with_matplotlib(game_states)

## Play trained Agent

## Training

### Q-Learning Agent

In [30]:
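def run_training_example_q_learning(scenario):
    # Relies on RandomAgent, FileReaderScenarioGenerator and MAX_STEPS_PER_GAME,
    # which are not imported or defined in the cells above.
    agent_name = 'Red'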
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + f'/Simulator/Scenarios/scenario_files/{scenario}.yaml'
    sg = FileReaderScenarioGenerator(path)
    cyborg = CybORG(scenario_generator=sg)
    cyborg = OpenAIGymWrapper(agent_name=agent_name,
                              env=FixedFlatWrapper(cyborg))

    observation = cyborg.reset()
    action_space = cyborg.action_space
    print(f"Observation size {len(observation)}, Action Size {action_space}")
    action_count = 0
    agent = RandomAgent()
    print(agent)
    agent.set_initial_values(action_space, observation)
    for i in range(MAX_EPS):  # play multiple games
        print(f"\rTraining Game: {i}\n", end='', flush=True)
        reward = 0
        for j in range(MAX_STEPS_PER_GAME):  # steps within one game
            action = agent.get_action(observation, action_space)
            next_observation, r, done, info = cyborg.step(action)

            action_space = cyborg.action_space
            reward += r

            agent.train(observation)  # training the agent
            observation = next_observation
            if done or j == MAX_STEPS_PER_GAME - 1:
                print(f"Training reward: {reward}")
                break
        agent.end_episode()
        observation = cyborg.reset()
        action_space = cyborg.action_space
        agent.set_initial_values(action_space, observation)
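
The loop above drives a `RandomAgent`, so nothing is actually learned. For reference, a tabular Q-learning agent that could sit in the same kind of loop might look like the sketch below. It is a generic textbook formulation, not a CybORG class: the name `TabularQSketch`, its method signatures, and the requirement that states be hashable (for example, a flat observation converted to a tuple) are all assumptions.

```python
import random
from collections import defaultdict


class TabularQSketch:
    """Epsilon-greedy tabular Q-learning over hashable states and integer actions."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def get_action(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state, action, reward, next_state, done):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


agent = TabularQSketch(n_actions=2, alpha=0.5, gamma=0.9, epsilon=0.0)
agent.update("s0", 1, 1.0, "s1", done=True)
print(agent.q["s0"])           # [0.0, 0.5]
print(agent.get_action("s0"))  # 1
```

In the training loop, `update` would be called once per step with the previous observation, the chosen action, the reward `r`, and `next_observation`.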

### Deep RL Algorithms Agent

- PPO

In [31]:
#!pip install -U "ray[all]"

In [32]:
# import inspect
# import numpy as np
# from CybORG import CybORG
# from CybORG.Agents import B_lineAgent, GreenAgent
# from CybORG.Agents.Wrappers import ChallengeWrapper

In [33]:
# from ray import tune

# tune.run(
#     "PPO",
#     config={
#         "env": "YourEnvName",
#         "num_gpus": 0,
#         "other_config_options": "...",
#     }
# )

In [34]:
# from ray.rllib.algorithms.ppo import PPOConfig

# config = (  # 1. Configure the algorithm,
#     PPOConfig()
#     .environment("Taxi-v3")
#     .rollouts(num_rollout_workers=2)
#     .framework("torch")
#     .training(model={"fcnet_hiddens": [64, 64]})
#     .evaluation(evaluation_num_workers=1)
# )

# algo = config.build()  # 2. build the algorithm,

# for _ in range(5):
#     print(algo.train())  # 3. train it,

# algo.evaluate()  # 4. and evaluate it.

In [35]:
# from ray.rllib.algorithms import ppo
# from ray.rllib.env import ParallelPettingZooEnv
# from ray.tune import register_env

In [36]:
# from CybORG.Agents.Wrappers.PettingZooParallelWrapper import PettingZooParallelWrapper
# from CybORG.Simulator.Scenarios import FileReaderScenarioGenerator, DroneSwarmScenarioGenerator

In [48]:
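import numpy as np  # needed for the np.float32 casts below


class RLLibWrapper(ChallengeWrapper):
    def __init__(self, agent_name, env, reward_threshold=None, max_steps=None):
        # keyword arguments keep the call independent of the parent's parameter order
        super().__init__(agent_name=agent_name, env=env,
                         reward_threshold=reward_threshold, max_steps=max_steps)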

    def step(self, action=None):
        obs, reward, done, info = self.env.step(action=action)
        self.step_counter += 1
        if self.max_steps is not None and self.step_counter >= self.max_steps:
            done = True
        return np.float32(obs), reward, done, info

    def reset(self):
        self.step_counter = 0
        obs = self.env.reset()
        return np.float32(obs)
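
The step-limit logic in `RLLibWrapper` can be tried in isolation, without CybORG or NumPy installed. The sketch below wraps a toy environment that never terminates on its own and shows the wrapper forcing `done` once `max_steps` is reached. `CountingEnv` and `StepLimit` are hypothetical names used only for this illustration.

```python
class CountingEnv:
    """Toy environment: constant observation, constant reward, never done."""

    def reset(self):
        return [0.0]

    def step(self, action=None):
        return [0.0], -0.1, False, {}


class StepLimit:
    """Force done=True after max_steps steps, mirroring RLLibWrapper.step."""

    def __init__(self, env, max_steps=None):
        self.env = env
        self.max_steps = max_steps
        self.step_counter = 0

    def reset(self):
        self.step_counter = 0
        return self.env.reset()

    def step(self, action=None):
        obs, reward, done, info = self.env.step(action)
        self.step_counter += 1
        if self.max_steps is not None and self.step_counter >= self.max_steps:
            done = True
        return obs, reward, done, info


env = StepLimit(CountingEnv(), max_steps=3)
env.reset()
print([env.step()[2] for _ in range(3)])  # [False, False, True]
```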

In [None]:
def env_creator_CC1(env_config: dict):
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + f'/Simulator/Scenarios/scenario_files/Scenario1b.yaml'
    sg = FileReaderScenarioGenerator(path)
    agents = {"Red": B_lineAgent(), "Green": GreenAgent()}
    cyborg = CybORG(scenario_generator=sg, environment='sim', agents=agents)
    env = RLLibWrapper(env=cyborg, agent_name="Blue", max_steps=100)
    return env

In [None]:
def env_creator_CC2(env_config: dict):
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + f'/Simulator/Scenarios/scenario_files/Scenario2.yaml'
    sg = FileReaderScenarioGenerator(path)
    agents = {"Red": B_lineAgent(), "Green": GreenAgent()}
    cyborg = CybORG(scenario_generator=sg, environment='sim', agents=agents)
    env = RLLibWrapper(env=cyborg, agent_name="Blue", max_steps=100)
    return env

In [None]:
def env_creator_CC3(env_config: dict):
    sg = DroneSwarmScenarioGenerator()
    cyborg = CybORG(scenario_generator=sg, environment='sim')
    env = ParallelPettingZooEnv(PettingZooParallelWrapper(env=cyborg))
    return env

In [None]:
def print_results(results_dict):
    train_iter = results_dict["training_iteration"]
    r_mean = results_dict["episode_reward_mean"]
    r_max = results_dict["episode_reward_max"]
    r_min = results_dict["episode_reward_min"]
    print(f"{train_iter:4d} \tr_mean: {r_mean:.1f} \tr_max: {r_max:.1f} \tr_min: {r_min: .1f}")

In [None]:
# if __name__ == "__main__":
#     register_env(name="CC1", env_creator=env_creator_CC1)
#     register_env(name="CC2", env_creator=env_creator_CC2)
#     register_env(name="CC3", env_creator=env_creator_CC3)
#     config = ppo.DEFAULT_CONFIG.copy()
#     for env in ['CC1', 'CC2', 'CC3']:
#         agent = ppo.PPOTrainer(config=config, env=env)

#         train_steps = 1e2
#         total_steps = 0
#         while total_steps < train_steps:
#             results = agent.train()
#             print_results(results)
#             total_steps = results["timesteps_total"]