## Scenario Descriptions

### CAGE Challenge 2

Your firm has been contracted by Florin to trial your new autonomous defence agents. Florin have given your agent authority to defend the computer network at one of their manufacturing plants. The network, shown in Figure 1, contains a user network for staff, an enterprise network, and an operational network which contains the key manufacturing and logistics servers. The defence agent receives an alert feed from Florin’s existing monitoring systems. It can run detailed analyses on hosts, then remove malicious software if found. If an attacker is too well established on a host to be removed, the host can be restored from a clean backup. Finally, the defence agent can deceive attackers by creating decoy services.

The network owner has undertaken an evaluation of the factory systems and contracted your firm to defend these systems according to the following criteria:

- Maintain the critical operational server to ensure information about the new weapon system is not revealed to Guilder and the production and delivery of the new weapon system remains on schedule.
- Where possible, maintain enterprise servers to ensure day-to-day operations of the manufacturing plant are not disrupted or revealed.

Subnet 1 consists of user hosts that are not critical. \
Subnet 2 consists of enterprise servers designed to support the user activities on Subnet 1. \
Subnet 3 contains the critical operational server and three user hosts.

<img src="https://github.com/cage-challenge/cage-challenge-2/raw/main/images/figure1.png">

The effect of each action on the state of a targeted host is summarized in the diagram below.

<img src="https://github.com/cage-challenge/cage-challenge-2/raw/main/images/figure2.png" caption="State Diagram">

### Appendix - Action Sets

Blue Action Sets
    
| Action     | Purpose                                                                                                                                                                                                                                                 | Parameters                                                                    | Output                                 |
|:-----------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|:---------------------------------------|
| Monitor | Collection of information about flagged malicious activity on the system\. Corresponds to action ID 1: Scan in the OpenC2 specification[^3]\.                                                                                                           | None *\(Note: This action occurs automatically if another action is chosen\)* | Network connections and associated processes that are identified as malicious\. |
| Analyse | Collection of further information on a specific host to enable blue to better identify if red is present on the system\. Corresponds to action ID 30: Investigate in the OpenC2 specification\.                                                         | Hostname                                                                      | Information on files associated with recent alerts including signature and entropy\.  |
| DecoyApache, DecoyFemitter, DecoyHarakaSMPT, DecoySmss, DecoySSHD, DecoySvchost, DecoyTomcat | Setup of a decoy service (as specified by the action name) on a specified host\. Green agents do not access these services, so any access is a clear example of red activity\.                                                                                                            | Hostname                                                                | An alert if the red agent accesses the new service\. |
| Remove | Attempting to remove red from a host by destroying malicious processes, files and services\. This action attempts to stop all processes identified as malicious by the monitor action\. Corresponds to action ID 10: Stop in the OpenC2 specification\. | Hostname                                                                      | Success/Failure |
| Restore | Restoring a system to a known good state\. This has significant consequences for system availability\. This action punishes Blue by \-1\. Corresponds to action ID 23: Restore in the OpenC2 specification\.                                            | Hostname                                                                      | Success/Failure  |

Red Action Sets

| Action     | Purpose                                                                                                                                                    | Parameters       | Output                                 |
|:-----------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|:---------------------------------------|
| Discover Remote Systems | ATT&CK[^4] Technique T1018 Remote System Discovery\. Discovers new hosts/IP addresses in the network through active scanning using tools such as ping\.    | Subnet           | IP addresses in the chosen subnet from hosts that respond to ping |
| Discover Network Services | ATT&CK Technique T1046 Network Service Scanning\. Discovers responsive services on a selected host by initiating a connection with that host\.             | IP Address       | Ports and service information |
| Exploit Network Services | ATT&CK Technique T1210 Exploitation of Remote Services\. This action attempts to exploit a specified service on a remote system\.                          | IP Address, Port | Success/Failure <br /> Initial recon of host if successful. |
| Escalate | ATT&CK Tactic TA0004 Privilege Escalation\. This action escalates the agent’s privilege on the host\.                                                      | Hostname         | Success/Failure <br /> Internal information now available due to increased access to the host. |
| Impact | ATT&CK Technique T1489 Service Stop\. This action disrupts the performance of the network and fulfils red’s objective of denying the operational service\. | Hostname         | Success/Failure  |
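
Taken together, the two action tables imply a per-host progression that red pushes forward and blue pushes back. The sketch below is one way to write that down as a lookup table. It is an approximation inferred from the tables and the scenario text, not a transcription of Figure 2: the state names, the `TRANSITIONS` table, and the helpers `apply_action` and `replay` are all assumptions made for illustration. In particular, it assumes that Remove only works against user-level access and that a privileged foothold requires Restore.

```python
from enum import Enum


class HostState(Enum):
    # Hypothetical state labels; the official diagram may name them differently.
    UNKNOWN = "unknown"
    KNOWN = "known"
    SCANNED = "scanned"
    USER = "user access"
    PRIVILEGED = "privileged access"


# Assumed transitions: (current state, action name) -> next state.
TRANSITIONS = {
    (HostState.UNKNOWN, "DiscoverRemoteSystems"): HostState.KNOWN,
    (HostState.KNOWN, "DiscoverNetworkServices"): HostState.SCANNED,
    (HostState.SCANNED, "ExploitRemoteService"): HostState.USER,
    (HostState.USER, "PrivilegeEscalate"): HostState.PRIVILEGED,
    (HostState.USER, "Remove"): HostState.SCANNED,
    (HostState.USER, "Restore"): HostState.SCANNED,
    (HostState.PRIVILEGED, "Restore"): HostState.SCANNED,
}


def apply_action(state, action):
    """Return the next state; pairs missing from the table leave the host unchanged."""
    return TRANSITIONS.get((state, action), state)


def replay(actions, start=HostState.UNKNOWN):
    """Apply a sequence of action names to one host and return its final state."""
    state = start
    for action in actions:
        state = apply_action(state, action)
    return state
```

Under these assumptions, `apply_action(HostState.PRIVILEGED, "Remove")` leaves the host compromised, which is the situation in which the scenario text says a restore from backup is needed.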



### Agents

- Rule-based agents
- Deep Reinforcement Learning agents

### Rule-based agents

- Blue
    - RandomAgent: Takes a random action, or a fixed test action, depending on its epsilon value.
    - BlueReactRemoveAgent: Adds a host to its list of suspicious hosts whenever Monitor reports activity on it, then uses Remove to clear the malicious program (session) from that host.
    - BlueReactRestoreAgent: Follows the same steps, but restores the host from a clean backup instead of removing the session.
- Red
    - RedMeanderAgent: Explores the network one subnet at a time, seeking to gain privileged access to all hosts in a subnet before moving on to the next one, eventually arriving at the Operational Server.
    - B_lineAgent: Attempts to move straight to the Operational Server using prior knowledge of the network layout.
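
The reactive blue agents described above share one simple loop: flag hosts that show up in the observation, then respond to them one at a time. The following is a minimal sketch of that loop, not CybORG's implementation. The class name `ReactAgentSketch`, the `response` parameter, the tuple-shaped return value, and the assumed observation layout (a dict keyed by hostname, with a `"Processes"` entry for hosts that raised an alert) are all illustrative assumptions.

```python
class ReactAgentSketch:
    """Toy version of the react-then-respond idea; hypothetical, not CybORG's class."""

    def __init__(self, response="Remove"):
        self.response = response  # "Remove" or "Restore"
        self.host_list = []       # suspicious hosts, oldest first

    def get_action(self, observation):
        # Assumed observation shape: {"success": ..., hostname: {"Processes": [...]}}
        for hostname, info in observation.items():
            if hostname == "success" or not isinstance(info, dict):
                continue
            if info.get("Processes") and hostname not in self.host_list:
                self.host_list.append(hostname)

        if self.host_list:
            # Respond to the oldest flagged host first.
            return (self.response, self.host_list.pop(0))
        # Nothing suspicious is queued, so keep watching.
        return ("Monitor", None)
```

Constructing the sketch with `response="Restore"` gives the restore variant; the only difference is which action is issued against the flagged host.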

In [1]:
from CybORG import CybORG
import inspect
import pprint

from CybORG.Agents import B_lineAgent, BlueReactRestoreAgent, BlueReactRemoveAgent, \
    RandomAgent, RedMeanderAgent
from CybORG.Agents.Wrappers import EnumActionWrapper
from CybORG.Agents.Wrappers.FixedFlatWrapper import FixedFlatWrapper
from CybORG.Agents.Wrappers.IntListToAction import IntListToActionWrapper
from CybORG.Agents.Wrappers.OpenAIGymWrapper import OpenAIGymWrapper
from CybORG.Simulator.Scenarios.FileReaderScenarioGenerator import FileReaderScenarioGenerator



path is: /home/ubuntu/test/CybORG/CybORG/env.py


In [2]:
def run_episode_cc2(cyborg, agent):
    """Run one 20-step episode in which `agent` picks Blue's action at every step."""
    cyborg.reset()
    for i in range(20):
        # print(cyborg.environment_controller.get_last_action('Red'))
        # stop = cyborg.render()
        action_space = cyborg.get_action_space('Blue')
        # print(action_space)
        obs = cyborg.get_observation('Blue') # get the newest observation
        # print(f"obs is {obs}")
        # if stop:
        #     break
        action = agent.get_action(obs, action_space)
        print(f"===Perform the {i+1} step at CybORG level===")
        results = cyborg.step('Blue', action)
        # a = input(f'Step {i}, use q to quit, use n to go to next demo')
        # if 'q' in a:
        #     quit()
        # if 'n' in a:
        #     break

In [3]:
# input('start? ')
# CC2
# inspect.getfile(CybORG) points at .../CybORG/env.py here; [:-7] strips the trailing '/env.py'
path = str(inspect.getfile(CybORG))
path = path[:-7] + '/Simulator/Scenarios/scenario_files/Scenario2.yaml'
sg = FileReaderScenarioGenerator(path)
red_agent = RedMeanderAgent()
cyborg = CybORG(sg, 'sim', agents={'Red': red_agent})

env is: sim
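
The `path[:-7]` slice in the cell above works only because the file reported by `inspect.getfile(CybORG)` is named `env.py`, so `/env.py` is exactly seven characters long. A less brittle way to build the same path is sketched below with `pathlib`. The helper name `scenario_path` is hypothetical, and the example input is the path printed earlier in this notebook.

```python
from pathlib import Path


def scenario_path(module_file, scenario):
    """Build <package dir>/Simulator/Scenarios/scenario_files/<scenario>.yaml."""
    # .parent drops the file name, whatever its length.
    package_dir = Path(module_file).parent
    return package_dir / "Simulator" / "Scenarios" / "scenario_files" / f"{scenario}.yaml"


example = scenario_path("/home/ubuntu/test/CybORG/CybORG/env.py", "Scenario2")
print(example.as_posix())
```

In the notebook this could be called as `scenario_path(inspect.getfile(CybORG), 'Scenario2')`, with the result wrapped in `str(...)` before it is handed to `FileReaderScenarioGenerator`.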




In [4]:
# input('demo react remove')
agent = BlueReactRemoveAgent()
run_episode_cc2(cyborg, agent)

Agent Interfaces are: {'Blue': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f618a259420>, 'Green': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f618a259c00>, 'Red': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f618a2598a0>}
===Perform the 1 step at CybORG level===
action is: {'Blue': Monitor}
In SimulationController script
In shared EnvironmentController script
Agent interface: {'Blue': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f618a259420>, 'Green': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f618a259c00>, 'Red': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f618a2598a0>}
action is: {'Blue': Monitor, 'Green': Sleep, 'Red': DiscoverRemoteSystems 10.0.232.0/28}
--> in actions
In simcontroller execute action
action is: Monitor its type is: <class 'CybORG.Simulator.Actions.AbstractActions.Monitor.Monitor'>
--> in actions
In simcontroller execute action
action is: DiscoverRemoteSystems 10.0.232.0/28 its type i

In [5]:
# input('demo react restore')
agent = BlueReactRestoreAgent()
run_episode_cc2(cyborg, agent)

Agent Interfaces are: {'Blue': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680dbfa30>, 'Green': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680dbfac0>, 'Red': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680dbdf00>}
obs is {'success': <TrinaryEnum.UNKNOWN: 2>, 'Defender': {'Interface': [{'Interface Name': 'eth0', 'IP Address': IPv4Address('10.0.250.167'), 'Subnet': IPv4Network('10.0.250.160/28')}], 'Sessions': [{'Username': 'ubuntu', 'ID': 0, 'Timeout': 0, 'PID': 2292, 'Type': <SessionType.VELOCIRAPTOR_SERVER: 8>, 'Agent': 'Blue'}, {'Username': 'ubuntu', 'ID': 9, 'Timeout': 0, 'PID': 2296, 'Type': <SessionType.VELOCIRAPTOR_CLIENT: 7>, 'Agent': 'Blue'}], 'Processes': [{'PID': 2292, 'Username': 'ubuntu'}, {'PID': 2296, 'Username': 'ubuntu'}], 'User Info': [{'Username': 'root', 'Groups': [{'GID': 0}]}, {'Username': 'ubuntu', 'Groups': [{'GID': 1000}]}], 'System info': {'Hostname': 'Defender', 'OSType': <OperatingSystemType.LINUX: 3>, 'OSDistr

## Training

In [6]:
MAX_STEPS_PER_GAME = 20
MAX_EPS = 20

In [7]:
def run_training_example_rb(scenario):
    agent_name = 'Red'
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + f'/Simulator/Scenarios/scenario_files/{scenario}.yaml'
    sg = FileReaderScenarioGenerator(path)
    cyborg = CybORG(scenario_generator=sg)
    cyborg = OpenAIGymWrapper(agent_name=agent_name,
                              env=FixedFlatWrapper(cyborg))

    observation = cyborg.reset()
    action_space = cyborg.action_space
    print(f"Observation size {len(observation)}, Action Size {action_space}")
    action_count = 0
    agent = RandomAgent()
    print(agent)
    agent.set_initial_values(action_space, observation)
    for i in range(MAX_EPS):  # playing multiple games
        print(f"\rTraining Game: {i}\n", end='', flush=True)
        reward = 0
        for j in range(MAX_STEPS_PER_GAME):  # step in 1 game
            action = agent.get_action(observation, action_space)
            print(f"Action is {action} for {agent}")
            next_observation, r, done, info = cyborg.step(action)

            action_space = cyborg.action_space
            reward += r

            agent.train(observation)  # training the agent
            observation = next_observation
            if done or j == MAX_STEPS_PER_GAME - 1:
                print(f"Training reward: {reward}")
                break
        agent.end_episode()
        observation = cyborg.reset()
        action_space = cyborg.action_space
        agent.set_initial_values(action_space, observation)

In [8]:
run_training_example_rb('Scenario2')

env is: sim
Agent Interfaces are: {'Blue': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c13970>, 'Green': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c111e0>, 'Red': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c13be0>}
Agent Interfaces are: {'Blue': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c53c10>, 'Green': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c53d00>, 'Red': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c53070>}
Observation size 11293, Action Size Discrete(212)
RandomAgent
Training Game: 0
Action is 34 for RandomAgent
action is: {'Red': ExploitRemoteService 10.0.103.246}
In SimulationController script
In shared EnvironmentController script
Agent interface: {'Blue': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c53c10>, 'Green': <CybORG.Shared.AgentInterface.AgentInterface object at 0x7f6680c53d00>, 'Red': <CybORG.Shared.AgentInterface.AgentInterface

### Q-Learning Agent

In [9]:
def run_training_example_q_learning(scenario):
    agent_name = 'Red'
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + f'/Simulator/Scenarios/scenario_files/{scenario}.yaml'
    sg = FileReaderScenarioGenerator(path)
    cyborg = CybORG(scenario_generator=sg)
    cyborg = OpenAIGymWrapper(agent_name=agent_name,
                              env=FixedFlatWrapper(cyborg))

    observation = cyborg.reset()
    action_space = cyborg.action_space
    print(f"Observation size {len(observation)}, Action Size {action_space}")
    action_count = 0
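    # NOTE: RandomAgent is only a placeholder here; replace it with a Q-learning
    # agent that exposes the same get_action/train/end_episode interface.
    agent = RandomAgent()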
    print(agent)
    agent.set_initial_values(action_space, observation)
    for i in range(MAX_EPS):  # playing multiple games
        print(f"\rTraining Game: {i}\n", end='', flush=True)
        reward = 0
        for j in range(MAX_STEPS_PER_GAME):  # step in 1 game
            action = agent.get_action(observation, action_space)
            next_observation, r, done, info = cyborg.step(action)

            action_space = cyborg.action_space
            reward += r

            agent.train(observation)  # training the agent
            observation = next_observation
            if done or j == MAX_STEPS_PER_GAME - 1:
                print(f"Training reward: {reward}")
                break
        agent.end_episode()
        observation = cyborg.reset()
        action_space = cyborg.action_space
        agent.set_initial_values(action_space, observation)
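
For reference, the update rule behind tabular Q-learning is Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)). The sketch below implements that rule in plain Python. It is not part of CybORG: the class `TabularQAgent`, its `update` method, and the default hyperparameters are hypothetical names and values chosen for illustration. States must be hashable, so a flat observation would have to be converted first (for example with `tuple(observation)`), and with the observation size printed above (11293) a lookup table is unlikely to generalise without further state abstraction.

```python
import random
from collections import defaultdict


class TabularQAgent:
    """Hypothetical epsilon-greedy tabular Q-learning agent (illustrative only)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Unseen states start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def get_action(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        # Bootstrap from the best next-state value unless the episode ended.
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

Unlike the `agent.train(observation)` call in the loop above, this update needs the full transition (state, action, reward, next state, done flag), so the training loop would have to pass those values through.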

### Deep RL Algorithms Agent

- PPO
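
As background, PPO maximises a clipped surrogate objective: for each sample it takes min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the probability ratio between the new and old policy and A is the advantage estimate. The sketch below evaluates that quantity for a small batch in plain Python. It is purely illustrative and is not how RLlib computes its loss; the function name `ppo_clip_objective` and the parameter name `clip_eps` are assumptions made here.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equally long")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(ratios)
```

With a positive advantage, a ratio above 1 + ε earns no extra credit, so the policy has no incentive to move further from the old one in a single update. With a negative advantage the unclipped term is kept whenever it is the smaller of the two.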

In [10]:
#!pip install -U "ray[all]"

In [11]:
import inspect
import numpy as np
from CybORG import CybORG
from CybORG.Agents import B_lineAgent, GreenAgent
from CybORG.Agents.Wrappers import ChallengeWrapper

In [12]:
# from ray import tune

# tune.run(
#     "PPO",
#     config={
#         "env": "YourEnvName",
#         "num_gpus": 0,
#         "other_config_options": "...",
#     }
# )

In [13]:
# from ray.rllib.algorithms.ppo import PPOConfig

# config = (  # 1. Configure the algorithm,
#     PPOConfig()
#     .environment("Taxi-v3")
#     .rollouts(num_rollout_workers=2)
#     .framework("torch")
#     .training(model={"fcnet_hiddens": [64, 64]})
#     .evaluation(evaluation_num_workers=1)
# )

# algo = config.build()  # 2. build the algorithm,

# for _ in range(5):
#     print(algo.train())  # 3. train it,

# algo.evaluate()  # 4. and evaluate it.

In [None]:
from ray.rllib.algorithms import ppo
from ray.rllib.env import ParallelPettingZooEnv
from ray.tune import register_env

In [None]:
from CybORG.Agents.Wrappers.PettingZooParallelWrapper import PettingZooParallelWrapper
from CybORG.Simulator.Scenarios import FileReaderScenarioGenerator, DroneSwarmScenarioGenerator

In [None]:
class RLLibWrapper(ChallengeWrapper):
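    def __init__(self, agent_name, env, reward_threshold=None, max_steps=None):
        # Keyword arguments keep the call independent of the parent's parameter order
        super().__init__(agent_name=agent_name, env=env,
                         reward_threshold=reward_threshold, max_steps=max_steps)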

    def step(self, action=None):
        obs, reward, done, info = self.env.step(action=action)
        self.step_counter += 1
        if self.max_steps is not None and self.step_counter >= self.max_steps:
            done = True
        return np.float32(obs), reward, done, info

    def reset(self):
        self.step_counter = 0
        obs = self.env.reset()
        return np.float32(obs)
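
The step-limit logic in the wrapper above can be checked in isolation, without CybORG or RLlib installed. The sketch below reproduces only that logic around a stand-in environment. Both `CountingEnv` and `StepLimit` are hypothetical classes written for this illustration, and the reward of -0.1 is an arbitrary placeholder value.

```python
class CountingEnv:
    """Stand-in environment (hypothetical) that never ends an episode by itself."""

    def reset(self):
        return [0.0]

    def step(self, action=None):
        # Returns (observation, reward, done, info) in the old Gym order.
        return [0.0], -0.1, False, {}


class StepLimit:
    """Forces done=True once max_steps steps have been taken since reset()."""

    def __init__(self, env, max_steps=None):
        self.env = env
        self.max_steps = max_steps
        self.step_counter = 0

    def reset(self):
        self.step_counter = 0
        return self.env.reset()

    def step(self, action=None):
        obs, reward, done, info = self.env.step(action=action)
        self.step_counter += 1
        if self.max_steps is not None and self.step_counter >= self.max_steps:
            done = True
        return obs, reward, done, info


env = StepLimit(CountingEnv(), max_steps=3)
env.reset()
print([env.step()[2] for _ in range(3)])
```

Resetting the counter inside `reset()` matters: without it, every episode after the first would be cut off on its first step.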

In [None]:
def env_creator_CC1(env_config: dict):
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + '/Simulator/Scenarios/scenario_files/Scenario1b.yaml'
    sg = FileReaderScenarioGenerator(path)
    agents = {"Red": B_lineAgent(), "Green": GreenAgent()}
    cyborg = CybORG(scenario_generator=sg, environment='sim', agents=agents)
    env = RLLibWrapper(env=cyborg, agent_name="Blue", max_steps=100)
    return env

In [None]:
def env_creator_CC2(env_config: dict):
    path = str(inspect.getfile(CybORG))
    path = path[:-7] + '/Simulator/Scenarios/scenario_files/Scenario2.yaml'
    sg = FileReaderScenarioGenerator(path)
    agents = {"Red": B_lineAgent(), "Green": GreenAgent()}
    cyborg = CybORG(scenario_generator=sg, environment='sim', agents=agents)
    env = RLLibWrapper(env=cyborg, agent_name="Blue", max_steps=100)
    return env

In [None]:
def env_creator_CC3(env_config: dict):
    sg = DroneSwarmScenarioGenerator()
    cyborg = CybORG(scenario_generator=sg, environment='sim')
    env = ParallelPettingZooEnv(PettingZooParallelWrapper(env=cyborg))
    return env

In [None]:
def print_results(results_dict):
    train_iter = results_dict["training_iteration"]
    r_mean = results_dict["episode_reward_mean"]
    r_max = results_dict["episode_reward_max"]
    r_min = results_dict["episode_reward_min"]
    print(f"{train_iter:4d} \tr_mean: {r_mean:.1f} \tr_max: {r_max:.1f} \tr_min: {r_min: .1f}")

In [None]:
if __name__ == "__main__":
    register_env(name="CC1", env_creator=env_creator_CC1)
    register_env(name="CC2", env_creator=env_creator_CC2)
    register_env(name="CC3", env_creator=env_creator_CC3)
    for env in ['CC1', 'CC2', 'CC3']:
        # Build PPO with its default settings through the PPOConfig API
        agent = ppo.PPOConfig().environment(env=env).build()

        train_steps = 1e2
        total_steps = 0
        while total_steps < train_steps:
            results = agent.train()
            print_results(results)
            total_steps = results["timesteps_total"]