
NetSecGame


The NetSecGame (Network Security Game) is a framework for training and evaluating AI agents in network security tasks, both offensive and defensive. It builds a simulated local network using the CYST network simulator, adds a number of conditions to the environment, and can be used to train reinforcement learning (RL) agents to better attack and defend the network. Examples of implemented agents can be found in the submodule NetSecGameAgents.

Install and Dependencies

To run this code you need a Python virtual environment and access to the CYST code. The virtual environment should be created for your own user:

  • If you don't have an environment yet, create one with
python -m venv ai-dojo-venv-<yourusername>
  • The environment can be activated with
source ai-dojo-venv-<yourusername>/bin/activate
  • Install the requirements with
python3 -m pip install -r requirements.txt
  • If you use conda use
conda create --name aidojo python=3.10
conda activate aidojo
python3 -m pip install -r requirements.txt

Architecture

A diagram of the environment's architecture is available in the repository.

Components of the NetSecGame Environment

The NetSecGame environment has several components in the following files:

  • File env/network_security_game.py implements the game environment
  • File env/game_components.py implements a library with objects used in the environment. See the detailed explanation of the game components.
  • File utils/utils.py is a collection of utility functions that the agents can use
  • Files in the env/scenarios folder, such as env/scenarios/scenario_configuration.py, implement the network game's configuration of hosts, data, services, and connections. They are taken from CYST. The scenarios define the topology of a network (number of hosts, connections, networks, services, data, users, firewall rules, etc.), while the task configuration defines the exact task for the agent in one of the scenarios (with a fixed topology).
  • Agents compatible with the NetSecGame are located in a separate repository, NetSecGameAgents

Assumptions of the NetSecGame

  1. NetSecGame works with the closed-world assumption. Only the defined entities exist in the simulation.
  2. Actions have no delete effect: no entity is removed from the environment, and agents do not forget discovered assets.
  3. If the attacker performs a successful action in the same step in which the defender successfully detects it, priority goes to the defender: the attacker receives the detection penalty and the game ends.
  4. The action FindServices finds the current services on a host. If a subsequent call to FindServices returns fewer services, the new list completely replaces the previously found one. That is, each list of services is the definitive one, and no memory of previously open services is retained.

Assumptions and Conditions for Actions

  1. When playing the ExploitService action, it is expected that the agent has discovered this service before (by playing FindServices on the target_host before this action)
  2. The FindData action finds all the available data in the host if successful.
  3. The FindData action requires ownership of the target host.
  4. Playing ExfiltrateData requires controlling BOTH the source and target hosts
  5. FindServices can also be used to discover hosts (if they have any active services)
  6. Parameters of ScanNetwork and FindServices can be chosen arbitrarily (they don't have to be listed in known_networks/known_hosts)

Actions for the defender

In this version of the environment, the defender does not have actions and is not an agent. It is an omnipresent entity in the network that can detect actions of the attacker. This follows the logic that, in real computer networks, the admins have tools that consume logs from all computers simultaneously and can detect actions from a central position (such as a SIEM). There are several modes of the defender (see Task Configuration - Defender for details).

Starting the game

The environment should be created prior to starting the agents. The properties of the environment can be defined in a YAML file. The game server can be started by running: python3 coordinator.py

When created, the environment:

  1. reads the configuration file
  2. loads the network configuration from the config file
  3. reads the defender type from the configuration
  4. creates the starting and goal positions following the config file
  5. starts the game server on the specified address and port

Interaction with the Environment

When the game server is created, agents connect to it and interact with the environment. In every step of the interaction, an agent submits an Action and receives an Observation with next_state, reward, is_terminal, end, and info values. Once the terminal state or the timeout is reached, no further interaction is possible until the agent asks for a game reset. Each agent should extend the BaseAgent class in the agents repository (NetSecGameAgents).
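A minimal sketch of this loop, written as a helper function, is shown below. The method names register(), make_step(), and request_game_reset(), as well as the Observation attribute names, are assumptions about the BaseAgent interface (following the field names listed above); check NetSecGameAgents for the actual API.

def play_one_episode(agent, select_action):
    # Run one episode with an agent object that extends BaseAgent.
    # 'agent' and its methods are a hypothetical interface, used only to
    # illustrate the Action/Observation cycle described above.
    observation = agent.register()                # join the game, get the first Observation
    while not observation.end:                    # 'end' flags terminal state or timeout
        action = select_action(observation.next_state)
        observation = agent.make_step(action)     # submit an Action, receive an Observation
        # observation also carries reward, is_terminal and info
    agent.request_game_reset()                    # ask for a reset before the next episode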

Configuration

The NetSecEnv is highly configurable in terms of the properties of the world, the tasks, and the agent interaction. Modification of the world is done in the YAML configuration file in two main areas:

  1. Environment (env section) controls the properties of the world (taxonomy of networks, maximum allowed steps per episode, probabilities of success of actions etc.)
  2. Task configuration defines the agents' properties (starting position, goal)

Environment configuration

The environment part defines the properties of the environment for the task (see the example below). In particular:

  • random_seed - sets seed for any random processes in the environment
  • scenario - sets the scenario (network topology) used in the task (currently, scenario1_tiny, scenario1_small, and scenario1 are available)
  • max_steps - sets the maximum number of steps an agent can make before an episode is terminated
  • store_replay_buffer - if True, interaction of the agents is serialized and stored in a file
  • use_dynamic_addresses - if True, the network and IP addresses defined in the scenario are randomly changed at the beginning of EVERY episode (the network topology is kept as defined in the scenario; relations between networks are kept, and IPs inside networks are chosen at random based on the network IP and mask). See the illustrative sketch after the example below.
  • use_firewall - if True, the firewall rules defined in the scenario are used when executing actions. When False (the default), the firewall is ignored and all connections are allowed
  • goal_reward - sets the reward the agent gets when it reaches the goal (default 100)
  • detection_reward - sets the reward the agent gets when it is detected (default -50)
  • step_reward - sets the reward the agent gets for every step taken (default -1)
  • actions - defines probability of success for every ActionType
env:
  random_seed: 42
  scenario: 'scenario1'
  max_steps: 15
  store_replay_buffer: True
  use_dynamic_addresses: False
  use_firewall: True
  goal_reward: 100
  detection_reward: -5
  step_reward: -1
  actions:
    scan_network:
      prob_success: 0.9
    find_services:
      prob_success: 0.9
    exploit_services:
      prob_success: 0.7
    find_data:
      prob_success: 0.8
    exfiltrate_data:
      prob_success: 0.8
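To illustrate what use_dynamic_addresses does, the sketch below re-draws host addresses inside a fixed network with Python's standard ipaddress module. This is a minimal illustration of the idea (the network IP and mask are kept, only the host addresses change), not the environment's actual implementation; the helper name randomize_hosts is made up for this example.

import ipaddress
import random

def randomize_hosts(network_str, n_hosts, seed=None):
    # Pick n_hosts random host addresses inside the given network, mirroring
    # the behaviour described for use_dynamic_addresses (topology preserved,
    # host IPs re-drawn from the network's address space).
    rng = random.Random(seed)
    network = ipaddress.ip_network(network_str)
    hosts = list(network.hosts())            # all usable host addresses
    return rng.sample(hosts, n_hosts)

# Example: re-draw 5 client addresses in 192.168.2.0/24 at the start of an episode
print(randomize_hosts("192.168.2.0/24", 5, seed=42))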

Task configuration

The task configuration part (section coordinator[agents]) defines the starting and goal positions of the attacker and the type of defender that is used.

Attacker configuration (attackers)

Configuration of the attacking agents. Consists of two parts:

  1. Goal definition (goal), which describes the GameState properties that must be fulfilled to award goal_reward to the attacker:

    • known_networks:(set)
    • known_hosts(set)
    • controlled_hosts(set)
    • known_services(dict)
    • known_data(dict)

    Each of these parts can be empty (not part of the goal), exactly defined (e.g. known_networks: [192.168.1.0/24, 192.168.3.0/24]), or include the keyword random (e.g. controlled_hosts: [213.47.23.195, random], known_data: {213.47.23.195: [random]}). Additionally, if the random keyword is used in the goal definition, the option randomize_goal_every_episode applies: if set to True, each random keyword is replaced with a randomly selected, valid option at the beginning of EVERY episode; if set to False, the randomization is performed only once when the environment is initialized. A sketch of how such a goal can be checked against the GameState follows the example configuration below.

  2. Definition of the starting position (start_position), which describes the GameState in which the attacker starts. It consists of:

    • known_networks:(set)
    • known_hosts(set)
    • controlled_hosts(set)
    • known_services(dict)
    • known_data(dict)

    The initial network configuration must assign at least one controlled host to the attacker in the network. Any item in controlled_hosts is copied to known_hosts, so there is no need to include these in both sets. known_networks is also extended with the set of all networks accessible from the controlled_hosts.

Example attacker configuration:

agents:
  attackers:
    goal:
      randomize_goal_every_episode: False
      known_networks: []
      known_hosts: []
      controlled_hosts: []
      known_services: {192.168.1.3: [Local system, lanman server, 10.0.19041, False], 192.168.1.4: [Other system, SMB server, 21.2.39421, False]}
      known_data: {213.47.23.195: ["random"]}

    start_position:
      known_networks: []
      known_hosts: []
      # The attacker must always at least control the CC if the goal is to exfiltrate there
      # Example of fixing the starting point of the agent in a local host
      controlled_hosts: [213.47.23.195, random]
      # Services are defined as a target host where the service must be, and then a description in the form 'name,type,version,is_local'
      known_services: {}
      known_data: {}
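Conceptually, the goal is reached when the attacker's current GameState contains everything listed in the goal definition (after any random keywords have been resolved). The sketch below illustrates such a subset check on plain Python sets and dicts; it is an illustration of the idea, not the environment's actual goal-evaluation code, and the data labels are made up.

def goal_reached(state, goal):
    # 'state' and 'goal' are plain dicts here: sets for known_networks,
    # known_hosts and controlled_hosts, dicts of sets for known_services
    # and known_data. The real GameState is a richer object.
    simple_ok = all(goal[k] <= state[k]
                    for k in ("known_networks", "known_hosts", "controlled_hosts"))
    nested_ok = all(goal[k][host] <= state[k].get(host, set())
                    for k in ("known_services", "known_data")
                    for host in goal[k])
    return simple_ok and nested_ok

# Example: the goal only asks for some data on 213.47.23.195
goal = {"known_networks": set(), "known_hosts": set(), "controlled_hosts": set(),
        "known_services": {}, "known_data": {"213.47.23.195": {"user_data"}}}
state = {"known_networks": {"192.168.1.0/24"}, "known_hosts": {"192.168.1.2"},
         "controlled_hosts": {"213.47.23.195"}, "known_services": {},
         "known_data": {"213.47.23.195": {"user_data", "logs"}}}
print(goal_reached(state, goal))   # True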

Defender configuration (defenders)

Definition of the defending agent's properties. Currently, the defender is NOT a separate agent but is considered part of the environment. type - the type of the defender. Three types are currently implemented:

  1. NoDefender (default) - interaction without a defender
  2. StochasticDefender - detections are based on ActionType probabilities (defined in the task configuration, section action_detetection_prob).
  3. StochasticDefenderWithThreshold - a modification of the stochastic defender. The detection probabilities are used only IF the thresholds for the particular ActionType are reached. Thresholds are computed in time windows defined by tw_size (tw_size=5 means that the 5 previous actions are taken into account). If the ratio of some ActionType within the time window is above the threshold, the probability defined in the task configuration, section action_detetection_prob, is used to determine if the action was detected. For actions BELOW the thresholds, no detection is made. Additionally, a threshold for consecutive actions of the same type is defined in consecutive_actions. For example, with
  scan_network:
    consecutive_actions: 2

if the agent uses the action ScanNetwork (regardless of its parameters) twice or more in a row, detection can occur. The action types FindData and ExploitService have additional thresholds for repeated actions (with identical parameters) throughout the WHOLE episode (e.g., with the following configuration, if the action <ActionType.FindData|{'target_host': 192.168.2.2}> is played more than 2 times, detection can happen based on the defined probability). A simplified sketch of this decision logic follows the example configuration below.

Example of defender configuration:

agents:
  defenders:
    type: 'StochasticWithThreshold'
    tw_size: 5
    thresholds:
      scan_network:
        consecutive_actions: 2
        tw_ratio: 0.25
      find_services:
        consecutive_actions: 3
        tw_ratio: 0.3
      exploit_service:
        repeated_actions_episode: 2
        tw_ratio: 0.25
      find_data:
        tw_ratio: 0.5
        repeated_actions_episode: 2
      exfiltrate_data:
        consecutive_actions: 2
        tw_ratio: 0.25
    action_detetection_prob:
        scan_network: 0.05
        find_services: 0.075
        exploit_service: 0.1
        find_data: 0.025
        exfiltrate_data: 0.025
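Given a configuration like the one above, the threshold logic can be pictured as follows: detection is only rolled (with the probability in action_detetection_prob) once the ratio of an ActionType within the last tw_size actions exceeds its tw_ratio, or once its consecutive_actions limit is reached; below every threshold no detection happens. The sketch below is a simplified illustration of that decision (the per-episode repeated_actions_episode thresholds are omitted), not the environment's actual detector, and the function name maybe_detect is made up.

import random

def maybe_detect(history, action_type, cfg, rng=random):
    # history     -- list of previous ActionType names, oldest first
    # action_type -- name of the action just played
    # cfg         -- dict with 'tw_size', 'thresholds' and 'action_detetection_prob',
    #                keys mirroring the YAML example above
    window = (history + [action_type])[-cfg["tw_size"]:]
    thr = cfg["thresholds"][action_type]

    ratio_hit = window.count(action_type) / len(window) >= thr.get("tw_ratio", 1.0)

    consecutive = 1
    for previous in reversed(history):
        if previous != action_type:
            break
        consecutive += 1
    consecutive_hit = consecutive >= thr.get("consecutive_actions", float("inf"))

    if ratio_hit or consecutive_hit:
        return rng.random() < cfg["action_detetection_prob"][action_type]
    return False   # below every threshold: never detected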

Definition of the network topology

The network topology and rules are defined using a CYST simulator configuration. CYST defines a complex network configuration, and this environment does not use all CYST features for now. The CYST components currently used are listed below (an illustrative sketch follows the list):

  • Server hosts (a NodeConf in CYST)
    • Interfaces, each with one IP address
    • Users that can log in to the host
    • Active and passive services
    • Data in the server
    • The network to which it is connected
  • Client hosts (a Node in CYST)
    • Interfaces, each with one IP address
    • The network to which it is connected
    • Active and passive services, if any
    • Data in the client
  • Routers (a RouterConf in CYST)
    • Interfaces, each with one IP address
    • Networks
    • Allowed connections between hosts
  • Internet host, acting as an external router (a Node in CYST)
    • Interfaces, each with one IP address
    • Which hosts can connect
  • Exploits
    • Which service the exploit is linked to
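The snippet below is purely illustrative: it defines small stand-in dataclasses (not the real CYST configuration classes, whose names and fields differ) just to show what information the list above captures for a server host and a router; the concrete values are made up.

from dataclasses import dataclass, field

@dataclass
class HostSketch:
    # Stand-in for a CYST server/client node definition
    interfaces: list                               # one IP address per interface
    network: str                                   # network the host is connected to
    users: list = field(default_factory=list)
    services: list = field(default_factory=list)   # active and passive services
    data: list = field(default_factory=list)

@dataclass
class RouterSketch:
    # Stand-in for a CYST router definition
    interfaces: list
    networks: list
    allowed_connections: list                      # firewall-style rules between hosts

server = HostSketch(interfaces=["192.168.1.3"], network="192.168.1.0/24",
                    users=["admin"], services=["lanman server"], data=["user_data"])
router = RouterSketch(interfaces=["192.168.1.1"], networks=["192.168.1.0/24"],
                      allowed_connections=[("192.168.2.0/24", "192.168.1.0/24")])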

Scenarios

In the current state, we support a single scenario: Data exfiltration to a remote C&C server.

Data exfiltration to a remote C&C

For the data exfiltration we support 3 variants. The full scenario (scenario1) contains 5 clients (where the attacker can start) and 5 servers where the data that is supposed to be exfiltrated can be located. scenario1_small is a variant with a single client (the attacker always starts there) and all 5 servers. scenario1_tiny contains only a single server with data. The tiny scenario is trivial and intended only for debugging purposes.

Diagrams of the three variants (Scenario 1 - Data exfiltration, Scenario 1 - small, Scenario 1 - tiny) are available in the repository.

Testing the environment

After every change, it is advised to test that the environment is running correctly by running

tests/run_all_tests.sh

This will load and run the unit tests in the tests folder.

Code adaptation for new configurations

The code can be adapted to new game configurations and new agents. See the NetSecGameAgents repository for more details.

About us

This code was developed at the Stratosphere Laboratory at the Czech Technical University in Prague.
