# DEM (Defense, Escort, and Movement) Environment Tutorial

This notebook provides a comprehensive tutorial for the DEM environment, where agents must protect a VIP character while escorting them from a starting position to a target location, defending against various threats.

## Table of Contents
1. Environment Overview
2. Task Description
3. Action Space
4. Observation Space
5. CTDE Global State Space
6. Reward System
7. Usage Examples
8. Integration with MARL Algorithms

## 1. Environment Overview

The DEM environment is a complex multi-agent defense and escort scenario where agents must protect a VIP (Very Important Person) while guiding them to safety. This environment tests coordination, defense, and strategic planning capabilities.

**Key Features:**
- VIP protection and escort mechanics
- Dynamic threat spawning (rushers and shooters)
- Terrain effects (rivers, forests)
- Role emergence possibilities (guards, vanguards, snipers)
- Rich observation and reward systems
- Configurable difficulty levels

```python
# Import necessary libraries
import sys
import os
# Make the parent directory importable so the DEM package can be found
sys.path.append(os.path.abspath('..'))

import numpy as np
import matplotlib.pyplot as plt
from DEM.env_dem import create_dem_env
from DEM.env_dem_ctde import create_dem_ctde_env
```

## 2. Task Description

### Objective
Protect and escort a VIP from their starting position to a target location while defending against dynamically spawning threats.

### Core Mechanics

**VIP Behavior:**
- Autonomous movement toward target
- Limited vision range and movement speed
- Can take damage from threats
- Must reach target alive for success

**Threat Types:**
1. **Rushers**: Close-range melee attackers
   - High movement speed (1 cell per step)
   - Short attack range (1 cell)
   - Moderate damage (8 HP)
   - Fast attack cooldown (1 step)

2. **Shooters**: Long-range attackers
   - No movement (stationary)
   - Long attack range (5 cells)
   - High damage (15 HP)
   - Slow attack cooldown (3 steps)

**Terrain Effects:**
- **Rivers**: Block movement (impassable)
- **Forests**: Provide damage reduction (30% less damage taken)
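
The threat statistics and the forest modifier can be captured in a small data model. This is a sketch, not the DEM implementation: `ThreatSpec` and `damage_taken` are hypothetical names, and the code assumes the 30% forest reduction scales the damage received by a unit standing in a forest cell.

```python
from dataclasses import dataclass

# Hypothetical data model for the threat stats listed above (not the DEM source).
@dataclass(frozen=True)
class ThreatSpec:
    name: str
    move_cells_per_step: int  # 0 means stationary
    attack_range: int         # in grid cells
    damage: int               # HP removed per hit
    cooldown_steps: int       # steps between attacks

RUSHER = ThreatSpec("rusher", move_cells_per_step=1, attack_range=1, damage=8, cooldown_steps=1)
SHOOTER = ThreatSpec("shooter", move_cells_per_step=0, attack_range=5, damage=15, cooldown_steps=3)

# Assumption: forest cover multiplies incoming damage by (1 - 0.30).
FOREST_DAMAGE_REDUCTION = 0.30

def damage_taken(attacker: ThreatSpec, target_in_forest: bool) -> float:
    """Damage a single hit from `attacker` deals after terrain cover."""
    damage = float(attacker.damage)
    if target_in_forest:
        damage *= 1.0 - FOREST_DAMAGE_REDUCTION
    return damage

print(damage_taken(RUSHER, target_in_forest=False))         # 8.0
print(f"{damage_taken(SHOOTER, target_in_forest=True):.1f}")  # 10.5
```

Under this reading, a shooter hit on a unit in a forest costs 10.5 HP instead of 15, which is why forests are natural positions for guards.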

### Episode Termination
- VIP reaches target location (success)
- VIP dies (failure)
- Maximum steps reached
- Time limit exceeded
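
The termination conditions can be expressed as one check run after every step. This sketch is hypothetical: the function name, the outcome strings, and the precedence (VIP death is checked before arrival) are assumptions, and it folds the step and time limits into a single `max_steps` bound.

```python
from typing import Optional, Tuple

Position = Tuple[int, int]

def episode_outcome(vip_pos: Position, target_pos: Position, vip_hp: float,
                    step: int, max_steps: int = 200) -> Optional[str]:
    """Return 'failure', 'success', 'timeout', or None if the episode continues.

    Hypothetical precedence: a VIP that dies on the step it arrives counts
    as a failure.
    """
    if vip_hp <= 0:
        return "failure"
    if vip_pos == target_pos:
        return "success"
    if step >= max_steps:
        return "timeout"
    return None
```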

```python
# Create environment and demonstrate basic functionality
env = create_dem_env(difficulty="normal", render_mode="")

print("Environment Information:")
env_info = env.get_env_info()
for key, value in env_info.items():
    print(f"  {key}: {value}")

print("\nEnvironment Configuration:")
print(f"  Grid size: {env.config.grid_size}x{env.config.grid_size}")
print(f"  Number of agents: {env.config.num_agents}")
print(f"  Max steps: {env.config.max_steps}")
print(f"  VIP initial HP: {env.config.vip_hp}")
print(f"  Agent HP: {env.config.agent_hp}")
print(f"  Max threats: {env.config.max_threats}")
print("  Difficulty levels: ['easy', 'normal', 'hard']")
```

```
Environment Information:
  n_agents: 3
  agent_ids: ['agent_0', 'agent_1', 'agent_2']
  action_space: Discrete(10)
  observation_space: Box(-1.0, 1.0, (59,), float32)
  global_state_dim: 41
  max_steps: 200
  episode_limit: 200
  obs_shape: 59
  n_actions: 10
  state_shape: 41

Environment Configuration:
  Grid size: 12x12
  Number of agents: 3
  Max steps: 200
  VIP initial HP: 60
  Agent HP: 50
  Max threats: 5
  Difficulty levels: ['easy', 'normal', 'hard']
```


## 3. Action Space

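Each agent selects one discrete action per step, combining movement and combat. The table lists nine actions (IDs 0-8); the environment itself reports `Discrete(10)`, so one further action ID (9) exists that is not documented here: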

| Action ID | Action Name | Description |
|-----------|-------------|-------------|
| 0 | STAY | Agent remains in current position |
| 1 | UP | Agent moves one grid cell up |
| 2 | DOWN | Agent moves one grid cell down |
| 3 | LEFT | Agent moves one grid cell left |
| 4 | RIGHT | Agent moves one grid cell right |
| 5 | ATTACK_UP | Attack one cell up |
| 6 | ATTACK_DOWN | Attack one cell down |
| 7 | ATTACK_LEFT | Attack one cell left |
| 8 | ATTACK_RIGHT | Attack one cell right |

### Action Constraints
- Movement actions are invalid if the target cell is occupied or blocked
- Attack actions are valid only if there is an enemy in range
- Attacks have cooldown periods
- Agents cannot move into rivers or VIP positions
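
Value-based algorithms such as QMIX typically consume a per-agent mask of available actions. The sketch below derives the movement part of such a mask from the action table. It assumes `(row, col)` coordinates with row 0 at the top and treats river cells, the VIP cell, and other agents as one `blocked` set; attack actions (IDs 5-8) would need a separate enemy-in-range and cooldown check, and the undocumented ID 9 is left masked. The `Action` enum and `movement_mask` helper are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Set, Tuple

import numpy as np

Position = Tuple[int, int]  # (row, col); assumption: row 0 is the top edge

class Action(IntEnum):
    STAY = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4
    ATTACK_UP = 5
    ATTACK_DOWN = 6
    ATTACK_LEFT = 7
    ATTACK_RIGHT = 8

MOVE_DELTAS: Dict[Action, Position] = {
    Action.UP: (-1, 0),
    Action.DOWN: (1, 0),
    Action.LEFT: (0, -1),
    Action.RIGHT: (0, 1),
}

def movement_mask(pos: Position, grid_size: int, blocked: Set[Position],
                  n_actions: int = 10) -> np.ndarray:
    """Boolean mask over all action IDs; only STAY and the four moves are evaluated."""
    mask = np.zeros(n_actions, dtype=bool)
    mask[Action.STAY] = True
    for action, (dr, dc) in MOVE_DELTAS.items():
        r, c = pos[0] + dr, pos[1] + dc
        in_bounds = 0 <= r < grid_size and 0 <= c < grid_size
        mask[action] = in_bounds and (r, c) not in blocked
    return mask

# Agent in the top-left corner with a river cell directly to its right
print(movement_mask((0, 0), grid_size=12, blocked={(0, 1)}).astype(int))
# [1 0 1 0 0 0 0 0 0 0]
```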

```python
# Demonstrate action space
obs = env.reset()

print("Action Space:")
print(f"  Number of actions: {env_info['n_actions']}")
print(f"  Action space: {env.action_space}")
```

```
Action Space:
  Number of actions: 10
  Action space: Discrete(10)
```


## 4. Observation Space

Each agent receives a rich local observation containing information about their surroundings:

**Vector format (59 values in the `normal` configuration; the length varies with configuration):**
- **Self information (5 values)**:
  - Agent position (x, y)
  - Current HP
  - Attack cooldown status
  - Last action taken

- **VIP information (5 values)**:
  - VIP position (x, y)
  - VIP current HP
  - VIP target position (x, y)

- **Threat information (variable, max 20 values)**:
  - Up to 5 nearest threats
  - Each threat: position (x, y), type, HP

- **Other agents information (variable, max 20 values)**:
  - Up to 5 nearest friendly agents
  - Each agent: position (x, y), HP

- **Terrain information (25 values)**:
  - 5x5 grid around agent showing terrain types
  - 0: empty, 1: river, 2: forest, 3: VIP, 4: threat

- **Goal information (1 value)**:
  - Distance to VIP target
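
The 5x5 terrain window is an egocentric crop of the full grid. A minimal NumPy sketch, assuming the grid is stored as a 2-D integer array using the codes above and that cells beyond the map edge are padded with 0 (empty); the real environment may pad differently, for example with an impassable code. `local_patch` is a hypothetical helper.

```python
import numpy as np

# Terrain codes as listed above
EMPTY, RIVER, FOREST, VIP, THREAT = range(5)

def local_patch(grid: np.ndarray, row: int, col: int, radius: int = 2,
                pad_value: int = EMPTY) -> np.ndarray:
    """Return the (2*radius+1) x (2*radius+1) window centred on (row, col)."""
    padded = np.pad(grid, radius, mode="constant", constant_values=pad_value)
    # In padded coordinates the agent sits at (row + radius, col + radius),
    # so the window starts at (row, col).
    return padded[row:row + 2 * radius + 1, col:col + 2 * radius + 1]

grid = np.zeros((12, 12), dtype=np.int64)
grid[0, 1] = RIVER
grid[1, 0] = FOREST

patch = local_patch(grid, 0, 0)
print(patch.shape)           # (5, 5)
print(patch.flatten().size)  # 25
```

Padding first keeps the crop a fixed size even at the map edge, so the flattened window always contributes 25 values to the observation.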

```python
# Demonstrate observation space by decoding the first agent's vector.
# The offsets assume the layout described above (5 self values, then 5 VIP
# values, with the goal distance last); verify them against the environment
# source, since a VIP HP of 0.0 right after reset suggests a mismatch.
obs = env.reset()

print("Observation Space:")
agent_id, observation = next(iter(obs.items()))  # first agent only
print(f"\n  {agent_id}:")
print(f"    Shape: {observation.shape}")
print(f"    Min: {observation.min():.3f}, Max: {observation.max():.3f}")

idx = 0
print(f"    Self info: pos=({observation[idx]:.1f}, {observation[idx+1]:.1f}), hp={observation[idx+2]:.1f}, cooldown={observation[idx+3]:.1f}, last_action={observation[idx+4]:.1f}")
idx += 5

print(f"    VIP info: pos=({observation[idx]:.1f}, {observation[idx+1]:.1f}), hp={observation[idx+2]:.1f}, target=({observation[idx+3]:.1f}, {observation[idx+4]:.1f})")
idx += 5

# The goal information (distance to the VIP target) is the last element
print(f"    Distance to target: {observation[-1]:.1f}")
```

```
Observation Space:

  agent_0:
    Shape: (59,)
    Min: -0.083, Max: 1.000
    Self info: pos=(0.0, 0.1), hp=1.0, cooldown=0.0, last_action=0.0
    VIP info: pos=(0.1, 0.9), hp=0.0, target=(1.0, 1.0)
    Distance to target: 1.0
```
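
Manual index arithmetic like the decoding above is easy to get wrong. A safer pattern is to declare the layout once and slice by it, failing loudly when the sizes do not add up to the observation length. The helper and the example layout are hypothetical; real segment names and sizes must come from the environment's own encoding (the documented maxima above sum to more than 59, so they cannot all apply as listed).

```python
from typing import Dict, List, Tuple

import numpy as np

Layout = List[Tuple[str, int]]  # (segment name, number of values)

def split_observation(observation: np.ndarray, layout: Layout) -> Dict[str, np.ndarray]:
    """Split a flat observation into named segments according to `layout`."""
    sizes = [size for _, size in layout]
    if sum(sizes) != observation.shape[0]:
        raise ValueError(
            f"layout covers {sum(sizes)} values but observation has {observation.shape[0]}"
        )
    # np.split cuts at the cumulative offsets between consecutive segments
    pieces = np.split(observation, np.cumsum(sizes)[:-1])
    return {name: piece for (name, _), piece in zip(layout, pieces)}

# Illustrative layout for a 12-value vector (not the real DEM layout)
example_layout = [("self", 5), ("vip", 5), ("other", 1), ("goal_distance", 1)]
segments = split_observation(np.arange(12, dtype=np.float32), example_layout)
print({name: seg.tolist() for name, seg in segments.items()})
```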


## 5. CTDE Global State Space

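For Centralized Training with Decentralized Execution (CTDE), the environment provides a global state that summarizes all entities in the environment. The components below describe what it may contain; the exact layout is implementation-defined. The `normal` configuration reports a 41-dimensional state, so not every component can be included in full (a 12×12 terrain grid alone would contribute 144 values).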

**Global State Components:**
- **All agent positions and HP** (n_agents × 3 values)
- **VIP position, HP, and target** (5 values)
- **All threat positions, types, and HP** (variable, up to max_threats × 4 values)
- **Terrain information** (grid_size × grid_size values)
- **Time and step information** (2 values)
- **Communication history** (if enabled)

**Global State Types:**
- `concat`: Concatenation of all information (default)
- `mean`: Mean pooling of agent observations
- `max`: Max pooling of agent observations
- `attention`: Attention-based aggregation
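
The `mean` and `max` options pool the per-agent observation vectors into one fixed-size state. The sketch below shows that pooling with NumPy; it is an assumption about what those modes compute, and it does not reproduce `concat`, which per the list above is assembled from entity information rather than by stacking observations (it reports 41 values, not 3 × 59). `pooled_state` is a hypothetical helper.

```python
from typing import Dict

import numpy as np

def pooled_state(agent_obs: Dict[str, np.ndarray], mode: str) -> np.ndarray:
    """Pool per-agent observations into one state vector (mean or max)."""
    # Stack in a fixed agent order so the result is deterministic
    stacked = np.stack([agent_obs[agent_id] for agent_id in sorted(agent_obs)])
    if mode == "mean":
        return stacked.mean(axis=0)
    if mode == "max":
        return stacked.max(axis=0)
    raise ValueError(f"unsupported pooling mode: {mode!r}")

fake_obs = {f"agent_{i}": np.full(59, float(i), dtype=np.float32) for i in range(3)}
print(pooled_state(fake_obs, "mean").shape)  # (59,)
```

Pooling keeps the state size independent of the number of agents, at the cost of losing which agent contributed which value.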

```python
# Demonstrate CTDE environment
ctde_env = create_dem_ctde_env(difficulty="normal", global_state_type="concat")

obs = ctde_env.reset()
global_state = ctde_env.get_global_state()

print("CTDE Environment:")
print(f"  Global state dimension: {len(global_state)}")
print(f"  Global state sample: {global_state[:15]}...")  # Show first 15 values

print("\nState breakdown:")
print(f"  Agents: {ctde_env.config.num_agents}")
print(f"  Max threats: {ctde_env.config.max_threats}")
print(f"  Grid size: {ctde_env.config.grid_size}x{ctde_env.config.grid_size}")

ctde_env.close()
```

```
CTDE Environment:
  Global state dimension: 41
  Global state sample: [0.08333334 0.08333334 1.         0.         0.         0.08333334
 1.         0.         0.08333334 0.         1.         0.
 0.16666667 0.08333334 1.        ]...

State breakdown:
  Agents: 3
  Max threats: 5
  Grid size: 12x12
```


## 6. Reward System

The reward system is designed to encourage VIP protection, escort efficiency, and strategic agent behavior.

**Primary Rewards:**
1. **VIP Reach Target**: +50.0 (main success reward)
2. **VIP Death**: -30.0 (major failure penalty)
3. **VIP Progress**: +0.2 per grid unit closer to target
4. **Threat Killed**: +3.0 per enemy eliminated
5. **VIP Damage**: -0.1 per HP point lost
6. **Agent Death**: -3.0 per friendly agent lost
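
Per step, the primary terms combine into one scalar. A sketch assuming they are summed into a single team reward and that progress is signed (moving the VIP away from the target is penalized at the same rate); the `StepEvents` container and `primary_reward` function are hypothetical, while the weights are the values listed above.

```python
from dataclasses import dataclass

# Weights from the primary reward list
REWARD_VIP_REACH_TARGET = 50.0
REWARD_VIP_DEATH = -30.0
REWARD_VIP_PROGRESS = 0.2   # per grid unit closer to the target
REWARD_THREAT_KILLED = 3.0
REWARD_VIP_DAMAGE = -0.1    # per HP lost by the VIP
REWARD_AGENT_DEATH = -3.0

@dataclass
class StepEvents:
    """Hypothetical summary of what happened during one environment step."""
    vip_reached_target: bool = False
    vip_died: bool = False
    vip_progress: float = 0.0  # grid units closer to target (negative if farther)
    threats_killed: int = 0
    vip_hp_lost: float = 0.0
    agents_died: int = 0

def primary_reward(events: StepEvents) -> float:
    reward = 0.0
    if events.vip_reached_target:
        reward += REWARD_VIP_REACH_TARGET
    if events.vip_died:
        reward += REWARD_VIP_DEATH
    reward += REWARD_VIP_PROGRESS * events.vip_progress
    reward += REWARD_THREAT_KILLED * events.threats_killed
    reward += REWARD_VIP_DAMAGE * events.vip_hp_lost
    reward += REWARD_AGENT_DEATH * events.agents_died
    return reward

# VIP advances one cell, the team kills a rusher, the VIP takes one 8 HP hit
print(f"{primary_reward(StepEvents(vip_progress=1.0, threats_killed=1, vip_hp_lost=8.0)):.2f}")  # 2.40
```

The magnitudes make the design intent visible: one success (+50) outweighs many steps of shaping, and one VIP death (-30) costs as much as ten threat kills.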

**Role Emergence Rewards:**
- **Guard Adjacent**: +0.05 for being next to VIP
- **Guard Missing Penalty**: -0.02 if no agent near VIP
- **Body Block**: +0.5 for blocking threats from VIP
- **Vanguard Ahead**: +0.05 for being ahead of VIP
- **Vanguard Missing Penalty**: -0.02 if no agent ahead
- **Long Range Kill**: +1.0 for kills from ≥6 units away
- **Formation Rewards**: ±0.02 for good/bad agent spacing
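
The guard and vanguard terms depend on geometric tests whose exact definitions are not given here. One plausible sketch, assuming "adjacent" means Chebyshev distance 1 (the 8 surrounding cells), "ahead" means a positive projection of the agent's offset from the VIP onto the VIP-to-target direction, and per-agent bonuses are summed; these definitions and the helper names are assumptions.

```python
from typing import Sequence, Tuple

Position = Tuple[int, int]

GUARD_ADJACENT = 0.05
GUARD_MISSING_PENALTY = -0.02
VANGUARD_AHEAD = 0.05
VANGUARD_MISSING_PENALTY = -0.02

def is_adjacent(a: Position, b: Position) -> bool:
    # Assumption: the 8-neighbourhood (Chebyshev distance 1) counts as adjacent
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

def is_ahead(agent: Position, vip: Position, target: Position) -> bool:
    # Assumption: "ahead" = positive dot product with the VIP -> target direction
    to_target = (target[0] - vip[0], target[1] - vip[1])
    offset = (agent[0] - vip[0], agent[1] - vip[1])
    return offset[0] * to_target[0] + offset[1] * to_target[1] > 0

def role_shaping(agents: Sequence[Position], vip: Position, target: Position) -> float:
    """Team-level guard/vanguard shaping for one step (hypothetical aggregation)."""
    guards = sum(is_adjacent(a, vip) for a in agents)
    vanguards = sum(is_ahead(a, vip, target) for a in agents)
    reward = 0.0
    if guards:
        reward += GUARD_ADJACENT * guards
    else:
        reward += GUARD_MISSING_PENALTY
    if vanguards:
        reward += VANGUARD_AHEAD * vanguards
    else:
        reward += VANGUARD_MISSING_PENALTY
    return reward

# One guard beside the VIP, two agents ahead of it
print(f"{role_shaping([(5, 6), (0, 0), (9, 9)], vip=(5, 5), target=(11, 11)):.2f}")  # 0.15
```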

**Movement Penalties:**
- **Collision**: -0.05 for agent-agent collisions
- **Invalid Action**: -0.1 for impossible actions

**Difficulty Scaling:**
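- **Easy**: Smaller 10×10 grid, fewer concurrent threats (max 3), slower spawning (interval 12), higher HP (VIP 80, agents 60)
- **Normal**: 12×12 grid, max 5 threats, spawn interval 8, VIP 60 / agent 50 HP
- **Hard**: 12×12 grid, more threats (max 8), faster spawning (interval 6), lower HP (VIP 40, agents 40), more rushers (rusher probability 0.8)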

```python
# Demonstrate reward system
print("Reward System Configuration:")
print(f"  VIP reach target: {env.config.reward_vip_reach_target}")
print(f"  VIP death: {env.config.reward_vip_death}")
print(f"  VIP progress: {env.config.reward_vip_progress} per unit")
print(f"  Threat killed: {env.config.reward_threat_killed}")
print(f"  VIP damage: {env.config.reward_vip_damage} per HP")
print(f"  Agent death: {env.config.reward_agent_death}")
print(f"  Body block: {env.config.reward_body_block}")
print(f"  Long range kill: {env.config.reward_long_range_kill}")
```

```
Reward System Configuration:
  VIP reach target: 50.0
  VIP death: -30.0
  VIP progress: 0.2 per unit
  Threat killed: 3.0
  VIP damage: -0.1 per HP
  Agent death: -3.0
  Body block: 0.5
  Long range kill: 1.0
```


## 7. Usage Examples

### Different Difficulty Levels

```python
# Compare different difficulty levels
difficulties = ['easy', 'normal', 'hard']

print("Difficulty Level Comparison:")
for diff in difficulties:
    test_env = create_dem_env(difficulty=diff, render_mode="")

    print(f"\n{diff.upper()} difficulty:")
    print(f"  Grid size: {test_env.config.grid_size}x{test_env.config.grid_size}")
    print(f"  VIP HP: {test_env.config.vip_hp}")
    print(f"  Agent HP: {test_env.config.agent_hp}")
    print(f"  Max threats: {test_env.config.max_threats}")
    print(f"  Threat spawn interval: {test_env.config.threat_spawn_base_interval}")
    print(f"  Rusher probability: {test_env.config.rusher_probability:.1f}")

    test_env.close()
```

```
Difficulty Level Comparison:

EASY difficulty:
  Grid size: 10x10
  VIP HP: 80
  Agent HP: 60
  Max threats: 3
  Threat spawn interval: 12
  Rusher probability: 0.4

NORMAL difficulty:
  Grid size: 12x12
  VIP HP: 60
  Agent HP: 50
  Max threats: 5
  Threat spawn interval: 8
  Rusher probability: 0.6

HARD difficulty:
  Grid size: 12x12
  VIP HP: 40
  Agent HP: 40
  Max threats: 8
  Threat spawn interval: 6
  Rusher probability: 0.8
```
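
A rough way to compare the presets is to count spawn opportunities per episode. This sketch assumes one spawn attempt every `threat_spawn_base_interval` steps over a 200-step episode for every difficulty (only the `normal` step limit is shown earlier) and that `rusher_probability` is the chance a spawned threat is a rusher. It ignores the `max_threats` cap on concurrent threats, so the numbers are back-of-the-envelope estimates, not measurements.

```python
# Preset values copied from the comparison output above
PRESETS = {
    "easy":   {"interval": 12, "rusher_p": 0.4, "max_threats": 3},
    "normal": {"interval": 8,  "rusher_p": 0.6, "max_threats": 5},
    "hard":   {"interval": 6,  "rusher_p": 0.8, "max_threats": 8},
}
EPISODE_STEPS = 200  # assumption: same step limit for every difficulty

for name, preset in PRESETS.items():
    spawn_attempts = EPISODE_STEPS // preset["interval"]
    expected_rushers = spawn_attempts * preset["rusher_p"]
    print(f"{name:>6}: {spawn_attempts} spawn attempts, ~{expected_rushers:.1f} rushers, "
          f"at most {preset['max_threats']} alive at once")
```

Under these assumptions, `hard` offers roughly twice as many spawn attempts as `easy` (33 vs 16), and a larger share of them are fast melee rushers.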


## 8. Integration with MARL Algorithms

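The DEM environment exposes the interface that CTDE-style MARL frameworks typically expect: dictionary-keyed per-agent observations and actions plus a global state. The cell below uses an optional external `marl` package and skips gracefully if it is not installed.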

```python
# Example: integration with an external MARL framework (optional dependency)
try:
    from marl.src.envs import create_env_wrapper

    # Create MARL environment wrapper
    config = {
        'env': {
            'name': 'DEM',
            'difficulty': 'normal',
            'global_state_type': 'concat'
        }
    }

    marl_env = create_env_wrapper(config)

    print("MARL Integration:")
    print("  Environment created successfully")
    print(f"  Agent IDs: {marl_env.agent_ids}")
    print(f"  N agents: {marl_env.n_agents}")

    # Test MARL environment: reset returns (observations, extra info)
    obs, _ = marl_env.reset()
    global_state = marl_env.get_global_state()

    print(f"  Observation shape: {list(obs.values())[0].shape}")
    print(f"  Global state shape: {global_state.shape}")

    # Run a few steps with random actions over the full action space.
    # env_info comes from the base environment created earlier (10 actions);
    # randint's upper bound is exclusive, so this samples IDs 0-9.
    n_actions = env_info['n_actions']
    for step in range(5):
        actions = {agent_id: np.random.randint(0, n_actions) for agent_id in marl_env.agent_ids}
        obs, rewards, dones, infos = marl_env.step(actions)

        if any(dones.values()):
            print(f"  Episode completed at step {step+1}")
            break

    marl_env.close()
    print("  MARL integration test passed!")

except ImportError:
    print("MARL framework not available - this is normal if not installed")
except Exception as e:
    print(f"MARL integration test failed: {e}")
```

```
MARL Integration:
  Environment created successfully
  Agent IDs: ['agent_0', 'agent_1', 'agent_2']
  N agents: 3
  Observation shape: (59,)
  Global state shape: (41,)
  MARL integration test passed!
```
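
When the DEM package or the `marl` framework is unavailable (for example in CI), a stub that mimics the dictionary-based interface used in the cell above lets you unit-test rollout code. `StubDEMEnv` and `rollout` are hypothetical: the stub copies only the observed shapes (59-value observations, a 41-value global state, 10 actions, a 200-step limit) and returns random data with no game logic.

```python
from typing import Dict, Tuple

import numpy as np

class StubDEMEnv:
    """Random-data stand-in with the same dict-based interface as the wrapper above."""

    def __init__(self, n_agents: int = 3, obs_dim: int = 59, state_dim: int = 41,
                 n_actions: int = 10, episode_limit: int = 200, seed: int = 0):
        self.agent_ids = [f"agent_{i}" for i in range(n_agents)]
        self.n_agents = n_agents
        self.obs_dim, self.state_dim = obs_dim, state_dim
        self.n_actions, self.episode_limit = n_actions, episode_limit
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def _obs(self) -> Dict[str, np.ndarray]:
        return {a: self.rng.uniform(-1, 1, self.obs_dim).astype(np.float32) for a in self.agent_ids}

    def reset(self) -> Tuple[Dict[str, np.ndarray], dict]:
        self.t = 0
        return self._obs(), {}

    def get_global_state(self) -> np.ndarray:
        return self.rng.uniform(-1, 1, self.state_dim).astype(np.float32)

    def step(self, actions: Dict[str, int]):
        assert set(actions) == set(self.agent_ids), "one action per agent"
        self.t += 1
        done = self.t >= self.episode_limit
        rewards = {a: 0.0 for a in self.agent_ids}
        dones = {a: done for a in self.agent_ids}
        infos = {a: {} for a in self.agent_ids}
        return self._obs(), rewards, dones, infos

    def close(self) -> None:
        pass

def rollout(env, rng: np.random.Generator) -> int:
    """Run one episode with uniformly random actions and return its length."""
    obs, _ = env.reset()
    steps = 0
    while True:
        actions = {a: int(rng.integers(env.n_actions)) for a in env.agent_ids}
        obs, rewards, dones, infos = env.step(actions)
        steps += 1
        if any(dones.values()):
            return steps

env_stub = StubDEMEnv()
print(rollout(env_stub, np.random.default_rng(1)))  # 200
env_stub.close()
```

The stub mirrors the 4-tuple `step` and `(obs, info)` `reset` used by the wrapper cell; if your framework expects a different convention, adapt those two methods.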


## Summary

The DEM environment provides a multi-agent defense-and-escort scenario with the following key characteristics:

### Strengths:
- **Rich dynamics**: VIP escort, threat spawning, terrain effects
- **Role emergence**: Shaping rewards that encourage guard, vanguard, and sniper roles
- **Strategic depth**: Multiple objectives (protection, offense, positioning)
- **Scalable difficulty**: Easy to hard configurations
- **Comprehensive observations**: Local and global information for CTDE
- **Multi-objective rewards**: Sparse success/failure signals combined with dense progress and role-shaping terms

### Use Cases:
- Testing multi-agent coordination under pressure
- Studying role emergence in teams
- Benchmarking defensive MARL algorithms
- Research on hierarchical agent behaviors
- Training cooperative defense strategies

### Configuration Tips:
- Start with `easy` difficulty to understand basic mechanics
- Use `normal` for standard benchmarks and research
- Try `hard` for challenging multi-objective scenarios
- Adjust threat spawn rates for custom difficulty curves
- Modify reward weights to emphasize different objectives
- Use CTDE versions for centralized training algorithms

### Key Challenges:
- Balancing protection vs. escort objectives
- Coordinating against diverse threat types
- Optimizing agent positioning and formations
- Managing limited resources (agent HP, attack cooldowns)
- Adapting to dynamic threat spawning

```python
# Clean up
env.close()
print("\nDEM Environment Tutorial completed!")
```

```

DEM Environment Tutorial completed!
```
