# SMAC (StarCraft Multi-Agent Challenge) 环境封装器教程

本教程将介绍如何使用SMAC环境封装器进行多智能体强化学习实验。

## 目录
1. [环境简介](#环境简介)
2. [基础使用](#基础使用)
3. [CTDE兼容性](#ctde兼容性)
4. [不同地图测试](#不同地图测试)
5. [配置管理](#配置管理)
6. [性能测试](#性能测试)
7. [高级功能](#高级功能)
8. [与原版SMAC对比](#与原版smac对比)

## 环境简介

SMAC环境封装器是对现有SMAC库(`smac.env.StarCraft2Env`)的封装，使其具有与DEM、HRG、MSFS环境相同的统一接口。

### 主要特性

- **统一接口**: 与其他环境(DEM、HRG、MSFS)具有相同的方法和返回格式
- **基于原版SMAC**: 使用经过验证的SMAC库作为底层引擎
- **CTDE支持**: 提供集中式训练分布式执行的包装器
- **多种地图**: 支持所有SMAC标准地图(8m、3s、2s3z、MMM等)
- **动作掩码**: 支持智能体动作可用性检测
- **配置灵活**: 支持多种预设和自定义配置

### 支持的地图类型

- **8m**: 8个Marine vs 8个Marine - 对称战斗
- **3s**: 3个Stalker vs 3个Zealot - 异构单位战斗
- **2s3z**: 2个Stalker vs 3个Zealot - 数量不对称
- **MMM**: Marine、Medivac、Marauder组合 - 混合部队
- **corridor**: 狭窄走廊地图 - 限制移动空间
- **6h**: 6个Hydralisk vs 多个单位 - 大规模战斗

找到 23 个可用地图:
- 10m_vs_11m: 智能体数量=10, 敌人数量=11
- 1c3s5z: 智能体数量=9, 敌人数量=9
- 25m: 智能体数量=25, 敌人数量=25
- 27m_vs_30m: 智能体数量=27, 敌人数量=30
- 2c_vs_64zg: 智能体数量=2, 敌人数量=64
- 2m_vs_1z: 智能体数量=2, 敌人数量=1
- 2s3z: 智能体数量=5, 敌人数量=5
- 2s_vs_1sc: 智能体数量=2, 敌人数量=1
- 3m: 智能体数量=3, 敌人数量=3
- 3s5z: 智能体数量=8, 敌人数量=8
- 3s5z_vs_3s6z: 智能体数量=8, 敌人数量=9
- 3s_vs_3z: 智能体数量=3, 敌人数量=3
- 3s_vs_4z: 智能体数量=3, 敌人数量=4
- 3s_vs_5z: 智能体数量=3, 敌人数量=5
- 5m_vs_6m: 智能体数量=5, 敌人数量=6
- 6h_vs_8z: 智能体数量=6, 敌人数量=8
- 8m: 智能体数量=8, 敌人数量=8
- 8m_vs_9m: 智能体数量=8, 敌人数量=9
- MMM: 智能体数量=10, 敌人数量=10
- MMM2: 智能体数量=10, 敌人数量=12
- bane_vs_bane: 智能体数量=24, 敌人数量=24
- corridor: 智能体数量=6, 敌人数量=24
- so_many_baneling: 智能体数量=7, 敌人数量=32

In [1]:
# 导入必要的库
import numpy as np
import matplotlib.pyplot as plt
import sys
import os
import time
from typing import Dict, List, Any

# 添加环境路径
sys.path.append("..")  # 将上级目录添加到系统路径

# 导入SMAC环境封装器
from Env.SMAC import (
    create_smac_env, create_smac_ctde_env,
    create_smac_env_easy, create_smac_env_normal, create_smac_env_hard,
    get_easy_config, get_normal_config, get_hard_config
)

print("SMAC环境封装器导入成功！")

SMAC环境封装器导入成功！


## 基础使用

In [2]:
# 创建基础SMAC环境
env = create_smac_env(map_name="8m", episode_limit=50)

print(f"环境信息:")
env_info = env.get_env_info()
for key, value in env_info.items():
    if isinstance(value, (int, float)):
        print(f"- {key}: {value}")
    else:
        print(f"- {key}: {type(value).__name__}")

环境信息:
- state_shape: 168
- obs_shape: 80
- n_actions: 14
- n_agents: 8
- episode_limit: 120
- agent_features: list
- enemy_features: list
- agent_ids: list
- action_spaces: dict
- observation_spaces: dict
- act_dims: dict
- obs_dims: dict
- max_steps: 120
- global_state_dim: 168


In [3]:
# 重置环境
observations = env.reset()

print("初始观察信息:")
for agent_id, obs in observations.items():
    print(f"- {agent_id}: 形状={obs.shape}, 数据类型={obs.dtype}")
    print(f"  最小值={obs.min():.3f}, 最大值={obs.max():.3f}")
    print(f"  零值比例={np.mean(obs == 0):.2%}")
    break  # 只显示第一个智能体的信息

Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/sswun/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 41975 -dataDir /home/sswun/StarCraftII/ -tempDir /tmp/sc-vqe8l6o4/'
Starting up...
Startup Phase 1 complete
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:41975
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8


初始观察信息:
- agent_0: 形状=(80,), 数据类型=float32
  最小值=-0.167, 最大值=1.000
  零值比例=51.25%


Game has started.
Sending ResponseJoinGame


In [4]:
# 执行一个简单的回合
total_rewards = {agent_id: 0 for agent_id in env.agent_ids}
step_count = 0
max_steps = 30

print("开始执行回合...")

while step_count < max_steps:
    # 随机选择动作
    actions = {}
    for agent_id in env.agent_ids:
        # 获取可用动作
        avail_actions = env.get_avail_actions(agent_id)
        action = np.random.choice(avail_actions) if avail_actions else 0
        actions[agent_id] = action
    
    # 执行动作
    observations, rewards, dones, infos = env.step(actions)
    
    # 累积奖励
    for agent_id, reward in rewards.items():
        total_rewards[agent_id] += reward
    
    step_count += 1
    
    # 打印进度
    if step_count % 10 == 0:
        avg_reward = np.mean(list(rewards.values()))
        print(f"步数 {step_count}: 平均奖励 = {avg_reward:.3f}")
    
    # 检查是否结束
    if any(dones.values()):
        print(f"回合在步数 {step_count} 结束")
        break

print("\n回合结果:")
for agent_id, total_reward in total_rewards.items():
    print(f"- {agent_id}: 总奖励 = {total_reward:.3f}")
print(f"- 团队平均奖励: {np.mean(list(total_rewards.values())):.3f}")

env.close()

开始执行回合...
步数 10: 平均奖励 = 0.000
步数 20: 平均奖励 = 0.000
步数 30: 平均奖励 = 0.000

回合结果:
- agent_0: 总奖励 = 0.234
- agent_1: 总奖励 = 0.234
- agent_2: 总奖励 = 0.234
- agent_3: 总奖励 = 0.234
- agent_4: 总奖励 = 0.234
- agent_5: 总奖励 = 0.234
- agent_6: 总奖励 = 0.234
- agent_7: 总奖励 = 0.234
- 团队平均奖励: 0.234


RequestQuit command received.
unable to parse websocket frame.
Closing Application...


## CTDE兼容性

SMAC环境封装器提供了CTDE（集中式训练分布式执行）包装器，支持QMIX、VDN等算法。

In [5]:
# 获取CTDE环境信息
ctde_env = create_smac_ctde_env(map_name="8m")
env_info = ctde_env.get_env_info()

# 重置环境
observations = ctde_env.reset()

# 获取全局状态
global_state = ctde_env.get_global_state()

print(f"  - 全局状态维度: {global_state.shape}")
print(f"  - 非零元素数量: {np.count_nonzero(global_state)}")
print(f"  - 数据范围: [{global_state.min():.3f}, {global_state.max():.3f}]")

# 执行一步并检查info中的全局状态
actions = {}
for agent_id in ctde_env.agent_ids:
    # 获取可用动作
    avail_actions = ctde_env.get_avail_actions(agent_id)
    action = np.random.choice(avail_actions) if avail_actions else 0
    actions[agent_id] = action
obs, rewards, dones, infos = ctde_env.step(actions)

if 'global_state' in infos:
    info_state = infos['global_state']
    print(f"  - Info中全局状态形状: {info_state.shape}")
    print(f"  - 状态一致性: {np.allclose(global_state, info_state)}")

print("CTDE环境信息:")
for key, value in env_info.items():
    if isinstance(value, (int, float)):
        print(f"- {key}: {value}")
    else:
        print(f"- {key}: {type(value).__name__}")

ctde_env.close()

Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/sswun/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 39819 -dataDir /home/sswun/StarCraftII/ -tempDir /tmp/sc-ryakqlbo/'
Starting up...
Startup Phase 1 complete
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:39819
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8


  - 全局状态维度: (168,)
  - 非零元素数量: 42
  - 数据范围: [-0.242, 1.000]
  - Info中全局状态形状: (168,)
  - 状态一致性: False
CTDE环境信息:
- state_shape: 168
- obs_shape: 80
- n_actions: 14
- n_agents: 8
- episode_limit: 120
- agent_features: list
- enemy_features: list
- agent_ids: list
- action_spaces: dict
- observation_spaces: dict
- act_dims: dict
- obs_dims: dict
- max_steps: 120
- global_state_dim: 168


Game has started.
Sending ResponseJoinGame
unable to parse websocket frame.
RequestQuit command received.
Closing Application...


In [6]:
ctde_env.close()

## 不同地图测试

In [5]:
# 测试不同难度和地图
maps_to_test = ["8m", "MMM", "corridor"]

for map_name in maps_to_test:
    print(f"\n测试地图: {map_name}")
    
    try:
        env = create_smac_env(map_name=map_name, episode_limit=20)
        
        observations = env.reset()
        
        print(f"  - 智能体数量: {len(env.agent_ids)}")
        print(f"  - 智能体IDs: {env.agent_ids}")
        print(f"  - 观察空间: {env_info['obs_shape']} 维")
        print(f"  - 动作空间: {env_info['n_actions']} 个动作")
        print(f"  - 回合限制: {env_info['episode_limit']} 步")
        
        # 执行几步测试
        total_reward = 0
        for step in range(5):
            actions = {}
            for agent_id in env.agent_ids:
                avail_actions = env.get_avail_actions(agent_id)
                actions[agent_id] = np.random.choice(avail_actions) if avail_actions else 0
            
            obs, rewards, dones, infos = env.step(actions)
            total_reward += np.mean(list(rewards.values()))
            
            if any(dones.values()):
                break
        
        print(f"  - 5步平均奖励: {total_reward/5:.3f}")
        
        env.close()
        print(f"  ✓ 地图 {map_name} 测试成功")
        
    except Exception as e:
        print(f"  ✗ 地图 {map_name} 测试失败: {e}")
        print(f"  (这可能是因为该地图在当前SMAC安装中不可用)" )


测试地图: 8m


Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/sswun/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 33667 -dataDir /home/sswun/StarCraftII/ -tempDir /tmp/sc-fugmbr39/'
Starting up...
Startup Phase 1 complete
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:33667
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Game has started.
Sending ResponseJoinGame
RequestQuit command received.
unable to parse websocket frame.
Closing Application...
Version: B69

  - 智能体数量: 8
  - 智能体IDs: ['agent_0', 'agent_1', 'agent_2', 'agent_3', 'agent_4', 'agent_5', 'agent_6', 'agent_7']
  - 观察空间: 80 维
  - 动作空间: 14 个动作
  - 回合限制: 120 步
  - 5步平均奖励: 0.005
  ✓ 地图 8m 测试成功

测试地图: MMM


Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:37393
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Game has started.
Sending ResponseJoinGame
unable to parse websocket frame.
RequestQuit command received.
Closing Application...
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/sswun/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 44495 -dataDir /home/sswun/StarCraftII/ -tempDir /tmp/sc-_3uk0xoy/'
Starting up...
Startup Phase 1 complete


  - 智能体数量: 10
  - 智能体IDs: ['agent_0', 'agent_1', 'agent_2', 'agent_3', 'agent_4', 'agent_5', 'agent_6', 'agent_7', 'agent_8', 'agent_9']
  - 观察空间: 80 维
  - 动作空间: 14 个动作
  - 回合限制: 120 步
  - 5步平均奖励: 0.000
  ✓ 地图 MMM 测试成功

测试地图: corridor


Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:44495
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8


  - 智能体数量: 6
  - 智能体IDs: ['agent_0', 'agent_1', 'agent_2', 'agent_3', 'agent_4', 'agent_5']
  - 观察空间: 80 维
  - 动作空间: 14 个动作
  - 回合限制: 120 步
  - 5步平均奖励: 0.000
  ✓ 地图 corridor 测试成功


Game has started.
Sending ResponseJoinGame
unable to parse websocket frame.
RequestQuit command received.
Closing Application...


## 配置管理

In [None]:
# 测试不同难度配置
difficulties = ["easy", "normal", "hard"]

for difficulty in difficulties:
    print(f"\n难度配置: {difficulty}")
    
    if difficulty == "easy":
        env = create_smac_env_easy(episode_limit=20)
    elif difficulty == "hard":
        env = create_smac_env_hard(episode_limit=20)
    else:  # normal
        env = create_smac_env_normal(episode_limit=20)
    
    print(f"  - 地图名称: {env.config.map_name}")
    print(f"  - 难度: {env.config.difficulty}")
    print(f"  - 回合限制: {env.config.episode_limit}")
    print(f"  - 智能体数量: {len(env.agent_ids)}")
    
    env.close()

In [None]:
# 自定义配置示例
from Env.SMAC.config import SMACConfig

# 创建自定义配置
custom_config = SMACConfig(
    map_name="MMM",
    difficulty="custom",
    episode_limit=100,
    debug=True,
    seed=42
)

print("自定义配置:")
print(f"- 地图名称: {custom_config.map_name}")
print(f"- 难度: {custom_config.difficulty}")
print(f"- 回合限制: {custom_config.episode_limit}")
print(f"- 调试模式: {custom_config.debug}")
print(f"- 随机种子: {custom_config.seed}")

# 创建环境
custom_env = SMACEnv(custom_config)

print(f"\n自定义环境创建成功！智能体数量: {len(custom_env.agent_ids)}")
custom_env.close()

## 性能测试

In [None]:
# 性能测试
def benchmark_environment(map_name: str, num_steps: int = 50):
    env = create_smac_env(map_name=map_name, episode_limit=num_steps)
    observations = env.reset()
    
    start_time = time.time()
    
    total_rewards = []
    step_times = []
    
    for step in range(num_steps):
        step_start = time.time()
        
        # 随机动作
        actions = {}
        for agent_id in env.agent_ids:
            avail_actions = env.get_avail_actions(agent_id)
            actions[agent_id] = np.random.choice(avail_actions) if avail_actions else 0
        
        # 执行步骤
        observations, rewards, dones, infos = env.step(actions)
        
        step_end = time.time()
        step_times.append(step_end - step_start)
        total_rewards.append(np.mean(list(rewards.values())))
        
        # 如果回合结束，重置
        if any(dones.values()):
            observations = env.reset()
    
    total_time = time.time() - start_time
    
    env.close()
    
    return {
        'map_name': map_name,
        'total_time': total_time,
        'avg_step_time': np.mean(step_times),
        'steps_per_second': num_steps / total_time,
        'avg_reward': np.mean(total_rewards),
        'max_step_time': np.max(step_times),
        'min_step_time': np.min(step_times),
        'n_agents': len(env.agent_ids)
    }

# 运行不同地图的基准测试
maps_to_test = ["3s", "8m"]
results = []

for map_name in maps_to_test:
    print(f"\n测试地图性能: {map_name}")
    result = benchmark_environment(map_name, num_steps=30)
    results.append(result)
    
    print(f"地图: {result['map_name']}")
    print(f"  - 智能体数量: {result['n_agents']}")
    print(f"  - 总时间: {result['total_time']:.3f}秒")
    print(f"  - 平均每步时间: {result['avg_step_time']:.4f}秒")
    print(f"  - 每秒步数: {result['steps_per_second']:.1f}")
    print(f"  - 最大步时间: {result['max_step_time']:.4f}秒")
    print(f"  - 最小步时间: {result['min_step_time']:.4f}秒")
    print(f"  - 平均奖励: {result['avg_reward']:.4f}")
    
    # 性能评估 (SMAC可能比自定义环境慢，因为有游戏引擎开销)
    if result['avg_step_time'] < 0.1:
        print("  🚀 性能优秀: 适合大规模训练")
    elif result['avg_step_time'] < 0.5:
        print("  ✅ 性能良好: 适合常规训练")
    elif result['avg_step_time'] < 1.0:
        print("  ⚠️  性能一般: 可以用于训练，但可能较慢")
    else:
        print("  ❌ 性能较慢: 建议优化后再进行大规模训练")

## 高级功能

In [None]:
# 动作掩码演示
env = create_smac_env(map_name="3s", episode_limit=10)
observations = env.reset()

print("动作掩码演示:")
for agent_id in env.agent_ids:
    avail_actions = env.get_avail_actions(agent_id)
    
    print(f"\n智能体 {agent_id}:")
    print(f"  - 可用动作数量: {len(avail_actions)}")
    print(f"  - 可用动作: {avail_actions}")
    
    # 获取环境信息中的动作空间
    n_actions = env.get_env_info()['n_actions']
    print(f"  - 总动作数: {n_actions}")
    
    if len(avail_actions) > 3:  # 只显示前几个智能体的详细信息
        break

env.close()

In [None]:
# 观察空间详细分析
env = create_smac_env(map_name="8m", episode_limit=10)
observations = env.reset()

print("观察空间详细分析:")

# 分析一个智能体的观察
sample_agent_id = env.agent_ids[0]
obs = observations[sample_agent_id]

print(f"\n智能体 {sample_agent_id} 的观察分析:")
print(f"- 观察维度: {obs.shape}")
print(f"- 数据类型: {obs.dtype}")
print(f"- 最小值: {obs.min():.4f}")
print(f"- 最大值: {obs.max():.4f}")
print(f"- 均值: {obs.mean():.4f}")
print(f"- 标准差: {obs.std():.4f}")
print(f"- 零值比例: {np.mean(obs == 0):.2%}")

# 获取SMAC环境特征信息
env_info = env.get_env_info()
if 'agent_features' in env_info:
    print(f"\nSMAC环境特征:")
    print(f"- 智能体特征: {env_info['agent_features']}")
    print(f"- 敌人特征: {env_info['enemy_features']}")

env.close()

In [None]:
# 观察一致性测试
env = create_smac_env(map_name="3s", episode_limit=15)
observations = env.reset()

print("观察一致性测试:")

# 测试观察空间的一致性
obs_dims = []
for agent_id, obs in observations.items():
    obs_dims.append(obs.shape[0])
    print(f"- {agent_id}: {obs.shape[0]} 维")

print(f"\n所有智能体观察维度一致: {len(set(obs_dims)) == 1}")

# 测试观察在步骤间的变化
print("\n观察在步骤间的变化:")
for step in range(3):
    actions = {}
    for agent_id in env.agent_ids:
        avail_actions = env.get_avail_actions(agent_id)
        actions[agent_id] = np.random.choice(avail_actions) if avail_actions else 0
    
    obs, rewards, dones, infos = env.step(actions)
    
    means = [np.mean(o) for o in obs.values()]
    stds = [np.std(o) for o in obs.values()]
    
    print(f"  步骤 {step + 1}: 平均观察值 = {np.mean(means):.3f}, 平均标准差 = {np.mean(stds):.3f}")
    
    if any(dones.values()):
        break

env.close()

## 与原版SMAC对比

In [None]:
# 对比原版SMAC和封装器版本
import time

# 原版SMAC使用方式
print("=== 原版SMAC使用方式 ===")
from smac.env import StarCraft2Env

original_env = StarCraft2Env(map_name="8m")
original_env_info = original_env.get_env_info()

print(f"原版SMAC环境信息:")
print(f"- 智能体数量: {original_env_info['n_agents']}")
print(f"- 动作空间: {original_env_info['n_actions']}")
print(f"- 观察空间: {original_env_info['obs_shape']}")
print(f"- 状态空间: {original_env_info['state_shape']}")

# 测试原版SMAC的基本功能
original_env.reset()
obs_original = original_env.get_obs()
state_original = original_env.get_state()

print(f"\n原版SMAC数据格式:")
print(f"- 观察类型: {type(obs_original)}, 长度: {len(obs_original)}")
print(f"- 单个观察形状: {obs_original[0].shape}")
print(f"- 状态形状: {state_original.shape}")

original_env.close()

In [None]:
# 封装器版本使用方式
print("\n=== 封装器版本使用方式 ===")

wrapper_env = create_smac_env(map_name="8m", episode_limit=20)
wrapper_env_info = wrapper_env.get_env_info()

print(f"封装器环境信息:")
print(f"- 智能体数量: {wrapper_env_info['n_agents']}")
print(f"- 智能体IDs: {wrapper_env.agent_ids}")
print(f"- 动作空间: {wrapper_env_info['n_actions']}")
print(f"- 观察空间: {wrapper_env_info['obs_shape']}")
print(f"- 状态空间: {wrapper_env_info['state_shape']}")

# 测试封装器的基本功能
observations_wrapper = wrapper_env.reset()
state_wrapper = wrapper_env.get_global_state()

print(f"\n封装器数据格式:")
print(f"- 观察类型: {type(observations_wrapper)}")
print(f"- 观察字典键: {list(observations_wrapper.keys())[:3]}...")
print(f"- 单个观察形状: {list(observations_wrapper.values())[0].shape}")
print(f"- 状态形状: {state_wrapper.shape}")

wrapper_env.close()

In [None]:
# 兼容性演示 - 统一接口对比
print("\n=== 统一接口对比 ===")

# 所有环境都支持的统一接口
unified_methods = [
    "reset()",
    "step(actions)",
    "get_observations()",
    "get_global_state()",
    "get_avail_actions(agent_id)",
    "get_env_info()",
    "close()"
]

print("统一接口方法:")
for method in unified_methods:
    print(f"  ✓ {method}")

print("\n这些方法在SMAC、DEM、HRG、MSFS环境中都提供相同的接口和返回格式！")

## 完整示例：简单的随机智能体

In [None]:
import time
class RandomAgent:
    """简单的随机智能体，用于演示环境使用"""
    
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.total_reward = 0.0
        self.step_count = 0
    
    def choose_action(self, env, observations):
        """随机选择动作"""
        avail_actions = env.get_avail_actions(self.agent_id)
        if avail_actions:
            return np.random.choice(avail_actions)
        else:
            return 0
    
    def update_reward(self, reward: float):
        """更新累积奖励"""
        self.total_reward += reward
        self.step_count += 1


# 创建智能体团队
def create_agent_team(env):
    agents = {}
    for agent_id in env.agent_ids:
        agents[agent_id] = RandomAgent(agent_id)
    return agents


# 测试智能体团队
env = create_smac_env(map_name="3s", episode_limit=100)
agents = create_agent_team(env)

print("测试随机智能体团队...")

# 运行几个回合
episode_rewards = []
episode_lengths = []
wins = 0

for episode in range(3):
    observations = env.reset()
    step_count = 0
    
    while step_count < 100:
        # 获取所有智能体的动作
        actions = {}
        for agent_id in env.agent_ids:
            actions[agent_id] = agents[agent_id].choose_action(env, observations)
        
        # 执行动作
        observations, rewards, dones, infos = env.step(actions)
        
        # 更新智能体奖励
        for agent_id, reward in rewards.items():
            agents[agent_id].update_reward(reward)
        
        step_count += 1
        
        if any(dones.values()):
            print(f"  回合 {episode + 1}: 步数={step_count}, 平均奖励={np.mean(list(rewards.values())):.2f}")
            break
    
    # 统计回合结果
    episode_reward = np.mean([agent.total_reward for agent in agents.values()])
    episode_rewards.append(episode_reward)
    episode_lengths.append(step_count)
    
    # 重置智能体统计
    for agent in agents.values():
        agent.total_reward = 0.0
        agent.step_count = 0

print(f"\n随机智能体团队表现:")
print(f"- 平均回合奖励: {np.mean(episode_rewards):.2f} ± {np.std(episode_rewards):.2f}")
print(f"- 平均回合长度: {np.mean(episode_lengths):.1f} ± {np.std(episode_lengths):.1f}")
print(f"- 总回合数: {len(episode_rewards)}")

env.close()
print("✓ 随机智能体测试完成")

## 总结

本教程介绍了SMAC环境封装器的主要功能和使用方法：

1. **基础环境创建和使用**：展示了如何创建环境、重置、执行步骤
2. **CTDE兼容性**：演示了集中式训练分布式执行的支持
3. **多种地图**：支持多种SMAC标准地图
4. **配置管理**：介绍了不同难度和自定义配置
5. **性能测试**：评估了环境执行效率
6. **高级功能**：演示了动作掩码、观察分析等功能
7. **与原版对比**：展示了封装器版本与原版SMAC的兼容性

### 关键优势：
- ✅ **统一接口**: 与DEM、HRG、MSFS环境具有相同的方法
- ✅ **基于原版**: 使用经过验证的SMAC库作为底层引擎
- ✅ **CTDE兼容**: 支持主流多智能体强化学习算法
- ✅ **多种地图**: 支持所有SMAC标准地图和场景
- ✅ **动作掩码**: 支持智能体动作可用性检测
- ✅ **配置灵活**: 支持多种预设和自定义配置
- ✅ **向后兼容**: 可以与原版SMAC代码无缝切换

### 推荐使用场景：
- **QMIX/VDN算法测试**：使用CTDE版本进行集中式训练
- **算法性能基准**：作为标准测试环境评估新算法
- **教学演示**：用于多智能体强化学习的教学和演示
- **研究实验**：利用SMAC丰富的地图和单位配置进行研究
- **代码迁移**：从原版SMAC迁移到统一接口环境

SMAC环境封装器现在可以作为统一的MARL环境框架的一部分，与其他环境无缝集成使用！