# HRG (Heterogeneous Resource Gathering) Environment Tutorial

This notebook walks through the HRG environment, in which heterogeneous agents must cooperate to gather resources and deposit them at a base.

## Overview

The HRG environment features:
- **Heterogeneous agents**: Scouts, Workers, and Transporters with different abilities
- **Resource collection**: Gold (high value) and Wood (low value)
- **Cooperative gameplay**: Multiple agents must work together for efficiency
- **Complex observations**: Local vision with agent-specific information
- **Fast versions**: Reduced configurations (fewer agents, smaller grid) for quick training

In [1]:
import sys
import os
sys.path.append(os.path.abspath('..'))

import numpy as np
from HRG.env_hrg_ultra_fast import create_hrg_ultra_fast_env
from HRG.env_hrg_ultra_fast_ctde import create_hrg_ultra_fast_ctde_env

## Environment Setup

### Agent Types:
- **Scouts**: Fast movement, high vision range, no carrying capacity (not present in the ultra-fast configuration used below, which has one Worker and one Transporter)
- **Workers**: Moderate speed, can gather and carry resources
- **Transporters**: High carrying capacity, resource transfer specialists

### Resources:
- **Gold**: High value (10.0), clustered together
- **Wood**: Low value (2.0), more scattered

In [4]:
# Create ultra-fast environment (recommended for training)
env = create_hrg_ultra_fast_env()

## Action and Observation Spaces

In [6]:
# Reset the environment and inspect the action and observation spaces
obs = env.reset()

print("Action Space:")
print("  Actions: 0=Stay, 1=Up, 2=Down, 3=Left, 4=Right, 5=Gather/Deposit/Transfer")

print("\nObservation Space:")
first_agent_id = next(iter(obs))
for agent_id, observation in obs.items():
    print(f"  {agent_id}: shape={observation.shape}, range=[{observation.min():.3f}, {observation.max():.3f}]")

    # Show the observation breakdown for the first agent only
    if agent_id == first_agent_id:
        print("    Contains: position, resources, base_info, inventory, energy, etc.")

Action Space:
  Actions: 0=Stay, 1=Up, 2=Down, 3=Left, 4=Right, 5=Gather/Deposit/Transfer

Observation Space:
  worker_0: shape=(24,), range=[0.000, 1.000]
    Contains: position, resources, base_info, inventory, energy, etc.
  transporter_0: shape=(24,), range=[0.000, 1.000]
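
The six action indices printed above can be given names and grid offsets. The sketch below is illustrative only: the `Action` enum, the `MOVES` table, the `(row, col)` convention in which Up decreases the row, and the 6x6 bound are assumptions, not the environment's actual implementation.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the six discrete actions listed above."""
    STAY = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4
    INTERACT = 5  # Gather / Deposit / Transfer, depending on context


# Assumed (row, col) offsets; the real environment may use another convention.
MOVES = {
    Action.STAY: (0, 0),
    Action.UP: (-1, 0),
    Action.DOWN: (1, 0),
    Action.LEFT: (0, -1),
    Action.RIGHT: (0, 1),
}


def apply_move(pos, action, grid_size=6):
    """Return the new (row, col), clamped to the grid; INTERACT does not move."""
    d_row, d_col = MOVES.get(Action(action), (0, 0))
    row = min(max(pos[0] + d_row, 0), grid_size - 1)
    col = min(max(pos[1] + d_col, 0), grid_size - 1)
    return (row, col)


print(apply_move((0, 0), Action.UP))     # (0, 0): clamped at the top edge
print(apply_move((2, 3), Action.RIGHT))  # (2, 4)
```

Using an `IntEnum` keeps the values interchangeable with the plain integers the environment expects in the `actions` dict.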


## Reward System

### Reward Components:
- **Gathering**: 10% × resource value
- **Transferring**: 5% × resource value
- **Depositing**: 50% × resource value
- **Step penalty**: -0.001 (minimal)
- **Diversity bonus**: Additional rewards for balanced resource collection
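
To make the percentages concrete, the sketch below works out the reward for each event and resource from the values listed above. The constant and function names are hypothetical. The diversity bonus is left out because its formula is not given here, and applying the step penalty once per step is an assumption.

```python
# Values and fractions come from the lists above; names are hypothetical.
RESOURCE_VALUE = {"gold": 10.0, "wood": 2.0}
EVENT_FRACTION = {"gather": 0.10, "transfer": 0.05, "deposit": 0.50}
STEP_PENALTY = -0.001  # assumed to be applied once per environment step


def event_reward(event, resource):
    """Reward for a single event, before the step penalty."""
    return EVENT_FRACTION[event] * RESOURCE_VALUE[resource]


def chain_return(resource, n_steps):
    """Gather, transfer, and deposit one unit over n_steps steps."""
    events = sum(event_reward(event, resource) for event in EVENT_FRACTION)
    return events + n_steps * STEP_PENALTY


for resource in RESOURCE_VALUE:
    for event in EVENT_FRACTION:
        print(f"{event:>8} {resource}: {event_reward(event, resource):.2f}")
```

Under these numbers, one unit of gold carried through the full gather, transfer, deposit chain is worth 6.5 before penalties, against 1.3 for wood, and the deposit accounts for most of it.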

## Usage Example

In [9]:
# Run a sample episode
def run_episode(env, max_steps=20):
    obs = env.reset()
    total_reward = 0
    
    print(f"Starting episode with {len(env.game_state.agents)} agents")
    
    for step in range(max_steps):
        # Get random actions
        actions = {}
        for agent_id in env.agent_ids:
            avail_actions = env.get_avail_actions(agent_id)
            actions[agent_id] = np.random.choice(avail_actions)
        
        # Execute step
        obs, rewards, dones, info = env.step(actions)
        
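        # Track the first agent's reward only; this assumes a shared team reward
        # (sum over rewards.values() instead if rewards are per-agent)
        step_reward = list(rewards.values())[0]
        total_reward += step_reward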
        
        # Show progress
        if step % 5 == 0:
            total_resources = sum(len(agent.inventory) for agent in env.game_state.agents.values())
            print(f"  Step {step:2d}: Reward={step_reward:6.3f}, Carried={total_resources}, Done={any(dones.values())}")
        
        if any(dones.values()):
            break
    
    print(f"\nEpisode completed: Total reward = {total_reward:.3f}")
    return total_reward

# Run episode
reward = run_episode(env)

Starting episode with 2 agents
  Step  0: Reward= 0.050, Carried=4, Done=False
  Step  5: Reward= 0.033, Carried=4, Done=False
  Step 10: Reward=-0.035, Carried=4, Done=False
  Step 15: Reward=-0.070, Carried=4, Done=False

Episode completed: Total reward = -0.875
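
`run_episode` passes the result of `get_avail_actions` straight to `np.random.choice`, which is correct when the method returns a list of valid action indices. Many CTDE-style environments instead return a 0/1 mask with one entry per action; whether HRG does so is not shown in this notebook. Under that assumption, sampling has to pick among the indices whose mask entry is 1. `sample_from_mask` below is a hypothetical helper, not part of HRG.

```python
import random


def sample_from_mask(avail_mask, rng=random):
    """Sample uniformly among actions whose mask entry is nonzero.

    Assumes avail_mask is a 0/1 sequence with one entry per action,
    which may not match what HRG's get_avail_actions returns.
    """
    valid = [idx for idx, flag in enumerate(avail_mask) if flag]
    if not valid:
        raise ValueError("no available actions in mask")
    return rng.choice(valid)


mask = [1, 0, 1, 1, 0, 0]  # illustrative: Stay, Down, and Left available
action = sample_from_mask(mask, random.Random(0))
print(action in (0, 2, 3))  # True
```

Passing a mask directly to `np.random.choice` would instead draw one of the mask's values (0 or 1), so it is worth checking the return type before reusing the loop with another environment.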


## CTDE Integration

In [10]:
# Create CTDE environment
ctde_env = create_hrg_ultra_fast_ctde_env()

obs = ctde_env.reset()
global_state = ctde_env.get_global_state()

print("CTDE Environment:")
print(f"  Global state dimension: {len(global_state)}")
print(f"  Agent IDs: {ctde_env.agent_ids}")
print(f"  N agents: {ctde_env.n_agents}")

ctde_env.close()

CTDE Environment:
  Global state dimension: 22
  Agent IDs: ['worker_0', 'transporter_0']
  N agents: 2
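
In a CTDE training loop, each stored transition typically holds both the per-agent observations (for the decentralized policies) and the global state (for the centralized critic or mixer). The sketch below shows one way to package a step. The `Transition` class and `collect_transition` helper are hypothetical, and the code assumes the CTDE environment's `step` returns `(obs, rewards, dones, info)` dicts like the `env.step` call in `run_episode`, with a reward shared across agents.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Transition:
    """One CTDE sample: local observations plus the global state."""
    obs: Dict[str, Any]
    state: Any
    actions: Dict[str, int]
    reward: float
    next_obs: Dict[str, Any]
    next_state: Any
    done: bool


def collect_transition(env, obs, policy: Callable[[str, Any], int]) -> Transition:
    """Step the environment once and package the result.

    Assumes env.step returns (obs, rewards, dones, info) dicts and that
    all agents receive the same reward.
    """
    state = env.get_global_state()  # global state before acting
    actions = {agent_id: policy(agent_id, obs[agent_id]) for agent_id in env.agent_ids}
    next_obs, rewards, dones, _info = env.step(actions)
    return Transition(
        obs=obs,
        state=state,
        actions=actions,
        reward=next(iter(rewards.values())),
        next_obs=next_obs,
        next_state=env.get_global_state(),  # global state after acting
        done=any(dones.values()),
    )
```

The helper needs a live environment, so it should be called before `close()`, with the `obs` returned by `reset()` or by the previous step.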


## Key Features

### 1. Role Specialization
- Agents have different capabilities, which encourages role emergence
- Scouts explore and find resources
- Workers gather resources efficiently
- Transporters move resources between agents

### 2. Fast Training Configurations
- **Ultra-fast**: 2 agents, 6x6 grid, minimal complexity
- **Fast**: 6 agents, 10x10 grid, moderate complexity
- **Normal**: Full 6-agent team, standard complexity

### 3. MARL Integration Ready
- Compatible with QMIX, VDN, and MADDPG
- CTDE interface for centralized training
- Unified observation format across all environments
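
As an illustration of how value-decomposition methods use this interface, VDN models the team value as the sum of per-agent values, `Q_tot = sum_i Q_i(o_i, a_i)`. The sketch below uses plain Python and made-up Q-values over the six HRG actions; the function names are hypothetical, and no network or training step is shown.

```python
def greedy_actions(q_values):
    """Each agent picks its own argmax action (decentralized execution)."""
    return {
        agent_id: max(range(len(q)), key=q.__getitem__)
        for agent_id, q in q_values.items()
    }


def vdn_q_tot(q_values, actions):
    """VDN mixing: the joint value is the sum of the chosen per-agent values."""
    return sum(q_values[agent_id][action] for agent_id, action in actions.items())


# Made-up per-agent Q-values, one entry per action index 0..5.
q_values = {
    "worker_0": [0.1, 0.0, 0.3, -0.2, 0.05, 0.75],
    "transporter_0": [0.2, 0.5, 0.1, 0.0, -0.1, 0.4],
}
actions = greedy_actions(q_values)
print(actions)                       # {'worker_0': 5, 'transporter_0': 1}
print(vdn_q_tot(q_values, actions))  # 1.25
```

Because the sum is monotonic in each agent's value, the per-agent greedy choices also maximize `Q_tot`, which is what permits decentralized execution. QMIX replaces the sum with a monotonic mixing network conditioned on the global state, which is where `get_global_state()` would come in.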

In [11]:
# Clean up
env.close()
print("\nHRG Environment Tutorial completed!")
print("\nKey takeaways:")
print("  1. Use ultra_fast configuration for quick training")
print("  2. Agents naturally specialize into roles")
print("  3. Cooperation needed for efficient resource collection")
print("  4. Rich reward system encourages balanced strategies")


HRG Environment Tutorial completed!

Key takeaways:
  1. Use ultra_fast configuration for quick training
  2. Agents naturally specialize into roles
  3. Cooperation needed for efficient resource collection
  4. Rich reward system encourages balanced strategies
