
MAS-DUO — Multi-Agent System for Dynamic Use and Optimization

Python implementation of the MAS-DUO multi-agent system described in the doctoral thesis:

"Improving the Decision Support in Shop Floor Operations by Using Agent-based Systems and Visibility Frameworks" Pablo García Ansola — University of Castilla-La Mancha (UCLM), 2024

The system models production and logistics environments (factory, airport) as PettingZoo AEC environments where multiple agent types with 4W state (What/Where/When/Why) collaborate to optimise orders/services under a configurable global policy.


Table of Contents

  1. Architecture
  2. Project Structure
  3. Installation
  4. Core Concepts
  5. JSON Configuration
  6. Environment API (PettingZoo AEC)
  7. Pygame Renderer
  8. Examples — Airport Scenario
  9. Configuration Files
  10. Module Reference

1. Architecture

MAS-DUO implements three architectural layers corresponding to the thesis chapters:

┌─────────────────────────────────────────────────────────────────┐
│                    IS PLATFORM (Layer 3)                         │
│   ERP (cost/capacity)  ·  CRM (QoS/clients)  ·  Expert System   │
│              MDP Proposal Negotiation                            │
│              Global Policy R(s,s')                               │
├────────────────────┬────────────────────────────────────────────┤
│  BDI Agent Loop    │          MDP Engine (Layer 2)               │
│  (Layer 1)         │                                             │
│  Beliefs (4W)  ─── │──► States   ──► Policy (A,B,C,D)           │
│  Desires           │   Actions   ──► Reward R(s,s')             │
│  Intentions   ─────│──► IS Proposal ──► Negotiation             │
├────────────────────┴────────────────────────────────────────────┤
│                   ENVIRONMENT (PettingZoo AEC)                   │
│   ProductAgent · WorkerAgent · RobotAgent · ConveyorAgent        │
│   GridWorld · OrderManager · LogisticsRenderer                   │
└─────────────────────────────────────────────────────────────────┘

Main flow per step:

  1. The agent observes the environment (normalised 4W observation vector).
  2. The BDI loop generates beliefs (GeneratedBelief) and selects an intention.
  3. The MDP Engine evaluates the state transition and computes R(s,s').
  4. If a proposal is pending, it is negotiated with the IS Platform.
  5. The IS Platform may accept, reject, or propose a new global policy.
  6. The updated policy is propagated to all product agents.

2. Project Structure

MAS-DUO/
├── config/
│   ├── factory_example.json        # Generic factory scenario
│   ├── airport_gh_example.json     # CRC Airport — 4 flights
│   └── airport_gh_demo.json        # CRC Airport — 8 flights (demo)
│
├── examples/
│   ├── env_check.py                # Factory environment validation
│   ├── airport_gh_check.py         # Airport scenario validation
│   └── airport_gh_demo.py          # Greedy demo with pygame renderer
│
├── logistics_env/
│   ├── __init__.py                 # Exports LogisticsMaEnv
│   ├── logistics_maenv.py          # Main PettingZoo AEC environment
│   ├── config_loader.py            # Loads and validates JSON configuration
│   ├── grid_world.py               # 2D grid with A* and occupancy tracking
│   │
│   ├── agents/
│   │   ├── base_agent.py           # BaseAgent: 4W state, BDI loop, MDP
│   │   ├── product_agent.py        # ProductAgent + ProductAction
│   │   ├── robot_agent.py          # RobotAgent + RobotAction
│   │   ├── worker_agent.py         # WorkerAgent + WorkerAction
│   │   └── conveyor_agent.py       # ConveyorAgent + ConveyorAction
│   │
│   ├── objects/
│   │   ├── epc.py                  # RFID EPC code (Pure Identity URI)
│   │   └── order_manager.py        # Work orders and tracking
│   │
│   ├── is_platform/
│   │   ├── __init__.py             # Exports ISPlatform, GlobalPolicy, PolicyParameters
│   │   ├── is_platform.py          # ISPlatform: ERP, CRM, Expert System
│   │   ├── global_policy.py        # GlobalPolicy and PolicyParameters
│   │   └── negotiation.py          # NegotiationProposal, NegotiationResult
│   │
│   └── rendering/
│       └── renderer.py             # LogisticsRenderer (Pygame)
│
├── train/
│   └── train_random.py             # Training with random policy
│
├── setup.py
└── requirements.txt

3. Installation

Requirements

  • Python 3.10+ (recommended 3.12+, tested with 3.14.2 arm64)
  • macOS / Linux

Virtual environment (recommended)

# From the project root
python -m venv .venv
source .venv/bin/activate

pip install -e .

Main dependencies

| Package    | Minimum version | Purpose                               |
|------------|-----------------|---------------------------------------|
| pettingzoo | ≥ 1.24          | AEC multi-agent environment framework |
| gymnasium  | ≥ 0.29          | Observation/action spaces             |
| numpy      | ≥ 1.24          | Observation arrays                    |
| pygame     | ≥ 2.5           | 2D rendering (optional)               |

macOS Apple Silicon note: make sure you use a native arm64 interpreter. An x86_64-based conda environment will raise libffi.8.dylib errors because its binaries are incompatible with arm64.


4. Core Concepts

4.1 4W State

All agents in the system maintain a 4W state (inspired by the EPCIS standard):

| Dimension     | Description                     | Example                           |
|---------------|---------------------------------|-----------------------------------|
| What (what)   | Object identity (EPC URI or ID) | urn:epc:id:sgtin:0614150.B737.001 |
| Where (where) | Grid position (x, y)            | Position(x=5, y=3)                |
| When (when)   | Current simulation step         | step=15                           |
| Why (why)     | Business state (BusinessStep)   | PhysicalBDIState.IN_TRANSIT       |

In code:

# 4W state of any agent
agent.state.what   # → str (identity)
agent.state.where  # → Position(x, y)
agent.state.when   # → int (step)
agent.state.why    # → str (BDI state)
agent.state.zone_id  # → str ("PARK1", "TERMINAL", …)
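For readers without the repo at hand, the 4W record can be pictured as a small dataclass. This is an illustrative stand-in only (the class names FourWState and Position here are assumptions; the real definitions live in logistics_env.agents):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    x: int
    y: int

@dataclass
class FourWState:
    """Illustrative stand-in for the 4W agent state used throughout MAS-DUO."""
    what: str        # identity (EPC URI or ID)
    where: Position  # grid position
    when: int        # simulation step
    why: str         # business/BDI state
    zone_id: str     # zone containing `where`

state = FourWState(
    what="urn:epc:id:sgtin:0614150.B737.001",
    where=Position(x=5, y=3),
    when=15,
    why="IN_TRANSIT",
    zone_id="PARK1",
)
```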

4.2 Agent Types

ProductAgent — Flights / products

Represents the physical object that needs to be processed (in the airport scenario: the flight; in the factory scenario: the product). Implements the full Physical BDI loop.

Actions (ProductAction):

| Action          | Value | Description                                |
|-----------------|-------|--------------------------------------------|
| WAIT            | 0     | Wait without moving                        |
| REQUEST_MOVE    | 1     | Request advance to the next route waypoint |
| REQUEST_PROCESS | 2     | Request processing at the current zone     |
| SIGNAL_READY    | 3     | Signal ready for dispatch                  |
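Given the exported name ProductAction and the integer values above, the enum is presumably defined along these lines (a sketch inferred from the table, not copied from product_agent.py):

```python
from enum import IntEnum

class ProductAction(IntEnum):
    WAIT = 0             # wait without moving
    REQUEST_MOVE = 1     # advance to the next route waypoint
    REQUEST_PROCESS = 2  # process at the current zone
    SIGNAL_READY = 3     # ready for dispatch
```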

RobotAgent — Equipment / AGVs

Represents robots, guided vehicles, or ground handling equipment. Has a battery that recharges at a designated charging zone.

Actions (RobotAction):

| Action                     | Value | Description                             |
|----------------------------|-------|-----------------------------------------|
| IDLE                       | 0     | No action                               |
| MOVE_NORTH/SOUTH/EAST/WEST | 1-4   | Cardinal movement                       |
| LIFT                       | 5     | Pick up product in the same cell        |
| DROP                       | 6     | Deposit product                         |
| CHARGE                     | 7     | Recharge battery                        |
| NAVIGATE                   | 8     | Navigate via A* towards assigned target |

WorkerAgent — Operational personnel

Represents human workers (ramp agents, supervisors, crew chiefs).

Actions (WorkerAction):

| Action                     | Value | Description                     |
|----------------------------|-------|---------------------------------|
| IDLE                       | 0     | No action                       |
| MOVE_NORTH/SOUTH/EAST/WEST | 1-4   | Cardinal movement               |
| PICK                       | 5     | Pick up product                 |
| PLACE                      | 6     | Deposit product                 |
| PROCESS                    | 7     | Process product at current zone |
| SCAN                       | 8     | Scan zone (RFID/EPCIS)          |
| REST                       | 9     | Rest (recover energy)           |

ConveyorAgent — Belt conveyors

Represents baggage belts, production lines, or automated conveyors.

Actions (ConveyorAction):

| Action   | Value | Description                            |
|----------|-------|----------------------------------------|
| STOP     | 0     | Stop the belt                          |
| RUN      | 1     | Normal speed                           |
| RUN_FAST | 2     | High speed (higher energy consumption) |
| REVERSE  | 3     | Reverse direction                      |

4.3 BDI and MDP

Each ProductAgent (and optionally other agent types) implements a BDI (Belief-Desire-Intention) loop over an MDP (Markov Decision Process):

Observation → Beliefs (GeneratedBelief) → Desires → Intentions
                         ↓
              MDPEngine.step(state, action)
                         ↓
              Reward R(s,s') = f(A, B, C, D)
                         ↓
              IS Proposal (if applicable) → NegotiationProposal

Exported BDI structures:

from logistics_env.agents import (
    BusinessStep,       # Enum of business steps (IATA)
    PhysicalBDIState,  # Enum of physical BDI states
    GeneratedBelief,   # Belief generated by the agent
    BDIContext,        # Full context for a BDI round
)

4.4 IS Platform (ERP / CRM / Expert System)

The ISPlatform acts as the central arbiter of the system. It evaluates agent proposals and can modify the global policy at runtime.

Three sub-components:

| Module        | Function                            | Key parameters                |
|---------------|-------------------------------------|-------------------------------|
| ERP           | Controls costs and capacity         | max_cost, max_delay, capacity |
| CRM           | Manages QoS and client priority     | min_qos, client_priority      |
| Expert System | Evaluates minimum acceptable reward | min_reward                    |

API:

from logistics_env.is_platform import ISPlatform, GlobalPolicy, PolicyMode, PolicyParameters

# Instantiate with initial policy
policy = GlobalPolicy(
    mode    = PolicyMode.STATIC,
    initial = PolicyParameters(A=0.5, B=0.4, C=0.0, D=0.1),
)
is_platform = ISPlatform(global_policy=policy)

# Configure sub-components
is_platform.configure_erp(max_cost=200.0, max_delay=5.0, capacity=1.0)
is_platform.configure_crm(min_qos=0.0, client_priority=0.5)
is_platform.configure_expert(min_reward=-5.0)

# Evaluate a proposal (called automatically by the environment)
result = is_platform.evaluate(proposal, step=15)
# result.outcome → NegotiationOutcome.APPROVED / REJECTED
# result.new_policy → PolicyParameters or None

# Platform status
summary = is_platform.get_summary()
# → {"total_negotiations": N, "approved": N, "approval_rate": 0.87, ...}

4.5 Global Policy — Equation 12

The global reward function (Equation 12 of the thesis) weights four dimensions:

R(s, s') = A·Delay + B·Cost + C·QoS + D·Energy
| Parameter     | Symbol | Description                                                                                 |
|---------------|--------|---------------------------------------------------------------------------------------------|
| Delay weight  | A      | Penalty for delays. Typical value: 0.5                                                      |
| Cost weight   | B      | Economic cost of operations. Typical value: 0.4                                             |
| QoS weight    | C      | Quality of service / client preference. 0.0 in the Common Use Model (no airline preference) |
| Energy weight | D      | Energy efficiency. Typical value: 0.1                                                       |

The values A=0.5, B=0.4, C=0.0, D=0.1 are those obtained in the thesis for the Ciudad Real Central Airport scenario (Common Use Model, no airline preference).

In the JSON configuration:

"global_policy": {
  "mode": "static",
  "initial": { "A": 0.5, "B": 0.4, "C": 0.0, "D": 0.1 },
  "scheduled_changes": []
}

The policy can change dynamically during the simulation via scheduled_changes:

"scheduled_changes": [
  {"step": 50, "A": 0.3, "B": 0.5, "C": 0.1, "D": 0.1}
]
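Equation 12 is simple enough to sketch standalone. The snippet below re-implements the weighted sum as a hypothetical helper (the repo's actual PolicyParameters and reward logic live in logistics_env.is_platform):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyParameters:
    A: float  # delay weight
    B: float  # cost weight
    C: float  # QoS weight
    D: float  # energy weight

def global_reward(p: PolicyParameters, delay: float, cost: float,
                  qos: float, energy: float) -> float:
    # Equation 12: R(s, s') = A*Delay + B*Cost + C*QoS + D*Energy
    return p.A * delay + p.B * cost + p.C * qos + p.D * energy

# Common Use Model weights from the thesis (CRC airport scenario)
crc = PolicyParameters(A=0.5, B=0.4, C=0.0, D=0.1)
```

With C=0.0, the QoS term drops out entirely, which is exactly the "no airline preference" behaviour described above.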

4.6 Reward System

The environment emits per-agent rewards at each step:

| Event              | Reward       | JSON parameter        |
|--------------------|--------------|-----------------------|
| On-time delivery   | +200.0       | on_time_delivery      |
| Full order (bonus) | +100.0       | full_order_bonus      |
| Partial delivery   | +50.0        | partial_order_bonus   |
| Late penalty       | -10.0 / step | late_penalty_per_step |
| Energy consumption | -0.1 × units | energy_penalty        |
| Wrong route        | -20.0        | wrong_route_penalty   |
| Idle action        | -1.0         | idle_penalty          |
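Assuming event rewards are additive (an assumption; the actual accounting happens inside the environment), a step's reward can be tallied from these weights:

```python
# Default weights, matching the reward table and the JSON reward_weights block
REWARD_WEIGHTS = {
    "on_time_delivery": 200.0,
    "full_order_bonus": 100.0,
    "partial_order_bonus": 50.0,
    "late_penalty_per_step": -10.0,
    "energy_penalty": -0.1,
    "wrong_route_penalty": -20.0,
    "idle_penalty": -1.0,
}

def step_reward(events, energy_units=0.0, late_steps=0):
    """Sum one-off event rewards plus the per-unit and per-step penalties."""
    r = sum(REWARD_WEIGHTS[e] for e in events)
    r += REWARD_WEIGHTS["energy_penalty"] * energy_units
    r += REWARD_WEIGHTS["late_penalty_per_step"] * late_steps
    return r
```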

5. JSON Configuration

The entire scenario is defined in a single JSON file. Full structure:

{
  "factory": {
    "id":             "Unique scenario ID",
    "name":           "Descriptive name",
    "company_prefix": "GS1 Company Prefix (for EPCs)",

    "grid": {
      "width": 20,   "height": 15,
      "cell_size_meters": 5.0
    },

    "zones": [
      {
        "id": "PARK1", "name": "Stand P1",
        "x": 4, "y": 2, "width": 4, "height": 5,
        "type": "process"
      }
    ],

    "conveyor_belts": [
      {
        "id": "BAGBELT-P1",
        "path": [{"x": 5, "y": 6}, {"x": 5, "y": 7}],
        "speed_cells_per_step": 1,
        "energy_per_step": 0.3,
        "capacity": 5
      }
    ],

    "robots": [
      {
        "id": "PUSHBACK-1",
        "start_position": {"x": 1, "y": 2},
        "energy_capacity": 500.0,
        "energy_per_move": 2.0,
        "energy_per_lift": 5.0,
        "speed_cells_per_step": 2,
        "carrying_capacity": 1,
        "recharge_rate": 10.0,
        "recharge_threshold": 50.0
      }
    ],

    "workers": [
      {
        "id": "RAMP-1",
        "start_position": {"x": 1, "y": 2},
        "role": "ramp_agent",
        "energy_per_action": 0.5,
        "fatigue_rate": 0.01,
        "rest_threshold": 0.3,
        "speed_cells_per_step": 2,
        "allowed_zones": ["HANGAR", "PARK1", "PARK2"]
      }
    ],

    "product_types": [
      {
        "item_reference": "B737",
        "name": "Boeing 737-800",
        "processing_steps": ["TRANSIT", "PARK1", "TERMINAL"],
        "process_time_steps": 30,
        "requires_processing": true,
        "size": 5,
        "weight_kg": 70000.0
      }
    ],

    "orders": [
      {
        "order_id": "VY-1234",
        "products": [{"item_reference": "B737", "quantity": 1}],
        "deadline_steps": 30,
        "priority": "high",
        "destination": "TERMINAL"
      }
    ],

    "reward_weights": {
      "on_time_delivery": 200.0,
      "full_order_bonus": 100.0,
      "partial_order_bonus": 50.0,
      "late_penalty_per_step": -10.0,
      "energy_penalty": -0.1,
      "wrong_route_penalty": -20.0,
      "idle_penalty": -1.0
    },

    "sim_params": {
      "max_steps": 200,
      "step_duration_seconds": 60
    },

    "global_policy": {
      "mode": "static",
      "initial": {"A": 0.5, "B": 0.4, "C": 0.0, "D": 0.1},
      "scheduled_changes": []
    },

    "is_platform": {
      "max_allowed_cost": 200.0,
      "max_allowed_delay": 5.0,
      "production_capacity": 1.0,
      "min_qos_threshold": 0.0,
      "client_priority": 0.5,
      "min_reward_threshold": -5.0
    }
  }
}

Available zone types:

| Type     | Airport term            | Renderer colour    |
|----------|-------------------------|--------------------|
| input    | Entry / Reception       | Light green        |
| storage  | Storage / Hangar        | Light blue         |
| process  | Stand / Processing zone | Cream              |
| output   | Terminal / Departure    | Sky blue           |
| charging | Charging zone           | Yellow             |
| taxiway  | Taxiways                | Asphalt grey       |
| stand    | Parking stand           | Cream              |
| gate     | Boarding gate           | Light blue         |
| apron    | Apron / Ramp            | Light grey         |
| hangar   | GH Hangar               | Mauve              |

6. Environment API (PettingZoo AEC)

The environment follows the PettingZoo AEC (Agent-Environment-Cycle) API:

from logistics_env import LogisticsMaEnv

# Create environment
env = LogisticsMaEnv(
    config_path  = "config/airport_gh_demo.json",
    render_mode  = "human",   # "human" | "rgb_array" | None
)

# Reset
env.reset(seed=42)

# Simulation loop
while env.agents:
    agent_id = env.agent_selection
    obs, reward, terminated, truncated, info = env.last()

    if terminated or truncated:
        env.step(None)
        continue

    action = env.action_space(agent_id).sample()  # or your policy
    env.step(action)

# State snapshot
snapshot = env.state_snapshot
# → {"step": N, "products": {...}, "robots": {...}, "orders": {...}, ...}

env.close()

Observation spaces

| Agent type | Shape              | Description                                            |
|------------|--------------------|--------------------------------------------------------|
| product    | (6,) float32 [0,1] | Normalised position, route progress, deadline, energy  |
| worker     | (6,) float32 [0,1] | Position, energy, fatigue, zone, previous action       |
| robot      | (5,) float32 [0,1] | Position, battery, load, speed                         |
| conveyor   | (5,) float32 [0,1] | State, occupancy, speed, energy                        |
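Since every feature is normalised into [0, 1], each raw quantity is presumably scaled by its maximum before entering the observation vector. A plausible helper (an assumption, not taken from the codebase):

```python
def normalise(value, max_value):
    """Clamp-and-scale a raw feature into the [0, 1] observation range."""
    if max_value <= 0:
        return 0.0
    return min(max(value / max_value, 0.0), 1.0)

# e.g. a grid position on the 20x15 airport grid
obs_x = normalise(5, 19)   # x scaled by width - 1
obs_y = normalise(3, 14)   # y scaled by height - 1
```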

state_snapshot

Property returning the complete serialisable state:

snapshot = env.state_snapshot
# {
#   "step": 15,
#   "products":  { "urn:epc:...": {"what": ..., "where": ..., "when": ..., "why": ..., "zone_id": ...} },
#   "workers":   { "RAMP-1":      {...} },
#   "robots":    { "PUSHBACK-1":  {...} },
#   "conveyors": { "BAGBELT-P1":  {...} },
#   "orders":    { "VY-1234":     {"status": "complete", "dispatched": 1, "needed": 1, "deadline": 30, "priority": "high", "product_type": "B737"} },
#   "total_energy_consumed": 831.0,
#   "is_platform": {"total_negotiations": 5, "approved": 4, ...},
#   "global_policy": {"A": 0.5, "B": 0.4, "C": 0.0, "D": 0.1, "mode": "static"},
# }

7. Pygame Renderer

LogisticsRenderer provides real-time visualisation of the environment.

Instantiation

from logistics_env.rendering.renderer import LogisticsRenderer

renderer = LogisticsRenderer(
    factory_cfg   = env.factory_cfg,
    fps           = 3,              # 3 FPS = human-readable speed
    title_prefix  = "MAS-DUO",
)

Render call

renderer.render(
    grid      = env._grid,
    products  = env._products,
    workers   = env._workers,
    robots    = env._robots,
    conveyors = env._conveyors,
    step      = env._step_count,
    order_mgr = env._order_mgr,
    mode      = "human",      # "human" | "rgb_array"
    fps       = 3,            # per-frame FPS override
    extra_info = {            # additional info for the panel
        "scenario": "CRC Airport — Demo",
        "solver":   "Greedy EDF",
        "policy":   "A=0.5 B=0.4 C=0.0 D=0.1",
        "reward":   287.07,
    },
)

Visual legend

| Element                 | Colour                   | Representation          |
|-------------------------|--------------------------|-------------------------|
| storage / hangar zones  | Blue / mauve             | Rectangle with border   |
| process / stand zones   | Cream / yellow           | Rectangle               |
| output / terminal zones | Sky blue                 | Rectangle               |
| input / taxiway zones   | Light green / grey       | Rectangle               |
| Robots / GH equipment   | Blue (60,100,200)        | Rounded rectangle       |
| Workers / personnel     | Green (60,160,60)        | Rounded rectangle       |
| Supervisors / chiefs    | Dark green (30,130,30)   | Rounded rectangle       |
| Products / flights      | Orange (230,120,20)      | Circle with label       |
| Completed products      | Green (100,200,120)      | Circle                  |
| Baggage belts           | Dark grey (100,100,100)  | Row of cells with arrow |
| Low battery (<20%)      | Red border               | Border on robot         |

Side panel

The right-hand panel (360 px) displays in real time:

  • Header: scenario name, simulated clock (T+hh:mm), step, cumulative reward, active solver
  • FLIGHTS: status of each order — time remaining to deadline (T-XX), completed (OK), delayed (DELAY), priority [H/N/L], progress bar
  • EQUIPMENT: list of robots with battery bar and carried load
  • PERSONNEL: list of workers with energy level
  • BELTS: state (RUN/STOP) and products in transit

8. Examples — Airport Scenario

Both examples are based on Chapter 4.1 of the thesis, which describes Ciudad Real Central Airport (CRC) as a real-world MAS-DUO use case, with ground handling operations for B737-800 turnarounds within 30-45 minute windows.

Airport grid (20 x 15 cells, 5 m/cell = 100 x 75 m)

col ->  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
row
  0  [ H  H  H  H | T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  ]  <- TRANSIT
  1  [ H  H  H  H | T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  ]    (Taxiway)
  2  [ H  H  H  H | P1 P1 P1 P1| P2 P2 P2 P2| P3 P3 P3 P3| P4 P4 P4 P4]
  3  [ H  H  H  H | P1 P1 P1 P1| P2 P2 P2 P2| P3 P3 P3 P3| P4 P4 P4 P4]
  4  [ H  H  H  H | P1 P1 P1 P1| P2 P2 P2 P2| P3 P3 P3 P3| P4 P4 P4 P4]
  5  [ H  H  H  H | P1 P1 P1 P1| P2 P2 P2 P2| P3 P3 P3 P3| P4 P4 P4 P4]
  6  [ H  H  H  H | P1 P1 P1 P1| P2 P2 P2 P2| P3 P3 P3 P3| P4 P4 P4 P4]
  7  [ H  H  H  H | TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM]
  8  [ H  H  H  H | TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM]
  ...                                (Passenger terminal)
 14  [ H  H  H  H | TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM TRM]

  H   = HANGAR  (GH equipment base)
  T   = TRANSIT (taxiways / airside roads)
  P1-P4 = PARK1-PARK4 (parking stands)
  TRM = TERMINAL (boarding gates / departures)

ReadPoints and states (Equation 11 of the thesis)

ReadPoints (RP) = { HANGAR, PARK1, PARK2, PARK3, PARK4 }  -> 5 RPs

Resource BusinessSteps (BS_Resource):
  { Free, Busy, InTransit, NotAvailable }  -> 4 BS

Resource system state space:
  |S_resource| = |RP| x |BS_resource| = 5 x 4 = 20 states

IATA Flight BusinessSteps (BS_Flight):
  { OMS, STM, LOD, PAX, BAG, HDL, AGM, CGM, ... }  -> 8+ BS

Flight system state space:
  |S_flight| approx 4 stands x 10 BS = 40 states
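The Equation 11 counts above can be reproduced by enumerating the Cartesian product ReadPoint × BusinessStep:

```python
from itertools import product

read_points = ["HANGAR", "PARK1", "PARK2", "PARK3", "PARK4"]
resource_steps = ["Free", "Busy", "InTransit", "NotAvailable"]

# Equation 11: |S_resource| = |RP| x |BS_resource|
resource_states = list(product(read_points, resource_steps))

stands = ["PARK1", "PARK2", "PARK3", "PARK4"]
flight_steps = 10  # approximate count of IATA flight BusinessSteps
flight_state_count = len(stands) * flight_steps
```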

8.1 airport_gh_check.py — Scenario Validation

File: examples/airport_gh_check.py
Config: config/airport_gh_example.json (4 flights)

A validation and exploration script for the airport scenario. It does not optimise anything — it executes random actions to verify that all system components work correctly: config loading, agent creation, 4W states, IS negotiations, and order summaries.

Usage

python examples/airport_gh_check.py

Step-by-step walkthrough

  1. Loads the configuration airport_gh_example.json and creates the environment.
  2. Displays the airport grid (zones, sizes) using airport terminology (TRANSIT -> taxiways, PARK1-4 -> stands, TERMINAL -> boarding gates).
  3. Lists the ReadPoints and BusinessSteps (Equation 11 of the thesis) with the 20 resource states and ~40 flight states.
  4. Prints the initial state of all agents — type, observation shape, action space size:
    • ProductAgent (flights: B737, A320, LCC, CARGO)
    • RobotAgent (pushbacks, stairs, fuel trucks, cargo loaders, pax buses)
    • WorkerAgent (ramp agents, supervisors, crew chiefs)
    • ConveyorAgent (baggage belts gate to terminal)
  5. Displays the initial airport state using real-world terminology:
    • Active flights with type, position, and BDI state
    • GH equipment with position and assigned zone
    • Ramp personnel with position and role
    • Active baggage belts
    • Scheduled flights with deadline and status
  6. IS Platform status — global policy, negotiations, approval rate.
  7. Simulates 10 steps with random actions (env.action_space(agent).sample()), showing rewards and IS negotiations when they occur.
  8. Final summary — energy consumed, flight-by-flight status, IS platform status.

Expected output (excerpt)

================================================================
  MAS-DUO -- Airport Ground Handling
  Airport: Ciudad Real Central Airport -- Ground Handling MAS-DUO
  GH Policy: R = 0.5*Delay + 0.4*Cost + 0.0*QoS + 0.1*Energy
  (Equation 12 -- Ciudad Real Central Airport)
================================================================

  Max steps    : 120 steps (120 minutes of operations)
  Step duration: 60 s/step (= 1 min/step)
  Scheduled flt: 4
  Total agents : 21

--------------------------------
  SYSTEM STATES (ReadPoint x BusinessStep -- Equation 11)

  GH resource ReadPoints:
    RP[HANGAR    ] -> 'GH Equipment Hangar'     (0,0) 4x15
    RP[PARK1     ] -> 'Gate 1 -- Stand'         (4,2) 4x5
    ...

  -> States per resource = 5 RP x 4 BS = 20 states (Equation 11)

  Flight BusinessSteps (IATA, Section 4.1.2):
    1. OMS  -- Organisation & Management System
    2. STM  -- Station Management System
    ...

--------------------------------
  FLIGHTS IN OPERATIONS  (Physical BDI Product Agents)

  FLT  urn:epc:id:sgtin:0614150.B737.0000001
       Type     : Boeing 737-800
       Position : (2, 1)  ->  Taxiways / Airside
       Step     : 0
       State    : CREATED

  ...

--------------------------------
  IS PLATFORM  (ERP-SITA / CRM / Expert System)

  Global policy (Equation 12 of the thesis):
    R(s,s') = 0.5*Delay + 0.4*Cost + 0.0*QoS + 0.1*Energy
      -> C=0.0 (no airline preference -- 'Common Use Model')

8.2 airport_gh_demo.py — Greedy Policy Demo

File: examples/airport_gh_demo.py
Config: config/airport_gh_demo.json (8 flights)

Advanced demo with a Greedy EDF (Earliest Deadline First) policy and pygame rendering at human-readable speed. Displays the airport state in real time with a fully annotated side panel.

Usage

# Pygame demo at 3 FPS (human-readable speed)
python examples/airport_gh_demo.py

# Headless (console only, faster)
python examples/airport_gh_demo.py --headless

# Custom FPS, seed, and log frequency
python examples/airport_gh_demo.py --fps 2 --seed 100 --verbose-every 5

# Alternative config
python examples/airport_gh_demo.py --config config/airport_gh_example.json --headless

Arguments

| Argument        | Default                     | Description                             |
|-----------------|-----------------------------|-----------------------------------------|
| --config        | config/airport_gh_demo.json | Path to configuration JSON              |
| --headless      | False                       | Run without pygame window               |
| --seed          | 42                          | Seed for reproducibility                |
| --fps           | 3                           | Frames per second for pygame renderer   |
| --verbose-every | 10                          | Print console summary every N steps     |

Scenario — 8 concurrent flights

config/airport_gh_demo.json sets up a scenario with:

Flights (8 orders):

| ID       | Type         | Deadline | Priority |
|----------|--------------|----------|----------|
| VY-1234  | B737-800     | 30 min   | High     |
| IB-4567  | A320         | 35 min   | High     |
| FR-8901  | Generic LCC  | 25 min   | Normal   |
| UX-2345  | Cargo flight | 60 min   | Low      |
| VY-5678  | B737-800     | 40 min   | High     |
| IB-9012  | A320         | 45 min   | High     |
| W6-3456  | Generic LCC  | 30 min   | Normal   |
| DHL-7890 | Cargo flight | 55 min   | Low      |

GH equipment (12 robots):

| ID               | Type             | Energy |
|------------------|------------------|--------|
| PUSHBACK-1/2/3   | Pushback tractor | 500 u  |
| STAIRS-1/2/3     | Hydraulic stairs | 300 u  |
| FUEL-TRK-1/2     | Fuel truck       | 800 u  |
| CARGO-LOADER-1/2 | ULD loader       | 400 u  |
| BUS-PAX-1/2      | Passenger bus    | 600 u  |

GH personnel (7 workers):

| ID               | Role                   |
|------------------|------------------------|
| RAMP-1 to RAMP-5 | Ramp agents            |
| SUPERVISOR-1     | Operations supervisor  |
| CREW-CHIEF-1     | Ground crew chief      |

Baggage belts (4 conveyors): BAGBELT-P1 to BAGBELT-P4 — Gate N to Terminal (3-cell length each)

Greedy EDF policy — how it works

The GreedyGHPolicy class implements Earliest Deadline First with a priority-based urgency boost:

# Base urgency: inverse of remaining time until deadline
urgency = 1.0 / max(1, deadline - current_step)

# Order priority boost
if priority == "high":   urgency *= 2.0
if priority == "low":    urgency *= 0.5

Rules per agent type:

  • ProductAgent (flight):

    • In target zone -> REQUEST_PROCESS (advance IATA step)
    • Otherwise -> REQUEST_MOVE (request transfer)
    • At end of route -> SIGNAL_READY
  • RobotAgent (GH equipment):

    • Battery < 15% -> CHARGE
    • Carrying a flight -> NAVIGATE towards destination zone, then DROP
    • Idle -> find most urgent flight (EDF), NAVIGATE + LIFT
  • WorkerAgent (personnel):

    • Energy < 20% -> REST
    • Flight in same cell -> PROCESS (in target zone) or PICK
    • Carrying a flight -> move towards destination zone (MOVE_*) and PLACE
    • Otherwise -> SCAN (RFID scan for visibility)
  • ConveyorAgent (belt):

    • Always -> RUN (maximum throughput)
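The EDF selection underlying the rules above can be reproduced in a few lines. The sketch below uses a simplified (flight_id, deadline, priority) tuple rather than the demo's actual order objects:

```python
def urgency(deadline, current_step, priority):
    # Base urgency: inverse of the time remaining until the deadline (EDF)
    u = 1.0 / max(1, deadline - current_step)
    # Priority boost, as in the demo's GreedyGHPolicy
    if priority == "high":
        u *= 2.0
    elif priority == "low":
        u *= 0.5
    return u

def most_urgent(flights, current_step):
    """Pick the flight an idle robot should serve next (EDF with boost)."""
    return max(flights, key=lambda f: urgency(f[1], current_step, f[2]))[0]
```

Note how the boost matters early on (a high-priority flight outranks a slightly tighter normal one) but EDF dominates as a deadline approaches.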

Console output

================================================================
  MAS-DUO  Ground Handling Demo -- Ciudad Real Central Airport
  Policy: Greedy EDF (Earliest Deadline First)
  Thesis: Pablo Garcia Ansola (2024), Ch. 4.1 -- Equation 12
  Reward = 0.5*Delay + 0.4*Cost + 0.0*QoS + 0.1*Energy
================================================================

  Active agents: 31
  Flights (orders): 8
  Max steps: 200

  -- Step   0 | T+00:00 | Score: {'total': 8, 'done': 0, ...} --
    VY-1234      [H] 0/1            T-30
    IB-4567      [H] 0/1            T-35
    FR-8901      [N] 0/1            T-25
    UX-2345      [L] 0/1            T-60
    ...

  -- Step  20 | T+20:00 | Score: {'total': 8, 'done': 3, ...} --
    VY-1234      [H] 1/1 ##########  OK
    IB-4567      [H] 1/1 ##########  OK
    FR-8901      [N] 0/1            T-05
    ...

Final report

================================================================
  FINAL REPORT -- MAS-DUO Ground Handling Demo
================================================================
  Steps executed   : 30
  Cumulative reward: +287.07
  Total energy     : 831.00 units

  Completed flights   : 4/8
  On-time (deadline)  : 4/8
  Delayed             : 1/8

  Per-flight detail:
  OK  VY-1234    B737     complete   30    high
  OK  IB-4567    A320     complete   35    high
  !!  FR-8901    LCC      failed     25    normal
  ~   UX-2345    CARGO    pending    60    low
  ...

  Global Policy (Equation 12 -- MAS-DUO):
    A (Delay)  = 0.50
    B (Cost)   = 0.40
    C (QoS)    = 0.00  <- 0.0 = Common Use (no airline preference)
    D (Energy) = 0.10
================================================================

Exit code: 0 if all flights complete on time, 1 if any delay is detected.


9. Configuration Files

| File                           | Flights | Grid  | Resources                     | Use                        |
|--------------------------------|---------|-------|-------------------------------|----------------------------|
| config/factory_example.json    | n/a     | 20x15 | Generic products              | Factory validation         |
| config/airport_gh_example.json | 4       | 20x15 | 8 robots, 5 workers, 3 belts  | Airport validation (check) |
| config/airport_gh_demo.json    | 8       | 20x15 | 12 robots, 7 workers, 4 belts | Full demo with renderer    |

10. Module Reference

logistics_env.logistics_maenv

Main class LogisticsMaEnv(AECEnv):

| Method / Property                           | Description                                         |
|---------------------------------------------|-----------------------------------------------------|
| reset(seed)                                 | Initialises the environment, creates all agents     |
| step(action)                                | Executes the action for the current agent           |
| observe(agent)                              | Returns the agent's observation                     |
| last()                                      | (obs, reward, term, trunc, info) for current agent  |
| state_snapshot                              | Complete dict of all agents' 4W state               |
| render()                                    | Draws a pygame frame if render_mode="human"         |
| close()                                     | Shuts down the renderer                             |
| _step_count                                 | Current simulation step                             |
| _total_energy                               | Cumulative total energy consumed                    |
| factory_cfg                                 | FactoryConfig object (loaded from JSON)             |
| _products / _workers / _robots / _conveyors | Agent dictionaries                                  |
| _order_mgr                                  | OrderManager with all order states                  |
| _is_platform                                | Active ISPlatform                                   |
| _global_policy                              | Active GlobalPolicy                                 |

logistics_env.config_loader

from logistics_env.config_loader import load_factory_config, FactoryConfig

cfg = load_factory_config("config/airport_gh_demo.json")
cfg.name           # str
cfg.grid           # GridConfig(width, height, cell_size_meters)
cfg.zones          # List[ZoneConfig]
cfg.robots         # List[RobotConfig]
cfg.workers        # List[WorkerConfig]
cfg.conveyor_belts # List[ConveyorConfig]
cfg.product_types  # List[ProductTypeConfig]
cfg.orders         # List[OrderConfig]
cfg.reward_weights # RewardWeights
cfg.sim_params     # SimParams(max_steps, step_duration_seconds)
cfg.global_policy  # GlobalPolicyConfig(mode, A, B, C, D, scheduled_changes)
cfg.is_platform    # ISPlatformConfig

# Helpers
cfg.zone_by_id("PARK1")         # -> ZoneConfig or None
cfg.product_type_by_ref("B737") # -> ProductTypeConfig or None

logistics_env.is_platform

from logistics_env.is_platform import (
    ISPlatform,
    GlobalPolicy, PolicyMode, PolicyParameters,
    NegotiationProposal, NegotiationResult,
)

logistics_env.grid_world

from logistics_env.grid_world import GridWorld

grid = GridWorld(factory_cfg)
grid.zone_of(pos)          # -> ZoneConfig or None
grid.zone_center("PARK1")  # -> Position(x, y) or None
grid.astar(start, goal, agent_id)  # -> List[Position] or None
grid.is_occupied(x, y)     # -> bool
grid.conveyor_at(pos)      # -> str (conveyor_id) or None
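GridWorld.astar is listed but not shown. A minimal 4-connected grid A* looks roughly like the sketch below (illustrative only: the real method also tracks per-agent occupancy, which this version omits):

```python
import heapq

def astar(start, goal, width, height, blocked=frozenset()):
    """Shortest path between (x, y) cells on a grid; returns None if unreachable."""
    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start, None)]  # (f, g, node, parent)
    came_from = {}
    g_cost = {start: 0}
    while open_heap:
        _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:      # already expanded with a better cost
            continue
        came_from[cur] = parent
        if cur == goal:           # reconstruct the path back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked):
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, cur))
    return None
```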

Quick Reference — Commands

# Activate virtual environment
source .venv/bin/activate

# Validate airport scenario (4 flights, console only)
python examples/airport_gh_check.py

# Full demo with pygame renderer (8 flights, 3 FPS)
python examples/airport_gh_demo.py --fps 3

# Headless demo (8 flights, console only)
python examples/airport_gh_demo.py --headless --seed 42

# Custom demo
python examples/airport_gh_demo.py \
    --config config/airport_gh_demo.json \
    --fps 2 \
    --seed 100 \
    --verbose-every 5

# Validate generic factory environment
python examples/env_check.py

Licence and Authorship

Implementation based on the MAS-DUO architecture described in the doctoral thesis of Pablo Garcia Ansola (UCLM, 2024). The code is a reference implementation of the system described in Chapters 3 (BDI/MDP/IS architecture) and 4.1 (airport use case).
