
UAV Logistics Environment for MARL

This is a UAV logistics environment with a continuous observation space and a discrete action space, featuring physics-based UAVs and parcels powered by the Unity Engine. It was used in the papers "Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control" (MDPI Energies 2022, 15(19), 7426 (SCIE), 2022.10.10) and "Multi-agent Reinforcement Learning-Based UAS Control for Logistics Environments" (Springer LNEE, volume 913 (SCOPUS), 2022.09.30).

📢 Upgrading Environment and Transitioning to Isaac Sim

The Unity ML-Agents, PyTorch, and CUDA versions in this LogisticsEnv are very old and incompatible with modern GPUs and operating systems, so I am in the process of upgrading the dependencies and the environment. I am also in the process of transitioning to the Isaac Sim environment.

📌 LogisticsEnv Builds Release (1.0.0)

(2024. 3. 11.)


📌 Trained Model


from algorithms.attention_sac import AttentionSAC  # import path assumes the MAAC code layout

model_path = '~~/<trained_model_name>.pt'  # write your model path here
model = AttentionSAC.init_from_save(model_path)  # load model data from the saved file

Requirements


My Environments

  • Ubuntu 20.04 LTS / Python 3.8.10 / Anaconda 3.1 (Virtual Environment)
  • NVIDIA GeForce RTX 3070 / Intel(R) Core(TM) i5-10400F (@2.90GHz) / 32GB Memory (RAM)
  • CUDA 11.1 / CuDNN 8.1.1
  • torch 1.8.2+cu111 / torchaudio 0.8.2 / torchvision 0.9.2+cu111
  • Training the MAAC model for 150K episodes took 13.9 days.

Unity Editor

  • Unity Editor, version 2020.3.x (LTS) (minimum)
  • Unity ML-Agents Package Release 2.1.0 exp1 (Not compatible with other versions)

Getting Started

  • Install the required packages (in a virtual environment).
  • git clone https://github.com/dmslab-konkuk/LogisticsEnv.git
  • cd MAAC or cd MADDPG
  • Edit the learning parameters in main.py.
  • Depending on your OS, select the built environment file, Build_Windows or Build_Linux, and give the right path to it.
  • If your OS is Linux (Ubuntu), you must grant execution permission before training: sudo chmod a+xwr /Build_Linux/Logistics.x86_64
  • Run python main.py to start training (a consolidated Linux example follows this list).
  • To replay with your trained model, set the path to the model .pt file in replay.py and run python replay.py.
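
Putting the Linux steps together (a sketch; it assumes Build_Linux sits at the repository root, so adjust the paths to match your checkout):

git clone https://github.com/dmslab-konkuk/LogisticsEnv.git
cd LogisticsEnv
sudo chmod a+xwr ./Build_Linux/Logistics.x86_64   # grant execution permission
cd MAAC                                           # or cd MADDPG
python main.py                                    # start training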

Tensorboard

  • Logs are written under MAAC/models/Logistics/MAAC/ or MADDPG/models/Logistics/MADDPG/
  • tensorboard --logdir=runX
  • Open localhost:6006 in a browser.

Parcel Counter

  • MAAC/CSV/countXXXX.csv : the number of successfully shipped parcels is written to this CSV file (XXXX is the yyyyMMddHHmmss timestamp of the training start time).
  • The first row is the number of small boxes, the second the number of big boxes, and the third the sum of both (see the reading sketch below).
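
A minimal sketch for reading the counter file, assuming three rows with one value each as described above (substitute the real timestamp for XXXX):

import csv

# Rows: small boxes, big boxes, total (assumed layout from the list above).
with open("MAAC/CSV/countXXXX.csv") as f:  # replace XXXX with the actual timestamp
    small, big, total = (int(row[0]) for row in csv.reader(f))
print(small, big, total)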

Timer

  • MAAC/CSV/timerXXXX.csv : the time spent to finish shipping the given boxes (the finishing condition follows the max_smallbox and max_bigbox parameters).
  • If the UAVs fail to ship all of the given parcels, no time line is appended.
  • Times are written in milliseconds (1,000 ms is one second).

Scenario

  • UAVs need to move parcels (boxes) from hubs to destinations.
  • There are two types of boxes, big and small. A big box can only be moved if two or more UAVs cooperate.
  • The size of the map and the numbers of UAVs and obstacles can be customized to build various environments.
  • When a UAV touches a box, the box is attached to it. If the UAV moves farther than a certain distance from the box, the box falls off.

Used Algorithm


Python API

Gym Functions

This Logistics Environment follows the OpenAI Gym API design:

  • from UnityGymWrapper5 import GymEnv - imports the wrapper class (the newest version is Wrapper5).
  • env = GymEnv(name="path to Unity Environment", ...) - returns the wrapped environment object.
  • obs = env.reset() - resets the environment to its initial state and returns the initial observation.
  • obs, reward, done, info = env.step(actions) - advances a single step; takes actions and returns the observation, reward, done flag, and info list.

Example

from UnityGymWrapper5 import GymEnv # Unity Gym-style wrapper
env = GymEnv(name="../Build_Linux/Logistics") # load the Logistics environment
done, obs = False, env.reset() # reset the environment

while not done:
    actions = get_actions(obs) # query your policy (a random-action sketch follows)
    next_obs, reward, done, info = env.step(actions) # advance one step
    obs = next_obs
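
Here get_actions stands in for your policy. For a quick smoke test it could sample uniformly at random (a sketch, assuming obs holds one entry per agent and the 7-action space described under Actions below):

import random

def get_actions(obs):
    # One discrete action index (0..6) per UAV.
    return [random.randrange(7) for _ in obs]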

Unity Gym Wrapper

This wrapper can wrap a Unity ML-Agents environment (API version 2.1.0 exp1, mlagents version 0.27.0) that contains multiple discrete-action agents.

The GymWrapper provided by Unity supports only single-agent environments; UnityGymWrapper5.py is available in this GitHub repository.

Parameter Configurations

env = GymEnv(name='', width=0, height=0, ...)

  • width : defines the width of the display (must be set alongside height).
  • height : defines the height of the display (must be set alongside width).
  • timescale : defines the multiplier for the delta time in the simulation. If set to a higher value, time passes faster in the simulation, but the physics may behave unpredictably.
  • quality_level : defines the quality level of the simulation.
  • target_frame_rate : instructs the simulation to try to render at a specified frame rate.
  • capture_frame_rate : instructs the simulation to consider the time between updates to always be constant, regardless of the actual frame rate.
  • name : path to the built Unity environment (e.g. ../Build_Linux/Logistics).
  • mapsize : size of the map in the virtual environment (x by x).
  • numbuilding : number of buildings (obstacles).
  • max_smallbox : maximum number of small boxes to generate.
  • max_bigbox : maximum number of big boxes to generate (a configured example follows this list).
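
As an illustration, a fully configured environment might be created like this (all parameter values here are hypothetical, not recommendations):

from UnityGymWrapper5 import GymEnv

env = GymEnv(
    name="../Build_Linux/Logistics",  # path to the built environment
    mapsize=20,                       # 20 x 20 map (hypothetical value)
    numbuilding=5,                    # five building obstacles (hypothetical value)
    max_smallbox=10,                  # up to ten small boxes (hypothetical value)
    max_bigbox=5,                     # up to five big boxes (hypothetical value)
    timescale=20,                     # speed up simulated time (hypothetical value)
)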

Observation

Observation size for each agent

29 + 7 x (nagents - 1) + 27 (ray-cast observation)

This UAV's Information

  • 3 : (x, y, z) coordinates of this UAV
  • 3 : (x, y, z) velocity of this UAV
  • 3 : one-hot encoding of the held box type (not holding, small box, big box)
  • 7 x (n - 1) : other UAVs' information (3 : coordinates, 1 : distance, 3 : box-type one-hot encoding)
  • 6 : (x, y, z, x, y, z) big-box hub and small-box hub coordinates
  • 2 : distances from this UAV to the big-box hub and the small-box hub
  • 6 : (x, y, z, x, y, z) coordinates of the nearest big box and the nearest small box (zero if there is no box nearby)
  • 2 : distances from this UAV to the nearest big box and the nearest small box (zero if there is no box nearby)
  • 4 : (x, y, z, d) coordinates of and distance to the destination if this UAV holds a box; zero otherwise

Raycast Observation (from Unity ML-Agents)

  • 1 : distance (0 if nothing is detected)
  • 2 : one-hot encoding of the detected object (nothing, building)
  • (1 + 2) x 9 : nine rays, each with the three values above
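
A quick sanity check of the observation-size formula above (obs_size is a hypothetical helper, not part of the repository):

def obs_size(n_agents: int) -> int:
    # 29 own-state features + 7 per other UAV + 27 ray-cast values
    return 29 + 7 * (n_agents - 1) + 27

assert obs_size(1) == 56
assert obs_size(3) == 70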

Actions

A UAV can move in six directions (up, down, forward, backward, left, right) or stay in place.

The action space is discrete, and the size of the action set is 7 (a one-step usage sketch follows the index list below).

  • index 0 : not move
  • index 1 : forward
  • index 2 : backward
  • index 3 : left
  • index 4 : right
  • index 5 : up
  • index 6 : down
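
For example, commanding every UAV to ascend for one step might look like this (a sketch; env and n_agents are assumed to be defined as in the examples above):

UP = 5  # action index for "up" from the list above
actions = [UP] * n_agents  # one action per UAV
obs, reward, done, info = env.step(actions)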

Reward

Driving Reward

(previous distance - current distance) * 0.5

To make the UAVs learn to drive toward their destinations, this distance-based reward is given at every step. If a UAV holds a parcel, the distance is measured to the destination where the parcel has to be shipped. If a UAV still has to pick up a parcel, the distance is measured to whichever of the nearest big box and nearest small box is closer to the UAV. A sketch of this shaping term follows.
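
A minimal sketch of the shaping computation (driving_reward is a hypothetical helper for illustration):

def driving_reward(prev_dist: float, curr_dist: float) -> float:
    # Positive when the UAV moved closer to its current target this step.
    return (prev_dist - curr_dist) * 0.5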

Shipping Reward

  • +20.0 : pick up a small box
  • +20.0 : complete a small-box shipment
  • +10.0 : first UAV picks up a big box
  • +10.0 : given to the first UAV holding a big box when the second UAV picks it up
  • +20.0 : second UAV picks up a big box
  • +30.0 : complete a big-box shipment
  • -8.0 : the first UAV drops a big box
  • -15.0 : two UAVs drop a big box

These values are designed to make the UAVs work efficiently.

Collision Penalty

  • -10.0 : a UAV collides with another UAV or a building

UAVs have to use the ray-cast observation to avoid buildings and other UAVs.


Training Result

We trained a random-decision model, reinforcement learning models (SAC, DQN, MADDPG), and the MAAC (Multi-Actor-Attention-Critic) model, training each for 30K episodes.


Credit

Developed by Hoeun Lee (DMS Lab, Dept. of Computer Science and Engineering, Konkuk University, Seoul, Korea).

Copyright Hoeun Lee, 2021. All Rights Reserved.