RollerBall

This project was made with reference to the ML-Agents RollerBall example for the following purposes.

  1. To measure the learning time of ML-Agents
  2. To serve as a sample (practice) of the learning methods possible with ML-Agents

Prerequisites

| Environment | Version |
| --- | --- |
| Windows | 10 |
| Python | 3.7.9 |
| TensorFlow | 2.3.0 |
| Unity | 2019.4.10f1 |
| ML-Agents | Release 6 |

Single or Multi and other options

Scenes

| Scene name | Agents | Sample yaml |
| --- | --- | --- |
| SingleScene | 1 | RollerBall.yaml |
| MultiScene | 4 | RollerBall.yaml |
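
For reference, a trainer configuration for these scenes might look like the minimal sketch below, written in the ML-Agents Release 6 `behaviors:` format. The behavior name `RollerBall` and the hyperparameter values are illustrative assumptions, not necessarily the exact contents of this repository's RollerBall.yaml.

```yaml
behaviors:
  RollerBall:                     # assumed Behavior Name set on the agent's Behavior Parameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```

The same yaml can serve both SingleScene and MultiScene: agents that share the same Behavior Name all feed experience to the same trainer, which is why one sample yaml is listed for both scenes.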

Learning time measurement results

The time is measured until the Mean Reward reaches 1.000. Each measurement was performed only once, so differences of a few seconds are within the margin of error.

| UnityEditor/Build | CPU/GPU | Single/Multi | Time (s) | CPU usage (%) | Remarks |
| --- | --- | --- | --- | --- | --- |
| UnityEditor | CPU | Single | 68 | 25 | |
| UnityEditor | GPU | Single | 71 | 25 | GPU set on the Behavior |
| Build | CPU | Single | 86 | 13 | *1 |
| Multi run | CPU | Multi (exe) | 55 | 35 | --num-envs=4 |
| Multi agents | CPU | Multi (agents) | 21 | 25 | four agents |

*1 The build version is slower than the UnityEditor run because the Python process and the built app compete for the CPU.

Trainer types, actions and observations

Conditions

| Conditions | Scene name | Trainer type | Action | Observation | Sample yaml |
| --- | --- | --- | --- | --- | --- |
| 1 | SingleScene | PPO | Continuous | Vector | RollerBall.yaml |
| 2 | SingleScene | SAC | Continuous | Vector | SacEx.yaml |
| 3 | DiscreteScene | PPO | Discrete | Vector | DiscreteScene.yaml |
| 4 | VisualObservation | PPO | Continuous | Visual | VisualObservation.yaml |
| 5 | RaycastObservation | PPO | Discrete | Raycast | RaycastObservation.yaml |
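
Condition 2 presumably differs from condition 1 mainly in the trainer section of the yaml. Below is a minimal sketch of a SAC configuration in the same Release 6 format; the behavior name and hyperparameter values are assumptions for illustration, not the actual SacEx.yaml. Conditions 3-5 are set up mostly on the Unity side (discrete action branches, camera or ray perception sensors on the agent), so their yaml files are presumably variations of the same structure.

```yaml
behaviors:
  RollerBall:                     # assumed Behavior Name, as in the PPO sketch above
    trainer_type: sac
    hyperparameters:
      batch_size: 128
      buffer_size: 50000
      learning_rate: 0.0003
      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 10.0
      init_entcoef: 0.5
      save_replay_buffer: false
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```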

Measurement results

The time and number of steps are measured until the Mean Reward reaches 1.000. Each measurement was performed only once, so differences of a few seconds are within the margin of error.

| Algorithm | Time (s) | Steps (k) | Remarks |
| --- | --- | --- | --- |
| PPO | 68 | 13 | |
| SAC | 191 | 121 | |
| PPO | 159 | 31 | |
| SAC | 1177 | 138 | |
| PPO | 1318 | 236 | *1 |

*1 It took too long to learn, so I stopped halfway through.

Curiosity, no curiosity, or imitation

Conditions

You can choose whether the agent is curious or not by switching yaml files; a sketch of the yaml difference follows the table below. You can also use the demo file I made. If you'd like to make a new demo file, see the explanation below.

| Conditions | Scene name | Curiosity | Sample yaml | Demo file |
| --- | --- | --- | --- | --- |
| 1 | Curiosity | Yes | CuriosityEx.yaml | Not needed |
| 2 | Curiosity | No | NoCuriosityEx.yaml | Not needed |
| 3 | Imitation | Yes | Immitation.yaml | ImitationEx.demo |
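
The difference between the curiosity and no-curiosity yaml files is presumably in the `reward_signals` section. A minimal sketch of enabling the curiosity signal in the Release 6 format is shown below; the strength and other values are assumptions, not the actual CuriosityEx.yaml.

```yaml
behaviors:
  RollerBall:                     # assumed Behavior Name
    trainer_type: ppo
    # ... hyperparameters and network_settings as in the PPO sketch above ...
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:                  # remove this block for the no-curiosity configuration
        gamma: 0.99
        strength: 0.02            # assumed weight of the intrinsic reward
        encoding_size: 256
        learning_rate: 0.0003
```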

About the learning task this time

The agent is rewarded for completing one counterclockwise lap. If it learns well, it becomes able to go around at high speed.

Demo file for imitation

The demo was made by driving the agent counterclockwise for about 5 laps with my own manual operation.
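
With a demo file recorded this way (typically via the Demonstration Recorder component in Unity), imitation can be enabled in the trainer yaml through the GAIL reward signal and, optionally, behavioral cloning. A minimal sketch in the Release 6 format follows; the demo path and strength values are assumptions, not the actual Immitation.yaml.

```yaml
behaviors:
  RollerBall:                     # assumed Behavior Name
    trainer_type: ppo
    # ... hyperparameters and network_settings as in the PPO sketch above ...
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:                       # learn a reward from the recorded demonstration
        gamma: 0.99
        strength: 0.5             # assumed weight
        demo_path: Demos/ImitationEx.demo   # assumed location of the demo file
    behavioral_cloning:           # optional: also pre-train directly on the demonstration
      demo_path: Demos/ImitationEx.demo
      strength: 0.5
      steps: 150000
```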

Results

Curiosity

Orange: curiosity, Blue: no curiosity

The curious agent can learn quickly. The no-curiosity agent could not learn within this time, but given a long time it can learn to go around by chance.

This time it is a simple environment where the agent just goes around, so you can see that from a certain point the curiosity reward decreases.

Imitation

Light blue: with imitation, Blue: without imitation

You can see that learning is faster with imitation.

You can see that both the Extrinsic and Curiosity rewards rise faster and reach higher values with imitation.
