This project was created with reference to RollerBall for the following purposes.
- To measure the training time of ML-Agents
- To provide samples (practice) of the training methods available in ML-Agents
Environment | Version |
---|---|
Windows | 10 |
Python | 3.7.9 |
TensorFlow | 2.3.0 |
Unity | 2019.4.10f1 |
ML-Agents | Release6 |
Scene name | Agent | Sample yaml |
---|---|---|
SingleScene | 1 | RollerBall.yaml |
MultiScene | 4 | RollerBall.yaml |
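As a reference, training for each scene can be started with the standard ML-Agents CLI; the run ID below is just an example name.

```shell
# Start training with the sample configuration
# (open the scene in the Unity Editor and press Play when prompted).
mlagents-learn RollerBall.yaml --run-id=RollerBall-single

# Monitor Mean Reward during training with TensorBoard.
tensorboard --logdir results
```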
The time is measured until the Mean Reward reaches 1.000. Since each measurement was performed only once, differences of a few seconds are within the margin of error.
UnityEditor/Build | CPU/GPU | Single/Multi | Time(s) | CPU Usage(%) | Remarks |
---|---|---|---|---|---|
UnityEditor | CPU | Single | 68 | 25 | |
UnityEditor | GPU | Single | 71 | 25 | GPU selected in Behavior Parameters |
Build | CPU | Single | 86 | 13 | *1 |
Multi run | CPU | Multi exe | 55 | 35 | --num-envs=4 |
Multi agents | CPU | Multi agents | 21 | 25 | four agents |
*1 The build version is slower than UnityEditor because Python and the built app compete for the CPU.
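The multi-run measurement above uses ML-Agents' built-in environment parallelism. A sketch of the command, assuming the built player is at `Build/RollerBall` (the path and run ID are examples):

```shell
# Run 4 copies of the built environment in parallel.
mlagents-learn RollerBall.yaml --env=Build/RollerBall --num-envs=4 --run-id=RollerBall-multi
```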
Conditions | Scene name | Trainer type | Action | Observation | Sample yaml |
---|---|---|---|---|---|
1 | SingleScene | PPO | Continuous | Vector | RollerBall.yaml |
2 | SingleScene | SAC | Continuous | Vector | SacEx.yaml |
3 | DiscreteScene | PPO | Discrete | Vector | DiscreteScene.yaml |
4 | VisualObservation | PPO | Continuous | Visual | VisualObservation.yaml |
5 | RaycastObservation | PPO | Discrete | Raycast | RaycastObservation.yaml |
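The yaml files above follow the ML-Agents Release 6 trainer-configuration format. A minimal sketch of what a PPO configuration such as RollerBall.yaml might contain (the behavior name and all hyperparameter values here are illustrative assumptions, not the actual file contents):

```yaml
behaviors:
  RollerBall:               # must match the Behavior Name on the agent
    trainer_type: ppo       # change to "sac" for the SAC conditions
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```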
The time and number of steps are measured until the Mean Reward reaches 1.000. Since each measurement was performed only once, differences of a few seconds are within the margin of error.
Algorithm | Time(s) | Steps(k) | Remarks |
---|---|---|---|
PPO | 68 | 13 | |
SAC | 191 | 121 | |
PPO | 159 | 31 | |
SAC | 1177 | 138 | |
PPO | 1318 | 236 | *1 |
*1 Training took too long, so it was stopped partway through.
You can enable or disable curiosity by switching yaml files. You can also use the demo file I made. If you'd like to create a new demo file, see the explanation below.
Conditions | Scene name | Curiosity | Sample yaml | Demo file |
---|---|---|---|---|
1 | Curiosity | Yes | CuriosityEx.yaml | No need |
2 | Curiosity | No | NoCuriosityEx.yaml | No need |
3 | Imitation | Yes | Immitation.yaml | ImitationEx.demo |
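In the Release 6 configuration format, curiosity is enabled by adding a `curiosity` entry under `reward_signals`. A sketch with illustrative values (not the actual contents of CuriosityEx.yaml):

```yaml
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
  curiosity:                # remove this block to disable curiosity
    gamma: 0.99
    strength: 0.02
    encoding_size: 256
    learning_rate: 3.0e-4
```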
The agent is rewarded for completing one counterclockwise lap. If training goes well, the agent will be able to go around the course at high speed.
The demo file was created by manually driving the agent counterclockwise for about 5 laps.
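For the imitation condition, ML-Agents Release 6 supports GAIL and behavioral cloning, both of which point at a recorded .demo file. A sketch assuming an example demo path (the path and strength values are illustrative):

```yaml
reward_signals:
  gail:
    strength: 0.01
    demo_path: Demos/ImitationEx.demo
behavioral_cloning:
  demo_path: Demos/ImitationEx.demo
  strength: 0.5
```

A new demo file can be recorded by adding a Demonstration Recorder component to the agent and operating it manually in Play mode.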
Orange: curiosity, Blue: no curiosity
The agent with curiosity learns quickly. The agent without curiosity cannot learn within this time, although it may eventually learn to go around by chance over a long period.
Since this is a simple environment where the agent just goes around, you can see the curiosity reward decreasing after a certain point.
Light blue: with imitation, Blue: no imitation
You can see that learning is faster with imitation.
You can see that both the Extrinsic and Curiosity rewards rise faster and reach higher values with imitation.