# Unity ML-Agents Toolkit
## Environment Basics
This notebook walks through the basic functions of the Python API for the Unity ML-Agents Toolkit. For instructions on building a Unity environment, see the [Getting Started with Balance Ball guide](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md).

### 1. Set environment parameters

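Set `env_name` to the name of the Unity environment file you want to launch, and make sure the environment build is in `../envs`. Setting `env_name = None` instead connects to a scene running in the Unity Editor, which is how the run recorded in this notebook was produced.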

In [1]:
# env_name = "../envs/3DBall"  # Name of the Unity environment binary to launch
env_name = None  # None: connect to the Unity Editor instead of a built binary
train_mode = True  # Whether to run the environment in training or inference mode

### 2. Load dependencies

The following loads the necessary dependencies and checks the Python version (at runtime). ML-Agents Toolkit (v0.3 onwards) requires Python 3.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import sys

from mlagents.envs import UnityEnvironment

%matplotlib inline

print("Python version:")
print(sys.version)

# check Python version
if sys.version_info[0] < 3:
    raise Exception("ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3")

Python version:
3.6.7 |Anaconda, Inc.| (default, Oct 28 2018, 19:44:12) [MSC v.1915 64 bit (AMD64)]
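
The version check above can also be written as a tuple comparison, which extends naturally to a minor-version requirement. This is a minimal sketch; the helper name `python_version_ok` and its `minimum` parameter are illustrative and not part of the ML-Agents API.

```python
import sys


def python_version_ok(version_info=sys.version_info, minimum=(3,)):
    """Return True if `version_info` is at least `minimum` (both tuple-like)."""
    # Compare only as many components as `minimum` specifies.
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


if not python_version_ok():
    raise RuntimeError("ML-Agents Toolkit (v0.3 onwards) requires Python 3")
```

Calling `python_version_ok(minimum=(3, 6))` would additionally require Python 3.6 or newer.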


### 3. Start the environment
`UnityEnvironment` launches and begins communication with the environment when instantiated.

Environments contain _brains_ which are responsible for deciding the actions of their associated _agents_. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
env = UnityEnvironment(file_name=env_name)

# Set the default brain to work with
default_brain = env.brain_names[0]
brain = env.brains[default_brain]


INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'LanderAcademy' started successfully!
Unity Academy name: LanderAcademy
        Number of Brains: 1
        Number of External Brains : 1
        Reset Parameters :
		
Unity brain name: LanderBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 5
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): [4]
        Vector Action descriptions: 


### 4. Examine the observation and state spaces
Resetting the environment returns an initial set of observations and states for all the agents within it. In ML-Agents, _states_ refer to a vector of variables corresponding to relevant aspects of the environment for an agent; the cell below reads them from `env_info.vector_observations`. Likewise, _observations_ refer to a set of relevant pixel-wise visuals for an agent, read from `env_info.visual_observations`.

In [4]:
# Reset the environment
env_info = env.reset(train_mode=train_mode)[default_brain]

# Examine the state space for the default brain
print("Agent state looks like: \n{}".format(env_info.vector_observations[0]))

# Examine the observation space for the default brain
for observation in env_info.visual_observations:
    print("Agent observations look like:")
    if observation.shape[3] == 3:
        plt.imshow(observation[0,:,:,:])
    else:
        plt.imshow(observation[0,:,:,0])

Agent state looks like: 
[0. 0. 0. 0. 0.]
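
The brain in this run reports 0 visual observations, so the plotting loop above never executes. To show what its channel check does, here is a minimal sketch on synthetic arrays. It assumes visual observations are shaped `(agents, height, width, channels)`, as the indexing in the cell implies; the helper name `to_image` and the 84x84 size are made up for illustration.

```python
import numpy as np


def to_image(observation):
    """Return the first agent's frame in a shape that plt.imshow accepts."""
    if observation.shape[3] == 3:
        return observation[0, :, :, :]  # colour frame: (height, width, 3)
    return observation[0, :, :, 0]      # single channel: (height, width)


# Synthetic stand-ins for one agent's camera (sizes are arbitrary).
rgb = np.zeros((1, 84, 84, 3))
gray = np.zeros((1, 84, 84, 1))
print(to_image(rgb).shape, to_image(gray).shape)  # (84, 84, 3) (84, 84)
```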


### 5. Take random actions in the environment
Once the environment has been reset, we can step it forward and provide actions to all of the agents within it. Here we simply choose random actions based on the `vector_action_space_type` of the default brain.

When this cell is executed, one message is printed per episode (1000 in total, since the loop runs `range(1000)`), reporting the reward the first agent accumulated during that episode. The Unity environment will then pause, waiting for further signals telling it what to do next. Thus, not seeing any animation is expected when running this cell.
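
The random-action choice described above can be isolated into a small helper that needs no running environment. This is a sketch under the assumption that the action size is a list with one entry per discrete branch (for example `[4]`, as in the brain summary printed earlier) and that its first entry gives the continuous action dimension; the name `random_actions` is illustrative.

```python
import numpy as np


def random_actions(num_agents, action_size, action_type):
    """Sample one random action row per agent."""
    if action_type == "continuous":
        # Standard-normal values, one column per continuous action dimension.
        return np.random.randn(num_agents, action_size[0])
    # Discrete: one integer per branch, drawn from [0, branch_size).
    columns = [np.random.randint(0, n, size=num_agents) for n in action_size]
    return np.column_stack(columns)


actions = random_actions(num_agents=1, action_size=[4], action_type="discrete")
print(actions.shape)  # (1, 1)
```

The result has one row per agent, which is the layout passed to `env.step` in the cell below.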

In [5]:
for episode in range(1000):
    env_info = env.reset(train_mode=train_mode)[default_brain]
    done = False
    episode_rewards = 0
    while not done:
        action_size = brain.vector_action_space_size
        if brain.vector_action_space_type == 'continuous':
            env_info = env.step(np.random.randn(len(env_info.agents), 
                                                action_size[0]))[default_brain]
        else:
            # One random integer per discrete branch, for every agent
            action = np.column_stack([
                np.random.randint(0, action_size[i], size=len(env_info.agents))
                for i in range(len(action_size))
            ])
            env_info = env.step(action)[default_brain]
        # Track reward and termination for the first agent only
        episode_rewards += env_info.rewards[0]
        done = env_info.local_done[0]
    print("Total reward this episode: {}".format(episode_rewards))

Total reward this episode: 36.409999802708626
Total reward this episode: 20.193999890238047
Total reward this episode: 24.376999985426664
Total reward this episode: 21.371999867260456
Total reward this episode: 13.9530000500381
Total reward this episode: 19.279999870806932
Total reward this episode: 8.939999908208847
Total reward this episode: 31.02399981021881
Total reward this episode: 23.562999863177538
Total reward this episode: 17.546999875456095
Total reward this episode: 18.01999992132187
Total reward this episode: 18.43799987435341
Total reward this episode: 23.62799984961748
Total reward this episode: 23.452999848872423
Total reward this episode: 27.8779998421669
Total reward this episode: 20.364999879151583
Total reward this episode: 32.05999983847141
Total reward this episode: 12.6029998883605
Total reward this episode: 33.38999991118908
Total reward this episode: 13.972999881953001
Total reward this episode: 18.519999869167805
Total reward this episode: 17.764999885112047

Total reward this episode: 7.160999920219183
Total reward this episode: 11.519999895244837
Total reward this episode: 32.18599983304739
Total reward this episode: 28.65399982780218
Total reward this episode: 7.35499994084239
Total reward this episode: 24.26699985936284
Total reward this episode: 20.56699986383319
Total reward this episode: 19.105999905616045
Total reward this episode: 40.46999981999397
Total reward this episode: 9.006999906152487
Total reward this episode: 7.495999924838543
Total reward this episode: 34.32999984920025
Total reward this episode: 5.895999923348427
Total reward this episode: 23.25199993699789
Total reward this episode: 31.39699985831976
Total reward this episode: 11.438999898731709
Total reward this episode: 34.27999983727932
Total reward this episode: 29.30699983611703
Total reward this episode: 14.233999889343977
Total reward this episode: 14.305999908596277
Total reward this episode: 38.589999839663506
Total reward this episode: 24.40899984911084

Total reward this episode: 17.297999881207943
Total reward this episode: 29.512999836355448
Total reward this episode: 26.19099984690547
Total reward this episode: 7.156999915838242
Total reward this episode: 19.61499986425042
Total reward this episode: 12.155999898910522
Total reward this episode: 25.575999848544598
Total reward this episode: 10.671999901533127
Total reward this episode: 19.18099996075034
Total reward this episode: 13.721999883651733
Total reward this episode: 5.672999933362007
Total reward this episode: 19.006999868899584
Total reward this episode: 22.911999851465225
Total reward this episode: 37.249999821186066
Total reward this episode: 15.677999880164862
Total reward this episode: 19.20499987155199
Total reward this episode: 7.598999917507172
Total reward this episode: 37.90999981760979
Total reward this episode: 27.33799985051155
Total reward this episode: 7.668999917805195
Total reward this episode: 15.368999883532524
Total reward this episode: 28.5169998370111


Total reward this episode: 8.198999915271997
Total reward this episode: 18.92399987205863
Total reward this episode: 33.60899983718991
Total reward this episode: 8.690999913960695
Total reward this episode: 39.39899980649352
Total reward this episode: 37.37999980151653
Total reward this episode: 10.330999907106161
Total reward this episode: 17.04499987512827
Total reward this episode: 18.632999874651432
Total reward this episode: 11.192999925464392
Total reward this episode: 25.987999849021435
Total reward this episode: 8.907999902963638
Total reward this episode: 22.268999859690666
Total reward this episode: 10.883999910205603
Total reward this episode: 31.10099983587861
Total reward this episode: 19.143999870866537
Total reward this episode: 7.474999912083149
Total reward this episode: 14.446999926120043
Total reward this episode: 8.723999910056591
Total reward this episode: 14.296999886631966
Total reward this episode: 5.179999925196171
Total reward this episode: 14.016999896615744


Total reward this episode: 17.938999991863966
Total reward this episode: 8.619999930262566
Total reward this episode: 33.25999987125397
Total reward this episode: 9.508999913930893
Total reward this episode: 14.080999881029129
Total reward this episode: 7.441999915987253
Total reward this episode: 10.626999899744987
Total reward this episode: 28.298999842256308
Total reward this episode: 6.651999913156033
Total reward this episode: 15.824999883770943
Total reward this episode: 7.1369999423623085
Total reward this episode: 23.1429998613894
Total reward this episode: 19.573999870568514
Total reward this episode: 9.599999908357859
Total reward this episode: 7.860999923199415
Total reward this episode: 30.23399990797043
Total reward this episode: 30.11899984255433
Total reward this episode: 6.228999987244606
Total reward this episode: 9.029999915510416
Total reward this episode: 16.960999883711338
Total reward this episode: 30.34999978914857
Total reward this episode: 12.501999899744987

Total reward this episode: 26.612999822944403
Total reward this episode: 10.287999898195267
Total reward this episode: 13.386999938637018
Total reward this episode: 9.709999907761812
Total reward this episode: 8.199999909847975
Total reward this episode: 13.030999895185232
Total reward this episode: 22.29899987205863
Total reward this episode: 17.268999878317118
Total reward this episode: 26.938999891281128
Total reward this episode: 9.595999903976917
Total reward this episode: 26.12899984046817
Total reward this episode: 9.323999904096127
Total reward this episode: 25.32899985089898
Total reward this episode: 8.315999910235405
Total reward this episode: 20.611999854445457
Total reward this episode: 9.124999895691872
Total reward this episode: 19.95199989527464
Total reward this episode: 7.3809999115765095
Total reward this episode: 10.66599990427494
Total reward this episode: 35.40999984741211
Total reward this episode: 27.085999853909016
Total reward this episode: 21.885999858379364
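
Per-episode totals like these are easier to read as a summary. The sketch below assumes the rewards have been collected into a list instead of only printed. The `sample` values are the first five totals from the output above, and the helper name `summarize` is illustrative.

```python
from statistics import mean


def summarize(rewards):
    """Return basic statistics for a list of per-episode rewards."""
    return {
        "episodes": len(rewards),
        "mean": mean(rewards),
        "min": min(rewards),
        "max": max(rewards),
    }


# First five episode totals printed above.
sample = [
    36.409999802708626,
    20.193999890238047,
    24.376999985426664,
    21.371999867260456,
    13.9530000500381,
]
summary = summarize(sample)
print(summary)
```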


### 6. Close the environment when finished
When we are finished using an environment, we can close it with the function below.

In [6]:
env.close()
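
If a cell raises an exception before `env.close()` is reached, the environment can be left open. One way to guard against that is a `try`/`finally` wrapper, sketched below. `StubEnvironment` is a hypothetical stand-in used so the example runs without Unity, and `run_and_close` is an illustrative helper, not part of the ML-Agents API.

```python
class StubEnvironment:
    """Hypothetical stand-in for an environment; only tracks open/closed."""

    def __init__(self):
        self.closed = False

    def step(self):
        raise RuntimeError("simulated failure mid-episode")

    def close(self):
        self.closed = True


def run_and_close(env, work):
    """Run `work(env)` and always close the environment afterwards."""
    try:
        return work(env)
    finally:
        env.close()


stub = StubEnvironment()
try:
    run_and_close(stub, lambda e: e.step())
except RuntimeError:
    pass  # the simulated failure still propagates to the caller
print(stub.closed)  # True
```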