# Training agents in the Reacher environment

---

This notebook shows how to train a continuous-control agent in Unity's Reacher environment using deep reinforcement learning.

In [1]:
from unityagents import UnityEnvironment
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import sys

In [2]:
mpl.rcParams['figure.dpi'] = 200
mpl.rcParams['figure.figsize'] = 10, 5

## Load environment

In [3]:
# start the environment
env = UnityEnvironment(file_name="src/exec/Reacher20.app") # choose Reacher1.app or Reacher20.app
# get default brain (responsible for deciding agent actions)
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
# examine state and action space
env_info = env.reset(train_mode=True)[brain_name]
action_size = brain.vector_action_space_size
state_size = brain.vector_observation_space_size
n_agents = len(env_info.agents)
print('Number of agents:', n_agents)
print('Action size:', action_size)
print('State size:', state_size)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Number of agents: 20
Action size: 4
State size: 33
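To make the printed sizes concrete, here is a minimal sketch of the array shapes the rest of the notebook works with. It does not touch Unity: the zero-filled `states` array is only a placeholder standing in for `env_info.vector_observations`, which is assumed to hold one row per agent.

```python
import numpy as np

# Values printed by the cell above
n_agents, state_size, action_size = 20, 33, 4

# Placeholder for env_info.vector_observations (assumed: one row per agent)
states = np.zeros((n_agents, state_size))

# One continuous action vector per agent, each entry clipped to [-1, 1]
actions = np.clip(np.random.randn(n_agents, action_size), -1, 1)

print('States shape:', states.shape)
print('Actions shape:', actions.shape)
```

Every step therefore sends a batch of 20 action vectors to the environment and receives a batch of 20 observations back.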


## Random agent(s)

Watch agents that take random actions interact with the Reacher environment.

In [6]:
# reset environment
env_info = env.reset(train_mode=False)[brain_name]
# get current state (for each agent)
states = env_info.vector_observations
# initialize score (for each agent)
scores = np.zeros(n_agents)
while True:
    # select action (for each agent)
    actions = np.random.randn(n_agents, action_size)
    actions = np.clip(actions, -1, 1)
    # execute actions
    env_info = env.step(actions)[brain_name]
    # get next state, reward, done (for each agent)
    next_states = env_info.vector_observations
    rewards = env_info.rewards
    dones = env_info.local_done
    # update scores and states (for each agent)
    scores += rewards
    states = next_states
    if np.any(dones):
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

Total score (averaged over agents) this episode: 0.13699999693781137
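The loop above can also be tried without launching Unity. The following is a rough sketch under stated assumptions: `fake_step` is a hypothetical stand-in for `env.step`, and its reward values, reward probability, and the episode length `max_steps` are arbitrary choices, not properties of the real environment. It only illustrates how per-agent scores accumulate until any agent reports done.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, action_size = 20, 4
max_steps = 50  # hypothetical episode length for the stand-in environment


def fake_step(actions, t):
    """Hypothetical stand-in for env.step: arbitrary sparse rewards,
    and every agent is marked done once max_steps is reached."""
    rewards = rng.choice([0.0, 0.1], size=len(actions), p=[0.95, 0.05])
    dones = [t + 1 >= max_steps] * len(actions)
    return rewards, dones


# initialize score (for each agent)
scores = np.zeros(n_agents)
t = 0
while True:
    # select a random action for each agent, clipped to [-1, 1]
    actions = np.clip(rng.standard_normal((n_agents, action_size)), -1, 1)
    rewards, dones = fake_step(actions, t)
    # accumulate rewards per agent
    scores += rewards
    t += 1
    if np.any(dones):
        break

print('Steps taken:', t)
print('Mean score over agents:', np.mean(scores))
```

Keeping `scores` as a vector rather than a single number makes it possible to inspect individual agents before averaging.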


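If you only wanted to watch the random agents, you can close the environment here. Skip this cell if you plan to continue to the training section, which needs the environment to stay open.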

In [6]:
env.close()

## Train agent

This section will train an example agent (coming soon).

In [7]:
# todo

When finished, you can close the environment.

In [8]:
env.close()