# Evaluate the Agent in the Unity Environment

---

## Start the Environment

This notebook assumes that you have followed the instructions in the README so that the Unity environment is ready to run.

In [1]:
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name='Reacher.app')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain brains which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [2]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

## Run the Agent

Specify the paths of the saved actor and critic models to evaluate.

In [3]:
actor_model_path = "actor.pt"
critic_model_path = "critic.pt"
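
Before starting the evaluation loop, it can help to confirm that both checkpoint files are present, since a missing file otherwise only surfaces when `torch.load` is called. The following is a minimal sketch using only the standard library; the helper name `missing_checkpoints` is hypothetical and is not part of `d3pg`.

```python
from pathlib import Path


def missing_checkpoints(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# File names as specified above; adjust them if the checkpoints live elsewhere.
missing = missing_checkpoints(["actor.pt", "critic.pt"])
if missing:
    print("Missing checkpoint files: {}".format(", ".join(missing)))
else:
    print("All checkpoint files found.")
```

If anything is reported as missing, fix the paths before running the next cell.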

Run the cell below to watch the agent interact with the Unity environment.

In [4]:
import torch
from d3pg import Controller

seed = 69

env_info = env.reset(train_mode=False)[brain_name]     # reset the environment
state_size = len(env_info.vector_observations[0])      # get state size (index 0 also works with a single agent)
action_size = brain.vector_action_space_size           # get action size
num_agents = len(env_info.agents)                      # number of agents
scores = np.zeros(num_agents)                          # initialize the score (for each agent)

# initialize the algorithm controller and networks
controller = Controller(state_size, action_size, seed) 
controller.actor_local.load_state_dict(torch.load(actor_model_path, map_location='cpu'))
controller.critic_local.load_state_dict(torch.load(critic_model_path, map_location='cpu'))

states = env_info.vector_observations                  # get the current state (for each agent)

while True:
    actions = controller.act(states)                   # select an action (for each agent)
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += rewards                                  # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        break

print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

Total score (averaged over agents) this episode: 38.266999144665895
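
The score bookkeeping used in the loop can be checked in isolation. The sketch below is a minimal illustration that replaces the Unity environment with made-up reward values (the `fake_rewards` array is an assumption for demonstration only, not output from Reacher). Each agent's rewards are summed over the time steps, and the reported number is the mean of those per-agent totals.

```python
import numpy as np

# Hypothetical rewards: 3 time steps (rows) x 2 agents (columns).
# These numbers are invented for illustration, not taken from Reacher.
fake_rewards = np.array([
    [0.0, 0.1],
    [0.1, 0.1],
    [0.1, 0.0],
])

num_agents = fake_rewards.shape[1]
scores = np.zeros(num_agents)          # one running total per agent

for step_rewards in fake_rewards:
    scores += step_rewards             # same update as in the evaluation loop

mean_score = float(np.mean(scores))    # value reported at the end of the episode
print('Per-agent scores: {}'.format(scores))
print('Mean score: {:.2f}'.format(mean_score))
```

Because the mean is taken over per-agent totals, a single poorly performing agent lowers the reported score in proportion to the number of agents.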
