# Demo for agents in the banana collector environment

This notebook shows how different agents behave in the banana collector environment. To run the notebook, please follow the instructions in the project's README [here](http://github.com/stevenabreu7/banana-collector).

In [1]:
from unityagents import UnityEnvironment
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import sys

Load the environment from Unity and show basic statistics about it, such as the sizes of the state space and the action space.

In [2]:
env = UnityEnvironment(file_name="src/exec/Banana.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

In [5]:
# number of agents in the environment
print('Number of agents:', len(env_info.agents))

Number of agents: 1


In [6]:
# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

Number of actions: 4
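
The output above only confirms that there are 4 discrete actions. As a small aid for reading the action indices, the sketch below maps them to names. The mapping is an assumption taken from the description of the Udacity Navigation project that this environment comes from; it is not printed by the environment itself, and `ACTION_NAMES` and `describe_action` are hypothetical helper names.

```python
# Assumed meaning of the 4 discrete actions (from the Udacity Navigation
# project description, not confirmed by the environment output above).
ACTION_NAMES = {
    0: "move forward",
    1: "move backward",
    2: "turn left",
    3: "turn right",
}

def describe_action(action):
    """Return a human-readable name for an action index (hypothetical helper)."""
    return ACTION_NAMES.get(int(action), "unknown")

print(describe_action(2))
```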


In [7]:
# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', np.array(state))
state_size = len(state)
print('States have length:', state_size)

States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


## Random agent
Here, you can see how an agent that picks actions uniformly at random fares in the environment.

In [8]:
# reset the environment in inference mode and read the initial state
env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]
score = 0
while True:
    # pick one of the available actions uniformly at random
    action = np.random.randint(action_size)
    env_info = env.step(action)[brain_name]
    next_state = env_info.vector_observations[0]
    reward = env_info.rewards[0]
    done = env_info.local_done[0]
    # accumulate the episode return and move on to the next state
    score += reward
    state = next_state
    if done:
        break
print("Score: {}".format(score))

Score: 1.0
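
A single episode is a noisy measure of a random policy, so it can help to average over several episodes. The sketch below wraps the loop above in a function and reports the mean and standard deviation of the score. `run_episode` and `average_score` are hypothetical helpers, not part of the repository; they only rely on the `reset`, `step`, `vector_observations`, `rewards`, and `local_done` members already used in this notebook.

```python
import numpy as np

def run_episode(env, brain_name, policy, train_mode=False):
    """Run one episode with the given policy and return the total reward."""
    env_info = env.reset(train_mode=train_mode)[brain_name]
    state = env_info.vector_observations[0]
    score = 0.0
    while True:
        action = policy(state)
        env_info = env.step(action)[brain_name]
        state = env_info.vector_observations[0]
        score += env_info.rewards[0]
        if env_info.local_done[0]:
            return score

def average_score(env, brain_name, policy, n_episodes=10):
    """Return (mean, standard deviation) of the score over n_episodes."""
    scores = [run_episode(env, brain_name, policy) for _ in range(n_episodes)]
    return float(np.mean(scores)), float(np.std(scores))

# Example call with the random policy used above (requires the Unity env):
# mean, std = average_score(env, brain_name,
#                           lambda s: np.random.randint(action_size))
```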


## Trained agent
This section shows how a trained agent performs in the environment.

To run a different trained agent, change the path to the weights file and set the network architecture to match it.

In [9]:
from src.testing import test_agent

In [10]:
path_network_weights = "src/params/end_ddqn_256x256x128_epsd998.pth"
network_layers = [256,256,128]
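
The architecture has to match the weights file because the saved tensors have fixed shapes. As a rough illustration, the sketch below lists the weight shapes that a plain fully connected network would have for the sizes above (37 inputs, hidden layers `[256, 256, 128]`, 4 outputs). That the repository's network is a simple stack of linear layers is an assumption; `layer_shapes` and `count_parameters` are hypothetical helpers.

```python
def layer_shapes(state_size, hidden_layers, action_size):
    """Return (inputs, outputs) for each linear layer of an assumed plain MLP."""
    sizes = [state_size] + list(hidden_layers) + [action_size]
    return list(zip(sizes[:-1], sizes[1:]))

def count_parameters(shapes):
    """Count weights plus biases over all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)

shapes = layer_shapes(37, [256, 256, 128], 4)
print(shapes)  # [(37, 256), (256, 256), (256, 128), (128, 4)]
print(count_parameters(shapes))
```

If the list of hidden layers differs from the one used in training, the shapes no longer line up with the saved tensors and loading the weights would be expected to fail.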

In [11]:
score = test_agent(env, brain_name, path_network_weights, network_layers)
print("Score: {}".format(score))

Score: 15.0


You can see that the agent does well as long as yellow bananas are nearby. As soon as it reaches an area with few yellow bananas, its behavior stops looking purposeful. The same happens when the agent is facing a wall.