# Project 2: Reacher

---

In this notebook, I load a DDPG agent that was trained in the [reacher](reacher.ipynb) notebook and run it in the Reacher environment.

## 1. Start the Environment

We begin by importing the Unity environment wrapper and the DDPG agent class.

In [1]:
from unityagents import UnityEnvironment
from modules.ddpg_agent import Agent

In [2]:
env = UnityEnvironment(file_name="./Reacher_Linux/Reacher.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_size -> 5.0
		goal_speed -> 1.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


We retrieve the brain name and the sizes of the action and state vectors from the environment.

In [3]:
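brain_name = env.brain_names[0]                    # the environment has a single brain
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]  # reset the environment
state = env_info.vector_observations[0]            # observation of the first agent

numberOfActions = brain.vector_action_space_size   # size of the continuous action vector (4)
numberOfStates = len(state)                        # size of the observation vector (33)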
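The two sizes retrieved above fix the shape of the policy: it maps a 33-value observation to 4 action values, and in the Reacher task every action value must lie between -1 and 1. The actor inside `modules.ddpg_agent` is not shown in this notebook, so the following is only a minimal sketch, assuming a tanh-squashed output of the kind commonly used in DDPG actors. `make_linear_policy` is a hypothetical helper with random weights, not the trained model.

```python
import math
import random

def make_linear_policy(state_size, action_size, seed=0):
    """Hypothetical random linear policy: action = tanh(W @ state + b)."""
    rng = random.Random(seed)
    # one weight row per action component, one column per state component
    W = [[rng.uniform(-0.1, 0.1) for _ in range(state_size)] for _ in range(action_size)]
    b = [0.0] * action_size

    def policy(state):
        if len(state) != state_size:
            raise ValueError(f"expected {state_size} state values, got {len(state)}")
        # tanh squashes every action component into the open interval (-1, 1)
        return [math.tanh(sum(w * s for w, s in zip(row, state)) + bias)
                for row, bias in zip(W, b)]

    return policy

policy = make_linear_policy(33, 4)   # numberOfStates, numberOfActions
action = policy([0.0] * 33)          # an all-zero state gives an all-zero action
print(action)                        # -> [0.0, 0.0, 0.0, 0.0]
```

The trained actor replaces the single linear layer with a deeper network, but its input and output sizes are set by the same two numbers.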

## 2. Define the DDPG play function and run some episodes

In [4]:
def ddpg_play(agent, n_episodes, max_t):
    """Run the trained agent for n_episodes episodes of at most max_t steps each."""
    print("###############################")
    print(f"Running {n_episodes} episodes now: ")

    eps = 0.0  # passed to agent.act; kept at 0.0 because we only play, not train
    for i_episode in range(1, n_episodes + 1):
        print("-------------------------------")
        print(f"Episode {i_episode}")
        env_info = env.reset(train_mode=False)[brain_name]  # reset; train_mode=False runs at viewing speed
        score = 0

        for t in range(max_t):
            state = env_info.vector_observations[0]        # get the current state
            action = agent.act(state, eps)                 # select an action
            env_info = env.step(action)[brain_name]        # send the action to the environment
            reward = env_info.rewards[0]                   # get the reward
            score += reward
            print(f"\rScore: {score}", end="")             # overwrite the running score in place

            done = env_info.local_done[0]                  # see if episode has finished
            if done:                                       # exit loop if episode finished
                break
        print(f"\nFinal score: {i_episode}: {score}")
        print()
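`ddpg_play` passes `eps = 0.0` to `agent.act`. The source of `modules.ddpg_agent` is not shown, so the following is an assumption: if, as in many DDPG implementations, this parameter scales the exploration noise added to the actor's output, setting it to zero makes the agent act deterministically. The sketch uses an Ornstein-Uhlenbeck process, the exploration noise used in the original DDPG work; `OUNoise` and `noisy_action` are hypothetical names, not the module's API.

```python
import random

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise around mu."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def sample(self):
        # x <- x + theta * (mu - x) + sigma * N(0, 1), per component
        self.state = [x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
                      for x in self.state]
        return list(self.state)

def noisy_action(deterministic_action, noise, eps):
    """Add eps-scaled noise to each component and clip to Reacher's [-1, 1] range."""
    return [max(-1.0, min(1.0, a + eps * n))
            for a, n in zip(deterministic_action, noise.sample())]

noise = OUNoise(size=4)
print(noisy_action([0.5, -0.2, 0.9, -1.0], noise, eps=0.0))  # -> [0.5, -0.2, 0.9, -1.0]
```

With `eps = 0.0` the noise term vanishes and only the clipping remains, which is what one wants when evaluating a trained policy.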

In [None]:
agent_play = Agent.for_playing(numberOfStates, numberOfActions, "./models/ddpg_")  # load the pretrained model
ddpg_play(agent_play, 5, 1000)  # 5 episodes, at most 1000 steps each

cuda:0
###############################
Running 5 episodes now: 
-------------------------------
Episode 1
Score: 1.0399999767541885
Final score: 1: 1.0399999767541885

-------------------------------
Episode 2
Score: 0.499999988824129142
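`ddpg_play` only prints one score per episode. To compare a trained agent against the usual Reacher benchmark of the Udacity project (an average score of at least +30 over 100 consecutive episodes), one could collect the final episode scores in a list, which `ddpg_play` does not currently do, and track a sliding-window mean. A minimal sketch; `rolling_average` and `is_solved` are hypothetical helpers.

```python
from collections import deque

def rolling_average(scores, window=100):
    """Return the mean of the last `window` scores after each episode."""
    recent = deque(maxlen=window)  # drops the oldest score once full
    averages = []
    for s in scores:
        recent.append(s)
        averages.append(sum(recent) / len(recent))
    return averages

def is_solved(scores, target=30.0, window=100):
    """True once a full window of episodes averages at least `target`."""
    return len(scores) >= window and rolling_average(scores, window)[-1] >= target

print(rolling_average([1.0, 3.0, 5.0], window=2))  # -> [1.0, 2.0, 4.0]
```

A `deque` with `maxlen` keeps the window bounded without manual slicing.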

In [None]:
env.close()