# Project 3: Tennis

---

In this notebook, I load a DDPG agent that was trained with the [tennis](tennis.ipynb) notebook and watch it play a few episodes.

## 1. Start the Environment

We begin by importing the necessary packages.

In [1]:
from unityagents import UnityEnvironment
from modules.ddpg_agent import Agent

import numpy as np

In [2]:
env = UnityEnvironment(file_name="./Tennis_Linux/Tennis.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Next, we read the number of agents, the action size, and the state size from the environment.

In [3]:
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

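env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]           # observation of the first agent

numberOfActions = brain.vector_action_space_size  # length of each agent's action vector (2)
numberOfStates = len(state)                       # length of each agent's state vector (8 x 3 stacked = 24)
num_agents = len(env_info.agents)                 # number of agents (one per racket)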
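The sizes above follow directly from the Unity log printed when the environment started. A small sketch of that arithmetic, assuming the three stacked observations are concatenated into a single vector per agent (the variable names here are illustrative, not part of the notebook):

```python
import numpy as np

# Values taken from the Unity log printed by UnityEnvironment.
obs_size_per_frame = 8   # "Vector Observation space size (per agent): 8"
stacked_frames = 3       # "Number of stacked Vector Observation: 3"
action_size = 2          # "Vector Action space size (per agent): 2"
num_agents = 2           # two rackets, matching the two per-agent scores printed later

# Assumption: stacked frames are concatenated into one state vector per agent.
state_size = obs_size_per_frame * stacked_frames
print(state_size)  # 24

# env.step receives one action vector per agent: shape (num_agents, action_size).
example_actions = np.zeros((num_agents, action_size))
print(example_actions.shape)  # (2, 2)
```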

## 2. Define the DDPG play function and run some episodes

In [4]:
def ddpg_play(agent, n_episodes, max_t):
    """Run a trained agent for n_episodes episodes of at most max_t steps each.

    Relies on the globals `env`, `brain_name` and `num_agents` defined above.
    """
    print("###############################")
    print("Running " + str(n_episodes) + " episodes now: ")

    for i_episode in range(1, n_episodes+1):
        print("-------------------------------")
        print("Episode " + str(i_episode))
        env_info = env.reset(train_mode=False)[brain_name]  # reset the environment in real-time (viewing) mode
        scores = np.zeros(num_agents)                       # initialize the score (for each agent)
        agent.reset()                                       # reset the agent
        
        for t in range(max_t):
            states = env_info.vector_observations           # get the current state (for each agent)
            
            actions = agent.act(states)                     # select an action (for each agent)
            env_info = env.step(actions)[brain_name]        # send the actions to the environment
            
            rewards = env_info.rewards                      # get the reward (for each agent)
            scores += rewards                               # accumulate each agent's score
            print("\rScore: " + str(scores), end="")
            
            dones = env_info.local_done                     # see if episode has finished
            if np.any(dones):                               # exit loop if episode finished
                break
        # The episode score is the larger of the two agents' scores.
        print("\nFinal score: " + str(i_episode) + ": " + str(np.max(scores)))
        print()
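Inside the loop, `agent.act(states)` maps the batch of states (one row per agent) to one action vector per agent. The `Agent` class lives in `modules/ddpg_agent.py`, which is not shown here, so the following is only a sketch of what a deterministic DDPG actor does at play time. It assumes that no exploration noise is added and that the network ends in `tanh`, so each action component lies in [-1, 1]. The layer sizes, the random weights, and `act_deterministic` are hypothetical stand-ins for the trained network that `Agent.for_playing` presumably loads.

```python
import numpy as np

STATE_SIZE = 24   # per-agent state length in this environment
ACTION_SIZE = 2   # per-agent action length in this environment
HIDDEN = 64       # hypothetical hidden-layer width

rng = np.random.default_rng(0)
# Hypothetical random weights; a real agent would use trained parameters.
W1 = rng.normal(scale=0.1, size=(STATE_SIZE, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, ACTION_SIZE))
b2 = np.zeros(ACTION_SIZE)

def act_deterministic(states: np.ndarray) -> np.ndarray:
    """Map a (num_agents, STATE_SIZE) batch to (num_agents, ACTION_SIZE) actions.

    No exploration noise is added; tanh keeps every action in [-1, 1].
    """
    hidden = np.maximum(0.0, states @ W1 + b1)  # ReLU hidden layer
    return np.tanh(hidden @ W2 + b2)

states = rng.normal(size=(2, STATE_SIZE))  # hypothetical states for two agents
actions = act_deterministic(states)
print(actions.shape)  # (2, 2)
```

Leaving out the noise matters here: during training DDPG adds exploration noise to these outputs, but when only watching a trained agent the deterministic policy is what we want to evaluate.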

In [5]:
agent_play = Agent.for_playing(num_agents, numberOfStates, numberOfActions, "./models/ddpg_")
ddpg_play(agent_play, 2, 1000)

cuda:0
###############################
Running 2 episodes now: 
-------------------------------
Episode 1
Score: [2.60000004 2.60000004]
Final score: 1: 2.600000038743019

-------------------------------
Episode 2
Score: [2.60000004 2.60000004]
Final score: 2: 2.600000038743019
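Both episodes end with the same final score, 2.600000038743019. In the Tennis environment an agent receives +0.1 each time it hits the ball over the net and -0.01 when it lets the ball hit the ground or hits it out of bounds. Assuming the rewards arrive as 32-bit floats, the trailing digits are exactly what 26 rewards of `float32(0.1)` add up to in 64-bit arithmetic, which suggests each agent returned the ball 26 times without a penalty. The sketch below reproduces that arithmetic with the same per-agent accumulation and max-over-agents rule used in `ddpg_play`; the count of 26 hits is inferred from the number, not read from the environment.

```python
import numpy as np

# Assumption: a +0.1 reward delivered as a 32-bit float, then widened to float64.
hit_reward = float(np.float32(0.1))

# Accumulate per-agent scores as ddpg_play does, for a hypothetical episode
# in which both agents hit the ball over the net 26 times.
num_agents = 2
scores = np.zeros(num_agents)
for _ in range(26):
    scores += [hit_reward, hit_reward]

final_score = float(np.max(scores))  # episode score = max over agents
print(hit_reward)   # 0.10000000149011612
print(final_score)  # 2.600000038743019
```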



In [6]:
env.close()