# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below raises an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [None]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```
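The platform list above can also be expressed as a small lookup. The sketch below is only an illustration: `tennis_file_name` and its parameters are hypothetical names, `"path/to"` is the same placeholder used in the list, and treating `"x86_64"` and `"AMD64"` as the only 64-bit machine strings is an assumption.

```python
import platform


def tennis_file_name(system, machine, headless=False, base="path/to"):
    """Hypothetical helper: map a platform to the Tennis build paths listed above.

    `system` and `machine` are expected in the form returned by
    platform.system() and platform.machine().
    """
    # Assumption: these two strings identify a 64-bit x86 machine.
    is_64bit = machine in ("x86_64", "AMD64")
    if system == "Darwin":
        return f"{base}/Tennis.app"
    if system == "Windows":
        folder = "Tennis_Windows_x86_64" if is_64bit else "Tennis_Windows_x86"
        return f"{base}/{folder}/Tennis.exe"
    if system == "Linux":
        # The headless build lives in its own folder.
        folder = "Tennis_Linux_NoVis" if headless else "Tennis_Linux"
        suffix = "x86_64" if is_64bit else "x86"
        return f"{base}/{folder}/Tennis.{suffix}"
    raise ValueError(f"no Tennis build listed for platform {system!r}")
```

Calling `tennis_file_name(platform.system(), platform.machine())` would then give a value to pass as `file_name`, once `base` points at the folder where the environment was actually downloaded.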

In [None]:
env = UnityEnvironment(file_name="../../tennis/Tennis", worker_id=1)

### 2. Examine the State and Action Spaces

In [None]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# reset the environment
env_info = env.reset(train_mode=True)

# number of agents in the environment
num_agents = len(env_info[brain_name].agents)
print('Number of agents:', num_agents)

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the observation space 
states = env_info[brain_name].vector_observations
state_size = states.shape[1]
print('Observations have length:', state_size)
print(states[0])

### 3. Take Random Actions in the Environment

In [None]:
for i_episode in range(10):
    # initialize environment, get initial states
    env_info = env.reset(train_mode=False)[brain_name]
    states = env_info.vector_observations
    scores = np.zeros(num_agents)
    while True:
        # select a random action for each agent
        actions = np.random.randn(num_agents, action_size)
        # clip so that every action entry lies in [-1, 1]
        actions = np.clip(actions, -1, 1)
        # take action (for each agent)
        env_info = env.step(actions)[brain_name]
        next_states = env_info.vector_observations
        rewards = env_info.rewards
        dones = env_info.local_done
        # update the score and roll over to the next state (for each agent)
        scores += rewards
        states = next_states
        if np.any(dones):
            break
    print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

### 4. Train the Agent

In [None]:
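from collections import deque

num_episodes = 10000
print_every = 100

# initialize the agent (the Agent class is not defined in this notebook;
# define or import it before running this cell)
agent = Agent(state_size, action_size)

scores = deque(maxlen=100)  # scores of the most recent 100 episodes
all_scores = []             # score of every episode, kept for plotting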

for i_episode in range(1, num_episodes+1):
    env_info = env.reset(train_mode=True)[brain_name]
    agent.reset()
    score = np.zeros(num_agents)
    actions = np.zeros((num_agents, action_size))
    while True:
        states = env_info.vector_observations
        for i in range(num_agents):
            actions[i] = agent.act(states[i])
            agent.learn_i()
        env_info = env.step(actions)[brain_name]
        agent.step(states, env_info)
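        score += env_info.rewards
        # end the episode as soon as any agent reports done
        if np.any(env_info.local_done):
            # record the episode score as the mean over both agents
            scores.append(np.mean(score))
            all_scores.append(np.mean(score))
            break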
        
    if i_episode % print_every == 0:
        print('Episode {}\tScores:{}\tAvg Scores: {}'.format(i_episode, score, np.mean(scores)))

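The loop above keeps the last 100 episode scores in a `deque` and prints their mean. The same rolling statistic can be computed after training from the full score list, for example to find the first episode at which the 100-episode mean reaches a target. This is a minimal sketch: the function names are hypothetical, and the default target of 0.5 is an assumption that this notebook does not state.

```python
from collections import deque


def rolling_means(episode_scores, window=100):
    """Mean of the most recent `window` scores after each episode."""
    recent = deque(maxlen=window)
    means = []
    for s in episode_scores:
        recent.append(s)
        means.append(sum(recent) / len(recent))
    return means


def first_solved_episode(episode_scores, target=0.5, window=100):
    """1-based episode at which the mean over a full window first reaches
    `target`, or None if it never does. The default target is an assumption."""
    recent = deque(maxlen=window)
    for i, s in enumerate(episode_scores, start=1):
        recent.append(s)
        # Only judge once the window is completely filled.
        if len(recent) == window and sum(recent) / window >= target:
            return i
    return None
```

With the training cell's variables, `first_solved_episode(all_scores)` would report when the chosen target was first met, and `rolling_means(all_scores)` gives a smoother curve to plot than the raw per-episode scores.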
### 5. Plot the Rewards

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# plot the per-episode score (mean over both agents) against the episode number
fig, ax = plt.subplots()
ax.plot(np.arange(1, len(all_scores)+1), all_scores)
ax.set_ylabel('Score')
ax.set_xlabel('Episode #')
plt.show()

In [None]:
import pickle

# save the per-episode scores so the plot can be regenerated without retraining
with open('scores', 'wb') as fp:
    pickle.dump(all_scores, fp)

# to reload previously saved scores instead:
#with open('scores', 'rb') as fp:
#    all_scores = pickle.load(fp)

### 6. Watch a Smart Agent!

In [None]:
# replay one episode at normal speed with the trained agent controlling both rackets
env_info = env.reset(train_mode=False)[brain_name]
score = np.zeros(num_agents)
actions = np.zeros((num_agents, action_size))
while True:
    states = env_info.vector_observations
    # same per-agent act() call as in the training loop
    for i in range(num_agents):
        actions[i] = agent.act(states[i])
    env_info = env.step(actions)[brain_name]
    score += env_info.rewards
    if np.any(env_info.local_done):
        break
print(score)

In [None]:
env.close()