# Neuroscience-inspired Exploration vs. Exploitation
In this notebook, we will examine the exploration vs. exploitation trade-off in an agent modeled after two recent papers in the neuroscience literature:

The idea is that the norepinephrine and serotonin systems promote exploration and exploitation, respectively, by modulating system-wide neural activity.

## Initial setup

### Import packages

In [12]:
# NN packages
import numpy as np
import tensorflow as tf
%load_ext autoreload
%autoreload 2

# Environment setup
import sys
sys.path.insert(0, "../gridworld")
from gridworld import gameEnv

# Network setup
sys.path.insert(0, "../python")
from helper import create_agent, create_network, create_wrapper



### Load game environment
We'll use Arthur Juliani's [gridworld environment](https://github.com/awjuliani/DeepRL-Agents/blob/master/gridworld.py). If this proves too complex, then we can try a simpler gridworld implementation that avoids the need for a convnet.

In [13]:
# Settings
partial_env = False
env_size = 5

# Create env object and game wrapper
env = gameEnv(partial=partial_env, size=env_size)
game = create_wrapper(env, env_type="gridworld")

ModuleNotFoundError: No module named 'Wrapper'

### Create gridworld agent
We will utilize the Q-network class instantiated by the `create_network` helper function to build the computational graph. This class will then add basic functions to perform learning steps, make actions, etc.

In [None]:
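agent_file_path = "./neuroscience_exploration/dqn_agent.json"

# NOTE: params_file_path, action_set, and results_dir are not defined in any
# cell above; they must be set before this cell will run.
agent = create_agent(agent_file_path,
                     game=env,
                     params_file=params_file_path,
                     action_set=action_set,
                     output_directory=results_dir)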

## Exploration: LC-NE
First, we will look at adaptive gain theory, which, in short, states that the LC-NE (locus coeruleus-norepinephrine) system promotes exploratory activity by essentially increasing noise in the system. In the same way, we can add a noisy layer to some combination of input and hidden units, with the noise level controlled by a learned parameter; that is, the agent learns how much noise to inject at a given state to promote exploration.

The network will be a simple DQN created from the JSON file building scheme. For now, the only addition will be a noisy layer that feeds into the hidden layer. We will then create a placeholder parameter that controls the level of this noise, whose value can be modulated in several ways:
- Simple **linear decay** over training (much like the epsilon-decay scheme commonly used)
- Proportional to **recent performance** (such as moving average of reward)
- **Learned** through training (a function of the input and/or hidden values)
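
To make the idea concrete, here is a minimal NumPy sketch of a hidden layer with injected noise, plus one possible performance-based schedule. It is not the network built by `create_network`; the function names `noisy_hidden` and `noise_from_performance`, the Gaussian noise, the ReLU activation, and the choice to *reduce* noise as the moving-average reward rises are all assumptions made for illustration.

```python
import numpy as np

def noisy_hidden(x, W, b, noise_scale, rng):
    """Hidden layer whose pre-activation receives zero-mean Gaussian noise.

    noise_scale plays the role of the noise-level parameter described above
    (hypothetical name; in the TF graph it would be fed as a placeholder).
    """
    pre = x @ W + b
    noise = rng.standard_normal(pre.shape) * noise_scale
    return np.maximum(pre + noise, 0.0)  # ReLU (assumed activation)

def noise_from_performance(recent_rewards, max_reward, max_noise=1.0):
    """Assumed schedule: the lower the recent average reward, the more noise."""
    avg = float(np.mean(recent_rewards)) if len(recent_rewards) else 0.0
    frac = np.clip(avg / max_reward, 0.0, 1.0)
    return max_noise * (1.0 - frac)

rng = np.random.default_rng(0)
x = np.ones((2, 3))          # batch of 2 inputs with 3 features
W = np.ones((3, 4))          # 3 inputs -> 4 hidden units
b = np.zeros(4)

scale = noise_from_performance([0.2, 0.4], max_reward=1.0)
hidden = noisy_hidden(x, W, b, scale, rng)
print(scale, hidden.shape)
```

With `noise_scale` set to zero the layer reduces to an ordinary ReLU layer, which gives a convenient sanity check when comparing against the baseline agent.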

We will examine each of them in order, but first we will look at the positive control: **epsilon-greedy exploration**.

### Epsilon-greedy exploration

In [None]:
net = create_network()
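
For reference, epsilon-greedy action selection can be sketched in a few lines. This is a generic illustration rather than the agent's actual code path; the function name `epsilon_greedy` and the example Q-values are hypothetical.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

rng = np.random.default_rng(0)
q_values = [0.1, 0.9, 0.3]   # made-up Q-values for three actions

greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)   # always greedy
random_action = epsilon_greedy(q_values, epsilon=1.0, rng=rng)   # always random
```

Here exploration is independent of the state: every action is equally likely whenever the agent explores, which is what makes this scheme a useful baseline for the noise-based variants.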

### Noise controlled by linear decay
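
A linear decay schedule for the noise level might look like the sketch below. The function name `linear_decay`, the start and end values, and the number of decay steps are placeholder assumptions, not settings taken from the agent's parameter files.

```python
def linear_decay(step, start=1.0, end=0.0, decay_steps=10_000):
    """Interpolate linearly from start to end over decay_steps, then hold at end."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return start + frac * (end - start)

# Noise level at a few points in training (values are illustrative only).
schedule = [linear_decay(s) for s in (0, 5_000, 10_000, 20_000)]
```

At each training step, the returned value would be fed to the noise-level placeholder, in the same way an annealed epsilon is normally passed to the action-selection routine.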