# Policy gradients (REINFORCE)
---

This notebook runs the REINFORCE algorithm on the "CartPole-v1" problem from OpenAI Gym (https://gym.openai.com/envs/CartPole-v1/).
Because this is the solution notebook, your task is to understand the code. I suggest you start by reading the reference "Reinforce" implementation linked below.

The "CartPole-v1" problem is solved in two versions:
1. The observed state is a list of: [cart position, cart velocity, pole angle, pole velocity at tip]
2. The observed state is an image of the cart

To switch between the two versions, set the "network" entry of the "config" dict to one of:
- 'CartPole_v1'
- 'CartPole_v1_image'
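
Switching versions amounts to a one-line change to the dict. The sketch below wraps that change in a hypothetical helper, `select_network` (an assumption for illustration, not part of this notebook's `utils` package), which rejects any name other than the two listed above.

```python
# The two network names this notebook accepts in config['network'].
VALID_NETWORKS = ('CartPole_v1', 'CartPole_v1_image')


def select_network(config, network):
    """Return a copy of config with 'network' set (hypothetical helper)."""
    if network not in VALID_NETWORKS:
        raise ValueError(
            "unknown network %r, expected one of %r" % (network, VALID_NETWORKS)
        )
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated['network'] = network
    return updated


config = {'network': 'CartPole_v1'}
image_config = select_network(config, 'CartPole_v1_image')
```

Editing `config['network']` directly in the cell that defines "config" has the same effect; the helper only adds a check against typos.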

**Note:** <br/>
The rendering function showing the cartpole will most likely not work on the ML servers.


**Reinforce:** <br/>
https://github.com/pytorch/examples/blob/master/reinforcement_learning/reinforce.py
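
Before reading the linked implementation, it may help to see the core computation in isolation. REINFORCE weights the log-probability of each action by the discounted return that followed it. The following is a minimal pure-Python sketch under stated assumptions: the function names `discounted_returns` and `reinforce_loss` are illustrative, the log-probabilities are made-up numbers rather than outputs of a network, and a real implementation would use tensors so that the loss can be differentiated.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Negative sum of log pi(a_t | s_t) * G_t over one episode.

    Minimizing this quantity with respect to the policy parameters
    follows the policy gradient towards higher expected return.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# CartPole gives a reward of 1 for every step the pole stays up.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.99)

# Illustrative values: a policy that picked each action with probability 0.5.
log_probs = [math.log(0.5)] * len(rewards)
loss = reinforce_loss(log_probs, returns)
```

Implementations, including the linked example, commonly also standardize the returns (subtract the mean, divide by the standard deviation) before forming the loss, which tends to reduce the variance of the gradient estimate.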

**Deep Q Learning (DQN):** <br/>
For deep Q-learning, see PyTorch's official tutorial. <br/>
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

You can also extend the code to work on other problems. Try changing the "environment" entry in "modelParam" to one of:
- Acrobot-v1
- MountainCar-v0
- MountainCarContinuous-v0
- Pendulum-v0
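
When trying these environments, keep in mind that Acrobot-v1 and MountainCar-v0 have discrete actions like CartPole-v1, while MountainCarContinuous-v0 and Pendulum-v0 have continuous actions. A policy that samples from a categorical distribution, as the linked Reinforce example does, would need a different output (for example a Gaussian) for the continuous ones. The sketch below records this distinction; the helpers `with_environment` and `needs_continuous_policy` are hypothetical, and whether this notebook's `Model` class already handles continuous actions is not something the sketch assumes.

```python
# Action-space type of each Gym environment mentioned in this notebook.
ACTION_SPACE = {
    'CartPole-v1': 'discrete',
    'Acrobot-v1': 'discrete',
    'MountainCar-v0': 'discrete',
    'MountainCarContinuous-v0': 'continuous',
    'Pendulum-v0': 'continuous',
}


def with_environment(modelParam, env_id):
    """Return a copy of modelParam with 'environment' set (hypothetical helper)."""
    if env_id not in ACTION_SPACE:
        raise ValueError("unknown environment %r" % (env_id,))
    updated = dict(modelParam)  # shallow copy, the original stays unchanged
    updated['environment'] = env_id
    return updated


def needs_continuous_policy(env_id):
    """True if the environment expects real-valued actions."""
    return ACTION_SPACE[env_id] == 'continuous'


params = with_environment({'environment': 'CartPole-v1'}, 'Acrobot-v1')
continuous_envs = sorted(e for e in ACTION_SPACE if needs_continuous_policy(e))
```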

Software versions:
- Python 3.6
- PyTorch 1.0


In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
from utils.saverRestorer import SaverRestorer
from utils.model import Model
from utils.trainer import Trainer
from utils.player import Player
from utils.environment import EnvironmentWrapper, EnvironmentWrapper_image

def main(config, modelParam):
    if config['network'] == 'CartPole_v1_image':
        env = EnvironmentWrapper_image(modelParam)
    else:
        env = EnvironmentWrapper(modelParam)

    # create an instance of the model you want
    model = Model(config, modelParam, env)

    # create an instance of the saver and restorer class
    saveRestorer = SaverRestorer(config, modelParam)
    model        = saveRestorer.restore(model)

    if not modelParam['play']:
        # train the model
        trainer = Trainer(model, modelParam, config, saveRestorer, env)
        trainer.train()
    else:
        # play one episode with the restored model
        player = Player(model, modelParam, config, saveRestorer, env)
        player.play_episode()

In [None]:
modelParam = {
    'episode_batch': 32,           # Training batch size, number of games before parameter update
    'numb_of_updates': 100,        # Number of gradient descent updates
    'max_episode_len': 500,        # Max number of steps before the game is terminated
    'cuda': {'use_cuda': False,    # Use_cuda=True: use GPU
             'device_idx': 0},     # Select gpu index: 0,1,2,3
    'environment': 'CartPole-v1',  # Game selected
    'modelsDir': 'storedModels/',
    'restoreModelLast': 0,         # 0 = train from scratch, 1 = restore previously trained model
    'restoreModelBest': 0,
    'storeModelFreq': 3,           # How often you want to save your model
    'render': False,               # True if you want to visualize the game while training
    'is_train': True,
    'inNotebook': True,            # True if running this script in a Jupyter notebook
    'play': False                  # False = train the model | True = restore pretrained model and play an episode
}

config = {
    'optimizer': 'adam',           # 'SGD' | 'adam' | 'RMSprop'
    'learningRate': {'lr': 0.01},  # learning rate to the optimizer
    'weight_decay': 0,             # weight_decay value
    'gamma': 0.99,                 # discount factor
    'seed': 543,
    'network': 'CartPole_v1'       # 'CartPole_v1' | 'CartPole_v1_image'
}

# In play mode, always restore the last saved model and render the game
if modelParam['play']:
    modelParam['restoreModelLast'] = 1
    modelParam['render'] = True

main(config, modelParam)