# Understanding the effects of PyCIGAR hyperparameters on training and results

The goal of this notebook is to gain intuition on the impact of the different hyperparameters currently specified to train a PPO agent in PyCIGAR.

Fixed settings in this experiment:
- PPO algorithm
- `ieee37busdata` network and load profile
- tracked device: `inverter_s701a`
- discrete single actions
- `CentralControlPVInverterEnv` environment


Hyperparameters identified:
- discount factor $\gamma$
- train batch size
- depth of NN
- widths of NN layers
- lr (or lr schedule)
- loss factors (penalties)
    - oscillation
    - action change
    - deviation from the initial command
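
To get a first feel for the discount factor, it helps to look at how quickly the weight $\gamma^k$ of a reward $k$ steps in the future decays. The sketch below is a generic illustration and not PyCIGAR code. The helper names `effective_horizon` and `steps_until_weight_below` are introduced here for illustration only, and the 1% threshold is an arbitrary choice. It relies on the standard rule of thumb that the effective horizon is about $1/(1-\gamma)$ steps.

```python
def effective_horizon(gamma):
    """Rule-of-thumb horizon 1 / (1 - gamma), i.e. the sum of all discount weights."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def steps_until_weight_below(gamma, threshold=0.01):
    """Smallest k such that gamma**k < threshold (hypothetical helper)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be in (0, 1)")
    k = 0
    while gamma ** k >= threshold:
        k += 1
    return k


# Compare a short-sighted agent with two more far-sighted ones.
for gamma in (0.5, 0.9, 0.99):
    print(gamma, effective_horizon(gamma), steps_until_weight_below(gamma))
```

With $\gamma = 0.5$, the value used in the configuration of this notebook, the horizon is about 2 steps and a reward 7 steps ahead already weighs less than 1%, so the agent mostly optimizes the immediate reward.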
    
    

Until now, results have been mostly qualitative, obtained by inspecting plots of the voltage, the $y$ value, the injected power, and the actions taken by a tracked device over the course of a test simulation. To compare different solutions on objective grounds, we need quantitative results. The next section lists statistics that reflect different aspects of both the solutions and the training process.

## Statistics that summarize a training

For each epoch:

- number of actions taken
- average magnitude of the actions
- total reward
- time of earliest action
- average Shannon entropy of the action distribution

For the whole training run:

- epoch after which the policy no longer changes
- average runtime of an epoch
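
The statistics above could be computed from a recorded test episode roughly as follows. This is a minimal sketch under several assumptions that are not defined in this notebook: `actions` holds one discrete action index per time step, an action counts as "taken" when it differs from the action of the previous step (the first step is compared against `init_action`), the magnitude of an action is the absolute difference of the indices, and the policy is considered unchanged when the test action sequence is identical across epochs. All function names are hypothetical.

```python
import math


def episode_statistics(actions, rewards, init_action):
    """Summarize one test episode (see the assumptions in the text above)."""
    changes = []  # (time step, magnitude) of every action change
    previous = init_action
    for t, action in enumerate(actions):
        if action != previous:
            changes.append((t, abs(action - previous)))
        previous = action
    n = len(changes)
    return {
        "num_actions": n,
        "mean_magnitude": sum(m for _, m in changes) / n if n else 0.0,
        "total_reward": sum(rewards),
        "earliest_action": changes[0][0] if n else None,
    }


def mean_entropy(prob_rows):
    """Average Shannon entropy (in nats) over per-step action distributions."""
    entropies = [-sum(p * math.log(p) for p in row if p > 0.0) for row in prob_rows]
    return sum(entropies) / len(entropies)


def convergence_epoch(actions_per_epoch):
    """First epoch index from which the test action sequence stays the same."""
    last = len(actions_per_epoch) - 1
    while last > 0 and actions_per_epoch[last - 1] == actions_per_epoch[-1]:
        last -= 1
    return last


# Toy data, for illustration only.
stats = episode_statistics(
    actions=[2, 2, 3, 3, 1],
    rewards=[-1.0, -1.0, -0.5, -0.5, -2.0],
    init_action=2,
)
print(stats)
print(mean_entropy([[0.5, 0.5], [1.0, 0.0]]))
print(convergence_epoch([[0, 1], [1, 1], [1, 2], [1, 2], [1, 2]]))
```

Comparing action sequences is only a proxy for "the policy does not change anymore": two different policies can produce the same actions on a single test episode, so a stricter check would compare the action probabilities themselves.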


An additional qualitative result could be a GIF showing how these curves evolve over the epochs.


## Methods to understand hyperparameters

Some methods that come to mind to obtain an intuition are:

- try extreme values, then compare the statistics
- change one hyperparameter at a time and plot the statistics
- Bayesian optimization with one of the statistics as the objective
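
Changing hyperparameters independently amounts to a one-at-a-time sweep around a baseline configuration. The sketch below generates such configurations in plain Python. The baseline values are the ones used in this notebook, while the sweep values and the helper name `one_at_a_time` are illustrative assumptions. Only top-level keys are handled, so nested entries such as the network widths under `model` would need extra care.

```python
import copy


def one_at_a_time(base_config, sweeps):
    """Yield (name, value, config) with exactly one hyperparameter changed."""
    for name, values in sweeps.items():
        for value in values:
            config = copy.deepcopy(base_config)  # never mutate the baseline
            config[name] = value
            yield name, value, config


# Baseline from this notebook; the sweep values are assumptions for illustration.
base = {"gamma": 0.5, "lr": 5e-4, "train_batch_size": 500}
sweeps = {
    "gamma": [0.0, 0.9, 0.99],
    "lr": [5e-5, 5e-3],
    "train_batch_size": [250, 1000],
}

variants = list(one_at_a_time(base, sweeps))
for name, value, config in variants:
    print(name, value, config)
```

Each generated dictionary could then be merged into the full training config and trained separately. Note that putting several `tune.grid_search` entries into a single config explores the full Cartesian product instead, which grows much faster than the 7 runs produced here.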


In [1]:
import time
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.tune.registry import register_env
from pycigar.utils.registry import make_create_env
from pycigar.utils.input_parser import input_parser

import gym
gym.logger.set_level(40) # remove gym warnings
from ray.tune import JupyterNotebookReporter

In [2]:
sim_params = input_parser('ieee37busdata')
pycigar_params = {"exp_tag": "cooperative_multiagent_ppo",
                  "env_name": "CentralControlPVInverterEnv",
                  "sim_params": sim_params,
                  "simulator": "opendss",
                  "tracking_ids": ['inverter_s701a']}

In [3]:
create_env, env_name, create_test_env, test_env_name = make_create_env(params=pycigar_params, version=0)
register_env(env_name, create_env)
register_env(test_env_name, create_test_env)

test_env = create_test_env()
obs_space = test_env.observation_space
act_space = test_env.action_space

print("Registered env", env_name)
ray.init()

2020-02-26 17:01:21,762	INFO resource_spec.py:212 -- Starting Ray with 31.15 GiB memory available for workers and up to 15.58 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).


Registered env CentralControlPVInverterEnv-v0


2020-02-26 17:01:22,285	INFO services.py:1078 -- View the Ray dashboard at localhost:8265


{'node_ip_address': '128.3.28.231',
 'redis_address': '128.3.28.231:37077',
 'object_store_address': '/tmp/ray/session_2020-02-26_17-01-21_740330_17115/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-02-26_17-01-21_740330_17115/sockets/raylet',
 'webui_url': 'localhost:8265',
 'session_dir': '/tmp/ray/session_2020-02-26_17-01-21_740330_17115'}

In [4]:
import os

def coop_train_fn(config, reporter):
    agent1 = PPOTrainer(env=env_name, config=config)
    # expand '~' explicitly; Python's os functions treat it as a literal directory name
    checkpoint_dir = os.path.expanduser('~/ray_results/checkpoint')

    for epoch in range(100):
        # run one training iteration and forward its metrics to Tune
        result = agent1.train()
        reporter(**result)

        # save the parameters of the agent after every epoch
        checkpoint = agent1.save(checkpoint_dir)
        print("Checkpoint saved at", checkpoint)

        # roll out one test episode with the current policy and time it
        done = False
        start_time = time.time()
        obs = test_env.reset()
        reward = 0
        while not done:
            act = agent1.compute_action(obs)
            obs, r, done, _ = test_env.step(act)
            reward += r
        end_time = time.time()
        ep_time = end_time - start_time
        print("\n Episode time is", ep_time)

        # test_env.plot(pycigar_params['exp_tag'], test_env_name, epoch + 1, reward)

    # stop the agent
    agent1.stop()

In [None]:
config = {
    "gamma": 0.5,
    'lr': 5e-04,
    'sample_batch_size': 50,
    'train_batch_size': 500,
    # 'lr_schedule': [[0, 5e-04], [12000, 5e-04], [13500, 5e-05]],

    'num_workers': 7,
    'num_cpus_per_worker': 1,
    'num_cpus_for_driver': 1,
    'num_envs_per_worker': 1,
    
    'evaluation_interval': 2,
    'evaluation_num_episodes': 1,
    "evaluation_config": {
        # Example: overriding env_config, exploration, etc:
        # "env_config": {...},
        "explore": False,
        "env_name": test_env_name
    },
    
    'log_level': 'INFO',

    'model': {
        'fcnet_activation': 'tanh',
        'fcnet_hiddens': [128, 64, 32],
        'free_log_std': False,
        'vf_share_layers': True,
        'use_lstm': False,
        'state_shape': None,
        'framestack': False,
        'zero_mean': True,
    }
}

# call tune.run() to run coop_train_fn() with the config dict above
reporter = JupyterNotebookReporter(overwrite=False, max_progress_rows=20)
tune.run(coop_train_fn, config=config, progress_reporter=reporter)
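
The commented-out `lr_schedule` entry in the config lists `[timestep, learning rate]` pairs. To our understanding, RLlib interpolates linearly between consecutive points and holds the last value afterwards. The sketch below reproduces that behaviour in plain Python under this assumption and does not call RLlib. The helper name `lr_at` is hypothetical.

```python
def lr_at(schedule, timestep):
    """Piecewise-linear learning rate; assumes `schedule` is sorted by timestep."""
    if timestep <= schedule[0][0]:
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule[:-1], schedule[1:]):
        if t0 <= timestep < t1:
            frac = (timestep - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    return schedule[-1][1]  # past the last point: hold the final value


# The schedule from the commented-out config line.
schedule = [[0, 5e-04], [12000, 5e-04], [13500, 5e-05]]
for t in (0, 6000, 12000, 12750, 13500, 20000):
    print(t, lr_at(schedule, t))
```

Under this reading, the learning rate stays at 5e-04 for the first 12000 timesteps, then decays linearly to 5e-05 over the next 1500 timesteps and remains there.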